perf(visual): skip re-OCR of unchanged pixels, right-size the OCR input, pin the clipboard preface by FuJacob · Pull Request #685 · FuJacob/cotabby

FuJacob · 2026-06-12T01:58:53Z

Summary

Three efficiency cuts in the visual-context pipeline, sized from the energy-audit follow-up (the OCR lane was the one remaining tier-B lever):

Pixel-hash extraction cache. Refocusing a window re-ran the full Vision pass even when the captured pixels were identical (alt-tab away and back is the common case). A small bounded cache keyed by an FNV-1a stride hash of the capture now reuses the raw extraction; hygiene, normalization, and the field-text stripping still rerun against the live field text, so a hit stays byte-identical to re-OCRing the same pixels.
Downscale cap 1600 to 1200. The Retina capture of the 700pt strip arrives above both caps, so this only changes how much gets handed to Vision: ~44% fewer pixels per accurate-mode pass while typical 11-13pt UI text stays comfortably above the recognition floor (~1.2 px/pt on the strip). The recognition level itself intentionally stays .accurate: measured data puts .fast at only ~1.6x on Apple Silicon with a real recall cost.
Clipboard preface pinning. The clipboard relevance verdict was re-evaluated against the live prefix on every request, and the clipboard section precedes the typed prefix in the prompt, so every verdict flip rewrote the prompt head and collapsed the llama engine's reusable KV common prefix into a full re-prefill. An accepted verdict is now pinned per (field session, pasteboard change count); a nil verdict keeps re-evaluating because it adds nothing to the prompt (head-stable) and the clipboard may only become relevant as more text is typed.
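The pixel-hash cache described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `strideHash`, `ExtractionCacheSketch`, the capacity of 4, and the flat byte-array view of the capture are hypothetical. The stride of 17 is the value the review later mentions.

```swift
/// 64-bit FNV-1a over every `step`-th byte of a pixel buffer.
/// Returns nil for an empty buffer or a non-positive step, so a caller can
/// treat "no hash" as "cache disabled" (assumed behaviour).
func strideHash(_ bytes: [UInt8], step: Int = 17) -> UInt64? {
    guard !bytes.isEmpty, step > 0 else { return nil }
    var hash: UInt64 = 0xcbf29ce484222325      // FNV-1a 64-bit offset basis
    let prime: UInt64 = 0x100000001b3          // FNV-1a 64-bit prime
    for index in stride(from: 0, to: bytes.count, by: step) {
        hash ^= UInt64(bytes[index])           // FNV-1a: xor first...
        hash = hash &* prime                   // ...then multiply (wrapping)
    }
    return hash
}

/// Small bounded cache of raw OCR lines keyed by pixel hash.
/// The oldest key is evicted once `capacity` is exceeded (hypothetical policy).
struct ExtractionCacheSketch {
    private var order: [UInt64] = []
    private var storage: [UInt64: [String]] = [:]
    let capacity: Int

    init(capacity: Int = 4) { self.capacity = max(1, capacity) }

    func lookup(_ key: UInt64) -> [String]? { storage[key] }

    mutating func store(_ lines: [String], for key: UInt64) {
        if storage[key] == nil {
            order.append(key)
            if order.count > capacity {
                let evicted = order.removeFirst()
                storage[evicted] = nil
            }
        }
        storage[key] = lines
    }
}
```

Only the raw extraction is cached in this sketch. Post-processing against the live field text would still run on every call, which is what keeps a hit equivalent to re-running OCR on the same pixels.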

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build-for-testing \
  -derivedDataPath build/DerivedData CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO
# ** TEST BUILD SUCCEEDED **

xcodebuild ... test-without-building \
  -only-testing:CotabbyTests/ScreenshotContextGeneratorTests \
  -only-testing:CotabbyTests/VisualContextStartCoalescerTests \
  -only-testing:CotabbyTests/SuggestionCoordinatorAcceptanceTests
# 0 failures (wall time inflated by an unrelated local disk-pressure incident during the run)

swiftlint lint --quiet
# exit 0

New test: test_generateContext_reusesExtractionForIdenticalPixels (counting extractor proves the Vision pass is skipped and the excerpt is identical).

Linked issues

Refs #661

Risk / rollout notes

A stride-hash collision would reuse OCR text for a window whose pixels changed; the stride still touches every row and any text change moves antialiased pixels broadly, so this is vanishingly unlikely, and the blast radius is one stale excerpt for one field session.
The 1200px cap is a measured-tradeoff default, not a hard floor; if small-text recall regresses in practice, reverting it is a one-line change to the constant.
Pinning changes when clipboard context can ENTER a session's prompts (it always could before via flips); it cannot change what the relevance filter accepts. A new copy re-evaluates immediately.
Follow-up candidates deliberately not in this PR: battery-aware capture policy via the existing power-profile machinery, and raising minimumTextHeight with measurement.

🤖 Generated with Claude Code

Greptile Summary

Three targeted efficiency improvements to the visual-context pipeline: a bounded FNV-1a pixel-hash cache that skips re-OCR when captured pixels are unchanged, a downscale cap reduction from 1600 → 1200 px (≈44% fewer pixels per Vision pass), and clipboard-relevance pinning that stabilises the prompt head so the engine's KV common prefix survives across keystrokes.
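The ≈44% figure follows from the cap applying to a linear dimension while the pixel count scales with its square. The sketch below checks that arithmetic. It assumes the cap limits the longer side with aspect ratio preserved, and the 2800×1400 capture size is hypothetical, chosen only because it exceeds both caps.

```swift
/// Scales a size so its longer side is at most `cap`, preserving aspect ratio.
/// (Assumed interpretation of a "max image dimension" cap.)
func downscaled(width: Double, height: Double, cap: Double) -> (width: Double, height: Double) {
    let longest = max(width, height)
    guard longest > cap else { return (width, height) }
    let scale = cap / longest
    return (width * scale, height * scale)
}

/// Pixel count handed to OCR after applying the cap.
func pixelCount(width: Double, height: Double, cap: Double) -> Double {
    let size = downscaled(width: width, height: height, cap: cap)
    return size.width * size.height
}

// Hypothetical capture that exceeds both caps.
let before = pixelCount(width: 2800, height: 1400, cap: 1600)
let after = pixelCount(width: 2800, height: 1400, cap: 1200)

// Fraction of pixels saved: 1 - (1200 / 1600)^2 = 0.4375
let saved = 1 - after / before
```

The saving does not depend on the capture size as long as the capture exceeds both caps, because the ratio of the two scaled areas is always (1200/1600)².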

Pixel-hash cache (ScreenshotContextGenerator): stride-17 FNV-1a hash gates the Vision pass; finishedExcerpt is refactored into a shared helper so hygiene and field-text stripping rerun on every call regardless of cache hit. The noRecognizedText/windowTitle fallback path does not populate the cache (noted in a previous review thread).
1200 px OCR cap (VisualContextModels): measured-tradeoff constant with inline rationale; one-line revert if small-text recall regresses.
Clipboard preface pinning (SuggestionCoordinator+Prediction): non-nil verdicts are pinned per (focusChangeSequence, changeCount); nil verdicts keep re-evaluating. handleSuggestionSettingsChange does not clear the memo, so a pinned verdict from one engine can survive an engine switch within the same field session.
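The pinning rule in the last item can be sketched as below. This is an illustration under assumptions, not the coordinator's code: `PrefaceMemoSketch`, `ClipboardPinnerSketch`, the `evaluate` closure standing in for the relevance filter, and `reset()` are all hypothetical names. Only the (focusChangeSequence, changeCount) keying and the "nil keeps re-evaluating" rule come from the description.

```swift
/// Hypothetical memo shape; the real struct's fields are not shown on this page.
struct PrefaceMemoSketch: Equatable {
    let focusChangeSequence: Int
    let changeCount: Int
    let value: String?
}

struct ClipboardPinnerSketch {
    private(set) var memo: PrefaceMemoSketch?

    /// `evaluate` stands in for the relevance filter and returns nil when the
    /// clipboard is judged irrelevant to the current prefix.
    mutating func pinnedContext(
        focusChangeSequence: Int,
        changeCount: Int,
        evaluate: () -> String?
    ) -> String? {
        if let memo = memo,
           memo.focusChangeSequence == focusChangeSequence,
           memo.changeCount == changeCount,
           let pinned = memo.value {
            // Accepted verdict for this field session and pasteboard state:
            // reuse it so the prompt head stays stable.
            return pinned
        }
        // No memo, a different key, or a nil verdict: evaluate again.
        let value = evaluate()
        memo = PrefaceMemoSketch(
            focusChangeSequence: focusChangeSequence,
            changeCount: changeCount,
            value: value
        )
        return value
    }

    /// One possible way to address the settings-change gap: drop the memo.
    mutating func reset() { memo = nil }
}
```

In this sketch a settings or engine change handler would call `reset()`, which is one way to close the stale-verdict gap noted in the review. Whether the project chooses that approach is not stated here.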

Confidence Score: 5/5

Safe to merge; all three optimizations are narrowly scoped, each with a documented worst-case blast radius of one stale value for one field session.

Changes are well-contained: the pixel-hash cache degrades gracefully (nil hash disables it, stride-17 was corrected in the follow-up commit), the 1200 px cap is a single constant, and the clipboard memo self-heals on focus change or clipboard change. The only gap is that the memo is not invalidated on engine/settings change, which could leave a stale relevance verdict for the remainder of a field session — a minor behavioural edge case with no data-loss or security implications.

Cotabby/App/Coordinators/SuggestionCoordinator+Lifecycle.swift — handleSuggestionSettingsChange does not clear clipboardPrefaceMemo.

Important Files Changed

Filename	Overview
Cotabby/Services/Visual/ScreenshotContextGenerator.swift	Adds FNV-1a pixel-hash extraction cache (stride 17 to sample all channels) and refactors finishedExcerpt into a shared helper; noRecognizedText/windowTitle path skips storeExtraction so that branch never warms the cache.
Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift	Extracts clipboard resolution into pinnedClipboardContext with correct (focusSequence, changeCount) keying; nil verdicts keep re-evaluating, non-nil are pinned. Memo is not cleared on settings change.
Cotabby/App/Coordinators/SuggestionCoordinator.swift	Adds ClipboardPrefaceMemo struct and clipboardPrefaceMemo property; straightforward data model change with clear documentation.
Cotabby/Models/VisualContextModels.swift	Lowers maxImageDimension default from 1600 to 1200; well-documented tradeoff with measured rationale in comments.
CotabbyTests/ScreenshotContextGeneratorTests.swift	Adds test_generateContext_reusesExtractionForIdenticalPixels with a CountingTextExtractor stub; correctly validates Vision-pass is skipped and output is byte-identical on cache hit.
CotabbyTests/PermissionAndContextModelTests.swift	Updates maxImageDimension expectation to 1200 with explanatory comment; trivial alignment change.
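The "stride 17 to sample all channels" note can be illustrated with a small check. A byte stride that shares a factor with the pixel size keeps landing on the same channel offsets, while a coprime stride cycles through all of them. The 4 bytes per pixel and the stride of 16 used for comparison are assumptions for illustration; the page does not state the pixel format or the earlier stride value.

```swift
/// Returns the channel offsets (byte index modulo bytesPerPixel) that a
/// byte stride visits within a buffer of `byteCount` bytes.
func sampledChannels(step: Int, bytesPerPixel: Int, byteCount: Int) -> Set<Int> {
    precondition(step > 0 && bytesPerPixel > 0, "step and bytesPerPixel must be positive")
    var channels = Set<Int>()
    for index in stride(from: 0, to: byteCount, by: step) {
        channels.insert(index % bytesPerPixel)
    }
    return channels
}

// Assumed 4-byte pixels. A stride that is a multiple of 4 only ever lands on
// channel offset 0; a stride coprime with 4 visits every offset.
let aligned = sampledChannels(step: 16, bytesPerPixel: 4, byteCount: 4096)
let coprime = sampledChannels(step: 17, bytesPerPixel: 4, byteCount: 4096)
```

With these inputs `aligned` contains only offset 0, while `coprime` contains offsets 0 through 3, so a change confined to one colour channel can still affect the hash.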

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[generateContext called] --> B[captureScreenshot]
    B --> C[pixelHash of image]
    C --> D{hash in cache?}
    D -- yes --> E[finishedExcerpt\nhygiene + field-text strip]
    D -- no --> F[textExtractor.extractText\nVision OCR pass]
    F --> G{noRecognizedText?}
    G -- yes, windowTitle exists --> H[return title excerpt\ncache NOT populated]
    G -- no --> I[storeExtraction in cache]
    I --> E
    E --> J[VisualContextExcerpt]

    K[pinnedClipboardContext] --> L{settings enabled?}
    L -- no --> M[return nil]
    L -- yes --> N{memo hit?\nfocusSeq + changeCount match\nvalue != nil}
    N -- yes --> O[return pinned value]
    N -- no --> P[truncatedPromptPrefix\nclipboardRelevanceFilter]
    P --> Q[store ClipboardPrefaceMemo]
    Q --> R[return value]

Comments Outside Diff (1)

Cotabby/Services/Visual/ScreenshotContextGenerator.swift, line 79-98 (link)

noRecognizedText branch never populates the cache

When Vision finds no text but a valid windowTitle is present, the function returns early at line 98 without calling storeExtraction. The pixelHash was computed, but the cache is never populated for this path. On the next call with identical pixels — the exact alt-tab scenario the PR targets — the cache check misses and Vision runs again. Applications like Figma or image-heavy UIs that commonly land in this branch get no benefit from the pixel-hash cache.

A lightweight fix would be to store a sentinel ExtractedScreenText (e.g., empty lines) before the early return so the key is recorded; alternatively, the windowTitle path could be cached separately.
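A minimal sketch of the sentinel approach suggested above, assuming the cache can hold an empty extraction. `ExtractedTextSketch`, `CountingExtractorSketch`, and `excerptSketch` are hypothetical stand-ins; the real `ExtractedScreenText` type and generator signature are not shown on this page, and the plain dictionary omits the bounded eviction for brevity.

```swift
/// Hypothetical stand-in for the raw OCR result.
struct ExtractedTextSketch: Equatable {
    var lines: [String]
    var isEmpty: Bool { lines.isEmpty }
}

/// Counts OCR passes so the effect of the cache is observable.
final class CountingExtractorSketch {
    private(set) var passes = 0
    private let lines: [String]
    init(lines: [String]) { self.lines = lines }

    func extract() -> ExtractedTextSketch {
        passes += 1
        return ExtractedTextSketch(lines: lines)
    }
}

func excerptSketch(
    pixelHash: UInt64,
    windowTitle: String?,
    cache: inout [UInt64: ExtractedTextSketch],
    extractor: CountingExtractorSketch
) -> String? {
    let extraction: ExtractedTextSketch
    if let cached = cache[pixelHash] {
        extraction = cached
    } else {
        extraction = extractor.extract()
        // Store before any early return, so an empty result also records the key.
        cache[pixelHash] = extraction
    }
    // Title fallback when OCR found nothing; the cache is already warm.
    if extraction.isEmpty { return windowTitle }
    return extraction.lines.joined(separator: "\n")
}
```

With the store placed ahead of the fallback, a second call with identical pixels takes the cached branch and still returns the window title without another OCR pass.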

Reviews (3): Last reviewed commit: "review: stride the pixel hash coprime wi..."

…ut, pin the clipboard preface Three visual-context efficiency cuts. Refocusing a window re-ran the full Vision pass even when the captured pixels were identical; a small pixel-hash cache now reuses the raw extraction while hygiene and bounding still rerun against the live field text, so a hit stays byte-identical to re-OCRing the same pixels. The pre-OCR downscale cap drops from 1600 to 1200 (the Retina capture of the 700pt strip exceeds both caps, and 1200 keeps UI text well above Vision's recognition floor while cutting the Vision workload ~44%). And the clipboard relevance verdict, which was re-evaluated against the live prefix on every request, is now pinned per field session once accepted: the clipboard section precedes the typed prefix in the prompt, so each verdict flip rewrote the prompt head and collapsed the engine's reusable KV common prefix into a full re-prefill.

greptile-apps Bot reviewed Jun 12, 2026

Comment thread Cotabby/Services/Visual/ScreenshotContextGenerator.swift

FuJacob added 2 commits June 11, 2026 19:08

fix(tests): align the default-config expectation with the 1200px OCR input cap

0dc803a

review: stride the pixel hash coprime with the pixel size so all channels are sampled

de46342

FuJacob merged commit 802a291 into main Jun 12, 2026
4 checks passed

FuJacob merged 3 commits into main from perf/visual-context-efficiency
