perf(visual): skip re-OCR of unchanged pixels, right-size the OCR input, pin the clipboard preface #685
Merged
Three visual-context efficiency cuts.

Refocusing a window re-ran the full Vision pass even when the captured pixels were identical; a small pixel-hash cache now reuses the raw extraction while hygiene and bounding still rerun against the live field text, so a hit stays byte-identical to re-OCRing the same pixels.

The pre-OCR downscale cap drops from 1600 to 1200 (the Retina capture of the 700pt strip exceeds both caps, and 1200 keeps UI text well above Vision's recognition floor while cutting the Vision workload ~44%).

The clipboard relevance verdict, which was re-evaluated against the live prefix on every request, is now pinned per field session once accepted: the clipboard section precedes the typed prefix in the prompt, so each verdict flip rewrote the prompt head and collapsed the engine's reusable KV common prefix into a full re-prefill.
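To make the prefix-collapse argument concrete, here is a minimal sketch that measures how much of a previous prompt survives as a common prefix when the clipboard verdict stays pinned versus when it flips. The prompt layout, the `buildPrompt` and `commonPrefixLength` helpers, and the use of UTF-8 bytes as a stand-in for tokens are all assumptions made for illustration; they are not taken from the PR.

```swift
// Hypothetical prompt layout: the clipboard section, when present,
// precedes the typed prefix (as the commit message describes).
func buildPrompt(clipboard: String?, typedPrefix: String) -> String {
    let head = clipboard.map { "Clipboard:\n\($0)\n\n" } ?? ""
    return head + "Text:\n" + typedPrefix
}

// Length of the shared leading run of two prompts, counted in UTF-8 bytes.
// A real engine would compare token IDs; bytes are a simplification.
func commonPrefixLength(_ a: String, _ b: String) -> Int {
    var count = 0
    for (x, y) in zip(a.utf8, b.utf8) {
        if x != y { break }
        count += 1
    }
    return count
}

let clipboardText = "https://example.com/spec"
let previousPrompt = buildPrompt(clipboard: clipboardText, typedPrefix: "See the spec at")

// One more keystroke, verdict unchanged: the whole previous prompt is reusable.
let pinnedPrompt = buildPrompt(clipboard: clipboardText, typedPrefix: "See the spec at h")

// One more keystroke, verdict flipped to "not relevant": the head changes.
let flippedPrompt = buildPrompt(clipboard: nil, typedPrefix: "See the spec at h")

let reusedWhenPinned = commonPrefixLength(previousPrompt, pinnedPrompt)
let reusedWhenFlipped = commonPrefixLength(previousPrompt, flippedPrompt)
```

With the verdict pinned, the shared prefix covers the entire previous prompt; after a flip the two prompts diverge at the first byte, which is the full re-prefill case the change is meant to avoid.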
Summary
Three efficiency cuts in the visual-context pipeline, sized from the energy-audit follow-up (the OCR lane was the one remaining tier-B lever):
.accurate: measured data puts .fast at only ~1.6x on Apple Silicon with a real recall cost.
Validation
New test: test_generateContext_reusesExtractionForIdenticalPixels (a counting extractor proves the Vision pass is skipped and the excerpt is identical).
Linked issues
Refs #661
Risk / rollout notes
minimumTextHeight with measurement.
🤖 Generated with Claude Code
Greptile Summary
Three targeted efficiency improvements to the visual-context pipeline: a bounded FNV-1a pixel-hash cache that skips re-OCR when captured pixels are unchanged, a downscale cap reduction from 1600 → 1200 px (≈44% fewer pixels per Vision pass), and clipboard-relevance pinning that stabilises the prompt head so the engine's KV common prefix survives across keystrokes.
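The ≈44% figure follows from the cap ratio: with uniform scaling, pixel count goes with the square of the linear scale, so (1200/1600)² = 0.5625 of the pixels remain whenever the capture exceeds both caps. The sketch below checks that arithmetic. It assumes the cap applies to the longest side and that the image is never upscaled; `downscaledSize` and the 2400×1400 capture are hypothetical and not taken from the PR.

```swift
// Hypothetical helper: scale uniformly so the longest side fits `cap`,
// never upscaling. The PR's actual downscale rule may differ.
func downscaledSize(width: Double, height: Double, cap: Double) -> (width: Double, height: Double) {
    let longest = max(width, height)
    guard longest > cap else { return (width, height) }
    let scale = cap / longest
    return (width * scale, height * scale)
}

// Illustrative capture that exceeds both caps.
let capturedWidth = 2400.0
let capturedHeight = 1400.0

let oldSize = downscaledSize(width: capturedWidth, height: capturedHeight, cap: 1600)
let newSize = downscaledSize(width: capturedWidth, height: capturedHeight, cap: 1200)

// Fraction of pixels Vision no longer has to process.
let pixelRatio = (newSize.width * newSize.height) / (oldSize.width * oldSize.height)
let savedFraction = 1 - pixelRatio
```

For a capture that is already below 1600 on its longest side, the saving is smaller, and below 1200 it is zero, since neither cap would bind.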
ScreenshotContextGenerator: stride-17 FNV-1a hash gates the Vision pass; finishedExcerpt is refactored into a shared helper so hygiene and field-text stripping rerun on every call regardless of cache hit. The noRecognizedText / windowTitle fallback path does not populate the cache (noted in a previous review thread).
VisualContextModels: measured-tradeoff constant with inline rationale; one-line revert if small-text recall regresses.
SuggestionCoordinator+Prediction: non-nil verdicts are pinned per (focusChangeSequence, changeCount); nil verdicts keep re-evaluating. handleSuggestionSettingsChange does not clear the memo, so a pinned verdict from one engine can survive an engine switch within the same field session.
Confidence Score: 5/5
Safe to merge; all three optimizations are narrowly scoped, each with a documented worst-case blast radius of one stale value for one field session.
Changes are well-contained: the pixel-hash cache degrades gracefully (nil hash disables it, stride-17 was corrected in the follow-up commit), the 1200 px cap is a single constant, and the clipboard memo self-heals on focus change or clipboard change. The only gap is that the memo is not invalidated on engine/settings change, which could leave a stale relevance verdict for the remainder of a field session — a minor behavioural edge case with no data-loss or security implications.
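As a rough illustration of the pixel-hash cache and its graceful degradation, the sketch below pairs a strided 64-bit FNV-1a hash with a small bounded cache that ignores nil keys. The names `stridedPixelHash` and `ExtractionCache`, the capacity, the oldest-first eviction, and the use of `[String]` for the cached extraction are assumptions for this example, not the PR's code.

```swift
// Sketch: 64-bit FNV-1a over every `step`-th byte of a pixel buffer.
// Returns nil for an empty buffer so callers can bypass the cache.
func stridedPixelHash(_ bytes: [UInt8], stride step: Int = 17) -> UInt64? {
    guard !bytes.isEmpty, step > 0 else { return nil }
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325      // FNV-1a 64-bit offset basis
    let prime: UInt64 = 0x0000_0100_0000_01b3     // FNV-1a 64-bit prime
    for index in Swift.stride(from: 0, to: bytes.count, by: step) {
        hash ^= UInt64(bytes[index])
        hash = hash &* prime                      // wrapping multiply
    }
    return hash
}

// Hypothetical bounded cache: oldest entry is evicted first.
struct ExtractionCache {
    private var entries: [(hash: UInt64, lines: [String])] = []
    let capacity: Int

    init(capacity: Int) { self.capacity = max(1, capacity) }

    // A nil hash never hits, which disables caching for that capture.
    func lookup(_ hash: UInt64?) -> [String]? {
        guard let hash = hash else { return nil }
        return entries.first(where: { $0.hash == hash })?.lines
    }

    // A nil hash is never stored.
    mutating func store(_ lines: [String], for hash: UInt64?) {
        guard let hash = hash else { return }
        entries.removeAll { $0.hash == hash }
        entries.append((hash: hash, lines: lines))
        if entries.count > capacity { entries.removeFirst() }
    }
}

var extractionCache = ExtractionCache(capacity: 4)
let sampleFrame = [UInt8](repeating: 0x7f, count: 4096)
let sampleKey = stridedPixelHash(sampleFrame)
extractionCache.store(["Hello"], for: sampleKey)
```

Sampling every 17th byte trades completeness for speed: a change confined to unsampled bytes would not alter the key. Because 17 is odd, the sampled offsets cycle through every channel of a 4-byte pixel rather than landing on the same channel each time.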
Cotabby/App/Coordinators/SuggestionCoordinator+Lifecycle.swift — handleSuggestionSettingsChange does not clear clipboardPrefaceMemo.
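The pinning rule and the gap flagged here can be sketched together. The example below pins a non-nil verdict per (focusChangeSequence, changeCount), keeps re-evaluating nil verdicts, and adds an `invalidate()` hook that a settings-change handler could call. `ClipboardPinner`, its `evaluations` counter, and `invalidate()` are hypothetical; only the memo name and the keying fields come from the review text.

```swift
// Memo shape is assumed; the field names follow the review summary.
struct ClipboardPrefaceMemo {
    let focusChangeSequence: Int
    let changeCount: Int
    let value: String?
}

// Hypothetical holder for the memo, written as a value type for testability.
struct ClipboardPinner {
    private(set) var memo: ClipboardPrefaceMemo?
    private(set) var evaluations = 0

    mutating func pinnedClipboardContext(
        enabled: Bool,
        focusChangeSequence: Int,
        changeCount: Int,
        evaluate: () -> String?
    ) -> String? {
        guard enabled else { return nil }
        // A hit requires a matching key AND a non-nil pinned value,
        // so nil verdicts fall through and are evaluated again.
        if let memo = memo,
           memo.focusChangeSequence == focusChangeSequence,
           memo.changeCount == changeCount,
           let pinned = memo.value {
            return pinned
        }
        evaluations += 1
        let value = evaluate()
        memo = ClipboardPrefaceMemo(
            focusChangeSequence: focusChangeSequence,
            changeCount: changeCount,
            value: value
        )
        return value
    }

    // Hypothetical fix for the gap: call this when engine or settings change.
    mutating func invalidate() { memo = nil }
}

var pinner = ClipboardPinner()
let firstVerdict = pinner.pinnedClipboardContext(
    enabled: true, focusChangeSequence: 1, changeCount: 7) { "copied text" }
// Same field session: the pinned value wins even though re-evaluation would say nil.
let secondVerdict = pinner.pinnedClipboardContext(
    enabled: true, focusChangeSequence: 1, changeCount: 7) { nil }
let evaluationsBeforeInvalidate = pinner.evaluations

pinner.invalidate()
let thirdVerdict = pinner.pinnedClipboardContext(
    enabled: true, focusChangeSequence: 1, changeCount: 7) { nil }
```

Without the `invalidate()` call, the third request would still return the value pinned under the earlier settings, which is the stale-verdict case described above.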
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[generateContext called] --> B[captureScreenshot]
    B --> C[pixelHash of image]
    C --> D{hash in cache?}
    D -- yes --> E[finishedExcerpt\nhygiene + field-text strip]
    D -- no --> F[textExtractor.extractText\nVision OCR pass]
    F --> G{noRecognizedText?}
    G -- yes, windowTitle exists --> H[return title excerpt\ncache NOT populated]
    G -- no --> I[storeExtraction in cache]
    I --> E
    E --> J[VisualContextExcerpt]
    K[pinnedClipboardContext] --> L{settings enabled?}
    L -- no --> M[return nil]
    L -- yes --> N{memo hit?\nfocusSeq + changeCount match\nvalue != nil}
    N -- yes --> O[return pinned value]
    N -- no --> P[truncatedPromptPrefix\nclipboardRelevanceFilter]
    P --> Q[store ClipboardPrefaceMemo]
    Q --> R[return value]

Comments Outside Diff (1)
Cotabby/Services/Visual/ScreenshotContextGenerator.swift, line 79-98
noRecognizedText branch never populates the cache

When Vision finds no text but a valid windowTitle is present, the function returns early at line 98 without calling storeExtraction. The pixelHash was computed, but the cache is never populated for this path. On the next call with identical pixels (the exact alt-tab scenario the PR targets), the cache check misses and Vision runs again. Applications like Figma or image-heavy UIs that commonly land in this branch get no benefit from the pixel-hash cache.

A lightweight fix would be to store a sentinel ExtractedScreenText (e.g., empty lines) before the early return so the key is recorded; alternatively, the windowTitle path could be cached separately.

Reviews (3): Last reviewed commit: "review: stride the pixel hash coprime wi..."
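The sentinel approach suggested in the comment above might look roughly like the following. This is a sketch under assumptions: `ExtractedScreenText` is modelled as a bare list of lines, and `SentinelCachingGenerator`, its dictionary cache, and the `visionPasses` counter are hypothetical stand-ins for the real generator and extractor.

```swift
// Assumed shape; the real ExtractedScreenText likely carries more fields.
struct ExtractedScreenText {
    var lines: [String]
}

// Hypothetical generator that records the key even when no text is found.
final class SentinelCachingGenerator {
    private var cache: [UInt64: ExtractedScreenText] = [:]
    private(set) var visionPasses = 0
    private let extract: () -> ExtractedScreenText

    init(extract: @escaping () -> ExtractedScreenText) {
        self.extract = extract
    }

    func excerpt(pixelHash: UInt64, windowTitle: String?) -> String? {
        let extraction: ExtractedScreenText
        if let cached = cache[pixelHash] {
            extraction = cached
        } else {
            visionPasses += 1
            extraction = extract()
            // Store before any early return: an empty result is the sentinel.
            cache[pixelHash] = extraction
        }
        // Fallback path: no recognized text, so use the window title if any.
        if extraction.lines.isEmpty { return windowTitle }
        return extraction.lines.joined(separator: "\n")
    }
}

// Simulate an image-heavy window where OCR finds nothing.
let sentinelGenerator = SentinelCachingGenerator { ExtractedScreenText(lines: []) }
let firstTitle = sentinelGenerator.excerpt(pixelHash: 42, windowTitle: "Untitled")
let secondTitle = sentinelGenerator.excerpt(pixelHash: 42, windowTitle: "Untitled")
```

Because the title is read from the live call rather than from the cache, a changed window title over identical pixels would still be reflected in the excerpt.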