Read Cursor state.vscdb values only for matching keys #45
Merged
why: parse_cursor_state_db read every key/value row from ItemTable and cursorDiskKV before checking whether the key could hold chat or prompt history. On large state.vscdb databases that materializes gigabytes of irrelevant BLOB values just to discard them, dominating search time. A key-only first pass rides the covering key index, so non-matching BLOB pages are never read, and values are fetched only for keys that can hold chat or prompt history.

what:
- Split iter_key_value_rows into a key-only scan with SQL-side key-token filtering (LIKE ... COLLATE NOCASE) and per-key indexed value fetches, deduplicating keys while preserving scan order and still yielding every row for repeated keys in index-less databases.
- Pass CURSOR_STATE_TOKENS from parse_cursor_state_db and drop its Python-side key filter.
- Cover the two-stage SQL trace shapes (both tables, case-insensitive and duplicate keys) and add a regression fixture of many large irrelevant blobs proving value reads stay keyed to matching rows.
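The two-stage read described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the function name `iter_matching_rows`, the `ALLOWED_TABLES` allowlist, and the example token `aichat` are hypothetical, and tokens are assumed to contain no LIKE wildcards (`%` or `_`).

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator, Sequence

# Hypothetical allowlist: table names cannot be bound as SQL parameters,
# so only the two tables named in the commit message are accepted.
ALLOWED_TABLES = frozenset({"ItemTable", "cursorDiskKV"})


def iter_matching_rows(
    conn: sqlite3.Connection,
    table: str,
    tokens: Sequence[str],
) -> Iterator[tuple[str, bytes | str | None]]:
    """Yield (key, value) rows whose key contains any token (case-insensitive)."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table!r}")
    if not tokens:
        return

    # Stage 1: key-only scan. Only the key column is selected, so SQLite can
    # answer from the key index when one exists, without reading value pages.
    where = " OR ".join("key LIKE ? COLLATE NOCASE" for _ in tokens)
    patterns = [f"%{token}%" for token in tokens]
    keys = [
        row[0]
        for row in conn.execute(f"SELECT key FROM {table} WHERE {where}", patterns)
    ]

    # Stage 2: fetch values per key. dict.fromkeys deduplicates while keeping
    # scan order; the inner loop still yields every row for a repeated key.
    for key in dict.fromkeys(keys):
        for (value,) in conn.execute(
            f"SELECT value FROM {table} WHERE key = ?", (key,)
        ):
            yield key, value
```

Materializing the key list before the value fetches keeps the two stages separate, which is what makes the statement shapes easy to assert on in a trace-based test.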
why: Record the Cursor IDE state.vscdb fix for the unreleased version so readers with multi-gigabyte Cursor databases know searches stop loading unrelated editor data.

what:
- Add a Fixes deliverable under the unreleased 0.1.0a17 section describing the key-first read of chat and prompt entries.
7fcbb31 to 989ea18
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
🤖 Generated with Claude Code
Summary
- Read `state.vscdb` tables in two stages: a key-only scan with SQL-side key-token filtering (`LIKE … COLLATE NOCASE`), then indexed `value` fetches for the matching keys, so large unrelated BLOBs are never materialized.
- The previous `SELECT key, value` scan pulled gigabytes of editor data through memory just to discard it. On real Cursor schemas the key scan rides the covering key index; index-less databases degrade to a plain scan with identical results, including duplicate-key rows.

Test Plan
rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run ty check; uv run py.test --reruns 0 -vvv; just build-docs;
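
The commit message mentions covering the two-stage SQL trace shapes. One way such a check could look is sketched below, using `sqlite3.Connection.set_trace_callback` to record the statements SQLite actually runs. The table contents and the `aichat` token are made up for illustration, and this is not the PR's test code.

```python
import sqlite3

# Hypothetical fixture: one matching key and one large irrelevant blob.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemTable (key TEXT UNIQUE, value BLOB)")
conn.executemany(
    "INSERT INTO ItemTable VALUES (?, ?)",
    [("aichat.prompts", b"[]"), ("editor.cache", b"x" * 1_000_000)],
)
conn.commit()

# Record every statement executed from here on.
statements: list[str] = []
conn.set_trace_callback(statements.append)

# Stage 1: key-only scan with SQL-side token filtering.
keys = [
    key
    for (key,) in conn.execute(
        "SELECT key FROM ItemTable WHERE key LIKE ? COLLATE NOCASE",
        ("%aichat%",),
    )
]

# Stage 2: value fetches keyed to the matching rows only.
values = {
    key: conn.execute(
        "SELECT value FROM ItemTable WHERE key = ?", (key,)
    ).fetchone()[0]
    for key in keys
}

conn.set_trace_callback(None)
```

Asserting on statement prefixes rather than exact strings keeps the check stable whether or not the callback receives SQL with bound parameters expanded.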