
Read Cursor state.vscdb values only for matching keys #45

Merged: tony merged 2 commits into master from fix-cursor-state-two-stage on Jun 5, 2026

tony (Owner) commented Jun 5, 2026

Summary

  • Reads Cursor IDE state.vscdb tables in two stages: a key-only scan with SQL-side key-token filtering (LIKE … COLLATE NOCASE), then indexed value fetches for the matching keys, so large unrelated BLOBs are never materialized.
  • Speeds up searches over multi-gigabyte Cursor IDE databases, where the previous full scan (SELECT key, value over every row) pulled gigabytes of editor data through memory just to discard it. On real Cursor schemas the key scan is served entirely from the key index (a covering index); index-less databases degrade to a plain table scan with identical results, including duplicate-key rows.
  • Adds SQL-trace tests covering the two-stage statement shapes (both tables, case-insensitive and duplicate keys) and a regression fixture seeding many large irrelevant blobs to prove value reads stay keyed to matching rows.
  • Records the fix in CHANGES for the unreleased version.
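A minimal sketch of the two-stage read in Python's standard sqlite3 module. The PR does not show iter_key_value_rows itself, so the function name iter_matching_rows, the EXAMPLE_TOKENS values, and the assumed schema (a key TEXT column plus a value BLOB column in ItemTable and cursorDiskKV) are illustrative assumptions rather than the project's code.

```python
import sqlite3
from collections.abc import Iterator

# Hypothetical tokens: the real CURSOR_STATE_TOKENS values are not shown in the PR.
EXAMPLE_TOKENS = ("chat", "prompt")

# Only these table names are ever interpolated into SQL text.
ALLOWED_TABLES = frozenset({"ItemTable", "cursorDiskKV"})


def _like_pattern(token: str) -> str:
    """Wrap a token in % wildcards, escaping LIKE metacharacters with a backslash."""
    escaped = token.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def iter_matching_rows(
    conn: sqlite3.Connection, table: str, tokens: tuple[str, ...]
) -> Iterator[tuple[str, bytes]]:
    """Hypothetical two-stage reader: yield (key, value) rows whose key contains a token."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table!r}")
    if not tokens:
        return
    # Stage 1: key-only scan with SQL-side, case-insensitive token filtering.
    # The value column is never selected, so large BLOBs are not materialized.
    clause = " OR ".join(["key COLLATE NOCASE LIKE ? ESCAPE '\\'"] * len(tokens))
    scanned = conn.execute(
        f"SELECT key FROM {table} WHERE {clause}",
        [_like_pattern(token) for token in tokens],
    )
    # Deduplicate keys while preserving the order in which the scan returned them.
    keys = list(dict.fromkeys(key for (key,) in scanned))
    # Stage 2: one keyed value fetch per matching key. Every returned row is
    # yielded, so a key repeated in an index-less table still yields all rows.
    for key in keys:
        for (value,) in conn.execute(f"SELECT value FROM {table} WHERE key = ?", (key,)):
            yield key, value
```

Deduplicating with dict.fromkeys keeps the first-seen order of keys from the scan, while the per-key fetch still returns every row when an index-less table holds the same key more than once. Escaping % and _ keeps a token such as "_d" from acting as a wildcard.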

Test Plan

  • rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run ty check; uv run py.test --reruns 0 -vvv; just build-docs;

why: parse_cursor_state_db read every key/value row from ItemTable and
cursorDiskKV before checking whether the key could hold chat or prompt
history. On large state.vscdb databases this materialized gigabytes of
irrelevant BLOB values only to discard them, and that dominated search
time. A key-only first pass is answered from the covering key index, so
non-matching BLOB pages are never read, and values are fetched only for
keys that can hold chat or prompt history.
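The covering-index claim can be checked with SQLite's EXPLAIN QUERY PLAN. This sketch assumes a Cursor-like schema in which key is declared UNIQUE (so SQLite builds an automatic index on it); the exact detail strings vary between SQLite versions, and an index-less table would simply report a plain scan for every statement.

```python
import sqlite3


def explain(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute(f"EXPLAIN QUERY PLAN {sql}", params)]


# Assumed schema: a UNIQUE key column, which SQLite backs with an automatic index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemTable (key TEXT UNIQUE ON CONFLICT REPLACE, value BLOB)")

plans = {
    # Old shape: every row, value column included, must be read from the table.
    "full scan": explain(conn, "SELECT key, value FROM ItemTable"),
    # Stage 1: only the key column is needed, so the key index alone can answer it.
    "key scan": explain(
        conn, "SELECT key FROM ItemTable WHERE key COLLATE NOCASE LIKE ?", ("%chat%",)
    ),
    # Stage 2: a single-key lookup through the same index.
    "value fetch": explain(conn, "SELECT value FROM ItemTable WHERE key = ?", ("k",)),
}
for label, details in plans.items():
    print(f"{label}: {details}")
```

On a recent SQLite the key scan should mention a covering index, the value fetch an index search on key=?, and the full scan no index at all.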

what:
- Split iter_key_value_rows into a key-only scan with SQL-side
  key-token filtering (LIKE ... COLLATE NOCASE) and per-key indexed
  value fetches, deduplicating keys while preserving scan order and
  still yielding every row for repeated keys in index-less databases.
- Pass CURSOR_STATE_TOKENS from parse_cursor_state_db and drop its
  Python-side key filter.
- Cover the two-stage SQL trace shapes (both tables, case-insensitive
  and duplicate keys) and add a regression fixture of many large
  irrelevant blobs proving value reads stay keyed to matching rows.
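A hypothetical pytest-style sketch of the SQL-trace idea: seed many large irrelevant BLOBs next to one matching key, record statements with sqlite3.Connection.set_trace_callback, and check that the only value read targets the matching key. The schema, key names, and blob sizes are assumptions, and the two queries are inlined rather than calling the project's reader. Depending on the Python version, traced text may show ? placeholders or expanded parameter values, so the assertions do not rely on either form.

```python
import sqlite3

BLOB_SIZE = 64 * 1024  # hypothetical size of each irrelevant value
NOISE_ROWS = 100       # hypothetical number of irrelevant rows


def test_value_reads_stay_keyed_to_matching_rows() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cursorDiskKV (key TEXT UNIQUE ON CONFLICT REPLACE, value BLOB)")
    # Fixture: many large, irrelevant blobs plus a single matching key.
    conn.executemany(
        "INSERT INTO cursorDiskKV VALUES (?, ?)",
        [(f"editor.cache.{i}", b"\0" * BLOB_SIZE) for i in range(NOISE_ROWS)],
    )
    conn.execute("INSERT INTO cursorDiskKV VALUES (?, ?)", ("composerData:abc", b"{}"))
    conn.commit()

    # Record the text of every statement SQLite runs from here on.
    statements: list[str] = []
    conn.set_trace_callback(statements.append)

    # Stage 1: key-only scan with the token filter pushed into SQL.
    keys = [
        key
        for (key,) in conn.execute(
            "SELECT key FROM cursorDiskKV WHERE key COLLATE NOCASE LIKE ?", ("%composer%",)
        )
    ]
    # Stage 2: one keyed value fetch per matching key.
    values = [
        value
        for key in keys
        for (value,) in conn.execute("SELECT value FROM cursorDiskKV WHERE key = ?", (key,))
    ]
    conn.set_trace_callback(None)

    assert keys == ["composerData:abc"]
    assert values == [b"{}"]
    # Exactly one value read ran, and no statement selected keys together with values.
    assert sum(s.lstrip().startswith("SELECT value") for s in statements) == 1
    assert not any("SELECT key, value" in s for s in statements)
    assert not any("editor.cache" in s for s in statements)
```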

why: Record the Cursor IDE state.vscdb fix for the unreleased version so
readers with multi-gigabyte Cursor databases know searches stop loading
unrelated editor data.
what:
- Add a Fixes deliverable under the unreleased 0.1.0a17 section
  describing the key-first read of chat and prompt entries.

tony force-pushed the fix-cursor-state-two-stage branch from 7fcbb31 to 989ea18 on June 5, 2026 02:30
tony (Owner, Author) commented Jun 5, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

tony merged commit 0c90e11 into master on Jun 5, 2026
3 checks passed