A public, opinionated Readwise retrieval skill/tooling repo for assistants that need grounded answers from a user's saved reading.
It combines:
- Readwise Reader document search and inspection
- local SQLite mirroring for speed and reproducibility
- evidence-set construction for retrieval-first answers
- synthesis-packet generation from cached evidence
- export download/ingest flows
- semantic prep/embed scaffolding
- retrieval eval tooling
This repo is optimized for personal knowledge retrieval, not generic web search and not Reader write automation.
This repo is for the class of questions where a generic model answer is not enough and the real value comes from a user's own reading history.
Typical examples:
- "What have I saved about product strategy?"
- "Did I save anything on tenant isolation?"
- "Show me the strongest things I've read about leadership."
- "What else have I saved that's adjacent to this topic?"
- "Can you give me a grounded synthesis instead of just a generic answer?"
The goal is to help an assistant do more than keyword search:
- retrieve from a personal reading corpus
- weight stronger signals like tags and document quality
- build evidence sets from cached material
- produce answers that can point back to sources
The project is useful today but still evolving:
- Retrieval and local mirror flows are the strongest parts.
- Evidence assembly is solid enough for real use.
- Synthesis output is helpful, but still less polished than the retrieval layer.
- Semantic indexing and eval features are advanced, more experimental capabilities.
Search, list, and inspect Reader documents.
Useful when you want to answer questions like:
- "What have I read about AI agents?"
- "Show me archived docs tagged strategy."
- "Open this specific saved document and inspect the content."
Core commands:
python3 scripts/readwise_cli.py search-docs "ai agents" --json
python3 scripts/readwise_cli.py list-docs --location archive --limit 10 --json
python3 scripts/readwise_cli.py get-doc <document_id> --json
Search highlights globally or fetch highlights for one document.
Useful when you want:
- specific passages instead of full documents
- quote-level evidence
- denser, higher-signal retrieval over what you actually highlighted
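To make "quote-level evidence" concrete, here is a minimal sketch that ranks highlight passages by query-term overlap while keeping each quote attached to its document ID. The `Highlight` fields and the overlap score are illustrative assumptions, not the CLI's JSON schema or the repo's ranking logic.

```python
import re
from dataclasses import dataclass


@dataclass
class Highlight:
    document_id: str  # hypothetical field names, not the CLI's JSON schema
    text: str


def tokenize(text: str) -> set[str]:
    # Deliberately naive: lowercase alphanumeric word tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_highlights(query: str, highlights: list[Highlight], k: int = 3) -> list[tuple[float, Highlight]]:
    """Score each highlight by the fraction of query terms it contains."""
    terms = tokenize(query)
    if not terms:
        return []
    scored = []
    for h in highlights:
        overlap = len(terms & tokenize(h.text))
        if overlap:
            scored.append((overlap / len(terms), h))
    # Highest overlap first; sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because every result carries `document_id`, an answer built from these quotes can always point back to its source.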
Core commands:
python3 scripts/readwise_cli.py search-highlights "deliberate practice" --json
python3 scripts/readwise_cli.py get-doc-highlights <document_id> --json
Store Reader metadata, full document details, tags, highlights, and export-derived content in a local SQLite cache.
Why this matters:
- faster repeated retrieval
- more reproducible outputs
- better downstream ranking and synthesis
- reduced dependence on live API/CLI calls for every query
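As an illustration of why a local mirror helps, here is a minimal sketch of a SQLite cache with idempotent upserts and a tag lookup, using only Python's standard `sqlite3` module. The schema is hypothetical; the repo's real store lives in `scripts/readwise_store.py` and may look quite different.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    location TEXT,
    updated_at TEXT
);
CREATE TABLE IF NOT EXISTS document_tags (
    document_id TEXT NOT NULL REFERENCES documents(id),
    tag TEXT NOT NULL,
    PRIMARY KEY (document_id, tag)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_document(conn, doc_id, title, location, updated_at, tags):
    # Re-caching the same document overwrites its row instead of duplicating it.
    # Tags are only added here; pruning stale tags is omitted for brevity.
    with conn:
        conn.execute(
            "INSERT INTO documents (id, title, location, updated_at) VALUES (?, ?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET title = excluded.title, "
            "location = excluded.location, updated_at = excluded.updated_at",
            (doc_id, title, location, updated_at),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO document_tags (document_id, tag) VALUES (?, ?)",
            [(doc_id, tag) for tag in tags],
        )


def docs_with_tag(conn, tag):
    # Repeated lookups hit the local file, not the live API.
    return conn.execute(
        "SELECT d.id, d.title FROM documents d "
        "JOIN document_tags t ON t.document_id = d.id "
        "WHERE t.tag = ? ORDER BY d.id",
        (tag,),
    ).fetchall()
```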
Core commands:
python3 scripts/readwise_cli.py init-store
python3 scripts/readwise_cli.py cache-tags --json
python3 scripts/readwise_cli.py cache-list-docs --location archive --limit 25 --json
python3 scripts/readwise_cli.py cache-doc <document_id> --with-highlights --json
python3 scripts/readwise_cli.py cache-tagged-docs --location archive --page-limit 3 --page-size 50 --detail-limit 10 --json
Build a structured evidence set from cached documents and highlights before answering.
This is the main bridge between raw retrieval and assistant-quality responses.
Useful when you want to answer:
- "What are the strongest ideas I've saved on product strategy?"
- "What have I read about row-level access control?"
- "Use only tagged documents for this answer."
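The sketch below shows one plausible way an evidence set could combine the signals described above: term overlap, a boost for manually tagged documents, a strict mode that requires every query term, and a tagged-only filter. The flag names echo the CLI's `--strict` and `--tagged-only`, but the scoring rule, the `tag_boost` value, and the `Candidate` fields are assumptions, not how `evidence-set` actually ranks.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Candidate:
    doc_id: str
    title: str
    text: str
    tags: list[str] = field(default_factory=list)


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_evidence_set(query, candidates, *, strict=False, tagged_only=False,
                       tag_boost=0.5, limit=5):
    terms = _tokens(query)
    if not terms:
        return []
    evidence = []
    for c in candidates:
        if tagged_only and not c.tags:
            continue  # analogous in spirit to --tagged-only
        hits = len(terms & _tokens(f"{c.title} {c.text}"))
        if hits == 0 or (strict and hits < len(terms)):
            continue  # strict mode: every query term must appear
        # Manual tags are treated as a quality signal via a fixed boost.
        score = hits / len(terms) + (tag_boost if c.tags else 0.0)
        evidence.append({"doc_id": c.doc_id, "title": c.title,
                         "score": round(score, 3), "tags": list(c.tags)})
    evidence.sort(key=lambda e: e["score"], reverse=True)
    return evidence[:limit]
```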
Core command:
python3 scripts/readwise_cli.py evidence-set "product strategy" --json
Helpful controls:
python3 scripts/readwise_cli.py evidence-set "product strategy" --strict --json
python3 scripts/readwise_cli.py evidence-set "ai agents" --tagged-only --json
python3 scripts/readwise_cli.py evidence-set "leadership" --broad --json
python3 scripts/readwise_cli.py evidence-set "org design" --counterpoint --json
Build a synthesis-oriented payload from cached evidence.
This is useful when you want:
- themes across multiple saved documents
- counterpoints and tensions
- source-backed synthesis instead of unsupported prose
- a downstream prompt/input for another assistant layer
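A synthesis packet is most useful when every theme still lists its sources. Here is a minimal sketch that groups evidence items into themes by tag and records source IDs, producing a JSON-serializable dict another assistant layer could consume. Grouping by tag is a simple stand-in for real theme extraction, and the input keys (`doc_id`, `tags`, `quote`) are hypothetical.

```python
from collections import defaultdict


def build_synthesis_packet(query, evidence):
    """Group evidence into tag-based themes while keeping provenance.

    evidence: list of dicts with hypothetical keys doc_id, tags, quote.
    """
    themes = defaultdict(list)
    for item in evidence:
        # Items without tags still appear, under an explicit bucket.
        for tag in item.get("tags") or ["untagged"]:
            themes[tag].append({"doc_id": item["doc_id"], "quote": item["quote"]})
    # Themes backed by more sources come first; ties sort alphabetically.
    ordered = sorted(themes.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {
        "query": query,
        "themes": [{"theme": tag, "sources": items} for tag, items in ordered],
        "source_ids": sorted({item["doc_id"] for item in evidence}),
    }
```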
Core command:
python3 scripts/readwise_cli.py synthesize "product strategy" --json
Useful variants:
python3 scripts/readwise_cli.py synthesize "product strategy" --strict --json
python3 scripts/readwise_cli.py synthesize "ai agents" --tagged-only --json
python3 scripts/readwise_cli.py synthesize "leadership" --counterpoint --json
Suggest broader related queries, then optionally run them, cache the results, and re-synthesize.
Useful when the first pass is too narrow and you want:
- adjacent ideas
- related tags
- broader coverage before synthesizing
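One simple way to find adjacent ideas is to look at which tags co-occur with documents that match the seed query. The sketch below does exactly that. It is an assumed heuristic for illustration, not necessarily what `expand-query` does, and the `{"title", "tags"}` document shape is hypothetical.

```python
from collections import Counter


def suggest_related_queries(query, docs, limit=3):
    """Suggest co-occurring tags as broader follow-up queries.

    docs: list of dicts with hypothetical keys "title" and "tags".
    """
    q = query.lower()
    counts = Counter()
    for doc in docs:
        tags = {t.lower() for t in doc["tags"]}
        if q in doc["title"].lower() or q in tags:
            # Count each related tag once per matching document.
            counts.update(tags - {q})
    # Most frequent first; alphabetical tie-break keeps output deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]
```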
Core commands:
python3 scripts/readwise_cli.py expand-query "openclaw" --json
python3 scripts/readwise_cli.py expand-and-cache "openclaw" --resynthesize --json
Work with Reader export jobs so the local mirror can be refreshed in larger batches and kept current over time.
Useful when you want:
- a fuller local mirror
- incremental refreshes
- export-based ingestion instead of only live per-query fetches
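Export jobs run asynchronously, so an ingest step has to wait for them. The sketch below shows a generic poll-with-timeout loop of the kind `wait-export-and-ingest` implies. The `get_status` callable and the `"completed"`/`"failed"` status strings are hypothetical; check `export-status --json` for the real values. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time


def wait_for_export(export_id, get_status, *, poll_seconds=5.0,
                    timeout_seconds=600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(export_id) until it reports a terminal state.

    The terminal status names below are assumptions for illustration.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(export_id)
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"export {export_id} still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```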
Core commands:
python3 scripts/readwise_cli.py trigger-export --json
python3 scripts/readwise_cli.py export-status <export_id> --json
python3 scripts/readwise_cli.py wait-export-and-ingest <export_id> --json
python3 scripts/readwise_cli.py latest-export-anchor --json
python3 scripts/readwise_cli.py trigger-delta-export --json
python3 scripts/readwise_cli.py run-delta-refresh --json
Inspect how fresh the local mirror is and whether recent sync/export activity looks healthy.
Useful when you want to answer:
- "Is the cache stale?"
- "Did the last export ingest succeed?"
- "What should I run next to refresh the mirror?"
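A freshness check boils down to comparing the last successful sync time against a tolerance. Here is a minimal sketch assuming an ISO-8601 timestamp is available. How `store-stats` or `sync-health` actually report this is not assumed, and the 24-hour default and the command suggestions are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def freshness_report(last_sync_iso, *, max_age=timedelta(hours=24), now=None):
    """Classify the mirror as empty, fresh, or stale.

    last_sync_iso: ISO-8601 timestamp of the last successful sync (hypothetical
    input), or None if the mirror has never been populated.
    """
    now = now or datetime.now(timezone.utc)
    if last_sync_iso is None:
        return {"status": "empty",
                "suggestion": "init-store, then cache-tagged-docs or trigger-export"}
    last = datetime.fromisoformat(last_sync_iso)
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    age_hours = round((now - last).total_seconds() / 3600, 1)
    if now - last > max_age:
        return {"status": "stale", "age_hours": age_hours,
                "suggestion": "run-delta-refresh"}
    return {"status": "fresh", "age_hours": age_hours}
```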
Core commands:
python3 scripts/readwise_cli.py store-stats --json
python3 scripts/readwise_cli.py sync-health --json
Prepare semantic representations for cached documents and optionally embed them.
Useful when you want:
- a stronger retrieval layer for broader semantic matching
- experimental ranking improvements
- embedding-backed retrieval scaffolding without adding a heavyweight framework
Core commands:
python3 scripts/readwise_cli.py semantic-prepare-tagged-docs --limit 50 --json
python3 scripts/readwise_cli.py semantic-embed-tagged-docs --limit 100 --json
python3 scripts/readwise_cli.py semantic-list-docs --status embedded --json
python3 scripts/readwise_cli.py semantic-stats --json
Evaluate retrieval behavior for a single query or against a labeled suite.
Useful when you want to ask:
- "Is the ranking behaving well for technical compound queries?"
- "Are broad conceptual searches drifting too much?"
- "Did my heuristic changes improve retrieval quality?"
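To judge whether heuristic changes help, a labeled suite is usually scored with precision@k and recall@k. Here is a minimal sketch; the case shape (`{"query", "relevant"}`) is hypothetical and not necessarily the format of `references/eval-cases.example.json`, and `search` stands in for whatever ranking function is being evaluated.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    top = ranked_ids[:k]
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate_suite(cases, search, k=5):
    """Score a ranking function against labeled cases (hypothetical shape)."""
    results = []
    for case in cases:
        p, r = precision_recall_at_k(search(case["query"]), case["relevant"], k)
        results.append({"query": case["query"], "precision": p, "recall": r})
    n = len(results)
    return {
        "cases": results,
        "mean_precision": sum(x["precision"] for x in results) / n if n else 0.0,
        "mean_recall": sum(x["recall"] for x in results) / n if n else 0.0,
    }
```

Running the same suite before and after a ranking change gives a direct comparison of the two.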
Core commands:
python3 scripts/readwise_cli.py eval-query "tenant isolation"
python3 scripts/readwise_cli.py eval-suite --json
python3 scripts/readwise_cli.py eval-suite --mode specific_technical_compound --json
Recommended flow for a precise technical question:
- Search or use the local mirror:
python3 scripts/readwise_cli.py search-docs "tenant isolation" --json
- Build a grounded evidence set:
python3 scripts/readwise_cli.py evidence-set "tenant isolation" --strict --json
- If needed, synthesize:
python3 scripts/readwise_cli.py synthesize "tenant isolation" --strict --json
Best fit:
- precise technical lookup
- source-backed answer
- low tolerance for topical drift
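When an assistant drives this flow programmatically, a thin wrapper around the documented commands is often enough. The sketch below assumes it runs from the repo root and that each command prints a single JSON document on stdout when given `--json`; the function names are hypothetical, and `runner` is injectable so the chain can be tested without the Readwise CLI.

```python
import json
import subprocess
import sys

# Same interpreter as the caller; equivalent to `python3 scripts/readwise_cli.py`.
CLI = [sys.executable, "scripts/readwise_cli.py"]


def run_cli(args, runner=subprocess.run):
    """Run one readwise_cli.py subcommand and parse its JSON output."""
    proc = runner(CLI + args, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def precise_lookup(query, runner=subprocess.run, synthesize=False):
    """Search, then build a strict evidence set, then optionally synthesize."""
    results = {
        "search": run_cli(["search-docs", query, "--json"], runner),
        "evidence": run_cli(["evidence-set", query, "--strict", "--json"], runner),
    }
    if synthesize:
        results["synthesis"] = run_cli(["synthesize", query, "--strict", "--json"], runner)
    return results
```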
Recommended flow for a broad conceptual question:
- Start with cached evidence:
python3 scripts/readwise_cli.py evidence-set "product strategy" --strict --json
- If coverage is thin, expand:
python3 scripts/readwise_cli.py expand-and-cache "product strategy" --resynthesize --strict --json
- Build a synthesis packet:
python3 scripts/readwise_cli.py synthesize "product strategy" --strict --json
Best fit:
- broad conceptual retrieval
- theme extraction
- grounded synthesis with precision safeguards
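The "if coverage is thin" branch can be made explicit. This sketch turns the result of the first strict `evidence-set` call into the remaining documented steps. Only the number of evidence items is used, so no output schema is assumed, but the `min_evidence` cut-off and the function name are hypothetical.

```python
def next_steps_for_broad_query(query, evidence_items, *, min_evidence=5):
    """Return CLI argument lists to run after an initial strict evidence-set.

    evidence_items: items parsed from `evidence-set ... --strict --json`;
    only their count matters here. min_evidence is a hypothetical threshold.
    """
    steps = []
    if len(evidence_items) < min_evidence:
        # Thin coverage: broaden, cache the new results, and re-synthesize.
        steps.append(["expand-and-cache", query, "--resynthesize", "--strict", "--json"])
    # Finish by building the synthesis packet from cached evidence.
    steps.append(["synthesize", query, "--strict", "--json"])
    return steps
```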
Tagged-only flow, when manual tags should act as a quality filter:
python3 scripts/readwise_cli.py evidence-set "ai agents" --tagged-only --json
python3 scripts/readwise_cli.py synthesize "ai agents" --tagged-only --json
Best fit:
- high-signal retrieval
- reducing noise
- using manual tags as a quality filter
Setting up and refreshing the local mirror:
python3 scripts/readwise_cli.py init-store
python3 scripts/readwise_cli.py cache-tags --json
python3 scripts/readwise_cli.py cache-tagged-docs --location archive --page-limit 3 --page-size 50 --detail-limit 10 --json
python3 scripts/readwise_cli.py sync-health --json
Then, for larger refreshes:
python3 scripts/readwise_cli.py trigger-export --json
python3 scripts/readwise_cli.py wait-export-and-ingest <export_id> --json
Best fit:
- preparing the system for repeated use
- improving speed and reproducibility
- building a stronger local retrieval base
Semantic preparation for advanced retrieval tuning:
python3 scripts/readwise_cli.py semantic-prepare-tagged-docs --limit 50 --json
python3 scripts/readwise_cli.py semantic-embed-tagged-docs --limit 100 --json
python3 scripts/readwise_cli.py semantic-stats --json
Best fit:
- advanced retrieval tuning
- embedding-backed experimentation
- evaluation and iteration
Repository layout:
.
├── SKILL.md
├── README.md
├── LICENSE
├── pyproject.toml
├── scripts/
│ ├── readwise_cli.py
│ ├── readwise_connector.py
│ ├── readwise_export.py
│ ├── readwise_normalize.py
│ ├── readwise_semantic.py
│ ├── readwise_store.py
│ └── readwise_synthesis.py
├── references/
│ ├── behavior-contract.md
│ └── eval-cases.example.json
└── data/
└── readwise/ # runtime cache/output, gitignored
Requirements:
- Python 3.11+
- the readwise CLI, installed and authenticated
- optional: OPENAI_API_KEY for embedding flows
This repo assumes the Readwise CLI is available on PATH.
By default, runtime data is stored under:
data/readwise/
You can override that location with:
export READWISE_LOOKUP_DATA_DIR=/path/to/data/readwise
The override applies to both the local SQLite cache and export artifacts.
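The override rule can be expressed in a few lines. This is a sketch of the described behavior, not the repo's code: whether the default `data/readwise` is resolved against the repo root or the current working directory is an assumption left open here (the path is returned as given), and `resolve_data_dir` is a hypothetical name.

```python
import os
from pathlib import Path

# Documented default; the base directory it is resolved against is assumed.
DEFAULT_DATA_DIR = Path("data/readwise")


def resolve_data_dir(env=None):
    """Return the runtime data directory, honoring READWISE_LOOKUP_DATA_DIR."""
    env = os.environ if env is None else env
    override = env.get("READWISE_LOOKUP_DATA_DIR")
    # An unset or empty variable falls back to the default location.
    return Path(override).expanduser() if override else DEFAULT_DATA_DIR
```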
Quick start:
python3 scripts/readwise_cli.py --help
python3 scripts/readwise_cli.py init-store
python3 scripts/readwise_cli.py search-docs "tenant isolation" --json
python3 scripts/readwise_cli.py evidence-set "product strategy" --json
python3 scripts/readwise_cli.py synthesize "product strategy" --json
This repo intentionally favors:
- manual tags as a strong signal
- cached/local evidence before expensive or noisy live retrieval
- precision safeguards for broad conceptual queries
- grounded evidence packets over unsupported synthesis
These are workflow defaults, not universal truths.
Non-goals:
- Reader write automation as a primary use case
- polished end-user UI
- generic vector database abstractions
- current-events retrieval
A good assistant experience usually looks like this:
- detect that a question is really about the user's saved reading
- check the local mirror first when it is likely fresh enough
- retrieve documents/highlights
- build an evidence set
- optionally synthesize across evidence
- answer naturally, with provenance when useful
That means the user should not always need to say:
- "check Readwise"
- "search Reader"
- "run the Python script"
The best use of this repo is often as a retrieval/evidence layer underneath a conversational assistant.
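The first step, detecting that a question is really about the user's saved reading, could start as simple cue-phrase matching before anything heavier. The patterns below are hypothetical and intentionally narrow; a production assistant would more likely let the model or a classifier decide.

```python
import re

# Hypothetical cue phrases that suggest a question about personal reading.
PERSONAL_READING_CUES = [
    r"\b(i|i've|i have)\s+(saved|read|highlighted)\b",
    r"\bmy (reading|library|highlights|saved)\b",
    r"\bhave i (saved|read)\b",
    r"\bdid i save\b",
]


def is_saved_reading_question(question):
    """Return True if the question looks like it targets saved reading."""
    q = question.lower()
    return any(re.search(pattern, q) for pattern in PERSONAL_READING_CUES)
```

A positive match would route the question to the local mirror and evidence-set steps without the user having to name Readwise explicitly.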
Notes:
- references/behavior-contract.md describes the intended retrieval behavior.
- references/eval-cases.example.json provides a starter eval suite format.
- Semantic embedding uses direct HTTP calls and does not require a heavyweight SDK.
If you adapt this repo for your own corpus:
- do not commit exported reading data
- do not commit local cache databases
- do not commit auth material or environment files
- audit examples and docs for personal identifiers before publishing
License: MIT (see LICENSE).