OpenClaw Readwise

A public, opinionated Readwise retrieval skill/tooling repo for assistants that need grounded answers from a user's saved reading.

It combines:

Readwise Reader document search and inspection
local SQLite mirroring for speed and reproducibility
evidence-set construction for retrieval-first answers
synthesis-packet generation from cached evidence
export download/ingest flows
semantic prep/embed scaffolding
retrieval eval tooling

This repo is optimized for personal knowledge retrieval, not for generic web search or Reader write automation.

What this is for

This repo is for the class of questions where a generic model answer is not enough and the real value comes from a user's own reading history.

Typical examples:

"What have I saved about product strategy?"
"Did I save anything on tenant isolation?"
"Show me the strongest things I've read about leadership."
"What else have I saved that's adjacent to this topic?"
"Can you give me a grounded synthesis instead of just a generic answer?"

The goal is to help an assistant do more than keyword search:

retrieve from a personal reading corpus
weight stronger signals like tags and document quality
build evidence sets from cached material
produce answers that can point back to sources

Status

Useful now, but still evolving.

Retrieval and local mirror flows are the strongest parts.
Evidence assembly is solid enough for real use.
Synthesis output is helpful, but still less polished than the retrieval layer.
Semantic indexing and eval features are advanced capabilities and are best treated as experimental.

Feature overview

1) Reader document retrieval

Search, list, and inspect Reader documents.

Useful when you want to answer questions like:

"What have I read about AI agents?"
"Show me archived docs tagged strategy."
"Open this specific saved document and inspect the content."

Core commands:

python3 scripts/readwise_cli.py search-docs "ai agents" --json
python3 scripts/readwise_cli.py list-docs --location archive --limit 10 --json
python3 scripts/readwise_cli.py get-doc <document_id> --json
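
When an assistant drives these commands from Python rather than a shell, a thin wrapper is usually enough. The sketch below is an illustration, not part of the repo: build_search_command and run_json are hypothetical helper names, and it assumes that --json makes the command print a single JSON document to stdout. The shape of that JSON is not documented here, so the result is returned unparsed beyond json.loads.

```python
import json
import subprocess

# Entry point as shown in the commands above; run from the repo root.
CLI = ["python3", "scripts/readwise_cli.py"]


def build_search_command(query: str) -> list:
    """Return the argv list for the documented `search-docs` command."""
    return [*CLI, "search-docs", query, "--json"]


def run_json(argv: list):
    """Run a CLI command and parse its stdout as JSON.

    Assumption: with --json the command writes one JSON document to stdout.
    check=True raises CalledProcessError on a non-zero exit code.
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Passing the query as its own argv element avoids shell quoting problems with multi-word queries such as "ai agents".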

2) Highlight retrieval

Search highlights globally or fetch highlights for one document.

Useful when you want:

specific passages instead of full documents
quote-level evidence
dense retrieval against what was actually highlighted

Core commands:

python3 scripts/readwise_cli.py search-highlights "deliberate practice" --json
python3 scripts/readwise_cli.py get-doc-highlights <document_id> --json

3) Local mirror / cache

Store Reader metadata, full document details, tags, highlights, and export-derived content in a local SQLite cache.

Why this matters:

faster repeated retrieval
more reproducible outputs
better downstream ranking and synthesis
reduced dependence on live API/CLI calls for every query

Core commands:

python3 scripts/readwise_cli.py init-store
python3 scripts/readwise_cli.py cache-tags --json
python3 scripts/readwise_cli.py cache-list-docs --location archive --limit 25 --json
python3 scripts/readwise_cli.py cache-doc <document_id> --with-highlights --json
python3 scripts/readwise_cli.py cache-tagged-docs --location archive --page-limit 3 --page-size 50 --detail-limit 10 --json
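
Because the mirror is plain SQLite, it can be inspected with Python's standard library alone. The sketch below lists table names without assuming anything about the schema, which this README does not describe. list_tables is a hypothetical helper, and the database file name is not stated here, so pass whichever file init-store created under the data directory.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path) -> list:
    """Return the table names in a SQLite file, sorted alphabetically.

    The file is opened read-only via a URI so that inspecting the mirror
    cannot modify it, and a missing file raises instead of being created.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```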

4) Evidence sets for grounded answers

Build a structured evidence set from cached documents and highlights before answering.

This is the main bridge between raw retrieval and assistant-quality responses.

Useful when you want to answer:

"What are the strongest ideas I've saved on product strategy?"
"What have I read about row-level access control?"
"Use only tagged documents for this answer."

Core command:

python3 scripts/readwise_cli.py evidence-set "product strategy" --json

Helpful controls:

python3 scripts/readwise_cli.py evidence-set "product strategy" --strict --json
python3 scripts/readwise_cli.py evidence-set "ai agents" --tagged-only --json
python3 scripts/readwise_cli.py evidence-set "leadership" --broad --json
python3 scripts/readwise_cli.py evidence-set "org design" --counterpoint --json
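
The four controls above can be wrapped in a small command builder. This is a sketch with a hypothetical function name. It deliberately allows at most one control per call, because the examples above show each flag used on its own and this README does not say whether they can be combined.

```python
CLI = ["python3", "scripts/readwise_cli.py"]

# Flags exactly as they appear in the examples above.
EVIDENCE_MODES = {
    "strict": "--strict",
    "tagged-only": "--tagged-only",
    "broad": "--broad",
    "counterpoint": "--counterpoint",
}


def build_evidence_command(query: str, mode=None) -> list:
    """Return argv for `evidence-set`, optionally with one control flag."""
    argv = [*CLI, "evidence-set", query]
    if mode is not None:
        if mode not in EVIDENCE_MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        argv.append(EVIDENCE_MODES[mode])
    argv.append("--json")
    return argv
```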

5) Synthesis packets

Build a synthesis-oriented payload from cached evidence.

This is useful when you want:

themes across multiple saved documents
counterpoints and tensions
source-backed synthesis instead of unsupported prose
a downstream prompt/input for another assistant layer

Core command:

python3 scripts/readwise_cli.py synthesize "product strategy" --json

Useful variants:

python3 scripts/readwise_cli.py synthesize "product strategy" --strict --json
python3 scripts/readwise_cli.py synthesize "ai agents" --tagged-only --json
python3 scripts/readwise_cli.py synthesize "leadership" --counterpoint --json

6) Query expansion and adjacent discovery

Suggest broader related queries, then optionally run them, cache the results, and re-synthesize.

Useful when the first pass is too narrow and you want:

adjacent ideas
related tags
broader coverage before synthesizing

Core commands:

python3 scripts/readwise_cli.py expand-query "openclaw" --json
python3 scripts/readwise_cli.py expand-and-cache "openclaw" --resynthesize --json

7) Export and delta refresh flows

Work with Reader export jobs so the local mirror can be refreshed in larger batches and kept current over time.

Useful when you want:

a fuller local mirror
incremental refreshes
export-based ingestion instead of only live per-query fetches

Core commands:

python3 scripts/readwise_cli.py trigger-export --json
python3 scripts/readwise_cli.py export-status <export_id> --json
python3 scripts/readwise_cli.py wait-export-and-ingest <export_id> --json
python3 scripts/readwise_cli.py latest-export-anchor --json
python3 scripts/readwise_cli.py trigger-delta-export --json
python3 scripts/readwise_cli.py run-delta-refresh --json
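
A full refresh chains two of these commands: trigger an export, then wait for it and ingest the result. The sketch below shows that chaining under one loud assumption: that the JSON printed by trigger-export exposes the export identifier under a key named export_id. That key name is a guess, not something this README documents, so it is a parameter. The runner is injected so the flow can be exercised without the real CLI.

```python
from typing import Any, Callable

CLI = ["python3", "scripts/readwise_cli.py"]

# A runner takes an argv list and returns the parsed JSON output.
Runner = Callable[[list], Any]


def full_export_refresh(run: Runner, id_field: str = "export_id") -> Any:
    """Trigger an export, then wait for it and ingest it.

    Assumption: the trigger-export output is a JSON object containing
    the export identifier under `id_field` (hypothetical key name).
    """
    started = run([*CLI, "trigger-export", "--json"])
    export_id = started[id_field]
    return run([*CLI, "wait-export-and-ingest", str(export_id), "--json"])
```

In real use, the runner would be a function that executes the argv with subprocess and parses stdout as JSON.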

8) Sync health / mirror inspection

Inspect how fresh the local mirror is and whether recent sync/export activity looks healthy.

Useful when you want to answer:

"Is the cache stale?"
"Did the last export ingest succeed?"
"What should I run next to refresh the mirror?"

Core commands:

python3 scripts/readwise_cli.py store-stats --json
python3 scripts/readwise_cli.py sync-health --json

9) Semantic prep and embeddings

Prepare semantic representations for cached documents and optionally embed them.

Useful when you want:

a stronger retrieval layer for broader semantic matching
experimental ranking improvements
embedding-backed retrieval scaffolding without adding a heavyweight framework

Core commands:

python3 scripts/readwise_cli.py semantic-prepare-tagged-docs --limit 50 --json
python3 scripts/readwise_cli.py semantic-embed-tagged-docs --limit 100 --json
python3 scripts/readwise_cli.py semantic-list-docs --status embedded --json
python3 scripts/readwise_cli.py semantic-stats --json

10) Retrieval evaluation

Evaluate retrieval behavior for a single query or against a labeled suite.

Useful when you want to ask:

"Is the ranking behaving well for technical compound queries?"
"Are broad conceptual searches drifting too much?"
"Did my heuristic changes improve retrieval quality?"

Core commands:

python3 scripts/readwise_cli.py eval-query "tenant isolation"
python3 scripts/readwise_cli.py eval-suite --json
python3 scripts/readwise_cli.py eval-suite --mode specific_technical_compound --json

Example workflows

Workflow 1: answer "What have I saved about tenant isolation?"

Search or use the local mirror:

python3 scripts/readwise_cli.py search-docs "tenant isolation" --json

Build a grounded evidence set:

python3 scripts/readwise_cli.py evidence-set "tenant isolation" --strict --json

If needed, synthesize:

python3 scripts/readwise_cli.py synthesize "tenant isolation" --strict --json

Best fit:

precise technical lookup
source-backed answer
low tolerance for topical drift

Workflow 2: answer "What are the strongest ideas I've saved on product strategy?"

Start with cached evidence:

python3 scripts/readwise_cli.py evidence-set "product strategy" --strict --json

If coverage is thin, expand:

python3 scripts/readwise_cli.py expand-and-cache "product strategy" --resynthesize --strict --json

Build synthesis packet:

python3 scripts/readwise_cli.py synthesize "product strategy" --strict --json

Best fit:

broad conceptual retrieval
theme extraction
grounded synthesis with precision safeguards

Workflow 3: prioritize curated/tagged reading only

python3 scripts/readwise_cli.py evidence-set "ai agents" --tagged-only --json
python3 scripts/readwise_cli.py synthesize "ai agents" --tagged-only --json

Best fit:

high-signal retrieval
reducing noise
using manual tags as a quality filter

Workflow 4: build and refresh the local mirror

python3 scripts/readwise_cli.py init-store
python3 scripts/readwise_cli.py cache-tags --json
python3 scripts/readwise_cli.py cache-tagged-docs --location archive --page-limit 3 --page-size 50 --detail-limit 10 --json
python3 scripts/readwise_cli.py sync-health --json

Then, for larger refreshes:

python3 scripts/readwise_cli.py trigger-export --json
python3 scripts/readwise_cli.py wait-export-and-ingest <export_id> --json

Best fit:

preparing the system for repeated use
improving speed and reproducibility
building a stronger local retrieval base

Workflow 5: experiment with semantic indexing

python3 scripts/readwise_cli.py semantic-prepare-tagged-docs --limit 50 --json
python3 scripts/readwise_cli.py semantic-embed-tagged-docs --limit 100 --json
python3 scripts/readwise_cli.py semantic-stats --json

Best fit:

advanced retrieval tuning
embedding-backed experimentation
evaluation and iteration

Repository layout

.
├── SKILL.md
├── README.md
├── LICENSE
├── pyproject.toml
├── scripts/
│   ├── readwise_cli.py
│   ├── readwise_connector.py
│   ├── readwise_export.py
│   ├── readwise_normalize.py
│   ├── readwise_semantic.py
│   ├── readwise_store.py
│   └── readwise_synthesis.py
├── references/
│   ├── behavior-contract.md
│   └── eval-cases.example.json
└── data/
    └── readwise/   # runtime cache/output, gitignored

Prerequisites

Python 3.11+
the readwise CLI installed and authenticated
optional: OPENAI_API_KEY for embedding flows

This repo assumes the Readwise CLI is available on PATH.

Data location

By default, runtime data is stored under:

data/readwise/

You can override that location with:

export READWISE_LOOKUP_DATA_DIR=/path/to/data/readwise

This affects the local SQLite cache and export artifacts.
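
Code that sits next to the scripts may need to find the same directory. The sketch below mirrors the rule described above: use READWISE_LOOKUP_DATA_DIR when it is set, otherwise fall back to data/readwise. resolve_data_dir is a hypothetical helper, and whether the scripts resolve the default relative to the working directory or the repo root is not stated here, so the relative path is returned unchanged.

```python
import os
from pathlib import Path

DEFAULT_DATA_DIR = Path("data/readwise")


def resolve_data_dir(env=None) -> Path:
    """Return the data directory, honoring READWISE_LOOKUP_DATA_DIR.

    `env` defaults to os.environ; passing a dict makes this easy to test.
    An empty value is treated the same as an unset variable.
    """
    env = os.environ if env is None else env
    override = env.get("READWISE_LOOKUP_DATA_DIR")
    return Path(override) if override else DEFAULT_DATA_DIR
```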

Quick start

1) Check the CLI

python3 scripts/readwise_cli.py --help

2) Initialize the local store

python3 scripts/readwise_cli.py init-store

3) Search Reader documents

python3 scripts/readwise_cli.py search-docs "tenant isolation" --json

4) Build an evidence set

python3 scripts/readwise_cli.py evidence-set "product strategy" --json

5) Build a synthesis packet

python3 scripts/readwise_cli.py synthesize "product strategy" --json

Opinionated defaults

This repo intentionally favors:

manual tags as a strong signal
cached/local evidence before expensive or noisy live retrieval
precision safeguards for broad conceptual queries
grounded evidence packets over unsupported synthesis

These are workflow defaults, not universal truths.

What this repo does not focus on

Reader write automation as a primary use case
polished end-user UI
generic vector database abstractions
current-events retrieval

How an assistant might use this repo

A good assistant experience usually looks like this:

detect that a question is really about the user's saved reading
check the local mirror first when it is likely fresh enough
retrieve documents/highlights
build an evidence set
optionally synthesize across evidence
answer naturally, with provenance when useful

That means the user should not always need to say:

"check Readwise"
"search Reader"
"run the Python script"

The best use of this repo is often as a retrieval/evidence layer underneath a conversational assistant.
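
The retrieval steps listed above map onto commands documented earlier in this README. The sketch below turns them into an ordered plan of argv lists. plan_commands is a hypothetical helper. Deciding whether a question is about saved reading, and phrasing the final answer, stay with the assistant and are out of scope here.

```python
CLI = ["python3", "scripts/readwise_cli.py"]


def plan_commands(query: str, synthesize: bool = False,
                  check_mirror: bool = True) -> list:
    """Return the ordered CLI invocations for one grounded answer.

    Order follows the steps above: mirror freshness check, retrieval,
    evidence set, then optional synthesis.
    """
    plan = []
    if check_mirror:
        plan.append([*CLI, "sync-health", "--json"])
    plan.append([*CLI, "search-docs", query, "--json"])
    plan.append([*CLI, "evidence-set", query, "--json"])
    if synthesize:
        plan.append([*CLI, "synthesize", query, "--json"])
    return plan
```

Returning a plan instead of executing it keeps the orchestration easy to log and review before anything runs.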

Advanced notes

references/behavior-contract.md describes the intended retrieval behavior.
references/eval-cases.example.json provides a starter eval suite format.
Semantic embedding uses direct HTTP calls and does not require a heavyweight SDK.

Publishing / safety guidance

If you adapt this repo for your own corpus:

do not commit exported reading data
do not commit local cache databases
do not commit auth material or environment files
audit examples and docs for personal identifiers before publishing
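
A quick scan before publishing can catch the first three items. The sketch below walks a directory and reports files whose names match a few risky patterns. The patterns are assumptions about what cache databases and environment files are typically called, not names taken from this repo, so adjust them to your setup. It inspects the working tree, not git history, and it does not replace a manual audit for personal identifiers.

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumed patterns for cache databases and environment files.
RISKY_PATTERNS = ("*.db", "*.sqlite", "*.sqlite3", ".env", ".env.*")


def find_risky_files(root, patterns=RISKY_PATTERNS) -> list:
    """Return sorted relative paths under `root` whose file name matches a pattern.

    Anything inside a .git directory is skipped.
    """
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue
        if path.is_file() and any(fnmatch(path.name, p) for p in patterns):
            hits.append(relative.as_posix())
    return sorted(hits)
```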

License

MIT
