Motivation
Currently, curaitor re-reads full note text every session. Topic notes (e.g., Improving LLM Guidance) can be 500+ lines. Pre-synthesizing key claims, relationships, and summaries would reduce token usage and enable faster reasoning.
Inspired by obsidian-nerv, a typed ontology system for Obsidian vaults with write-time validation. An MCP support issue has been filed to enable integration.
Design
Pre-synthesis: hash-gated frontmatter + derived index
For article notes (small): add to frontmatter:
```yaml
content_hash: "a3f7b2c1"
llm_summary: "Novel method linking chromatin accessibility to gene expression..."
key_claims:
- "Outperforms existing multiome prediction by 15%"
relationships:
- {type: "extends", target: "[[scMultiPreDICT]]"}
```
The LLM reads the frontmatter first. If `content_hash` matches a hash of the current body, it trusts the synthesis. If the hash is stale, it re-synthesizes.
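The freshness check described above can be sketched as follows. This is a minimal illustration, not curaitor's actual code: the helper names (`split_note`, `body_hash`, `synthesis_is_fresh`) are hypothetical, and it assumes the hash is SHA-256 of the stripped body truncated to 8 hex characters, which matches the length of the example value but is not confirmed anywhere in this proposal.

```python
import hashlib
import re
from typing import Optional

HASH_LEN = 8  # assumption: the examples show 8 hex characters


def split_note(text: str) -> tuple[str, str]:
    """Split a note into (frontmatter, body); frontmatter is '' if absent."""
    match = re.match(r"\A---\r?\n(.*?)\r?\n---\r?\n?(.*)\Z", text, re.DOTALL)
    if match is None:
        return "", text
    return match.group(1), match.group(2)


def body_hash(body: str) -> str:
    """Truncated SHA-256 of the body, ignoring leading/trailing whitespace."""
    digest = hashlib.sha256(body.strip().encode("utf-8")).hexdigest()
    return digest[:HASH_LEN]


def stored_hash(frontmatter: str) -> Optional[str]:
    """Read content_hash from raw frontmatter text without a YAML parser."""
    match = re.search(
        r'^content_hash:\s*"?([0-9a-f]+)"?\s*(?:#.*)?$',
        frontmatter,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def synthesis_is_fresh(text: str) -> bool:
    """True only if a stored hash exists and matches the current body."""
    frontmatter, body = split_note(text)
    return stored_hash(frontmatter) == body_hash(body)
```

Hashing only the body, not the frontmatter, keeps the hash stable when the synthesis fields themselves are rewritten.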
For topic notes (large, relationship-heavy): derive a compiled index:
```yaml
# config/knowledge-index.yaml
topics:
  Improving LLM Guidance:
    hash: "b4e8c3d2"
    summary: "Patterns for making AI agents reliable..."
    key_themes: [harness-design, context-management, verification]
    relationships:
      - {to: "AI-Assisted Development", type: "see-also"}
      - {to: "Agent Memory Strategies", type: "depends-on"}
```
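To illustrate where the token savings would come from, here is a rough sketch of turning one index entry into a compact context string instead of loading the full topic note. It assumes the index has already been parsed into a plain dict (for example by a YAML library); `render_topic_context` is a hypothetical name, not an existing curaitor function.

```python
def render_topic_context(name: str, entry: dict) -> str:
    """Render a compact, LLM-readable summary of one topic index entry."""
    lines = [f"Topic: {name}", f"Summary: {entry['summary']}"]
    themes = entry.get("key_themes") or []
    if themes:
        lines.append("Themes: " + ", ".join(themes))
    for rel in entry.get("relationships") or []:
        lines.append(f"{rel['type']} -> {rel['to']}")
    return "\n".join(lines)


# The entry from the example above, as it would look after parsing.
index = {
    "topics": {
        "Improving LLM Guidance": {
            "hash": "b4e8c3d2",
            "summary": "Patterns for making AI agents reliable...",
            "key_themes": ["harness-design", "context-management", "verification"],
            "relationships": [
                {"to": "AI-Assisted Development", "type": "see-also"},
                {"to": "Agent Memory Strategies", "type": "depends-on"},
            ],
        }
    }
}

name = "Improving LLM Guidance"
print(render_topic_context(name, index["topics"][name]))
```

For a 500+ line topic note, this would put a handful of lines into the session context, with the full note read only when the hash is stale or more detail is needed.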
Typed ontology for notes
Enforce note types and relationships:
- Article types: `article`, `paper`, `tool`, `post`
- Topic types: `topic`, `idea`, `reference`
- Relationship types: `extends`, `compares-to`, `depends-on`, `replaces`, `contains`
- Write-time validation: triage-write.py validates frontmatter schema
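A write-time check along these lines could look like the sketch below. It is an assumption-laden illustration: the `type` frontmatter key and the `validate_frontmatter` function are hypothetical, and the frontmatter is taken to be already parsed into a dict. Only the allowed values come from the lists above.

```python
ARTICLE_TYPES = {"article", "paper", "tool", "post"}
TOPIC_TYPES = {"topic", "idea", "reference"}
RELATIONSHIP_TYPES = {"extends", "compares-to", "depends-on", "replaces", "contains"}


def validate_frontmatter(frontmatter: dict) -> list[str]:
    """Return a list of human-readable schema errors (empty if valid)."""
    errors = []

    # Assumption: the note type lives under a "type" key.
    note_type = frontmatter.get("type")
    if note_type not in ARTICLE_TYPES | TOPIC_TYPES:
        errors.append(f"unknown note type: {note_type!r}")

    # "or []" tolerates a relationships key with an empty value.
    for i, rel in enumerate(frontmatter.get("relationships") or []):
        rel_type = rel.get("type")
        if rel_type not in RELATIONSHIP_TYPES:
            errors.append(f"relationships[{i}]: unknown type {rel_type!r}")
        if not rel.get("target"):
            errors.append(f"relationships[{i}]: missing target")

    return errors
```

Note that the topic index example uses a `see-also` relationship, which is not in the relationship list above, so a strict validator like this one would reject it until the list and the example are reconciled.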
Sync mechanism
- `content_hash` in frontmatter/index detects stale synthesis
- Pre-session script (`scripts/synthesize.py`) validates hashes, re-synthesizes stale entries
- Extend existing `prefetch-review.py` to check synthesis freshness
- Re-synthesis runs LLM pass only on changed notes (not all)
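The "only changed notes" behaviour can be sketched as a loop that skips any note whose stored hash still matches. This is a hypothetical outline of what `scripts/synthesize.py` might do, not its actual design: `refresh_syntheses` is an invented name, the LLM call is passed in as a plain callable, and the 8-character truncated SHA-256 is an assumption based on the example hashes.

```python
import hashlib
from typing import Callable


def body_hash(body: str) -> str:
    """Truncated SHA-256 of the note body (truncation length is an assumption)."""
    return hashlib.sha256(body.strip().encode("utf-8")).hexdigest()[:8]


def refresh_syntheses(
    notes: dict[str, str],
    cache: dict[str, dict],
    synthesize: Callable[[str], dict],
) -> list[str]:
    """Re-synthesize only stale or missing entries.

    notes maps note name -> current body text.
    cache maps note name -> stored synthesis (including its "hash").
    synthesize stands in for the LLM pass and returns the synthesis fields.
    The cache is updated in place; the names that were refreshed are returned.
    """
    refreshed = []
    for name, body in notes.items():
        current = body_hash(body)
        entry = cache.get(name)
        if entry is not None and entry.get("hash") == current:
            continue  # synthesis is still fresh, so skip the LLM call
        cache[name] = {"hash": current, **synthesize(body)}
        refreshed.append(name)
    return refreshed
```

Passing the synthesizer in as a callable keeps the hash gating testable without an LLM, and the same loop would work whether the cache is backed by frontmatter or by the central index.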
Implementation steps
- Add `content_hash` + `llm_summary` + `key_claims` to triage-write.py note template
- Create `scripts/synthesize.py`: hash-check + LLM synthesis for stale notes
- Create `config/knowledge-index.yaml`: derived index for topic notes
- Extend `prefetch-review.py` to load synthesis data
- Add relationship types to reading-prefs or a new ontology config
- Optional: integrate with obsidian-nerv if MCP support lands
Open questions
- Is YAML frontmatter sufficient, or do we need sidecar files for large syntheses?
- Should relationships be stored per-note or in the central index?
- How to handle synthesis for notes that are mostly links (e.g., Tools & Projects)?
- Should the synthesis script run as a cron job or on-demand before review sessions?
Sync problem: three approaches
The core challenge with pre-synthesized content is keeping it in sync with the source text. Three approaches with different tradeoffs:
Option A: Hash-gated frontmatter synthesis
Store synthesis inline in YAML frontmatter alongside a content hash:

    content_hash: "a3f7b2c1"  # SHA256 of body text
    llm_summary: "Novel method linking chromatin accessibility to gene expression..."
    key_claims:
      - "Outperforms existing multiome prediction by 15%"
    relationships:
      - {type: extends, target: "[[scMultiPreDICT]]"}

- LLM reads frontmatter first; if `content_hash` matches body, trusts the synthesis
- If hash is stale, re-synthesize on demand
- Pro: co-located, human can inspect synthesis, single file
- Con: frontmatter gets large; YAML isn't great for long text
Option B: Sidecar files (.synthesis.yaml)
Separate file alongside each note:

    Curaitor/Inbox/SPEAR.md           <- human reads this
    Curaitor/Inbox/.SPEAR.synth.yaml  <- LLM reads this

- Pro: clean separation, no frontmatter bloat, richer structures
- Con: two files to manage, Obsidian doesn't index hidden files, can drift
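As a small illustration of the sidecar layout and its drift risk, the sketch below derives the sidecar path for a note and lists sidecars whose note no longer exists. Both function names are hypothetical, and the `.NAME.synth.yaml` naming follows the path example above rather than the `.synthesis.yaml` suffix in the heading.

```python
from pathlib import Path

SIDECAR_SUFFIX = ".synth.yaml"  # assumption: taken from the path example


def sidecar_path(note_path: Path) -> Path:
    """Map Curaitor/Inbox/SPEAR.md to Curaitor/Inbox/.SPEAR.synth.yaml."""
    return note_path.with_name(f".{note_path.stem}{SIDECAR_SUFFIX}")


def orphaned_sidecars(folder: Path) -> list[Path]:
    """Return sidecar files in folder whose matching .md note is missing."""
    orphans = []
    for sidecar in folder.glob(f".*{SIDECAR_SUFFIX}"):
        # Strip the leading dot and the sidecar suffix to recover the note name.
        note_name = sidecar.name[1 : -len(SIDECAR_SUFFIX)] + ".md"
        if not (folder / note_name).exists():
            orphans.append(sidecar)
    return sorted(orphans)
```

A check like `orphaned_sidecars` is the kind of extra housekeeping the "two files to manage" con implies: renaming or deleting a note in Obsidian would not touch its hidden sidecar.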
Option C: Derived index (single compiled file)
One file for the entire knowledge base:

    # config/knowledge-index.yaml
    articles:
      SPEAR:
        path: Curaitor/Inbox/SPEAR.md
        hash: a3f7b2c1
        summary: "..."
        claims: [...]
        relationships: [...]

- Pro: single file for LLM to read, efficient token-wise, queryable
- Con: another thing to keep in sync; a pre-session script has to validate all hashes
Recommendation
Option A (hash-gated frontmatter) for article notes (small, structured). Option C (derived index) for topic notes (large, relationship-heavy). A pre-session script (scripts/synthesize.py) validates hashes and re-synthesizes stale entries. Extend existing prefetch-review.py to check synthesis freshness.