Pre-synthesized content and typed ontology for curaitor knowledge base

## Motivation

Currently, curaitor re-reads full note text every session. Topic notes (e.g., Improving LLM Guidance) can be 500+ lines. Pre-synthesizing key claims, relationships, and summaries would reduce token usage and enable faster reasoning.

Inspired by [obsidian-nerv](https://github.com/odere-pro/obsidian-nerv) — a typed ontology system for Obsidian vaults with write-time validation. See [MCP support issue](https://github.com/odere-pro/obsidian-nerv/issues/5) filed to enable integration.

## Design

### Pre-synthesis: hash-gated frontmatter + derived index

**For article notes** (small): add to frontmatter:
```yaml
content_hash: "a3f7b2c1"
llm_summary: "Novel method linking chromatin accessibility to gene expression..."
key_claims:
  - "Outperforms existing multiome prediction by 15%"
relationships:
  - {type: "extends", target: "[[scMultiPreDICT]]"}
```
The LLM reads the frontmatter first. If `content_hash` matches a hash of the current body, it trusts the synthesis; if the hash is stale, the note is re-synthesized.
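A minimal sketch of the freshness check. It assumes that notes use `---`-delimited frontmatter and that `content_hash` is a SHA-256 of the body truncated to 8 hex characters, matching the length in the example above. `body_hash` and `is_synthesis_fresh` are hypothetical names. A real implementation would more likely parse frontmatter with a YAML library than with a regex.

```python
import hashlib
import re

# Assumption: frontmatter sits between two '---' lines at the top of the note.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n?(.*)\Z", re.DOTALL)
# Assumption: content_hash is stored as lowercase hex, optionally quoted.
HASH_LINE_RE = re.compile(r'^content_hash:[ \t]*"?([0-9a-f]+)"?[ \t]*$', re.MULTILINE)


def body_hash(body: str) -> str:
    """Short content hash of the note body (SHA-256, first 8 hex chars)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:8]


def is_synthesis_fresh(note_text: str) -> bool:
    """True if the stored content_hash matches the current body text."""
    match = FRONTMATTER_RE.match(note_text)
    if not match:
        return False  # no frontmatter, so there is no synthesis to trust
    frontmatter, body = match.groups()
    stored = HASH_LINE_RE.search(frontmatter)
    return stored is not None and stored.group(1) == body_hash(body)
```

Hashing only the body, not the frontmatter, keeps the hash stable when the synthesis fields themselves are rewritten.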

**For topic notes** (large, relationship-heavy): derive a compiled index:
```yaml
# config/knowledge-index.yaml
topics:
  Improving LLM Guidance:
    hash: "b4e8c3d2"
    summary: "Patterns for making AI agents reliable..."
    key_themes: [harness-design, context-management, verification]
    relationships:
      - {to: "AI-Assisted Development", type: "see-also"}
      - {to: "Agent Memory Strategies", type: "depends-on"}
```

### Typed ontology for notes

Enforce note types and relationships:
- Article types: `article`, `paper`, `tool`, `post`
- Topic types: `topic`, `idea`, `reference`
- Relationship types: `extends`, `compares-to`, `depends-on`, `replaces`, `contains`
- Write-time validation: `triage-write.py` validates the frontmatter schema
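A sketch of the kind of check write-time validation could perform, with the allowed values copied from the lists above. The field names (`type`, `relationships`, `target`) follow the article frontmatter example but are otherwise assumptions, and `validate_frontmatter` is a hypothetical helper that takes already-parsed frontmatter. The topic index example uses a `see-also` relationship, which is not in the list, so this sketch would flag it.

```python
# Allowed values taken from the typed ontology lists above.
ARTICLE_TYPES = {"article", "paper", "tool", "post"}
TOPIC_TYPES = {"topic", "idea", "reference"}
RELATIONSHIP_TYPES = {"extends", "compares-to", "depends-on", "replaces", "contains"}


def validate_frontmatter(fm: dict) -> list[str]:
    """Return schema errors for parsed frontmatter; an empty list means valid."""
    errors = []
    note_type = fm.get("type")
    if note_type not in ARTICLE_TYPES | TOPIC_TYPES:
        errors.append(f"unknown note type: {note_type!r}")
    # `or []` also covers a `relationships:` key with no value (parsed as None).
    for i, rel in enumerate(fm.get("relationships") or []):
        if not isinstance(rel, dict):
            errors.append(f"relationships[{i}] must be a mapping")
            continue
        if rel.get("type") not in RELATIONSHIP_TYPES:
            errors.append(f"relationships[{i}]: unknown type {rel.get('type')!r}")
        if not rel.get("target"):
            errors.append(f"relationships[{i}]: missing target")
    return errors
```

Returning a list of errors rather than raising on the first one lets the writer report every schema problem in a note at once.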

### Sync mechanism

- `content_hash` in frontmatter/index detects stale synthesis
- Pre-session script (`scripts/synthesize.py`) validates hashes, re-synthesizes stale entries
- Extend existing `prefetch-review.py` to check synthesis freshness
- Re-synthesis runs LLM pass only on changed notes (not all)
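A minimal sketch of the "only changed notes" rule for a derived index. It assumes that `notes` maps note names to body text and that each index entry holds at least a `hash`. `refresh_index` is a hypothetical helper, and `synthesize` stands in for the LLM call. For brevity, only a summary is stored per entry; claims and relationships are omitted.

```python
import hashlib
from typing import Callable


def body_hash(body: str) -> str:
    """Short content hash (SHA-256, first 8 hex chars), as in the frontmatter example."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:8]


def refresh_index(
    notes: dict[str, str],
    index: dict[str, dict],
    synthesize: Callable[[str], str],
) -> tuple[dict[str, dict], list[str]]:
    """Re-synthesize only notes whose body hash changed; return (new_index, changed)."""
    new_index: dict[str, dict] = {}
    changed: list[str] = []
    for name, body in notes.items():
        h = body_hash(body)
        entry = index.get(name)
        if entry is not None and entry.get("hash") == h:
            new_index[name] = entry  # fresh: reuse the stored synthesis, no LLM call
        else:
            new_index[name] = {"hash": h, "summary": synthesize(body)}
            changed.append(name)
    # Entries for notes no longer in `notes` are dropped, since only current notes are visited.
    return new_index, changed
```

Returning the list of changed names lets a pre-session script log how many LLM calls a run actually made.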

## Implementation steps

1. Add `content_hash`, `llm_summary`, and `key_claims` to the `triage-write.py` note template
2. Create `scripts/synthesize.py` — hash-check + LLM synthesis for stale notes
3. Create `config/knowledge-index.yaml` — derived index for topic notes
4. Extend `prefetch-review.py` to load synthesis data
5. Add relationship types to reading-prefs or a new ontology config
6. Optional: integrate with obsidian-nerv if MCP support lands

## Open questions

- Is YAML frontmatter sufficient, or do we need sidecar files for large syntheses?
- Should relationships be stored per-note or in the central index?
- How to handle synthesis for notes that are mostly links (e.g., Tools & Projects)?
- Should the synthesis script run as a cron job or on-demand before review sessions?

---

## Sync problem: three approaches

The core challenge with pre-synthesized content is keeping it in sync with the source text. Three approaches with different tradeoffs:

### Option A: Hash-gated frontmatter synthesis
Store synthesis inline in YAML frontmatter alongside a content hash:
```yaml
content_hash: "a3f7b2c1"  # SHA-256 of body text (truncated)
llm_summary: "Novel method linking chromatin accessibility to gene expression..."
key_claims:
  - "Outperforms existing multiome prediction by 15%"
relationships:
  - {type: extends, target: "[[scMultiPreDICT]]"}
```
- The LLM reads the frontmatter first; if `content_hash` matches a hash of the current body, it trusts the synthesis
- If hash is stale, re-synthesize on demand
- **Pro:** co-located, human can inspect synthesis, single file
- **Con:** frontmatter gets large; YAML isn't great for long text

### Option B: Sidecar files (`.synthesis.yaml`)
Separate file alongside each note:
```
Curaitor/Inbox/SPEAR.md            <- human reads this
Curaitor/Inbox/.SPEAR.synth.yaml   <- LLM reads this
```
- **Pro:** clean separation, no frontmatter bloat, richer structures
- **Con:** two files to manage, Obsidian doesn't index hidden files, can drift
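A sketch of the sidecar convention and one drift check, following the naming in the example above (`SPEAR.md` paired with `.SPEAR.synth.yaml`). Both helpers are hypothetical. The check covers only one kind of drift, a sidecar whose note has been deleted or renamed; a stale hash inside a sidecar would still need a content comparison.

```python
from pathlib import Path


def sidecar_path(note: Path) -> Path:
    """Map Curaitor/Inbox/SPEAR.md to Curaitor/Inbox/.SPEAR.synth.yaml."""
    return note.with_name(f".{note.stem}.synth.yaml")


def orphaned_sidecars(folder: Path) -> list[Path]:
    """Sidecars in `folder` whose matching .md note no longer exists."""
    orphans = []
    suffix = ".synth.yaml"
    for sidecar in sorted(folder.glob(f".*{suffix}")):
        stem = sidecar.name[1 : -len(suffix)]  # strip leading '.' and the suffix
        if not (folder / f"{stem}.md").exists():
            orphans.append(sidecar)
    return orphans
```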

### Option C: Derived index (single compiled file)
One file for the entire knowledge base:
```yaml
# config/knowledge-index.yaml
articles:
  SPEAR:
    path: Curaitor/Inbox/SPEAR.md
    hash: a3f7b2c1
    summary: "..."
    claims: [...]
    relationships: [...]
```
- **Pro:** single file for LLM to read, efficient token-wise, queryable
- **Con:** another thing to keep in sync; pre-session script validates all hashes

### Recommendation
Option A (hash-gated frontmatter) for article notes (small, structured). Option C (derived index) for topic notes (large, relationship-heavy). A pre-session script (`scripts/synthesize.py`) validates hashes and re-synthesizes stale entries. Extend existing `prefetch-review.py` to check synthesis freshness.
