Polylogue is a local archive for AI conversations. The system has four rings:
- archive substrate
- derived read models
- user and machine surfaces
- verification and maintenance
## Archive substrate

Owns stored meaning:
- source acquisition and provider detection
- provider parsing and normalization
- SQLite persistence and search indexes
- archive-level query and runtime operations
Primary modules:
- `polylogue/sources/`
- `polylogue/pipeline/`
- `polylogue/storage/`
- `polylogue/lib/`
- `polylogue/operations/archive.py`
## Derived read models

Stored products computed over the archive:
- session profiles
- work events, phases, threads
- day and week summaries
- provider-level analytics and tag rollups
Primary modules:
- `polylogue/products/`
- `polylogue/storage/session_product_*.py`
- `polylogue/storage/repository_product_*.py`
## User and machine surfaces

These expose the archive and its products:
- CLI: `polylogue/cli/`
- Python API: `polylogue/api/__init__.py`
- MCP server: `polylogue/mcp/`
- site generation: `polylogue/site/`
- dashboard and TUI: `polylogue/ui/`
- renderers: `polylogue/rendering/`
## Verification and maintenance

Leaf adapters over archive operations and derived products:
- schema inference and verification
- synthetic corpus generation
- showcase and deterministic acceptance exercises
- validation lanes, mutation campaigns, benchmark campaigns
Primary modules:
- `polylogue/schemas/`
- `polylogue/showcase/`
- `devtools/`
- `tests/`
```
source files (JSON/JSONL/ZIP)
  → detect_provider()          # dispatch.py — shape-based, not filename
  → provider parser            # parsers/{chatgpt,claude,codex,drive}.py
  → content hash (NFC)         # pipeline/ids.py — SHA-256 over normalized payload
  → store (upsert-if-changed)  # storage/ — idempotent by content hash
  → session products           # session_product_*.py — profiles, work events, phases, threads
  → FTS index                  # search_providers/fts5.py — unicode61 tokenizer
```
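The content-hash step can be sketched as follows. This is a minimal illustration of "SHA-256 over an NFC-normalized payload, excluding user metadata"; the field names and canonicalization are assumptions, and the real `pipeline/ids.py` may differ:

```python
import hashlib
import json
import unicodedata

def content_hash(conversation: dict) -> str:
    """Illustrative content hash: SHA-256 over an NFC-normalized,
    canonically serialized payload. Field names are hypothetical."""
    # Only archival fields participate; user-editable metadata
    # (tags, summaries) is deliberately excluded, so editing it
    # does not change the hash or trigger re-import.
    payload = {
        "title": conversation.get("title"),
        "timestamps": conversation.get("timestamps"),
        "messages": conversation.get("messages"),
        "attachments": conversation.get("attachments"),
    }
    canonical = json.dumps(
        payload, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    normalized = unicodedata.normalize("NFC", canonical)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

NFC normalization makes composed and decomposed Unicode spellings of the same title hash identically, which keeps re-imports of the same export idempotent.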
```
CLI / MCP / Python API
          ↑
filter chain → query → storage
```
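The filter-chain pattern on the read path can be illustrated with a generic immutable builder. The method names below (`where`, `provider`, `since`, `to_sql`) are hypothetical and not the real `ConversationFilter` API in `lib/filters.py`; this only shows the fluent-chain shape:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FilterChain:
    """Immutable fluent filter accumulator (illustrative only)."""
    clauses: tuple = ()
    params: tuple = ()

    def where(self, clause: str, *args) -> "FilterChain":
        # Each call returns a new chain, so partial chains can be
        # shared safely between CLI, MCP, and facade callers.
        return FilterChain(self.clauses + (clause,), self.params + args)

    def provider(self, name: str) -> "FilterChain":
        return self.where("provider = ?", name)

    def since(self, ts: str) -> "FilterChain":
        return self.where("created_at >= ?", ts)

    def to_sql(self) -> tuple:
        where = " AND ".join(self.clauses) or "1=1"
        return f"SELECT id FROM conversations WHERE {where}", self.params

sql, params = FilterChain().provider("claude").since("2024-01-01").to_sql()
```

Making the chain immutable means a surface can hold a base filter and fork it per request without defensive copying.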
The `all` pipeline stage runs: acquire → parse → materialize → render → site → index.
`reprocess` runs: parse → materialize → render → index (skips acquire).
| Provider | Detected by | Parser |
|---|---|---|
| ChatGPT | `mapping` dict with message graph | `parsers/chatgpt.py` |
| Claude web | `chat_messages` list | `parsers/claude.py` |
| Claude Code | `parentUuid`/`sessionId` in record array | `parsers/claude.py` (code path) |
| Codex | Session envelope structure | `parsers/codex.py` |
| Gemini | `chunkedPrompt.chunks` structure | `parsers/drive.py` |
`detect_provider()` calls each parser's `looks_like()` in order.
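The in-order, shape-based dispatch can be sketched as a list of predicates. The checks below paraphrase the "Detected by" column and are illustrative, not the actual `looks_like()` implementations; Codex's envelope check is omitted because its exact shape isn't specified here:

```python
from typing import Any, Callable

# Illustrative shape predicates, tried in order; first match wins.
DETECTORS: list[tuple[str, Callable[[Any], bool]]] = [
    ("chatgpt", lambda d: isinstance(d, dict) and "mapping" in d),
    ("claude", lambda d: isinstance(d, dict) and "chat_messages" in d),
    ("claude-code", lambda d: isinstance(d, list) and bool(d)
        and isinstance(d[0], dict) and "parentUuid" in d[0]),
    ("gemini", lambda d: isinstance(d, dict) and "chunkedPrompt" in d),
]

def detect_provider(data: Any) -> str:
    """Dispatch on payload shape, never on filename."""
    for name, looks_like in DETECTORS:
        if looks_like(data):
            return name
    return "unknown"
```

Ordering matters when shapes overlap: the most specific predicates should run before the more permissive ones.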
| Abstraction | Location | Role |
|---|---|---|
| `Polylogue` | `facade.py` | Async entry point. Wraps storage + search + pipeline. |
| `ConversationRepository` | `storage/repository.py` | Mixin-composed async repository (10 mixins for reads, writes, products, vectors, raw). |
| `SearchProvider` protocol | `protocols.py` | FTS5 and Hybrid (RRF fusion) implementations. |
| `ConversationFilter` | `lib/filters.py` | Fluent filter chain used by CLI, MCP, and facade. |
| Session products | `storage/session_product_*.py` | Materialized read models: profiles, work events, phases, threads, aggregates. |
| `ContentHash` | `pipeline/ids.py` | SHA-256 over NFC-normalized conversation payload. Title, timestamps, messages, and attachments are hashed; user metadata (tags, summaries) is excluded, so editing it doesn't trigger re-import. |
| `Provider` enum | `types.py` | 6 known providers + UNKNOWN. All provider identity flows through this enum. |
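The Hybrid search provider's RRF fusion is a standard rank-merging scheme for combining result lists whose raw scores aren't comparable (e.g. FTS5 rank vs. vector distance). A minimal sketch, using the conventional constant k = 60 from the original RRF formulation; this is not Polylogue's actual code:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    Documents ranked highly by several input lists float to the top,
    without ever comparing the lists' raw scores directly.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Fuse a lexical ranking with a (hypothetical) vector ranking.
fused = rrf_fuse([["a", "b", "c"], ["c", "a", "d"]])
```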
- Single SQLite file, WAL mode.
- Schema is fresh-only: no migration chain. On version mismatch the database is
  wiped and rebuilt. `SCHEMA_VERSION` lives in `storage/backends/schema_ddl.py`.
- FTS5 with `unicode61` tokenizer (no porter stemmer in this SQLite build).
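The tokenizer choice can be demonstrated with Python's stdlib `sqlite3`, assuming the linked SQLite has FTS5 compiled in; the table and column names here are illustrative, not Polylogue's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# unicode61 is FTS5's default tokenizer; naming it explicitly documents
# that the schema does not depend on a porter stemmer being available.
conn.execute(
    "CREATE VIRTUAL TABLE messages USING fts5(body, tokenize='unicode61')"
)
conn.executemany(
    "INSERT INTO messages(body) VALUES (?)",
    [("naïve diacritics fold",), ("plain ascii text",)],
)
# unicode61 strips diacritics by default, so 'naive' matches 'naïve';
# without a stemmer, 'folds' would NOT match 'fold'.
rows = conn.execute(
    "SELECT body FROM messages WHERE messages MATCH 'naive'"
).fetchall()
```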
Substrate:

- `lib/` — domain types, invariants, shared primitives (no I/O, no storage)
- `storage/` — SQLite backends, repositories, FTS, search providers
- `sources/` — provider detection, parsing, acquisition
- `pipeline/` — stage execution, ingestion, validation, rendering pipeline
- `products/` — derived read models, session products, analytics
- `operations/` — operation specs, artifact graph, declared runtime contracts

Surfaces:

- `cli/` — Click commands, shared helpers, output formatting
- `mcp/` — MCP server tools
- `api/` — async library API
- `site/` — static site generation
- `rendering/` — markdown/HTML renderers
- `ui/` — TUI, dashboard

Verification and maintenance:

- `proof/` — proof obligations, subject discovery, claim catalog, witnesses
- `devtools/` — operator tooling, lints, campaigns, rendering
- `showcase/` — QA exercises, deterministic acceptance tests
- `tests/` — pytest suite, property tests, integration tests

Schemas and scenarios:

- `schemas/` — provider schemas, schema inference, validation
- `scenarios/` — synthetic corpus, scenario families
- Surfaces may not import substrate internals directly (see `layering.yaml`).
- New semantics go into substrate or products first, then surfaces adapt.
- Proof subjects and claims live in `proof/`; devtools commands that exercise
  them live in `devtools/`.
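The layering rule is the kind of constraint that can be checked mechanically. The sketch below assumes a hypothetical in-code rule map; the real contract lives in layering.yaml, and this is only an illustration of how such a lint could work:

```python
import ast

# Hypothetical layering rules: surface module -> forbidden import prefixes.
FORBIDDEN = {
    "polylogue.cli": ("polylogue.storage.backends", "polylogue.pipeline"),
    "polylogue.mcp": ("polylogue.storage.backends",),
}

def layering_violations(module: str, source: str) -> list[str]:
    """Return imports in `source` that break `module`'s layer contract."""
    banned = FORBIDDEN.get(module, ())
    bad: list[str] = []
    for node in ast.walk(ast.parse(source)):
        names: list[str] = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        # str.startswith accepts a tuple of prefixes.
        bad.extend(n for n in names if banned and n.startswith(banned))
    return bad

violations = layering_violations(
    "polylogue.cli",
    "from polylogue.storage.backends import schema_ddl\nimport polylogue.api\n",
)
```

Running something like this over each surface package in CI keeps "new semantics go into substrate first" from eroding one convenient import at a time.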