Polylogue Architecture

Polylogue is a local archive for AI conversations. The system has four rings:

archive substrate
derived read models
user and machine surfaces
verification and maintenance

Rings

1. Archive Substrate

Owns stored meaning:

source acquisition and provider detection
provider parsing and normalization
SQLite persistence and search indexes
archive-level query and runtime operations

Primary modules:

polylogue/sources/
polylogue/pipeline/
polylogue/storage/
polylogue/lib/
polylogue/operations/archive.py

2. Derived Read Models

Stored products computed over the archive:

session profiles
work events, phases, threads
day and week summaries
provider-level analytics and tag rollups

Primary modules:

polylogue/products/
polylogue/storage/session_product_*.py
polylogue/storage/repository_product_*.py

3. Surfaces

These expose the archive and its products:

CLI: polylogue/cli/
Python API: polylogue/api/__init__.py
MCP server: polylogue/mcp/
site generation: polylogue/site/
dashboard and TUI: polylogue/ui/
renderers: polylogue/rendering/

Surfaces are leaf adapters over archive operations and derived products.

4. Verification and Maintenance

schema inference and verification
synthetic corpus generation
showcase and deterministic acceptance exercises
validation lanes, mutation campaigns, benchmark campaigns

Primary modules:

polylogue/schemas/
polylogue/showcase/
devtools/
tests/

Data Flow

source files (JSON/JSONL/ZIP)
  → detect_provider()          # dispatch.py — shape-based, not filename
  → provider parser            # parsers/{chatgpt,claude,codex,drive}.py
  → content hash (NFC)         # pipeline/ids.py — SHA-256 over normalized payload
  → store (upsert-if-changed)  # storage/ — idempotent by content hash
  → session products           # session_product_*.py — profiles, work events, phases, threads
  → FTS index                  # search_providers/fts5.py — unicode61 tokenizer

           CLI / MCP / Python API
                   ↑
             filter chain → query → storage

The `all` pipeline stage runs acquire → parse → materialize → render → site → index. The `reprocess` stage runs parse → materialize → render → index (skips acquire).
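
The hash and store steps in the flow above can be sketched as follows. This is a minimal illustration, not the code in pipeline/ids.py or storage/: the field names in `HASHED_FIELDS`, the `conversations` table layout, and the helper names are all assumptions made for the example. It shows how hashing an NFC-normalized payload makes re-imports idempotent and why edits to tags do not trigger a rewrite.

```python
import hashlib
import json
import sqlite3
import unicodedata

# Hypothetical field names. The source only says that title, timestamps,
# messages, and attachments are hashed, and that tags and summaries are not.
HASHED_FIELDS = ("title", "created_at", "updated_at", "messages", "attachments")


def nfc(value):
    """Recursively NFC-normalize every string in a JSON-like value."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [nfc(item) for item in value]
    if isinstance(value, dict):
        return {nfc(key): nfc(item) for key, item in value.items()}
    return value


def content_hash(conversation: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the hashed fields only."""
    payload = {field: conversation.get(field) for field in HASHED_FIELDS}
    canonical = json.dumps(
        nfc(payload), sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def open_demo_db() -> sqlite3.Connection:
    """In-memory database with an illustrative single-table schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE conversations (id TEXT PRIMARY KEY, content_hash TEXT, body TEXT)"
    )
    return db


def upsert_if_changed(db: sqlite3.Connection, conv_id: str, conversation: dict) -> bool:
    """Write the row only when its content hash differs. Returns True if written."""
    digest = content_hash(conversation)
    row = db.execute(
        "SELECT content_hash FROM conversations WHERE id = ?", (conv_id,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # same content: re-importing is a no-op
    db.execute(
        "INSERT OR REPLACE INTO conversations (id, content_hash, body) VALUES (?, ?, ?)",
        (conv_id, digest, json.dumps(conversation, ensure_ascii=False)),
    )
    return True
```

Under these assumptions, importing the same export twice writes once, and two payloads that differ only in Unicode composition (for example a precomposed "é" versus "e" plus a combining accent) hash identically.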

Provider Detection

| Provider | Detected by | Parser |
| --- | --- | --- |
| ChatGPT | `mapping` dict with message graph | `parsers/chatgpt.py` |
| Claude web | `chat_messages` list | `parsers/claude.py` |
| Claude Code | `parentUuid`/`sessionId` in record array | `parsers/claude.py` (code path) |
| Codex | Session envelope structure | `parsers/codex.py` |
| Gemini | `chunkedPrompt.chunks` structure | `parsers/drive.py` |

detect_provider() calls each parser's looks_like() in order.
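
A rough sketch of shape-based dispatch, assuming each `looks_like()` is a plain predicate over the decoded JSON and that the first match wins (the source states only that they are called in order). The predicate bodies, the detector order, and the returned string labels are illustrative guesses based on the table above. Codex is left out because its envelope shape is not described here.

```python
from typing import Any, Callable, List, Tuple


def looks_like_chatgpt(data: Any) -> bool:
    # ChatGPT exports carry a `mapping` dict holding the message graph.
    return isinstance(data, dict) and isinstance(data.get("mapping"), dict)


def looks_like_claude_web(data: Any) -> bool:
    # Claude web exports carry a `chat_messages` list.
    return isinstance(data, dict) and isinstance(data.get("chat_messages"), list)


def looks_like_claude_code(data: Any) -> bool:
    # Claude Code sessions are record arrays with `parentUuid`/`sessionId` keys.
    return isinstance(data, list) and any(
        isinstance(rec, dict) and ("parentUuid" in rec or "sessionId" in rec)
        for rec in data
    )


def looks_like_gemini(data: Any) -> bool:
    # Gemini exports nest their content under `chunkedPrompt.chunks`.
    chunked = data.get("chunkedPrompt") if isinstance(data, dict) else None
    return isinstance(chunked, dict) and "chunks" in chunked


# Order is an assumption for this sketch; labels are placeholders, not the
# values of the real Provider enum.
DETECTORS: List[Tuple[str, Callable[[Any], bool]]] = [
    ("chatgpt", looks_like_chatgpt),
    ("claude", looks_like_claude_web),
    ("claude-code", looks_like_claude_code),
    ("gemini", looks_like_gemini),
]


def detect_provider(data: Any) -> str:
    """Return the label of the first detector that accepts the payload."""
    for label, looks_like in DETECTORS:
        if looks_like(data):
            return label
    return "unknown"
```

Because detection inspects the payload rather than the filename, a renamed export file is still routed to the right parser.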

Key Abstractions

| Abstraction | Location | Role |
| --- | --- | --- |
| `Polylogue` | `facade.py` | Async entry point. Wraps storage + search + pipeline. |
| `ConversationRepository` | `storage/repository.py` | Mixin-composed async repository (10 mixins for reads, writes, products, vectors, raw). |
| `SearchProvider` protocol | `protocols.py` | FTS5 and Hybrid (RRF fusion) implementations. |
| `ConversationFilter` | `lib/filters.py` | Fluent filter chain used by CLI, MCP, and facade. |
| `Session Products` | `storage/session_product_*.py` | Materialized read models: profiles, work events, phases, threads, aggregates. |
| `ContentHash` | `pipeline/ids.py` | SHA-256 over NFC-normalized conversation payload. Title, timestamps, messages, attachments are hashed. User metadata (tags, summaries) is excluded: editable metadata doesn't trigger re-import. |
| `Provider` enum | `types.py` | 6 known providers + UNKNOWN. All provider identity flows through this enum. |
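
For readers unfamiliar with the fusion step named in the `SearchProvider` row, here is a minimal sketch of reciprocal rank fusion, assuming that is what "RRF" refers to. The function name `rrf_fuse`, the constant `k = 60` (a conventional default, not taken from this project), and the choice of a keyword list and a vector list as inputs are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per ID."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fts_hits = ["a", "b", "c"]     # e.g. keyword results, best first
vector_hits = ["b", "d", "a"]  # e.g. embedding results, best first
fused = rrf_fuse([fts_hits, vector_hits])
```

RRF uses only rank positions, so the two result lists do not need comparable scores. Here "b" ends up first because it ranks near the top of both lists.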

Database

Single SQLite file, WAL mode.
Schema is fresh-only: no migration chain. On version mismatch the database is wiped and rebuilt. SCHEMA_VERSION lives in storage/backends/schema_ddl.py.
FTS5 with unicode61 tokenizer (no porter stemmer in this SQLite build).
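
The fresh-only policy can be sketched as below. This assumes the version is kept in SQLite's `PRAGMA user_version`, which the source does not confirm, and uses a one-table placeholder in place of the real DDL in storage/backends/schema_ddl.py. The function name `open_archive` is hypothetical, and FTS tables are omitted.

```python
import sqlite3
import tempfile
from pathlib import Path

# Placeholder schema for the sketch; the real DDL is not shown in this document.
DDL = "CREATE TABLE conversations (id TEXT PRIMARY KEY, content_hash TEXT, body TEXT)"


def open_archive(path: Path, schema_version: int) -> sqlite3.Connection:
    """Open the archive in WAL mode; wipe and rebuild on a version mismatch."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode=WAL")
    (found,) = db.execute("PRAGMA user_version").fetchone()
    if found != schema_version:
        # No migration chain: drop every user table and start fresh.
        tables = [
            row[0]
            for row in db.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        for name in tables:
            db.execute(f'DROP TABLE IF EXISTS "{name}"')
        db.execute(DDL)
        # PRAGMA values cannot be bound as parameters; int() keeps this safe.
        db.execute(f"PRAGMA user_version = {int(schema_version)}")
        db.commit()
    return db


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = open_archive(Path(tmp) / "archive.db", schema_version=1)
        print(db.execute("PRAGMA journal_mode").fetchone()[0])
        db.close()
```

A wipe is safe only because the archive can be rebuilt from the source files. Anything stored in the database alone would be lost on a version bump.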

Placement Rules

Substrate (archive meaning)

lib/ — domain types, invariants, shared primitives (no I/O, no storage)
storage/ — SQLite backends, repositories, FTS, search providers
sources/ — provider detection, parsing, acquisition
pipeline/ — stage execution, ingestion, validation, rendering pipeline
products/ — derived read models, session products, analytics
operations/ — operation specs, artifact graph, declared runtime contracts

Surfaces (presentation only)

cli/ — Click commands, shared helpers, output formatting
mcp/ — MCP server tools
api/ — async library API
site/ — static site generation
rendering/ — markdown/HTML renderers
ui/ — TUI, dashboard

Verification (repo health)

proof/ — proof obligations, subject discovery, claim catalog, witnesses
devtools/ — operator tooling, lints, campaigns, rendering
showcase/ — QA exercises, deterministic acceptance tests
tests/ — pytest suite, property tests, integration tests

Cross-cutting

schemas/ — provider schemas, schema inference, validation
scenarios/ — synthetic corpus, scenario families

Key rules

Surfaces may not import substrate internals directly (see layering.yaml).
New semantics go into substrate or products first, then surfaces adapt.
Proof subjects and claims live in proof/; devtools commands that exercise them live in devtools/.
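
One way to enforce the first rule is a static import check. This sketch is not the project's actual lint: the format of layering.yaml is not shown here, so the forbidden prefix below is a hypothetical stand-in, and `forbidden_imports` is an illustrative name.

```python
import ast
from typing import List, Sequence

# Hypothetical rule: surface modules must not import from this package.
# The real rules live in layering.yaml.
FORBIDDEN_FOR_SURFACES = ("polylogue.storage.backends",)


def forbidden_imports(
    source: str, forbidden: Sequence[str] = FORBIDDEN_FOR_SURFACES
) -> List[str]:
    """Return imported module names in `source` that fall under a forbidden prefix."""
    hits: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute `from x import y` only
        else:
            continue
        for name in names:
            # Match the package itself or any submodule, not lookalike names.
            if any(name == p or name.startswith(p + ".") for p in forbidden):
                hits.append(name)
    return sorted(hits)


surface_source = (
    "from polylogue.storage.backends import schema_ddl\n"
    "import polylogue.operations.archive\n"
)
violations = forbidden_imports(surface_source)
```

Running a check like this over cli/, mcp/, api/, site/, rendering/, and ui/ would flag a surface that reaches past the operations layer. Relative imports are ignored in this sketch.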
