This document records key architectural decisions taken in the Optimo project.
The goal is to preserve reasoning context and prevent architectural drift over time.
Each decision includes:
- context
- decision
- consequences
This is a lightweight ADR log.
Optimo processes multiple OCR variants and must produce stable, reproducible results. Future goals include replay, auditability, and distributed processing.
The reducer is implemented as a pure deterministic function:
(State, Input) -> ReducerResult
The reducer must:
- perform no I/O
- generate no timestamps or random identifiers
- depend only on explicit inputs

Consequences:
- exact replay is possible
- testability is significantly improved
- architectural discipline is required to prevent side effects from leaking into the core
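The contract above can be sketched as follows. Only the `(State, Input) -> ReducerResult` shape comes from this decision; the field layouts and the toy convergence rule are illustrative assumptions, not the project's actual types.

```rust
// Minimal sketch of the pure reducer contract. No I/O, no clocks,
// no randomness: the same (state, input) pair always yields the
// same result, which is what makes exact replay possible.

#[derive(Clone, Debug, PartialEq)]
struct State {
    processed: u64, // hypothetical field for illustration
}

#[derive(Debug)]
struct Input {
    variant_text: String, // hypothetical OCR variant payload
}

#[derive(Debug, PartialEq)]
struct ReducerResult {
    next_state: State,
    converged: bool,
}

/// Pure and deterministic: depends only on its explicit arguments.
fn reduce(state: &State, input: &Input) -> ReducerResult {
    ReducerResult {
        next_state: State { processed: state.processed + 1 },
        converged: !input.variant_text.is_empty(),
    }
}
```

Because the function is pure, calling it twice with identical arguments must produce identical results, which is the property the test suite can lean on.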
Storage is expected to evolve from JSONL logging to more capable backends (e.g., SQLite).
All persistence operations are routed through a dedicated boundary:
state_bridge.rs
The reducer and observation layers must remain storage-agnostic.
Consequences:
- storage can evolve without modifying core logic
- clear separation between computation and infrastructure
- additional mapping-layer complexity
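A storage-agnostic core is typically expressed as a trait behind the boundary module. This is a sketch under assumptions: the trait name, its methods, and the in-memory backend are hypothetical; only the boundary itself (`state_bridge.rs`) and the storage-agnostic requirement come from the decision above.

```rust
// Hypothetical persistence boundary. The reducer and observation
// layers depend only on this trait, never on a concrete backend.
trait StateBridge {
    fn persist(&mut self, record: &str) -> Result<(), String>;
    fn load_all(&self) -> Result<Vec<String>, String>;
}

/// In-memory backend, useful for tests. A JSONL- or SQLite-backed
/// implementation would satisfy the same trait without any change
/// to core logic.
struct MemoryBridge {
    records: Vec<String>,
}

impl StateBridge for MemoryBridge {
    fn persist(&mut self, record: &str) -> Result<(), String> {
        self.records.push(record.to_string());
        Ok(())
    }

    fn load_all(&self) -> Result<Vec<String>, String> {
        Ok(self.records.clone())
    }
}
```

Swapping JSONL for SQLite then means adding one more `impl StateBridge`, which is the "storage can evolve without modifying core logic" consequence made concrete.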
Deterministic computation alone is insufficient for audit and diagnostics. System decisions must be explainable.
Every reducer execution produces a structured observation record.
Observations describe:
- convergence status
- ambiguity
- failure conditions
- relevant metadata for analysis
Consequences:
- improved debuggability
- audit-ready execution traces
- increased data volume in logs
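An observation record covering the categories above might look like the following. The field names and types are assumptions drawn from the list, not the project's actual schema.

```rust
// Illustrative observation record emitted per reducer execution.

#[derive(Debug, Clone, PartialEq)]
enum Convergence {
    Converged,
    Ambiguous,
    Failed,
}

#[derive(Debug, Clone)]
struct Observation {
    convergence: Convergence,            // convergence status
    ambiguity_score: f64,                // 0.0 = unambiguous (hypothetical metric)
    failure_reason: Option<String>,      // populated on failure conditions
    metadata: Vec<(String, String)>,     // free-form key/value analysis metadata
}
```

Keeping the record a plain data structure means it can be serialized through the same storage boundary as state, which is what makes the traces audit-ready.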
Certain reducer outcomes (e.g., ambiguous or failed convergence) must trigger higher-level system reactions.
The reducer signals event necessity via its result. Event construction and persistence are handled by the runtime layer.
Consequences:
- deterministic core remains pure
- event-driven extensions become possible
- additional coordination logic required in orchestration layer
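The split above can be sketched as a signal enum inside the reducer result, with the runtime turning the signal into an actual event. The enum variants and the `handle` function are assumptions; only the division of labor (reducer signals, runtime constructs and persists) comes from the decision.

```rust
// The reducer stays pure: it only *signals* that an event is needed.

#[derive(Debug, PartialEq)]
enum EventRequest {
    None,
    AmbiguityDetected,
    ConvergenceFailed,
}

struct ReducerResult {
    event: EventRequest, // simplified slice of the full result type
}

/// Runtime-side coordination: constructs the event (here just a tag
/// string) and would hand it to persistence. This is the extra
/// orchestration logic the consequences list refers to.
fn handle(result: &ReducerResult) -> Option<String> {
    match result.event {
        EventRequest::None => None,
        EventRequest::AmbiguityDetected => Some("event:ambiguity".to_string()),
        EventRequest::ConvergenceFailed => Some("event:failure".to_string()),
    }
}
```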
Full event replay may become expensive as the system scales.
Periodic state snapshots are persisted independently of observations.
Current implementation:
data/snapshots.jsonl
Consequences:
- faster replay initialization
- storage overhead increases
- snapshot strategy may need refinement over time
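A minimal JSONL snapshot writer, hand-serialized to stay dependency-free, might look like this. The `Snapshot` fields are assumptions; only the one-object-per-line layout of `data/snapshots.jsonl` comes from the decision above.

```rust
use std::io::Write;

// One JSON object per line (JSONL). Replay can seek to the latest
// snapshot line instead of replaying the full event history.

struct Snapshot {
    seq: u64,       // hypothetical: position in the event stream
    processed: u64, // hypothetical: summarized state
}

fn append_snapshot<W: Write>(out: &mut W, snap: &Snapshot) -> std::io::Result<()> {
    writeln!(out, "{{\"seq\":{},\"processed\":{}}}", snap.seq, snap.processed)
}
```

Taking `W: Write` rather than a file path keeps the function testable in memory and routes actual file access through the persistence boundary.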
The current implementation uses OCR to generate input data. However, the long-term system goal is broader deterministic document intelligence.
OCR is treated as an input generator rather than the defining capability of the system.
Consequences:
- architecture remains generalizable
- future input sources can be integrated without core redesign
- system messaging must avoid framing the system as “just OCR”
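Treating OCR as one generator among several suggests an input-source abstraction along these lines. The trait and both type names are hypothetical illustrations, not existing Optimo code.

```rust
// Any producer of inputs for the deterministic core implements this
// trait; OCR is just one implementation, not the defining capability.
trait InputSource {
    fn next_input(&mut self) -> Option<String>;
}

struct OcrSource {
    variants: Vec<String>, // pre-generated OCR variants
}

impl InputSource for OcrSource {
    fn next_input(&mut self) -> Option<String> {
        self.variants.pop()
    }
}
```

A future PDF-structure or DSL-based parser would plug in as another `impl InputSource` without touching the reducer.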
Potential upcoming decisions:
- introduction of SQLite storage backend
- distributed reducer execution model
- deterministic ID strategy across services
- observation schema versioning
- DSL-based document parsing layer