JAMES is a local-first, auditable knowledge reasoning platform: Graph-RAG retrieval, deterministic contradiction arbitration, an append-only audit log, replayable knowledge state, and a human approval gate for self-evolution. It is built as a general mother platform through v1.0; domain packs (legal, food, retail, …) branch off only after v1.0 (see docs/PLATFORM_READINESS.md).

One differentiator highlight is Replayable RAG: the system's state at any past point can be reconstructed byte-identically via the T7 supersede chain plus the append-only audit log (`reconstruct_view_at`). Other first-class differentiators include Graph-RAG retrieval, Knowledge Cascade (Layer 3), Layer 4 Lifecycle (T1-T7), the Plugin API, and the deterministic 4-rule contradiction tree.
Korean README · Getting started for first-time users (simple enough for a 10-year-old to follow)
The numbers below come from the current main branch; they are neither aspirational nor carried over from an older release. Every value is reproducible by cloning the repository and running the listed command.
| Surface | Verified | Where to check |
|---|---|---|
| Test suite | 3290 tests collected across tests/ (224 test files), all green on PR CI |
python -m pytest tests/ --collect-only -q |
| CASCADE / EVENT separation | Provable end-to-end via 5 release-gating invariants run against a real wiki fixture (not mocks) | tests/test_t7_release_gating_invariants.py |
| T6 causality cascade | 4 additional release-gating invariants pin foundational vs corroborative semantics | tests/test_t6_release_gating_invariants.py |
| QVT 3-axis quality baseline | path_recall 1.00 / graded_answer 0.58 / abstention_f1 0.67 (median, post-calibration, N=3 paired reruns) | eval/qvt/baseline_2a31b20.json |
| STEP 7 regression | 17-query suite with gold_signals + abstention_truth + expected_path.nodes ground truth on 5 queries |
eval/regression/step7_queries.json v6 |
| F9 entity-anchor q15 fix | q15 ("David Soria Parraκ° λꡬμΌ?") path_recall 0.00 β 1.00 after JAMES_ENABLE_ENTITY_ANCHOR=1 + JAMES_EMBEDDING_MODEL=BAAI/bge-m3 + JAMES_ENABLE_QUERY_REWRITE=1 |
reports/research-runs/step7-bench-baseline-run*.json |
| Module size discipline | 20 KB cap enforced on every core/ file. Largest current: core/lifecycle/schema.py at 18.9 KB |
CLAUDE.md rule 5 + module-size CI gate |
| Default-off invariant | Every routing layer added since v0.3 (D5 / LEO / D1 / T2.D / T6 LLM) defaults OFF β production fleets pulling v0.4.1 see byte-identical retrieval to v0.3.3 unless they opt in | JAMES_* env audit (CHANGELOG [0.4.x] table) |
| Deterministic contradiction arbitration | classify_contradiction is an LLM-free 4-rule decision tree (~10.2 KB pure function). Audit-replay-safe by construction. |
core/lifecycle/contradiction_arbiter.py |
What is NOT yet headline-verified: a single-page ablation card showing Graph-RAG vs flat RAG on the same fixture. The infrastructure to produce it (scripts/qvt_capture_baseline.py + the 18-cell ablation matrix design from QVT memo §5) is wired; the operator-run capture is the late-June deliverable. Until then, the graph contribution is measurable via graph_paths_count per query in any STEP 7 bench output, but not summarized in one table.
Released 2026-05-28. v0.4.1 closes the CASCADE pillar that v0.4.0 only half-finished: when a base fact's sources are fully removed, edges whose `derived_from` references that base now auto-invalidate via `invalidate_derived_facts`, so the derivation chain stays internally consistent without manual operator intervention. Per-derivation-type semantics (T6.C.b refinement): transitive / inferred are structural chain links (any base empty → invalidate); operator is corroborative (it only invalidates when there are no hard deps AND all operator bases are empty).
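The per-derivation-type rule above can be sketched as a small pure function. This is an illustration only: the `Dep` record, the `should_invalidate` name, and the "set of empty base ids" input are assumptions made for the example, not the actual signature of `invalidate_derived_facts`.

```python
from dataclasses import dataclass

# Structural chain links, per the release note above.
HARD_TYPES = {"transitive", "inferred"}


@dataclass(frozen=True)
class Dep:
    base_id: str  # hypothetical identifier of the base fact
    kind: str     # "transitive", "inferred", or "operator"


def should_invalidate(deps: list[Dep], empty_bases: set[str]) -> bool:
    """Return True when a derived edge should be invalidated.

    empty_bases holds the ids of base facts whose sources are fully removed.
    """
    hard = [d for d in deps if d.kind in HARD_TYPES]
    operator = [d for d in deps if d.kind == "operator"]
    # Structural links: any empty base breaks the chain.
    if any(d.base_id in empty_bases for d in hard):
        return True
    # Corroborative links: only when there are no hard deps
    # and every operator base is empty.
    if not hard and operator:
        return all(d.base_id in empty_bases for d in operator)
    return False
```

Under these assumptions, an edge with two operator bases survives the loss of one of them, while an edge with a single transitive base does not survive the loss of that base.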
Pre-v0.4.1: v0.4.0 (2026-05-27) shipped the first Layer 4
bundle (T1 Temporal Validity + T7 Supersede Chain + T2
Contradiction Arbitration) across an 8-PR Sprint 5 sequence.
The CASCADE vs EVENT separation invariant is provable end-to-end
via tests/test_t7_release_gating_invariants.py, run against the
actual wiki fixture (not mocks). The supersede chain primitive
(reconstruct_view_at) lets the system answer "what was true at
time T?" deterministically, even after destructive CASCADE
operations on unrelated facts.
Pre-v0.4: v0.3.0 (2026-05-17) closed Foundation Hardening, with all six axes (architecture / eval / observability / security / controlled evolution / real-data validation) green; second-user validation closed 2026-05-13.
- NOT production-ready: operational maturity (HTTPS / SSO / multi-tenancy / backup CLI) is a v1.0 deliverable; see SECURITY.md
- Designed with security-first principles end to end
- Open to collaboration: external contributors sign a one-click CLA on their first PR (see License)
JAMES is not building one vertical. It is being hardened as a "mother platform" from which domain packs (legal, food, retail, travel, etc.) can branch off only at v1.0. Until then:
- No domain-specific features land in `core/`
- Every change is graded against the same six-dimension readiness framework (architecture / extension API / eval contract / operational maturity / security boundary / production proof)
- The plugin contract that future packs will be built against is being designed and stress-tested
See docs/PLATFORM_READINESS.md for
the 6 dimensions, 4 gates (v0.2 / v0.3 / v0.4 / v1.0), and 3
branching forms (Domain Pack / Distribution / Vertical Product).
Most RAG systems answer one question: "what's the answer?" JAMES answers two more:
- What did the system know at time T? T7 supersede chains preserve historical fact states; `reconstruct_view_at(t)` returns the edge that was active at any past timestamp, even after unrelated CASCADE delete events.
- Why did the system say that? Every reasoning step (query rewrite, retrieval, rerank, planner, reflect, verify, synth) writes an append-only audit row. `scripts/replay_trace.py <trace_id>` reconstructs the full sequence byte-identically.
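To make the "what did the system know at time T?" question concrete, here is a minimal sketch of a lookup over a supersede chain. It assumes each version records when it became active and when it was superseded; `EdgeVersion`, `view_at`, and the sample facts are hypothetical and are not the actual `reconstruct_view_at` implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EdgeVersion:
    value: str
    valid_from: float               # timestamp when this version became active
    superseded_at: Optional[float]  # None while the version is still active


def view_at(chain: list[EdgeVersion], t: float) -> Optional[EdgeVersion]:
    """Return the version active at time t, or None if none existed yet."""
    for version in chain:
        ended = version.superseded_at is not None and version.superseded_at <= t
        if version.valid_from <= t and not ended:
            return version
    return None


# Old versions are kept, not overwritten, so past views stay answerable.
chain = [
    EdgeVersion("CEO is Alice", valid_from=100.0, superseded_at=200.0),
    EdgeVersion("CEO is Bob", valid_from=200.0, superseded_at=None),
]
```

Because superseded versions stay in the chain, `view_at(chain, 150.0)` still resolves to the earlier fact even though a newer one exists.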
The two combined make JAMES a Replayable RAG system, a category distinct from Agentic RAG (which optimises for what an AI can do) and from Mem0-style memory layers (which use an LLM judge to update beliefs). JAMES updates beliefs via a deterministic 4-rule decision tree (`core/lifecycle/contradiction_arbiter.py:classify_contradiction`) that is LLM-free by design, and preserves both the old and the new fact for replay rather than overwriting.
- Deterministic memory lifecycle (v0.4.0): T1 Temporal Validity + T7 Supersede Chain + T2 Contradiction Arbitration. CASCADE (destructive, Layer 3) and EVENT (history-preserving, Layer 4) are guaranteed-separate paths, release-gated by `tests/test_t7_release_gating_invariants.py` against the real wiki fixture.
- Sources-aware Graph-RAG: 12 typed relations carry semantic meaning beyond embeddings, and every relation carries `sources: [{doc_id, weight, role, ts}]`, so deleting or modifying a document surgically updates only the affected derived knowledge (Knowledge Cascade A-E, v0.3.0).
- Cognitive Layer: cross-encoder reranker (default ON), LLM query rewriter, reflection loop (draft → critique → revise), verification engine (security + fact check), and tool router. One `trace_id` reconstructs the full 8-stage reasoning sequence via `scripts/replay_trace.py`.
- PolicyEngine as a layer, not a sprinkle: single point of role / sensitivity decisions wired into retrieval, graph, output, and tools; removing it breaks 6+ modules (v0.2 Axis 4).
- Change Request primitive: every write (wiki edits, workspace jobs, self-evolution patches) routes through propose → review → admin approval → atomic apply → audit row. No silent writes.
- Self-evolution behind a human gate: feedback → candidate → bench eval → human approval → deploy → auto-rollback on regression. Every deployed patch has an `approver_username` audit row (v0.2 Axis 5).
- 100% local: runs on a laptop with Ollama; no cloud LLM dependency by default.
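The sources-aware item in the list above can be illustrated with a short sketch of what "surgical" removal might look like. Only the `sources: [{doc_id, weight, role, ts}]` shape comes from this README; the `Relation` class, the `remove_document` helper, and the predicate names used with it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    subject: str
    predicate: str
    obj: str
    # Mirrors the sources: [{doc_id, weight, role, ts}] shape described above.
    sources: list[dict] = field(default_factory=list)


def remove_document(relations: list[Relation], doc_id: str) -> list[Relation]:
    """Drop doc_id from every relation; keep relations that still have a source."""
    survivors = []
    for rel in relations:
        rel.sources = [s for s in rel.sources if s["doc_id"] != doc_id]
        if rel.sources:
            survivors.append(rel)
    return survivors
```

In this sketch a relation backed by two documents loses one source entry and stays in the graph, while a relation backed only by the deleted document disappears; relations that never cited the document are untouched.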
Each feature is regression-tested against the STEP 7 13-query baseline + RAGAS metrics. PRs touching `core/{retrieval,graph,reasoning}` cannot land without bench numbers.
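The Change Request flow listed among the features above (propose → review → admin approval → atomic apply → audit row) can be read as a linear state machine. The sketch below is one way to model that ordering; the class, state names, and `advance` method are assumptions for illustration and do not reflect the contents of `core/change_request.py`. It does not attempt to model atomicity of the apply step.

```python
from enum import Enum


class CRState(Enum):
    PROPOSED = "proposed"
    REVIEWED = "reviewed"
    APPROVED = "approved"
    APPLIED = "applied"


# Each state may only advance to the next one; nothing skips review or approval.
NEXT = {
    CRState.PROPOSED: CRState.REVIEWED,
    CRState.REVIEWED: CRState.APPROVED,
    CRState.APPROVED: CRState.APPLIED,
}


class ChangeRequest:
    def __init__(self, payload: str):
        self.payload = payload
        self.state = CRState.PROPOSED
        self.audit: list[tuple[str, str]] = []  # append-only (state, actor) rows

    def advance(self, actor: str, is_admin: bool = False) -> None:
        target = NEXT.get(self.state)
        if target is None:
            raise ValueError("change request is already applied")
        if target is CRState.APPROVED and not is_admin:
            raise PermissionError("approval requires an admin")
        self.state = target
        self.audit.append((target.value, actor))
```

A non-admin call at the approval step raises and leaves both the state and the audit list unchanged, which is the "no silent writes" property in miniature.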
- Python 3.11+
- Ollama installed and running
- Min 16GB RAM (32GB+ recommended)
- (Optional) NVIDIA GPU for faster inference
- (Optional) Tavily API key for web search (free 1k/month)
git clone https://github.com/Hashevolution/James-RAG-Evol
cd James-RAG-Evol
# Configure environment
cp .env.example .env
# Edit .env: set JAMES_API_KEY, JAMES_JWT_SECRET
# Install dependencies
pip install -r requirements.txt
# Start the server (admin wizard auto-recommends a model on first login)
python server_llmwiki.py

Open http://localhost:8000/admin. The admin wizard measures your
hardware and offers a one-click install of an appropriate Ollama
model. Then open http://localhost:8000 for the chat UI.
[User Query]
    ↓
[Security Filter]      ← injection patterns + PolicyEngine pre-check
    ↓
[Query Router]         ← chat / coding / retrieval / web_search
    ↓
[Query Rewriter]       ← LLM rewrite (opt-in, JAMES_ENABLE_QUERY_REWRITE)
    ↓
[Hybrid Search]        ← Vector(60%) + BM25(20%) + keyword(10%) + name(10%)
    ↓
[Cross-Encoder Rerank] ← MiniLM-L-6-v2 (default ON; JAMES_DISABLE_RERANK=1 to disable)
    ↓
[Graph Engine]         ← DFS + sources-aware + sensitivity gating
    ↓
[Reasoning Loop]       ← retrieve → expand → reflect (opt-in) → verify (opt-in)
    ↓
[Tool Router]          ← read tools direct; write tools → Change Request
    ↓
[Output Filter]        ← PII masking + role-based filter
    ↓
[Answer + Reasoning Path + trace_id]
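The Hybrid Search stage in the pipeline above lists four weighted signals. One plausible reading is a linear weighted sum, sketched below. The weights come from the diagram; the linear fusion, the assumption that each signal is already normalized to [0, 1], and the function names are illustrative guesses, not the documented behaviour of `core/retrieval/`.

```python
# Weights taken from the pipeline diagram; everything else is an assumption.
WEIGHTS = {"vector": 0.6, "bm25": 0.2, "keyword": 0.1, "name": 0.1}


def hybrid_score(signals: dict[str, float]) -> float:
    """Weighted sum of per-signal scores, each assumed normalized to [0, 1]."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)


def rank(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Return document ids sorted by descending hybrid score."""
    return sorted(candidates, key=lambda d: hybrid_score(candidates[d]), reverse=True)
```

With these weights a strong vector match alone can outrank a document that matches perfectly on the three lexical signals, since those three together contribute at most 0.4.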
Every stage emits a row tied to one trace_id.
scripts/replay_trace.py <trace_id> reconstructs the full sequence
from audit_log. See docs/ARCHITECTURE.md §5.7
for the Cognitive Layer design.
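As a rough picture of how one `trace_id` can tie a sequence together, the sketch below uses an in-memory list in place of the `audit_log` store. The row fields (`seq`, `stage`, `payload`) and all three helper functions are hypothetical; this is not how `scripts/replay_trace.py` is implemented.

```python
import hashlib
import json


def append_row(log: list[dict], trace_id: str, stage: str, payload: dict) -> None:
    """Append one audit row; rows are never updated or deleted."""
    log.append({"seq": len(log), "trace_id": trace_id, "stage": stage, "payload": payload})


def replay(log: list[dict], trace_id: str) -> list[dict]:
    """Return the rows for one trace in the order they were written."""
    return sorted((r for r in log if r["trace_id"] == trace_id), key=lambda r: r["seq"])


def digest(rows: list[dict]) -> str:
    """Hash a canonical JSON encoding so two replays can be compared byte for byte."""
    blob = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Because the log is append-only and the encoding is canonical, replaying the same trace twice yields the same digest, which is one simple way to check a byte-identical replay.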
James-RAG-Evol/
├── core/
│   ├── reasoning/         retrieval/reflection/verification/tool router
│   ├── retrieval/         hybrid search + cross-encoder reranker + query rewriter
│   ├── memory/            long-term memory (db / conversation / summaries)
│   ├── plugins/           plugin contract surface (Provider Protocol)
│   ├── policy_engine.py   single point of role/sensitivity decisions
│   ├── change_request.py  propose/review/approve write primitive
│   ├── cascade.py         file delete/modify → graph surgical update
│   ├── graph_editor.py    edge edit (replace/append/delete) + bidirectional sync
│   └── ...
├── eval/                  STEP 7 regression baseline + RAGAS suite
├── llm/                   LLM provider abstraction
├── tools/                 Capability-token gated tool modules
├── frontend/              Web UI (HTML + JS)
├── processors/            File preprocessing
├── wiki/                  Knowledge graph (markdown + sources)
├── memory/                Long-term memory DB
├── workspace/             Change requests, patches, proposals
├── scripts/               bench.py / replay_trace.py / ops scripts
├── reports/               Eval results + promo assets
├── docs/                  ARCHITECTURE / PLATFORM_READINESS / ROADMAP / handovers
└── server_llmwiki.py      Main server entry point
JAMES treats security as a design principle, not a feature:
- 3-stage access control: Vector → Graph → Output
- RBAC (4 roles) + ABAC (4 sensitivity levels)
- Instruction isolation: separates commands from data
- JWT auth + rate limiting + full audit log
- Sandboxed execution (for tool calls)
Realistic note: synthetic-data testing differs from adversarial production testing. See SECURITY.md.
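The combination of RBAC and ABAC listed above can be pictured as a clearance check applied at each of the three stages. The README states only that there are 4 roles and 4 sensitivity levels; the names below, the linear clearance ordering, and both functions are invented for illustration and may not match PolicyEngine.

```python
# Hypothetical role and sensitivity names; the README only states 4 of each.
ROLE_CLEARANCE = {"guest": 0, "user": 1, "manager": 2, "admin": 3}
SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


def allowed(role: str, sensitivity: str) -> bool:
    """A role may read an item whose sensitivity is at or below its clearance."""
    return ROLE_CLEARANCE[role] >= SENSITIVITY[sensitivity]


def filter_stage(items: list[dict], role: str) -> list[dict]:
    """Apply the same check at one stage (vector, graph, or output)."""
    return [item for item in items if allowed(role, item["sensitivity"])]
```

Running the same filter at the vector, graph, and output stages means an item that slips past one stage can still be caught by the next.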
| Feature | Status |
|---|---|
| Hybrid Search (Vector + BM25 + keyword + name) | Working |
| Cross-encoder reranker (MiniLM-L-6-v2) | Working, default ON (v0.3) |
| LLM query rewriter | Opt-in (v0.3) |
| Sources-aware Graph-RAG (Knowledge Cascade A-E) | Working (v0.3) |
| PolicyEngine (RBAC + ABAC + capability tokens) | Working (v0.2 Axis 4) |
| Reflection loop (draft → critique → revise) | Opt-in (v0.3) |
| Verification engine (security + fact check) | Opt-in (v0.3) |
| Tool router (read direct, write → Change Request) | Working (v0.3) |
| Change Request primitive (wiki + jobs + patches) | Working (v0.2.x + v0.3) |
| Self-evolution (human approval + auto-rollback) | Working (v0.2 Axis 5) |
| Trace replay (one trace_id → full reasoning seq) | Working (v0.3) |
| Multimodal (image/video/audio + OCR-poison quarantine) | Working (v0.2 Axis 4) |
| Web search (Tavily / DuckDuckGo fallback) | Working |
| Multi-LLM routing (Ollama + Claude CLI backends) | Working |
| STEP 7 regression baseline + RAGAS | Working (v0.2 Axis 2) |
| Real-data validation (second-user gate) | Passed 2026-05-13 |
- Backend: FastAPI + Uvicorn
- LLM: Ollama (Gemma, DeepSeek-Coder, LLaVA)
- Vector DB: ChromaDB
- Embedding: Sentence-Transformers (MiniLM)
- Search: BM25 + Vector hybrid
- Web search: Tavily (primary) + DuckDuckGo (fallback)
- Auth: JWT (python-jose)
- Storage: SQLite + markdown wiki
See ROADMAP.md and docs/PLATFORM_READINESS.md.
Summary:
- v0.1: Core engine + scaffolding (released)
- v0.2: Foundation Hardening, 6 axes (closed 2026-05-13)
- v0.3: Platform Skeleton, covering Cognitive Layer + Knowledge Cascade + Change Request primitive (current; released 2026-05-17)
- v0.4: First Domain Pilot, with one pack + one external customer and 6-month no-regression
- v1.0: Production-Grade Mother, with HTTPS / SSO / multi-tenancy / SOC2 readiness; external developers can publish their own packs

Multi-agent specialists, optional Neo4j backend, OpenAI-compatible
API, streaming responses, and federation are speculative Beyond
v1.0 work; see ROADMAP.md §Beyond v1.0.
Welcome! See CONTRIBUTING.md.
Priority areas:
- Documentation, examples, translations
- Bug fixes, test coverage
- New tool integrations and LLM provider support
Licensed under the MIT License. Use freely. See LICENSE.
External contributors sign a one-click Contributor License Agreement on their first pull request (CLA Assistant). One signature covers all future contributions to the project. See CONTRIBUTING.md for the full §License & CLA section, and docs/legal/non-cla-contributions.md for contribution paths that don't require signing.
A full inventory of third-party dependency licenses is available in THIRD_PARTY_LICENSES.md.
Inspired by:
- Microsoft GraphRAG
- LightRAG
- Graphiti
- Palantir-style ontology approaches
- Architectural direction, Platform Readiness gates, and roadmap framing are discussed with LEO, continuing collaborator on this work
Use at your own risk. This is research code. No guarantees regarding sensitive-data handling or production security without further hardening.
