Hashevolution/James-RAG-Evol

PROJECT JAMES

JAMES is a local-first, auditable knowledge reasoning platform. It combines Graph-RAG retrieval, deterministic contradiction arbitration, an append-only audit log, replayable knowledge state, and a human approval gate for self-evolution. It is built as a general mother platform through v1.0; domain packs (legal, food, retail, …) branch off only after v1.0 (see docs/PLATFORM_READINESS.md).

Headline differentiator: Replayable RAG. The system's state at any past point can be reconstructed byte-identically via the T7 supersede chain plus the append-only audit log (reconstruct_view_at). Other first-class differentiators include Graph-RAG retrieval, Knowledge Cascade (Layer 3), Layer 4 Lifecycle (T1–T7), the Plugin API, and the deterministic 4-rule contradiction tree.

License: MIT Status Python 3.11+ OpenSSF Best Practices DOI

PROJECT JAMES: 3D ontology graph visualizer

Korean README · 🚀 First-time setup guide (simple enough for a 10-year-old to follow)


What's Verified (one-screen summary)

The numbers below come from the current main branch. They are not aspirational and not taken from an older release. Every value is reproducible by cloning the repository and running the listed command.

| Surface | Verified | Where to check |
|---|---|---|
| Test suite | 3290 tests collected across tests/ (224 test files), all green on PR CI | python -m pytest tests/ --collect-only -q |
| CASCADE / EVENT separation | Provable end-to-end via 5 release-gating invariants run against a real wiki fixture (not mocks) | tests/test_t7_release_gating_invariants.py |
| T6 causality cascade | 4 additional release-gating invariants pin foundational vs corroborative semantics | tests/test_t6_release_gating_invariants.py |
| QVT 3-axis quality baseline | path_recall 1.00 / graded_answer 0.58 / abstention_f1 0.67 (median, post-calibration, N=3 paired reruns) | eval/qvt/baseline_2a31b20.json |
| STEP 7 regression | 17-query suite with gold_signals + abstention_truth + expected_path.nodes ground truth on 5 queries | eval/regression/step7_queries.json v6 |
| F9 entity-anchor q15 fix | q15 ("David Soria Parra가 누구야?", i.e. "Who is David Soria Parra?") path_recall 0.00 → 1.00 after JAMES_ENABLE_ENTITY_ANCHOR=1 + JAMES_EMBEDDING_MODEL=BAAI/bge-m3 + JAMES_ENABLE_QUERY_REWRITE=1 | reports/research-runs/step7-bench-baseline-run*.json |
| Module size discipline | 20 KB cap enforced on every core/ file. Largest current: core/lifecycle/schema.py at 18.9 KB | CLAUDE.md rule 5 + module-size CI gate |
| Default-off invariant | Every routing layer added since v0.3 (D5 / LEO / D1 / T2.D / T6 LLM) defaults OFF; production fleets pulling v0.4.1 see byte-identical retrieval to v0.3.3 unless they opt in | JAMES_* env audit (CHANGELOG [0.4.x] table) |
| Deterministic contradiction arbitration | classify_contradiction is an LLM-free 4-rule decision tree (~10.2 KB pure function). Audit-replay-safe by construction. | core/lifecycle/contradiction_arbiter.py |

What is NOT yet headline-verified: a single-page ablation card showing Graph-RAG vs flat RAG on the same fixture. The infrastructure to produce it (scripts/qvt_capture_baseline.py + the 18-cell ablation matrix design from QVT memo §5) is wired; the operator-run capture is the late-June deliverable. Until then, the graph contribution is measurable via graph_paths_count per query in any STEP 7 bench output, but not summarized in one table.
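The module-size row in the table above describes a 20 KB cap on every core/ file. The actual CI gate is not shown in this README; the following is only a minimal sketch of what such a check could look like, assuming "20 KB" means 20 * 1024 bytes and that only `.py` files are counted. The function name `oversized_modules` is hypothetical.

```python
from pathlib import Path

# Assumption: "20 KB" is read as 20 * 1024 bytes.
SIZE_CAP_BYTES = 20 * 1024


def oversized_modules(root: str, cap: int = SIZE_CAP_BYTES) -> list[tuple[str, int]]:
    """Return (path, size_in_bytes) for every .py file under root larger than cap."""
    offenders = []
    for path in sorted(Path(root).rglob("*.py")):
        size = path.stat().st_size
        if size > cap:
            offenders.append((str(path), size))
    return offenders


if __name__ == "__main__":
    # A CI job could fail the build when this list is non-empty.
    for path, size in oversized_modules("core"):
        print(f"{path}: {size / 1024:.1f} KB exceeds the cap")
```

An empty result means every module is within the cap; a CI step could exit non-zero otherwise.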


Project Status: v0.4.1, T6 Causality Chain (CASCADE extension)

Released 2026-05-28. v0.4.1 closes the CASCADE pillar that v0.4.0 only half-finished: when a base fact's sources are fully removed, edges whose derived_from references that base now auto-invalidate via invalidate_derived_facts, so the derivation chain stays internally consistent without manual operator intervention. Per-derivation-type semantics (T6.C.b refinement): transitive / inferred are structural chain links (any base empty → invalidate); operator is corroborative (only invalidates when no hard deps AND all operator bases empty).
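The per-derivation-type rule above can be read as a small pure function. The sketch below is one interpretation of that sentence, not the project's `invalidate_derived_facts` implementation: the `Dependency` type, the `should_invalidate` name, and the idea of passing in a set of base ids whose sources are empty are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Structural ("hard") derivation types named in the README.
HARD_TYPES = {"transitive", "inferred"}


@dataclass(frozen=True)
class Dependency:
    base_id: str          # id of the base fact this edge was derived from
    derivation_type: str  # "transitive", "inferred", or "operator"


def should_invalidate(deps: list[Dependency], empty_bases: set[str]) -> bool:
    """Decide whether a derived edge should be invalidated.

    empty_bases holds the ids of base facts whose sources are fully removed.
    """
    hard = [d for d in deps if d.derivation_type in HARD_TYPES]
    operator = [d for d in deps if d.derivation_type == "operator"]
    # Structural links: any empty hard base breaks the chain.
    if any(d.base_id in empty_bases for d in hard):
        return True
    # Corroborative links: only when there are no hard deps AND every
    # operator base is empty.
    if not hard and operator:
        return all(d.base_id in empty_bases for d in operator)
    return False
```

Under this reading, an edge backed by one inferred base and one operator base survives the loss of the operator base alone, because the hard dependency still holds.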

Pre-v0.4.1: v0.4.0 (2026-05-27) shipped the Layer 4 first bundle (T1 Temporal Validity + T7 Supersede Chain + T2 Contradiction Arbitration) across an 8-PR Sprint 5 sequence. The CASCADE vs EVENT separation invariant is provable end-to-end via tests/test_t7_release_gating_invariants.py, run against the actual wiki fixture (not mocks). The supersede chain primitive (reconstruct_view_at) lets the system answer "what was true at time T?" deterministically, even after destructive CASCADE operations on unrelated facts.

Pre-v0.4: v0.3.0 (2026-05-17) closed Foundation Hardening, with all six axes (architecture / eval / observability / security / controlled evolution / real-data validation) green; second-user validation closed 2026-05-13.

  • NOT production-ready: operational maturity (HTTPS / SSO / multi-tenancy / backup CLI) is a v1.0 deliverable; see SECURITY.md
  • Designed with security-first principles end to end
  • Open to collaboration: external contributors sign a one-click CLA on their first PR (see License)

Strategic frame: Mother Platform, not a single product

JAMES is not building one vertical. It is being hardened as a "mother platform" from which domain packs (legal, food, retail, travel, etc.) can branch off only at v1.0. Until then:

  • No domain-specific features land in core/
  • Every change is graded against the same six-dimension readiness framework (architecture / extension API / eval contract / operational maturity / security boundary / production proof)
  • The plugin contract that future packs will be built against is being designed and stress-tested

See docs/PLATFORM_READINESS.md for the 6 dimensions, 4 gates (v0.2 / v0.3 / v0.4 / v1.0), and 3 branching forms (Domain Pack / Distribution / Vertical Product).


What's Different: Replayable RAG

Most RAG systems answer one question: "what's the answer?" JAMES answers two more:

  • What did the system know at time T? T7 supersede chains preserve historical fact states; reconstruct_view_at(t) returns the edge that was active at any past timestamp, even after unrelated CASCADE delete events.
  • Why did the system say that? Every reasoning step (query rewrite, retrieval, rerank, planner, reflect, verify, synth) writes an append-only audit row. scripts/replay_trace.py <trace_id> reconstructs the full sequence byte-identically.

The two combined make JAMES a Replayable RAG system, a category distinct from Agentic RAG (which optimises for what an AI can do) and from Mem0-style memory layers (which use an LLM judge to update beliefs). JAMES updates beliefs via a deterministic 4-rule decision tree (core/lifecycle/contradiction_arbiter.py:classify_contradiction) that is LLM-free by design, and preserves both the old and the new fact for replay rather than overwriting.
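To make the "what did the system know at time T?" idea concrete, here is a minimal sketch of a supersede chain. It is not the project's `reconstruct_view_at`; the `FactVersion` type, its field names, and the `supersede` / `view_at` helpers are hypothetical. The point it illustrates is that a new fact closes the old version instead of overwriting it, so any past timestamp can still be answered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FactVersion:
    value: str
    valid_from: float                      # time this version became active
    superseded_at: Optional[float] = None  # None while still active


def supersede(chain: list[FactVersion], new_value: str, ts: float) -> list[FactVersion]:
    """Close the active version at ts and append the new one; old rows are kept."""
    closed = [
        FactVersion(v.value, v.valid_from, ts) if v.superseded_at is None else v
        for v in chain
    ]
    return closed + [FactVersion(new_value, ts)]


def view_at(chain: list[FactVersion], t: float) -> Optional[FactVersion]:
    """Return the version that was active at time t, or None if there was none."""
    for version in chain:
        ended = version.superseded_at
        if version.valid_from <= t and (ended is None or t < ended):
            return version
    return None
```

Because `supersede` returns a new list and never drops a version, the lookup depends only on the chain contents and the timestamp, which is what makes it deterministic.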

How that's built

  1. Deterministic memory lifecycle (v0.4.0): T1 Temporal Validity + T7 Supersede Chain + T2 Contradiction Arbitration. CASCADE (destructive, Layer 3) and EVENT (history-preserving, Layer 4) are guaranteed-separate paths, release-gated by tests/test_t7_release_gating_invariants.py against the real wiki fixture.
  2. Sources-aware Graph-RAG: 12 typed relations carry semantic meaning beyond embeddings, and every relation carries sources: [{doc_id, weight, role, ts}] so deleting or modifying a document surgically updates only the affected derived knowledge (Knowledge Cascade A→E, v0.3.0).
  3. Cognitive Layer: cross-encoder reranker (default ON), LLM query rewriter, reflection loop (draft → critique → revise), verification engine (security + fact check), and tool router. One trace_id reconstructs the full 8-stage reasoning sequence via scripts/replay_trace.py.
  4. PolicyEngine as a layer, not a sprinkle: single point of role / sensitivity decisions wired into retrieval, graph, output, and tools; removing it breaks 6+ modules (v0.2 Axis 4).
  5. Change Request primitive: every write (wiki edits, workspace jobs, self-evolution patches) routes through propose → review → admin approval → atomic apply → audit row. No silent writes.
  6. Self-evolution behind a human gate: feedback → candidate → bench eval → human approval → deploy → auto-rollback on regression. Every deployed patch has an approver_username audit row (v0.2 Axis 5).
  7. 100% local: runs on a laptop with Ollama; no cloud LLM dependency by default.
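The sources-aware design in item 2 can be illustrated with a short sketch. It assumes edges are plain dictionaries carrying the `sources` list described above; the `cascade_delete` name, the `src` / `rel` / `dst` keys, and the rule "an edge with no remaining sources is removed" are assumptions for illustration, not the contents of core/cascade.py.

```python
def cascade_delete(edges: list[dict], doc_id: str) -> tuple[list[dict], list[dict]]:
    """Drop doc_id from every edge's sources.

    Returns (kept, removed): edges that still have at least one source,
    and edges whose last source was the deleted document.
    The input list and its edges are not mutated.
    """
    kept, removed = [], []
    for edge in edges:
        remaining = [s for s in edge["sources"] if s["doc_id"] != doc_id]
        updated = {**edge, "sources": remaining}
        (kept if remaining else removed).append(updated)
    return kept, removed
```

Edges supported by other documents survive with a shorter `sources` list, which is the "surgical" part: only knowledge that depended solely on the deleted document disappears.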

Each feature is regression-tested against the STEP 7 13-query baseline + RAGAS metrics. PRs touching core/{retrieval,graph,reasoning} cannot land without bench numbers.
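The Change Request flow named in item 5 above (propose → review → admin approval → atomic apply → audit row) behaves like a small state machine. The sketch below is a hypothetical model of it, not the code in core/change_request.py: the state names, the `rejected` state, and the in-memory `audit` list are assumptions.

```python
from dataclasses import dataclass, field

# Allowed transitions, following the order named in the README.
# The "rejected" state is an assumption added for illustration.
TRANSITIONS = {
    "proposed": {"reviewed", "rejected"},
    "reviewed": {"approved", "rejected"},
    "approved": {"applied"},
}


@dataclass
class ChangeRequest:
    request_id: str
    payload: str
    state: str = "proposed"
    audit: list[tuple[str, str, str]] = field(default_factory=list)

    def advance(self, new_state: str, actor: str) -> None:
        """Move to new_state if the transition is legal, recording who did it."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.audit.append((self.state, new_state, actor))
        self.state = new_state
```

Skipping a step, for example going from `proposed` straight to `applied`, raises an error and leaves no audit entry, which mirrors the "no silent writes" rule.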


Quick Start

Prerequisites

  • Python 3.11+
  • Ollama installed and running
  • Min 16GB RAM (32GB+ recommended)
  • (Optional) NVIDIA GPU for faster inference
  • (Optional) Tavily API key for web search (free 1k/month)

Installation

git clone https://github.com/Hashevolution/James-RAG-Evol
cd James-RAG-Evol

# Configure environment
cp .env.example .env
# Edit .env: set JAMES_API_KEY, JAMES_JWT_SECRET

# Install dependencies
pip install -r requirements.txt

# Start the server (admin wizard auto-recommends a model on first login)
python server_llmwiki.py

Open http://localhost:8000/admin. The admin wizard measures your hardware and offers a one-click install of an appropriate Ollama model. Then open http://localhost:8000 for the chat UI.


Architecture

[User Query]
     ↓
[Security Filter]      ← injection patterns + PolicyEngine pre-check
     ↓
[Query Router]         ← chat / coding / retrieval / web_search
     ↓
[Query Rewriter]       ← LLM rewrite (opt-in, JAMES_ENABLE_QUERY_REWRITE)
     ↓
[Hybrid Search]        ← Vector(60%) + BM25(20%) + keyword(10%) + name(10%)
     ↓
[Cross-Encoder Rerank] ← MiniLM-L-6-v2 (default ON; JAMES_DISABLE_RERANK=1 to disable)
     ↓
[Graph Engine]         ← DFS + sources-aware + sensitivity gating
     ↓
[Reasoning Loop]       ← retrieve → expand → reflect (opt-in) → verify (opt-in)
     ↓
[Tool Router]          ← read tools direct; write tools → Change Request
     ↓
[Output Filter]        ← PII masking + role-based filter
     ↓
[Answer + Reasoning Path + trace_id]
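The Hybrid Search stage in the diagram blends four signals with fixed weights (60 / 20 / 10 / 10). A minimal sketch of that kind of weighted fusion follows. It assumes each signal has already been normalised to the range 0 to 1 and that fusion is a plain weighted sum; the README does not state either, and the function names are hypothetical.

```python
# Weights taken from the architecture diagram.
WEIGHTS = {"vector": 0.6, "bm25": 0.2, "keyword": 0.1, "name": 0.1}


def hybrid_score(signals: dict[str, float]) -> float:
    """Weighted sum of per-signal scores; a missing signal counts as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Return candidate ids ordered from highest to lowest hybrid score."""
    return sorted(
        candidates,
        key=lambda doc_id: hybrid_score(candidates[doc_id]),
        reverse=True,
    )
```

With these weights a strong vector match can outrank a document that matches on all three lexical signals combined, since those three together contribute at most 0.4.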

Every stage emits a row tied to one trace_id. scripts/replay_trace.py <trace_id> reconstructs the full sequence from audit_log. See docs/ARCHITECTURE.md §5.7 for the Cognitive Layer design.
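As a rough illustration of an append-only audit table that can be replayed per trace_id, the sketch below uses Python's standard `sqlite3` module. Only the table name `audit_log` comes from the text above; the column layout, the triggers, and the helper names are assumptions, and scripts/replay_trace.py may work quite differently.

```python
import sqlite3


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical audit_log table that rejects edits."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "trace_id TEXT NOT NULL, stage TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    # Triggers abort UPDATE and DELETE, so rows can only be appended.
    for op in ("UPDATE", "DELETE"):
        conn.execute(
            f"CREATE TRIGGER IF NOT EXISTS audit_no_{op.lower()} "
            f"BEFORE {op} ON audit_log "
            "BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END"
        )
    return conn


def append(conn: sqlite3.Connection, trace_id: str, stage: str, payload: str) -> None:
    """Append one audit row for a reasoning stage."""
    conn.execute(
        "INSERT INTO audit_log (trace_id, stage, payload) VALUES (?, ?, ?)",
        (trace_id, stage, payload),
    )


def replay(conn: sqlite3.Connection, trace_id: str) -> list[tuple[str, str]]:
    """Return (stage, payload) rows for one trace, in the order they were written."""
    rows = conn.execute(
        "SELECT stage, payload FROM audit_log WHERE trace_id = ? ORDER BY seq",
        (trace_id,),
    )
    return rows.fetchall()
```

Ordering by the autoincrement `seq` column means a replay returns stages in insertion order regardless of how traces interleave.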


Folder Structure

James-RAG-Evol/
├── core/
│   ├── reasoning/        retrieval/reflection/verification/tool router
│   ├── retrieval/        hybrid search + cross-encoder reranker + query rewriter
│   ├── memory/           long-term memory (db / conversation / summaries)
│   ├── plugins/          plugin contract surface (Provider Protocol)
│   ├── policy_engine.py  single point of role/sensitivity decisions
│   ├── change_request.py propose/review/approve write primitive
│   ├── cascade.py        file delete/modify → graph surgical update
│   ├── graph_editor.py   edge edit (replace/append/delete) + bidirectional sync
│   └── ...
├── eval/                 STEP 7 regression baseline + RAGAS suite
├── llm/                  LLM provider abstraction
├── tools/                Capability-token gated tool modules
├── frontend/             Web UI (HTML + JS)
├── processors/           File preprocessing
├── wiki/                 Knowledge graph (markdown + sources)
├── memory/               Long-term memory DB
├── workspace/            Change requests, patches, proposals
├── scripts/              bench.py / replay_trace.py / ops scripts
├── reports/              Eval results + promo assets
├── docs/                 ARCHITECTURE / PLATFORM_READINESS / ROADMAP / handovers
└── server_llmwiki.py     Main server entry point

Security Approach

JAMES treats security as a design principle, not a feature:

  • 3-stage access control: Vector → Graph → Output
  • RBAC (4 roles) + ABAC (4 sensitivity levels)
  • Instruction isolation: separates commands from data
  • JWT auth + rate limiting + full audit log
  • Sandboxed execution (for tool calls)
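A single role-versus-sensitivity check reused at each of the three stages could look like the sketch below. The README gives only the counts (4 roles, 4 sensitivity levels), so every role name, level name, and the clearance mapping here is hypothetical, as are the function names; this is not the PolicyEngine API.

```python
# Hypothetical names: the README states 4 roles and 4 levels but does not list them.
SENSITIVITY = ["public", "internal", "confidential", "restricted"]
CLEARANCE = {
    "guest": "public",
    "member": "internal",
    "manager": "confidential",
    "admin": "restricted",
}


def can_read(role: str, sensitivity: str) -> bool:
    """True when the role's clearance covers the item's sensitivity level."""
    if role not in CLEARANCE or sensitivity not in SENSITIVITY:
        return False  # fail closed on unknown roles or levels
    return SENSITIVITY.index(sensitivity) <= SENSITIVITY.index(CLEARANCE[role])


def filter_by_policy(role: str, items: list[dict]) -> list[dict]:
    """Keep only the items the role may read; usable at any pipeline stage."""
    return [item for item in items if can_read(role, item["sensitivity"])]
```

Calling the same function after vector search, after graph expansion, and before output is one way to keep the decision in a single place.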

Realistic note: synthetic-data testing differs from adversarial production testing. See SECURITY.md.


Current Features

| Feature | Status |
|---|---|
| Hybrid Search (Vector + BM25 + keyword + name) | Working |
| Cross-encoder reranker (MiniLM-L-6-v2) | Working, default ON (v0.3) |
| LLM query rewriter | Opt-in (v0.3) |
| Sources-aware Graph-RAG (Knowledge Cascade A→E) | Working (v0.3) |
| PolicyEngine (RBAC + ABAC + capability tokens) | Working (v0.2 Axis 4) |
| Reflection loop (draft → critique → revise) | Opt-in (v0.3) |
| Verification engine (security + fact check) | Opt-in (v0.3) |
| Tool router (read direct, write → Change Request) | Working (v0.3) |
| Change Request primitive (wiki + jobs + patches) | Working (v0.2.x + v0.3) |
| Self-evolution (human approval + auto-rollback) | Working (v0.2 Axis 5) |
| Trace replay (one trace_id → full reasoning seq) | Working (v0.3) |
| Multimodal (image/video/audio + OCR-poison quarantine) | Working (v0.2 Axis 4) |
| Web search (Tavily / DuckDuckGo fallback) | Working |
| Multi-LLM routing (Ollama + Claude CLI backends) | Working |
| STEP 7 regression baseline + RAGAS | Working (v0.2 Axis 2) |
| Real-data validation (second-user gate) | Passed 2026-05-13 |
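Opt-in and default-on features are toggled through JAMES_* environment variables; two of them, JAMES_ENABLE_QUERY_REWRITE and JAMES_DISABLE_RERANK, are named in this README. The sketch below shows one way such flags could be read. Treating only the exact value "1" as "set" is an assumption, and the helper names are hypothetical.

```python
import os
from typing import Mapping, Optional


def flag_on(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the variable is set to exactly "1" (assumed convention)."""
    env = os.environ if env is None else env
    return env.get(name, "") == "1"


def query_rewrite_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    # Opt-in feature: off unless explicitly enabled.
    return flag_on("JAMES_ENABLE_QUERY_REWRITE", env)


def rerank_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    # Default-on feature: on unless explicitly disabled.
    return not flag_on("JAMES_DISABLE_RERANK", env)
```

Passing a plain dictionary as `env` makes the defaults easy to check in tests without touching the real process environment.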

Tech Stack

  • Backend: FastAPI + Uvicorn
  • LLM: Ollama (Gemma, DeepSeek-Coder, LLaVA)
  • Vector DB: ChromaDB
  • Embedding: Sentence-Transformers (MiniLM)
  • Search: BM25 + Vector hybrid
  • Web search: Tavily (primary) + DuckDuckGo (fallback)
  • Auth: JWT (python-jose)
  • Storage: SQLite + markdown wiki

Roadmap

See ROADMAP.md and docs/PLATFORM_READINESS.md. Summary:

  • v0.1: Core engine + scaffolding (released)
  • v0.2: Foundation Hardening, 6 axes (closed 2026-05-13)
  • v0.3: Platform Skeleton, covering Cognitive Layer + Knowledge Cascade + Change Request primitive (current; released 2026-05-17)
  • v0.4: First Domain Pilot, meaning one pack + one external customer with 6-month no-regression
  • v1.0: Production-Grade Mother, with HTTPS / SSO / multi-tenancy / SOC2 readiness; external developers can publish their own packs

Multi-agent specialists, optional Neo4j backend, OpenAI-compatible API, streaming responses, and federation are speculative Beyond v1.0 work; see ROADMAP.md §Beyond v1.0.


Contributing

Welcome! See CONTRIBUTING.md.

Priority areas:

  • Documentation, examples, translations
  • Bug fixes, test coverage
  • New tool integrations and LLM provider support

License

Licensed under the MIT License. Use freely. See LICENSE.

External contributors sign a one-click Contributor License Agreement on their first pull request (CLA Assistant). One signature covers all future contributions to the project. See CONTRIBUTING.md for the full §License & CLA section, and docs/legal/non-cla-contributions.md for contribution paths that don't require signing.

A full inventory of third-party dependency licenses is available in THIRD_PARTY_LICENSES.md.


Acknowledgements

Inspired by:

  • Microsoft GraphRAG
  • LightRAG
  • Graphiti
  • Palantir-style ontology approaches
  • Architectural direction, Platform Readiness gates, and roadmap framing are discussed with LEO, continuing collaborator on this work

Disclaimer

Use at your own risk. This is research code. No guarantees regarding sensitive-data handling or production security without further hardening.

About

Replayable RAG: a local-first, auditable knowledge-reasoning platform. Graph-RAG + append-only audit log + deterministic contradiction arbitration + human-gated self-evolution. Replay any past knowledge state byte-for-byte. 100% local (Ollama). MIT.
