@@ -260,37 +236,7 @@ <h2 class="sh">What It Is vs What It Is Not</h2>
<h1 class="page-title">Architecture <span>at a Glance</span></h1>
<p class="page-sub">Two layers: a mandatory base every deployment must include, and modular clusters that activate based on your investigation context.</p>
<div class="dw"><div class="dt">Full IRE Architecture — Base + 7 Clusters</div>
-<div class="mermaid">
-graph TD
-subgraph BASE["🔒 MANDATORY BASE — Always Active"]
-B1[B1 Case Isolation] --- B2[B2 Retrieval Verifier]
<p>Logging that a human approved a finding is necessary but not sufficient. For outputs to be genuinely audit-ready, the approval record must capture the quality and basis of the human judgment — not just the fact that it occurred.</p>
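One way to picture an approval record that captures the basis of the judgment, not only the fact of approval, is sketched below. Every field name and the `audit_gaps` helper are hypothetical illustrations, not part of the system described here; the five-word rationale threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApprovalRecord:
    """Hypothetical record of one human review decision."""
    finding_id: str
    reviewer_id: str
    decision: str             # "approved", "edited" or "rejected"
    rationale: str            # the reviewer's stated basis for the decision
    evidence_reviewed: tuple  # chunk IDs the reviewer actually opened
    seconds_spent: float
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def audit_gaps(record, cited_chunks, min_rationale_words=5):
    """Return reasons this record would not show the basis of the judgment."""
    gaps = []
    if len(record.rationale.split()) < min_rationale_words:
        gaps.append("rationale too short to show the basis of the judgment")
    # Evidence the finding cites but the reviewer never opened.
    unopened = sorted(set(cited_chunks) - set(record.evidence_reviewed))
    if unopened:
        gaps.append("cited evidence not reviewed: " + ", ".join(unopened))
    return gaps
```

A record with a one-word rationale, or one whose reviewer never opened a cited chunk, comes back with a non-empty list and can be held back from the audit-ready output.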
<div class="tw"><table>
@@ -565,28 +485,7 @@ <h2 class="sh">What Happens at Each Stage</h2>
<div class="callout warn"><div class="cl">Most Common Ingestion Failures</div>
<p><strong>Scanned PDFs without OCR layer:</strong> If PyMuPDF returns no text, auto-route to Tesseract. <strong>Password-protected files:</strong> Require investigator to decrypt before upload. <strong>Non-standard CSV encodings:</strong> Detect encoding with chardet before parsing.</p></div>
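The three failure cases above can be sketched as a small routing step. This is a minimal sketch under stated assumptions: `route_pdf` and `decode_csv_bytes` are hypothetical names, the page texts are assumed to come from something like PyMuPDF's `page.get_text()`, and the encoding step swaps chardet for a plain stdlib fallback loop so that the example has no third-party dependencies.

```python
def route_pdf(page_texts, is_encrypted=False):
    """Decide how a PDF should be handled, given the text extracted per page."""
    if is_encrypted:
        # The investigator must decrypt the file before upload.
        return "needs_decryption"
    if not any(text.strip() for text in page_texts):
        # No text layer at all: treat as a scan and send to OCR.
        return "ocr"
    return "text"


def decode_csv_bytes(raw, candidates=("utf-8", "cp1252")):
    """Try strict decoding with each candidate encoding before parsing.

    Stand-in for a detector such as chardet: returns (text, encoding_used).
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never raises.
    return raw.decode("latin-1"), "latin-1"
```

Returning the encoding that was used, alongside the decoded text, lets the ingestion log record how each file was read.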
<p class="page-sub">How the system determines "Rajesh Kumar", "R. Kumar", and "RJSH_KMR" are the same person — and what happens when it is not sure.</p>
<div class="callout danger"><div class="cl">Critical: Do Not Skip Entity Resolution</div>
<p>A 10% entity duplication rate in a 500-node graph produces 50 phantom nodes — enough to break circular flow detection entirely. Build Tier 2/3 resolution before building the graph layer.</p></div>
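To make the "Rajesh Kumar" / "R. Kumar" / "RJSH_KMR" case concrete, here is a minimal sketch of one possible comparison rule. It is an assumption for illustration, not the tiered resolver this document refers to: names are split into tokens, full tokens are compared by a vowel-stripped "skeleton", and any match that depends on a bare initial is sent to review instead of being merged automatically.

```python
import re

VOWELS = set("aeiou")


def name_tokens(name):
    """Lower-case the name and split on anything that is not a letter."""
    return [t for t in re.split(r"[^a-z]+", name.lower()) if t]


def skeleton(token):
    """Keep the first letter, drop vowels from the rest: 'rajesh' -> 'rjsh'."""
    return token[0] + "".join(c for c in token[1:] if c not in VOWELS)


def compare_names(a, b):
    """Return 'likely_match', 'review' or 'distinct' for two name strings."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or len(ta) != len(tb):
        return "distinct"
    used_initial = False
    for x, y in zip(ta, tb):
        if len(x) == 1 or len(y) == 1:
            # A bare initial can only confirm the first letter.
            if x[0] != y[0]:
                return "distinct"
            used_initial = True
        elif skeleton(x) != skeleton(y):
            return "distinct"
    return "review" if used_initial else "likely_match"
```

Here `compare_names("Rajesh Kumar", "RJSH_KMR")` gives `"likely_match"`, while `compare_names("Rajesh Kumar", "R. Kumar")` gives `"review"`, which is the "not sure" path. Different names can share a skeleton, so even a likely match would need corroborating attributes before two graph nodes are merged.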
<p class="page-sub">How the AI forms a hypothesis, tests it against evidence, and iterates until confident or flagged. This is Cluster D — the reasoning engine.</p>
<div class="dw"><div class="dt">Cluster D — Recursive Reasoning Loop</div>
-<div class="mermaid">
-flowchart TD
-A([Initialise: Case State + Goal + Model Version]) --> B[PLAN\nWhat evidence do I need next?]
-B --> C[RETRIEVE\nRAG Query + Graph Query]
-C --> D[ANALYZE\n70B Agent temp=0\nRetrieval-only mode]
-D --> E[VERIFY — DETERMINISTIC\nDoes chunk exist? Is claim recoverable?]
-E --> F{Claim passes?}
-F -->|Yes| G[GROUND\nAttach chunk_id + source + page_ref]
-F -->|No| H[Strip Claim\nLog to Unverified Register]
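The VERIFY, GROUND and Strip Claim steps in the removed diagram can be sketched as a deterministic check with no model call. This is a sketch under assumptions: the dictionary shapes and the `verify_claims` name are hypothetical, and "recoverable" is interpreted here as the quoted text appearing verbatim in the cited chunk, which may be stricter or looser than the real rule.

```python
def verify_claims(claims, chunk_store):
    """Split claims into grounded ones and ones for the unverified register.

    claims:      list of dicts with 'chunk_id' and 'quote' (assumed shape)
    chunk_store: dict mapping chunk_id -> {'text', 'source', 'page_ref'}
    """
    grounded, unverified = [], []
    for claim in claims:
        chunk = chunk_store.get(claim.get("chunk_id"))
        # Check 1: does the cited chunk exist?
        # Check 2: can the quoted text be recovered from it verbatim?
        if chunk is not None and claim["quote"] in chunk["text"]:
            grounded.append(
                {**claim, "source": chunk["source"], "page_ref": chunk["page_ref"]}
            )
        else:
            # Stripped from the output and kept for the audit trail.
            unverified.append(claim)
    return grounded, unverified
```

Because the check is plain string and dictionary lookup, the same inputs always give the same split, which is what makes the step auditable.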
<p class="page-sub">How PII is removed before data reaches any external AI, and restored only after investigator approval — inside your environment.</p>
<div class="dw"><div class="dt">Cluster E — Pseudonymisation Pipeline</div>
-<div class="mermaid">
-flowchart TD
-A([Raw Case Data with PII]) --> B[Presidio NER + Custom Recognisers\nDetect: names · IDs · accounts · spoken refs]
-B --> C[Token Map Store\nIn-memory · Encrypted · Never leaves environment]
-C --> D[Pseudonymised Payload\nNames to IND_T001 · Accounts to ACCT_T089\nAmounts/dates/types RETAINED]
-D --> E[External SOTA API\nReasons on tokens only]
-E --> F[Tokenised Reasoning Output]
-F --> G[Human Checkpoint\nCheck for re-identification risk]
-G --> H{Investigator Approves?}
-H -->|No| I[Edit or Reject + Log]
-H -->|Yes| J[De-tokeniser\nRestores real identifiers\nON-PREMISE ONLY]
-J --> K[Final Report with Real Names]
-K --> L[Audit Log: pseudonymisation + approval + de-tokenisation]
<p>Pseudonymisation controls what enters the AI — not what it infers. A model given pseudonymised data may produce output that, combined with other information, re-identifies a data subject. The Human Checkpoint is the primary control. Get legal counsel to review your approach before going live.</p></div>
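The token map and the approval-gated de-tokeniser can be sketched as follows. This is an illustrative sketch, not the pipeline's implementation: the `TokenMap` class is hypothetical, the detected entities are passed in by hand where a detector such as Presidio would supply them, and the map is a plain in-memory dictionary with no encryption.

```python
class TokenMap:
    """Hypothetical in-memory map between real identifiers and tokens."""

    def __init__(self):
        self._forward = {}   # (kind, real value) -> token
        self._reverse = {}   # token -> real value
        self._counters = {}  # kind -> last number issued

    def token_for(self, kind, value):
        key = (kind, value)
        if key not in self._forward:
            n = self._counters.get(kind, 0) + 1
            self._counters[kind] = n
            token = f"{kind}_T{n:03d}"  # e.g. IND_T001, ACCT_T001
            self._forward[key] = token
            self._reverse[token] = value
        return self._forward[key]

    def pseudonymise(self, text, entities):
        """Replace each detected (kind, value) pair; everything else is kept."""
        # Longest values first, so a short value never splits a longer one.
        for kind, value in sorted(entities, key=lambda e: len(e[1]), reverse=True):
            text = text.replace(value, self.token_for(kind, value))
        return text

    def detokenise(self, text, approved):
        """Restore real identifiers, but only after investigator approval."""
        if not approved:
            raise PermissionError("investigator approval required")
        # Longest tokens first, so IND_T1000 is not hit by IND_T100.
        for token in sorted(self._reverse, key=len, reverse=True):
            text = text.replace(token, self._reverse[token])
        return text
```

With the entities `("IND", "Rajesh Kumar")` and `("ACCT", "1234567890")`, the sentence "Rajesh Kumar paid 5000 from 1234567890 on 2024-01-05" becomes "IND_T001 paid 5000 from ACCT_T001 on 2024-01-05": the amount and date are retained, and only the token form would be sent to an external API.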