[Hackathon] interview-integrity: Problem 08 (datafacts) — ogha_facts plugin: audience ACLs + pre-hash PII redaction + screening-evaluation scenario & validators by puja-ankitha-ivaturi-ogha · Pull Request #59 · projnanda/nandatown

puja-ankitha-ivaturi-ogha · 2026-07-03T03:51:37Z

Problem 08 — datafacts (persona: interview-integrity engineer)

I run the AI services for a verified-screening business — the filter at the front of a client's hiring funnel. A human interviewer conducts a recorded screening interview, then files an evaluation — a PASSED/FAILED verdict plus a detailed report drawn from the interview recording — that we deliver to the company that posted the job. We only screen; candidates who pass go on to the client's own rounds (the hiring decision is theirs, not ours). Our entire value proposition is that the verdict is genuine and grounded: the right candidate, the right job, from the assigned interviewer, provably derived from a real interview recording — not swapped, fabricated, leaked, or stale. This PR models that pipeline on-protocol and hardens it.

What this adds over the merged `cid_facts`

cid_facts (merged in #31) already provides the three properties Problem 08 asks for: content-addressed URLs, a parents provenance DAG, and identity-signed freshness proofs. Re-implementing those would be duplicate work, so I don't. ogha_facts subclasses CidFacts and adds the two capabilities the datafacts.md wishlist explicitly lists as unclaimed ("fine-grained ACLs" and redaction), both of which an evaluation-delivery pipeline genuinely needs:

Audience-scoped ACLs. CidFacts.request_access is binary: owner-or-public gets read, and everyone else is hard-denied with PermissionError. That is wrong for a verdict, which is private but has several legitimate readers (the interviewer, the service, the posting company, the candidate), while every other company must still be able to audit that a record exists without reading it. ogha_facts.request_access returns read to the content-bound ACL audience and a metadata-tier grant (never a hard denial) to everyone else: fail-closed on content, fail-open on existence. Because the ACL is part of the hashed content, who may read a verdict cannot be changed without minting a new URL.
Pre-hash PII redaction. A content hash is permanent — once a candidate's SSN, salary, phone, or DOB is folded into the hash and the URL is handed out, it can't be unpublished without breaking every provenance link to it. So redaction must run before hashing. publish() scrubs free-text fields through redact_pii (a fixpoint loop guaranteeing scan_pii(redact_pii(t)) == {}) before delegating to CidFacts.publish. This is our production PiiDetectionGuardrail re-expressed as pure-Python regexes so Tier 1 stays deterministic.
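A minimal sketch of the redact-before-hash ordering described above, assuming three illustrative patterns and a toy `content_url` helper. Only the names `scan_pii` and `redact_pii` come from this PR; the regexes, the placeholder format, and `content_url` are hypothetical and are not the plugin's actual code.

```python
import hashlib
import json
import re

# Hypothetical patterns for illustration only; the plugin's real regexes
# (SSN, card, email, phone, salary, DOB) are not reproduced here.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def scan_pii(text: str) -> dict:
    """Return every PII match keyed by kind; an empty dict means clean."""
    found = {kind: pat.findall(text) for kind, pat in PII_PATTERNS.items()}
    return {kind: hits for kind, hits in found.items() if hits}


def redact_pii(text: str) -> str:
    """Apply every pattern repeatedly until a full pass changes nothing."""
    while True:
        redacted = text
        for kind, pat in PII_PATTERNS.items():
            redacted = pat.sub(f"[REDACTED:{kind}]", redacted)
        if redacted == text:  # fixpoint reached
            return redacted
        text = redacted


def content_url(record: dict) -> str:
    """Scrub string fields first, then hash the canonical JSON form."""
    clean = {k: redact_pii(v) if isinstance(v, str) else v for k, v in record.items()}
    canonical = json.dumps(clean, sort_keys=True, separators=(",", ":"))
    return "cid:" + hashlib.sha256(canonical.encode()).hexdigest()
```

Because scrubbing happens inside `content_url`, a raw note and its already-redacted form hash to the same URL in this sketch, so the raw PII never influences the permanent identifier.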

Nothing fabricated. evaluation_dataset enforces two grounding invariants: the verdict must be exactly PASSED or FAILED (any other string raises), and the interview recording is a required provenance parent — so an evaluation whose recording was never published (a verdict with no interview behind it) is rejected at publish time. Every delivered verdict is cryptographically bound to the recording it was drawn from.
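The two grounding invariants can be sketched as follows. This is a toy model under stated assumptions: the `Store` class, the `cid` helper, and the `evaluation_dataset` signature shown here are hypothetical stand-ins, since the PR text names `evaluation_dataset` but does not show its parameters.

```python
import hashlib
import json

VALID_VERDICTS = frozenset({"PASSED", "FAILED"})


def cid(record: dict) -> str:
    """Content address: SHA-256 over canonical JSON (illustrative scheme)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "cid:" + hashlib.sha256(canonical.encode()).hexdigest()


class Store:
    """Toy in-memory content-addressed store; not the CidFacts API."""

    def __init__(self) -> None:
        self.records = {}

    def publish(self, record: dict) -> str:
        # Refuse any record whose provenance parents were never published.
        for parent in record.get("parents", []):
            if parent not in self.records:
                raise ValueError(f"unknown provenance parent: {parent}")
        url = cid(record)
        self.records[url] = record
        return url


def evaluation_dataset(verdict: str, report: str, recording_url: str) -> dict:
    """Build an evaluation record; hypothetical signature for illustration."""
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"verdict must be PASSED or FAILED, got {verdict!r}")
    if not recording_url:
        raise ValueError("an evaluation requires its interview recording as a parent")
    return {
        "kind": "evaluation",
        "verdict": verdict,
        "report": report,
        "parents": [recording_url],  # the recording is the sole parent
    }
```

In this sketch the verdict check fires when the record is built, while the missing-recording check fires at publish time, which mirrors the split described above.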

It also tightens one inherited behaviour: verify_freshness verifies the signature as of the tick at which the proof was signed. A verdict signed by an interviewer whose key later rotated (they left the panel) therefore still verifies, while a post-rotation forgery made with the retired key fails. This composes with the as-of semantics of ed25519_rotating; against did_key it degrades to the inherited behaviour.
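A rough sketch of as-of verification across a key rotation. To stay inside the standard library it swaps Ed25519 signatures for HMAC tags, which is a symmetric stand-in and not what ed25519_rotating does; `RotatingKeys`, `sign_freshness`, and this `verify_freshness` signature are all hypothetical. It also assumes the tick inside a proof comes from the shared logical clock and cannot be backdated by the signer.

```python
from __future__ import annotations

import hashlib
import hmac
from bisect import bisect_right


class RotatingKeys:
    """Per-signer key history as (activation_tick, key), kept sorted by tick."""

    def __init__(self) -> None:
        self._history: dict[str, list[tuple[int, bytes]]] = {}

    def rotate(self, signer: str, tick: int, key: bytes) -> None:
        entries = self._history.setdefault(signer, [])
        entries.append((tick, key))
        entries.sort(key=lambda entry: entry[0])

    def key_as_of(self, signer: str, tick: int) -> bytes | None:
        """Return the key that was active for `signer` at `tick`, if any."""
        entries = self._history.get(signer, [])
        idx = bisect_right([t for t, _ in entries], tick) - 1
        return entries[idx][1] if idx >= 0 else None


def _tag(key: bytes, url: str, tick: int) -> str:
    return hmac.new(key, f"{url}|{tick}".encode(), hashlib.sha256).hexdigest()


def sign_freshness(key: bytes, url: str, tick: int) -> dict:
    """Produce a freshness proof bound to the URL and the signing tick."""
    return {"url": url, "tick": tick, "tag": _tag(key, url, tick)}


def verify_freshness(keys: RotatingKeys, signer: str, proof: dict) -> bool:
    """Check the tag against the key active at the proof's own tick."""
    key = keys.key_as_of(signer, proof["tick"])
    if key is None:
        return False
    return hmac.compare_digest(_tag(key, proof["url"], proof["tick"]), proof["tag"])
```

The design choice illustrated here is that the verifier never asks "what is the signer's key now?" but "what was it at the signed tick?", so rotating a key does not invalidate earlier proofs.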

Deliverables

Plugin datafacts/ogha_facts.py → registered ("datafacts", "ogha_facts") in plugins.py + pyproject.toml entry point.
Scenario scenarios/interview_evaluation_delivery.yaml + scenarios_builtin/interview_evaluation_delivery.py — five parties (company · OGHA · interviewer · candidate · rival), the interviewer publishes a signed evaluation grounded in the interview recording, OGHA delivers it, and the company verifies the chain and runs four attacks.
Two new joint validators in validators.py — evaluation_pii_redacted, evaluation_acl_enforced — registered alongside the four reused provenance validators under interview_evaluation_delivery.
Tests — test_ogha_facts.py (unit + Hypothesis property tests: redaction-leaves-no-PII, redaction idempotent, publish idempotent, ACL read-iff-audience, as-of freshness across key rotation) and test_interview_evaluation_delivery.py (synthetic-trace validator tests + end-to-end runner, determinism across seeds 42/7/1337).
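To give a feel for what a synthetic-trace validator test exercises, here is a hedged sketch of the two new checks running over an in-memory list of events. The event schema (`type`, `text`, `requester`, `in_audience`, `tier`) and the `ValidatorResult` class are assumptions made for illustration; only the validator names and the `name`/`passed`/`detail` result fields appear in this PR, and a single SSN pattern stands in for the full PII scan.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatorResult:
    name: str
    passed: bool
    detail: str


# One illustrative pattern; the real validator presumably scans more kinds.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def evaluation_pii_redacted(events: list) -> ValidatorResult:
    """Fail if any published text in the trace still contains PII."""
    leaked = sum(
        len(SSN.findall(ev.get("text", "")))
        for ev in events
        if ev.get("type") == "publish"
    )
    if leaked:
        detail = f"{leaked} PII token(s) survived into the permanent record"
        return ValidatorResult("evaluation_pii_redacted", False, detail)
    return ValidatorResult("evaluation_pii_redacted", True, "no PII found in published records")


def evaluation_acl_enforced(events: list) -> ValidatorResult:
    """Fail if a requester outside the audience got anything but metadata."""
    for ev in events:
        if ev.get("type") == "access" and not ev["in_audience"] and ev["tier"] != "metadata":
            detail = f"{ev['requester']} got tier={ev['tier']!r} (expected 'metadata')"
            return ValidatorResult("evaluation_acl_enforced", False, detail)
    return ValidatorResult("evaluation_acl_enforced", True, "outsiders received metadata only")
```

A synthetic-trace test would then feed a deliberately bad event list and a clean one through each function and compare the `passed` flags.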

The adversarial proof (FAILS on `datafacts_v1`, PASSES on `ogha_facts`)

Running the scenario with layers.datafacts: datafacts_v1:

PASS provenance_chain_integrity        - chain resolved to depth 2   (liveness; holds on both)
FAIL provenance_substitution_resistant - tampered verdict landed on the real evaluation's URL
FAIL provenance_freshness_unforgeable  - forged freshness claim was accepted
FAIL provenance_chain_unforgeable      - verdict with no interview recording behind it was not rejected
FAIL evaluation_pii_redacted           - 4 PII token(s) survived into the permanent record
FAIL evaluation_acl_enforced           - rival-corp got tier='read' (expected 'metadata')

With layers.datafacts: ogha_facts, all six PASS.
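The evaluation_acl_enforced line above expects rival-corp to receive a metadata-tier grant rather than read access. A minimal sketch of that tier model, assuming a plain dict record with an `acl` list folded into the hashed content; the record layout, the grant shape, and this free-function form of `request_access` are hypothetical and simplified from whatever the plugin actually does.

```python
import hashlib
import json


def cid(record: dict) -> str:
    """Content address over canonical JSON, ACL included (illustrative)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "cid:" + hashlib.sha256(canonical.encode()).hexdigest()


def request_access(record: dict, requester: str) -> dict:
    """Grant read to the ACL audience and a metadata tier to everyone else."""
    url = cid(record)
    if requester in record["acl"]:
        # Fail-closed on content: only the audience ever sees it.
        return {"tier": "read", "url": url, "content": record["content"]}
    # Fail-open on existence: outsiders can confirm the record exists.
    return {"tier": "metadata", "url": url, "kind": record["kind"]}


record = {
    "kind": "evaluation",
    "content": {"verdict": "PASSED"},
    "acl": ["company", "interviewer", "ogha", "candidate"],
}
```

Since the `acl` list is part of what gets hashed, widening the audience in this sketch produces a different URL, which is the property that stops a verdict's readership from being changed silently.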

Verify

uv sync
nest run scenarios/interview_evaluation_delivery.yaml
python -c "from pathlib import Path; from nest_core.validators import validate_trace; \
  [print('PASS' if r.passed else 'FAIL', r.name, '-', r.detail) \
   for r in validate_trace(Path('traces/interview_evaluation_delivery.jsonl'), 'interview_evaluation_delivery')]"
# CI: uv run ruff check . && uv run ruff format --check . && uv run pyright && uv run pytest -v

Tradeoffs / scope

Redaction is regex-based (SSN, card, email, phone, salary, DOB) — deterministic and dependency-free by design; a production system would layer NER on top, out of scope for Tier 1.
The ACL is a content-bound allow-list; capability delegation is the auth layer's job, not datafacts'.
Determinism: no time.time(), no unseeded RNG — freshness uses the inherited logical SharedClock; runs are byte-identical across the seed bank.
Stays strictly inside datafacts; the identity composition reuses the merged ed25519_rotating (not a second problem).
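The determinism point above can be illustrated with a small sketch, reading "byte-identical" as: repeating a run with the same seed reproduces the same trace bytes. `LogicalClock` is a hypothetical stand-in for the inherited SharedClock, and `run` and its event shape are invented for the example.

```python
import json
import random


class LogicalClock:
    """Monotonic tick counter; never consults wall-clock time."""

    def __init__(self) -> None:
        self._tick = 0

    def advance(self) -> int:
        self._tick += 1
        return self._tick


def run(seed: int, steps: int = 5) -> bytes:
    """Produce a tiny JSONL trace using only seeded randomness and ticks."""
    rng = random.Random(seed)  # seeded instance, not the global RNG
    clock = LogicalClock()
    lines = []
    for _ in range(steps):
        event = {
            "tick": clock.advance(),
            "actor": rng.choice(["company", "ogha", "interviewer"]),
        }
        # sort_keys keeps the serialized form stable across runs.
        lines.append(json.dumps(event, sort_keys=True))
    return "\n".join(lines).encode()
```

Comparing `run(seed)` against a second `run(seed)` for each seed in the bank is then a plain bytes equality check.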

Draft opened early per the invitation — feedback on the ACL tier model and the as-of freshness composition especially welcome.

…ce ACLs + pre-hash PII redaction + joint interview-evaluation validators Problem 08 (datafacts). Builds on the merged cid_facts rather than re-implementing it: adds fine-grained audience ACLs and pre-hash PII redaction (both unclaimed on the datafacts wishlist), an as-of freshness verification that composes with ed25519_rotating, a five-party interview_evaluation_delivery scenario, and two new joint validators (PII redaction, ACL enforcement) that FAIL on datafacts_v1 and PASS on ogha_facts alongside the reused substitution/freshness/provenance checks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… recording OGHA is the front-of-funnel screening filter, not a hiring decision — the verdict is strictly PASSED/FAILED (passed candidates advance to the client's own rounds). Every evaluation is now bound by provenance to the interview recording it was drawn from: evaluation_dataset rejects any verdict outside {PASSED, FAILED} and requires the recording as its sole parent, so publish refuses a verdict with no real interview behind it. Nothing is fabricated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

PujaIvaturi and others added 3 commits July 2, 2026 20:03

Attribute authored files to Puja Ivaturi, Mahesh Gottam

df2be72

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
