[Hackathon] interview-integrity: Problem 08 (datafacts) — ogha_facts plugin: audience ACLs + pre-hash PII redaction + screening-evaluation scenario & validators#59

Draft: puja-ankitha-ivaturi-ogha wants to merge 3 commits into projnanda:main from puja-ankitha-ivaturi-ogha:hackathon/puja-ankitha-ivaturi-ogha-interview-evaluation

@puja-ankitha-ivaturi-ogha

Problem 08 — datafacts (persona: interview-integrity engineer)

I run the AI services for a verified-screening business — the filter at the front of a client's hiring funnel. A human interviewer conducts a recorded screening interview, then files an evaluation — a PASSED/FAILED verdict plus a detailed report drawn from the interview recording — that we deliver to the company that posted the job. We only screen; candidates who pass go on to the client's own rounds (the hiring decision is theirs, not ours). Our entire value proposition is that the verdict is genuine and grounded: the right candidate, the right job, from the assigned interviewer, provably derived from a real interview recording — not swapped, fabricated, leaked, or stale. This PR models that pipeline on-protocol and hardens it.

What this adds over the merged cid_facts

cid_facts (merged in #31) already solves the three properties Problem 08 asks for — content-addressed URLs, a parents provenance DAG, and identity-signed freshness proofs. Re-implementing those would be duplicate work, so I don't. ogha_facts subclasses CidFacts and adds the two capabilities the datafacts.md wishlist explicitly lists as unclaimed ("fine-grained ACLs", redaction), which an evaluation-delivery pipeline genuinely needs:

  1. Audience-scoped ACLs. CidFacts.request_access is binary — owner-or-public gets read, everyone else is hard-denied with PermissionError. That's wrong for a verdict: it's private, but has several legitimate readers (the interviewer, the service, the posting company, the candidate), while every other company must still be able to audit that a record exists without reading it. ogha_facts.request_access returns read to the content-bound ACL audience and a metadata-tier grant (never a hard denial) to everyone else — fail-closed on content, fail-open on existence. The ACL is part of the hashed content, so who may read a verdict cannot be changed without minting a new URL.

  2. Pre-hash PII redaction. A content hash is permanent — once a candidate's SSN, salary, phone, or DOB is folded into the hash and the URL is handed out, it can't be unpublished without breaking every provenance link to it. So redaction must run before hashing. publish() scrubs free-text fields through redact_pii (a fixpoint loop guaranteeing scan_pii(redact_pii(t)) == {}) before delegating to CidFacts.publish. This is our production PiiDetectionGuardrail re-expressed as pure-Python regexes so Tier 1 stays deterministic.
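The two capabilities above can be sketched together in a few lines. This is a minimal illustration, not the plugin's code: the two regexes, the `[REDACTED]` token, the `cid://` URL scheme, the in-memory `store` dict, and the function signatures are all assumptions made for the example. It shows the ordering that matters (redact, then hash), the ACL living inside the hashed record, and the two access tiers.

```python
import hashlib
import json
import re

# Hypothetical patterns for illustration; the plugin's real pattern set is larger.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
REDACTED = "[REDACTED]"  # assumed placeholder; it matches none of the patterns


def scan_pii(text: str) -> dict[str, list[str]]:
    """Return every PII match found in text, keyed by pattern name."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}


def redact_pii(text: str) -> str:
    """Substitute until a full pass changes nothing (a fixpoint)."""
    while True:
        scrubbed = text
        for pat in PII_PATTERNS.values():
            scrubbed = pat.sub(REDACTED, scrubbed)
        if scrubbed == text:
            return scrubbed
        text = scrubbed


def publish(store: dict[str, dict], report: str, audience: list[str]) -> str:
    """Redact first, then hash the record (ACL included) into its URL."""
    record = {"report": redact_pii(report), "acl": sorted(audience)}
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    url = "cid://" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    store[url] = record
    return url


def request_access(store: dict[str, dict], url: str, requester: str) -> str:
    """Audience members get 'read'; everyone else gets 'metadata', never an error."""
    record = store[url]
    return "read" if requester in record["acl"] else "metadata"
```

Because the audience list is serialized into the hashed bytes, publishing the same report with a different audience yields a different URL, which is the property the ACL design relies on. Publishing the same report and audience twice yields the same URL.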

Nothing fabricated. evaluation_dataset enforces two grounding invariants: the verdict must be exactly PASSED or FAILED (any other string raises), and the interview recording is a required provenance parent — so an evaluation whose recording was never published (a verdict with no interview behind it) is rejected at publish time. Every delivered verdict is cryptographically bound to the recording it was drawn from.
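A minimal sketch of the two grounding invariants, assuming a plain dict as the published-record store. The signatures of `evaluation_dataset` and `publish_evaluation`, the exception types, and the `cid://` scheme are hypothetical and chosen only for illustration; the plugin's real API may differ.

```python
import hashlib
import json

ALLOWED_VERDICTS = frozenset({"PASSED", "FAILED"})


def evaluation_dataset(verdict: str, report: str, recording_url: str) -> dict:
    """Build an evaluation record; reject any verdict outside PASSED/FAILED."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"verdict must be PASSED or FAILED, got {verdict!r}")
    if not recording_url:
        raise ValueError("an evaluation needs the interview recording as its parent")
    # The recording is the sole provenance parent of the evaluation.
    return {"verdict": verdict, "report": report, "parents": [recording_url]}


def publish_evaluation(store: dict[str, dict], evaluation: dict) -> str:
    """Refuse to publish when a parent recording was never published."""
    missing = [p for p in evaluation["parents"] if p not in store]
    if missing:
        raise LookupError(f"unpublished parent(s): {missing}")
    canonical = json.dumps(evaluation, sort_keys=True, separators=(",", ":"))
    url = "cid://" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    store[url] = evaluation
    return url
```

Since the parent URL is part of the hashed record, the resulting URL commits to the specific recording the verdict was drawn from.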

It also tightens one inherited behaviour: verify_freshness anchors signature verification as-of the tick the proof was signed at, so a verdict signed by an interviewer whose key later rotated (they left the panel) still verifies, while a post-rotation forgery with the retired key fails — composing the as-of semantics of ed25519_rotating. Against did_key it degrades to the inherited behaviour.
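The as-of rule can be illustrated with a toy keyring. This sketch makes several loud assumptions: HMAC-SHA256 stands in for the ed25519 signatures (so the verifier holds the keys, which a real signature scheme would not require), `RotatingKeyring` and `FreshnessProof` are hypothetical names, and the signed-at tick is assumed to come from a trusted logical clock so an attacker cannot backdate a proof.

```python
import hashlib
import hmac
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FreshnessProof:
    url: str
    signed_at: int  # logical tick at which the proof was signed
    signature: bytes


class RotatingKeyring:
    """Key history for one signer: each key is valid from its rotation tick onward."""

    def __init__(self) -> None:
        self._ticks: list[int] = []
        self._keys: list[bytes] = []

    def rotate(self, tick: int, key: bytes) -> None:
        if self._ticks and tick <= self._ticks[-1]:
            raise ValueError("rotation ticks must be strictly increasing")
        self._ticks.append(tick)
        self._keys.append(key)

    def key_as_of(self, tick: int) -> Optional[bytes]:
        """Return the key that was active at the given tick, if any."""
        idx = bisect_right(self._ticks, tick) - 1
        return self._keys[idx] if idx >= 0 else None


def _mac(key: bytes, url: str, tick: int) -> bytes:
    # HMAC is a stand-in for a real signature over (url, tick).
    return hmac.new(key, f"{url}|{tick}".encode("utf-8"), hashlib.sha256).digest()


def sign_freshness(key: bytes, url: str, tick: int) -> FreshnessProof:
    return FreshnessProof(url=url, signed_at=tick, signature=_mac(key, url, tick))


def verify_freshness(keyring: RotatingKeyring, proof: FreshnessProof) -> bool:
    """Verify against the key active when the proof was signed, not the latest key."""
    key = keyring.key_as_of(proof.signed_at)
    if key is None:
        return False
    return hmac.compare_digest(_mac(key, proof.url, proof.signed_at), proof.signature)
```

With a key installed at tick 0 and rotated at tick 10, a proof signed with the first key at tick 5 still verifies after the rotation, while a proof signed with that retired key at tick 12 does not, because tick 12 resolves to the replacement key.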

Deliverables

  • Plugin: datafacts/ogha_facts.py, registered as ("datafacts", "ogha_facts") in plugins.py plus a pyproject.toml entry point.
  • Scenario: scenarios/interview_evaluation_delivery.yaml + scenarios_builtin/interview_evaluation_delivery.py. Five parties (company · OGHA · interviewer · candidate · rival): the interviewer publishes a signed evaluation grounded in the interview recording, OGHA delivers it, and the company verifies the chain and runs four attacks.
  • Two new joint validators in validators.py: evaluation_pii_redacted and evaluation_acl_enforced, registered alongside the four reused provenance validators under interview_evaluation_delivery.
  • Tests: test_ogha_facts.py (unit + Hypothesis property tests: redaction-leaves-no-PII, redaction idempotent, publish idempotent, ACL read-iff-audience, as-of freshness across key rotation) and test_interview_evaluation_delivery.py (synthetic-trace validator tests + end-to-end runner, determinism across seeds 42/7/1337).

The adversarial proof (FAILS on datafacts_v1, PASSES on ogha_facts)

Running the scenario with layers.datafacts: datafacts_v1:

PASS provenance_chain_integrity        - chain resolved to depth 2   (liveness; holds on both)
FAIL provenance_substitution_resistant - tampered verdict landed on the real evaluation's URL
FAIL provenance_freshness_unforgeable  - forged freshness claim was accepted
FAIL provenance_chain_unforgeable      - verdict with no interview recording behind it was not rejected
FAIL evaluation_pii_redacted           - 4 PII token(s) survived into the permanent record
FAIL evaluation_acl_enforced           - rival-corp got tier='read' (expected 'metadata')

With layers.datafacts: ogha_facts, all six PASS.

Verify

uv sync
nest run scenarios/interview_evaluation_delivery.yaml
python -c "from pathlib import Path; from nest_core.validators import validate_trace; \
  [print('PASS' if r.passed else 'FAIL', r.name, '-', r.detail) \
   for r in validate_trace(Path('traces/interview_evaluation_delivery.jsonl'), 'interview_evaluation_delivery')]"
# CI: uv run ruff check . && uv run ruff format --check . && uv run pyright && uv run pytest -v

Tradeoffs / scope

  • Redaction is regex-based (SSN, card, email, phone, salary, DOB) — deterministic and dependency-free by design; a production system would layer NER on top, out of scope for Tier 1.
  • The ACL is a content-bound allow-list; capability delegation is the auth layer's job, not datafacts'.
  • Determinism: no time.time(), no unseeded RNG — freshness uses the inherited logical SharedClock; runs are byte-identical across the seed bank.
  • Stays strictly inside datafacts; the identity composition reuses the merged ed25519_rotating (not a second problem).
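The determinism constraint in the list above can be sketched as follows. This is a toy under stated assumptions: `LogicalClock` is a hypothetical stand-in for the inherited SharedClock, and `run_scenario` with its three-party event loop is invented for the example. The point is only that a counter-based clock plus a seeded `random.Random` gives the same trace bytes for the same seed.

```python
import json
import random


class LogicalClock:
    """Counter-based clock: time advances only when a party acts."""

    def __init__(self) -> None:
        self._tick = 0

    def advance(self) -> int:
        self._tick += 1
        return self._tick


def run_scenario(seed: int) -> bytes:
    """Produce a JSONL trace that depends only on the seed."""
    rng = random.Random(seed)  # seeded RNG; the global RNG is never touched
    clock = LogicalClock()     # no wall-clock reads anywhere
    events = []
    for party in ("interviewer", "ogha", "company"):
        events.append(
            {"tick": clock.advance(), "party": party, "nonce": rng.getrandbits(32)}
        )
    lines = [json.dumps(event, sort_keys=True) for event in events]
    return "\n".join(lines).encode("utf-8")
```

Running `run_scenario` twice with the same seed returns identical bytes, which is what a byte-for-byte trace comparison in a test would check.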

Draft opened early per the invitation — feedback on the ACL tier model and the as-of freshness composition especially welcome.

PujaIvaturi and others added 3 commits July 2, 2026 20:03
…ce ACLs + pre-hash PII redaction + joint interview-evaluation validators

Problem 08 (datafacts). Builds on the merged cid_facts rather than
re-implementing it: adds fine-grained audience ACLs and pre-hash PII
redaction (both unclaimed on the datafacts wishlist), an as-of freshness
verification that composes with ed25519_rotating, a five-party
interview_evaluation_delivery scenario, and two new joint validators
(PII redaction, ACL enforcement) that FAIL on datafacts_v1 and PASS on
ogha_facts alongside the reused substitution/freshness/provenance checks.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… recording

OGHA is the front-of-funnel screening filter, not a hiring decision — the
verdict is strictly PASSED/FAILED (passed candidates advance to the client's
own rounds). Every evaluation is now bound by provenance to the interview
recording it was drawn from: evaluation_dataset rejects any verdict outside
{PASSED, FAILED} and requires the recording as its sole parent, so publish
refuses a verdict with no real interview behind it. Nothing is fabricated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@puja-ankitha-ivaturi-ogha puja-ankitha-ivaturi-ogha changed the title [Hackathon] interview-integrity: ogha_facts datafacts plugin — audience ACLs + pre-hash PII redaction + screening-evaluation scenario & validators [Hackathon] interview-integrity: Problem 08 (datafacts) — ogha_facts plugin: audience ACLs + pre-hash PII redaction + screening-evaluation scenario & validators Jul 3, 2026