[Hackathon] interview-integrity: Problem 08 (datafacts) — ogha_facts plugin: audience ACLs + pre-hash PII redaction + screening-evaluation scenario & validators#59
Draft
puja-ankitha-ivaturi-ogha wants to merge 3 commits into
…ce ACLs + pre-hash PII redaction + joint interview-evaluation validators Problem 08 (datafacts). Builds on the merged cid_facts rather than re-implementing it: adds fine-grained audience ACLs and pre-hash PII redaction (both unclaimed on the datafacts wishlist), an as-of freshness verification that composes with ed25519_rotating, a five-party interview_evaluation_delivery scenario, and two new joint validators (PII redaction, ACL enforcement) that FAIL on datafacts_v1 and PASS on ogha_facts alongside the reused substitution/freshness/provenance checks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… recording
OGHA is the front-of-funnel screening filter, not a hiring decision — the
verdict is strictly PASSED/FAILED (passed candidates advance to the client's
own rounds). Every evaluation is now bound by provenance to the interview
recording it was drawn from: evaluation_dataset rejects any verdict outside
{PASSED, FAILED} and requires the recording as its sole parent, so publish
refuses a verdict with no real interview behind it. Nothing is fabricated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem 08 — datafacts (persona: interview-integrity engineer)
I run the AI services for a verified-screening business: the filter at the front of a client's hiring funnel. A human interviewer conducts a recorded screening interview, then files an evaluation (a `PASSED/FAILED` verdict plus a detailed report drawn from the interview recording) that we deliver to the company that posted the job. We only screen; candidates who pass go on to the client's own rounds (the hiring decision is theirs, not ours). Our entire value proposition is that the verdict is genuine and grounded: the right candidate, the right job, from the assigned interviewer, provably derived from a real interview recording, and not swapped, fabricated, leaked, or stale. This PR models that pipeline on-protocol and hardens it.

What this adds over the merged `cid_facts`

`cid_facts` (merged in #31) already solves the three properties Problem 08 asks for: content-addressed URLs, a `parents` provenance DAG, and identity-signed freshness proofs. Re-implementing those would be duplicate work, so I don't. `ogha_facts` subclasses `CidFacts` and adds the two capabilities the `datafacts.md` wishlist explicitly lists as unclaimed ("fine-grained ACLs", redaction), which an evaluation-delivery pipeline genuinely needs:

Audience-scoped ACLs.
`CidFacts.request_access` is binary: owner-or-public gets `read`, everyone else is hard-denied with `PermissionError`. That's wrong for a verdict: it's private, but has several legitimate readers (the interviewer, the service, the posting company, the candidate), while every other company must still be able to audit that a record exists without reading it. `ogha_facts.request_access` returns `read` to the content-bound ACL audience and a `metadata`-tier grant (never a hard denial) to everyone else: fail-closed on content, fail-open on existence. The ACL is part of the hashed content, so who may read a verdict cannot be changed without minting a new URL.

Pre-hash PII redaction.
A content hash is permanent: once a candidate's SSN, salary, phone, or DOB is folded into the hash and the URL is handed out, it can't be unpublished without breaking every provenance link to it. So redaction must run before hashing.
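Why the ordering matters can be shown in a few lines. The sketch below is illustrative only: the two regexes, the `[REDACTED:...]` placeholder, the `cid:` prefix and the `content_url` helper are assumptions made for this example, not the plugin's actual detectors or URL scheme. Only the names `scan_pii`, `redact_pii` and `publish` are taken from this PR.

```python
import hashlib
import json
import re

# Hypothetical patterns for illustration; the plugin's real detectors are not shown here.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def scan_pii(text: str) -> dict:
    """Return {kind: [matches]} for every pattern that still matches."""
    found = {}
    for kind, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[kind] = matches
    return found


def redact_pii(text: str) -> str:
    """Substitute placeholders until a full pass changes nothing (a fixpoint)."""
    while True:
        redacted = text
        for kind, pattern in PII_PATTERNS.items():
            redacted = pattern.sub(f"[REDACTED:{kind.upper()}]", redacted)
        if redacted == text:
            return redacted
        text = redacted


def content_url(record: dict) -> str:
    """Assumed content-addressing scheme: SHA-256 over canonical JSON."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return "cid:" + hashlib.sha256(payload).hexdigest()


def publish(record: dict) -> tuple:
    """Redact every free-text field first, then hash what is left."""
    clean = {k: redact_pii(v) if isinstance(v, str) else v for k, v in record.items()}
    return content_url(clean), clean


raw = {"verdict": "PASSED", "report": "Reach me at 555-867-5309, SSN 123-45-6789."}
url, clean = publish(raw)
```

Because the URL is derived from `clean`, the raw identifiers never influence the hash. Hashing `raw` and redacting afterwards would yield a different URL from the one already handed out, which is the breakage the paragraph above describes.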
`publish()` scrubs free-text fields through `redact_pii` (a fixpoint loop guaranteeing `scan_pii(redact_pii(t)) == {}`) before delegating to `CidFacts.publish`. This is our production `PiiDetectionGuardrail` re-expressed as pure-Python regexes so Tier 1 stays deterministic.

Nothing fabricated.
`evaluation_dataset` enforces two grounding invariants: the verdict must be exactly `PASSED` or `FAILED` (any other string raises), and the interview recording is a required provenance parent, so an evaluation whose recording was never published (a verdict with no interview behind it) is rejected at `publish` time. Every delivered verdict is cryptographically bound to the recording it was drawn from.

It also tightens one inherited behaviour:
`verify_freshness` anchors signature verification as-of the tick the proof was signed at, so a verdict signed by an interviewer whose key later rotated (they left the panel) still verifies, while a post-rotation forgery with the retired key fails, composing the as-of semantics of `ed25519_rotating`. Against `did_key` it degrades to the inherited behaviour.

Deliverables
- `datafacts/ogha_facts.py` → registered `("datafacts", "ogha_facts")` in `plugins.py` + `pyproject.toml` entry point.
- `scenarios/interview_evaluation_delivery.yaml` + `scenarios_builtin/interview_evaluation_delivery.py`: five parties (company · OGHA · interviewer · candidate · rival). The interviewer publishes a signed evaluation grounded in the interview recording, OGHA delivers it, and the company verifies the chain and runs four attacks.
- `validators.py`: `evaluation_pii_redacted` and `evaluation_acl_enforced`, registered alongside the four reused provenance validators under `interview_evaluation_delivery`.
- `test_ogha_facts.py` (unit + Hypothesis property tests: redaction-leaves-no-PII, redaction idempotent, publish idempotent, ACL read-iff-audience, as-of freshness across key rotation) and `test_interview_evaluation_delivery.py` (synthetic-trace validator tests + end-to-end runner, determinism across seeds 42/7/1337).

The adversarial proof (FAILS on `datafacts_v1`, PASSES on `ogha_facts`)

Running the scenario with `layers.datafacts: datafacts_v1`:

With `layers.datafacts: ogha_facts`, all six PASS.

Verify
Tradeoffs / scope
- No `time.time()`, no unseeded RNG: freshness uses the inherited logical `SharedClock`; runs are byte-identical across the seed bank.
- … `datafacts`; the identity composition reuses the merged `ed25519_rotating` (not a second problem).

Draft opened early per the invitation. Feedback on the ACL tier model and the as-of freshness composition is especially welcome.
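For reviewers weighing the ACL tier model, the behaviour described under "Audience-scoped ACLs" can be sketched roughly as follows. This is a minimal stand-in, not the plugin code: the `Grant` dataclass, the in-memory `store`, the `cid:` prefix and the `content_url` helper are hypothetical names introduced for illustration, and the real `ogha_facts.request_access` signature may differ.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    tier: str              # "read" or "metadata"
    url: str
    body: Optional[dict]   # populated only on the read tier


def content_url(record: dict) -> str:
    """Assumed content-addressing scheme: SHA-256 over canonical JSON."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return "cid:" + hashlib.sha256(payload).hexdigest()


def publish(store: dict, body: dict, audience: list) -> str:
    """Fold the ACL into the hashed record so it cannot change under a fixed URL."""
    record = {"acl": sorted(set(audience)), "body": body}
    url = content_url(record)
    store[url] = record
    return url


def request_access(store: dict, url: str, requester: str) -> Grant:
    """Audience members read the body; everyone else learns only that the record exists."""
    record = store[url]
    if requester in record["acl"]:
        return Grant("read", url, record["body"])
    return Grant("metadata", url, None)


store = {}
verdict = {"verdict": "PASSED", "job": "job-1"}
readers = ["interviewer", "ogha", "company", "candidate"]
url = publish(store, verdict, readers)

company = request_access(store, url, "company")
rival = request_access(store, url, "rival")

# Widening the audience produces a different record, hence a different URL.
widened_url = publish(store, verdict, readers + ["rival"])
```

In this sketch a non-audience requester never triggers an exception: it receives a `metadata` grant with no body, which is the fail-closed-on-content, fail-open-on-existence split. Adding a reader changes the hashed record, so the original URL keeps its original audience.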