sust4in/agentproof

agentproof

Composable reward signals from agent trajectories using programmatic verification. agentproof runs real tools against agent outputs and produces deterministic scores for RL training (GRPO, DPO, SFT filtering) or agent quality analysis.

Quick start

pip install agentproof

Compose multiple verifiers into a single reward

from agentproof import RewardComposer, verifier
from agentproof.verifiers import CodeExecution, FormatCheck, StepEfficiency

# Define a custom verifier -- just decorate a function
@verifier(deterministic=True)
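def vuln_eliminated(target) -> float:
    """Return the fraction of SAST findings eliminated by the agent's patch."""
    # run_scanner is a placeholder for your own SAST wrapper (not defined in this snippet)
    before = run_scanner(target.context["original_code"])
    after = run_scanner(target.outcome["patched_code"])
    # Clamp at 0 so a patch that adds findings never scores negative;
    # the max(..., 1) denominator avoids division by zero when there were no findings
    return max(0, len(before.findings) - len(after.findings)) / max(len(before.findings), 1)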

# Compose verifiers with weights and gating
composer = RewardComposer([
    vuln_eliminated.with_weight(0.5, required=True),   # required: total=0 if this fails
    CodeExecution(cmd="pytest").with_weight(0.3, required=True),
    FormatCheck(schema="output_schema.json").with_weight(0.1),
    StepEfficiency(max_steps=15).with_weight(0.1),
])

# Score a trajectory
result = my_agent.run(task)
scored = composer.score(trajectory=result.trajectory, context={"original_code": src})

print(scored.reward)       # 0.0 if any required verifier failed, weighted sum otherwise
print(scored.breakdown)    # per-verifier scores and evidence

required=True is the anti-reward-hacking mechanism. If a required verifier returns 0, the entire composed reward is 0 -- no matter how well other verifiers score. An agent cannot game easy signals while failing on the ones that matter.
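
The gating rule can be illustrated with a minimal, library-independent sketch. The `WeightedScore` class and `compose` function below are hypothetical names used only for illustration, and the plain weighted sum (with no weight normalization) is an assumption, not agentproof's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class WeightedScore:
    """Hypothetical container for one verifier's result."""
    name: str
    score: float          # verifier output, assumed to lie in [0, 1]
    weight: float
    required: bool = False


def compose(scores):
    """Weighted sum, forced to 0.0 when any required verifier scored 0."""
    if any(s.required and s.score == 0 for s in scores):
        return 0.0
    return sum(s.weight * s.score for s in scores)


scores = [
    WeightedScore("vuln_eliminated", 1.0, 0.5, required=True),
    WeightedScore("tests", 0.0, 0.3, required=True),   # required verifier fails
    WeightedScore("format", 1.0, 0.1),
    WeightedScore("efficiency", 1.0, 0.1),
]
gated = compose(scores)  # the failed required verifier zeroes the whole reward
```

Without the gate, the same trajectory would still collect 0.7 from the other three signals, which is the behavior the `required` flag is meant to rule out.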

CLI Usage

# Generate a config file
agentproof init

# Score trajectories from a JSONL file
agentproof check traces.jsonl

# JSON output for scripts
agentproof check --format json | jq .summary

# JUnit XML for CI dashboards (GitHub Actions, Jenkins)
agentproof check --format junit > results.xml

# Override threshold from CLI
agentproof check -t 0.8 traces.jsonl
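
For CI jobs that post-process `results.xml`, a short script can count failed cases using only the standard library. This is a sketch under assumptions: the sample XML is hand-written in the common JUnit layout, and the exact elements and attributes agentproof emits may differ. `count_failures` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def count_failures(junit_xml: str) -> int:
    """Count <testcase> elements that contain a <failure> or <error> child."""
    root = ET.fromstring(junit_xml)
    # iter() walks the whole tree, so this works whether the root
    # is <testsuite> or a wrapping <testsuites> element
    return sum(
        1
        for case in root.iter("testcase")
        if case.find("failure") is not None or case.find("error") is not None
    )


# Hand-written sample, not real agentproof output
sample = """<testsuite name="agentproof" tests="2" failures="1">
  <testcase name="trajectory-1"/>
  <testcase name="trajectory-2"><failure message="reward below threshold"/></testcase>
</testsuite>"""

failed = count_failures(sample)
```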

Export scored data for training

from agentproof import to_sft, to_dpo
from agentproof.sources import JSONLSource

# Load historical trajectories and score them
trajectories = JSONLSource("./agent_runs.jsonl").fetch()
scored = composer.score_batch(trajectories)

# Export for different training methods
to_sft(scored, "./data", min_reward=0.7)   # filter to good examples
to_dpo(scored, "./data")                     # preference pairs
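
To make the two export modes concrete, here is a rough sketch of the filtering and pairing logic on plain dictionaries. The record keys follow the instruction/response and prompt/chosen/rejected formats listed later in this README; the input shape, the `>=` comparison against `min_reward`, and the all-pairs strategy are assumptions and may not match what `to_sft` and `to_dpo` actually do.

```python
import json
from itertools import combinations


def sft_records(scored, min_reward):
    """Keep only examples at or above the reward cutoff."""
    return [
        {"instruction": s["prompt"], "response": s["completion"]}
        for s in scored
        if s["reward"] >= min_reward
    ]


def dpo_records(scored):
    """Pair completions for the same prompt; the higher reward is 'chosen'."""
    by_prompt = {}
    for s in scored:
        by_prompt.setdefault(s["prompt"], []).append(s)
    pairs = []
    for prompt, group in by_prompt.items():
        for a, b in combinations(group, 2):
            if a["reward"] == b["reward"]:
                continue  # a tie carries no preference signal
            chosen, rejected = (a, b) if a["reward"] > b["reward"] else (b, a)
            pairs.append({
                "prompt": prompt,
                "chosen": chosen["completion"],
                "rejected": rejected["completion"],
            })
    return pairs


# Hypothetical scored data for illustration
scored = [
    {"prompt": "fix bug", "completion": "patch A", "reward": 0.9},
    {"prompt": "fix bug", "completion": "patch B", "reward": 0.2},
    {"prompt": "add test", "completion": "test C", "reward": 0.75},
]
jsonl_lines = [json.dumps(r) for r in dpo_records(scored)]  # one JSON object per line
```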

Installation

pip install agentproof                     # core + built-in verifiers
pip install "agentproof[jsonschema]"       # with JSON schema validation support
pip install "agentproof[langsmith]"        # with LangSmith source adapter

Built-in verifiers

| Verifier | What it does |
|---|---|
| CodeExecution | Runs a shell command against trajectory output. Score 1.0 for exit code 0. |
| FormatCheck | Validates trajectory outcome against a JSON schema. |
| RegexMatch | Checks if trajectory outcome matches a regex pattern. |
| StepEfficiency | Penalizes trajectories with too many steps. |
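
The table does not spell out how the step penalty is computed. As one plausible shape, and purely as an assumption, a score could stay at 1.0 within the step budget and decay in proportion to the overshoot. The function name and formula below are hypothetical and are not taken from agentproof.

```python
def step_efficiency(num_steps: int, max_steps: int) -> float:
    """Hypothetical penalty: full score within budget, max_steps/num_steps beyond it."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if num_steps <= max_steps:
        return 1.0
    return max_steps / num_steps


within_budget = step_efficiency(10, 15)   # no penalty
over_budget = step_efficiency(30, 15)     # twice the budget halves the score
```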

Custom verifiers

The @verifier decorator is the primary extension point:

import subprocess

from agentproof import verifier

@verifier(deterministic=True)
def tests_pass(target) -> float:
    """Run pytest and return 1.0 if all tests pass."""
    result = subprocess.run(["pytest", target.context["test_path"]], capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0

For verifiers needing configuration or state, use the class form:

from agentproof import Verifier, VerifyResult

class SASTDiffVerifier(Verifier):
    name = "sast_diff"
    deterministic = True

    def __init__(self, scanner_cmd: str):
        self.scanner_cmd = scanner_cmd

    def verify(self, target) -> VerifyResult:
        # Run scanner before/after comparison
        ...

Third-party verifier packs can register via entry_points:

[project.entry-points."agentproof.verifiers"]
my_verifier = "my_package:MyVerifier"
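
Entry points registered under a group can be listed with the standard library's `importlib.metadata`. The sketch below shows the general discovery mechanism only; how agentproof itself loads registered verifiers is not shown in this README, and `discover_verifiers` is a hypothetical helper name.

```python
from importlib.metadata import entry_points


def discover_verifiers(group="agentproof.verifiers"):
    """Map entry point names to EntryPoint objects for the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict of group name -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


found = discover_verifiers()
# Calling ep.load() on an entry imports the object it points to,
# for example the MyVerifier class from my_package above.
```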

Export to training frameworks

Export adapters produce JSONL files that TRL, veRL, and OpenRLHF can consume. agentproof never imports training libraries.

| Export | Format | Use case |
|---|---|---|
| to_sft | instruction/response JSONL | Filter high-scoring trajectories for supervised fine-tuning |
| to_dpo | prompt/chosen/rejected JSONL | Create preference pairs for DPO training |
| to_grpo | grouped completions JSONL | Batch GRPO training with group-level reward normalization |
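
Group-level reward normalization in GRPO means each completion's reward is standardized against the other completions sampled for the same prompt. A minimal sketch of that arithmetic follows; it uses the population standard deviation and a small epsilon as assumptions, since training frameworks differ on the exact estimator, and this step usually runs inside the trainer rather than in the export.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for the same prompt, scored by the composer
adv = group_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions above the group mean get a positive advantage and those below get a negative one. If every completion in a group has the same reward, all advantages are zero and the group contributes no learning signal.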

v1.3: Training Loop Validation

v1.3 added closed-loop validation between agentproof scoring and real training frameworks.

Feedback Ingestion from LangSmith

Fetch human and automated feedback from LangSmith and use it as a reward signal:

from agentproof import LangSmithSource, get_feedback, verifier

source = LangSmithSource(project_name="my-agent")
trajectories = source.fetch(include_feedback=True)

@verifier(deterministic=True)
def human_score(target) -> float:
    """Use human correctness feedback as the reward signal."""
    fb = get_feedback(target.context, "correctness")
    return float(fb.get("score", 0.0)) if fb is not None else 0.5

Ground Truth Matching

Match trajectories against a LangSmith dataset and verify against expected outputs:

from agentproof import get_ground_truth, verifier

# source is the LangSmithSource created in the previous example
trajectories = source.fetch(dataset_name="my-labeled-dataset")

@verifier(deterministic=True)
def exact_match(target) -> float:
    gt = get_ground_truth(target.context)
    if gt is None:
        return 0.0
    return 1.0 if str(target.trajectory.outcome) == str(gt.get("answer", "")) else 0.0

TRL Integration

Wrap a composer as a live TRL reward function for online GRPO training:

from agentproof.export.grpo import as_trl_reward_func

reward_fn = as_trl_reward_func(composer)
# Pass to TRL: GRPOTrainer(reward_funcs=[reward_fn])
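
For orientation, a TRL custom reward function takes `prompts` and `completions` (plus extra keyword arguments) and returns one float per completion. The sketch below shows an adapter with that shape around a toy scorer. `make_reward_func` and `toy_score` are hypothetical names, only plain-text completions are handled, and this is not how `as_trl_reward_func` is implemented.

```python
from typing import Callable, List


def make_reward_func(score_text: Callable[[str, str], float]):
    """Wrap a (prompt, completion) -> float scorer in a TRL-style reward function."""
    def reward_func(prompts: List[str], completions: List[str], **kwargs) -> List[float]:
        # One reward per completion, in the same order as the inputs
        return [float(score_text(p, c)) for p, c in zip(prompts, completions)]
    return reward_func


def toy_score(prompt: str, completion: str) -> float:
    """Stand-in scorer: reward completions that define a function."""
    return 1.0 if "def " in completion else 0.0


toy_reward_fn = make_reward_func(toy_score)
rewards = toy_reward_fn(
    prompts=["write a function", "write a function"],
    completions=["def f(): pass", "no code here"],
)
```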

Documentation

Full documentation: https://ogulcanarbc.github.io/agentproof/

Development

See CONTRIBUTING.md for setup instructions and development workflow.

git clone https://github.com/ogulcanarbc/agentproof.git
cd agentproof
make install    # installs dev deps via uv
make all        # runs lint + format-check + typecheck + test

License

MIT
