VeriAct: Beyond Verifiability

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

Md Rakib Hossain Misu*, Iris Ma, Cristina V. Lopes

✨ Overview

This repository contains the implementation of our paper's methodology

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

💎 Key Components

baselines — implementation/execution scripts of classical (Daikon, Houdini) vs. prompt-based (SpecGen, AutoSpec, FormalBench) approaches.
optimizer — uses structured OpenJML feedback to iteratively refine prompts.
spec_harness — evaluates correctness and completeness of verifier-accepted specifications beyond syntactic verification.
veriact — an agentic loop that combines code execution, OpenJML verification, and Spec-Harness feedback to synthesize formal specifications.
benchmarks — Two normalized benchmarks are used across experiments.

For full details, see the paper.

🚀 Getting Started

💻 Prerequisites

Python >= 3.10
OpenJML — must be installed and available in PATH as openjml
Check openjml --version

⏳ Install

git clone https://github.com/Mondego/VeriAct.git
cd VeriAct
uv sync --all-extras
source .venv/bin/activate

🔑 API Keys

Create a .env file in config with the keys for the models you intend to use:

API Keys

OPENAI_API_KEY=...       # GPT-4o 
ANTHROPIC_API_KEY=...    # Claude models
GOOGLE_API_KEY=...       # Gemini models
DEEPSEEK_API_KEY=...     # DeepSeek API
MISTRAL_API_KEY=...      # Mistral API
VLLM_API_KEY=...         # Local vLLM server
VLLM_API_BASE=...        # e.g. http://localhost:8000/v1

Note:

💣 Run Command

Run Daikon

Note: Install Daikon and configure required jars in the PATH

python -m baselines.daikon.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --openjml_timeout 300 \
    --daikon_timeout 600 \
    --threads 4 \
    --verbose

Run Houdini

Note: java version: 1.6.0_21 requires to run Houdini

python -m baselines.houdini.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose

Run SpecGen

python -m baselines.specgen.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 10 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose

Note: prompt_type: zero_shot | two_shot | four_shot

Run AutoSpec

python -m baselines.autospec.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 10 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose \
    --simplify

Note: prompt_type: zero_shot | two_shot | four_shot

Run FormalBench

python -m baselines.formalbench.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 5 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose

Note: prompt_type: zero_shot | two_shot | zs_cot | fs_cot | fs_ltm

Run Optimizer

python -m optimizer.prompt_optimizer \
    --formalbench_path <path/to/benchmarks/formalbench/fb.json> \
    --specgenbench_path <path/to/benchmarks/specgenbench/sgb.json> \
    --optimizers gepa \
    --best_seed zero \           
    --model openai/gpt-4o \
    --reflection_model openai/gpt-4o \
    --log_dir optimizer_logs \
    --output_dir optimizer_results \
    --openjml_output_dir openjml_output

Note: --best_seed: zero | cot | ltm

Run Spec-Harness

python -m spec_harness.eval_llm_response \
  --benchmark_path benchmarks/formalbench/fb.json \
  --llm_response_path path/to/responses.jsonl \
  --openjml openjml \
  --output spec_harness_results \
  --threads 8 \
  --max-pairs 5 \
  --verbose

Note: responses.jsonl is the output file of running the baselines approaches

Note: --max-pairs is the max input/output test pairs to use in mutation

Run VeriAct

# Sequential (one task at a time)
python -m veriact.run_single \
    --benchmark <path/to/benchmark.json> \
    --model gpt-4o \
    --output-dir <output_dir> \
    --max-steps 12 \
    --planning_interval 3

# Parallel
python -m veriact.run_batch \
    --benchmark <path/to/benchmark.json> \
    --model gpt-4o \
    --threads 4 \
    --output-dir <output_dir> \
    --max-steps 12 \
    --planning_interval 3

Note: Note

📁 Repository Structure

├── baselines               # All baselines implementation
│   ├── houdini             # Classic, template based approach 
│   ├── daikon              # Classic, execution trace based approach
│   ├── specgen             # Prompt based, spec mutation 
│   ├── autospec            # Prompt based, program decomposing and static analysis
│   ├── formalbench         # Prompt based, advance prompting with repair guidance
├── benchmarks              # Normalized benchamrk datasets
│   └── specgenbench        # 120 tasks from Leetcode and JML examples
│   ├── formalbench         # 662 tasks from FormalBench and MBJP
├── config                  # API keys in .env file
├── optimizer               # Prompt optimizer implementation
├── spec_harness            # Spec-Harness implementation
├── veriact                 # VeriAct agent implementation
├── scripts                 # CLI run scripts for all baselines, veriact
├── pyproject.toml
├── README.md

📝 Citation

@misc{misu2026veriactverifiabilityagentic,
      title={VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications}, 
      author={Md Rakib Hossain Misu and Iris Ma and Cristina V. Lopes},
      year={2026},
      eprint={2604.00280},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2604.00280}, 
}

📧 Contact

If you have any questions or find any issues, please contact us at mdrh@uci.edu

📄 License

This repository is licensed under GNU General Public License v3.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VeriAct: Beyond Verifiability

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

✨ Overview

💎 Key Components

🚀 Getting Started

💻 Prerequisites

⏳ Install

🔑 API Keys

💣 Run Command

📁 Repository Structure

📝 Citation

📧 Contact

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 1

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
baselines		baselines
benchmarks		benchmarks
config		config
optimizer		optimizer
scripts		scripts
spec_harness		spec_harness
veriact		veriact
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

VeriAct: Beyond Verifiability

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

✨ Overview

💎 Key Components

🚀 Getting Started

💻 Prerequisites

⏳ Install

🔑 API Keys

💣 Run Command

📁 Repository Structure

📝 Citation

📧 Contact

📄 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Languages

Packages