Skip to content

uw-swag/AssertFlip

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🐛 AssertFlip Replication Package

This is the official replication package for our paper:

AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests

AssertFlip is a system for automatically generating bug-reproducing tests from natural language reports


🔧 Setup Instructions

1. Requirements

  • Python 3.10+
  • Docker
  • conda (used inside Docker containers)

Install dependencies:

pip install -e .

2. Add LLM API Credentials

The file scripts/.env is already created.

Just open it and insert your own credentials like this:

AZURE_API_KEY=your_azure_api_key
AZURE_API_BASE=https://your_azure_endpoint
AZURE_API_VERSION=2024-05-01-preview

3. How to Run

Default (used in the paper)

python scripts/run_parallel.py

This uses:

  • Agentless localization
  • Pass-invert strategy
  • 10 regeneration attempts
  • 10 refinement attempts
  • LLM validation enabled
  • Planner enables

Config is controlled in scripts/config.py.


4. Datasets

All datasets are in the datasets folder. These are the exact files used in our experiments:

  • SWT_Verified_Agentless_Test_Source_Skeleton.json (default for Verified)
  • SWT_Verified_Test_Source_Skeleton.json (perfect localization dataset)
  • SWT_Lite_Agentless_Test_Source_Skeleton.json (default for Lite)
  • SWT_Lite_Agentless_Unique_Only.json (default for Lite 188 unique instances)

To switch datasets, change:

DATASET_PATH in scripts/config.py.

5. Running Ablations

Regeneration Ablation (0 or 5 attempts)

Edit this line in scripts/config.py:

max_regeneration_retries = 1  # for no regenerations 
# or
max_regeneration_retries = 5  # for the 5 regeneration ablation

Then run:

python scripts/run_parallel.py

6. Running No Validation Ablation

python scripts/run_parallel_without_validation_ablation.py

7. Running No Planner Ablation

python scripts/run_parallel_without_planner_ablation.py

8. Perfect Localization

Change dataset in scripts/config.py to:

DATASET_PATH = "datasets/SWT_Verified_Test_Source_Skeleton.json"

Then run the default script again.

python scripts/run_parallel.py

9. Generate Predictions

To generate preds.json from results:

python scripts/generate_preds_phases.py --results-dir results/

We also include our original prediction files in the preds_files folder for direct use.


10. Evaluation Instructions

The previous steps produces predictions in SWT-Bench format. You can then evaluate them using SWT-Bench instructions: https://github.com/logic-star-ai/swt-bench

We also provide:

  • Full outputs preds in preds_files/
  • Full results after evaluating on SWT-Bench for each reported run in evaluation_results_on_SWT_Bench/

Acknowledgment

This project uses components from the opensource test generator Coverup, licensed under the Apache 2.0 License.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors