🐛 AssertFlip Replication Package

This is the official replication package for our paper:

AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests

AssertFlip is a system for automatically generating bug-reproducing tests from natural language bug reports.

🔧 Setup Instructions

1. Requirements

Python 3.10+
Docker
conda (used inside Docker containers)

Install dependencies:

pip install -e .

2. Add LLM API Credentials

The file scripts/.env already exists in the repository.

Open it and insert your own credentials:

AZURE_API_KEY=your_azure_api_key
AZURE_API_BASE=https://your_azure_endpoint
AZURE_API_VERSION=2024-05-01-preview
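Before launching a long run, it can help to confirm that the three variables above are actually filled in. The following is a minimal sketch, not part of the package: the helper names (parse_env, missing_keys, check_env_file) are hypothetical, and it assumes scripts/.env uses plain KEY=value lines as shown above.

```python
from pathlib import Path

# Variable names taken from the template above.
REQUIRED_KEYS = ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION")
# Template values that indicate the file was never edited.
PLACEHOLDERS = {"your_azure_api_key", "https://your_azure_endpoint"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key] in PLACEHOLDERS
    ]


def check_env_file(path: str = "scripts/.env") -> list[str]:
    """Read the .env file (assumed location) and report unset keys."""
    return missing_keys(parse_env(Path(path).read_text()))
```

Calling check_env_file() from the repository root returns an empty list when all three credentials are set.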

3. How to Run

Default (used in the paper)

python scripts/run_parallel.py

This uses:

Agentless localization
Pass-invert strategy
10 regeneration attempts
10 refinement attempts
LLM validation enabled
Planner enabled

Config is controlled in scripts/config.py.

4. Datasets

All datasets are in the datasets folder. These are the exact files used in our experiments:

SWT_Verified_Agentless_Test_Source_Skeleton.json (default for Verified)
SWT_Verified_Test_Source_Skeleton.json (perfect localization dataset)
SWT_Lite_Agentless_Test_Source_Skeleton.json (default for Lite)
SWT_Lite_Agentless_Unique_Only.json (default for the 188 unique Lite instances)

To switch datasets, change:

DATASET_PATH in scripts/config.py.
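As an illustration of switching datasets, the sketch below maps short labels to the four files listed above and builds the value you would assign to DATASET_PATH. The labels and the dataset_path helper are hypothetical and do not exist in the package; only the file names and the datasets folder come from this README.

```python
# Short labels are illustrative only; the file names are the ones listed above.
DATASETS = {
    "verified_agentless": "SWT_Verified_Agentless_Test_Source_Skeleton.json",
    "verified_perfect": "SWT_Verified_Test_Source_Skeleton.json",
    "lite_agentless": "SWT_Lite_Agentless_Test_Source_Skeleton.json",
    "lite_unique": "SWT_Lite_Agentless_Unique_Only.json",
}


def dataset_path(label: str, root: str = "datasets") -> str:
    """Return the relative path for a dataset label, or raise on a typo."""
    if label not in DATASETS:
        known = ", ".join(sorted(DATASETS))
        raise KeyError(f"unknown dataset {label!r}; expected one of: {known}")
    return f"{root}/{DATASETS[label]}"


# Value to paste into scripts/config.py as DATASET_PATH.
DATASET_PATH = dataset_path("verified_agentless")
```

Failing fast on an unknown label avoids starting a parallel run against a path that does not exist.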

5. Running Ablations

Regeneration Ablation (0 or 5 attempts)

Edit this line in scripts/config.py:

max_regeneration_retries = 1  # for no regenerations 
# or
max_regeneration_retries = 5  # for the 5 regeneration ablation

Then run:

python scripts/run_parallel.py
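If you switch between ablation settings often, the config edit above can be scripted. This is a rough sketch under one assumption: that scripts/config.py contains exactly one line of the form max_regeneration_retries = <integer>. The helper set_regeneration_retries is hypothetical and works on the file's text, so you decide when to read and write the file.

```python
import re

# Matches the assignment at the start of a line and captures everything up to the value.
_RETRIES = re.compile(r"^([ \t]*max_regeneration_retries[ \t]*=[ \t]*)\d+", re.MULTILINE)


def set_regeneration_retries(config_text: str, value: int) -> str:
    """Return config_text with max_regeneration_retries set to value."""
    new_text, count = _RETRIES.subn(lambda m: f"{m.group(1)}{value}", config_text)
    if count != 1:
        # Refuse to guess if the setting is missing or appears more than once.
        raise ValueError(f"expected exactly one assignment, found {count}")
    return new_text


example = "max_regeneration_retries = 10  # default\n"
updated = set_regeneration_retries(example, 5)
```

To apply it, read scripts/config.py with pathlib.Path.read_text, pass the text through the helper, and write the result back before running scripts/run_parallel.py.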

6. Running No Validation Ablation

python scripts/run_parallel_without_validation_ablation.py

7. Running No Planner Ablation

python scripts/run_parallel_without_planner_ablation.py

8. Perfect Localization

Change the dataset in scripts/config.py to:

DATASET_PATH = "datasets/SWT_Verified_Test_Source_Skeleton.json"

Then run the default script again:

python scripts/run_parallel.py

9. Generate Predictions

To generate preds.json from results:

python scripts/generate_preds_phases.py --results-dir results/

We also include our original prediction files in the preds_files folder for direct use.
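A quick sanity check on a predictions file before evaluation might look like the sketch below. It is not part of the package. The field names instance_id and model_patch are an assumption about the prediction format and should be confirmed against the SWT-Bench documentation; the helper names are hypothetical. The loader accepts either a JSON array or one JSON object per line, since the exact layout of preds.json is not stated here.

```python
import json

# Assumed field names; verify against the SWT-Bench documentation.
ASSUMED_FIELDS = ("instance_id", "model_patch")


def load_predictions(text: str) -> list[dict]:
    """Parse a JSON array, a single JSON object, or JSON Lines into a list."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def incomplete_predictions(preds: list[dict], fields=ASSUMED_FIELDS) -> list[int]:
    """Return the indices of records that lack any of the given fields."""
    return [i for i, pred in enumerate(preds) if any(f not in pred for f in fields)]
```

For example, incomplete_predictions(load_predictions(open("preds.json").read())) returns an empty list when every record carries the assumed fields.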

10. Evaluation Instructions

The previous steps produce predictions in SWT-Bench format. You can then evaluate them by following the SWT-Bench instructions: https://github.com/logic-star-ai/swt-bench

We also provide:

Full prediction outputs in preds_files/
Full results after evaluating on SWT-Bench for each reported run in evaluation_results_on_SWT_Bench/

Acknowledgment

This project uses components from the open-source test generator CoverUp, licensed under the Apache 2.0 License.
