PIG: Leveraging Large Language Models for Python Library Migrations

This repository contains the data and results for the paper "PIG: Leveraging Large Language Models for Python Library Migrations".

📄 Paper

PIG: Leveraging Large Language Models for Python Library Migrations

📦 Artifact Documentation

This repository includes additional documentation required for artifact evaluation:

  • REQUIREMENTS: Describes the hardware and software dependencies required to run the artifact, including the Docker environment and Python dependencies.
  • STATUS: Explains the badge(s) applied for and justifies why the artifact satisfies the criteria.
  • LICENSE: Specifies the distribution terms of this artifact.

Source Code: DOI

Running Via Docker

To run the code using Docker, follow these steps:

  1. Build the Docker image: Clone the repository and build the image from its root directory:
git clone https://github.com/kupl/pig_artifact.git
cd pig_artifact
docker build -t pig dockerfile/.
  2. Run the Docker container: After building the image, start it with:
docker run -it pig

This command will start the Docker container and open an interactive terminal session.


📊 Reproducing Tables

The following scripts reproduce the main tables reported in the paper. Each script computes its results from manually reviewed, per-model data stored in the rq1, rq2, and rq3 directories, respectively. The scripts read the data from these directories, perform the necessary calculations, and print the results to the terminal.

To reproduce each table, run:

# Table 1: Effectiveness
python results/effectiveness.py

# Figure 6: Ablation Study
python results/ablation.py

# Table 2: Data Leakage
python results/leakage.py

Note: results/ablation.py generates a bar chart and saves it as results/ablation_result.png. To view the image, copy it out of the container using docker cp.

docker cp <container_id>:/artifact_pig/results/ablation_result.png ./ablation_result.png

📉 Error Analysis (Fig. 2)

llm_answer/error_analysis.py reproduces the error type proportion chart (Fig. 2) using the manually labeled error data in llm_answer/error.json.

python llm_answer/error_analysis.py

Note: llm_answer/error_analysis.py generates a bar chart and saves it as llm_answer/error_proportions.png. To view the image, copy it out of the container using docker cp.

docker cp <container_id>:/artifact_pig/llm_answer/error_proportions.png ./error_proportions.png

πŸ” Limitations and Future Work (Section 4.3)

results/discussion.py analyzes the failure cases discussed in Section 4.3, using the manually labeled data in results/rq3/discussion.json.

python results/discussion.py

This script prints a breakdown of failure types and their proportions to the terminal.


How to execute the transplanting process

As described in the paper, Pig includes an LLM-based approach. Since LLM resources may not be available to everyone, we provide the LLM-generated code for each experiment run in llm_answer/ for reference. Based on this LLM-generated code, you can reproduce the transplanting process and results by running the code in src/synth/, as follows.

Usage

python src/synth/main.py [OPTIONS]

CLI Arguments

Argument       Type  Default  Description
--model        str   gemma    Model to use
--option       str   default  Execution option
--postprocess  bool  True     Enable post-processing
--gumtree      bool  True     Enable GumTree matching
--file         str   1.json   Target file to process

Available Models

Key       Model
llama     llama3.1-8b
gemma     gemma2-9b
qwen      qwen2-7b
deepseek  deepseek-r1-32b
gemma3    gemma3-27b
qwen3     qwen3-32b
gptoss    gpt-oss-20b

Options

  • default: the standard synthesis pipeline of Pig
  • +slicing: synthesis with program slicing only (no API candidates); see the example below
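
For example, to run the slicing-only variant (assuming the option string is passed verbatim to --option):

python src/synth/main.py --option +slicing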

Examples

# Show help
python src/synth/main.py --help

Basic Usage

Run with the default settings: model gemma, file 1.json (the unipath → pathlib migration), and all options enabled. This is the standard Pig pipeline:

python src/synth/main.py

Case Study: 177.json (requests → aiohttp migration)

177.json is a library migration case between requests and aiohttp. The following examples demonstrate how each option affects the synthesis outcome.

✅ Successful run (full pipeline, with GumTree matching and post-processing enabled):

python src/synth/main.py --model gptoss --file 177.json

❌ Failed run (both GumTree matching and post-processing disabled):

python src/synth/main.py --model gptoss --file 177.json --gumtree False --postprocess False
  • In this configuration, Pig's AST-based matching is entirely disabled.
  • As a result, the system cannot establish a correspondence between requests API calls and their aiohttp counterparts.
  • In particular, aiohttp introduces an asynchronous request pattern (e.g., async with, session.request), which cannot be reliably handled.
  • Without standard AST matching, the system fails to locate and transform the relevant request nodes, causing the synthesis process to break down (line 89).

❌ Failed run (post-processing disabled only):

python src/synth/main.py --model gptoss --file 177.json --postprocess False
  • In this case, Pig's AST matching is enabled, so the system successfully maps requests calls to aiohttp APIs.
  • However, aiohttp requires additional structural components, such as creating and managing a ClientSession, rather than issuing standalone request calls.
  • Without post-processing, the system cannot introduce or propagate these required auxiliary constructs (e.g., session initialization and usage).
  • Consequently, the generated code is incomplete and fails to execute properly (line 89).

These results indicate that both Pig's AST matching and post-processing are crucial for successful synthesis in this case. The final synthesized code is printed to the terminal for each run.
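
For reference, the hand-written sketch below (illustrative only, not Pig's output) contrasts a synchronous requests call with its aiohttp counterpart, showing the asynchronous pattern and the ClientSession construct discussed above:

import asyncio

import aiohttp
import requests

# Before migration: a standalone, synchronous requests call.
def fetch(url: str) -> str:
    resp = requests.get(url)
    return resp.text

# After migration: the call becomes asynchronous and must be wrapped in a
# managed ClientSession -- the auxiliary construct that post-processing
# introduces and propagates.
async def fetch_async(url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()

if __name__ == "__main__":
    print(fetch("https://example.com")[:60])
    print(asyncio.run(fetch_async("https://example.com"))[:60])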

How to run the LLM answering process

We currently support only ollama as the LLM backend; we plan to add support for more backends in the future. Before executing the LLM answering process, you should have an ollama server running with the models you want to use. If your server is not at the default http://localhost:11434, make sure to set the environment variable OLLAMA_HOST to point to it.
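
For example, to point the scripts at a non-default server (hypothetical hostname):

export OLLAMA_HOST=http://my-gpu-server:11434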

To run the LLM answering process, you can execute the following command:

python src/llm/mapping_llama.py [OPTIONS]

CLI Arguments

Argument       Type  Default                    Description
--output_path  str   llm_answer/your_path.xlsx  Path of the output spreadsheet
--model        str   llama3.1:8b                Model to use
--file         list  1.json                     Target files to process
--b_api        bool  True                       Enable API candidate information in the prompt (True/False)
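
For example, to query llama3.1:8b on 177.json without API candidate information (a hypothetical invocation composed from the arguments above):

python src/llm/mapping_llama.py --model llama3.1:8b --file 177.json --b_api False --output_path llm_answer/my_run.xlsx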

Extra Files for Reference

LLM Prompts and Pre-generated Outputs

The prompts used for querying the LLM are located in prompt/. This directory contains the prompt templates for all baselines, ablation settings, and Pig's pipeline.

We also provide pre-generated LLM outputs in llm_answer, allowing users to reproduce the results without requiring access to external LLM services.

How to Extend the Benchmark Data

📦 Adding a New Benchmark

Follow these steps to add a new migration benchmark (e.g., library A → library B):

1. Fill in src/sample/sample.json

Edit sample.json with the appropriate values for your migration target:

{
    "libo": "unipath",
    "libn": "pathlib",
    "libo_path": "Unipath-master/unipath",
    "libn_path": "pathlib.py",
    "codeo": "src/sample/codeo.py",
    "apios": [
        "Path",
        "parent"
    ],
    "signos": {
        "Path": {
            "args": ["*args", "**kwargs"]
        },
        "parent": {
            "args": []
        }
    },
    "model": "gpt-oss:20b"
}
Field      Description
libo       Source library name
libn       Target library name
libo_path  Path to the source library implementation
libn_path  Path to the target library implementation
codeo      Path to the source code file to migrate
apios      List of API names to migrate
signos     Signature of each API (argument names)
model      LLM model to use for migration

2. Prepare the source code to migrate

Save the code you want to migrate as:

src/sample/codeo.py
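
For instance, a minimal codeo.py matching the sample.json above (exercising the Path and parent APIs listed in apios) might look like this; it is an illustrative example, not part of the artifact:

# Minimal code to migrate, using the unipath APIs listed in apios above.
from unipath import Path

config = Path("/etc/myapp/config.ini")
print(config.parent)  # the directory containing the file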

3. Add library repositories

Place both the source and target library implementations under src/mapping/repos/:

src/mapping/repos/
├── unipath/
└── pathlib.py

4. Run the migration

python src/main.py src/sample/sample.json

Note: If an LLM API is not available, the pipeline will fall back to a predefined sample output instead of a real model response. This allows you to run and test the full pipeline without requiring API access.
