Task Requirement Document

Project Title:

SWE-Python-AI: A SWE-Bench-like benchmark for AI-related tasks in Python

Document Version:

Version 1.0
Date: 5 Nov 2024

Prepared By:

Duc-Manh Tran


1. Objective

SWE-Bench requires that PyTest can be run against each of its task instances (data points), which means we need to specify the installation steps for every version of each repository.

Example:
For a task instance:

{
    "repo": "keras-team/keras",
    "pull_number": 20410,
    "instance_id": "keras-team__keras-20410",
    "issue_numbers": ["19740"],
    "base_commit": "0c2bdff313f7533f0d7e6670a906102cc2fb046d",
    "patch": "a very long string",
    "test_patch": "a very long string",
    "hints_text": "a very long string",
    "created_at": "2024-10-25T16:23:31Z"
}

we need to identify the version of the repository associated with the base_commit, as well as the FAIL-TO-PASS and PASS-TO-PASS test cases of this instance.
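One possible way to approach the version lookup is sketched below. It is an illustration under assumptions, not the project's versioning script: the helper names `read_file_at_commit` and `extract_version` are hypothetical, and so is the idea that the repository records its version as a `__version__ = "..."` assignment in some file. `git show <commit>:<path>` reads a file as it existed at the base_commit without checking that commit out.

```python
import re
import subprocess
from typing import Optional

# Assumption: the version is stored as  __version__ = "X.Y.Z"  in a source file.
VERSION_PATTERN = re.compile(r"""__version__\s*=\s*["']([^"']+)["']""")


def read_file_at_commit(repo_dir: str, commit: str, path: str) -> str:
    """Return the content of `path` as it existed at `commit` (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{commit}:{path}"],
        capture_output=True,
        text=True,
        check=True,  # raise if the commit or path does not exist
    )
    return result.stdout


def extract_version(file_text: str) -> Optional[str]:
    """Return the version string found in `file_text`, or None if absent."""
    match = VERSION_PATTERN.search(file_text)
    return match.group(1) if match else None
```

A call such as `extract_version(read_file_at_commit(repo_dir, base_commit, version_file))` would then give the version for one task instance, where `version_file` has to be looked up per repository.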

Regarding versioning, for the repository keras-team/keras at version v3.6.0 (assuming this is the version associated with commit 0c2bdff313f7533f0d7e6670a906102cc2fb046d), we need to identify the following information:

{
    "python": "3.9",
    "packages": "requirements.txt",
    "install": "pip install -e .",
    "pip_packages": ["pytest"],
    "pytest_cmd": "pytest keras/src/applications"
}
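To make the role of each field concrete, here is a minimal sketch of how such a specification could be turned into an ordered list of commands. The function name `build_commands` is hypothetical, the sketch assumes that "packages" names a pip requirements file, and the ordering of the steps is an assumption rather than something this document prescribes. Selecting the interpreter named in "python" is left to whatever environment tool is used.

```python
import shlex
from typing import Dict, List


def build_commands(spec: Dict) -> List[List[str]]:
    """Translate one version specification into argv-style commands."""
    commands: List[List[str]] = []

    # Assumption: "packages" is the name of a pip requirements file.
    if spec.get("packages"):
        commands.append(["python", "-m", "pip", "install", "-r", spec["packages"]])

    # Extra packages needed only for testing, such as pytest.
    if spec.get("pip_packages"):
        commands.append(["python", "-m", "pip", "install", *spec["pip_packages"]])

    # Install the repository itself, then run the tests.
    commands.append(shlex.split(spec["install"]))
    commands.append(shlex.split(spec["pytest_cmd"]))
    return commands


spec = {
    "python": "3.9",
    "packages": "requirements.txt",
    "install": "pip install -e .",
    "pip_packages": ["pytest"],
    "pytest_cmd": "pytest keras/src/applications",
}
for command in build_commands(spec):
    print(" ".join(command))
```

Returning argument lists instead of shell strings keeps the commands usable with `subprocess.run` without invoking a shell.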

To identify the FAIL-TO-PASS and PASS-TO-PASS test cases of this instance, we need to manually compare the PyTest output from before and after the patch is applied.
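The comparison could be partly automated. The sketch below assumes two PyTest runs made with the `-rA` flag, one before and one after the patch, and parses the short summary lines that flag prints. The names `parse_statuses` and `classify` are hypothetical, and treating a test that is missing from the first run as FAIL-TO-PASS is an assumption of this sketch.

```python
import re
from typing import Dict, List, Tuple

# Matches `pytest -rA` summary lines such as:
#   PASSED tests/test_a.py::test_x
#   FAILED tests/test_a.py::test_y - AssertionError: ...
# Limitation: test IDs containing spaces are cut at the first space.
SUMMARY_LINE = re.compile(r"^(PASSED|FAILED|ERROR)\s+(\S+)")


def parse_statuses(output: str) -> Dict[str, str]:
    """Map each test ID found in the summary to its status."""
    statuses: Dict[str, str] = {}
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            statuses[match.group(2)] = match.group(1)
    return statuses


def classify(
    before: Dict[str, str], after: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (fail_to_pass, pass_to_pass) lists of test IDs."""
    fail_to_pass: List[str] = []
    pass_to_pass: List[str] = []
    for test_id, status in after.items():
        if status != "PASSED":
            continue  # only tests that pass after the patch are of interest
        if before.get(test_id) == "PASSED":
            pass_to_pass.append(test_id)
        else:
            # Failed, errored, or absent before the patch (assumption).
            fail_to_pass.append(test_id)
    return sorted(fail_to_pass), sorted(pass_to_pass)
```

The results would still need a manual check, since flaky tests or collection errors can make a single pair of runs misleading.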

2. Tasks & Responsibilities

Task ID | Task Description | Deadline | Notes
--- | --- | --- | ---
T1 | Versioning | [Due Date] | Run the versioning scripts while looking for a better approach
T2 | Run PyTest | [Due Date] | Read each repository's documentation to specify the information for each of its versions, and create a script to validate that information
T3 | Compare PyTest's outputs | [Due Date] | Run PyTest with the information specified above on the task instances to identify FAIL-TO-PASS and PASS-TO-PASS test cases

4. Deliverables

List the specific outcomes or products expected from the project, along with due dates.

Deliverable | Description | Due Date | Notes
--- | --- | --- | ---
Repository's Specification | Complete this file | [Date] |
F2P & P2P | Identify FAIL-TO-PASS and PASS-TO-PASS test cases for every task instance | [Date] |
Versioning | A better approach for finding the version of a given commit | [Date] |

5. Timeline and Milestones

Provide an overview of the project timeline, including key milestones.

  • Kick-off Meeting: [Date]

6. Resource Requirements

  • Knowledge: Python, PyTest, basic Linux
  • Software/Tools: Kaggle

7. Communication Plan

  • Weekly Meetings: Every Monday
  • Status Reports: Due every Thursday