Merged
8 changes: 8 additions & 0 deletions .gitignore
@@ -229,3 +229,11 @@ scratch/pr_runs/qiskit_10651.md
scratch/pr_runs/qiskit_12869_final.md
scratch/pr_runs/qiskit_12869.md
.aider*

# Dataset artifacts (kept locally, not tracked)
dataset/

# Scratch artifacts (kept locally, not tracked)
scratch/example-repo/
scratch/merged_context_registry_*.json
scratch/artifacts/raw/online_dashboards.jsonl
43 changes: 43 additions & 0 deletions AGENTS.md
@@ -0,0 +1,43 @@
# Repository Guidelines

This guide helps contributors work effectively in the datasmith repository.

## Project Structure & Module Organization
- Source: `src/datasmith/` — core modules: `agents/`, `docker/`, `scrape/`, `benchmark/`, `detection/`, `execution/`, `collation/`, `core/`.
- Tests: `tests/` — pytest suites (e.g., `tests/test_docker_*`, `tests/agents/`).
- Assets/Docs: `static/`, `docs/`.
- Artifacts: `scratch/` (generated data), `dist/` (wheels). Do not commit their contents.

## Build, Test, and Development Commands
- `make install` — create the virtual environment with uv and install the pre-commit hooks.
- `make check` — lock check, ruff lint/format, mypy, deptry.
- `make test` — run pytest with coverage (XML for CI/Codecov).
- `make build` — build wheel into `dist/`.
- `uv run <cmd>` — run tools inside the project environment with the same environment variables as the user (e.g., `uv run pytest`).
- `uvx tox -q` — run the tox matrix (py39–py312) if tox is installed.
- Optional: `make backup` runs an rsync backup to the `BACKUP_DIR` set in `tokens.env`.

## Coding Style & Naming Conventions
- Python 3.9–3.12. 4‑space indentation, type hints required (mypy strict; see `pyproject.toml`).
- Lint/format via Ruff (line length 120). Run `make check` before pushing.
- Naming: modules/functions `snake_case`, classes `CamelCase`, constants `UPPER_SNAKE_CASE`.
- Prefer `logging` (see `src/datasmith/logging_config.py`) over `print()` calls.
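The style rules above are illustrated by the short module sketch below. `ImageRef`, `build_image_ref`, and `DEFAULT_TAG` are hypothetical names. The sketch uses only the standard `logging.getLogger(__name__)` pattern. It does not show how `src/datasmith/logging_config.py` configures handlers.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

# Module-level logger named after the module; handlers and levels are configured elsewhere.
logger = logging.getLogger(__name__)

# Constants: UPPER_SNAKE_CASE.
DEFAULT_TAG = "final"


@dataclass(frozen=True)
class ImageRef:  # Classes: CamelCase (hypothetical example type).
    name: str
    tag: str = DEFAULT_TAG

    def __str__(self) -> str:
        return f"{self.name}:{self.tag}"


def build_image_ref(name: str, tag: str = DEFAULT_TAG) -> ImageRef:
    """Functions: snake_case with full type hints (hypothetical example)."""
    ref = ImageRef(name=name, tag=tag)
    # Log via the module logger rather than print(); %-style args defer formatting.
    logger.debug("built image reference %s", ref)
    return ref
```

Passing `ref` as an argument to `logger.debug` avoids pre-formatting the string. The message is only rendered if DEBUG output is enabled.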

## Testing Guidelines
- Framework: pytest + pytest‑cov. Place tests in `tests/` named `test_*.py`.
- Run locally: `make test` or `uv run pytest`.
- Coverage: Codecov target 90% (see `codecov.yaml`). Add tests for new code paths.
- Tests must be deterministic and offline; use fakes for network calls.
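One way to keep tests offline is to inject the network client, so a test can pass a deterministic fake. The sketch below assumes that approach. `count_open_issues`, `FakeFetcher`, and the `.invalid` URL are all hypothetical. pytest's built-in `monkeypatch` fixture is an alternative when the dependency cannot be injected.

```python
from __future__ import annotations

from typing import Callable

# A function that fetches a URL and returns the response body as text.
Fetcher = Callable[[str], str]


def count_open_issues(repo: str, fetch: Fetcher) -> int:
    """Hypothetical helper: count non-blank lines in a (pretend) issue listing."""
    body = fetch(f"https://api.example.invalid/repos/{repo}/issues")
    return sum(1 for line in body.splitlines() if line.strip())


class FakeFetcher:
    """Offline stand-in that returns canned responses and records requested URLs."""

    def __init__(self, responses: dict[str, str]) -> None:
        self.responses = responses
        self.calls: list[str] = []

    def __call__(self, url: str) -> str:
        self.calls.append(url)
        return self.responses[url]


def test_count_open_issues_uses_fake() -> None:
    url = "https://api.example.invalid/repos/owner/repo/issues"
    fake = FakeFetcher({url: "issue 1\nissue 2\n\n"})
    assert count_open_issues("owner/repo", fake) == 2
    assert fake.calls == [url]
```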

## Commit & Pull Request Guidelines
- History is informal; please use clear, present‑tense summaries, optionally prefixing a subsystem tag: `docker: prune dangling layers`, `agents: improve build plan`.
- PRs must include a description, rationale, test coverage notes, and any docs updates. Link related issues. For CLI/UX changes, include sample output or screenshots.
- Ensure `make check` and `make test` pass; CI should be green.

## Security & Configuration Tips
- Create `tokens.env` (git-ignored) for `GH_TOKEN`, `CODECOV_TOKEN`, `CACHE_LOCATION`, `BACKUP_DIR`. Never commit secrets.
- Docker tooling lives in `src/datasmith/docker/`; validate Docker changes locally before launching remote runs.
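The sketch below reads the keys listed above without hard-coding any values. It assumes `tokens.env` is a plain `KEY=VALUE` file, which this guide does not confirm. `parse_tokens_env`, `load_tokens_env`, and `missing_keys` are hypothetical helpers, not part of datasmith.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping

# Keys named in the guidelines; their values must never be committed.
EXPECTED_KEYS = ("GH_TOKEN", "CODECOV_TOKEN", "CACHE_LOCATION", "BACKUP_DIR")


def parse_tokens_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    Assumes a plain dotenv-style file: quoting and `export` prefixes are not handled.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_tokens_env(path: Path = Path("tokens.env")) -> dict[str, str]:
    """Read and parse tokens.env from the repo root (hypothetical helper)."""
    return parse_tokens_env(path.read_text(encoding="utf-8"))


def missing_keys(values: Mapping[str, str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return expected keys set neither in the file nor in the process environment."""
    return [key for key in EXPECTED_KEYS if not values.get(key) and not env.get(key)]
```

The helpers only report which keys are missing and never print values, so secrets stay out of logs.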

## Agent‑Specific Instructions
- Keep changes small and focused, and update or add tests for the code you touch. Follow this guide for all files under the repo root.
57 changes: 57 additions & 0 deletions DATASET_CARD.md
@@ -0,0 +1,57 @@
![banner](https://raw.githubusercontent.com/formula-code/datasmith/main/static/formula-code-datasmith.svg)


<p align="center">
<a href="https://formula-code.github.io/">
<img src="https://img.shields.io/badge/%F0%9F%8C%90%20Website-0A7A5E?style=for-the-badge" alt="FormulaCode Website">
</a>
<a href="https://example.com">
<img src="https://img.shields.io/badge/Paper-1F6FEB?style=for-the-badge&logo=arxiv&logoColor=white" alt="FormulaCode Paper">
</a>
<a href="https://formula-code.github.io/leaderboard/">
<img src="https://img.shields.io/badge/%F0%9F%93%88%20Leaderboard-EA580C?style=for-the-badge&logoColor=white" alt="FormulaCode Leaderboard">
</a>
</p>

[FormulaCode](https://formula-code.github.io/) is a *live* benchmark for evaluating the holistic ability of LLM agents to optimize codebases. FormulaCode consists of two parts: a [pipeline](https://github.com/formula-code/datasmith) to construct performance optimization tasks, and an [execution harness](https://github.com/formula-code/terminal-bench) that connects a language model to our terminal sandbox.

This dataset contains **{total_rows}** enriched performance optimization tasks derived from real open-source Python projects, spanning {num_months} months of merged PRs.

## Quick Start

```python
from datasets import load_dataset

# Load all tasks
ds = load_dataset("formulacode/formulacode-all")

# Load only verified tasks (human-validated)
ds = load_dataset("formulacode/formulacode-all", "verified")

# Load tasks from a specific month
ds = load_dataset("formulacode/formulacode-all", "2024-07")
```

## Configs

| Config | Description | Tasks |
|--------|-------------|-------|
| `default` | All tasks | {total_rows} |
{verified_row}
| `YYYY-MM` | Tasks by PR merge month ({num_months} months available) | varies |
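The `YYYY-MM` configs group tasks by PR merge month. The sketch below computes that key from `pr_merged_at`. It assumes the column holds either an ISO 8601 string or a `datetime`, and that months are bucketed by the timestamp's own offset without converting to UTC. The real pipeline may bucket differently. `month_config` and `tasks_per_month` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime
from typing import Any, Iterable, Mapping, Union


def month_config(merged_at: Union[str, datetime]) -> str:
    """Map a PR merge time to a `YYYY-MM` config name (assumed keying)."""
    if isinstance(merged_at, str):
        # Before Python 3.11, fromisoformat() rejects a trailing "Z", so normalize it.
        merged_at = datetime.fromisoformat(merged_at.replace("Z", "+00:00"))
    return merged_at.strftime("%Y-%m")


def tasks_per_month(rows: Iterable[Mapping[str, Any]]) -> Counter[str]:
    """Count tasks per merge month, i.e. the expected size of each month config."""
    return Counter(month_config(row["pr_merged_at"]) for row in rows)
```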

## Key Columns

| Column | Description |
|--------|-------------|
| `task_id` | Unique task identifier (e.g. `pandas-dev_pandas_1`) |
| `repo_name` | Source repository (e.g. `pandas-dev/pandas`) |
| `container_name` | Docker container reference (`<owner>-<repo>-<sha>:final`) |
| `image_name` | Full Docker Hub image reference |
| `difficulty` | Normalized difficulty: `easy`, `medium`, `hard` |
| `classification` | Optimization type (e.g. `use_better_algorithm`, `micro_optimizations`) |
| `patch` | Ground truth performance improvement patch |
| `final_md` | Task instructions in markdown |
| `pr_merged_at` | Date the PR was merged |
| `pr_merge_commit_sha` | Merge commit SHA |
| `pr_base_sha` | Base commit SHA |
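The columns above support simple row-level filtering. The sketch below uses plain dict rows so it runs offline. A predicate like `is_hard_algorithmic` should also work with `Dataset.filter` / `DatasetDict.filter` on the result of `load_dataset`. Both function names are hypothetical. `split_container_name` assumes the `<owner>-<repo>-<sha>:final` format described in the table. For owner and repo, prefer the `repo_name` column.

```python
from __future__ import annotations

from typing import Any, Mapping, Tuple


def is_hard_algorithmic(row: Mapping[str, Any]) -> bool:
    """Keep hard tasks whose optimization type is `use_better_algorithm`."""
    return row["difficulty"] == "hard" and row["classification"] == "use_better_algorithm"


def split_container_name(container_name: str) -> Tuple[str, str, str]:
    """Split `<owner>-<repo>-<sha>:final` into (`<owner>-<repo>`, sha, tag).

    Owner and repo names can contain hyphens themselves, so only the last
    hyphen is treated as the separator before the SHA; the owner/repo
    boundary cannot be recovered from this string alone.
    """
    name, _, tag = container_name.rpartition(":")
    prefix, _, sha = name.rpartition("-")
    return prefix, sha, tag
```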