formula-code
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 43 additions & 0 deletions
diff --git a/DATASET_CARD.md b/DATASET_CARD.md
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Repository Guidelines
+
+This guide helps contributors work effectively in the datasmith repository.
+
+## Project Structure & Module Organization
+- Source: `src/datasmith/` — core modules: `agents/`, `docker/`, `scrape/`, `benchmark/`, `detection/`, `execution/`, `collation/`, `core/`.
+- Tests: `tests/` — pytest suites (e.g., `tests/test_docker_*`, `tests/agents/`).
+- Assets/Docs: `static/`, `docs/`.
+- Artifacts: `scratch/` (generated data), `dist/` (wheels). Do not commit the contents of either directory.
+
+## Build, Test, and Development Commands
+- `make install` — create env with uv and install pre-commit.
+- `make check` — lock check, ruff lint/format, mypy, deptry.
+- `make test` — run pytest with coverage (XML for CI/Codecov).
+- `make build` — build wheel into `dist/`.
+- `uv run <cmd>` — run tools inside the env (e.g., `uv run pytest`).
+- `uvx tox -q` — run the tox matrix (py39–py312) if tox is installed.
+- Optional: `make backup` uses `tokens.env` for `BACKUP_DIR` rsync.
+- To run commands using the same environment variables as the user, use `uv run <command>`.
+
+## Coding Style & Naming Conventions
+- Python 3.9–3.12. 4‑space indentation, type hints required (mypy strict; see `pyproject.toml`).
+- Lint/format via Ruff (line length 120). Run `make check` before pushing.
+- Naming: modules/functions `snake_case`, classes `CamelCase`, constants `UPPER_SNAKE_CASE`.
+- Prefer `logging` (see `src/datasmith/logging_config.py`) over prints.
+
+## Testing Guidelines
+- Framework: pytest + pytest‑cov. Place tests in `tests/` named `test_*.py`.
+- Run locally: `make test` or `uv run pytest`.
+- Coverage: Codecov target 90% (see `codecov.yaml`). Add tests for new code paths.
+- Tests must be deterministic and offline; use fakes for network calls.
+
+## Commit & Pull Request Guidelines
+- History is informal; please use clear, present‑tense summaries, optionally prefixing a subsystem tag: `docker: prune dangling layers`, `agents: improve build plan`.
+- PRs must include: description, rationale, test coverage notes, and any docs updates. Link issues. For CLI/UX changes, include sample output or screenshots.
+- Ensure `make check` and `make test` pass; CI should be green.
+
+## Security & Configuration Tips
+- Create `tokens.env` (ignored) for `GH_TOKEN`, `CODECOV_TOKEN`, `CACHE_LOCATION`, `BACKUP_DIR`. Never commit secrets.
+- Docker tooling lives in `src/datasmith/docker/`; validate changes locally before triggering remote runs.
+
+## Agent‑Specific Instructions
+- Keep changes small and focused; update/cover adjacent tests. Follow this guide for all files under the repo root.
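The Testing Guidelines ask for deterministic, offline tests that use fakes for network calls. One way to meet that is to pass the network-facing function in as a parameter so a test can swap in canned data. The sketch below is illustrative only: `count_merged_prs`, `fetch_json`, and `fake_fetch_json` are hypothetical names and are not taken from the datasmith codebase.

```python
from typing import Any, Callable, Dict, List

JsonList = List[Dict[str, Any]]


def count_merged_prs(repo: str, fetch_json: Callable[[str], JsonList]) -> int:
    """Count merged PRs; all network access is delegated to `fetch_json`.

    Hypothetical helper for illustration, not part of datasmith.
    """
    url = f"https://api.github.com/repos/{repo}/pulls?state=closed"
    # A closed PR counts as merged only when `merged_at` is set.
    return sum(1 for pr in fetch_json(url) if pr.get("merged_at"))


def fake_fetch_json(url: str) -> JsonList:
    """Deterministic stand-in for the network call; returns canned data."""
    return [
        {"number": 1, "merged_at": "2024-07-01T00:00:00Z"},
        {"number": 2, "merged_at": None},
    ]


def test_count_merged_prs_is_offline() -> None:
    # No sockets are opened: the fake replaces the real fetcher.
    assert count_merged_prs("pandas-dev/pandas", fake_fetch_json) == 1
```

Saved as `tests/test_example.py`, this would be collected by `uv run pytest` under the `test_*.py` naming rule stated above.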
@@ -0,0 +1,57 @@
+![banner](static/formula-code-datasmith.svg)
+
+
+<p align="center">
+  <a href="https://formula-code.github.io/">
+    <img src="https://img.shields.io/badge/%F0%9F%8C%90%20Website-0A7A5E?style=for-the-badge" alt="FormulaCode Website">
+  </a>
+  <a href="https://example.com">
+    <img src="https://img.shields.io/badge/Paper-1F6FEB?style=for-the-badge&logo=arxiv&logoColor=white" alt="FormulaCode Paper">
+  </a>
+  <a href="https://formula-code.github.io/leaderboard/">
+    <img src="https://img.shields.io/badge/%F0%9F%93%88%20Leaderboard-EA580C?style=for-the-badge&logoColor=white" alt="FormulaCode Leaderboard">
+  </a>
+</p>
+
+[FormulaCode](https://formula-code.github.io/) is a *live* benchmark for evaluating the holistic ability of LLM agents to optimize codebases. FormulaCode consists of two parts: a [pipeline](https://github.com/formula-code/datasmith) to construct performance optimization tasks, and an [execution harness](https://github.com/formula-code/terminal-bench) that connects a language model to our terminal sandbox.
+
+This dataset contains **{total_rows}** enriched performance optimization tasks derived from real open-source Python projects, spanning {num_months} months of merged PRs.
+
+## Quick Start
+
+```python
+from datasets import load_dataset
+
+# Load all tasks (the `default` config)
+ds_all = load_dataset("formulacode/formulacode-all")
+
+# Load only verified tasks (human-validated)
+ds_verified = load_dataset("formulacode/formulacode-all", "verified")
+
+# Load tasks from a specific PR merge month (config name is `YYYY-MM`)
+ds_2024_07 = load_dataset("formulacode/formulacode-all", "2024-07")
+```
+
+## Configs
+
+| Config | Description | Tasks |
+|--------|-------------|-------|
+| `default` | All tasks | {total_rows} |
+{verified_row}
+| `YYYY-MM` | Tasks by PR merge month ({num_months} months available) | varies |
+
+## Key Columns
+
+| Column | Description |
+|--------|-------------|
+| `task_id` | Unique task identifier (e.g. `pandas-dev_pandas_1`) |
+| `repo_name` | Source repository (e.g. `pandas-dev/pandas`) |
+| `container_name` | Docker container reference (`<owner>-<repo>-<sha>:final`) |
+| `image_name` | Full Docker Hub image reference |
+| `difficulty` | Normalized difficulty: `easy`, `medium`, `hard` |
+| `classification` | Optimization type (e.g. `use_better_algorithm`, `micro_optimizations`) |
+| `patch` | Ground truth performance improvement patch |
+| `final_md` | Task instructions in markdown |
+| `pr_merged_at` | Date the PR was merged |
+| `pr_merge_commit_sha` | Merge commit SHA |
+| `pr_base_sha` | Base commit SHA |
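As a rough illustration of how the columns above might be used, the sketch below filters task records by `difficulty` and `classification` and counts the matches per `repo_name`. The rows are made-up placeholders that only mimic the documented column names, and `select_tasks` is a hypothetical helper, not part of the dataset tooling.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical rows mimicking a few documented columns; real values will differ.
ROWS: List[Dict[str, str]] = [
    {"task_id": "pandas-dev_pandas_1", "repo_name": "pandas-dev/pandas",
     "difficulty": "easy", "classification": "micro_optimizations"},
    {"task_id": "pandas-dev_pandas_2", "repo_name": "pandas-dev/pandas",
     "difficulty": "hard", "classification": "use_better_algorithm"},
    {"task_id": "example-org_example_1", "repo_name": "example-org/example",
     "difficulty": "hard", "classification": "micro_optimizations"},
]


def select_tasks(
    rows: List[Dict[str, str]],
    difficulty: Optional[str] = None,
    classification: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Return the rows that match every filter that is not None."""
    return [
        row for row in rows
        if (difficulty is None or row["difficulty"] == difficulty)
        and (classification is None or row["classification"] == classification)
    ]


hard_tasks = select_tasks(ROWS, difficulty="hard")
hard_per_repo = Counter(row["repo_name"] for row in hard_tasks)
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter` on a loaded split instead of a list comprehension.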