This guide helps contributors work effectively in the datasmith repository.

- Source: `src/datasmith/`, with core modules `agents/`, `docker/`, `scrape/`, `benchmark/`, `detection/`, `execution/`, `collation/`, and `core/`.
- Tests: `tests/`, containing pytest suites (e.g., `tests/test_docker_*`, `tests/agents/`).
- Assets/Docs: `static/`, `docs/`.
- Artifacts: `scratch/` (generated data) and `dist/` (wheels). Do not commit their contents.

- `make install`: create the environment with uv and install pre-commit.
- `make check`: lock-file check, Ruff lint/format, mypy, and deptry.
- `make test`: run pytest with coverage (XML report for CI/Codecov).
- `make build`: build the wheel into `dist/`.
- `uv run <cmd>`: run a tool inside the environment (e.g., `uv run pytest`).
- `uvx tox -q`: run the tox matrix (py39–py312), if tox is installed.
- Optional: `make backup` uses `BACKUP_DIR` from `tokens.env` for its rsync.
- To run commands with the same environment variables as the user, use `uv run <command>`.

- Python 3.9–3.12, 4-space indentation, type hints required (mypy strict; see `pyproject.toml`).
- Lint and format with Ruff (line length 120). Run `make check` before pushing.
- Naming: modules/functions `snake_case`, classes `CamelCase`, constants `UPPER_SNAKE_CASE`.
- Prefer `logging` (see `src/datasmith/logging_config.py`) over `print`.
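
The conventions above can be illustrated with a short, hypothetical module. `BuildPlan`, `summarize_plan`, and `SUMMARY_SEPARATOR` are made-up names, not part of datasmith, and the sketch only uses the standard-library `logging` API; it assumes handler and format setup happens elsewhere (for example in `logging_config.py`, whose API is not described here).

```python
"""Hypothetical module illustrating the naming, typing, and logging conventions."""

from __future__ import annotations

import logging

# Module-level logger; handlers/formatting are assumed to be configured elsewhere.
logger = logging.getLogger(__name__)

# Constants use UPPER_SNAKE_CASE.
SUMMARY_SEPARATOR = ", "


class BuildPlan:
    """Classes use CamelCase. This class is hypothetical."""

    def __init__(self, steps: list[str]) -> None:
        self.steps = steps


def summarize_plan(plan: BuildPlan) -> str:
    """Functions use snake_case and carry full type hints so mypy strict passes."""
    summary = SUMMARY_SEPARATOR.join(plan.steps)
    # Log instead of print; lazy %-formatting defers string building to the logger.
    logger.info("Build plan has %d step(s): %s", len(plan.steps), summary)
    return summary
```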

- Framework: pytest + pytest-cov. Place tests in `tests/`, named `test_*.py`.
- Run locally: `make test` or `uv run pytest`.
- Coverage: the Codecov target is 90% (see `codecov.yaml`). Add tests for new code paths.
- Tests must be deterministic and offline; use fakes for network calls.
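
One way to keep a test offline and deterministic is to pass the network dependency in as a parameter and hand the test an in-memory fake. This is a minimal sketch: `count_lines_at` and `FakeFetcher` are hypothetical names, and dependency injection is only one option (pytest's built-in `monkeypatch` fixture is another).

```python
"""Hypothetical example of an offline, deterministic pytest test using a fake."""

from __future__ import annotations

from typing import Callable

# Anything that maps a URL to response text.
Fetcher = Callable[[str], str]


def count_lines_at(url: str, fetch: Fetcher) -> int:
    """Code under test: the fetcher is injected so tests can supply a fake."""
    return len(fetch(url).splitlines())


class FakeFetcher:
    """In-memory stand-in for a network client; it never touches the network."""

    def __init__(self, responses: dict[str, str]) -> None:
        self.responses = responses
        self.calls: list[str] = []

    def __call__(self, url: str) -> str:
        # Record the call so the test can assert on how the code used it.
        self.calls.append(url)
        return self.responses[url]


def test_count_lines_at_uses_fake() -> None:
    url = "https://example.invalid/data.txt"
    fake = FakeFetcher({url: "a\nb\nc\n"})
    assert count_lines_at(url, fake) == 3
    assert fake.calls == [url]
```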

- History is informal; use clear, present-tense summaries, optionally prefixed with a subsystem tag, e.g. `docker: prune dangling layers`, `agents: improve build plan`.
- PRs must include a description, rationale, test coverage notes, and any docs updates. Link related issues. For CLI/UX changes, include sample output or screenshots.

- Ensure `make check` and `make test` pass; CI should be green.

- Create `tokens.env` (git-ignored) for `GH_TOKEN`, `CODECOV_TOKEN`, `CACHE_LOCATION`, and `BACKUP_DIR`. Never commit secrets.
- Docker tooling lives in `src/datasmith/docker/`; validate changes locally before pushing remote runs.
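
To keep secrets such as `GH_TOKEN` out of the codebase, code can read them from the environment and fail with a clear message when they are missing. This is a hedged sketch: `require_env` is a hypothetical helper, and it assumes the values from `tokens.env` have already been exported into the process environment by whatever mechanism the project uses, which this guide does not specify.

```python
"""Hypothetical helper: read secrets from the environment instead of hard-coding them."""

from __future__ import annotations

import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of `name`, raising (without echoing any secret) if it is unset or empty."""
    # Default to the real process environment; tests can pass a plain dict instead.
    source: Mapping[str, str] = os.environ if environ is None else environ
    value = source.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to tokens.env (never commit it)")
    return value
```

Accepting an optional mapping keeps the helper testable offline, in line with the testing guidelines.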

- Keep changes small and focused; update or add the tests adjacent to the code you change. Follow this guide for all files under the repository root.