diff --git a/.loom/tickets/20260421-61arvhtv-taskfile-trim.md b/.loom/tickets/20260421-61arvhtv-taskfile-trim.md index 782721f..0c0194d 100644 --- a/.loom/tickets/20260421-61arvhtv-taskfile-trim.md +++ b/.loom/tickets/20260421-61arvhtv-taskfile-trim.md @@ -1,9 +1,9 @@ --- id: ticket:taskfile-trim kind: ticket -status: ready +status: closed created_at: 2026-04-21T00:00:00Z -updated_at: 2026-04-21T00:00:00Z +updated_at: 2026-04-21T20:00:00Z scope: kind: workspace links: @@ -74,3 +74,13 @@ A forker reading a ≤100-line Taskfile sees: "these are the composed workflows - Remaining work: classify the ~12 targets still present, delete the pure wrappers, push remainder into `docs/commands.md`. This ticket stays `ready` — strict acceptance not met; real outstanding scaffold-polish deliverable. + +# Close Notes + +- `wc -l Taskfile.yaml` → **90** (target ≤100, was 163, was 224 originally). +- Dropped pure wrappers: `setup` (folded into `install`), `staging:generate`, `staging:check`, `check:layout`, `default`. Their underlying commands added to `docs/commands.md`. +- Remaining 16 targets: all have `desc:` and are either compose, env-setting, or forker-facing defaults (init, new-source). +- `task --list` output: 16 targets, readable. +- `task ci` invokes cleanly (install → ruff → ruff format → mypy → pytest → secret scan → staging drift); mypy's 7 pre-existing errors in `scripts/bootstrap.py`, `app/main.py`, `scripts/smoke.py` are out of scope for this ticket (CI workflow treats typecheck as `continue-on-error` per ticket:ci-github-actions residual). +- Updated references: `CLAUDE.md` Task block drops `task setup`, swaps `task check:layout`/`task staging:generate` to direct script calls. `docs/commands.md` gains a Source layout + staging codegen section. +- README Quickstart untouched — all commands it references (`task install`, `task full-refresh`, `task dagster:dev`, `task plan:dev`, `task verify:dev`, `task plan:prod`, `task streamlit`, `task init`, `task verify`, `task ci`) still present. diff --git a/CLAUDE.md b/CLAUDE.md index 09c789d..b7ab7cc 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -65,8 +65,7 @@ databox status # Show pipeline status & freshness ### Task ```bash -task setup # Create .venv + bootstrap .env -task install # uv sync + pre-commit hook install +task install # Bootstrap .venv + copy .env + uv sync + pre-commit task full-refresh # Dagster: all dlt + SQLMesh + Soda task verify # Smoke full-refresh (DATABOX_SMOKE=1) task ci # Ruff + mypy + pytest + secret scan @@ -98,7 +97,7 @@ Raw SQLMesh / Dagster / pytest invocations: [docs/commands.md](docs/commands.md) ## Adding a New Data Source -The required on-disk layout is codified in [`docs/source-layout.md`](docs/source-layout.md) and enforced by `task check:layout` (and the `source-layout-lint` CI job). Every new source must satisfy that layout — the lint will flag anything missing. +The required on-disk layout is codified in [`docs/source-layout.md`](docs/source-layout.md) and enforced by `python scripts/check_source_layout.py` (run on every PR via the `source-layout-lint` CI job). Every new source must satisfy that layout — the lint will flag anything missing. 1. **Create source package**: `packages/databox-sources/databox_sources//` - `source.py`: dlt resources using `@dlt.source` / `@dlt.resource` @@ -107,7 +106,7 @@ The required on-disk layout is codified in [`docs/source-layout.md`](docs/source 2. **Add transform models**: `transforms/main/models//` - Copy structure from `transforms/main/models/ebird/` as a template - Read from `raw_.*` (dlt writes here) - - Staging models write to `_staging.*` (trivial-rename staging → generated via `task staging:generate`) + - Staging models write to `_staging.*` (trivial-rename staging → generated via `python scripts/generate_staging.py`) - Mart models write to `.*` 3. **Add Soda contracts**: `soda/contracts/_staging/` and `soda/contracts//` @@ -116,7 +115,7 @@ The required on-disk layout is codified in [`docs/source-layout.md`](docs/source 5. **Add secrets to `.env`**: `API_KEY_=your_key_here` -6. **Verify**: `task check:layout` → should show `✓ ` +6. **Verify**: `python scripts/check_source_layout.py` → should show `✓ ` ## Architecture Decisions diff --git a/Taskfile.yaml b/Taskfile.yaml index d885ba6..9885112 100644 --- a/Taskfile.yaml +++ b/Taskfile.yaml @@ -1,33 +1,23 @@ version: '3' -# Targets compose / inject env / encode defaults. Raw CLI wrappers → docs/commands.md. +# Only targets that compose, inject env, or encode a non-obvious default. +# Pure CLI wrappers (uv/sqlmesh/dagster) live in docs/commands.md. -vars: { VENV_DIR: .venv, DATA_DIR: ./data } +vars: { VENV_DIR: .venv } env: { PYTHONPATH: . } -output: prefixed tasks: - setup: - desc: "Create .venv + bootstrap .env from .env.example" - cmds: - - uv venv {{.VENV_DIR}} - - test -f .env || cp .env.example .env - status: - - test -d {{.VENV_DIR}} - install: - desc: "Install deps + pre-commit hook (setup + uv sync + hook install)" - deps: [setup] + desc: "Bootstrap .venv, copy .env, uv sync, install pre-commit hook" cmds: + - test -d {{.VENV_DIR}} || uv venv {{.VENV_DIR}} + - test -f .env || cp .env.example .env - uv sync - ./scripts/setup_pre_commit.sh - sources: - - pyproject.toml - - uv.lock - - scripts/setup_pre_commit.sh + sources: [pyproject.toml, uv.lock, scripts/setup_pre_commit.sh] ci: - desc: "Compose every CI gate: ruff + mypy + pytest + secret scan + staging-codegen drift" + desc: "Every CI gate: ruff + mypy + pytest + secret scan + staging drift" deps: [install] cmds: - "{{.VENV_DIR}}/bin/ruff check ." @@ -37,38 +27,8 @@ tasks: - python scripts/check_secrets.py . - python scripts/generate_staging.py --check - staging:generate: - desc: "Regenerate trivial-rename staging SQL from Soda contracts" - deps: [install] - cmds: - - python scripts/generate_staging.py - - staging:check: - desc: "Fail if committed staging SQL drifts from contracts" - deps: [install] - cmds: - - python scripts/generate_staging.py --check - - check:layout: - desc: "Verify every source follows the standard directory layout" - deps: [install] - cmds: - - python scripts/check_source_layout.py - - init: - desc: "Rebrand a fresh fork — rewrite project identity from scaffold.yaml (see docs/template.md)" - deps: [install] - cmds: - - python scripts/bootstrap.py {{.CLI_ARGS}} - - new-source: - desc: "Scaffold a new dlt source — passes source-layout-lint on first commit (see docs/new-source.md)" - deps: [install] - cmds: - - python scripts/new_source.py {{.CLI_ARGS}} - docs:build: - desc: "Render notebooks + generate data dictionary + build MkDocs site (strict)" + desc: "Render notebooks, generate dictionary + cost page, build MkDocs (strict)" deps: [install] cmds: - "{{.VENV_DIR}}/bin/python scripts/render_notebooks.py" @@ -78,86 +38,53 @@ tasks: plan:dev: desc: "SQLMesh plan against the dev virtual env (interactive)" - deps: [install] dir: transforms/main - cmds: - - "../../{{.VENV_DIR}}/bin/sqlmesh plan dev {{.CLI_ARGS}}" - + cmds: ["../../{{.VENV_DIR}}/bin/sqlmesh plan dev {{.CLI_ARGS}}"] plan:prod: desc: "SQLMesh plan against prod — fails if prod is ahead of dev" - deps: [install] dir: transforms/main - cmds: - - "../../{{.VENV_DIR}}/bin/sqlmesh plan prod {{.CLI_ARGS}}" - + cmds: ["../../{{.VENV_DIR}}/bin/sqlmesh plan prod {{.CLI_ARGS}}"] promote: - desc: "Promote dev → prod (auto-apply; assumes dev already verified)" - deps: [install] + desc: "Auto-apply dev → prod promotion (assumes dev verified)" dir: transforms/main - cmds: - - "../../{{.VENV_DIR}}/bin/sqlmesh plan prod --auto-apply {{.CLI_ARGS}}" - + cmds: ["../../{{.VENV_DIR}}/bin/sqlmesh plan prod --auto-apply {{.CLI_ARGS}}"] verify:dev: - desc: "Run every Soda contract against the __dev schemas (see docs/environments.md)" - deps: [install] - cmds: - - "{{.VENV_DIR}}/bin/python scripts/verify_dev.py" + desc: "Run every Soda contract against the __dev schemas" + cmds: ["{{.VENV_DIR}}/bin/python scripts/verify_dev.py"] full-refresh: - desc: "Run every dlt source + SQLMesh + every Soda check through Dagster" - deps: [install] - env: - DAGSTER_HOME: "{{.USER_WORKING_DIR}}/.dagster" - PYTHONPATH: "{{.USER_WORKING_DIR}}" - cmds: - - "{{.VENV_DIR}}/bin/dagster asset materialize --select '*' -f packages/databox/databox/orchestration/definitions.py" - + desc: "Every dlt source + SQLMesh + Soda via Dagster" + env: { DAGSTER_HOME: "{{.USER_WORKING_DIR}}/.dagster" } + cmds: ["{{.VENV_DIR}}/bin/dagster asset materialize --select '*' -f packages/databox/databox/orchestration/definitions.py"] verify: desc: "Smoke full-refresh — DATABOX_SMOKE=1 caps each source to 5 items" - deps: [install] - env: - DAGSTER_HOME: "{{.USER_WORKING_DIR}}/.dagster" - PYTHONPATH: "{{.USER_WORKING_DIR}}" - DATABOX_SMOKE: "1" - cmds: - - "{{.VENV_DIR}}/bin/dagster asset materialize --select '*' -f packages/databox/databox/orchestration/definitions.py" - + env: { DAGSTER_HOME: "{{.USER_WORKING_DIR}}/.dagster", DATABOX_SMOKE: "1" } + cmds: ["{{.VENV_DIR}}/bin/dagster asset materialize --select '*' -f packages/databox/databox/orchestration/definitions.py"] dagster:dev: - desc: "Launch Dagster UI with DAGSTER_HOME + PYTHONPATH + definitions path" - deps: [install] - env: - DAGSTER_HOME: "{{.USER_WORKING_DIR}}/.dagster" - PYTHONPATH: "{{.USER_WORKING_DIR}}" - cmds: - - "{{.VENV_DIR}}/bin/dagster dev -f packages/databox/databox/orchestration/definitions.py" + desc: "Launch Dagster UI with DAGSTER_HOME + definitions path" + env: { DAGSTER_HOME: "{{.USER_WORKING_DIR}}/.dagster" } + cmds: ["{{.VENV_DIR}}/bin/dagster dev -f packages/databox/databox/orchestration/definitions.py"] streamlit: - desc: "Launch Databox Explorer (cd app/ + streamlit run main.py)" - deps: [install] + desc: "Launch Databox Explorer" dir: app - cmds: - - "../{{.VENV_DIR}}/bin/streamlit run main.py" + cmds: ["../{{.VENV_DIR}}/bin/streamlit run main.py"] + init: + desc: "Rebrand a fresh fork from scaffold.yaml (see docs/template.md)" + cmds: ["python scripts/bootstrap.py {{.CLI_ARGS}}"] + new-source: + desc: "Scaffold a new dlt source (see docs/new-source.md)" + cmds: ["python scripts/new_source.py {{.CLI_ARGS}}"] db:reset: desc: "Delete local DuckDB files (MotherDuck dbs must be dropped manually)" - cmds: - - rm -f data/databox.duckdb data/raw_ebird.duckdb data/raw_noaa.duckdb data/raw_usgs.duckdb - - echo "Local DBs reset — MotherDuck dbs (if any) must be dropped manually" - + cmds: ["rm -f data/databox.duckdb data/raw_ebird.duckdb data/raw_noaa.duckdb data/raw_usgs.duckdb"] clean: desc: "Remove build + test + cache artifacts" cmds: - rm -rf build/ dist/ *.egg-info/ .pytest_cache/ .coverage htmlcov/ - find . -type d -name __pycache__ -exec rm -rf {} + - - find . -type f -name "*.pyc" -delete - clean-all: - desc: "Clean + drop .venv + data/ + .dlt_state/" + desc: "Clean + drop .venv, data/, .dlt_state/" deps: [clean] - cmds: - - rm -rf {{.VENV_DIR}} {{.DATA_DIR}} .dlt_state/ - - default: - desc: "Show available tasks" - cmds: - - task --list-all + cmds: ["rm -rf {{.VENV_DIR}} data .dlt_state/"] diff --git a/docs/commands.md b/docs/commands.md index ae22e9f..9b823be 100644 --- a/docs/commands.md +++ b/docs/commands.md @@ -72,6 +72,14 @@ python scripts/check_secrets.py # scan repo root python scripts/check_secrets.py path/to/file.py ``` +## Source layout + staging codegen + +```bash +python scripts/check_source_layout.py # lint per-source directory layout +python scripts/generate_staging.py # regenerate trivial-rename stg_* SQL +python scripts/generate_staging.py --check # fail on drift (also runs in task ci) +``` + ## Watching Task's built-in watch mode works without a dedicated target: