chore: scope sdist to exclude eval datasets and dashboard node_modules #98

Merged
antoinezambelli merged 1 commit into main from az/sdist
Jun 1, 2026

@antoinezambelli

Owner

What

Scope the sdist build so it stops bundling non-source artifacts.

Why

The wheel target is scoped (packages = ["src/forge"] → clean 115 KB), but there was no [tool.hatch.build.targets.sdist] config. With no scoping, hatchling sweeps the whole working tree into the sdist:

| Uncompressed | Files | What |
|---|---|---|
| 112 MB | 2 | eval_results_v*.jsonl (LFS benchmark datasets) |
| 97 MB | 4,572 | tests/eval/dashboard/node_modules/ (incl. Windows .exe/.node binaries) |
| 347 KB | 41 | src/forge (the actual package) |

So 96% of the file count and ~99.7% of the bytes were non-source artifacts — including platform binaries — in a Python library's PyPI source distribution. (node_modules is gitignored + untracked; hatchling's unscoped sdist sweep doesn't honor .gitignore, so it got vacuumed in anyway.)
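A breakdown like the one above can be reproduced by grouping the entries of a built sdist by their leading path components. The following is a minimal sketch using only the standard library; `summarize_sdist` and its `depth` parameter are hypothetical names for illustration, not part of this repo or of hatchling.

```python
import io
import tarfile
from collections import defaultdict


def summarize_sdist(fileobj, depth=2):
    """Group regular files in a .tar.gz sdist by leading path components.

    Returns {prefix: (file_count, total_bytes)}. The first path component
    (the "<name>-<version>/" root directory of an sdist) is skipped.
    """
    totals = defaultdict(lambda: [0, 0])
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # ignore directories, links, etc.
            parts = member.name.split("/")[1:]  # drop the sdist root dir
            prefix = "/".join(parts[:depth])
            totals[prefix][0] += 1
            totals[prefix][1] += member.size  # uncompressed size in bytes
    return {prefix: tuple(counts) for prefix, counts in totals.items()}
```

To use it, open the archive in binary mode (`open(path, "rb")`) and pass the file object; sorting the result by total bytes puts the largest contributors first.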

Fix

Add a scoped sdist target excluding the two offenders. Nothing load-bearing is lost:

  • report.py rebuilds the dashboard via npm install + npm run build on demand, so the committed node_modules (Windows-only binaries) was never used by the report workflow.
  • The eval datasets remain in the repo via Git LFS; the dashboard source and the prebuilt docs/results/dashboard.html stay in the sdist.

Also refreshes a stale .gitignore comment to the versioned eval_results_v*.jsonl naming (post #96).

Verification

  • sdist 26 MB → 690 KB; entries 4,739 → 165.
  • node_modules and eval_results*.jsonl: 0 entries in the new sdist.
  • src/forge (41 files), README/LICENSE/CHANGELOG/pyproject, dashboard source, results HTML: all present.
  • twine check: PASSED on both wheel and sdist.
  • Wheel unchanged (115 KB).
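The "0 entries" check above could be automated along these lines. This is a sketch under assumptions, not the procedure actually used for this PR: the helper names and the `FORBIDDEN` patterns are illustrative, and the patterns use `fnmatch` semantics (where `*` also matches `/`), not hatchling's own pattern matching.

```python
import fnmatch
import tarfile

# Illustrative patterns for the two offenders named in this PR (assumed, fnmatch-style).
FORBIDDEN = ("*/node_modules/*", "*/eval_results*.jsonl")


def sdist_names(path):
    """Return every entry name in a .tar.gz sdist."""
    with tarfile.open(path, mode="r:gz") as tar:
        return tar.getnames()


def forbidden_entries(names, patterns=FORBIDDEN):
    """Return the names that match any forbidden pattern, in input order."""
    return [
        name
        for name in names
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    ]
```

Calling `forbidden_entries(sdist_names(path))` on the freshly built archive and failing the build when the list is non-empty would catch a regression if the sdist scoping is ever loosened.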

The wheel target is scoped to src/forge, but the sdist target had no
configuration, so hatchling swept the entire working tree -- pulling the
LFS eval datasets (~112MB) and the eval dashboard's node_modules (~97MB,
including Windows .exe/.node binaries) into the source distribution.

Add a scoped [tool.hatch.build.targets.sdist] excluding both. The sdist
drops from ~26MB to ~690KB; src/forge, tests, docs, the dashboard source,
and the prebuilt results HTML are all retained. report.py rebuilds the
dashboard via npm on demand, so the committed node_modules was never
load-bearing; the eval datasets remain in the repo via LFS.

Also refresh a stale .gitignore comment to the versioned eval-results
naming (post #96).
@antoinezambelli antoinezambelli merged commit 92cbdc3 into main Jun 1, 2026
2 checks passed
@antoinezambelli antoinezambelli deleted the az/sdist branch June 1, 2026 06:32