chore: generalize eval-results LFS rule to catch versioned dumps #96

Merged: antoinezambelli merged 1 commit into main from az/eval-lfs-hygiene on May 31, 2026
@antoinezambelli

Owner

What

Generalize the eval-results Git LFS rule so every result-dump naming variant is captured.

- eval_results.jsonl       filter=lfs ...
- eval_results_rig*.jsonl   filter=lfs ...
+ eval_results*.jsonl        filter=lfs ...

Why

The two old patterns matched only the bare eval_results.jsonl and the now-retired rig-tagged naming (eval_results_rig*.jsonl). The current scheme is version-tagged (eval_results_v0.6.0.jsonl, eval_results_v0.7.0.jsonl), which neither pattern matched:

$ git check-attr filter eval_results_v0.7.0.jsonl
eval_results_v0.7.0.jsonl: filter: unspecified

The two existing v-files are already in LFS (added when a matching git lfs track pattern still existed), but the rule protecting them is gone. So the next release dump (eval_results_v0.7.3.jsonl) would have committed as a plain ~67MB git blob baked permanently into history — the "accidental regular-git churn" this cleans up.

The fix

One glob — eval_results*.jsonl — subsumes bare, rig-tagged, and version-tagged forms, so no future variant slips through.
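
To illustrate the coverage gap and how the single glob closes it, here is a minimal Python sketch. It is an approximation, not git's own matcher: it assumes that slash-free .gitattributes patterns behave like shell globs on the bare file name and uses `fnmatch` as a stand-in. The helper `covered`, the constant names, and the rig-tagged sample name `eval_results_rig1.jsonl` are hypothetical, introduced only for this illustration.

```python
from fnmatch import fnmatchcase

# Patterns before and after this PR (attribute columns omitted).
OLD_PATTERNS = ["eval_results.jsonl", "eval_results_rig*.jsonl"]
NEW_PATTERNS = ["eval_results*.jsonl"]

# Sample file names; the rig-tagged one is a made-up example of the retired scheme.
SAMPLES = [
    "eval_results.jsonl",
    "eval_results_rig1.jsonl",
    "eval_results_v0.7.0.jsonl",
    "eval_results_v0.7.3.jsonl",
]


def covered(filename: str, patterns: list[str]) -> bool:
    """Return True if any glob pattern matches the bare file name.

    Approximation only: fnmatch stands in for git's attribute matching,
    which is adequate for simple patterns that contain no slash.
    """
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in SAMPLES:
        old = covered(name, OLD_PATTERNS)
        new = covered(name, NEW_PATTERNS)
        print(f"{name}: old={old} new={new}")
```

Under these assumptions the version-tagged names are missed by the old patterns and matched by the new one. The authoritative check remains git check-attr, as used in this PR.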

Verification

  • git check-attr filter now returns lfs for eval_results_v0.7.0.jsonl and the future eval_results_v0.7.3.jsonl.
  • Existing v-file pointers unchanged (LF, same OIDs/sizes: 67MB + 45MB); git lfs fsck --pointers = OK.
  • Diff is .gitattributes only — no code, no proxy overlap.
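
As a complement to the pointer check above, the following Python sketch shows what "committed as an LFS pointer rather than a plain blob" means at the content level. It assumes the basic Git LFS v1 pointer layout (a version line, a sha256 oid line, and a size line, each LF-terminated) and ignores optional extension lines. The function name `parse_lfs_pointer` is hypothetical, and this is not what git lfs fsck itself does.

```python
import re

# Basic Git LFS v1 pointer: three LF-terminated "key value" lines.
# Assumption: no optional "ext-*" lines are present.
POINTER_RE = re.compile(
    r"\Aversion https://git-lfs\.github\.com/spec/v1\n"
    r"oid sha256:(?P<oid>[0-9a-f]{64})\n"
    r"size (?P<size>\d+)\n\Z"
)


def parse_lfs_pointer(text: str):
    """Return (oid, size_in_bytes) if text is a basic LFS pointer, else None.

    A CRLF-terminated pointer is rejected, since the pattern only
    accepts LF line endings.
    """
    match = POINTER_RE.match(text)
    if match is None:
        return None
    return match.group("oid"), int(match.group("size"))


if __name__ == "__main__":
    # Synthetic pointer with a placeholder oid, for demonstration only.
    example = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:" + "a" * 64 + "\n"
        "size 67000000\n"
    )
    print(parse_lfs_pointer(example))
    # Raw JSONL content is not a pointer, so this yields None.
    print(parse_lfs_pointer('{"id": 1}\n'))
```

One way to feed it real data would be the raw committed blob, for example the output of git cat-file -p HEAD:eval_results_v0.7.0.jsonl, which prints the stored content without applying the LFS smudge filter.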

The two LFS patterns matched only 'eval_results.jsonl' and the now-dead
rig-tagged 'eval_results_rig*.jsonl' naming. The current scheme is
version-tagged ('eval_results_v0.7.0.jsonl'), which neither pattern
matched -- git check-attr reported 'unspecified' for those files, so the
next release dump (eval_results_v0.7.3.jsonl) would have committed as a
plain ~67MB git blob instead of an LFS pointer.

Collapse both into one glob 'eval_results*.jsonl' covering bare,
rig-tagged, and version-tagged variants. Existing v-file pointers are
unchanged; this only closes the gap for future dumps.
@antoinezambelli antoinezambelli merged commit f7fb366 into main May 31, 2026
2 checks passed
@antoinezambelli antoinezambelli deleted the az/eval-lfs-hygiene branch May 31, 2026 22:50
antoinezambelli added a commit that referenced this pull request Jun 1, 2026
#98)

The wheel target is scoped to src/forge, but the sdist target had no
configuration, so hatchling swept the entire working tree -- pulling the
LFS eval datasets (~112MB) and the eval dashboard's node_modules (~97MB,
including Windows .exe/.node binaries) into the source distribution.

Add a scoped [tool.hatch.build.targets.sdist] excluding both. The sdist
drops from ~26MB to ~690KB; src/forge, tests, docs, the dashboard source,
and the prebuilt results HTML are all retained. report.py rebuilds the
dashboard via npm on demand, so the committed node_modules was never
load-bearing; the eval datasets remain in the repo via LFS.

Also refresh a stale .gitignore comment to the versioned eval-results
naming (post #96).