data/coreis the 5–10GB canonical course data; it is not in git. Integrity snapshots live indata/core-integrityas.meta.sha256mirrors and should be regenerated wheneverdata/corechanges.- Course lists come from
data/core/list.txt; each course hascourses/<id>/list.txtplustracks/media. - Outputs are content-addressed: assets are renamed to their SHA-256 hash and stored under a two-character prefix directory;
all-courses.jsonis the stable index that points at those hashes.
- Core integrity helpers:
coreIntegrity(core)builds the.meta.sha256tree;verifyCoreIntegrity(core, integrity)diffs expected vs. stored and printscore integrity OKon success. - Packaging pipeline (p-limit to 8 concurrent operations):
- Remux each lesson via pinned
ghcr.io/jrottenberg/ffmpeg:8.0-alpineto metadata-free MP4 (remuxLesson/remuxToMp4); durations are read withffprobe. - Low-quality AAC mono variant per lesson (
lowQualityLesson/lowQualityTrack). - Metadata files (
<course>-meta.jsonandall-courses.json) includebuildVersion(currently 2), lesson durations, and file pointers{object, filesize, mimeType}; onlymp4andjsonMIME types are allowed. - Caching is explicit: pass a writable
materializedCacheDir; cache keys are hashed per operation and returned by the*Cachefunctions (packageAllCoursesCache,buildCoursePackageCache) to persist between runs.
- Remux each lesson via pinned
- Public Dagger functions for consumers: package a single course (
buildCoursePackage) or all courses (packageAllCourses), plus their cache-writer counterparts;baseUrldefaults tohttps://downloads.languagetransfer.org/cas.
- Install Dagger deps inside
.dagger:yarn install --cwd .dagger. - Integrity:
./create-core-integrity-data.shregenerates checksums;./check-core-integrity.shverifiesdata/corevsdata/core-integrity. - Builds:
./build.shproduces the full CAS dump;./build-cache.shmaterializes cached steps. Language-scoped variants (build-for-language.sh,build-cache-for-language.sh) exist for partial runs.
- TypeScript: ESM, 2-space indent, deterministic/pure Dagger functions; keep MIME map/pointer shapes consistent.
- Bash scripts start with
#!/usr/bin/env bash, useset -euo pipefail, and assume repo root (cd "$(dirname "$0")"). .meta.sha256files must mirror asset paths exactly; regenerate after any asset rename or addition.
- After modifying assets or Dagger logic, run
./check-core-integrity.shand expectcore integrity OK. - When changing packaging behavior, consider running
dagger call packageAllCourses --core data/core --materialized-cache-dir <dir>plus the corresponding*Cachecall to refresh cache artifacts.