JEPA-based clip mining for driving data. The repo is organized around one product question:
rank clips by likely human review value.
The training objective remains JEPA embedding prediction. The primary experiment outcome is ranking quality on a human-labeled benchmark. Cosine similarity remains a secondary model-health metric.
The public entrypoints are:
- train.py
- score.py
- evaluate.py
- run_experiment.py
The normal sequence is:
- Build or migrate a clip manifest with explicit clip_id, split, and scene_id.
- Train JEPA on unlabeled train clips.
- Score held-out clips with a clip-level review-value score.
- Evaluate ranking quality against human review-value labels.
- Compare experiment quality and efficiency from the run summaries.
Each run writes:
- config_resolved.yaml
- training/summary.json
- training/checkpoints/
- scoring/scores.jsonl
- scoring/summary.json
- evaluation/summary.json
- summary.json
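These per-run artifacts make cross-run comparison a small scripting task. A minimal sketch, assuming each run directory holds a top-level summary.json and that it carries a quality metric such as average_precision (the field name is an assumption, not a documented key):

```python
import json
from pathlib import Path

def collect_run_summaries(runs_root):
    """Gather the top-level summary.json from each run directory."""
    rows = []
    for summary_path in sorted(Path(runs_root).glob("*/summary.json")):
        with open(summary_path) as f:
            payload = json.load(f)
        payload["run_name"] = summary_path.parent.name
        rows.append(payload)
    return rows

def rank_runs(rows, metric="average_precision"):
    """Sort runs by a quality metric, best first; runs missing the metric sort last."""
    return sorted(rows, key=lambda r: r.get(metric, float("-inf")), reverse=True)
```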
src/jepa/
config.py Config loading and runtime-profile resolution
pipeline.py train -> score -> evaluate stage runners
data/ Manifest loading, dataset classes, transforms
models/ VJEPA encoder and JEPA predictor model
training/ Training loop and embedding loss
evaluation/ Ranking metrics, cosine similarity, telemetry helpers
experiments/ Factorial design utilities
scripts/
build_manifest_from_frames.py
migrate_manifest.py
run_factorial.py
analyze_factorial.py
train.py
score.py
evaluate.py
run_experiment.py
Use Python 3.10-3.12 for this project. Python 3.13+ is not supported by the pinned PyTorch stack, and older Linux clusters commonly require the manylinux2014 wheels provided by the 2.5.x line.
uv sync

Or with pip:
pip install -e .
pip install -e ".[dev]"

The pipeline consumes JSONL clip manifests, not raw videos directly.
Expected v1 clip record:
{
"clip_id": "scene_001__CAM_FRONT__000000__000015",
"split": "train",
"scene_id": "scene_001",
"camera": "CAM_FRONT",
"frame_paths": [
"scene_001/CAM_FRONT/000000.jpg",
"scene_001/CAM_FRONT/000001.jpg"
],
"timestamps": ["000000", "000001"],
"metadata": {}
}

Frame directory layout:
data/raw/my_dataset/
scene_001/
CAM_FRONT/
000000.jpg
000001.jpg
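Before training, it can help to sanity-check manifest records against the v1 schema above. A minimal validation sketch with plain checks and no external schema library (validate_clip_record and validate_manifest are hypothetical helpers, not part of the repo):

```python
import json

# Field names taken from the v1 clip record example.
REQUIRED_FIELDS = {"clip_id", "split", "scene_id", "camera",
                   "frame_paths", "timestamps", "metadata"}

def validate_clip_record(record):
    """Return a list of problems with one v1 clip record (empty list if clean)."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if not record.get("frame_paths"):
        problems.append("frame_paths is empty")
    if len(record.get("frame_paths", [])) != len(record.get("timestamps", [])):
        problems.append("frame_paths and timestamps lengths differ")
    return problems

def validate_manifest(path):
    """Validate every JSONL line; return {line_number: problems} for bad records."""
    bad = {}
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            problems = validate_clip_record(json.loads(line))
            if problems:
                bad[line_number] = problems
    return bad
```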
Build a manifest:
uv run python scripts/build_manifest_from_frames.py \
--frames-root data/raw/my_dataset \
--output data/manifests/my_manifest.jsonl \
--clip-length 16 \
--stride 16 \
--train-ratio 0.7 \
--val-ratio 0.15 \
--seed 42 \
--camera CAM_FRONT

To migrate an existing manifest to the v1 schema:

uv run python scripts/migrate_manifest.py \
--input data/manifests/clips_manifest.jsonl \
--output data/manifests/clips_manifest_v1.jsonl \
--train-ratio 0.7 \
--score-ratio 0.15 \
--seed 42

The default config is configs/default.yaml.
Top-level sections:
- dataset
- model
- train
- score
- evaluation
- runtime
- experiment
Important controls:
- model.init_mode: pretrained, resume, scratch
- model.encoder_mode: frozen, finetune
- runtime.profile: cpu, gpu, ddp
- dataset.training_manifest, dataset.validation_manifest, dataset.scoring_manifest, dataset.evaluation_manifest
- dataset.training_split, dataset.validation_split, dataset.scoring_split, dataset.evaluation_split
- dataset.evaluation_labels
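Inline overrides of these controls behave like a nested dictionary merge where the override wins on conflicts. A minimal sketch of that merge semantics (a simplified stand-in, not the actual config.py code):

```python
def deep_merge(base, override):
    """Recursively merge override into base; override values win on conflicts.

    Nested dicts are merged key by key; any other value type is replaced outright.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```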
Inline overrides are supported with --set as JSON:
python3 run_experiment.py \
--config configs/default.yaml \
--run-dir experiments/runs/smoke_cpu \
--set '{"runtime":{"profile":"cpu","batch_size_overrides":{"train":1,"score":1,"evaluation":1}}}' \
--set '{"train":{"epochs":1}}'

Train only:
uv run python train.py \
--config configs/default.yaml \
--run-dir experiments/runs/baseline

Score only:
uv run python score.py \
--config configs/default.yaml \
--run-dir experiments/runs/baseline

Evaluate only:
uv run python evaluate.py \
--config configs/default.yaml \
--run-dir experiments/runs/baseline

For manual binary labeling of an evaluation-labels JSONL, use the local review app:
uv run python scripts/review_labels.py \
--labels-path data/manifests/baseline_manifest_evaluation_labels.jsonl \
--manifest-path data/manifests/baseline_manifest.jsonl \
--data-root data/raw/fileThen open http://127.0.0.1:8765 in a browser. The app saves label edits back into the JSONL in place. Keys: 1 positive, 0 negative, u clear, j next, k previous.
It renders each clip as an in-browser looping animation and also shows sampled still frames underneath.
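The in-place label write can be sketched as reading the whole JSONL, updating one record, and rewriting the file (set_label is a hypothetical helper illustrating the idea; the app's actual write path is not shown here):

```python
import json

def set_label(labels_path, clip_id, review_value):
    """Rewrite the labels JSONL in place, updating or clearing one clip's label.

    review_value of None clears the label — a simplified stand-in for what
    the review app does on the 1 / 0 / u keys.
    """
    with open(labels_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    for record in records:
        if record["clip_id"] == clip_id:
            if review_value is None:
                record.pop("review_value", None)
            else:
                record["review_value"] = review_value
    with open(labels_path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```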
Full experiment:
uv run python run_experiment.py \
--config configs/default.yaml \
--run-dir experiments/runs/baseline

score.py does not require human labels. It produces one record per clip with the following fields:
- review_value_score
- mean_cosine_similarity
- tubelet_score_mean
- tubelet_score_std
- tubelet_count
In the current implementation:
review_value_score = 1 - cosine_similarity
This is a novelty proxy, not ground truth.
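Under that definition, the per-tubelet similarities reported alongside the score aggregate into the clip score roughly as follows (a sketch; the exact aggregation in score.py may differ):

```python
def review_value_score(tubelet_similarities):
    """Clip-level novelty proxy from per-tubelet cosine similarities.

    Mirrors the documented relation review_value_score = 1 - cosine_similarity,
    aggregating tubelets by their mean: the worse the predictor matches the
    encoder embeddings, the higher the review value.
    """
    mean_cos = sum(tubelet_similarities) / len(tubelet_similarities)
    return 1.0 - mean_cos
```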
evaluate.py requires human labels keyed by clip_id. It joins model scores to benchmark labels and computes:
- Precision@K
- Recall@K
- Average Precision
- PR-AUC
- NDCG
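Two of these metrics can be sketched directly, assuming clips are already sorted by descending model score and labels are binary 0/1 relevance:

```python
def precision_at_k(ranked_labels, k):
    """Fraction of positives among the top-k ranked clips (labels are 0/1)."""
    return sum(ranked_labels[:k]) / k

def average_precision(ranked_labels):
    """Mean of precision-at-k taken at the rank of each positive clip."""
    hits, total = 0, 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```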
Human label schema:
{
"clip_id": "scene_001__CAM_FRONT__000080__000095",
"review_value": "high_value",
"review_value_grade": 2,
"reason_codes": ["safety_critical_interaction"],
"reviewer_id": "rater_01",
"adjudicated_label": "high_value",
"agreement": 1.0
}

The intended review-value classes are:
- high_value
- medium_value
- low_value
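Ranking metrics such as Precision@K need binary relevance, so the graded classes have to be collapsed. One illustrative policy, treating only high_value as positive (an example choice, not the repo's documented one):

```python
def binarize_labels(label_records, positive_classes=("high_value",)):
    """Map graded review-value labels to 0/1 relevance keyed by clip_id.

    Prefers adjudicated_label when present, falling back to review_value.
    """
    return {
        record["clip_id"]: int(
            record.get("adjudicated_label", record["review_value"]) in positive_classes
        )
        for record in label_records
    }
```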
Training resource stats are stored in:
RUN_DIR/training/summary.json
Scoring / inference resource stats are stored in:
RUN_DIR/scoring/summary.json
These summaries include:
- wall-clock time
- samples or clips per second
- latency mean / p50 / p95
- peak memory
- effective batch size
- device and runtime profile
- estimated energy for scoring
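The latency mean / p50 / p95 fields can be derived from per-clip latencies with the standard library alone. A sketch (the repo's telemetry helpers may compute percentiles differently):

```python
import statistics

def latency_stats(latencies_ms):
    """Mean / p50 / p95 summary of per-clip latencies, in milliseconds.

    Uses inclusive quantiles so small samples behave predictably.
    """
    q = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": q[94],  # the 95th of 99 cut points
    }
```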
Use configs/factorial.yaml to sweep config factors across the new pipeline.
Run a batch:
uv run python scripts/run_factorial.py --config configs/factorial.yaml

Analyze a completed batch:
uv run python scripts/analyze_factorial.py \
--results experiments/factorial_runs/<date>/batch_<time>/results.jsonl \
--factorial-config configs/factorial.yaml

Each batch writes:
- design_matrix.jsonl
- results.jsonl
- batch_summary.json
- runs/<run_name>/...
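The design matrix is a full-factorial crossing of the configured factors. A minimal sketch with itertools.product (the dotted factor names are illustrative, not taken from factorial.yaml):

```python
import itertools

def design_matrix(factors):
    """Full-factorial crossing of config factors.

    factors: e.g. {"model.encoder_mode": ["frozen", "finetune"], ...}.
    Returns one dict per run, mapping each factor name to one of its levels.
    """
    names = sorted(factors)
    return [
        dict(zip(names, values))
        for values in itertools.product(*(factors[name] for name in names))
    ]
```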
Run the test suite:

uv run pytest tests/