Accepted at: IEEE International Conference on Intelligent Transportation Systems (ITSC), 2025
Driver distraction is a major contributor to road traffic accidents worldwide. Real-world driving datasets are inherently long-tailed and imbalanced, making robust distraction detection particularly challenging.
This work introduces Clustered Feature Weighting (CFW), a label-free sampling strategy that leverages the geometric structure of learned feature embeddings to improve batch-level class balance and model robustness.
CFW transforms unsupervised structure in embedding space into training weights.
- Extract feature embeddings using a pretrained backbone (ViT or DINOv2-based ViT).
- Perform batch-wise HDBSCAN clustering in embedding space.
- Convert cluster density statistics into per-sample weights.
- Train with weighted random sampling using CFW-derived weights.
- Self-supervised learning significantly improves adaptability under distribution shift.
- CFW improves intra-batch class balance without using label priors.
- The method enhances accuracy, robustness, and generalization in imbalanced scenarios.
- SSL + CFW yields the most stable performance under long-tailed distributions.
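The weighting step of the pipeline above can be condensed into a small sketch. This is an illustration of the inverse-cluster-size idea only, not the repository's implementation; in CFW the cluster labels would come from batch-wise HDBSCAN on the extracted embeddings, with `-1` marking noise points:

```python
from collections import Counter

def cfw_style_weights(cluster_labels, noise_label=-1):
    """Turn per-sample cluster assignments into sampling weights.

    Samples in small (rare) clusters receive larger weights, so a
    weighted random sampler draws them more often and batches become
    better balanced without ever reading class labels.
    """
    sizes = Counter(cluster_labels)
    # Inverse cluster-size weighting; noise points get a full unit weight.
    raw = [1.0 if c == noise_label else 1.0 / sizes[c]
           for c in cluster_labels]
    total = sum(raw)
    return [w / total for w in raw]  # normalise into a distribution

# Hypothetical batch: one dominant cluster, two small ones, one noise point.
weights = cfw_style_weights([0, 0, 0, 1, 1, 2, -1])
```

Weights of this form could then be handed to a sampler such as `torch.utils.data.WeightedRandomSampler` to drive the weighted random sampling step.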
Most imbalance mitigation techniques rely on:
- Explicit class frequency statistics
- Oversampling minority labels
- Static loss reweighting
CFW instead exploits the geometric structure of learned representations, enabling:
- Label-free imbalance mitigation
- Better adaptability to unlabeled or semi-supervised settings
- Improved robustness under distribution shift
This makes CFW particularly suitable for real-world driving datasets where annotations may be incomplete or noisy.
For a full reproduction run, use this order:
- Create conda virtual environments.
- Set runtime path environment variables (optional).
- Download and build datasets (Stages A-E) and perform EDA.
- Extract features and run feature/variance analysis.
- Compare dataloaders.
- Run vanilla grid search or BOHB with Ray on ViT-based or DINOv2-based encoders.
- Train/evaluate baseline and CFW models, including cross-modality and cross-view generalization on Drive&Act.
- Run Azure ML jobs for these steps if executing on Azure cloud (optional).
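The "compare dataloaders" step above measures how far each batch's class distribution deviates from uniform. A toy version of that KL-divergence metric (an illustrative sketch, not the exact computation in `scripts/eval_dataloader.py`):

```python
import math

def kl_to_uniform(batch_labels, num_classes, eps=1e-12):
    """KL divergence (in nats) between a batch's empirical class
    distribution and the uniform distribution over all classes."""
    counts = [0] * num_classes
    for y in batch_labels:
        counts[y] += 1
    p = [c / len(batch_labels) for c in counts]
    q = 1.0 / num_classes
    # Terms with p_i == 0 contribute nothing to KL(p || q).
    return sum(pi * math.log((pi + eps) / q) for pi in p if pi > 0)

balanced = kl_to_uniform([0, 1, 2, 3] * 4, num_classes=4)  # perfectly balanced batch
skewed = kl_to_uniform([0] * 14 + [1, 2], num_classes=4)   # long-tailed batch
```

A lower average score for the CFW dataloader than for the baseline indicates better intra-batch class balance.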
Clone and enter the repository, then create the environments:

```bash
git clone git@github.com:SurajBhar/cfw.git
cd cfw

# DINOv2 environment
conda env create -f envs/environment_dinov2.yml
conda activate cfw_dinov2
pip install -e ".[azure,bayesian,dev]"

# ViT environment
conda env create -f envs/environment_vitrans.yml
conda activate cfw_vitrans
pip install -e ".[azure,bayesian,dev]"

# DINOv2 Azure environment
conda env create -f azureml/environments/conda_dinov2_azure.yml
conda activate cfw_dinov2_azure
pip install -e ".[azure,bayesian,dev]"

# ViT Azure environment
conda env create -f azureml/environments/conda_vitrans_azure.yml
conda activate cfw_vitrans_azure
pip install -e ".[azure,bayesian,dev]"
```

The project does not auto-load `.env` files. Source explicitly in each shell:
```bash
cat > .env.compute << 'ENV'
CFW_DRIVEACT_BINARY_SPLIT0_DIR=./data/processed/daa_binary_kinect_color/split_0
CFW_DRIVEACT_MULTICLASS_SPLIT0_DIR=./data/processed/daa_multiclass_kinect_color/split_0
CFW_STATEFARM_BINARY_DIR=./data/processed/statefarm_balanced_binary
CFW_STATEFARM_MULTICLASS_DIR=./data/processed/statefarm_balanced_multiclass
CFW_CFW_FEATURE_SPLIT0_DIR=./outputs/features_driveact_binary/features/train
CFW_CFW_MULTICLASS_FEATURE_SPLIT0_DIR=./outputs/features_driveact_multiclass/features/train
ENV
set -a
source .env.compute
set +a
```

Run the full ingestion pipeline in one step, or run stages A-E individually:

```bash
python scripts/data_ingestion/run_full_pipeline.py
```

```bash
# A) Download raw data
python scripts/data_ingestion/download_data.py
# B) Build Drive&Act multiclass frame datasets
python scripts/data_ingestion/build_daa_dataset.py
# C) Derive Drive&Act binary datasets
python scripts/data_ingestion/derive_binary.py
# D) Build StateFarm balanced/imbalanced variants
python scripts/data_ingestion/build_statefarm.py
# E) Generate EDA reports
python scripts/data_ingestion/generate_eda.py
```

Drive&Act-only download (skip StateFarm and extra modalities):
```bash
python scripts/data_ingestion/download_data.py \
  statefarm.enabled=false \
  driveact.components.kinect_ir.enabled=false \
  driveact.components.nir_right_top.enabled=false \
  driveact.components.nir_front.enabled=false
```
Generate only the Drive&Act Kinect Color dataset for split 0, plus the Kinect IR, NIR Right Top, and NIR Front test sets for cross-modality and cross-view generalisation evaluation:

```bash
python scripts/data_ingestion/build_daa_dataset.py \
  'ingestion.input.annotations_dir=./data/raw/driveandact/iccv_activities_3s/activities_3s' \
  'ingestion.splits.0.enabled=true' \
  'ingestion.splits.1.enabled=false' \
  'ingestion.splits.2.enabled=false' \
  'ingestion.camera_views.kinect_color.enabled=true' \
  'ingestion.camera_views.kinect_color.extract_splits=[train,val,test]' \
  'ingestion.camera_views.kinect_ir.enabled=true' \
  'ingestion.camera_views.kinect_ir.extract_splits=[test]' \
  'ingestion.camera_views.nir_right_top.enabled=true' \
  'ingestion.camera_views.nir_right_top.extract_splits=[test]' \
  'ingestion.camera_views.nir_front.enabled=true' \
  'ingestion.camera_views.nir_front.extract_splits=[test]'
```

Fast EDA pass:
```bash
python scripts/data_ingestion/generate_eda.py \
  processing.max_workers=4 \
  integrity.enabled=false \
  integrity.check_image_files=false
```

Note: if your config is `_global_`, use the same command but remove the `ingestion.` prefix from each override.
Generate EDA only for the StateFarm imbalanced dataset:

```bash
python scripts/data_ingestion/generate_eda.py \
  'ingestion.datasets.daa_multiclass_kinect_color.enabled=false' \
  'ingestion.datasets.daa_binary_kinect_color.enabled=false' \
  'ingestion.datasets.daa_multiclass_kinect_ir.enabled=false' \
  'ingestion.datasets.daa_binary_kinect_ir.enabled=false' \
  'ingestion.datasets.daa_multiclass_nir_right_top.enabled=false' \
  'ingestion.datasets.daa_binary_nir_right_top.enabled=false' \
  'ingestion.datasets.daa_multiclass_nir_front.enabled=false' \
  'ingestion.datasets.daa_binary_nir_front.enabled=false' \
  'ingestion.datasets.statefarm_balanced_multiclass.enabled=false' \
  'ingestion.datasets.statefarm_balanced_binary.enabled=false' \
  'ingestion.datasets.statefarm_imbalanced_multiclass.enabled=true' \
  'ingestion.datasets.statefarm_imbalanced_binary.enabled=false' \
  'ingestion.processing.max_workers=4' \
  'ingestion.integrity.enabled=false'
```

StateFarm smoke pool and splits:
```bash
python scripts/smoke/make_statefarm_smoke_pool.py \
  --source-pool data/raw/statefarm_pool \
  --dest-pool data/raw/statefarm_pool_smoke \
  --per-class 80 \
  --seed 42 \
  --clear-dest

python scripts/data_ingestion/build_statefarm.py \
  --config-name ingestion/statefarm_splits_smoke
```

Enable imbalanced smoke variants:
```bash
python scripts/data_ingestion/build_statefarm.py \
  --config-name ingestion/statefarm_splits_smoke \
  ingestion.variants.imbalanced_multiclass.enabled=true \
  ingestion.variants.imbalanced_binary.enabled=true
```

Drive&Act smoke CSV generation:
```bash
python scripts/smoke/make_driveact_smoke_csv.py \
  --source-dir data/raw/driveandact/iccv_activities_3s/kinect_color \
  --dest-dir data/raw/driveandact/iccv_activities_3s_smoke/kinect_color \
  --split-id 0 \
  --train-cap 120 --val-cap 40 --test-cap 40 \
  --shuffle --ensure-class-coverage
```

CFW requires pre-extracted features for the training split.

Important: keep extraction settings aligned with training (`dataloader.batch_size`, `feature_extraction.shuffle`, `feature_extraction.drop_last`) due to the coupled-batch metadata contract.
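A mismatch between extraction-time and training-time settings can silently desynchronise CFW weights from their samples, so it is worth guarding against. A hypothetical check (the JSON field names below are assumptions for illustration, not the repository's actual metadata schema):

```python
import json

def check_batch_contract(metadata_path, batch_size, shuffle, drop_last):
    """Compare extraction-time settings recorded in the metadata JSON
    against the intended training settings; raise on any mismatch.
    Field names here are illustrative assumptions, not the real schema.
    """
    with open(metadata_path) as f:
        meta = json.load(f)
    expected = {"batch_size": batch_size,
                "shuffle": shuffle,
                "drop_last": drop_last}
    mismatches = {k: (meta.get(k), v) for k, v in expected.items()
                  if meta.get(k) != v}
    if mismatches:
        raise ValueError(f"coupled-batch contract violated: {mismatches}")
```

A guard like this could run once at training start, reading the emitted `feature_extraction_metadata.json` before the CFW dataloader is built.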
```bash
python scripts/extract_features.py \
  experiment=cfw_dinov2_binary \
  +split=train \
  +output_dir=./outputs/features_driveact_binary
```

Output files under `./outputs/features_driveact_binary/features/train/`:

- `features.pkl`
- `labels.pkl`
- `image_paths.pkl`
- `feature_extraction_metadata.json`
- `feature_metrics.json`
```bash
python scripts/extract_features.py \
  experiment=cfw_dinov2_binary \
  +split=train \
  +output_dir=./outputs/features_driveact_binary \
  feature_analysis.normalize=l2 \
  feature_analysis.nearest_centroid_mode=holdout \
  feature_analysis.nearest_centroid_metric=cosine \
  feature_analysis.knn_metric=cosine

# Recompute feature metrics from the saved artifacts
python src/cfw/evaluation/feature_analysis.py \
  ./outputs/features_driveact_binary/features/train/features.pkl \
  ./outputs/features_driveact_binary/features/train/labels.pkl \
  ./outputs/features_driveact_binary/features/train/feature_metrics_recomputed.json

# Run feature extraction using the vit_h_14 pretrained Transformer backbone (train split only)
python scripts/extract_features.py \
  dataset=statefarm_multiclass \
  model=vit_h_14 \
  +split=train \
  dataset.data_root=./data/processed_smoke/statefarm_imbalanced_multiclass_smoke \
  +output_dir=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14 \
  dataloader.batch_size=16 \
  hardware.num_workers=4 \
  feature_extraction.shuffle=true \
  feature_analysis.normalize=l2 \
  feature_analysis.nearest_centroid_mode=holdout \
  feature_analysis.nearest_centroid_metric=cosine \
  feature_analysis.knn_metric=cosine

# Run feature extraction using the ResNet50 pretrained CNN backbone (train split only)
python scripts/extract_features.py \
  dataset=statefarm_multiclass \
  model=resnet50 \
  +split=train \
  'dataset.data_root=./data/processed_smoke/statefarm_imbalanced_multiclass_smoke' \
  'experiment.output_dir=./data/features/statefarm_imbalanced_multiclass_smoke/resnet50' \
  'dataloader.batch_size=16' \
  'hardware.num_workers=4' \
  'feature_extraction.shuffle=true' \
  'feature_analysis.large_scale_mode=false'
```

Compare baseline vs CFW dataloader (KL divergence):
```bash
python scripts/eval_dataloader.py \
  dataloader=cfw \
  dataset=statefarm_multiclass \
  'dataset.data_root=./data/processed_smoke/statefarm_imbalanced_multiclass_smoke' \
  'dataloader.batch_size=16' \
  'hardware.num_workers=4' \
  'dataloader.cfw.train_feature_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/features.pkl' \
  'dataloader.cfw.train_label_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/labels.pkl' \
  'dataloader.cfw.train_img_path_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/image_paths.pkl' \
  '+output_dir=./data/artifacts/statefarm_imbalanced_multiclass_smoke/vit_h_14/dataloader_eval'
```

Optional: enable complex plots (heatmaps, weight-vs-frequency, Lorenz/Gini):
```bash
python scripts/eval_dataloader.py \
  dataloader=cfw \
  dataset=statefarm_multiclass \
  'dataset.data_root=./data/processed_smoke/statefarm_imbalanced_multiclass_smoke' \
  'dataloader.batch_size=16' \
  'hardware.num_workers=4' \
  'dataloader.cfw.train_feature_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/features.pkl' \
  'dataloader.cfw.train_label_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/labels.pkl' \
  'dataloader.cfw.train_img_path_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/image_paths.pkl' \
  'dataloader_eval.plots.enable_complex=true' \
  '+output_dir=./data/artifacts/statefarm_imbalanced_multiclass_smoke/vit_h_14/dataloader_eval_complex'
```

Generate a persisted CFW weights table (optional):
```bash
python scripts/generate_cfw_weights_table.py \
  dataloader=cfw \
  dataset=statefarm_multiclass \
  +split=train \
  dataset.data_root=./data/processed_smoke/statefarm_imbalanced_multiclass_smoke \
  dataloader.cfw.train_feature_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/features.pkl \
  dataloader.cfw.train_label_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/labels.pkl \
  dataloader.cfw.train_img_path_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/image_paths.pkl \
  dataloader.cfw.clustering_batch_size=16 \
  dataloader.cfw.weighting.max_outlier_cluster_size=16 \
  +output_dir=./data/artifacts/statefarm_imbalanced_multiclass_smoke/vit_h_14/cfw
```

Compare baseline vs CFW dataloader behavior using configuration files (`configs/experiment/cfw_dinov2_binary.yaml`):
```bash
python scripts/eval_dataloader.py \
  experiment=cfw_dinov2_binary \
  +num_batches=100 \
  +output_dir=./outputs/dataloader_eval_driveact_binary
```

```bash
# Drive&Act multiclass baseline (DINOv2)
python scripts/train.py -m experiment=grid_search/baseline_dinov2_multiclass_lr_sweep
# Drive&Act multiclass CFW (DINOv2)
python scripts/train.py -m experiment=grid_search/cfw_dinov2_multiclass_lr_sweep
# Drive&Act multiclass baseline (ViT)
python scripts/train.py -m experiment=grid_search/baseline_vit_multiclass_lr_sweep
```

For the CFW multiclass sweep:

```bash
export CFW_CFW_MULTICLASS_FEATURE_SPLIT0_DIR=/absolute/path/to/extracted/multiclass/features/train
```

Use the following baseline sweep command:
```bash
python scripts/train.py -m \
  dataset=statefarm_multiclass \
  model=vit_b_16 \
  dataloader=baseline \
  'dataset.data_root=./data/processed_smoke/statefarm_imbalanced_multiclass_smoke' \
  'dataloader.batch_size=16' \
  'hardware.num_workers=4' \
  'trainer.num_epochs=10' \
  'trainer.device=cuda:0' \
  'optimizer=sgd' \
  'optimizer.lr=0.0004,0.0003' \
  'scheduler=linear_interpolation' \
  'scheduler.start_lr=${optimizer.lr}' \
  'scheduler.end_lr=0.0002' \
  'scheduler.num_epochs=${trainer.num_epochs}' \
  'experiment.name=stage_j_baseline_vit_b16_smoke'
```

For the CFW sweep, rerun the same command with these changes:

- set `dataloader=cfw`
- add `dataloader.cfw.train_feature_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/features.pkl`
- add `dataloader.cfw.train_label_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/labels.pkl`
- add `dataloader.cfw.train_img_path_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/image_paths.pkl`
- set `experiment.name=stage_j_cfw_vit_h14_smoke`
```bash
python scripts/train.py -m \
  dataset=statefarm_multiclass \
  model=vit_b_16 \
  dataloader=cfw \
  'dataset.data_root=./data/processed_smoke/statefarm_imbalanced_multiclass_smoke' \
  'dataloader.batch_size=16' \
  'hardware.num_workers=4' \
  'trainer.num_epochs=10' \
  'trainer.device=cuda:0' \
  'optimizer=sgd' \
  'optimizer.lr=0.0004,0.0003' \
  'scheduler=linear_interpolation' \
  'scheduler.start_lr=${optimizer.lr}' \
  'scheduler.end_lr=0.0002' \
  'scheduler.num_epochs=${trainer.num_epochs}' \
  'dataloader.cfw.train_feature_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/features.pkl' \
  'dataloader.cfw.train_label_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/labels.pkl' \
  'dataloader.cfw.train_img_path_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/image_paths.pkl' \
  'experiment.name=stage_j_cfw_vit_h14_smoke'
```

Run BOHB with Ray in local mode:

```bash
python scripts/bayesian_optimize.py \
  +optimization=bayesian \
  optimization.non_interactive=true
```

Cluster-mode example:
```bash
python scripts/bayesian_optimize.py \
  +optimization=bayesian \
  optimization.non_interactive=true \
  optimization.ray.mode=cluster \
  optimization.ray.address=auto
```

Use the same command template for both Drive&Act preset pairs.
| Parameter | Value |
|---|---|
| BASE_EXP | baseline_dinov2_binary |
| CFW_EXP | cfw_dinov2_binary |
| BASE_RUN_DIR | baseline_dinov2_vitb14_driveact_binary |
| CFW_RUN_DIR | cfw_dinov2_vitb14_driveact_binary |
| FEATURE_OUT | ./outputs/features_driveact_binary |
| FEATURE_ENV | CFW_CFW_FEATURE_SPLIT0_DIR |
| Parameter | Value |
|---|---|
| BASE_EXP | baseline_vit_multiclass |
| CFW_EXP | cfw_vit_multiclass |
| BASE_RUN_DIR | baseline_vit_b_16_driveact_multiclass |
| CFW_RUN_DIR | cfw_vit_b_16_driveact_multiclass |
| FEATURE_OUT | ./outputs/features_driveact_multiclass |
| FEATURE_ENV | CFW_CFW_MULTICLASS_FEATURE_SPLIT0_DIR |
```bash
# Fill these from one of the tables above
BASE_EXP=<baseline_experiment>
CFW_EXP=<cfw_experiment>
BASE_RUN_DIR=<baseline_output_subdir>
CFW_RUN_DIR=<cfw_output_subdir>
FEATURE_OUT=<feature_output_dir>
FEATURE_ENV=<cfw_feature_env_var_name>

# Baseline train + test
python scripts/train.py experiment=${BASE_EXP} trainer.device=0
BASE_CKPT=$(find outputs/${BASE_RUN_DIR} -name "*_best.pth" | sort | tail -n 1)
python scripts/test.py experiment=${BASE_EXP} +checkpoint_path="${BASE_CKPT}" trainer.device=0

# CFW feature extraction + train + test
python scripts/extract_features.py experiment=${CFW_EXP} +split=train +output_dir=${FEATURE_OUT}
export ${FEATURE_ENV}=${FEATURE_OUT}/features/train
python scripts/train.py experiment=${CFW_EXP} trainer.device=0
CFW_CKPT=$(find outputs/${CFW_RUN_DIR} -name "*_best.pth" | sort | tail -n 1)
python scripts/test.py experiment=${CFW_EXP} +checkpoint_path="${CFW_CKPT}" trainer.device=0
```

Using the checkpoints from section 7.1 (`BASE_CKPT`, `CFW_CKPT`), evaluate on each target domain.
Target roots:

- Cross-view (Kinect IR): `./data/processed/daa_binary_kinect_ir/split_0`
- Cross-modality (NIR Front): `./data/processed/daa_binary_nir_front/split_0`
```bash
# Set once per target domain
TARGET_ROOT=<target_dataset_root>
TARGET_TAG=<kinect_ir_or_nir_front>
MODEL_FOR_EVAL=<model_used_for_training>

# Baseline checkpoint on target domain
python scripts/test.py \
  dataset=driveact_binary \
  model=${MODEL_FOR_EVAL} \
  dataloader=baseline \
  dataset.data_root=${TARGET_ROOT} \
  +checkpoint_path="${BASE_CKPT}" \
  trainer.device=cuda:0 \
  experiment.name=eval_baseline_${TARGET_TAG}

# CFW checkpoint on target domain
python scripts/test.py \
  dataset=driveact_binary \
  model=${MODEL_FOR_EVAL} \
  dataloader=baseline \
  dataset.data_root=${TARGET_ROOT} \
  +checkpoint_path="${CFW_CKPT}" \
  trainer.device=cuda:0 \
  experiment.name=eval_cfw_${TARGET_TAG}
```

Use the following mapping:

- Binary: `dataset=statefarm_binary`, `dataset.data_root=./data/processed/statefarm_imbalanced_binary`
- Multiclass: `dataset=statefarm_multiclass`, `dataset.data_root=./data/processed/statefarm_imbalanced_multiclass`
Baseline template:

```bash
python scripts/train.py \
  dataset=<statefarm_binary_or_multiclass> \
  model=vit_b_16 \
  dataloader=baseline \
  dataset.data_root=<statefarm_data_root> \
  dataloader.batch_size=16 \
  hardware.num_workers=4 \
  trainer.num_epochs=80 \
  trainer.device=cuda:0 \
  optimizer=sgd \
  optimizer.lr=0.0004 \
  scheduler=linear_interpolation \
  'scheduler.start_lr=${optimizer.lr}' \
  scheduler.end_lr=0.0002 \
  'scheduler.num_epochs=${trainer.num_epochs}' \
  experiment.name=statefarm_baseline
```

CFW template:
```bash
python scripts/extract_features.py \
  dataset=<statefarm_binary_or_multiclass> \
  model=vit_b_16 \
  +split=train \
  dataset.data_root=<statefarm_data_root> \
  +output_dir=./data/features/statefarm_run/vit_b_16

python scripts/train.py \
  dataset=<statefarm_binary_or_multiclass> \
  model=vit_b_16 \
  dataloader=cfw \
  dataset.data_root=<statefarm_data_root> \
  dataloader.batch_size=16 \
  hardware.num_workers=4 \
  trainer.num_epochs=80 \
  trainer.device=cuda:0 \
  optimizer=sgd \
  optimizer.lr=0.0004 \
  scheduler=linear_interpolation \
  'scheduler.start_lr=${optimizer.lr}' \
  scheduler.end_lr=0.0002 \
  'scheduler.num_epochs=${trainer.num_epochs}' \
  dataloader.cfw.train_feature_file=./data/features/statefarm_run/vit_b_16/features/train/features.pkl \
  dataloader.cfw.train_label_file=./data/features/statefarm_run/vit_b_16/features/train/labels.pkl \
  dataloader.cfw.train_img_path_file=./data/features/statefarm_run/vit_b_16/features/train/image_paths.pkl \
  experiment.name=statefarm_cfw
```

Evaluate with `scripts/test.py` by passing `+checkpoint_path` and matching dataset/model settings.
Feature extraction and CFW training via experiment presets. Note: models are configurable; you can switch between the models listed in the model registry or add custom models to it.

```bash
# Extract features
python scripts/extract_features.py \
  experiment=statefarm_multiclass_smoke_extract_dinov2_vitb14

# Launch training
python scripts/train.py \
  experiment=statefarm_multiclass_smoke_cfw_dinov2_vitb14

# Launch training using config overrides
python scripts/train.py \
  experiment=statefarm_multiclass_smoke_cfw_dinov2_vitb14 \
  model=vit_h_14 \
  'experiment.name=stage_k_statefarm_smoke_cfw_vit_h14_multiclass' \
  'dataloader.batch_size=16' \
  'dataloader.cfw.clustering_batch_size=16' \
  'dataloader.cfw.train_feature_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/features.pkl' \
  'dataloader.cfw.train_label_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/labels.pkl' \
  'dataloader.cfw.train_img_path_file=./data/features/statefarm_imbalanced_multiclass_smoke/vit_h_14/features/train/image_paths.pkl'

# Launch testing using config overrides
python scripts/test.py \
  dataset=statefarm_multiclass \
  model=dinov2_vitb14 \
  dataloader=baseline \
  'dataset.data_root=./data/processed_smoke/statefarm_imbalanced_multiclass_smoke' \
  '+checkpoint_path=./outputs/stage_k_statefarm_smoke_cfw_dinov2_multiclass/2026-02-19_14-33-44/checkpoints/stage_k_statefarm_smoke_cfw_dinov2_multiclass_best.pth' \
  'trainer.device=cuda:0' \
  'experiment.name=stage_k_statefarm_smoke_eval_cfw_dinov2_multiclass'
```
Prepare workspace config and environment registration:

```bash
cp azureml/config/workspace.env.template azureml/config/workspace.env
# Edit workspace.env values
set -a
source azureml/config/workspace.env
set +a
python scripts/azure/register_environments.py
```

Submit example jobs:

```bash
# Single-GPU mode
python scripts/azure/submit_job.py --file azureml/jobs/train_single_gpu.yaml --stream
python scripts/azure/submit_job.py --file azureml/jobs/train_cfw_single_gpu.yaml --stream
# DDP mode (multi-node DDP)
python scripts/azure/submit_job.py --file azureml/jobs/train_cfw_ddp_2nodes.yaml --dry-run
python scripts/azure/submit_job.py --file azureml/jobs/train_cfw_ddp_2nodes.yaml --stream
```
Requirements: sync local data/features and register them as Azure data assets:

```bash
VER=20260219.1
AZUREML_DATASTORE_NAME=cfwdatastore
python scripts/azure/sync_local_to_datastore.py \
  --source ./data/processed/statefarm_imbalanced_multiclass \
  --target-path cfw/processed/statefarm_imbalanced_multiclass/v${VER} \
  --overwrite
python scripts/azure/register_data_assets.py \
  --name cfw_statefarm_imbalanced_multiclass \
  --version ${VER} \
  --path azureml://datastores/${AZUREML_DATASTORE_NAME}/paths/cfw/processed/statefarm_imbalanced_multiclass/v${VER}/
```

Ray BOHB Azure job:

```bash
python scripts/azure/submit_job.py --file azureml/jobs/ray_bohb_distributed.yaml --stream
```

- Training outputs: `outputs/<experiment_name>/<timestamp>/`
- Checkpoints: `.../checkpoints/*_best.pth`
- Test results: `.../results/test_results.txt`
- Feature artifacts: `<output_dir>/features/<split>/`
- Dataloader analysis: `<output_dir>/dataloader_statistics.txt` and `<output_dir>/plots/`
- BOHB outputs: `./ray_results/<experiment_name>/`
- Use `+` for keys not defined in the base Hydra config (for example `+split`, `+output_dir`, `+checkpoint_path`, `+num_batches`).
- For CFW, keep feature extraction and training batch settings aligned (coupled-batch metadata contract).
- `.env` files are not auto-loaded. Always `source` them in the current shell.
Use CITATION.cff or:

```bibtex
@inproceedings{bhardwaj2025cfw,
  title={Self-supervised Driver Distraction Detection for Imbalanced Datasets},
  author={Bhardwaj, Suraj and Lerch, David J. and Martin, Manuel and Diederichs, Frederik and Stiefelhagen, Rainer},
  booktitle={IEEE International Conference on Intelligent Transportation Systems (ITSC)},
  year={2025}
}
```

MIT License. See LICENSE.
