- Command:
poetry run python category_classification/train_embedding_classifier.py --data ./lspc_dataset_full --out ./embedding_classifier_prodcat --resume ./embedding_classifier_prodcat/checkpoint-epoch3.pt --epochs 5 --batch_size 254 --grad_accum 2 --lr_schedule cosine --lr_warmup_steps 3000 --lr_min_scale 0.05 --save_every 1 --amp
- Result: Validation macro-F1 ~0.932 / micro-F1 ~0.948 at epoch 3. Test language metrics highlight the English skew (~844k of ~952k rows): en macro-F1 0.948 / micro-F1 0.963, while de 0.809, fr 0.827, nl 0.768, it 0.772.
- Issue: Strong language imbalance; non-English languages lag despite solid overall accuracy.
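The per-language numbers above can be reproduced with a small evaluation helper. A minimal sketch in plain Python, assuming parallel lists of true labels, predictions, and language tags (`macro_f1` and `per_language_f1` are illustrative names, not functions from `train_embedding_classifier.py`):

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes present in either list."""
    classes = set(y_true) | set(y_pred)
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

def per_language_f1(y_true, y_pred, langs):
    """Group rows by language tag, then score each group separately."""
    groups = defaultdict(lambda: ([], []))
    for t, p, lang in zip(y_true, y_pred, langs):
        groups[lang][0].append(t)
        groups[lang][1].append(p)
    return {lang: macro_f1(t, p) for lang, (t, p) in groups.items()}
```

Scoring each language's rows in isolation is what exposes the skew: the overall micro-F1 is dominated by the ~89% English share.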
- Command:
poetry run python category_classification/train_embedding_classifier_v2.py --data ./lspc_dataset_full --out ./embedding_classifier_prodcat_v2 --resume ./embedding_classifier_prodcat_v2/checkpoint-epoch2.pt --epochs 5 --batch_size 128 --grad_accum 4 --lr_schedule cosine --lr_warmup_steps 2000 --lr_min_scale 0.05 --label_smoothing 0.05 --focal_gamma 1.5 --amp
- Result: Validation macro-F1 plateaued near 0.9316; test macro-F1 0.9300 / micro-F1 0.9477.
- Language metrics (test):
en 0.944 / de 0.748 / fr 0.800 / es 0.781 / ja 0.827.
- Next steps: Try language-aware sampling or per-language loss weights; consider a short non-English-only fine-tune; experiment with a lower focal gamma.
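The language-aware sampling idea can be sketched as per-row weights inversely proportional to language frequency, which would then feed `torch.utils.data.WeightedRandomSampler` in the training loop. The helper below is a hypothetical illustration, not part of `train_embedding_classifier_v2.py`:

```python
from collections import Counter

def language_balanced_weights(langs):
    """One sampling weight per row: 1 / count(lang), so every language
    receives equal expected mass under weighted sampling."""
    counts = Counter(langs)
    return [1.0 / counts[lang] for lang in langs]
```

With these weights, minority-language rows (de/fr/nl/it) are drawn far more often per epoch than under uniform sampling, at the cost of repeating them.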
- Command:
poetry run python category_classification/train_embedding_classifier_v2.py --data ./lspc_dataset_full --out ./embedding_classifier_prodcat_v2_gamma0 --epochs 8 --batch_size 256 --grad_accum 4 --lr_schedule cosine --lr_warmup_steps 2000 --lr_min_scale 0.01 --weight_decay 0.02 --classifier_hidden 1024 --classifier_dropout 0.2 --label_smoothing 0.05 --focal_gamma 0 --amp
- Progress: Training reached epoch 6 (best validation macro-F1 ~0.9326 at epoch 4).
- Observation: Even with a wider head, stronger dropout, and no focal loss, validation macro-F1 stayed in the 0.932-0.933 band, suggesting that more drastic measures (language-aware sampling or a contrastive objective) are needed to lift minority languages.
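For reference, a scalar sketch of how `--focal_gamma` and `--label_smoothing` plausibly combine per example, assuming the script applies the focal factor to a label-smoothed cross-entropy (the function and its exact form are assumptions, not the script's implementation); with `gamma=0` this reduces to plain smoothed cross-entropy, matching the gamma0 run:

```python
import math

def focal_smoothed_loss(probs, target, gamma=1.5, smoothing=0.05):
    """Focal-weighted cross-entropy against a label-smoothed target.

    probs: softmax probabilities for one example; target: true class index.
    """
    n = len(probs)
    # smoothed target distribution: uniform mass `smoothing`, rest on target
    soft = [smoothing / n + (1.0 - smoothing) * (1.0 if i == target else 0.0)
            for i in range(n)]
    # focal factor (1 - p)^gamma down-weights easy, confident classes
    return sum(-q * (1.0 - p) ** gamma * math.log(p)
               for q, p in zip(soft, probs))
```

Raising gamma shrinks the loss on confident predictions, so gradient mass shifts toward hard examples; the plateau across gamma ∈ {0, 1.5} suggests that lever is not what limits minority-language F1 here.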
- Command:
poetry run python category_classification/train_embedding_classifier_v2.py --data ./lspc_dataset_full --out ./embedding_classifier_prodcat_v2_en --epochs 5 --batch_size 256 --grad_accum 2 --lr_schedule cosine --lr_warmup_steps 2000 --lr_min_scale 0.05 --weight_decay 0.01 --classifier_hidden 384 --classifier_dropout 0.1 --label_smoothing 0.05 --focal_gamma 1.5 --amp
- Result: Validation macro-F1 peaked at 0.9327; test macro-F1 0.9315 / micro-F1 0.9487.
- Language metrics (test):
en 0.946 / de 0.753 / fr 0.809 / es 0.783 / ja 0.830.
- Notes: Confirms that focusing on English gives a small boost for en while the other languages stay roughly the same.
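All runs share the same LR schedule flags (`--lr_schedule cosine`, `--lr_warmup_steps`, `--lr_min_scale`). A sketch of that schedule as a pure function of the step, assuming linear warmup followed by cosine decay to a floor of `min_scale * base_lr`; the function name and exact shape are assumptions, not the script's implementation:

```python
import math

def cosine_lr(step, total_steps, base_lr, warmup_steps=2000, min_scale=0.05):
    """Linear warmup to base_lr, then cosine decay to min_scale * base_lr."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return base_lr * (min_scale + (1.0 - min_scale) * cosine)
```

The `min_scale` floor (0.05 in most runs, 0.01 in the gamma0 run) keeps a small learning rate at the end of training instead of decaying to zero.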