Requesting a built-in Contrastive Representation Distillation (CRD) strategy (Tian et al., 2019) for tunix.distillation.DistillationTrainer.
What CRD does
Distills at the representation level using an InfoNCE/contrastive loss:
- positive pair: (student rep, teacher rep) from the same sample
- negatives: mismatched pairs (e.g., in-batch negatives)
Minimal form:

```python
logits = (z_s @ z_t.T) / tau   # (B, B) similarity matrix; diagonal = positive pairs
labels = arange(B)
loss = CE(logits, labels)      # optionally add a symmetric term on logits.T
```
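For concreteness, here is a minimal runnable sketch in JAX, assuming `(B, D)`-shaped student/teacher representations; the name `crd_loss` and its parameters are illustrative, not existing tunix API:

```python
import jax
import jax.numpy as jnp
import optax


def crd_loss(z_s: jax.Array, z_t: jax.Array, tau: float = 0.1,
             symmetric: bool = True) -> jax.Array:
  """InfoNCE over in-batch pairs; z_s and z_t are (B, D) representations."""
  # L2-normalize so logits are cosine similarities scaled by 1/tau.
  z_s = z_s / (jnp.linalg.norm(z_s, axis=-1, keepdims=True) + 1e-8)
  z_t = z_t / (jnp.linalg.norm(z_t, axis=-1, keepdims=True) + 1e-8)
  logits = (z_s @ z_t.T) / tau          # (B, B); diagonal = positive pairs
  labels = jnp.arange(z_s.shape[0])     # each sample matches its own index
  loss = optax.softmax_cross_entropy_with_integer_labels(logits, labels)
  if symmetric:
    # Also classify teacher reps against student reps (transposed logits).
    loss = 0.5 * (loss + optax.softmax_cross_entropy_with_integer_labels(
        logits.T, labels))
  return loss.mean()
```

Note that the paper additionally uses learned projection heads when the student and teacher embedding widths differ, and a memory buffer of negatives; in-batch negatives as above are the simplest variant.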
Why this helps
- Complements plain logit KD, and improves over it in many of the paper's benchmarks; a combined-loss sketch follows below
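As a sketch of how the two losses could combine (reusing `crd_loss` from above; the weights `alpha`/`beta` and temperature `kd_temp` are illustrative, not tunix parameters):

```python
def combined_loss(student_logits, teacher_logits, z_s, z_t,
                  alpha=0.5, beta=0.5, tau=0.1, kd_temp=2.0):
  # Standard logit KD: temperature-scaled KL(teacher || student),
  # rescaled by t^2 to keep gradient magnitudes comparable.
  t = kd_temp
  kd = optax.kl_divergence(
      jax.nn.log_softmax(student_logits / t, axis=-1),
      jax.nn.softmax(teacher_logits / t, axis=-1),
  ).mean() * (t * t)
  return alpha * kd + beta * crd_loss(z_s, z_t, tau=tau)
```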
Reference
Tian et al., Contrastive Representation Distillation, arXiv:1910.10699 (2019): https://arxiv.org/pdf/1910.10699
I can contribute a PR + tests if you’re open to it.