This is the official implementation of the paper DistillLens: Symmetric Knowledge Distillation Through Logit Lens.
Standard Knowledge Distillation (KD) compresses Large Language Models (LLMs) by optimizing final outputs, yet it typically treats the thought process of the teacher's intermediate layers as a black box. While feature-based distillation attempts to bridge this gap, existing objectives (e.g., MSE and asymmetric KL divergence) ignore the rich uncertainty profiles required for the final output. In this paper, we introduce DistillLens, a framework that symmetrically aligns the evolving thought processes of student and teacher models. By projecting intermediate hidden states into the vocabulary space via the Logit Lens, we enforce structural alignment using a symmetric divergence objective. Our analysis proves that this constraint imposes a dual-sided penalty, preventing both overconfidence and underconfidence while preserving the high-entropy information conduits essential for final deduction. Extensive experiments on GPT-2 and Llama architectures demonstrate that DistillLens consistently outperforms standard KD and feature-transfer baselines on diverse instruction-following benchmarks.
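The core idea can be sketched as follows. This is an illustrative reconstruction from the abstract, not the repository's training code: the function names `logit_lens` and `symmetric_kl`, the toy dimensions, and the random unembedding matrices are all hypothetical.

```python
# Illustrative sketch (not the repo's actual API): project intermediate hidden
# states into vocabulary space via the logit lens, then align student and
# teacher distributions with a symmetric KL divergence.
import torch
import torch.nn.functional as F

def logit_lens(hidden, unembed):
    # hidden:  (batch, seq, d_model) activations from some intermediate layer
    # unembed: (d_model, vocab) unembedding / LM-head weight
    return hidden @ unembed  # (batch, seq, vocab) logits

def symmetric_kl(student_logits, teacher_logits):
    # Dual-sided penalty: 0.5 * [KL(teacher || student) + KL(student || teacher)].
    # The forward term punishes student underconfidence where the teacher is
    # sure; the reverse term punishes overconfidence where the teacher is not.
    s = F.log_softmax(student_logits, dim=-1)
    t = F.log_softmax(teacher_logits, dim=-1)
    kl_ts = F.kl_div(s, t, log_target=True, reduction="batchmean")  # KL(t || s)
    kl_st = F.kl_div(t, s, log_target=True, reduction="batchmean")  # KL(s || t)
    return 0.5 * (kl_ts + kl_st)

# Toy usage: align one intermediate layer of a student and a teacher.
batch, seq, d_s, d_t, vocab = 2, 4, 8, 16, 32
h_student = torch.randn(batch, seq, d_s)
h_teacher = torch.randn(batch, seq, d_t)
W_student = torch.randn(d_s, vocab)  # stand-in for the student unembedding
W_teacher = torch.randn(d_t, vocab)  # stand-in for the teacher unembedding
loss = symmetric_kl(logit_lens(h_student, W_student),
                    logit_lens(h_teacher, W_teacher))
```

Because the objective is symmetric in the two distributions, swapping student and teacher leaves the loss unchanged, which is what yields the dual-sided penalty described above.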
To get started, clone the repository and set up the required environment:
```shell
pip3 install git+https://github.com/t1101675/transformers@minillm
pip3 install torch
pip3 install deepspeed
pip3 install numerize
pip3 install rouge-score
pip3 install torchtyping
pip3 install rich
pip3 install accelerate
pip3 install datasets
pip3 install peft
pip3 install wandb
```

Create a checkpoints/ directory in the root of the project. Use huggingface-cli to download the initial checkpoints:
```shell
# Student ckpts
huggingface-cli download gpt2 --repo-type model --local-dir checkpoints/gpt2-base
huggingface-cli download gpt2-medium --repo-type model --local-dir checkpoints/gpt2-medium
huggingface-cli download TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T --repo-type model --local-dir checkpoints/TinyLlama-1.1B

# Teacher ckpts
huggingface-cli download MiniLLM/teacher-gpt2-1.5B --repo-type model --local-dir checkpoints/teacher-gpt2-1.5B
huggingface-cli download MiniLLM/SFT-Llama-7B --repo-type model --local-dir checkpoints/SFT-Llama-7B
```

Follow this link for dataset setup. Plain-text Corpus (
In the scripts/gpt2/distill_lens/train_all.sh file, update the student model argument as needed:

```shell
model="base" # "base" or "medium"
```

Run GPT-2 training with:

```shell
bash scripts/gpt2/distill_lens/train_all.sh
```

Run Llama training with:

```shell
bash scripts/llama/distill_lens/train_all.sh
```

In the scripts/gpt2/eval/run_eval.sh file, update the student checkpoint argument as needed:

```shell
CKPT_NAME="gpt2-base" # "gpt2-base" or "gpt2-medium"
```

Run GPT-2 evaluation with:

```shell
bash scripts/gpt2/eval/run_eval.sh
```

Run Llama evaluation with:

```shell
bash scripts/llama/eval/run_eval.sh
```

Our model achieves state-of-the-art performance on multiple instruction-following benchmarks.
If you find our work useful in your research, please cite:

```bibtex
@article{dhakal2026distilllens,
  title={DistillLens: Symmetric Knowledge Distillation Through Logit Lens},
  author={Dhakal, Manish and Jinadu, Uthman and Budathoki, Anjila and Sunderraman, Rajshekhar and Ding, Yi},
  journal={arXiv preprint arXiv:2602.13567},
  year={2026}
}
```

