DistillLens: Symmetric Knowledge Distillation Through Logit Lens

This is the official implementation of the paper DistillLens: Symmetric Knowledge Distillation Through Logit Lens.

Abstract

Standard Knowledge Distillation (KD) compresses Large Language Models (LLMs) by optimizing final outputs, yet it typically treats the teacher's intermediate layer's thought process as a black box. While feature-based distillation attempts to bridge this gap, existing methods (e.g., MSE and asymmetric KL divergence) ignore the rich uncertainty profiles required for the final output. In this paper, we introduce DistillLens, a framework that symmetrically aligns the evolving thought processes of student and teacher models. By projecting intermediate hidden states into the vocabulary space via the Logit Lens, we enforce structural alignment using a symmetric divergence objective. Our analysis proves that this constraint imposes a dual-sided penalty, preventing both overconfidence and underconfidence while preserving the high-entropy information conduits essential for final deduction. Extensive experiments on GPT-2 and Llama architectures demonstrate that DistillLens consistently outperforms standard KD and feature-transfer baselines on diverse instruction-following benchmarks.

Environment Setup

To get started, clone the repository and set up the required environment:

pip3 install git+https://github.com/t1101675/transformers@minillm
pip3 install torch
pip3 install deepspeed
pip3 install numerize
pip3 install rouge-score
pip3 install torchtyping
pip3 install rich
pip3 install accelerate
pip3 install datasets
pip3 install peft
pip3 install wandb
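
After installation, a quick sanity check can confirm that every dependency is importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list assumes the usual import names for the packages above (for instance, `rouge-score` is imported as `rouge_score`).

```python
import importlib.util

# Import names for the packages installed above (assumed, not taken from
# the repository). The `rouge-score` distribution is imported as `rouge_score`.
REQUIRED_MODULES = [
    "transformers", "torch", "deepspeed", "numerize", "rouge_score",
    "torchtyping", "rich", "accelerate", "datasets", "peft", "wandb",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks up each module without importing it, so it stays fast even for heavy packages such as torch.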

Method
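
As a rough illustration of the idea described in the abstract, the sketch below projects intermediate hidden states into vocabulary space (the Logit Lens step) and compares the resulting teacher and student distributions with a symmetric divergence. It is a toy example under stated assumptions, not the repository's implementation: the Jensen-Shannon divergence is used here only as one familiar symmetric divergence (the exact objective is defined in the paper), and the matrices, vectors, and function names are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden, unembed):
    """Project a hidden state (length d) into vocabulary space.

    `unembed` is a V x d matrix (one row per vocabulary token) standing in
    for a model's output embedding. Returns a distribution over V tokens.
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
    return softmax(logits)


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric in p and q (assumed objective)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy setup: a vocabulary of 3 tokens and hidden width 2 (hypothetical values).
teacher_unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

teacher_hidden = [0.5, 0.2]    # intermediate teacher state, fairly uncertain
student_hidden = [2.0, -1.0]   # intermediate student state, more peaked

p_teacher = logit_lens(teacher_hidden, teacher_unembed)
p_student = logit_lens(student_hidden, student_unembed)

# Layer-wise alignment loss for this one (teacher layer, student layer) pair.
loss = js_divergence(p_teacher, p_student)
print(f"symmetric divergence: {loss:.4f}")
```

Because the divergence is symmetric, swapping the two arguments gives the same value, so a student that is too peaked and one that is too flat relative to the teacher are both penalized.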

Checkpoints

Create a checkpoints/ directory in the root of the project. Use huggingface-cli to download the initial checkpoints:

#Student ckpts
huggingface-cli download gpt2 --repo-type model --local-dir checkpoints/gpt2-base
huggingface-cli download gpt2-medium --repo-type model --local-dir checkpoints/gpt2-medium
huggingface-cli download TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T --repo-type model --local-dir checkpoints/TinyLlama-1.1B 

#Teacher ckpts
huggingface-cli download MiniLLM/teacher-gpt2-1.5B --repo-type model --local-dir checkpoints/teacher-gpt2-1.5B
huggingface-cli download MiniLLM/SFT-Llama-7B --repo-type model --local-dir checkpoints/SFT-Llama-7B
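
Before launching training, it can help to confirm that each download finished. The sketch below is an optional check, not part of the repository: the helper name `incomplete_checkpoints` is hypothetical, and it assumes each downloaded model directory contains a config.json file, as Hugging Face model repositories normally do.

```python
from pathlib import Path

# Local directory names used by the download commands above.
EXPECTED_CHECKPOINTS = [
    "gpt2-base", "gpt2-medium", "TinyLlama-1.1B",
    "teacher-gpt2-1.5B", "SFT-Llama-7B",
]


def incomplete_checkpoints(root, names=EXPECTED_CHECKPOINTS):
    """Return names whose directory is missing or lacks a config.json."""
    root = Path(root)
    return [n for n in names if not (root / n / "config.json").is_file()]


if __name__ == "__main__":
    missing = incomplete_checkpoints("checkpoints")
    print("Incomplete checkpoints:", missing if missing else "none")
```

Run it from the project root. Checkpoints you do not plan to use (for example, the Llama models for a GPT-2-only run) can be ignored in the output.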

Data Setup

Follow this link for dataset setup. The Plain-text Corpus ($D_{PT}$) Setup section described there can be skipped, since it is not needed for our runs.

Training

GPT2

In scripts/gpt2/distill_lens/train_all.sh, set the student model argument as needed:

model="base"  # "base" or "medium"

Run training with:

bash scripts/gpt2/distill_lens/train_all.sh

TinyLlama

Run training with:

bash scripts/llama/distill_lens/train_all.sh

Evaluation

GPT2

In scripts/gpt2/eval/run_eval.sh, set the student checkpoint name as needed:

CKPT_NAME="gpt2-base"   # "gpt2-base" or "gpt2-medium"

Run evaluation with:

bash scripts/gpt2/eval/run_eval.sh

TinyLlama

Run evaluation with:

bash scripts/llama/eval/run_eval.sh

Results

On GPT-2 and Llama architectures, DistillLens consistently outperforms standard KD and feature-transfer baselines across diverse instruction-following benchmarks.

Citation

If you find our work useful in your research, please cite:

@article{dhakal2026distilllens,
  title={DistillLens: Symmetric Knowledge Distillation Through Logit Lens},
  author={Dhakal, Manish and Jinadu, Uthman and Budathoki, Anjila and Sunderraman, Rajshekhar and Ding, Yi},
  journal={arXiv preprint arXiv:2602.13567},
  year={2026}
}
