Hakaze Cho, et al.
arXiv • OpenReview
This repo contains the official code for the following paper accepted at ICLR 2026:
Hakaze Cho, et al. "Mechanism of Task-oriented Information Removal in In-context Learning."
Implemented by Hakaze Cho, the primary contributor of the paper.
In-context Learning (ICL) is an emerging few-shot learning paradigm built on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate this mechanism from a novel perspective: information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in their hidden states, containing information for all possible tasks. This leads to arbitrary outputs that do not focus on the intended task, resulting in near-zero accuracy. Meanwhile, we find that selectively removing specific information from the hidden states with a low-rank filter effectively steers LMs toward the intended task. Building on these findings, by measuring the hidden states with carefully designed metrics, we observe that few-shot ICL effectively simulates such a task-oriented information removal process: it selectively removes redundant information from the entangled non-selective representations and improves the output based on the demonstrations, which constitutes a key mechanism underlying ICL. Moreover, we identify the essential attention heads that induce the removal operation, termed Denoising Heads. This enables ablation experiments that block the information removal operation during inference, where ICL accuracy significantly degrades, especially when the correct label is absent from the few-shot demonstrations, confirming the critical role of both the information removal mechanism and the denoising heads.
(A) A zero-shot query is encoded into a non-selective semantic representation containing all possible label information on various subspaces, making the output arbitrary among these labels. (B) Demonstrations help LM filter the label information, saving only the task-related one on the specific subspace (termed Task-Verbalization Subspace (TVS)), leading to the task-specific output. (C) We explicitly find the TVS by injecting a low-rank filter into the residual stream of zero-shot inputs, and train only the filter to drive the final outputs towards the ground-truth labels.
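As a rough illustration of the low-rank filter idea described above, the sketch below shows a rank-limited encoder/decoder pair acting on a hidden-state vector, with an optional residual path. This is a hypothetical numpy sketch for intuition only, not the repo's actual implementation; all names and dimensions here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 16  # hidden size and filter rank (rank 16 matches the --injected_rank default)

# Hypothetical low-rank filter: encoder E (r x d) projects into a rank-r subspace,
# decoder D (d x r) maps back to the hidden-state space.
E = rng.standard_normal((r, d)) / np.sqrt(d)
D = rng.standard_normal((d, r)) / np.sqrt(r)

def apply_filter(h, residual=False):
    """Pass hidden state h through the rank-r bottleneck; optionally keep a residual path."""
    filtered = D @ (E @ h)
    return h + filtered if residual else filtered

h = rng.standard_normal(d)
out = apply_filter(h)

# The composite map D @ E has rank at most r, so the filter can only
# preserve an r-dimensional subspace of the hidden-state information.
assert np.linalg.matrix_rank(D @ E) <= r
```

In the paper's setup, only this filter is trained (the LM stays frozen), so the subspace it preserves is exactly the task-related one.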
- A GPU with more than 48GB VRAM and CUDA (Ver. 12.4 recommended) is strongly required to run all the experiments.
- Local storage with more than 1TB of free space is recommended.
- A network connection to Huggingface is needed to download the pre-trained models, and a Huggingface user token with access to the Llama Family models is recommended to run part of the experiments.
- Anaconda or Miniconda is needed.

```shell
git clone https://github.com/hc495/Verb_subspace.git
cd Verb_subspace
conda env create -f environment.yaml
conda activate verb_subspace
```

The file main_experiments.py contains the major experiment code to reproduce the results in the paper. You can run different experiments by changing the arguments.
| Argument | Type | Default | Description |
|---|---|---|---|
| --model_name | str | Required | Path to the pretrained model |
| --ICL_dataset_index | int | -1 | ICL dataset index, defined in StaICC. If -1, use ICL_dataset_name instead. |
| --ICL_dataset_name | str | None | ICL dataset name (only used when ICL_dataset_index is -1). Selection: 'capital', 'profession', 'translation' |
| --huggingface_token | str | None | Huggingface token for model access. Leave empty to use os.environ['HF_TOKEN'] or no token. |
| --residual | bool | False | Use a residual connection in the injected filter |
| --quantized | bool | False | Use a quantized model |
| --injected_rank | int | 16 | Rank of the injected filter |
| --num_epochs | int | 4 | Number of training epochs |
| --injected_activation | str | "none" | Activation function for the injected filter |
| --injected_layer_num | int | 0 | Layer number to inject the filter into |
| --bias | str | "both" | Bias for the injected filter (encoder, decoder, both, none) |
| --save_path | str | "logs" | Path to save the logs |
| --open_end_test | bool | False | Whether to perform the open-end test |
| --symbolic_label | bool | False | Whether to use symbolic labels |
| --trained_autoencoder | str | None | Path to a trained autoencoder to load |
| --train_part | str | "both" | Train the encoder, decoder, none, or both |
| --icl_demo_numbers | int | 0 | Number of ICL demonstrations to use for training and testing |
| --ood_test | bool | False | Whether to perform the OOD test. Only works for StaICC datasets. |
| --id_test | bool | False | Whether to perform the in-domain test. Only works for StaICC datasets. |
| --random_label_word_test | bool | False | Whether to perform the random label (noisy label) test. Only works for StaICC datasets. |
| --hook | bool | False | Enable hook mode |
| --pre_test_only | bool | False | Whether to only perform the pre-test without training |
| --no_pre_test | bool | False | Whether to skip the pre-test and directly train the model |
| --output_hidden_states | bool | False | Whether to output hidden states from the model |
| --output_full_hidden_states | bool | False | Whether to output full hidden states from the model |
| --output_attentions | bool | False | Whether to output attentions from the model |
| --instruction | str | None | Instruction to use for the model (if applicable). Typically a trailing space is needed. |
| --ablated_heads | dict | {} | Heads to ablate in the model. Format: {'layer_num': [head1, head2, ...]} |
| --random_ablate_heads | bool | False | Whether to randomly ablate heads in the model, with the same layer-wise amount as specified in --ablated_heads. |
| --ablate_last_label | bool | False | Whether to ablate the last label in the prompts. Only works for StaICC datasets. |
| --estimate_filter | bool | False | Whether to estimate the filter from the model's hidden states. Overrides trained_autoencoder. |
| --saved_zero_shot_hs | str | None | 0-shot hidden states for the filter estimation |
| --saved_few_shot_hs | str | None | Few-shot hidden states for the filter estimation |
| --amplify_factor | float | 0.1 | Amplification factor for the amplified_head |
| --amplified_head | dict | {} | Head to amplify in the model. Format: {'layer_num': head_num} |
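For example, a run combining several of the flags above might look like the following. The model path, dataset, and layer choice here are illustrative placeholders only; consult the SOP scripts for the exact settings used in the paper.

```shell
# Illustrative invocation only -- model path and layer number are placeholders
python main_experiments.py \
    --model_name meta-llama/Meta-Llama-3-8B \
    --ICL_dataset_name capital \
    --injected_rank 16 \
    --injected_layer_num 12 \
    --num_epochs 4 \
    --save_path logs
```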
The folder SOP contains the scripts to repeat all the major experiments in the paper. Move the scripts to the root folder and run them.
In detail:
| Num | File Name | Experiment Description | Result in the Paper | Requirement |
|---|---|---|---|---|
| 1 | Exp_1_0_low_rank_filter.sh | Train the explicit TVP and test the evaluation accuracy. | Fig. 3 | - |
| 2 | Exp_1_1_symbolic.sh | Train only one part of the TVP (encoder/decoder) and test the evaluation accuracy with symbolic labels. | Table 1 | Experiment 1 |
| 3 | Exp_1_2_captial.sh | Train the explicit TVP on the capital dataset. | Fig. 12 | - |
| 4 | Exp_1_3_profession.sh | Train the explicit TVP on the profession dataset. | Fig. 13 | - |
| 5 | Exp_1_4_encoding_magnitude.sh | Analyze the encoding magnitude on various layers. Taken from https://github.com/hc495/ICL_Circuit, also my own work, so no IP issues arise. | Fig. 3 | - |
| 6 | Exp_2_0_hidden_state_geo.sh | Collect the hidden states for various k and l. | Fig. 4 | - |
| 7 | Exp_2_1_hidden_state_instruction.sh | Collect the hidden states with instructions. | Fig. 5 | - |
| 8 | Exp_2_2_k_8_condition.sh | Collect the hidden states under the unseen and random label settings. | Fig. 6 | - |
| 9 | Exp_3_0_head_ablation.sh | Ablate each head and re-evaluate the 2 metrics, as well as accuracy. | Fig. 7 | - |
| 10 | Exp_3_1_induction_magnitude.ipynb | Calculate the induction head magnitude. Taken from https://github.com/hc495/ICL_Circuit, also my own work, so no IP issues arise. | Fig. 9 | - |
| 11 | Exp_3_2_denoising_head_ablation_acc.sh | Ablate all the denoising heads and test the evaluation accuracy under various settings. | Table 2 | Experiment 9, Visualization 3 |
| 12 | Exp_3_3_denoising_head_amp.sh | Amplify the denoising heads and test the evaluation accuracy under various settings. | Fig. 14 | Experiment 9, Visualization 3 |
The folder main_visualization contains the code to visualize the results. Before running these notebooks, update the paths in them to point to your saved log path.
| Num | File Name | Visualization Description | Result in the Paper | Requirement |
|---|---|---|---|---|
| 1 | explicit_tvp_acc.ipynb | Visualize the explicit TVP accuracy results. | Fig. 3, Fig. 25 | Experiment 1, 4 |
| 2 | eccen_and_cov.ipynb | Visualize the eccentricity and cov flux results. | Fig. 4, Fig. 5, Fig. 6 | Experiment 1, 6, 7, 8 |
| 3 | head_ablation_visualization.ipynb | Visualize the head ablation and denoising head results. Also, find the denoising heads. | Fig. 7 | Experiment 1, 9 |
If you find this work useful for your research, please cite our paper:
@inproceedings{cho2026mechanism,
title={Mechanism of Task-oriented Information Removal in In-context Learning},
author={Hakaze Cho and Haolin Yang and Gouki Minegishi and Naoya Inoue},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=VAv1rrPR1A}
}