Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

This is the official repository for the paper "Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates". We provide code and step-by-step instructions for reproducing our experiments.

NOTE: Throughout the repository, you will find the following placeholders:

your-hf-id: Your Hugging Face ID to create and push datasets.
/path/to/containers/: A directory to store downloaded container images.
/path/to/cache/: A directory to store cache files.
/path/to/envs/: A directory to create virtual environments.
/path/to/processed/data: A directory to store processed data.
/path/to/models/: A directory to store model checkpoints.
/path/to/logs/: A directory to store logs for training.
/path/to/mnt: A directory to bind-mount your data directory if needed.
your_openai_api_key: Your OpenAI API key.

Please replace these placeholders with the actual paths relevant to your environment.

Also, please make sure to log in to the Hugging Face Hub using huggingface-cli login before running any scripts that require access to the Hugging Face Hub.

We do not expect you to modify any code other than the placeholders to reproduce our results.
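
Before launching a job, it can help to check that no placeholder was left unreplaced. The following is a minimal sketch, not part of the repository: the function names `find_placeholders` and `scan_tree` are hypothetical, and the sketch assumes that every path placeholder starts with `/path/to/`, as in the list above.

```python
from pathlib import Path

# Substrings that mark an unreplaced placeholder (taken from the list above).
# Assumption: every path placeholder starts with "/path/to/".
PATTERNS = ("your-hf-id", "your_openai_api_key", "/path/to/")


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern) pairs for each placeholder found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in PATTERNS:
            if pattern in line:
                hits.append((lineno, pattern))
    return hits


def scan_tree(root: str, suffixes=(".sh", ".py", ".yml")) -> dict[str, list[tuple[int, str]]]:
    """Scan files under root and report those that still contain placeholders."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = find_placeholders(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```

For example, `scan_tree("ssu")` run from `~/src` would list each script that still needs editing, together with the affected line numbers.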

Installation

We recommend using a pre-built Docker image for easy setup. We use the official ROCm+PyTorch image for AMD GPUs and the apptainer tool for container management. To avoid compatibility issues, we create four different environments.

For preprocessing and training

# Download the container
mkdir -p /path/to/containers/
APPTAINER_CACHEDIR=/path/to/containers/
export APPTAINER_CACHEDIR
apptainer pull --dir $APPTAINER_CACHEDIR docker://rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.6.0

# Enable the ROCm environment
apptainer exec --fakeroot --bind /path/to/mnt:/path/to/mnt --rocm $APPTAINER_CACHEDIR/pytorch_rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.6.0.sif /bin/bash

# Set configurations
export TRANSFORMERS_VERBOSITY=debug
export HF_HOME="/path/to/cache/"
export HF_HUB_CACHE="/path/to/cache/"
export HF_DATASETS_CACHE="/path/to/cache/"
export HF_DATASETS_TRUST_REMOTE_CODE=true

# Create an env
python3 -m venv --system-site-packages /path/to/envs/ssu_train
source /path/to/envs/ssu_train/bin/activate

# Install packages
pip install transformers==4.52.4 peft==0.15.2 datasets==3.6.0 evaluate scikit-learn sentencepiece==0.2.0 huggingface-hub tqdm pyarrow protobuf tiktoken==0.9.0 nltk==3.9.1 zstandard
mkdir -p ~/src
cd ~/src
git clone --depth 1 https://github.com/ROCm/flash-attention.git
cd flash-attention
MAX_JOBS=$((`nproc` - 1)) pip install -v . --no-build-isolation

# Clone our repository
cd ~/src
git clone https://github.com/gucci-j/ssu.git # or just download the zip file and unzip it

For evaluation with LightEval and AlpacaEval 2.0

# Enable the ROCm environment
apptainer exec --fakeroot --bind /path/to/mnt:/path/to/mnt --rocm $APPTAINER_CACHEDIR/pytorch_rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.6.0.sif /bin/bash

# Set configurations
export TRANSFORMERS_VERBOSITY=debug
export HF_HOME="/path/to/cache/"
export HF_HUB_CACHE="/path/to/cache/"
export HF_DATASETS_CACHE="/path/to/cache/"
export HF_DATASETS_TRUST_REMOTE_CODE=true

# Create an env
python3 -m venv --system-site-packages /path/to/envs/ssu_lighteval
source /path/to/envs/ssu_lighteval/bin/activate

# Install packages
pip install transformers==4.52.4 peft==0.15.2 datasets==3.6.0 evaluate scikit-learn sentencepiece==0.2.0 huggingface-hub tqdm pyarrow protobuf tiktoken==0.9.0 nltk==3.9.1 zstandard langcodes

# Get LightEval
cd ~/src
git clone https://github.com/huggingface/lighteval
cd lighteval
git checkout 327071fe86e427d880f907a51d1462f4a3f951c1

# Apply some patches by copying the contents of the patches directory
cd ~/src/ssu/evaluation/src/patches
cp __init__.py ~/src/lighteval/src/lighteval/metrics/
cp adapter_model.py ~/src/lighteval/src/lighteval/models/transformers/
cp language.py ~/src/lighteval/src/lighteval/utils/
cp model_input.py ~/src/lighteval/src/lighteval/models/
cp prompt_manager.py ~/src/lighteval/src/lighteval/tasks/
cp requests.py ~/src/lighteval/src/lighteval/tasks/
cp transformers_model.py ~/src/lighteval/src/lighteval/models/transformers/
cp translation_literals.py ~/src/lighteval/src/lighteval/tasks/templates/utils/

# Install LightEval and AlpacaEval 2.0
cd ~/src/lighteval
pip install .
pip install alpaca-eval==0.6.6

For NVIDIA GPUs

We optionally use NVIDIA GPUs for evaluation. The following commands set up the environment for using NVIDIA GPUs.

mkdir -p /path/to/containers/
APPTAINER_CACHEDIR=/path/to/containers/
export APPTAINER_CACHEDIR
apptainer pull --dir $APPTAINER_CACHEDIR docker://nvcr.io/nvidia/pytorch:25.04-py3

apptainer exec \
    --bind /path/to/mnt:/path/to/mnt \
    --fakeroot \
    --nv $APPTAINER_CACHEDIR/pytorch_25.04-py3.sif \
    /bin/bash

python3 -m venv --system-site-packages /path/to/envs/ssu_lighteval
source /path/to/envs/ssu_lighteval/bin/activate

# Install packages
unset PIP_CONSTRAINT
pip install transformers==4.52.4 peft==0.15.2 datasets==3.6.0 evaluate scikit-learn sentencepiece==0.2.0 huggingface-hub tqdm pyarrow protobuf tiktoken==0.9.0 nltk==3.9.1 zstandard langcodes

# Clone our repository
mkdir -p ~/src
cd ~/src
git clone https://github.com/gucci-j/ssu.git # or just download the zip file and unzip it

# Get LightEval
git clone https://github.com/huggingface/lighteval
cd lighteval
git checkout 327071fe86e427d880f907a51d1462f4a3f951c1

# Apply some patches by copying the contents of the patches directory
cd ~/src/ssu/evaluation/src/patches
cp __init__.py ~/src/lighteval/src/lighteval/metrics/
cp adapter_model.py ~/src/lighteval/src/lighteval/models/transformers/
cp language.py ~/src/lighteval/src/lighteval/utils/
cp model_input.py ~/src/lighteval/src/lighteval/models/
cp prompt_manager.py ~/src/lighteval/src/lighteval/tasks/
cp requests.py ~/src/lighteval/src/lighteval/tasks/
cp transformers_model.py ~/src/lighteval/src/lighteval/models/transformers/
cp translation_literals.py ~/src/lighteval/src/lighteval/tasks/templates/utils/

# Install LightEval and AlpacaEval 2.0
cd ~/src/lighteval
pip install .
pip install alpaca-eval==0.6.6

After installation, you need to copy the evaluation config file for AlpacaEval 2.0 from ./evaluation/config/alpaca_eval_gpt4.1-nano.yml in this repository to the /your/env/path/to/site-packages/alpaca_eval/evaluators_configs/ directory.
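
The copy step above can be sketched in Python as follows. This is an illustrative helper rather than a script shipped with the repository: the function name `install_evaluator_config` is hypothetical, and the default location assumes that alpaca-eval was installed into the site-packages directory of the currently active interpreter.

```python
import shutil
import sysconfig
from pathlib import Path


def install_evaluator_config(config_file, site_packages=None):
    """Copy an evaluator config into alpaca_eval/evaluators_configs/.

    If site_packages is not given, fall back to the site-packages directory
    of the active interpreter (an assumption about where alpaca-eval lives).
    """
    root = Path(site_packages or sysconfig.get_paths()["purelib"])
    target_dir = root / "alpaca_eval" / "evaluators_configs"
    if not target_dir.is_dir():
        raise FileNotFoundError(
            f"{target_dir} not found; is alpaca-eval installed in this environment?"
        )
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(config_file, target_dir))
```

Calling `install_evaluator_config("./evaluation/config/alpaca_eval_gpt4.1-nano.yml")` from the repository root inside the activated ssu_lighteval environment would perform the copy described above.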

For evaluation with lm-evaluation-harness

# Enable the ROCm environment
apptainer exec --fakeroot --bind /path/to/mnt:/path/to/mnt --rocm $APPTAINER_CACHEDIR/pytorch_rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.6.0.sif /bin/bash

# Set configurations
export TRANSFORMERS_VERBOSITY=debug
export HF_HOME="/path/to/cache"
export HF_HUB_CACHE="/path/to/cache"
export HF_DATASETS_CACHE="/path/to/cache"
export HF_DATASETS_TRUST_REMOTE_CODE=true

# Create an env
python3 -m venv --system-site-packages /path/to/envs/ssu_lmeval
source /path/to/envs/ssu_lmeval/bin/activate

# Install packages
pip install transformers==4.52.4 peft==0.15.2 datasets==3.6.0 evaluate scikit-learn sentencepiece==0.2.0 huggingface-hub tqdm pyarrow protobuf tiktoken==0.9.0 nltk==3.9.1 zstandard
cd ~/src
git clone --depth 1 --branch v0.4.8 https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install ".[math,ifeval,sentencepiece]"

For NVIDIA GPUs

apptainer exec \
    --bind /path/to/mnt:/path/to/mnt \
    --fakeroot \
    --nv $APPTAINER_CACHEDIR/pytorch_25.04-py3.sif \
    /bin/bash

python3 -m venv --system-site-packages /path/to/envs/ssu_lmeval
source /path/to/envs/ssu_lmeval/bin/activate

# Install packages
unset PIP_CONSTRAINT
pip install transformers==4.52.4 peft==0.15.2 datasets==3.6.0 evaluate scikit-learn sentencepiece==0.2.0 huggingface-hub tqdm pyarrow protobuf tiktoken==0.9.0 nltk==3.9.1 zstandard
cd ~/src/
git clone --depth 1 --branch v0.4.8 https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install ".[math,ifeval,sentencepiece]"

For safety evaluation

#!/bin/bash

# Download the container
mkdir -p /path/to/containers/
APPTAINER_CACHEDIR=/path/to/containers/
export APPTAINER_CACHEDIR
apptainer pull --dir $APPTAINER_CACHEDIR docker://rocm/vllm:rocm6.4.1_vllm_0.9.1_20250702

# Enable the ROCm environment
apptainer exec --fakeroot --bind /path/to/mnt:/path/to/mnt --rocm $APPTAINER_CACHEDIR/vllm_rocm6.4.1_vllm_0.9.1_20250702.sif /bin/bash

# Create a virtual environment
python3 -m venv --system-site-packages /path/to/envs/ssu_vllm
source /path/to/envs/ssu_vllm/bin/activate

# Set configurations
export TRANSFORMERS_VERBOSITY=debug
export HF_HOME="/path/to/cache/"
export HF_HUB_CACHE="/path/to/cache/"
export HF_DATASETS_CACHE="/path/to/cache/"
export HF_DATASETS_TRUST_REMOTE_CODE=true

# Install packages
cd ~/src
git clone https://github.com/nouhadziri/safety-eval-fork.git
cd safety-eval-fork
git checkout 2920bb85a8a8390144b9256f697395f81b94822e
pip install -e .
# Quote the version specifiers so that the shell does not treat ">" as a redirection
pip install "fire>=0.7.1" "tenacity>=9.1.2" "fastchat>=0.1.0" "scikit-learn>=1.7.1"

After installation, you need to obtain access to the following Hugging Face Hub repositories for evaluation:

Note that we did not use NVIDIA GPUs for safety alignment evaluation. Therefore, we cannot provide the corresponding step-by-step instructions.

Preprocessing

Training data

To preprocess training data, you can use the generate_cpt_data.py script. The following example demonstrates how to run the script for a specific model and language code:

#!/bin/bash

source /path/to/envs/ssu_train/bin/activate

export TRANSFORMERS_VERBOSITY=debug
export HF_HOME="/path/to/cache/"
export HF_HUB_CACHE="/path/to/cache/"
export HF_DATASETS_CACHE="/path/to/cache/"
export HF_DATASETS_TRUST_REMOTE_CODE=true

model_name=$1
lang_code=$2
if [ -z "$model_name" ] || [ -z "$lang_code" ]; then
    echo "Usage: $0 <model_name> <lang_code>"
    echo "Example: $0 allenai/OLMo-2-1124-7B-Instruct amh_Ethi"
    exit 1
fi
if [ "$lang_code" == "amh_Ethi" ]; then
    short_lang_code="am"
elif [ "$lang_code" == "hau_Latn" ]; then
    short_lang_code="ha"
elif [ "$lang_code" == "ibo_Latn" ]; then
    short_lang_code="ig"
elif [ "$lang_code" == "npi_Deva" ]; then
    short_lang_code="ne"
elif [ "$lang_code" == "kir_Cyrl" ]; then
    short_lang_code="ky"
else
    echo "Unsupported language code: $lang_code"
    exit 1
fi
if [ "$model_name" == "allenai/OLMo-2-1124-7B-Instruct" ]; then
    model_abbrev="OLMo-2-1124-7B-Instruct"
else
    echo "Unsupported model name: $model_name"
    exit 1
fi

cd ~/src/ssu/preprocessing/src
python generate_cpt_data.py \
    --lang_code $short_lang_code \
    --output_dir "/path/to/processed/data/${model_abbrev}_${short_lang_code}" \
    --cache_dir "/path/to/cache/" \
    --tokenizer_name_or_path "${model_name}" \
    --num_workers 31 \
    --max_length 512
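
The language-code and model-name checks in the script above amount to two lookup tables. A minimal Python sketch of the same logic is shown below; the function name `resolve_cpt_args` and the `data_root` parameter are hypothetical, while the mappings are copied from the script.

```python
# Mapping from the language codes accepted by the script to the short codes
# passed to generate_cpt_data.py (copied from the shell script above).
LANG_CODE_TO_SHORT = {
    "amh_Ethi": "am",
    "hau_Latn": "ha",
    "ibo_Latn": "ig",
    "npi_Deva": "ne",
    "kir_Cyrl": "ky",
}

# Models accepted by the example script, with the abbreviation used in paths.
MODEL_ABBREV = {
    "allenai/OLMo-2-1124-7B-Instruct": "OLMo-2-1124-7B-Instruct",
}


def resolve_cpt_args(model_name, lang_code, data_root="/path/to/processed/data"):
    """Return (short language code, output directory) for a preprocessing run."""
    if lang_code not in LANG_CODE_TO_SHORT:
        raise ValueError(f"Unsupported language code: {lang_code}")
    if model_name not in MODEL_ABBREV:
        raise ValueError(f"Unsupported model name: {model_name}")
    short = LANG_CODE_TO_SHORT[lang_code]
    return short, f"{data_root}/{MODEL_ABBREV[model_name]}_{short}"
```

Supporting another language or model in such a helper only requires a new dictionary entry, whereas the shell version needs an additional elif branch.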

Calibration data

To generate the calibration data, you can use the generate_calibration_data.py script. This script will create a calibration dataset based on the specified model. The following example demonstrates how to run the script for a specific model:

#!/bin/bash

source /path/to/envs/ssu_train/bin/activate

export TRANSFORMERS_VERBOSITY=debug
export HF_HOME="/path/to/cache/"
export HF_HUB_CACHE="/path/to/cache/"
export HF_DATASETS_CACHE="/path/to/cache/"
export HF_DATASETS_TRUST_REMOTE_CODE=true

model_name=$1
if [ -z "$model_name" ]; then
    echo "Usage: $0 <model_name>"
    echo "Example: $0 allenai/OLMo-2-1124-7B-Instruct"
    exit 1
fi
if [ "$model_name" == "allenai/OLMo-2-1124-7B-Instruct" ]; then
    model_abbrev="OLMo-2-1124-7B-Instruct"
else
    echo "Unsupported model name: $model_name"
    exit 1
fi

cd ~/src/ssu/preprocessing/src
python generate_calibration_data.py \
    --output_dir "/path/to/processed/data/${model_abbrev}_calib" \
    --cache_dir "/path/to/cache/" \
    --dataset_name allenai/tulu-3-sft-olmo-2-mixture \
    --split train \
    --num_samples 2000 \
    --tokenizer_name_or_path "${model_name}" \
    --num_workers 8 \
    --block_size 2048 \
    --shuffle \
    --streaming

Note that num_samples specifies the number of raw samples used to construct the calibration data, not the number of processed samples used for the actual calibration. Constructing 500 calibration samples does not necessarily require 2000 raw samples, so you can adjust this number to suit your calibration setup.
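
To see why the raw and processed counts can differ, consider the sketch below. It assumes that tokenized samples are concatenated and cut into fixed-length blocks of block_size tokens, which is a common packing scheme; this README does not confirm that generate_calibration_data.py works this way, and the function name `pack_into_blocks` is hypothetical.

```python
def pack_into_blocks(tokenized_samples, block_size):
    """Concatenate token ID lists and cut them into blocks of block_size tokens.

    Leftover tokens that do not fill a complete block are dropped.
    """
    buffer = []
    blocks = []
    for token_ids in tokenized_samples:
        buffer.extend(token_ids)
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks


# Four short raw samples of three tokens each yield three blocks of four tokens.
raw = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
blocks = pack_into_blocks(raw, block_size=4)
print(len(raw), "raw samples ->", len(blocks), "blocks")
```

Under this assumption, the number of processed samples depends on the total token count of the raw samples rather than on num_samples alone.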

For the ablation analysis that uses the Alpaca data, refer to generate_calibration_data_alpaca.sh. The only change is to set dataset_name to tatsu-lab/alpaca.

Evaluation data

To generate summarization and machine translation evaluation data, you can use the generate_sum_data.py and generate_mt_data.py scripts, respectively. The rest of the evaluation tasks do not need preprocessing. The following example demonstrates how to run each script:

#!/bin/bash

source /path/to/envs/ssu_train/bin/activate

export TRANSFORMERS_VERBOSITY=debug
export HF_HOME="/path/to/cache/"
export HF_HUB_CACHE="/path/to/cache/"
export HF_DATASETS_CACHE="/path/to/cache/"
export HF_DATASETS_TRUST_REMOTE_CODE=true

cd ~/src/ssu/preprocessing/src
mkdir -p /path/to/outputs/

# Generate MT data
python generate_mt_data.py \
    --output_dir "/path/to/outputs" \
    --cache_dir "/path/to/cache/" \
    --repo_id your-hf-id/flores-ssu

# Generate SUM data
lang_codes=(
    "ig"
    "ha"
    "ky"
    "ne"
    "am"
    "en"
)
for lang_code in "${lang_codes[@]}"; do
    python generate_sum_data.py \
        --output_dir "/path/to/outputs/" \
        --cache_dir "/path/to/cache/" \
        --repo_id your-hf-id/sum-${lang_code}-ssu \
        --lang_code ${lang_code} \
        --tokenizer_name_or_path "allenai/OLMo-2-1124-7B-Instruct"
done

Continual pre-training

To train the model using the tokenized data, you can use the main.py script. This script will handle the training process, including loading the data, setting up the model, and running the training loop.

Supported training strategies

Full fine-tuning (FFT): Fine-tune all parameters in the model.
Half fine-tuning (HFT): A state-of-the-art static selective parameter update approach that updates exactly 50% of parameters using a fine-grained, per-layer strategy. Its freezing strategy is as follows: (1) for self-attention, it randomly freezes two of the four matrices ($W_Q, W_K, W_V, W_O$); (2) for feed-forward layers, it freezes two of three matrices ($W_{up}, W_{down}, W_{gate}$) in a random half of the layers and one matrix in the remaining half. This is based on https://aclanthology.org/2025.acl-long.626/.
Gradient-Mask Tuning (GMT): A state-of-the-art dynamic selective parameter update approach that drops gradients of a pre-defined ratio (50% in this study for fair comparison with HFT and SSU) with smaller absolute values on the target data. This is based on https://ojs.aaai.org/index.php/AAAI/article/view/34621.
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs (LoTA): A static selective parameter update approach that calibrates a mask and then sparsely fine-tunes only the selected weights. This is based on https://arxiv.org/abs/2406.16797.
S2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity (S2FT): A static selective parameter update approach that selects FFN channels, permutes coupled weights, and fine-tunes only the connected submatrices. This is based on https://arxiv.org/abs/2412.06289.
AdaLoRA: An architecture-based method for mitigating catastrophic forgetting, included for reference. It achieves the best overall performance among the LoRA-like approaches evaluated in the HFT paper.
SSU-Rand (Random): Freezes an equal number of randomly-selected columns to verify that the importance-based selection in SSU is meaningful.
SSU-Mag (Magnitude): Freezes columns based only on the magnitude score (i.e., $|\theta_{ij}|$), isolating the effect of the activation term.
SSU-Wanda: The primary proposed method that implements SSU in the paper. Freezes columns based on the Wanda score.
SSU-SparseGPT (SparseGPT): An alternative version of SSU that uses SparseGPT for computing importance scores. This is used only for ablation analysis.
SSU-Fisher (Fisher): An alternative version of SSU that uses the Fisher information matrix for computing importance scores. This is used only for ablation analysis.
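
The difference between the SSU-Rand, SSU-Mag, and SSU-Wanda variants can be illustrated with a small pure-Python sketch. It is not the repository's implementation. The Wanda score $|\theta_{ij}| \cdot \|X_j\|_2$ follows the Wanda paper, but summing scores over rows to rank columns, freezing the highest-scoring columns, and all function names are assumptions made for illustration.

```python
import math
import random


def column_scores(weight, activations=None):
    """Score each column (input feature) of a weight matrix.

    weight: list of rows, one per output unit.
    activations: optional calibration inputs, one row per token.
    Without activations this is the magnitude score; with them it is the
    Wanda score |W_ij| * ||X_j||_2, summed over rows (an assumed aggregation).
    """
    n_cols = len(weight[0])
    magnitude = [sum(abs(row[j]) for row in weight) for j in range(n_cols)]
    if activations is None:
        return magnitude
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_cols)]
    return [m * n for m, n in zip(magnitude, norms)]


def select_frozen_columns(scores, freeze_ratio=0.5):
    """Pick the highest-scoring columns to freeze (assumed selection rule)."""
    k = int(len(scores) * freeze_ratio)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def random_frozen_columns(n_cols, freeze_ratio=0.5, seed=0):
    """Random baseline: freeze the same number of columns, chosen at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cols), int(n_cols * freeze_ratio)))


def masked_update(weight, grad, frozen, lr):
    """Apply one SGD step while leaving the frozen columns untouched."""
    frozen = set(frozen)
    return [
        [w if j in frozen else w - lr * g for j, (w, g) in enumerate(zip(w_row, g_row))]
        for w_row, g_row in zip(weight, grad)
    ]


W = [[1.0, -0.1, 0.5, 0.2], [-1.0, 0.1, 0.5, 0.2]]
X = [[0.1, 10.0, 1.0, 1.0], [0.1, 10.0, 1.0, 1.0]]  # toy calibration inputs
frozen_by_magnitude = select_frozen_columns(column_scores(W))
frozen_by_wanda = select_frozen_columns(column_scores(W, X))
```

In this toy example, column 1 has small weights but large calibration activations, so the Wanda-style score selects it for freezing while the magnitude-only score does not. This is the effect of the activation term that the SSU-Mag ablation isolates.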

Example job scripts

Here are example job scripts for each training strategy.

OLMo-2-1124-7B-Instruct

For ablation analysis

Freezing ratio
Freezing ratio analysis on baselines
- HFT
- GMT
Freezing methods
- Row-wise
- Element-wise
Calibration data (Alpaca)
Calibration data size
Different importance scoring methods
- SparseGPT
- Fisher
Additional baselines
- LoTA (90% Sparsity)
- LoTA (50% Sparsity) (Please provide 0.5 as the second argument when running the script.)
- LoTA (Sparsity ablation; Appendix) (Please provide the desired sparsity as the second argument when running the script.)
- S2FT (Down)
- S2FT (Down & Output; Appendix)
- S2FT (Sparsity ablation (rank = 16); Appendix)
- S2FT (Sparsity ablation (rank = 32); Appendix)
- S2FT (Sparsity ablation (rank = 64); Appendix)

OLMo-2-1124-13B-Instruct

On converting S2FT permuted checkpoints

To convert S2FT permuted checkpoints back to the original weight format for evaluation, you can use the convert_s2_to_linear.py script. Here is an example of how to run the script:

#!/bin/bash

source /path/to/envs/ssu_train/bin/activate

# Configs
export TRANSFORMERS_VERBOSITY=debug
export HF_HOME="/path/to/cache"
export HF_HUB_CACHE="/path/to/cache"
export HF_DATASETS_CACHE="/path/to/cache"
export HF_DATASETS_TRUST_REMOTE_CODE=true
model_abbrev="OLMo-2-1124-7B-Instruct"
lang_code="$1"
approach="$2"
output_dir="/path/to/models/${model_abbrev}-${lang_code}-${approach}/checkpoint-12208"
model_name_or_path="allenai/OLMo-2-1124-7B-Instruct"

cd ~/src/ssu/training/src
python utils/convert_s2_to_linear.py \
    --input "${output_dir}" \
    --output "${output_dir}_converted/" \
    --dtype bf16 \
    --reconstruct-s2 \
    --selections ~/src/ssu/training/src/${approach}.json

For --selections, provide the selection results for the corresponding run. These appear in the training log file (i.e., the second output of the s2ft_enable() function call); save them as a JSON file under the training/src/ directory. The file name must match the approach argument (e.g., s2ft_down.json for S2FT (Down)).
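
Saving the logged selection as a JSON file could look like the sketch below. The format of the logged selection is not documented in this README, so the sketch assumes it is printed as a plain Python literal (for example, a dictionary of lists). The function name `save_selections` is hypothetical, and the payload in the usage note is purely illustrative.

```python
import ast
import json
from pathlib import Path


def save_selections(logged_text, approach, src_dir):
    """Parse a selection copied from the training log and save it as JSON.

    Assumption: logged_text is a Python literal such as a dict of lists.
    If the log prints tensors or other objects, convert them by hand first.
    """
    selections = ast.literal_eval(logged_text)
    out_path = Path(src_dir) / f"{approach}.json"
    with out_path.open("w", encoding="utf-8") as f:
        json.dump(selections, f, indent=2)
    return out_path
```

For instance, `save_selections("{'example_key': [3, 1, 2]}", "s2ft_down", "training/src")` would write training/src/s2ft_down.json, which is the file name the conversion script expects for S2FT (Down).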

Evaluation

We use various benchmarks to evaluate the performance of each approach.

Chat and instruction-following benchmarks: IFEval, AlpacaEval 2.0, MTBench, and GSM8K.
Safety benchmarks: Tulu 3 Safety Evaluation Suite. Note that the safety evaluation requires GPT-4 API access. See allenai/open-instruct#500 for more details.
English benchmarks: MMLU (Acc), Belebele (Acc), SUM (chrF++), MT (chrF)
Target language benchmarks: Global MMLU (Acc), Belebele (Acc), SUM (chrF++), MT (chrF++)

Example job scripts

Similar to the training scripts, we provide evaluation scripts for each model variant. Some evaluations, such as the safety evaluation and AlpacaEval 2.0 (AE2), require access to OpenAI APIs.

Source model evaluation

7B adapted model evaluation

You can also use the same scripts for ablation analysis by passing the variant name as an argument. For example, bash ./evaluation/scripts/7b.sh ig ssu_alpaca 12208 1 runs the 7B evaluation for the SSU-Wanda variant calibrated on the Alpaca data. Here, 12208 is the checkpoint step, and 1 is a postfix for the logs that helps distinguish between runs. When the postfix is 1, the script runs both classification and generation evaluations. When it is 2 or 3, the script runs only generation evaluations, which supports multiple-run evaluation for generation tasks.
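
The argument convention described above can be summarized in a short Python sketch. It only mirrors the documented behavior and is not the script itself: the function name `plan_evaluation` is hypothetical, and rejecting postfix values other than 1, 2, or 3 is an assumption, since this README does not say how the script handles them.

```python
def plan_evaluation(lang_code, variant, checkpoint_steps, run_postfix):
    """Describe what an evaluation run would cover for the given arguments."""
    if run_postfix == 1:
        # The first run covers both task families.
        tasks = ["classification", "generation"]
    elif run_postfix in (2, 3):
        # Repeat runs cover generation tasks only.
        tasks = ["generation"]
    else:
        # Assumption: other values are treated as invalid in this sketch.
        raise ValueError("run_postfix must be 1, 2, or 3")
    return {
        "lang_code": lang_code,
        "variant": variant,
        "checkpoint": f"checkpoint-{checkpoint_steps}",
        "log_postfix": str(run_postfix),
        "tasks": tasks,
    }


first_run = plan_evaluation("ig", "ssu_alpaca", 12208, 1)
repeat_run = plan_evaluation("ig", "ssu_alpaca", 12208, 2)
```

Here, `first_run["tasks"]` contains both task families, while `repeat_run["tasks"]` contains only generation, matching the description above.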

13B adapted model evaluation

Analysis

We provide a script used for our qualitative analysis of linguistic code-mixing in the adapted models. You can find it at ae2_language_ratio_analyze.py.

To run the analysis, you can use the following script. Note that you need to install fasttext.

#!/bin/bash

# Create an env
python3 -m venv --system-site-packages /path/to/envs/ssu_analysis
source /path/to/envs/ssu_analysis/bin/activate

# Install packages
pip install fasttext pandas numpy==1.26.4 huggingface_hub

# Run the analysis
python ~/src/ssu/analysis/ae2_language_ratio_analyze.py

Adapted Models

The adapted model checkpoints are available at the following Hugging Face Hub repositories:

Approach	Model Size	Hugging Face Hub Repository
FFT	7B	ne / am / ig / ha / ky
FFT	13B	ne / am / ig / ha / ky
HFT	7B	ne / am / ig / ha / ky
HFT	13B	ne / am / ig / ha / ky
GMT	7B	ne / am / ig / ha / ky
GMT	13B	ne / am / ig / ha / ky
SSU-Wanda	7B	ne / am / ig / ha / ky
SSU-Wanda	13B	ne / am / ig / ha / ky
AdaLoRA	7B	ne / am / ig / ha / ky
AdaLoRA	13B	ne / am / ig / ha / ky
SSU-Rand	7B	ne / am / ig / ha / ky
SSU-Rand	13B	ne / am / ig / ha / ky
SSU-Mag	7B	ne / am / ig / ha / ky
SSU-Mag	13B	ne / am / ig / ha / ky

Ablation model checkpoints used in the paper are also available at the following Hugging Face Hub repositories:

Approach	Model Size	Hugging Face Hub Repository
SSU-Wanda (Alpaca)	7B	ig
SSU-SparseGPT	7B	ig
SSU-Fisher	7B	ig
SSU-Wanda (12.5% Freezing Ratio)	7B	ig
SSU-Wanda (25% Freezing Ratio)	7B	ig
SSU-Wanda (37.5% Freezing Ratio)	7B	ig
SSU-Wanda (62.5% Freezing Ratio)	7B	ig
SSU-Wanda (75% Freezing Ratio)	7B	ig
SSU-Wanda (87.5% Freezing Ratio)	7B	ig
SSU-Wanda (Row-wise)	7B	ig
SSU-Wanda (Element-wise)	7B	ig
SSU-Wanda (Calibration Data Size: 128)	7B	ig
HFT (12.5% Freezing Ratio)	7B	ig
HFT (25% Freezing Ratio)	7B	ig
HFT (37.5% Freezing Ratio)	7B	ig
HFT (62.5% Freezing Ratio)	7B	ig
HFT (75% Freezing Ratio)	7B	ig
HFT (87.5% Freezing Ratio)	7B	ig
GMT (12.5% Freezing Ratio)	7B	ig
GMT (25% Freezing Ratio)	7B	ig
GMT (37.5% Freezing Ratio)	7B	ig
GMT (62.5% Freezing Ratio)	7B	ig
GMT (75% Freezing Ratio)	7B	ig
GMT (87.5% Freezing Ratio)	7B	ig
LoTA (90% Sparsity)	7B	ig
LoTA (12.5% Sparsity)	7B	ig
LoTA (25% Sparsity)	7B	ig
LoTA (37.5% Sparsity)	7B	ig
LoTA (50% Sparsity)	7B	ig
LoTA (62.5% Sparsity)	7B	ig
LoTA (75% Sparsity)	7B	ig
LoTA (87.5% Sparsity)	7B	ig
S2FT (Down)	7B	ig
S2FT (Down & Output)	7B	ig
S2FT (Down, rank=16)	7B	ig
S2FT (Down, rank=32)	7B	ig
S2FT (Down, rank=64)	7B	ig

Citation

If you find our work useful in your research, please consider citing the following paper:

@misc{yamaguchi2025mitigatingcatastrophicforgettingtarget,
      title={Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates}, 
      author={Atsuki Yamaguchi and Terufumi Morishita and Aline Villavicencio and Nikolaos Aletras},
      year={2025},
      eprint={2512.04844},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04844}, 
}

License

This repository is licensed under the MIT License. See the LICENSE file for details.
