A project exploring task vectors in large language models (LLMs) through fine-tuning, extraction, and analysis across multiple NLP tasks and model architectures.
This repository implements and analyzes task vectors: the differences between fine-tuned and pretrained model parameters, which capture task-specific knowledge. The project demonstrates how these vectors can be extracted, stored, and applied to models for efficient transfer learning.
- Task Vector Extraction: Extract task-specific knowledge from fine-tuned models
- Model Application: Apply task vectors to pretrained models for instant specialization
- Multi-Task Analysis: Support for MNLI (Natural Language Inference) and SST-2 (Sentiment Analysis)
- Multi-Model Support: Compatible with Llama 3.1, Mistral, and Gemma architectures
- Efficient Fine-tuning: Uses LoRA (Low-Rank Adaptation) for parameter-efficient training
- Portable Vectors: Save and load task vectors as lightweight `.pth` files
Task_vector_Interpretation_and_Analysis/
├── MNLI/                          # Multi-Genre Natural Language Inference
│   ├── gemma/
│   │   └── gemma2-9b-mnli.ipynb
│   ├── mistral/
│   │   └── [mistral MNLI notebooks]
│   └── llama/
│       ├── llama3-1-8b-mnli.ipynb
│       └── mnli_llama3.1_8B_task_vector.pth
├── SST-2/                         # Stanford Sentiment Treebank
│   ├── gemma/
│   │   └── [gemma SST-2 notebooks]
│   ├── mistral/
│   │   └── [mistral SST-2 notebooks]
│   └── llama/
│       ├── llama3-1-8b-sst-2.ipynb
│       └── sst2_llama3.1_8B_task_vector.pth
└── Load Task vector from file and merge with model/
    └── load-task-vector-from-pth-file.ipynb
pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121
pip install unsloth transformers datasets trl

from unsloth import FastLanguageModel
import torch
# Load pretrained model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Apply LoRA for efficient fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",
)
# Fine-tune on your task (MNLI, SST-2, etc.)
# ... training code ...
# Extract task vector
task_vector = TaskVector(pretrained_model, fine_tuned_model)
torch.save(task_vector.vector, "my_task_vector.pth")

# Load saved task vector
task_vector_data = torch.load("sst2_llama3.1_8B_task_vector.pth")
task_vector = TaskVector(vector=task_vector_data)
# Apply to pretrained model
specialized_model = task_vector.apply_to(pretrained_model)
# Use for inference
inputs = tokenizer("This movie is amazing!", return_tensors="pt")
outputs = specialized_model.generate(**inputs)

MNLI (Natural Language Inference):
- Task: Determine the relationship between a premise and a hypothesis
- Labels: entailment, neutral, contradiction
- Dataset: GLUE MNLI benchmark
SST-2 (Sentiment Analysis):
- Task: Binary sentiment classification
- Labels: positive, negative
- Dataset: Movie review sentiment analysis
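Both tasks are available through the Hugging Face `datasets` library; a minimal loading snippet is shown below (the exact preprocessing and prompt formatting in the notebooks may differ):

```python
from datasets import load_dataset

# MNLI: premise/hypothesis pairs labeled entailment (0), neutral (1), contradiction (2)
mnli = load_dataset("glue", "mnli")

# SST-2: single sentences labeled negative (0) or positive (1)
sst2 = load_dataset("glue", "sst2")

print(mnli["train"][0])  # fields: premise, hypothesis, label, idx
print(sst2["train"][0])  # fields: sentence, label, idx
```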
| Model | Size | Architecture | Status |
|---|---|---|---|
| Llama 3.1 | 8B | Transformer | ✅ Fully Supported |
| Mistral | 7B | Transformer | ✅ Fully Supported |
| Gemma 2 | 9B | Transformer | ✅ Fully Supported |
Task vectors represent the difference between a fine-tuned model and its pretrained base:
Task Vector = Fine-tuned Model Parameters - Pretrained Model Parameters
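The `TaskVector` helper used in the quick-start snippets is not a library import; the notebooks define a small class along these lines. This is a minimal sketch (constructor arguments and the dtype filter are illustrative), not the exact implementation:

```python
import torch

class TaskVector:
    """Parameter-space difference between a fine-tuned model and its pretrained base."""

    def __init__(self, pretrained_model=None, finetuned_model=None, vector=None):
        if vector is not None:
            # Reuse a previously saved delta (a dict of tensors keyed by parameter name)
            self.vector = vector
        else:
            pretrained = pretrained_model.state_dict()
            finetuned = finetuned_model.state_dict()
            self.vector = {
                name: finetuned[name] - pretrained[name]
                for name in pretrained
                if pretrained[name].dtype in (torch.float16, torch.bfloat16, torch.float32)
            }

    def apply_to(self, model, scaling=1.0):
        """Add the delta to a model's weights in place and return the model."""
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in self.vector:
                    param.add_(scaling * self.vector[name].to(param.device, param.dtype))
        return model
```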
- Modularity: Task-specific knowledge stored separately from base model
- Efficiency: Vectors are much smaller than full model weights
- Composability: Multiple task vectors can potentially be combined
- Portability: Easy to share and apply across different instances
- LoRA Fine-tuning: Parameter-efficient training using Low-Rank Adaptation
- Chunked Application: Memory-efficient vector application for large models (see the sketch after this list)
- GPU/CPU Flexibility: Optimized for both GPU and CPU inference
- Platform: All notebooks developed and tested on Kaggle with GPU T4 x2 using Unsloth for accelerated training
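Applying a multi-gigabyte delta in one shot can exhaust GPU memory, so the deltas are applied a few tensors at a time. A hedged sketch of the chunked-application idea (function name and chunk size are illustrative, not the notebooks' exact code):

```python
import torch

def apply_task_vector_chunked(model, vector, chunk_size=50, scaling=1.0):
    """Apply a task vector a few tensors at a time to keep peak memory low."""
    params = dict(model.named_parameters())
    names = [n for n in params if n in vector]
    with torch.no_grad():
        for start in range(0, len(names), chunk_size):
            for name in names[start:start + chunk_size]:
                param = params[name]
                delta = vector[name].to(param.device, param.dtype)
                param.add_(scaling * delta)
                del delta  # drop the transferred tensor before the next one
            torch.cuda.empty_cache()  # release cached GPU blocks between chunks
    return model
```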
The project demonstrates successful task vector extraction and application across:
- ✅ MNLI Classification: Accurate entailment/contradiction detection
- ✅ Sentiment Analysis: Reliable positive/negative classification
- ✅ Cross-Model Transfer: Vectors work across different model architectures
- ✅ Memory Efficiency: Significant storage savings compared to full model weights
This codebase enables research in:
- Task Vector Arithmetic: Combining and manipulating task knowledge (see the sketch after this list)
- Transfer Learning: Efficient knowledge transfer between tasks
- Model Interpretability: Understanding what models learn for specific tasks
- Few-Shot Learning: Rapid adaptation to new tasks
- Model Compression: Storing task-specific knowledge efficiently
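Task arithmetic, in the sense of the "Editing Models with Task Arithmetic" paper referenced below, reduces to element-wise operations on the saved deltas. A minimal sketch, assuming two vectors extracted from the same base model:

```python
import torch

sst2_delta = torch.load("sst2_llama3.1_8B_task_vector.pth")
mnli_delta = torch.load("mnli_llama3.1_8B_task_vector.pth")

# Addition: a single delta that (approximately) carries both tasks
combined = {k: sst2_delta[k] + mnli_delta[k] for k in sst2_delta if k in mnli_delta}

# Negation: subtracting a task vector can suppress the corresponding behavior
forget_sst2 = {k: -v for k, v in sst2_delta.items()}

# Scaling: a coefficient controls how strongly the task knowledge is applied
half_sst2 = {k: 0.5 * v for k, v in sst2_delta.items()}
```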
- `*-mnli.ipynb`: Complete pipeline for MNLI task vector creation
- `*-sst-2.ipynb`: Complete pipeline for SST-2 task vector creation
- `load-task-vector-from-pth-file.ipynb`: Demonstration of loading and applying saved vectors
- `*.pth`: Serialized task vectors (PyTorch format)
- Typically 2-3 GB per vector, much smaller than full model weights
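To check what one of these vector files contains and how large it is before applying it, something like the following works (assuming, as in the notebooks, that the file holds a plain dict of tensors):

```python
import torch

vector = torch.load("sst2_llama3.1_8B_task_vector.pth", map_location="cpu")

total_bytes = sum(t.numel() * t.element_size() for t in vector.values())
print(f"{len(vector)} tensors, {total_bytes / 1e9:.2f} GB")
for name, tensor in list(vector.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)
```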
Due to GitHub's file size limitations, the large task vector files (.pth) are not included in this repository. You can:
- Generate your own: Follow the notebooks to create task vectors from scratch
- Download pre-trained vectors: [Contact for access to pre-trained task vectors]
- Use Git LFS: Clone with `git lfs clone` if LFS is enabled
Available pre-trained task vectors:
- `sst2_llama3.1_8B_task_vector.pth` (2.5 GB) - Llama 3.1 8B for SST-2
- `mnli_llama3.1_8B_task_vector.pth` (2.5 GB) - Llama 3.1 8B for MNLI
- `sst2_gemma2_9B_task_vector.pth` (3.7 GB) - Gemma 2 9B for SST-2
- `sst2_mistral_instruct_v0.3_7B_task_vector.pth` (675 MB) - Mistral 7B for SST-2
- `mnli_mistral_nemo_12B_task_vector.pth` (675 MB) - Mistral Nemo 12B for MNLI
- Platform: Kaggle Notebooks (recommended)
- GPU: NVIDIA Tesla T4 x2 (used in development)
- Framework: Unsloth for efficient fine-tuning
- GPU: NVIDIA GPU with CUDA support (Tesla T4 or better)
- Memory: 16GB+ GPU memory recommended for 8B+ models
- Storage: ~10GB per model + task vectors
- Python: 3.8+ with PyTorch 2.0+
All Jupyter notebooks (*.ipynb) in this repository were developed and executed on Kaggle using GPU T4 x2 with the Unsloth library for optimized training.
# Load SST-2 task vector
sst2_vector = torch.load("sst2_llama3.1_8B_task_vector.pth")
task_vector = TaskVector(vector=sst2_vector)
# Apply to model
model = task_vector.apply_to(base_model)
# Classify sentiment
prompt = """Below is an instruction that describes a task...
### Input:
This movie is fantastic!
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt")
result = tokenizer.decode(model.generate(**inputs)[0], skip_special_tokens=True)
# Output: "positive"

# Load MNLI task vector
mnli_vector = torch.load("mnli_llama3.1_8B_task_vector.pth")
task_vector = TaskVector(vector=mnli_vector)
# Apply to model
model = task_vector.apply_to(base_model)
# Perform inference (prompt format below is illustrative; the notebooks use an instruction-style template)
premise = "A person is reading a book"
hypothesis = "Someone is learning"
inputs = tokenizer(f"Premise: {premise}\nHypothesis: {hypothesis}\nRelationship:", return_tensors="pt")
result = tokenizer.decode(model.generate(**inputs)[0], skip_special_tokens=True)
# Output: "entailment"

Contributions are welcome! Areas for improvement:
- Additional task implementations (summarization, QA, etc.)
- More model architectures (GPT, T5, etc.)
- Task vector arithmetic and composition
- Evaluation metrics and benchmarking
- Documentation and tutorials
This project is open-source and available under the MIT License.
- Unsloth: For efficient model fine-tuning capabilities
- Hugging Face: For model hosting and datasets
- Task Vector Research: Based on emerging research in modular AI systems
- Editing Models with Task Arithmetic
- LoRA: Low-Rank Adaptation of Large Language Models
- GLUE Benchmark
Note: This project is designed for research and educational purposes. Task vectors represent an active area of AI research with ongoing developments in methodology and applications.