
Task Vector Interpretation and Analysis

A project exploring task vectors in large language models (LLMs) through fine-tuning, extraction, and analysis across multiple NLP tasks and model architectures.

📋 Overview

This repository implements and analyzes task vectors - the difference between fine-tuned and pretrained model parameters that capture task-specific knowledge. The project demonstrates how these vectors can be extracted, stored, and applied to models for efficient transfer learning.

Key Features

  • πŸ” Task Vector Extraction: Extract task-specific knowledge from fine-tuned models
  • πŸ”„ Model Application: Apply task vectors to pretrained models for instant specialization
  • πŸ“Š Multi-Task Analysis: Support for MNLI (Natural Language Inference) and SST-2 (Sentiment Analysis)
  • πŸ€– Multi-Model Support: Compatible with Llama 3.1, Mistral, and Gemma architectures
  • ⚑ Efficient Fine-tuning: Uses LoRA (Low-Rank Adaptation) for parameter-efficient training
  • πŸ’Ύ Portable Vectors: Save and load task vectors as lightweight .pth files

πŸ—οΈ Project Structure

Task_vector_Interpretation_and_Analysis/
├── MNLI/                          # Multi-Genre Natural Language Inference
│   ├── gemma/
│   │   └── gemma2-9b-mnli.ipynb
│   ├── mistral/
│   │   └── [mistral MNLI notebooks]
│   └── llama/
│       ├── llama3-1-8b-mnli.ipynb
│       └── mnli_llama3.1_8B_task_vector.pth
├── SST-2/                         # Stanford Sentiment Treebank
│   ├── gemma/
│   │   └── [gemma SST-2 notebooks]
│   ├── mistral/
│   │   └── [mistral SST-2 notebooks]
│   └── llama/
│       ├── llama3-1-8b-sst-2.ipynb
│       └── sst2_llama3.1_8B_task_vector.pth
└── Load Task vector from file and merge with model/
    └── load-task-vector-from-pth-file.ipynb

🚀 Quick Start

Prerequisites

pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121
pip install unsloth transformers datasets trl

1. Fine-tune a Model and Extract Task Vector

from unsloth import FastLanguageModel
import torch

# Load pretrained model (4-bit quantized for memory efficiency)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Apply LoRA for efficient fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",
)

# Fine-tune on your task (MNLI, SST-2, etc.)
# ... training code ...

# Extract task vector: fine-tuned weights minus pretrained weights.
# TaskVector is defined in the notebooks; pretrained_model is a copy of
# the base weights kept from before training, fine_tuned_model is the
# model after training.
task_vector = TaskVector(pretrained_model, fine_tuned_model)
torch.save(task_vector.vector, "my_task_vector.pth")

2. Load and Apply Task Vector

# Load saved task vector (map to CPU; the file can be several GB)
task_vector_data = torch.load("sst2_llama3.1_8B_task_vector.pth", map_location="cpu")
task_vector = TaskVector(vector=task_vector_data)

# Apply to a fresh copy of the pretrained base model
specialized_model = task_vector.apply_to(pretrained_model)

# Use for inference
inputs = tokenizer("This movie is amazing!", return_tensors="pt")
outputs = specialized_model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

📚 Supported Tasks

MNLI (Multi-Genre Natural Language Inference)

  • Task: Determine relationship between premise and hypothesis
  • Labels: entailment, neutral, contradiction
  • Dataset: GLUE MNLI benchmark

SST-2 (Stanford Sentiment Treebank)

  • Task: Binary sentiment classification
  • Labels: positive, negative
  • Dataset: Movie review sentiment analysis

🤖 Supported Models

Model       Size  Architecture  Status
Llama 3.1   8B    Transformer   ✅ Fully Supported
Mistral     7B    Transformer   ✅ Fully Supported
Gemma 2     9B    Transformer   ✅ Fully Supported

🧠 Task Vector Methodology

What are Task Vectors?

Task vectors represent the difference between a fine-tuned model and its pretrained base:

Task Vector = Fine-tuned Model Parameters - Pretrained Model Parameters
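The formula above maps directly to code. Below is a minimal sketch of a TaskVector class consistent with the snippets in this README; the interface here operates on state dicts for brevity (the notebooks' version wraps full models, typically via model.state_dict()), and values are duck-typed so plain floats work the same as torch tensors:

```python
class TaskVector:
    """Sketch: difference between fine-tuned and pretrained parameters.

    Values only need to support `-` and `+` (torch tensors in practice).
    """

    def __init__(self, pretrained_state=None, finetuned_state=None, vector=None):
        if vector is not None:
            # e.g. a dict previously loaded from a saved .pth file
            self.vector = dict(vector)
        else:
            # elementwise delta: fine-tuned minus pretrained
            self.vector = {k: finetuned_state[k] - pretrained_state[k]
                           for k in pretrained_state}

    def apply_to(self, state, scaling=1.0):
        """Return a new state dict: base weights plus the scaled delta."""
        return {k: v + scaling * self.vector.get(k, 0.0)
                for k, v in state.items()}
```

Applying the vector with scaling=1.0 recovers the fine-tuned weights exactly; other scalings allow weakening or negating the task effect.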

Key Advantages

  1. Modularity: Task-specific knowledge stored separately from base model
  2. Efficiency: Vectors are much smaller than full model weights
  3. Composability: Multiple task vectors can potentially be combined
  4. Portability: Easy to share and apply across different instances

Implementation Details

  • LoRA Fine-tuning: Parameter-efficient training using Low-Rank Adaptation
  • Chunked Application: Memory-efficient vector application for large models
  • GPU/CPU Flexibility: Optimized for both GPU and CPU inference
  • Platform: All notebooks developed and tested on Kaggle with GPU T4 x2 using Unsloth for accelerated training
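The chunked-application bullet above can be sketched as follows: walk the vector's keys in fixed-size groups and update only the matching parameters, one group at a time, so only a small slice of the vector needs to be resident (or moved between devices) at once. The function name and chunk size are illustrative, and values are duck-typed (plain floats here, tensors in practice):

```python
def apply_task_vector_chunked(model_state, task_vector, chunk_size=50):
    """Add task-vector deltas to a state dict, a few tensors at a time.

    Chunking bounds peak memory: with real tensors, each chunk would be
    moved to the model's device, added, then freed before the next.
    """
    keys = list(task_vector)
    for start in range(0, len(keys), chunk_size):
        for k in keys[start:start + chunk_size]:
            if k in model_state:  # skip entries the model does not have
                model_state[k] = model_state[k] + task_vector[k]
    return model_state
```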

📊 Results and Analysis

The project demonstrates successful task vector extraction and application across:

  • ✅ MNLI Classification: Accurate entailment/contradiction detection
  • ✅ Sentiment Analysis: Reliable positive/negative classification
  • ✅ Multi-Architecture Pipeline: The same extraction and application workflow runs on Llama, Mistral, and Gemma models
  • ✅ Memory Efficiency: Significant storage savings compared to full model weights

🔬 Research Applications

This codebase enables research in:

  • Task Vector Arithmetic: Combining and manipulating task knowledge
  • Transfer Learning: Efficient knowledge transfer between tasks
  • Model Interpretability: Understanding what models learn for specific tasks
  • Few-Shot Learning: Rapid adaptation to new tasks
  • Model Compression: Storing task-specific knowledge efficiently

πŸ“ File Descriptions

Notebooks

  • *-mnli.ipynb: Complete pipeline for MNLI task vector creation
  • *-sst-2.ipynb: Complete pipeline for SST-2 task vector creation
  • load-task-vector-from-pth-file.ipynb: Demonstration of loading and applying saved vectors

Task Vector Files

  • *.pth: Serialized task vectors (PyTorch format)
  • Ranging from roughly 675MB to 3.7GB per vector, much smaller than full model weights
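As a rough sanity check on those sizes, a vector's serialized footprint is just its total element count times the element width. A hypothetical helper (shape tuples stand in for tensors here; with real torch tensors you would sum t.numel() * t.element_size() instead):

```python
import math

def vector_size_bytes(shapes, bytes_per_element=2):
    """Approximate serialized size: total elements x bytes per element.

    bytes_per_element=2 assumes fp16 storage; use 4 for fp32.
    `shapes` maps parameter names to shape tuples, an illustrative
    stand-in for a real state dict of tensors.
    """
    total_elements = sum(math.prod(shape) for shape in shapes.values())
    return total_elements * bytes_per_element
```

For example, a single 4096x4096 fp16 delta is already 32MiB, so deltas across dozens of transformer layers quickly reach the multi-GB range listed above.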

📥 Downloading Pre-trained Task Vectors

Due to GitHub's file size limitations, the large task vector files (.pth) are not included in this repository. You can:

  1. Generate your own: Follow the notebooks to create task vectors from scratch
  2. Download pre-trained vectors: [Contact for access to pre-trained task vectors]
  3. Use Git LFS: Clone with git lfs clone if LFS is enabled

Available pre-trained task vectors:

  • sst2_llama3.1_8B_task_vector.pth (2.5GB) - Llama 3.1 8B for SST-2
  • mnli_llama3.1_8B_task_vector.pth (2.5GB) - Llama 3.1 8B for MNLI
  • sst2_gemma2_9B_task_vector.pth (3.7GB) - Gemma 2 9B for SST-2
  • sst2_mistral_instruct_v0.3_7B_task_vector.pth (675MB) - Mistral 7B for SST-2
  • mnli_mistral_nemo_12B_task_vector.pth (675MB) - Mistral Nemo 12B for MNLI

πŸ› οΈ Technical Requirements

Development Environment

  • Platform: Kaggle Notebooks (recommended)
  • GPU: NVIDIA Tesla T4 x2 (used in development)
  • Framework: Unsloth for efficient fine-tuning

System Requirements

  • GPU: NVIDIA GPU with CUDA support (Tesla T4 or better)
  • Memory: 16GB+ GPU memory recommended for 8B+ models
  • Storage: ~10GB per model + task vectors
  • Python: 3.8+ with PyTorch 2.0+

Note on Environment

All Jupyter notebooks (*.ipynb) in this repository were developed and executed on Kaggle using GPU T4 x2 with the Unsloth library for optimized training.

📖 Usage Examples

Sentiment Analysis with Task Vector

# Load SST-2 task vector
sst2_vector = torch.load("sst2_llama3.1_8B_task_vector.pth")
task_vector = TaskVector(vector=sst2_vector)

# Apply to model
model = task_vector.apply_to(base_model)

# Classify sentiment (Alpaca-style prompt; the "..." follows the
# notebooks' full instruction template)
prompt = """Below is an instruction that describes a task...
### Input:
This movie is fantastic!
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "positive"

Natural Language Inference

# Load MNLI task vector
mnli_vector = torch.load("mnli_llama3.1_8B_task_vector.pth")
task_vector = TaskVector(vector=mnli_vector)

# Apply to model
model = task_vector.apply_to(base_model)

# Perform inference (prompt format is illustrative; see the notebooks
# for the exact template used during fine-tuning)
premise = "A person is reading a book"
hypothesis = "Someone is learning"
prompt = f"Premise: {premise}\nHypothesis: {hypothesis}\nRelationship:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "entailment"

🤝 Contributing

Contributions are welcome! Areas for improvement:

  • Additional task implementations (summarization, QA, etc.)
  • More model architectures (GPT, T5, etc.)
  • Task vector arithmetic and composition
  • Evaluation metrics and benchmarking
  • Documentation and tutorials

📜 License

This project is open-source and available under the MIT License.

πŸ™ Acknowledgments

  • Unsloth: For efficient model fine-tuning capabilities
  • Hugging Face: For model hosting and datasets
  • Task Vector Research: Based on emerging research in modular AI systems

Note: This project is designed for research and educational purposes. Task vectors represent an active area of AI research with ongoing developments in methodology and applications.