mitadake/Task_vector_Extraction

Task Vector Interpretation and Analysis

A project exploring task vectors in large language models (LLMs) through fine-tuning, extraction, and analysis across multiple NLP tasks and model architectures.

📋 Overview

This repository implements and analyzes task vectors - the difference between fine-tuned and pretrained model parameters that capture task-specific knowledge. The project demonstrates how these vectors can be extracted, stored, and applied to models for efficient transfer learning.

Key Features

  • 🔍 Task Vector Extraction: Extract task-specific knowledge from fine-tuned models
  • 🔄 Model Application: Apply task vectors to pretrained models for instant specialization
  • 📊 Multi-Task Analysis: Support for MNLI (Natural Language Inference) and SST-2 (Sentiment Analysis)
  • 🤖 Multi-Model Support: Compatible with Llama 3.1, Mistral, and Gemma architectures
  • ⚡ Efficient Fine-tuning: Uses LoRA (Low-Rank Adaptation) for parameter-efficient training
  • 💾 Portable Vectors: Save and load task vectors as lightweight .pth files

πŸ—οΈ Project Structure

Task_vector_Interpretation_and_Analysis/
├── MNLI/                          # Multi-Genre Natural Language Inference
│   ├── gemma/
│   │   └── gemma2-9b-mnli.ipynb
│   ├── mistral/
│   │   └── [mistral MNLI notebooks]
│   └── llama/
│       ├── llama3-1-8b-mnli.ipynb
│       └── mnli_llama3.1_8B_task_vector.pth
├── SST-2/                         # Stanford Sentiment Treebank
│   ├── gemma/
│   │   └── [gemma SST-2 notebooks]
│   ├── mistral/
│   │   └── [mistral SST-2 notebooks]
│   └── llama/
│       ├── llama3-1-8b-sst-2.ipynb
│       └── sst2_llama3.1_8B_task_vector.pth
└── Load Task vector from file and merge with model/
    └── load-task-vector-from-pth-file.ipynb

🚀 Quick Start

Prerequisites

pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121
pip install unsloth transformers datasets trl

1. Fine-tune a Model and Extract Task Vector

from unsloth import FastLanguageModel
import torch

# Load pretrained model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Apply LoRA for efficient fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", 
                   "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",
)

# Fine-tune on your task (MNLI, SST-2, etc.)
# ... training code ...

# Extract task vector (per-parameter difference between the fine-tuned
# and pretrained weights; the TaskVector helper is defined in the notebooks)
task_vector = TaskVector(pretrained_model, fine_tuned_model)
torch.save(task_vector.vector, "my_task_vector.pth")

2. Load and Apply Task Vector

# Load saved task vector
task_vector_data = torch.load("sst2_llama3.1_8B_task_vector.pth")
task_vector = TaskVector(vector=task_vector_data)

# Apply to pretrained model
specialized_model = task_vector.apply_to(pretrained_model)

# Use for inference
inputs = tokenizer("This movie is amazing!", return_tensors="pt")
outputs = specialized_model.generate(**inputs)

📚 Supported Tasks

MNLI (Multi-Genre Natural Language Inference)

  • Task: Determine relationship between premise and hypothesis
  • Labels: entailment, neutral, contradiction
  • Dataset: GLUE MNLI benchmark

SST-2 (Stanford Sentiment Treebank)

  • Task: Binary sentiment classification
  • Labels: positive, negative
  • Dataset: Movie review sentiment analysis
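Both tasks are framed as instruction-following generation in the notebooks: the model is prompted with the input and asked to emit the label as text. A hypothetical sketch of the prompt builders (the exact templates used in the notebooks are abbreviated elsewhere in this README, so the wording here is illustrative only):

```python
# Hypothetical Alpaca-style prompt builders; treat the exact wording as
# an assumption, not the template used in the notebooks.

def sst2_prompt(sentence):
    # Binary sentiment classification: expected response is
    # "positive" or "negative".
    return (
        "### Instruction:\nClassify the sentiment of the input.\n"
        f"### Input:\n{sentence}\n"
        "### Response:\n"
    )

def mnli_prompt(premise, hypothesis):
    # NLI: expected response is "entailment", "neutral", or "contradiction".
    return (
        "### Instruction:\nDetermine the relationship between the premise and the hypothesis.\n"
        f"### Input:\nPremise: {premise}\nHypothesis: {hypothesis}\n"
        "### Response:\n"
    )

print(sst2_prompt("This movie is fantastic!"))
```

The generated text after `### Response:` is then parsed as the predicted label.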

🤖 Supported Models

| Model     | Size | Architecture | Status            |
| --------- | ---- | ------------ | ----------------- |
| Llama 3.1 | 8B   | Transformer  | ✅ Fully Supported |
| Mistral   | 7B   | Transformer  | ✅ Fully Supported |
| Gemma 2   | 9B   | Transformer  | ✅ Fully Supported |

🧠 Task Vector Methodology

What are Task Vectors?

Task vectors represent the difference between a fine-tuned model and its pretrained base:

Task Vector = Fine-tuned Model Parameters - Pretrained Model Parameters
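In code, this amounts to a per-parameter subtraction at extraction time and a per-parameter addition at application time. A minimal, torch-free sketch of the idea (plain dicts of floats stand in for model state_dicts; the TaskVector class used in the notebooks follows the same shape but operates on tensors):

```python
class TaskVector:
    """Minimal sketch: per-parameter difference between two checkpoints."""

    def __init__(self, pretrained=None, finetuned=None, vector=None):
        if vector is not None:
            # Reuse a precomputed vector (e.g. loaded from a .pth file).
            self.vector = vector
        else:
            # Task Vector = fine-tuned params - pretrained params
            self.vector = {name: finetuned[name] - pretrained[name]
                           for name in pretrained}

    def apply_to(self, pretrained, scaling=1.0):
        # theta = theta_pretrained + scaling * task_vector
        return {name: weight + scaling * self.vector.get(name, 0.0)
                for name, weight in pretrained.items()}

pretrained = {"w": 1.0, "b": 0.5}
finetuned = {"w": 1.5, "b": 0.0}
tv = TaskVector(pretrained, finetuned)
# Applying the vector at scaling=1.0 recovers the fine-tuned weights.
assert tv.apply_to(pretrained) == finetuned
```

The `scaling` knob lets a vector be applied at partial strength, which is common in the task-arithmetic literature.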

Key Advantages

  1. Modularity: Task-specific knowledge stored separately from base model
  2. Efficiency: Vectors are much smaller than full model weights
  3. Composability: Multiple task vectors can potentially be combined
  4. Portability: Easy to share and apply across different instances
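The composability point can be made concrete: because all vectors for a given base model live in the same parameter space, they support elementwise arithmetic, with addition for multi-task merging and negation for removing a task, as explored in the task-arithmetic literature. A dict-based sketch (illustrative, not the notebooks' implementation):

```python
def add_vectors(a, b):
    # Elementwise sum of two task vectors (multi-task composition).
    # Assumes both vectors cover the same parameter names.
    return {name: a[name] + b[name] for name in a}

def negate_vector(v):
    # Negation can be used to move *away* from a task's behavior.
    return {name: -value for name, value in v.items()}

sst2_like = {"w": 0.5}
mnli_like = {"w": 0.25}
combined = add_vectors(sst2_like, mnli_like)
```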

Implementation Details

  • LoRA Fine-tuning: Parameter-efficient training using Low-Rank Adaptation
  • Chunked Application: Memory-efficient vector application for large models
  • GPU/CPU Flexibility: Optimized for both GPU and CPU inference
  • Platform: All notebooks developed and tested on Kaggle with GPU T4 x2 using Unsloth for accelerated training
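The chunked-application detail above means the merge does not materialize pretrained + vector for every parameter at once: parameter names are processed a few at a time, so with real tensors only one chunk needs to be resident on the GPU before being written back. A shape-only sketch of that loop (plain floats stand in for tensors; the chunk size and function name are illustrative):

```python
def apply_in_chunks(params, vector, chunk_size=2):
    # Merge `vector` into `params` a few parameters at a time, so that
    # with real tensors only one chunk is on the accelerator at once.
    names = list(params)
    merged = {}
    for start in range(0, len(names), chunk_size):
        for name in names[start:start + chunk_size]:
            merged[name] = params[name] + vector.get(name, 0.0)
    return merged
```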

📊 Results and Analysis

The project demonstrates successful task vector extraction and application across:

  • ✅ MNLI Classification: Accurate entailment/contradiction detection
  • ✅ Sentiment Analysis: Reliable positive/negative classification
  • ✅ Multi-Architecture Coverage: Vectors extracted for the Llama, Mistral, and Gemma families (each vector applies only to the base model it was extracted from)
  • ✅ Memory Efficiency: Significant storage savings compared to full model weights

🔬 Research Applications

This codebase enables research in:

  • Task Vector Arithmetic: Combining and manipulating task knowledge
  • Transfer Learning: Efficient knowledge transfer between tasks
  • Model Interpretability: Understanding what models learn for specific tasks
  • Few-Shot Learning: Rapid adaptation to new tasks
  • Model Compression: Storing task-specific knowledge efficiently

📝 File Descriptions

Notebooks

  • *-mnli.ipynb: Complete pipeline for MNLI task vector creation
  • *-sst-2.ipynb: Complete pipeline for SST-2 task vector creation
  • load-task-vector-from-pth-file.ipynb: Demonstration of loading and applying saved vectors

Task Vector Files

  • *.pth: Serialized task vectors (PyTorch format)
  • Typically 2-3GB per vector, much smaller than full model weights
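A quick sanity check after saving is to confirm the file lands in the expected size range. The helper below is a hypothetical convenience, not part of the notebooks:

```python
import os

def file_size_gb(path):
    # On-disk size in decimal gigabytes, matching the sizes quoted
    # in this README (e.g. ~2.5 for the Llama 3.1 8B vectors).
    return os.path.getsize(path) / 1e9
```

For example, `file_size_gb("sst2_llama3.1_8B_task_vector.pth")` should report roughly 2.5 for the Llama 3.1 vector.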

📥 Downloading Pre-trained Task Vectors

Due to GitHub's file size limitations, the large task vector files (.pth) are not included in this repository. You can:

  1. Generate your own: Follow the notebooks to create task vectors from scratch
  2. Download pre-trained vectors: [Contact for access to pre-trained task vectors]
  3. Use Git LFS: With Git LFS installed, a regular git clone fetches LFS-tracked files (git lfs clone is deprecated)

Available pre-trained task vectors:

  • sst2_llama3.1_8B_task_vector.pth (2.5GB) - Llama 3.1 8B for SST-2
  • mnli_llama3.1_8B_task_vector.pth (2.5GB) - Llama 3.1 8B for MNLI
  • sst2_gemma2_9B_task_vector.pth (3.7GB) - Gemma 2 9B for SST-2
  • sst2_mistral_instruct_v0.3_7B_task_vector.pth (675MB) - Mistral 7B for SST-2
  • mnli_mistral_nemo_12B_task_vector.pth (675MB) - Mistral Nemo 12B for MNLI

🛠️ Technical Requirements

Development Environment

  • Platform: Kaggle Notebooks (recommended)
  • GPU: NVIDIA Tesla T4 x2 (used in development)
  • Framework: Unsloth for efficient fine-tuning

System Requirements

  • GPU: NVIDIA GPU with CUDA support (Tesla T4 or better)
  • Memory: 16GB+ GPU memory recommended for 8B+ models
  • Storage: ~10GB per model + task vectors
  • Python: 3.8+ with PyTorch 2.0+

Note on Environment

All Jupyter notebooks (*.ipynb) in this repository were developed and executed on Kaggle using GPU T4 x2 with the Unsloth library for optimized training.

📖 Usage Examples

Sentiment Analysis with Task Vector

# Load SST-2 task vector
sst2_vector = torch.load("sst2_llama3.1_8B_task_vector.pth")
task_vector = TaskVector(vector=sst2_vector)

# Apply to model
model = task_vector.apply_to(base_model)

# Classify sentiment
prompt = """Below is an instruction that describes a task...
### Input:
This movie is fantastic!
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt")
result = model.generate(**inputs)
# Output: "positive"

Natural Language Inference

# Load MNLI task vector
mnli_vector = torch.load("mnli_llama3.1_8B_task_vector.pth")
task_vector = TaskVector(vector=mnli_vector)

# Apply to model
model = task_vector.apply_to(base_model)

# Perform inference (prompt template as in the sentiment example)
premise = "A person is reading a book"
hypothesis = "Someone is learning"
prompt = f"""Below is an instruction that describes a task...
### Input:
Premise: {premise}
Hypothesis: {hypothesis}
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt")
result = model.generate(**inputs)
# Output: "entailment"

🀝 Contributing

Contributions are welcome! Areas for improvement:

  • Additional task implementations (summarization, QA, etc.)
  • More model architectures (GPT, T5, etc.)
  • Task vector arithmetic and composition
  • Evaluation metrics and benchmarking
  • Documentation and tutorials

📜 License

This project is open-source and available under the MIT License.

🙏 Acknowledgments

  • Unsloth: For efficient model fine-tuning capabilities
  • Hugging Face: For model hosting and datasets
  • Task Vector Research: Based on emerging research in modular AI systems

Note: This project is designed for research and educational purposes. Task vectors represent an active area of AI research with ongoing developments in methodology and applications.
