Feature/qlora finetune#31

Open
MatteoPerona wants to merge 38 commits into OpenHelix-Team:main from AndrasFerenczy:feature/qlora-finetune
Conversation

@MatteoPerona

Title

Add LoRA fine-tuning pipeline for Cobra VLM (qlora_finetune)

Summary

This PR introduces a self-contained LoRA fine-tuning pipeline for Cobra VLM, including training, inference, and reproducible environment setup. It enables quick fine-tuning of Cobra (e.g., cobra+3b) on LLaVA-CoT-100k (or local JSONL data) and saves LoRA adapters under timestamped qlora_outputs_* folders for downstream use.

Note: many files are still labeled qlora because the original plan was to use QLoRA, which adds quantization on top of LoRA. Quantization was not feasible with a Mamba-based model, so we went with standard LoRA instead.


What’s Included

1. Environment & Dependencies

  • qlora_finetune/requirements.txt

    • Pinned versions for:
      • torch==2.1.0, torchvision==0.16.0, torchaudio==2.1.0, triton==2.1.0
      • transformers==4.34.1, tokenizers>=0.14,<0.15, accelerate==0.26.1
      • peft==0.7.1, bitsandbytes>=0.41.0,<0.43.0, trl==0.7.4, datasets>=2.14.0,<2.18.0
      • Utility deps: einops, timm==0.9.10, wandb, jsonlines, rich, tqdm, etc.
    • Version ranges chosen to avoid:
      • huggingface_hub vs datasets conflicts
      • transformers vs trl API breakages
      • torch vs bitsandbytes binary mismatches
  • qlora_finetune/install_requirements.sh

    • Creates/activates ./env venv (Python 3.10).
    • Installs build deps: pip, setuptools, wheel, packaging.
    • Installs PyTorch stack, Transformers ecosystem, vision/utils, QLoRA-specific deps, and:
      • Installs mamba-ssm<2.0.0 with --no-build-isolation so it can see the already-installed torch.

2. Config & Dataset Handling

  • qlora_finetune/config.py

    • QLoRAConfig dataclass with:
      • Model: model_id, pretrained_checkpoint, hf_token
      • Dataset: dataset_name, dataset_root, dataset_proportion, dataset_max_samples, dataset_seed
      • Training: output_dir, per_device_train_batch_size, gradient_accumulation_steps, learning_rate, num_train_epochs, max_steps, etc.
      • Training settings: fp16/bf16, logging_steps, save_steps, eval_steps, save_total_limit
      • Tracking: report_to (default ["wandb"]), wandb_project, wandb_entity
    • __post_init__ enforces sane defaults and value ranges, and resolves .hf_token files into actual tokens.
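The validation and token-resolution behavior of __post_init__ can be sketched as follows. This is an illustrative subset, not the actual QLoRAConfig: the field names mirror those listed above, but the exact checks and defaults are assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional, Union

@dataclass
class QLoRAConfigSketch:
    # Hypothetical subset of the real QLoRAConfig fields.
    model_id: str = "cobra+3b"
    dataset_proportion: float = 1.0
    dataset_max_samples: Optional[int] = None
    hf_token: Optional[Union[str, Path]] = None
    report_to: list = field(default_factory=lambda: ["wandb"])

    def __post_init__(self):
        # Enforce a sane value range, as __post_init__ does in config.py.
        if not 0.0 < self.dataset_proportion <= 1.0:
            raise ValueError("dataset_proportion must be in (0, 1]")
        # Resolve a .hf_token file path into the actual token string.
        if self.hf_token is not None and Path(self.hf_token).is_file():
            self.hf_token = Path(self.hf_token).read_text().strip()
```

A bad value fails fast at construction time rather than mid-training, which is the point of doing validation in __post_init__.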
  • qlora_finetune/dataset_loader.py

    • load_llava_cot_dataset(...):
      • Supports:
        • HF dataset (default Xkev/LLaVA-CoT-100k) via load_dataset(dataset_name, split=...) (no trust_remote_code to stay compatible with datasets 2.14.x).
        • Local JSONL via dataset_root / "train.jsonl".
      • Supports:
        • dataset_max_samples (absolute cap, takes precedence).
        • dataset_proportion (fraction of the dataset).
      • Deterministic sampling with dataset_seed.
    • format_for_sft(...):
      • Flattens LLaVA-style conversations into a single "text" field with USER: / ASSISTANT: prefixes and injects an <image> token for the first user turn.
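The flattening step can be sketched as below. The role and field names ("from", "value", "human") follow the common LLaVA conversation convention and are assumptions about the dataset schema, not a copy of the actual format_for_sft implementation.

```python
def format_for_sft_sketch(example):
    """Flatten a LLaVA-style conversation into a single 'text' field."""
    parts = []
    image_injected = False
    for turn in example["conversations"]:
        if turn["from"] == "human":
            text = turn["value"]
            # Inject the <image> token into the first user turn only.
            if not image_injected:
                text = "<image>\n" + text
                image_injected = True
            parts.append(f"USER: {text}")
        else:
            parts.append(f"ASSISTANT: {turn['value']}")
    return {"text": "\n".join(parts)}
```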

3. Model Loading & QLoRA Preparation

  • qlora_finetune/model_loader.py
    • load_cobra_for_qlora(model_id, pretrained_checkpoint, hf_token, freeze_vision_encoder):
      • Loads the full Cobra VLM either from:
        • HF hub (model_id), or
        • Local checkpoint (pretrained_checkpoint), using the core cobra.models.load.load.
      • Applies Mamba-specific workarounds:
        • Disables fused_add_norm, swaps RMSNorm for a safe LayerNorm to avoid Triton issues.
        • Optionally freezes the vision encoder.
    • prepare_model_for_qlora(model, target_modules, lora_r, lora_alpha, lora_dropout, lora_bias):
      • Wraps target modules with LoRA via PEFT:
        • Auto-detects typical Mamba SSM linear layers (in_proj, out_proj, x_proj, dt_proj, etc.).
        • Applies LoRA rank/alpha/dropout, and handles bias for mamba-ssm compatibility.
    • load_and_prepare_model(...):
      • Top-level function returning (vlm, llm_backbone_with_lora) ready for training.
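The target-module auto-detection can be sketched as a scan over module names for the typical Mamba SSM projection layers. This is a simplified illustration; the real prepare_model_for_qlora walks the model's named modules and passes the detected names to PEFT's LoraConfig.

```python
# Typical Mamba SSM linear-layer names, as listed above.
MAMBA_LINEAR_SUFFIXES = ("in_proj", "out_proj", "x_proj", "dt_proj")

def detect_lora_targets(module_names):
    """Return the set of LoRA target-module suffixes found in a model."""
    targets = set()
    for name in module_names:
        suffix = name.rsplit(".", 1)[-1]
        if suffix in MAMBA_LINEAR_SUFFIXES:
            targets.add(suffix)
    return sorted(targets)
```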

4. Training Script

  • qlora_finetune/train_qlora.py
    • Main training entrypoint with:

      • CUDA / LD_LIBRARY_PATH setup for Triton and CUDA libs.
      • Imports and sys.path hacks so qlora_finetune + cobra modules work when run from different roots.
    • main(config: QLoRAConfig):

      • Loads prepared model via load_and_prepare_model.
      • Loads tokenizer from:
        • vlm.llm_backbone.tokenizer if available, or
        • Directly from HF (xiuyul/mamba-2.8b-zephyr / state-spaces/mamba-2.8b) as a fallback.
      • Prints parameter counts (total/trainable).
      • Loads dataset via load_llava_cot_dataset, applies sampling (dataset_max_samples / dataset_proportion), and maps to "text" format.
      • Attempts gradient checkpointing where supported, with graceful fallback.
      • Builds TrainingArguments with:
        • output_dir, LR, epochs, warmup, weight decay, grad norm, fp16/bf16, logging/save cadence, seed, dataloader workers, remove_unused_columns, report_to, run_name.
        • Optional max_steps, eval_steps.
        • Before constructing TrainingArguments, checks WandB availability:
          • If WANDB_API_KEY is not set and no stored login is found, automatically strips "wandb" from report_to and falls back to ["none"], printing a warning.
      • Optional WandB init:
        • Only if "wandb" still present in report_to.
        • Logs key hyperparameters to WandB.
      • Uses trl.SFTTrainer with:
        • dataset_text_field="text".
        • Custom DataCollatorForLanguageModeling (causal LM, pad_to_multiple_of=8).
        • max_seq_length=512 (explicitly set to reduce memory usage with big Mamba models).
      • Runs trainer.train(), saves final adapter + tokenizer to config.output_dir, and prints PEFT loading instructions.
    • CLI (if __name__ == "__main__":):

      • Arguments: --config, --model_id, --dataset_proportion, --dataset_max_samples, --output_dir, --hf_token, --per_device_train_batch_size, --gradient_accumulation_steps.
      • If --config JSON is provided and exists, initializes QLoRAConfig from it; otherwise uses defaults and CLI overrides.
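The WandB-availability fallback described above can be sketched as a small pre-check on report_to. The ~/.netrc location is an assumption about where `wandb login` stores credentials; the actual check in train_qlora.py may differ.

```python
import os
from pathlib import Path

def resolve_report_to(report_to):
    """Strip 'wandb' from report_to when no credentials are available."""
    has_key = bool(os.environ.get("WANDB_API_KEY"))
    # Assumption: `wandb login` persists credentials in ~/.netrc.
    has_login = (Path.home() / ".netrc").is_file()
    if "wandb" in report_to and not (has_key or has_login):
        print("WARNING: no WandB credentials found; disabling wandb logging.")
        return ["none"]
    return report_to
```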

5. Inference Utilities

  • qlora_finetune/inference.py
    • load_qlora_model(base_model_id, lora_adapter_path, hf_token=None, merge_weights=False):
      • Loads base Cobra VLM via cobra.models.load.load.
      • Loads LoRA adapters from lora_adapter_path into the LLM backbone via PeftModel.from_pretrained.
      • Optionally merges LoRA weights into the base model (merge_and_unload).
    • generate_with_qlora(model, vlm, image, prompt, max_new_tokens=512, temperature=0.7, do_sample=True):
      • Uses the Cobra VLM’s generate API with the fine-tuned weights.
    • save_merged_model(model, output_path, tokenizer=None):
      • Saves merged model + tokenizer in HF-style layout for downstream inference.

6. Run Scripts

  • qlora_finetune/run_100samples.sh

    • Activates ./env if present.
    • Creates a timestamped output dir: ./qlora_outputs_100samples_<timestamp>.
    • Runs:
      python train_qlora.py \
        --model_id cobra+3b \
        --dataset_max_samples 100 \
        --output_dir "${OUTPUT_DIR}" \
        --per_device_train_batch_size 1 \
        --gradient_accumulation_steps 1
    • Emits a clear banner and final summary path.
  • run_1000samples.sh and run_10000samples.sh follow the same pattern with larger sample caps.
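The timestamped output-directory pattern shared by these scripts can be sketched as below; the variable names are illustrative, not copied from run_100samples.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a timestamped output directory, as the run scripts do.
TIMESTAMP="$(date +%Y%m%d_%H%M%S)"
OUTPUT_DIR="./qlora_outputs_100samples_${TIMESTAMP}"
mkdir -p "${OUTPUT_DIR}"
echo "Writing adapters to ${OUTPUT_DIR}"
```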

7. Git Ignore

  • qlora_finetune/.gitignore
    • Now ignores QLoRA output dirs and local artifacts:

      checkpoints/
      outputs/
      runs/
      qlora_outputs*/
      qlora_outputs_*/
      *.pt
      *.pth
      *.ckpt
      wandb/
      *.log

How to Use

One-time setup

cd qlora_finetune
bash install_requirements.sh

Fine-tune with 100 examples

cd qlora_finetune
source env/bin/activate
./run_100samples.sh

Outputs go to a folder like:

qlora_outputs_100samples_20251130_225122/
  adapter_model.bin
  adapter_config.json
  tokenizer.json
  tokenizer_config.json
  special_tokens_map.json
  README.md

Evaluate / Inference

Use qlora_finetune/inference.py as documented in qlora_finetune/README.md, e.g.:

from pathlib import Path
from qlora_finetune.inference import load_qlora_model, generate_with_qlora
from PIL import Image

model, vlm = load_qlora_model(
    base_model_id="cobra+3b",
    lora_adapter_path=Path("./qlora_finetune/qlora_outputs_100samples_..."),
    merge_weights=False,
)

image = Image.open("path/to/image.jpg")
prompt = "What is going on in this image?"
response = generate_with_qlora(model, vlm, image, prompt)
print(response)

Notes / Trade-offs

  • Dependency pinning is intentionally strict to avoid the many incompatibilities we hit (mamba-ssm, datasets, huggingface_hub, trl, bitsandbytes, torch).
  • WandB:
    • By default, training will track to WandB if you’ve run wandb login or set WANDB_API_KEY.
    • If no key is available (e.g., non-interactive cluster job), WandB is automatically disabled and report_to falls back to ["none"].
  • Memory:
    • max_seq_length=512, batch_size=1, and gradient checkpointing (where supported) are chosen to fit the 3B Mamba-based Cobra model on a single L4‑class GPU.

Checklist

  • QLoRA training loop for Cobra VLM (train_qlora.py).
  • Dataset loader + formatting for LLaVA-CoT‑style conversations.
  • Reproducible environment via requirements.txt + install_requirements.sh.
  • Inference utilities to load LoRA adapters and optionally merge them.
  • Run scripts to quickly launch small-scale fine-tunes (e.g., 100 samples).
  • Ignore training artifacts (qlora_outputs_*, wandb logs) from Git.

andrasferenczy and others added 30 commits November 8, 2025 00:12
…nd save BLEU scores with timestamps. Add new entries to .gitignore for output files.
…ctionality for clearing GPU memory and saving BLEU scores has been integrated into the main workflow.
Update notebook: Clear GPU RAM and save image of results
This commit introduces a comprehensive set of files and scripts for fine-tuning the Cobra VLM on the LLaVA-CoT-100k dataset. Key additions include:

- **Dataset Preparation**: A script to download, validate, and prepare the dataset for training.
- **Custom Dataset Loader**: A new loader that supports JSONL format and integrates with existing training infrastructure.
- **Fine-Tuning Script**: A dedicated script for fine-tuning the model using the prepared dataset.
- **Documentation**: Detailed guides and summaries for setup and usage.

These changes enhance the model's reasoning capabilities by leveraging structured reasoning annotations in the dataset.
This commit introduces the foundational structure for the Cobra Evaluation System, including:

- **Main Module**: The entry point for running evaluations with command-line argument parsing.
- **Configuration Management**: A dedicated module for handling CLI arguments and settings.
- **Registry System**: A registry for managing generators and metrics, allowing for extensibility.
- **Generators**: Implementations for baseline, scratchpad, and external generation methods.
- **Metrics**: Initial implementations for BLEU and BERTScore metrics.
- **Utilities**: Functions for GPU management, JSON I/O, and visualization of results.

These changes establish a modular framework for evaluating visual language models with various inference strategies and metrics, enhancing the system's extensibility and usability.
This commit introduces significant improvements to the evaluation process, including:

- **Method Comparison**: Added functionality to run and compare results from multiple methods (baseline and scratchpad) within the same evaluation session.
- **Visualization Enhancements**: Implemented a new comparison visualization that displays results side-by-side for easier analysis of generated captions and metrics.
- **BERTScore Metric Updates**: Enhanced the BERTScore metric to store per-sample scores, allowing for detailed performance analysis.
- **Code Refactoring**: Cleaned up the main evaluation logic for better readability and maintainability.

These changes improve the usability and analytical capabilities of the Cobra Evaluation System, facilitating more comprehensive evaluations of visual language models.
This commit introduces several improvements to the evaluation process, including:

- **Dynamic Output Directories**: Results are now saved in timestamped directories for better organization, allowing users to easily manage multiple runs.
- **Comparison Statistics**: Added functionality to compute and save comparison statistics between baseline and scratchpad methods, including win rates and metric differences.
- **Visualization Updates**: Enhanced the comparison visualization to include detailed metrics and reasoning traces, improving the clarity of results.

These changes enhance the usability and analytical capabilities of the Cobra Evaluation System, facilitating more comprehensive evaluations of visual language models.
This commit modifies the .gitignore file to ensure that shell scripts are ignored and removes the output.png file, which is no longer needed. These changes help streamline the project by keeping unnecessary files out of version control.
This commit modifies the .gitignore file to include the __pycache__ directory, ensuring that Python bytecode files are not tracked in version control. This helps maintain a cleaner project structure by excluding unnecessary files.
This commit introduces several new files and enhancements to the Cobra Evaluation System, including:

- **New Scripts**: Added `analyze_significance.py`, `compare_scratchpad_passes.py`, and `visualize_scratchpad_passes.py` for analyzing and visualizing scratchpad performance across multiple passes.
- **Checkpointing Guide**: Introduced `CHECKPOINTING_GUIDE.md` to document the new automatic checkpointing feature for long-running evaluations.
- **Improved Documentation**: Added `SCRATCHPAD_COMPARE_MODE.md`, `SCRATCHPAD_DEGRADATION_ANALYSIS.md`, and `SCRATCHPAD_IMPROVEMENTS.md` to provide insights into scratchpad methods and their performance.
- **New Data Files**: Included various JSON and PNG files for results and visualizations from recent evaluations.

These changes enhance the analytical capabilities and usability of the evaluation system, facilitating better understanding and comparison of different methods in visual language models.
philosophercode and others added 8 commits December 1, 2025 02:12
This commit introduces support for external model API clients, allowing users to run evaluations using models such as GPT-5, Gemini, Claude, and Llama. Key changes include:

- **New Inference Methods**: Added options for external models in the evaluation workflow.
- **API Key Management**: Introduced command-line arguments for specifying API keys and model configurations.
- **Conditional Model Loading**: Updated the main evaluation logic to skip local model loading when using external models.
- **Checkpointing Improvements**: Enhanced checkpointing functionality to support overwriting the latest checkpoint file.

These updates significantly expand the evaluation options and flexibility of the Cobra Evaluation System, facilitating integration with various external AI models.
…ing scripts as well as my install requirements script.
…r_it

MMStar and accuracy eval added (& benchmarked for 1000 images on both COCO and MMStar)