…nd save BLEU scores with timestamps. Add new entries to .gitignore for output files.
Scratchpad Reasoning + Benchmark
…ctionality for clearing GPU memory and saving BLEU scores has been integrated into the main workflow.
Update notebook: Clear GPU RAM and save image of results
This commit introduces a comprehensive set of files and scripts for fine-tuning the Cobra VLM on the LLaVA-CoT-100k dataset. Key additions include:

- **Dataset Preparation**: A script to download, validate, and prepare the dataset for training.
- **Custom Dataset Loader**: A new loader that supports JSONL format and integrates with existing training infrastructure.
- **Fine-Tuning Script**: A dedicated script for fine-tuning the model using the prepared dataset.
- **Documentation**: Detailed guides and summaries for setup and usage.

These changes enhance the model's reasoning capabilities by leveraging structured reasoning annotations in the dataset.
Merge branch 'main' of https://github.com/AndrasFerenczy/thinking_cobra
This commit introduces the foundational structure for the Cobra Evaluation System, including:

- **Main Module**: The entry point for running evaluations with command-line argument parsing.
- **Configuration Management**: A dedicated module for handling CLI arguments and settings.
- **Registry System**: A registry for managing generators and metrics, allowing for extensibility.
- **Generators**: Implementations for baseline, scratchpad, and external generation methods.
- **Metrics**: Initial implementations for BLEU and BERTScore metrics.
- **Utilities**: Functions for GPU management, JSON I/O, and visualization of results.

These changes establish a modular framework for evaluating visual language models with various inference strategies and metrics, enhancing the system's extensibility and usability.
This commit introduces significant improvements to the evaluation process, including:

- **Method Comparison**: Added functionality to run and compare results from multiple methods (baseline and scratchpad) within the same evaluation session.
- **Visualization Enhancements**: Implemented a new comparison visualization that displays results side-by-side for easier analysis of generated captions and metrics.
- **BERTScore Metric Updates**: Enhanced the BERTScore metric to store per-sample scores, allowing for detailed performance analysis.
- **Code Refactoring**: Cleaned up the main evaluation logic for better readability and maintainability.

These changes improve the usability and analytical capabilities of the Cobra Evaluation System, facilitating more comprehensive evaluations of visual language models.
This commit introduces several improvements to the evaluation process, including:

- **Dynamic Output Directories**: Results are now saved in timestamped directories for better organization, allowing users to easily manage multiple runs.
- **Comparison Statistics**: Added functionality to compute and save comparison statistics between baseline and scratchpad methods, including win rates and metric differences.
- **Visualization Updates**: Enhanced the comparison visualization to include detailed metrics and reasoning traces, improving the clarity of results.

These changes enhance the usability and analytical capabilities of the Cobra Evaluation System, facilitating more comprehensive evaluations of visual language models.
This commit modifies the .gitignore file to ensure that shell scripts are ignored and removes the output.png file, which is no longer needed. These changes help streamline the project by keeping unnecessary files out of version control.
This commit modifies the .gitignore file to include the __pycache__ directory, ensuring that Python bytecode files are not tracked in version control. This helps maintain a cleaner project structure by excluding unnecessary files.
This commit introduces several new files and enhancements to the Cobra Evaluation System, including:

- **New Scripts**: Added `analyze_significance.py`, `compare_scratchpad_passes.py`, and `visualize_scratchpad_passes.py` for analyzing and visualizing scratchpad performance across multiple passes.
- **Checkpointing Guide**: Introduced `CHECKPOINTING_GUIDE.md` to document the new automatic checkpointing feature for long-running evaluations.
- **Improved Documentation**: Added `SCRATCHPAD_COMPARE_MODE.md`, `SCRATCHPAD_DEGRADATION_ANALYSIS.md`, and `SCRATCHPAD_IMPROVEMENTS.md` to provide insights into scratchpad methods and their performance.
- **New Data Files**: Included various JSON and PNG files for results and visualizations from recent evaluations.

These changes enhance the analytical capabilities and usability of the evaluation system, facilitating better understanding and comparison of different methods in visual language models.
This commit introduces support for external model API clients, allowing users to run evaluations using models such as GPT-5, Gemini, Claude, and Llama. Key changes include:

- **New Inference Methods**: Added options for external models in the evaluation workflow.
- **API Key Management**: Introduced command-line arguments for specifying API keys and model configurations.
- **Conditional Model Loading**: Updated the main evaluation logic to skip local model loading when using external models.
- **Checkpointing Improvements**: Enhanced checkpointing functionality to support overwriting the latest checkpoint file.

These updates significantly expand the evaluation options and flexibility of the Cobra Evaluation System, facilitating integration with various external AI models.
…ith 10,100,1000,10000 examples
…ing scripts as well as my install requirements script.
…r_it MMStar and accuracy eval added (& benchmarked for 1000 images on both COCO and MMStar)
… into feature/qlora-finetune
# Add LoRA fine-tuning pipeline for Cobra VLM (`qlora_finetune`)

## Summary
This PR introduces a self-contained LoRA fine-tuning pipeline for Cobra VLM, including training, inference, and reproducible environment setup. It enables quick fine-tuning of Cobra (`cobra+3b`, etc.) on LLaVA-CoT-100k (or local JSONL data) and saves LoRA adapters under timestamped `qlora_outputs_*` folders for downstream use.

Note: many files are still labeled `qlora` because the original plan was to use QLoRA, which adds quantization on top of LoRA. Because we are using a Mamba-based model this was not feasible, so we went with plain LoRA.
## What’s Included
### 1. Environment & Dependencies
`qlora_finetune/requirements.txt` pins the full dependency stack:

- `torch==2.1.0`, `torchvision==0.16.0`, `torchaudio==2.1.0`, `triton==2.1.0`
- `transformers==4.34.1`, `tokenizers>=0.14,<0.15`, `accelerate==0.26.1`
- `peft==0.7.1`, `bitsandbytes>=0.41.0,<0.43.0`, `trl==0.7.4`, `datasets>=2.14.0,<2.18.0`
- `einops`, `timm==0.9.10`, `wandb`, `jsonlines`, `rich`, `tqdm`, etc.

The pins avoid known failure modes: `huggingface_hub` vs `datasets` conflicts, `transformers` vs `trl` API breakages, and `torch` vs `bitsandbytes` binary mismatches.

`qlora_finetune/install_requirements.sh`:

- Creates a `./env` venv (Python 3.10).
- Upgrades `pip`, `setuptools`, `wheel`, `packaging`.
- Installs `mamba-ssm<2.0.0` with `--no-build-isolation` so it can see the already-installed `torch`.
### 2. Config & Dataset Handling

`qlora_finetune/config.py` defines a `QLoRAConfig` dataclass with:

- `model_id`, `pretrained_checkpoint`, `hf_token`
- `dataset_name`, `dataset_root`, `dataset_proportion`, `dataset_max_samples`, `dataset_seed`
- `output_dir`, `per_device_train_batch_size`, `gradient_accumulation_steps`, `learning_rate`, `num_train_epochs`, `max_steps`, etc.
- `fp16`/`bf16`, `logging_steps`, `save_steps`, `eval_steps`, `save_total_limit`
- `report_to` (default `["wandb"]`), `wandb_project`, `wandb_entity`

Its `__post_init__` enforces sane defaults and value ranges, and resolves `.hf_token` files into actual tokens.

`qlora_finetune/dataset_loader.py`:

- `load_llava_cot_dataset(...)`:
  - Loads the HF dataset (`Xkev/LLaVA-CoT-100k`) via `load_dataset(dataset_name, split=...)` (no `trust_remote_code`, to stay compatible with `datasets 2.14.x`), or falls back to a local `dataset_root / "train.jsonl"`.
  - Supports sampling via `dataset_max_samples` (absolute cap, takes precedence) or `dataset_proportion` (fraction of the dataset), seeded by `dataset_seed`.
- `format_for_sft(...)`: flattens `conversations` into a single `"text"` field with `USER:`/`ASSISTANT:` prefixes and injects an `<image>` token for the first user turn.
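The sampling rules and the `format_for_sft(...)` flattening described above can be sketched roughly as follows (an illustrative sketch, not the actual `dataset_loader.py`; the turn keys follow the usual LLaVA `from`/`value` convention, which is an assumption here):

```python
import random

def sample_dataset(examples, max_samples=None, proportion=None, seed=0):
    """Sketch of the sampling rules: max_samples is an absolute cap and
    takes precedence over proportion (a fraction of the dataset)."""
    rng = random.Random(seed)  # seeded for reproducibility (dataset_seed)
    examples = list(examples)
    rng.shuffle(examples)
    if max_samples is not None:
        return examples[:max_samples]
    if proportion is not None:
        return examples[: int(len(examples) * proportion)]
    return examples

def format_for_sft(example):
    """Sketch: flatten `conversations` into a single "text" field with
    USER:/ASSISTANT: prefixes, injecting <image> on the first user turn."""
    parts, first_user = [], True
    for turn in example["conversations"]:
        if turn["from"] == "human":
            content = turn["value"]
            if first_user:
                content = "<image>\n" + content
                first_user = False
            parts.append(f"USER: {content}")
        else:
            parts.append(f"ASSISTANT: {turn['value']}")
    return {"text": "\n".join(parts)}
```
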
### 3. Model Loading & QLoRA Preparation

`qlora_finetune/model_loader.py`:

- `load_cobra_for_qlora(model_id, pretrained_checkpoint, hf_token, freeze_vision_encoder)`:
  - Loads Cobra either from the HF Hub (`model_id`) or from a local checkpoint (`pretrained_checkpoint`), using the core `cobra.models.load.load`.
  - Disables `fused_add_norm` and swaps RMSNorm for a safe LayerNorm to avoid Triton issues.
- `prepare_model_for_qlora(model, target_modules, lora_r, lora_alpha, lora_dropout, lora_bias)`: attaches LoRA adapters to the Mamba projection layers (`in_proj`, `out_proj`, `x_proj`, `dt_proj`, etc.).
- `load_and_prepare_model(...)`: convenience wrapper returning `(vlm, llm_backbone_with_lora)` ready for training.
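The LoRA attachment step looks roughly like this with PEFT (a sketch only; the `r`/`alpha`/`dropout` defaults shown are placeholders rather than the values in `model_loader.py`, and the `peft` import is deferred so the snippet stands alone):

```python
def prepare_model_for_qlora(model,
                            target_modules=("in_proj", "out_proj", "x_proj", "dt_proj"),
                            lora_r=16, lora_alpha=32, lora_dropout=0.05,
                            lora_bias="none"):
    """Sketch: wrap the Mamba backbone's projection layers with LoRA adapters."""
    # Deferred import: only needed when the function is actually called.
    from peft import LoraConfig, get_peft_model

    config = LoraConfig(
        r=lora_r,                           # LoRA rank
        lora_alpha=lora_alpha,              # scaling factor
        lora_dropout=lora_dropout,
        bias=lora_bias,
        target_modules=list(target_modules),  # Mamba projections, per the PR
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, config)
```
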
### 4. Training Script

`qlora_finetune/train_qlora.py` is the main training entrypoint, with:
- Path setup so the `qlora_finetune` + `cobra` modules work when run from different roots.
- `main(config: QLoRAConfig)`:
  - Loads the model and LoRA adapters via `load_and_prepare_model`.
  - Uses `vlm.llm_backbone.tokenizer` if available, or a Mamba tokenizer (`xiuyul/mamba-2.8b-zephyr` / `state-spaces/mamba-2.8b`) as a fallback.
  - Loads data via `load_llava_cot_dataset`, applies sampling (`dataset_max_samples` / `dataset_proportion`), and maps to the `"text"` format.
  - Builds `TrainingArguments` with:
    - `output_dir`, LR, epochs, warmup, weight decay, grad norm, fp16/bf16, logging/save cadence, seed, dataloader workers, `remove_unused_columns`, `report_to`, `run_name`.
    - Optional `max_steps`, `eval_steps`.
  - Before building `TrainingArguments`, checks WandB availability:
    - If `WANDB_API_KEY` is not set and no stored login is found, automatically strips `"wandb"` from `report_to` and falls back to `["none"]`, printing a warning.
    - Otherwise proceeds with `"wandb"` still present in `report_to`.
  - Creates a `trl.SFTTrainer` with:
    - `dataset_text_field="text"`.
    - `DataCollatorForLanguageModeling` (causal LM, `pad_to_multiple_of=8`).
    - `max_seq_length=512` (explicitly set to reduce memory usage with big Mamba models).
  - Runs `trainer.train()`, saves the final adapter + tokenizer to `config.output_dir`, and prints PEFT loading instructions.
- CLI (`if __name__ == "__main__":`):
  - Flags: `--config`, `--model_id`, `--dataset_proportion`, `--dataset_max_samples`, `--output_dir`, `--hf_token`, `--per_device_train_batch_size`, `--gradient_accumulation_steps`.
  - If a `--config` JSON is provided and exists, initializes `QLoRAConfig` from it; otherwise uses defaults plus CLI overrides.
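The WandB availability check amounts to logic like the following (a minimal sketch, assuming a stored login is detected via `~/.netrc`; the real detection in `train_qlora.py` may differ):

```python
import os
from pathlib import Path

def resolve_report_to(report_to):
    """If WandB is requested but no API key or stored login is available,
    fall back to no reporting instead of failing mid-run."""
    if "wandb" not in report_to:
        return report_to
    has_key = bool(os.environ.get("WANDB_API_KEY"))
    # `wandb login` typically stores credentials in ~/.netrc (an assumption here)
    has_login = Path.home().joinpath(".netrc").exists()
    if not (has_key or has_login):
        print("WARNING: wandb requested but no credentials found; disabling logging.")
        report_to = [r for r in report_to if r != "wandb"] or ["none"]
    return report_to
```
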
### 5. Inference Utilities

`qlora_finetune/inference.py`:

- `load_qlora_model(base_model_id, lora_adapter_path, hf_token=None, merge_weights=False)`:
  - Loads the base model via `cobra.models.load.load`.
  - Loads the adapter from `lora_adapter_path` into the LLM backbone via `PeftModel.from_pretrained`.
  - Optionally merges the LoRA weights into the base model (`merge_and_unload`).
- `generate_with_qlora(model, vlm, image, prompt, max_new_tokens=512, temperature=0.7, do_sample=True)`: runs generation through Cobra's `generate` API with the fine-tuned weights.
- `save_merged_model(model, output_path, tokenizer=None)`: saves a fully merged model for standalone use.
### 6. Run Scripts

`qlora_finetune/run_100samples.sh`:

- Activates `./env` if present.
- Creates a timestamped output dir `./qlora_outputs_100samples_<timestamp>`.
- Runs:

```shell
python train_qlora.py \
  --model_id cobra+3b \
  --dataset_max_samples 100 \
  --output_dir "${OUTPUT_DIR}" \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 1
```

(You’ve also derived `run_1000samples.sh` and `run_10000samples.sh` in your workspace following the same pattern.)
### 7. Git Ignore

`qlora_finetune/.gitignore` now ignores QLoRA output dirs and other local artifacts:
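The exact entries are not reproduced in this summary, but they presumably look something like this (illustrative only; consult `qlora_finetune/.gitignore` for the actual patterns):

```
qlora_outputs_*/
wandb/
env/
__pycache__/
```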
## How to Use
### One-time setup
```shell
cd qlora_finetune
bash install_requirements.sh
```

### Fine-tune with 100 examples

Run `bash run_100samples.sh` (or one of the derived scripts) from `qlora_finetune/`.
Outputs go to a folder like `./qlora_outputs_100samples_<timestamp>/`.
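If you need to reproduce the timestamped naming elsewhere (e.g. in analysis tooling), it amounts to the following sketch (the exact format string used by the run scripts is an assumption):

```python
from datetime import datetime

def make_output_dir(prefix="qlora_outputs_100samples"):
    """Sketch: build a timestamped output dir name like the run scripts do,
    e.g. ./qlora_outputs_100samples_20240101_120000 (format is assumed)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"./{prefix}_{stamp}"
```
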
### Evaluate / Inference
Use `qlora_finetune/inference.py` as documented in `qlora_finetune/README.md`.
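As a hedged sketch of the documented loading flow (mirroring the described `load_qlora_model`; the `cobra.models.load.load` call signature and the generate call in the usage comment are assumptions, and imports are deferred so the snippet stands alone):

```python
def load_qlora_model(base_model_id, lora_adapter_path,
                     hf_token=None, merge_weights=False):
    """Sketch: load base Cobra, attach the LoRA adapter, optionally merge."""
    # Deferred imports: only needed when the function is actually called.
    from cobra.models.load import load   # core Cobra loader (signature assumed)
    from peft import PeftModel           # PEFT adapter loading

    vlm = load(base_model_id, hf_token=hf_token)
    vlm.llm_backbone = PeftModel.from_pretrained(vlm.llm_backbone,
                                                 lora_adapter_path)
    if merge_weights:
        # Fold the LoRA deltas into the base weights for standalone use
        vlm.llm_backbone = vlm.llm_backbone.merge_and_unload()
    return vlm

# Usage (paths are examples only):
# vlm = load_qlora_model("cobra+3b", "./qlora_outputs_100samples_<timestamp>")
# caption = vlm.generate(image, "USER: <image>\nDescribe the scene. ASSISTANT:")
```
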
## Notes / Trade-offs

- Dependency versions are pinned tightly to avoid known incompatibilities (`mamba-ssm`, `datasets`, `huggingface_hub`, `trl`, `bitsandbytes`, `torch`).
- To log to WandB, run `wandb login` or set `WANDB_API_KEY`; otherwise `report_to` falls back to `["none"]`.
- `max_seq_length=512`, `batch_size=1`, and gradient checkpointing (where supported) are chosen to fit the 3B Mamba-based Cobra model on a single L4‑class GPU.
## Checklist

- LoRA fine-tuning pipeline (`train_qlora.py`).
- Reproducible environment (`requirements.txt` + `install_requirements.sh`).
- Local artifacts (`qlora_outputs_*`, wandb logs) excluded from Git.