17 changes: 17 additions & 0 deletions .gitignore
@@ -0,0 +1,17 @@
# Python cache/artifacts
__pycache__/
*.pyc

# Local datasets and weights
AudioVisualText/AVE_data/
AudioVisualText/pre-trained/
AudioVisualText/google-bert-base-uncased/

# Local outputs/logs
AudioVisualText/results/
AudioVisualText/slurm-*.out
AudioVisualText/slurm-*.err

# Local helper scripts
AudioVisualText/run_infer_ave.sbatch
AudioVisualText/run_ft_ave.sbatch
91 changes: 91 additions & 0 deletions AudioVisualText/README_CHANGES.md
@@ -0,0 +1,91 @@
# AudioVisualText Local Changes (Run/Eval/Analysis)

This document summarizes the practical changes made to run AVE finetuning locally, evaluate results, and add gradient-sensitivity analysis.

## 1) Runtime/Path Updates

- Updated local checkpoint paths in finetune and inference scripts:
- `AudioVisualText/scripts/finetune/ft_ave.sh`
- `AudioVisualText/scripts/finetune/infer_ave.sh`
- Added dynamic GPU process detection:
- `NPROC_PER_NODE=${NPROC_PER_NODE:-$(nvidia-smi -L 2>/dev/null | wc -l)}`
- Fallback to `1` when detection returns `0`.
- Set AVE scripts to use local weights under:
- `/nethome/rkhan96/flash/weights/...`
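
The GPU-detection logic can be sketched in Python for clarity (a hypothetical helper mirroring the shell one-liner above; the actual scripts use the shell form):

```python
import shutil
import subprocess

def detect_nproc_per_node(default: int = 1) -> int:
    """Count visible GPUs via `nvidia-smi -L`, falling back to a default.

    Mirrors the shell logic: NPROC_PER_NODE defaults to the GPU count,
    and falls back to 1 when detection fails or returns 0.
    """
    if shutil.which("nvidia-smi") is None:
        return default
    try:
        out = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return default
    n = len([line for line in out.splitlines() if line.strip()])
    return n if n > 0 else default
```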

## 2) Precision/Compatibility Fixes

- Training and inference were aligned to FP32 (`bf16=False`) to avoid dtype mismatch issues observed with BF16 on this setup.
- `AudioVisualText/deepspeed/stage2-offload.json`
- `bf16.enabled` set to `false`.

## 3) Gradient Sensitivity Instrumentation

### Config Flags

- Added training flags in:
- `AudioVisualText/configs/unified_config.py`
- New fields:
- `grad_sensitivity_enable`
- `grad_sensitivity_include_projectors`

### Trainer Logging

- Extended `UnifiedTrainer` in:
- `AudioVisualText/trainer.py`
- Added per-step logging for:
- `lora_A_text`, `lora_A_visual`, `lora_A_audio`, `lora_B_shared`
- optional `vl_projector`, `al_projector`
- Logged metrics:
- `*_grad_norm`
- `*_param_norm`
- `*_relative_grad_norm`
- `*_num_params`
- Output file:
- `<output_dir>/grad_sensitivity.jsonl`
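
A schematic of the per-step metric computation (a stdlib-only sketch; the real trainer operates on `torch` parameter tensors grouped under the names above, and the `groups` structure here is an assumption for illustration):

```python
import json
import math

def log_grad_sensitivity(step, groups, path):
    """Append one JSONL record of per-group gradient statistics.

    `groups` maps a group name (e.g. "lora_A_text") to a dict with
    flat lists of parameter values and accumulated gradient values.
    """
    record = {"step": step}
    for name, g in groups.items():
        grad_norm = math.sqrt(sum(x * x for x in g["grads"]))
        param_norm = math.sqrt(sum(x * x for x in g["params"]))
        record[f"{name}_grad_norm"] = grad_norm
        record[f"{name}_param_norm"] = param_norm
        # Relative norm guards against division by zero for empty groups.
        record[f"{name}_relative_grad_norm"] = grad_norm / (param_norm + 1e-12)
        record[f"{name}_num_params"] = len(g["params"])
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```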

### DeepSpeed/ZeRO Reliability Fix

- The initial implementation read `param.grad` directly and logged near-zero gradients: under ZeRO stage 2, gradients are reduced and partitioned across ranks, so `param.grad` is often empty or stale by the time it is inspected.
- The implementation was updated to use parameter backward hooks, which observe each gradient as autograd produces it and therefore accumulate grad norms reliably under DeepSpeed.
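
A minimal sketch of the hook-based approach (assumes PyTorch; the function and variable names are illustrative, not the exact trainer code):

```python
import torch

def attach_grad_norm_hooks(named_params, store):
    """Register backward hooks that record per-parameter squared grad norms.

    Unlike reading `param.grad` after the step, a tensor hook sees the
    gradient the moment autograd produces it, before ZeRO reduces or
    partitions it away.
    """
    for name, p in named_params:
        if not p.requires_grad:
            continue
        def _hook(grad, name=name):
            store[name] = store.get(name, 0.0) + grad.norm().item() ** 2
            return grad
        p.register_hook(_hook)

# Usage sketch:
store = {}
w = torch.randn(4, requires_grad=True)
attach_grad_norm_hooks([("w", w)], store)
(w * w).sum().backward()
# store["w"] now holds the squared grad-norm contribution for this step
```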

## 4) Script Controls for Clean Runs

- Updated `AudioVisualText/scripts/finetune/ft_ave.sh` run naming:
- If `RUN_NAME` is set, use it.
- Else if `GRAD_SENS_RUN=1`, use `llama_ave_gradsens`.
- Else use `llama_ave`.
- This avoids accidental resume collisions with existing checkpoint directories.
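
The selection precedence, sketched in Python for clarity (the actual control flow lives in the shell script):

```python
import os

def choose_run_name(env=os.environ):
    """Pick a run name with the same precedence as ft_ave.sh."""
    if env.get("RUN_NAME"):
        return env["RUN_NAME"]       # explicit name wins
    if env.get("GRAD_SENS_RUN") == "1":
        return "llama_ave_gradsens"  # isolated dir for analysis runs
    return "llama_ave"               # default training run
```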

## 5) Evaluation Summary (AVE)

- The finetuning run (3 epochs) reproduced near paper-level accuracy:
- AVE accuracy: **77.24%**
- Reported reference: **77.06%**
- The number of parse-valid samples differed slightly due to the evaluator's output-format strictness:
- Local run: `394/402`
- Reference: `397/402`
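
The headline numbers can be cross-checked directly:

```python
# Parse-valid rates for the local vs. reference runs.
local_valid = 394 / 402   # ~98.01% of samples parsed
ref_valid = 397 / 402     # ~98.76% of samples parsed

# Accuracy gap between the local run and the reported reference.
gap = 77.24 - 77.06       # 0.18 points
```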

## 6) Gradient Analysis Artifacts

### Analysis Script

- Added:
- `AudioVisualText/scripts/analysis/plot_grad_sensitivity.py`
- Script outputs:
- `grad_sensitivity_long.csv`
- `grad_sensitivity_summary.csv`
- PNG plots (if `matplotlib` is installed):
- `lora_grad.png`
- `lora_rel.png`
- `projector_rel.png`
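
A stdlib-only sketch of how the JSONL log can be flattened into the long/summary CSVs (column names and the mean-only summary are assumptions; the actual script may compute more statistics):

```python
import csv
import json
from collections import defaultdict
from statistics import mean

def summarize(jsonl_path, long_csv, summary_csv):
    """Flatten grad_sensitivity.jsonl to long form plus per-metric means."""
    rows = []
    with open(jsonl_path) as f:
        for line in f:
            rec = json.loads(line)
            step = rec.pop("step")
            for metric, value in rec.items():
                rows.append({"step": step, "metric": metric, "value": value})
    with open(long_csv, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["step", "metric", "value"])
        w.writeheader()
        w.writerows(rows)
    by_metric = defaultdict(list)
    for r in rows:
        by_metric[r["metric"]].append(r["value"])
    with open(summary_csv, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["metric", "mean"])
        w.writeheader()
        for metric, vals in sorted(by_metric.items()):
            w.writerow({"metric": metric, "mean": mean(vals)})
    return rows
```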

### Current Analysis Output Location

- `AudioVisualText/results/finetune/llama_ave_gradsens_v2/analysis/`

## 7) Notes on Job Interruptions

- One long run was preempted by the scheduler, but partial gradient logs were captured.
- The partial `grad_sensitivity.jsonl` still confirms non-zero gradient signals after the hook-based fix.
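
A quick way to salvage and sanity-check a truncated log (a sketch; the final line of a preempted run may be cut mid-record, so unparseable lines are skipped):

```python
import json

def load_partial_jsonl(path):
    """Read a possibly truncated JSONL file, skipping unparseable lines."""
    records = []
    with open(path) as f:
        for line in f:
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # truncated tail from preemption
    return records

def has_nonzero_grads(records, suffix="_grad_norm", eps=1e-12):
    """True if any logged grad-norm metric is meaningfully above zero."""
    return any(
        v > eps
        for rec in records
        for k, v in rec.items()
        if k.endswith(suffix)
    )
```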

18 changes: 18 additions & 0 deletions AudioVisualText/configs/unified_config.py
@@ -97,6 +97,18 @@ class TrainingArguments(transformers.TrainingArguments):
## my
reserved_modality: str = field(default=None)
loramethod: str = field(default=None)
cross_attn_kv_mode: str = field(
default="question",
metadata={"help": "Cross-attn KV source for LoRA branches: question or full_text."},
)
cross_modal_mode: str = field(
default="trilinear",
metadata={"help": "Cross-modal fusion mode for LoRA: pairwise or trilinear."},
)
trilinear_pack_tokens: bool = field(
default=False,
metadata={"help": "If True, compact active tokens before Triton trilinear attention."},
)
blc_alpha: float = field(default=0.5)
blc_weight: float = field(default=0.5)

@@ -106,3 +118,9 @@ class TrainingArguments(transformers.TrainingArguments):
save_modules: str = field(default='vl_projector,al_projector,lora')

exp_desc: str = field(default='exp')

# Gradient sensitivity analysis toggles.
# When enabled, UnifiedTrainer logs per-modality gradient statistics for
# LoRA branches (text/visual/audio A, shared B) and optional projectors.
grad_sensitivity_enable: bool = field(default=False)
grad_sensitivity_include_projectors: bool = field(default=True)
1 change: 0 additions & 1 deletion AudioVisualText/dataset/unified_dataset.py
@@ -54,7 +54,6 @@ def __init__(

print(f'tot training sample nums: {self.tot}')


def add_avqa_task_samples(self):
avqa_annotation_path = 'MUSIC_AVQA_data/train_samples_with_reasoning_avqa.json'
tot = 0
51 changes: 51 additions & 0 deletions AudioVisualText/deepspeed/stage2-offload-torch25.json
@@ -0,0 +1,51 @@
{
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto",
"torch_adam": true
}
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"total_num_steps": "auto",
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto"
}
},
"bf16": {
"enabled": "auto"
},
"fp16": {
"enabled": false,
"loss_scale": 0,
"initial_scale_power": 16,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 100000000.0,
"reduce_scatter": true,
"reduce_bucket_size": 100000000.0,
"overlap_comm": true,
"contiguous_gradients": true,
"offload_optimizer": {
"device": "cpu"
},
"round_robin_gradients": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 2000,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}
2 changes: 1 addition & 1 deletion AudioVisualText/deepspeed/stage2-offload.json
@@ -19,7 +19,7 @@
}
},
"bf16": {
"enabled": "auto",
"enabled": true,
"loss_scale": 0,
"initial_scale_power": 16,
"loss_scale_window": 1000,
Binary file added AudioVisualText/docs/moka.pdf
Binary file not shown.
Binary file added AudioVisualText/docs/moka_math_from_code.pdf
Binary file not shown.