I used this script (from the AnswerAI blog post) to fine-tune Llama 3. What I'm left with is a state dict that I can't use to replace the layers in the original model by following the Converting the State Dict.ipynb notebook: it fails with a KeyError because the key names in tensors and new_sd don't match. How does one obtain a usable model from this state dict?
export CUDA_VISIBLE_DEVICES=0,1
python fsdp_qlora/train.py \
--train_type bnb_dora \
--model_name meta-llama/Meta-Llama-3-8B \
--dataset orca_math \
--dataset_samples 10000 \
--batch_size 4 \
--context_length 2048 \
--gradient_accumulation_steps 2 \
--sharding_strategy full_shard \
--use_gradient_checkpointing true \
--reentrant_checkpointing true \
--use_cpu_offload false \
--use_activation_cpu_offload false \
--log_to wandb \
--project_name "fsdp-quantized-ft-exps" \
--save_model true \
--output_dir models/Llama-3-8b-orca-math-10k-bnb-QDoRA
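To narrow down the KeyError, it can help to list exactly which key names differ between the saved state dict and what the base model expects. The following is a minimal sketch, not part of fsdp_qlora; the helper names `normalize_key` and `compare_keys` are hypothetical. It assumes that stripping the PyTorch FSDP and activation-checkpoint wrapper prefixes (`_fsdp_wrapped_module.`, `_checkpoint_wrapped_module.`) is relevant, which may not hold if the file was saved with a full, already-cleaned state dict.

```python
# Hypothetical diagnostic helpers (not from fsdp_qlora or the notebook):
# compare the key names of a saved state dict against the ones a model expects.
from typing import Iterable, List, Mapping, Tuple

# Prefixes that PyTorch FSDP and checkpoint wrappers can insert into parameter
# names. Assumption: these may or may not appear in the saved file.
WRAPPER_PREFIXES = ("_fsdp_wrapped_module.", "_checkpoint_wrapped_module.")


def normalize_key(key: str, prefixes: Iterable[str] = WRAPPER_PREFIXES) -> str:
    """Remove every occurrence of the given wrapper prefixes from a key."""
    for prefix in prefixes:
        key = key.replace(prefix, "")
    return key


def compare_keys(
    saved: Mapping[str, object], expected: Mapping[str, object]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) after normalizing both sets of keys.

    missing:    keys the model expects that the saved dict lacks
    unexpected: saved keys that the model does not recognize
    """
    saved_keys = {normalize_key(k) for k in saved}
    expected_keys = {normalize_key(k) for k in expected}
    missing = sorted(expected_keys - saved_keys)
    unexpected = sorted(saved_keys - expected_keys)
    return missing, unexpected
```

In practice, `saved` could come from `safetensors.torch.load_file(...)` on the file written to `output_dir` (assuming it was saved as safetensors) and `expected` from `model.state_dict()` on the base model. The `unexpected` list would then show which adapter-specific names the notebook's renaming step does not account for.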