
mistral small #3

Open
sumo43 wants to merge 29 commits into dev-updated from mistral-small

Conversation


@sumo43 sumo43 commented Jul 23, 2025

No description provided.

@sumo43 sumo43 marked this pull request as draft July 23, 2025 21:16
@sumo43 sumo43 marked this pull request as ready for review August 14, 2025 02:11
xrsrke added a commit that referenced this pull request Mar 27, 2026
New: enable_weight_offload config. Offloads expert weights to pinned
CPU memory after each layer's expert forward pass and reloads them before
the next layer. The D2H copy overlaps with the post-MoE attention compute.
Handles FSDP DTensor via to_local().
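The offload/reload cycle described above can be sketched in plain PyTorch. This is a minimal illustrative helper, not the PR's actual implementation; the `ExpertOffloader` name and its methods are assumptions, and real overlap with attention additionally requires issuing the copies on a side CUDA stream.

```python
# Minimal sketch of per-layer expert weight offloading (hypothetical
# `ExpertOffloader` helper; not the code in this PR).
import torch


class ExpertOffloader:
    """Copies expert weights to (ideally pinned) CPU buffers after a
    layer's expert forward, and restores them before the next use."""

    def __init__(self, use_pinned: bool = torch.cuda.is_available()):
        # Pinned (page-locked) host memory is what lets the D2H copy
        # run asynchronously; fall back to pageable memory on CPU-only.
        self.use_pinned = use_pinned
        self.cpu_buffers: dict[str, torch.Tensor] = {}

    def _local(self, p: torch.Tensor) -> torch.Tensor:
        # FSDP may wrap parameters in DTensor; operate on the local shard.
        return p.to_local() if hasattr(p, "to_local") else p

    def offload(self, name: str, p: torch.Tensor) -> None:
        local = self._local(p)
        buf = self.cpu_buffers.get(name)
        if buf is None:
            buf = torch.empty(local.shape, dtype=local.dtype,
                              device="cpu", pin_memory=self.use_pinned)
            self.cpu_buffers[name] = buf
        # non_blocking=True lets the D2H copy overlap with subsequent
        # compute (e.g. post-MoE attention) when `buf` is pinned.
        buf.copy_(local, non_blocking=True)

    def reload(self, name: str, p: torch.Tensor) -> None:
        # H2D copy back into the (possibly DTensor-wrapped) parameter.
        self._local(p).copy_(self.cpu_buffers[name], non_blocking=True)
```

Usage would be one `offload` call after each layer's expert forward and one `reload` before that layer is needed again.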

30B-A3B + post-MoE attention, EP=8, batch=2, seq=4096, 8xB200:

| # | Config              | Memory   | TPS   | What is offloaded       |
|---|---------------------|----------|-------|-------------------------|
| 1 | Baseline            | 162 GiB  | 6,149 | Nothing                 |
| 2 | Weight only         | 169 GiB  | 2,949 | Expert weights → CPU    |
| 3 | Activation only     | 144 GiB  | 3,330 | Expert acts (checkpoint)|
| 4 | Weight + Activation | 146 GiB  | 2,326 | Both                    |

Weight offload (#2) doesn't save memory yet: calling .set_() on an FSDP
DTensor doesn't actually free the original storage. Activation offload
(#3) saves 18 GiB via checkpointing; combined (#4) saves 16 GiB.
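The storage-not-freed behavior can be reproduced in a few lines of plain PyTorch: rebinding a view with `.set_()` only changes which storage that tensor points at, while any other reference (such as FSDP's flat buffer) keeps the original allocation alive. This is an illustrative stand-in using plain tensors, not the PR's FSDP/DTensor code.

```python
import torch

# Stand-in for FSDP's flat parameter buffer: 8 float32 values = 32 bytes.
flat = torch.zeros(8)

# A "parameter" that is a view into the flat buffer.
param = flat[:4]

# Rebind the view to an empty tensor's storage via set_().
param.set_(torch.empty(0))

# The original 32-byte storage is still alive: `flat` references it,
# so rebinding the view freed nothing.
print(flat.untyped_storage().nbytes())  # still 32
```

In other words, to actually reclaim the memory the flat buffer itself would have to be resized or released, not just the per-parameter views.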
