
mistral small #3

Open
sumo43 wants to merge 29 commits into dev-updated from mistral-small

Conversation


@sumo43 sumo43 commented Jul 23, 2025

No description provided.

@sumo43 sumo43 marked this pull request as draft July 23, 2025 21:16
@sumo43 sumo43 marked this pull request as ready for review August 14, 2025 02:11
xrsrke added a commit that referenced this pull request Mar 27, 2026
New: enable_weight_offload config. Offloads expert weights to pinned
CPU memory after each layer's expert forward pass and reloads them before
the next layer. The D2H copy overlaps with the post-MoE attention compute.
Handles FSDP DTensor via to_local().
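The offload/reload cycle described above can be sketched in plain PyTorch. This is a minimal illustrative helper, not the PR's actual implementation; the `ExpertOffloader` name and its methods are assumptions, and real overlap with attention additionally requires issuing the copies on a side CUDA stream.

```python
# Minimal sketch of per-layer expert weight offloading (hypothetical
# `ExpertOffloader` helper; not the code in this PR).
import torch


class ExpertOffloader:
    """Copies expert weights to (ideally pinned) CPU buffers after a
    layer's expert forward, and restores them before the next use."""

    def __init__(self, use_pinned: bool = torch.cuda.is_available()):
        # Pinned (page-locked) host memory is what lets the D2H copy
        # run asynchronously; fall back to pageable memory on CPU-only.
        self.use_pinned = use_pinned
        self.cpu_buffers: dict[str, torch.Tensor] = {}

    def _local(self, p: torch.Tensor) -> torch.Tensor:
        # FSDP may wrap parameters in DTensor; operate on the local shard.
        return p.to_local() if hasattr(p, "to_local") else p

    def offload(self, name: str, p: torch.Tensor) -> None:
        local = self._local(p)
        buf = self.cpu_buffers.get(name)
        if buf is None:
            buf = torch.empty(local.shape, dtype=local.dtype,
                              device="cpu", pin_memory=self.use_pinned)
            self.cpu_buffers[name] = buf
        # non_blocking=True lets the D2H copy overlap with subsequent
        # compute (e.g. post-MoE attention) when `buf` is pinned.
        buf.copy_(local, non_blocking=True)

    def reload(self, name: str, p: torch.Tensor) -> None:
        # H2D copy back into the (possibly DTensor-wrapped) parameter.
        self._local(p).copy_(self.cpu_buffers[name], non_blocking=True)
```

Usage would be one `offload` call after each layer's expert forward and one `reload` before that layer is needed again.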

30B-A3B + post-MoE attention, EP=8, batch=2, seq=4096, 8xB200:

| # | Config              | Memory   | TPS   | What is offloaded       |
|---|---------------------|----------|-------|-------------------------|
| 1 | Baseline            | 162 GiB  | 6,149 | Nothing                 |
| 2 | Weight only         | 169 GiB  | 2,949 | Expert weights → CPU    |
| 3 | Activation only     | 144 GiB  | 3,330 | Expert acts (checkpoint)|
| 4 | Weight + Activation | 146 GiB  | 2,326 | Both                    |

Weight offload (#2) doesn't save memory yet: calling .set_() on an FSDP
DTensor doesn't actually free the original storage. Activation offload
(#3) saves 18 GiB via checkpointing; combined (#4) saves 16 GiB.
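The storage-not-freed behavior can be reproduced in a few lines of plain PyTorch: rebinding a view with `.set_()` only changes which storage that tensor points at, while any other reference (such as FSDP's flat buffer) keeps the original allocation alive. This is an illustrative stand-in using plain tensors, not the PR's FSDP/DTensor code.

```python
import torch

# Stand-in for FSDP's flat parameter buffer: 8 float32 values = 32 bytes.
flat = torch.zeros(8)

# A "parameter" that is a view into the flat buffer.
param = flat[:4]

# Rebind the view to an empty tensor's storage via set_().
param.set_(torch.empty(0))

# The original 32-byte storage is still alive: `flat` references it,
# so rebinding the view freed nothing.
print(flat.untyped_storage().nbytes())  # still 32
```

In other words, to actually reclaim the memory the flat buffer itself would have to be resized or released, not just the per-parameter views.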
