AnAI: BigramHash(12288) + TrigramHash + int5/int6 QAT by VeerGosai · Pull Request #427 · openai/parameter-golf

VeerGosai · 2026-03-22T14:34:36Z

Summary

TrigramHash embedding (4096 buckets, dim=64): novel 3-gram context capture alongside the existing bigram hashing
Larger BigramHash (12288 buckets, up from the SOTA's 10240): more buckets means fewer expected hash collisions
Aggressive SWA (start_frac=0.35, every=40 steps): averages more checkpoints for smoother, quantization-friendly weights
5% magnitude pruning (vs. 3% in the SOTA): zeroes more small weights so zstd compresses the artifact better
Mixed int5 MLP / int6 attention quantization with zstd-22 compression
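
To make the hashing idea concrete, here is a minimal sketch of how n-gram contexts could be mapped to embedding buckets. The bucket counts come from the summary above; the mixing function, its multipliers, the padding convention, and the names `ngram_bucket` and `hashed_ngram_ids` are assumptions for illustration and are not taken from the PR's code.

```python
BIGRAM_BUCKETS = 12288   # from the PR summary
TRIGRAM_BUCKETS = 4096   # from the PR summary

# Hypothetical odd multipliers, one per context position (an assumption;
# the PR's actual hash function is not shown in the description).
_MULTS = (1000003, 998244353, 2654435761)


def ngram_bucket(context, num_buckets):
    """Hash a short tuple of token ids into [0, num_buckets)."""
    h = 0
    for tok, mult in zip(context, _MULTS):
        # +1 so that the padding id -1 contributes 0 to the sum
        h = (h + (tok + 1) * mult) & 0xFFFFFFFF
    h ^= h >> 15  # fold high bits into low bits before the modulo
    return h % num_buckets


def hashed_ngram_ids(tokens, n, num_buckets, pad_id=-1):
    """One bucket id per position, hashing the n tokens ending at that position."""
    padded = [pad_id] * (n - 1) + list(tokens)
    return [ngram_bucket(padded[i:i + n], num_buckets) for i in range(len(tokens))]


tokens = [5, 9, 5, 9, 5]
bigram_ids = hashed_ngram_ids(tokens, 2, BIGRAM_BUCKETS)
trigram_ids = hashed_ngram_ids(tokens, 3, TRIGRAM_BUCKETS)
```

Each id would then index a learned embedding table (for the trigram case, presumably 4096 rows of width 64). How the looked-up vectors are combined with the token embeddings is not stated in the description.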

Architecture

10 layers, 512 dim, 8 heads, 4 KV heads (GQA), 3x MLP (hidden=1536), relu² activation, SmearGate, orthogonal init with muP output scaling, U-Net skip connections, tied embeddings (FP16), Muon optimizer (WD=0.04, momentum 0.92→0.99), sliding window eval (stride=64).
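
As a rough sanity check on these dimensions, the following back-of-envelope sketch counts the transformer-block weight matrices and their bit-packed size under the int5 MLP / int6 attention split. It assumes bias-free linear layers, a two-matrix MLP, and head_dim = dim / heads; it ignores embeddings, hash tables, norms, gates, skip weights, and quantization scales, so it is only an estimate and not the PR's actual artifact size.

```python
def block_weight_counts(dim=512, n_heads=8, n_kv_heads=4, mlp_hidden=1536):
    """Weight counts for one block (assumes no biases; illustrative only)."""
    head_dim = dim // n_heads            # 64
    kv_dim = n_kv_heads * head_dim       # 256 with 4 KV heads (GQA)
    attn = dim * dim + 2 * dim * kv_dim + dim * dim   # Q, K, V, output projections
    mlp = 2 * dim * mlp_hidden                         # up and down projections
    return attn, mlp


N_LAYERS = 10
attn_per_layer, mlp_per_layer = block_weight_counts()
attn_total = N_LAYERS * attn_per_layer
mlp_total = N_LAYERS * mlp_per_layer

# Bit-packed size before any entropy coding: 6 bits per attention weight,
# 5 bits per MLP weight.
attn_bytes = attn_total * 6 // 8
mlp_bytes = mlp_total * 5 // 8
packed_bytes = attn_bytes + mlp_bytes

print(attn_per_layer, mlp_per_layer)   # 786432 1572864
print(packed_bytes)                    # 15728640
```

Under these assumptions the block weights alone pack to about 15.7 MB, which suggests why pruning and zstd compression matter for fitting the remaining tensors and the code under the size limit.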

Test plan

Run 3 seeds (42, 1337, 2024) on 8×H100 SXM within 10-minute wall-clock
Verify val_bpb beats SOTA (1.1428) by ≥0.005 with p < 0.01
Confirm artifact size ≤ 16,000,000 bytes (code + compressed model)
Attach training logs for all 3 seeds
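
The two numeric checks in this plan could be scripted roughly as follows. This sketch interprets "beats SOTA by ≥0.005 with p < 0.01" as a one-sided, one-sample t-test of the seed mean against 1.1428 − 0.005; that interpretation, the function names, and the example inputs in the usage note are assumptions, and the challenge's official criterion may be defined differently.

```python
import math
import statistics

SOTA_BPB = 1.1428             # from the test plan
REQUIRED_MARGIN = 0.005       # from the test plan
SIZE_LIMIT_BYTES = 16_000_000 # from the test plan

# One-sided critical value of Student's t for p = 0.01 with 2 degrees of
# freedom (3 seeds). Other seed counts need a different table entry.
T_CRIT_3_SEEDS = 6.965


def beats_sota(val_bpbs, t_crit=T_CRIT_3_SEEDS):
    """Return (passed, t_stat) for H1: mean val_bpb < SOTA - margin."""
    target = SOTA_BPB - REQUIRED_MARGIN
    mean = statistics.mean(val_bpbs)
    sem = statistics.stdev(val_bpbs) / math.sqrt(len(val_bpbs))
    if sem == 0:
        # Identical results across seeds: fall back to a direct comparison.
        return mean < target, math.inf if mean < target else -math.inf
    t_stat = (target - mean) / sem
    return t_stat > t_crit, t_stat


def artifact_within_limit(code_bytes, compressed_model_bytes):
    """Check that code plus compressed model fits the size budget."""
    return code_bytes + compressed_model_bytes <= SIZE_LIMIT_BYTES
```

For example, `beats_sota([a, b, c])` would be called with the three per-seed val_bpb values once the runs finish. With only three seeds the critical value is large, so the seeds need to agree closely for the test to pass.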

Note: Training logs will be added once 8×H100 runs are completed. Submitting code for review in parallel.

🤖 Generated with Claude Code

…6 QAT

Novel improvements over SOTA (1.1428 BPB):
- TrigramHash embedding (4096 buckets, dim=64) for 3-gram context
- Larger BigramHash (12288 vs 10240) for reduced collisions
- Aggressive SWA (start_frac=0.35, every=40)
- 5% magnitude pruning for better compression
- Mixed int5/int6 quantization with optional int4 MLP fc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
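
The pruning and quantization steps listed in this commit message can be illustrated with a small stdlib-only sketch. It assumes global magnitude pruning and symmetric per-tensor quantization, and it uses `zlib` as a stand-in for zstd-22 because zstd is not in the standard library of most Python versions; all function names are hypothetical and this is not the PR's implementation.

```python
import random
import zlib


def prune_smallest(weights, frac):
    """Zero the `frac` smallest-magnitude weights (magnitude pruning)."""
    k = int(len(weights) * frac)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 15 for int5, 31 for int6
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


def compressed_size(q):
    """Bytes after storing one value per byte and compressing (zlib stand-in)."""
    raw = bytes(v & 0xFF for v in q)
    return len(zlib.compress(raw, 9))


random.seed(0)
mlp_weights = [random.gauss(0.0, 1.0) for _ in range(1000)]
pruned = prune_smallest(mlp_weights, 0.05)      # 5% pruning, as in the PR
q5, scale5 = quantize_symmetric(pruned, bits=5) # int5 for MLP weights
restored = dequantize(q5, scale5)
```

Pruned weights quantize to exactly zero, and lower bit widths shrink the symbol alphabet; both make the byte stream more repetitive, which is what a general-purpose compressor exploits.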

VeerGosai wants to merge 1 commit into openai:main from VeerGosai:anai-submission
