AnAI: BigramHash(12288) + TrigramHash + int5/int6 QAT#427

Open
VeerGosai wants to merge 1 commit into openai:main from VeerGosai:anai-submission

Conversation

@VeerGosai
Summary

  • TrigramHash embedding (4096 buckets, dim=64) — novel 3-gram context capture alongside existing bigram hashing
  • Larger BigramHash (12288 buckets, up from 10240 in the current SOTA) — more buckets means fewer expected hash collisions
  • Aggressive SWA (start_frac=0.35, every=40 steps) — more checkpoints averaged for smoother quantization-friendly weights
  • 5% magnitude pruning (vs SOTA 3%) — zeroes more small weights for better zstd compression
  • Mixed int5 MLP / int6 attention quantization with zstd-22 compression
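The hashed n-gram embeddings above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the hash function (an FNV-1a-style mix) and the additive combination with the token embedding are assumptions; only the table sizes (4096 buckets, dim=64 for trigrams) come from the PR.

```python
import numpy as np

NUM_BUCKETS, DIM = 4096, 64  # trigram table size from the PR description


def trigram_bucket(t1, t2, t3, num_buckets=NUM_BUCKETS):
    # Mix three token ids into one bucket index.
    # FNV-1a-style hash; the PR's actual hash scheme is not shown here.
    h = 0xCBF29CE484222325                             # FNV-1a offset basis
    for t in (t1, t2, t3):
        h ^= t
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF   # FNV prime, wrap to 64 bits
    return h % num_buckets


# Learned bucket table; in the model its row would be added to the token embedding.
table = np.random.randn(NUM_BUCKETS, DIM).astype(np.float32)
vec = table[trigram_bucket(17, 4, 99)]                 # (64,) context vector
```

Collisions are accepted by design: distinct trigrams sharing a bucket share an embedding row, which is why enlarging the bigram table (10240 → 12288) reduces expected collisions.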

Architecture

10 layers, 512 dim, 8 heads, 4 KV heads (GQA), 3x MLP (hidden=1536), relu² activation, SmearGate, orthogonal init with muP output scaling, U-Net skip connections, tied embeddings (FP16), Muon optimizer (WD=0.04, momentum 0.92→0.99), sliding window eval (stride=64).
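The architecture above can be collected into a single config for reference. The field names are hypothetical (the PR's code is not shown); the values are taken verbatim from the description.

```python
# Hypothetical config mirroring the PR's stated architecture; field names are assumptions.
config = dict(
    n_layer=10, n_embd=512,
    n_head=8, n_kv_head=4,            # GQA: 8 query heads share 4 KV heads
    mlp_ratio=3, mlp_hidden=1536,     # 3x MLP
    activation="relu^2",
    tie_embeddings=True, embed_dtype="fp16",
    optimizer="Muon", weight_decay=0.04, momentum=(0.92, 0.99),
    eval_stride=64,                   # sliding-window eval
)

# Internal consistency check: hidden size is the stated 3x multiple of model dim.
assert config["mlp_hidden"] == config["n_embd"] * config["mlp_ratio"]
```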

Test plan

  • Run 3 seeds (42, 1337, 2024) on 8×H100 SXM within 10-minute wall-clock
  • Verify val_bpb beats SOTA (1.1428) by ≥0.005 with p < 0.01
  • Confirm artifact size ≤ 16,000,000 bytes (code + compressed model)
  • Attach training logs for all 3 seeds
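The acceptance criterion (beat SOTA by ≥0.005 with p < 0.01 over 3 seeds) could be checked with a one-sided one-sample t-test. This helper is a sketch, not part of the PR; the critical value 6.965 is the one-sided t threshold for p = 0.01 at df = 2 (3 seeds).

```python
from math import sqrt
from statistics import mean, stdev


def beats_sota(bpbs, sota=1.1428, margin=0.005, t_crit=6.965):
    # One-sided one-sample t-test: H0 is mean val_bpb >= sota - margin.
    # t_crit = 6.965 is the p = 0.01 one-sided critical value for df = 2.
    n = len(bpbs)
    m, s = mean(bpbs), stdev(bpbs)
    t = ((sota - margin) - m) / (s / sqrt(n))
    return t > t_crit
```

With only 3 seeds the critical value is large, so the seed-to-seed spread must be small relative to the claimed 0.005 improvement for the test to pass.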

Note: Training logs will be added once 8×H100 runs are completed. Submitting code for review in parallel.

🤖 Generated with Claude Code

…6 QAT

Novel improvements over SOTA (1.1428 BPB):
- TrigramHash embedding (4096 buckets, dim=64) for 3-gram context
- Larger BigramHash (12288 vs 10240) for reduced collisions
- Aggressive SWA (start_frac=0.35, every=40)
- 5% magnitude pruning for better compression
- Mixed int5/int6 quantization with optional int4 MLP fc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
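The pruning and quantization steps listed in the commit message can be sketched as below. This is an illustrative per-tensor scheme under stated assumptions (symmetric quantization, global magnitude threshold); the PR's QAT and zstd-22 packaging are not reproduced here.

```python
import numpy as np


def magnitude_prune(w, frac=0.05):
    # Zero the smallest `frac` of weights by absolute value (5% in this PR).
    thresh = np.quantile(np.abs(w), frac)
    return np.where(np.abs(w) < thresh, 0.0, w)


def quantize_symmetric(w, bits):
    # Symmetric per-tensor quantization: bits=5 for MLP, bits=6 for attention.
    qmax = 2 ** (bits - 1) - 1          # 15 for int5, 31 for int6
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                      # dequantize as q * scale
```

Pruning before quantization increases the run length of zeros in the int tensors, which is what makes the subsequent zstd compression more effective.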