Radial bitnet submission#435

Open
rthgit wants to merge 2 commits into openai:main from rthgit:radial-bitnet-submission

Conversation


@rthgit rthgit commented Mar 22, 2026

⛳ 16MB Track Submission: Radial-BitNet (15.6M Parameters)
This PR submits an official entry for the 10-Minute / 16MB Parameter Golf track, achieving a final validation score of 2.6034 BPB with a pure ternary-weight architecture.

The approach deviates sharply from standard FP16 LLM baselines to maximize parameter volume per megabyte, using my proprietary architectural designs (Radial Encoding and FRO) to push performance beyond conventional limits.

🧠 Proprietary Architecture & Innovations
Weight-Only BitNet (W1.58b / A16b): Every linear projection inside the Attention and MLP layers uses strict ternary matrices (-1, 0, 1) during the forward pass. This lets my 15.6M-parameter model scale capacity while compressing losslessly under Zstandard to just 12.55 MB, clearing the 16 MB limit by over 3.4 MB.
Radial Positional Bypass (Proprietary Design): The architecture has zero learned positional embeddings. Instead, my custom `RadialEncoding` algorithm computes Euler-based spatial frequencies over $\phi$ (the golden ratio) and injects this geometric signal directly into the token embedding, freeing up significant parameter space.
Fractal Resonant Optimization - FRO (Proprietary Optimizer): To handle the shattered gradient momentum of step-quantized ternary weights within the strict 10-minute budget, I am introducing my custom built-in optimizer, FRO. It enforces early convergence through multi-scale resonance alignment, outperforming AdamW in this constrained environment.
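The ternary forward pass can be sketched in a few lines of plain Python. BitNet b1.58 uses "absmean" quantization (scale by the mean absolute weight, then round and clip to {-1, 0, +1}); whether this submission uses exactly that scheme is an assumption, since the PR does not include the quantizer code:

```python
def ternary_quantize(w):
    """Absmean ternary quantization, BitNet b1.58 style (assumed scheme).

    Scales a weight matrix by its mean absolute value, then rounds and
    clips every entry to the ternary set {-1, 0, +1}. The forward pass
    then uses q * scale in place of the full-precision weight.
    """
    vals = [v for row in w for v in row]
    scale = sum(abs(v) for v in vals) / max(len(vals), 1)
    scale = scale if scale > 0 else 1.0
    q = [[max(-1, min(1, round(v / scale))) for v in row] for row in w]
    return q, scale
```

Because each entry carries at most ~1.58 bits of information, a checkpoint of such matrices is highly redundant, which is what lets Zstandard compress the 15.6M parameters down to 12.55 MB.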
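The PR does not show `RadialEncoding` itself. The sketch below is one hypothetical reading of "Euler-based spatial frequencies over $\phi$": sin/cos pairs (the real and imaginary parts of $e^{i\theta}$) with a golden-ratio-derived frequency schedule. The schedule and function shape are assumptions, not the actual proprietary algorithm:

```python
import math

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def radial_encoding(pos, dim):
    """Hypothetical parameter-free positional signal (not the PR's code).

    For each frequency band, emits the imaginary and real parts of
    e^{i * pos * freq}, using an assumed phi-power frequency schedule.
    The result is added to the token embedding.
    """
    out = []
    for i in range(dim // 2):
        freq = PHI ** (-2.0 * i / dim)  # assumed phi-based schedule
        out.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return out
```

Whatever the exact schedule, the stated benefit holds: the signal is computed rather than learned, so it costs zero parameters.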
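FRO's internals ("multi-scale resonance alignment") are not specified in the PR. Independent of the optimizer choice, though, training rounded ternary weights conventionally requires straight-through estimator (STE) scaffolding: the forward pass uses the quantized value while gradients update a latent full-precision copy. A minimal sketch of that standard scaffolding, not of FRO itself:

```python
def ternary_forward(x, latent_w, scale):
    """Forward pass uses the rounded ternary weight, not the latent one."""
    q = max(-1, min(1, round(latent_w / scale)))
    return x * (q * scale)

def ste_step(latent_w, x, grad_out, lr):
    """Straight-through estimator: pretend d(round)/dw == 1, so the
    upstream gradient flows through to the latent full-precision weight;
    the next forward pass re-quantizes it."""
    grad_w = grad_out * x
    return latent_w - lr * grad_w
```

Any optimizer for this setting, FRO included, would sit on top of updates like `ste_step`, differing in how it shapes the gradient before applying it.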
⚙️ Hyperparameters & Configuration
Parameters: 15,600,000 (~12.55 MB compressed)
Shape: 12 Layers | 384 Model Dimension | 6 Q-Heads | 2 KV-Heads
Execution Environment: Kaggle dual T4 GPUs (falling back dynamically to native float16, since Turing-architecture GPUs only emulate bfloat16).
Train Sequence Length: 1024
Batch Size: 4 (with scale-invariant gradient accumulation)
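The stated shape can be sanity-checked with a little arithmetic. With 6 query heads at a model dimension of 384, the head dimension is 64, and 2 KV heads make this grouped-query attention (GQA). Assuming no biases (an assumption; the MLP width and vocabulary size are not given, so only the attention blocks are counted here):

```python
def attn_params(d_model, n_q_heads, n_kv_heads):
    """Per-layer GQA attention parameter count, assuming no biases."""
    head_dim = d_model // n_q_heads          # 384 // 6 = 64
    wq = d_model * n_q_heads * head_dim      # query projection
    wk = d_model * n_kv_heads * head_dim     # shared key projection
    wv = d_model * n_kv_heads * head_dim     # shared value projection
    wo = n_q_heads * head_dim * d_model      # output projection
    return wq + wk + wv + wo

per_layer = attn_params(384, 6, 2)   # 393,216 per layer
total_attn = 12 * per_layer          # 4,718,592 across 12 layers
```

That leaves roughly 10.9M of the 15.6M parameters for the MLPs, embeddings, and norms, which is plausible for this depth and width.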
📉 Final 10-Minute Convergence Log Snippet
```text
Step 1100 | Time 495s | Train Loss: 6.1011 | Val BPB: 2.6244 ⛳
Step 1150 | Time 518s | Train Loss: 5.9497 | Val BPB: 2.5936 ⛳
Step 1200 | Time 540s | Train Loss: 6.0033 | Val BPB: 2.6363 ⛳
Step 1250 | Time 562s | Train Loss: 5.8162 | Val BPB: 2.5498 ⛳
⏰ 10-Minute training time budget exhausted. Validating final model...
FINAL RESULT | Val BPB: 2.6034 🏆
```

