records/track_10min_16mb/2026-03-22_RadialBitNet/README.md
# Radial-BitNet 16MB Titan

This submission presents an experimental compressed language-model design for the Parameter Golf 16MB track.

The approach combines:
- BitNet-style ternary-weight linear projections,
- a custom positional scheme called **Radial Encoding**,
- a custom optimizer called **FRO (Fractal Resonant Optimization)**,
- compressed post-training export under the official artifact-size accounting rule.

This is a public experimental submission intended to demonstrate a non-standard architecture under the Parameter Golf constraints. The attached result was obtained from a development run on non-target hardware; this README makes no claim that the reported score has been reproduced under the official 8xH100 SXM record-track environment.

## Summary

The goal of this design is to push model capacity as far as possible under the official submission artifact limit by combining:
- ternary-style projection behavior for major linear layers,
- reduced learned overhead,
- tied embeddings,
- compressed final export,
- a training setup optimized for short wall-clock execution.

Rather than following a conventional FP16 baseline recipe, this submission explores a more aggressive compression-oriented design.

## Key Ideas

### 1. BitLinear Expansion
All major projections (`Q`, `K`, `V`, `O`, and MLP projections) use BitNet-style ternary-weight forward behavior. The purpose is to reduce effective storage pressure while preserving as much model width and depth as possible within the artifact budget.
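The ternary forward behavior can be sketched as follows. This is a minimal NumPy illustration of BitNet-style absmean quantization, not the submission's actual `train_gpt.py` code; a real training loop would additionally use a straight-through estimator so gradients flow to the latent full-precision weights.

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    # Absmean quantization in the BitNet b1.58 style: scale by the mean
    # absolute weight, then round each entry to {-1, 0, +1}.
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

def bitlinear_forward(x, w):
    # Hypothetical BitLinear forward: the matmul uses ternary weights,
    # rescaled so activation magnitudes are preserved.
    q, scale = ternary_quantize(w)
    return x @ (q.T * scale)
```

Because each weight carries roughly 1.58 bits of information, the quantized matrices compress far better than FP16 tensors in the exported artifact.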

### 2. Radial Encoding
Learned positional embeddings are removed. Instead, position-dependent geometric features are injected analytically through `RadialEncoding(8)`. This reduces learned parameter overhead while retaining explicit positional structure.
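The exact form of `RadialEncoding(8)` is not specified in this README. Purely as an illustration of what an 8-channel analytic (non-learned) positional feature map can look like, here is a sketch in the spirit of sinusoidal encodings; the actual radial construction used by the submission may differ substantially.

```python
import math

def positional_features(pos, n_features=8, base=10000.0):
    # Hypothetical analytic positional features: sin/cos pairs at
    # geometrically spaced frequencies. This stands in for the
    # submission's unspecified RadialEncoding(8); the key shared
    # property is zero learned parameters.
    feats = []
    for i in range(n_features // 2):
        freq = 1.0 / (base ** (2 * i / n_features))
        feats.append(math.sin(pos * freq))
        feats.append(math.cos(pos * freq))
    return feats
```

Whatever its precise form, an analytic scheme like this removes the `seq_len x d_model` learned positional table from the artifact budget entirely.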

### 3. FRO Optimizer
`FRO` is a custom optimizer designed for short-horizon convergence under highly quantized weight dynamics. It replaces AdamW in this submission and is part of the experimental contribution.
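The FRO update rule itself is not described in this README. To show only the slot it occupies, here is a generic momentum-SGD step in the functional shape a drop-in AdamW replacement takes; this is explicitly a placeholder, not the FRO algorithm.

```python
import numpy as np

def optimizer_step(w, grad, state, lr=0.02, momentum=0.9):
    # Placeholder update occupying the slot where FRO replaces AdamW.
    # This is plain momentum SGD, NOT the FRO algorithm, whose update
    # rule is part of the submission and is not documented here.
    buf = state.get("momentum_buffer")
    buf = grad if buf is None else momentum * buf + grad
    state["momentum_buffer"] = buf
    return w - lr * buf, state
```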

## Configuration

- **Layers:** 12
- **Model Dimension:** 384
- **Attention Heads:** 6
- **KV Heads:** 2
- **Vocabulary Size:** 1024
- **Approximate Parameter Count:** 15.6M
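A back-of-envelope check of this configuration (assuming bias-free projections, GQA-shaped K/V matrices, ignoring norm parameters, and counting the tied embedding once) pins down the attention and embedding budgets exactly; since the MLP shape is not stated, the sketch instead solves for the hidden width implied by the quoted ~15.6M total.

```python
d, layers, vocab = 384, 12, 1024
n_heads, n_kv = 6, 2
head_dim = d // n_heads                        # 64

emb = vocab * d                                # tied in/out embedding, counted once
attn = d * d + 2 * (n_kv * head_dim * d) + d * d  # Q, K, V (GQA), O per layer

# Solve for the MLP hidden width implied by ~15.6M total parameters,
# assuming a plain 2-matrix MLP (an assumption; the real shape is unstated).
mlp_budget = (15_600_000 - emb) / layers - attn
hidden = mlp_budget / (2 * d)
print(emb, attn, round(hidden))
```

The small 1024-entry vocabulary is what makes tied embeddings nearly free here (~0.4M parameters), leaving almost the entire budget for the transformer stack.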

## Artifact Accounting

The submission script performs a post-training artifact audit using:
- counted source-code bytes from `train_gpt.py`
- compressed exported model bytes
- a final decimal-byte check against the official `16,000,000` byte submission limit

The audit is performed after training and writes the compressed model artifact physically to disk before measuring its byte size.
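The described audit can be sketched as below. This is an illustrative stdlib version, not the submission's actual audit code: gzip stands in for whatever compressor the script uses, and summing source bytes with the artifact is an assumption here; the official accounting rule governs what actually counts.

```python
import gzip
import os

LIMIT = 16_000_000  # official decimal-byte submission limit

def audit(source_path, model_bytes, out_path):
    # Count raw source bytes, write the compressed model artifact
    # physically to disk, then measure its on-disk byte size and
    # check the total against the decimal-byte limit.
    src_bytes = os.path.getsize(source_path)
    with open(out_path, "wb") as f:
        f.write(gzip.compress(model_bytes))
    artifact_bytes = os.path.getsize(out_path)
    return src_bytes, artifact_bytes, src_bytes + artifact_bytes <= LIMIT
```

Measuring the artifact after it is physically written avoids the trap of reporting an in-memory estimate that differs from what a verifier would see on disk.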

## Evaluation

The script implements tokenizer-agnostic BPB evaluation over the official validation shard format used by the challenge. In record-track mode, the script is designed to fail explicitly if required tokenizer or dataset files are missing.

Mock or debug behavior is only enabled when explicitly requested through environment flags.
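Tokenizer-agnostic bits-per-byte normalizes the model's total negative log-likelihood by the raw UTF-8 byte count of the validation text rather than by token count, so models with different vocabularies (such as this 1024-entry one) are directly comparable:

```python
import math

def bits_per_byte(total_nll_nats, total_utf8_bytes):
    # Convert summed validation NLL from nats to bits, then normalize
    # by raw UTF-8 bytes instead of tokens. A larger vocabulary yields
    # fewer tokens but higher per-token loss; BPB cancels that out.
    return total_nll_nats / (math.log(2) * total_utf8_bytes)
```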

## Reproducibility Notes

`train_gpt.py` is designed to:
- support distributed execution,
- run with explicit record-track failure behavior when required assets are missing,
- produce a final post-training artifact audit,
- run a final validation pass before reporting the result.

## Development Status

The result currently attached to this submission comes from a development run on non-target hardware. This repository entry is intended as a serious experimental submission and as a candidate for further validation under the official challenge hardware setting.

## Files Included

This submission includes:
- `README.md`
- `submission.json`
- `train.log`
- `train_gpt.py`

## Notes

This submission should be interpreted as an experimental compressed-model approach, not as a claim of already-verified record-track performance on 8xH100 SXM.
records/track_10min_16mb/2026-03-22_RadialBitNet/submission.json
{
"author": "Christian Q. De Luca",
"github_id": "rthgit",
"val_bpb": "2.6034",
"model_size": "13100000",
"hardware": "Kaggle Dual T4 (development run)",
"training_time": "562s"
}
records/track_10min_16mb/2026-03-22_RadialBitNet/train.log
✨ Initializing Radial-BitNet for Parameter Golf (Constraint: 16MB)

📦 Artifact Size Audit:
- Parameters: 15.7 M
- Compressed Size: 12.55 MB
✅ QUALIFIED FOR PARAMETER GOLF! (<16MB)
⏳ Loading training tokens into memory...
Loading single dataset shard to protect Kaggle RAM: /kaggle/working/parameter-golf/data/datasets/fineweb10B_sp1024/fineweb_train_000000.bin

🚀 Starting 10-Minute Rapid Convergence Cycle on real dataset...
Step 0000 | Time 1s | Train Loss: 148.7086 | Val BPB: 28.1633 ⛳
Step 0050 | Time 23s | Train Loss: 9.2294 | Val BPB: 3.9549 ⛳
Step 0100 | Time 46s | Train Loss: 6.5566 | Val BPB: 2.8735 ⛳
Step 0150 | Time 69s | Train Loss: 6.2854 | Val BPB: 2.7771 ⛳
Step 0200 | Time 91s | Train Loss: 6.6208 | Val BPB: 2.7590 ⛳
Step 0250 | Time 114s | Train Loss: 6.1678 | Val BPB: 2.6836 ⛳
Step 0300 | Time 136s | Train Loss: 6.2128 | Val BPB: 2.6946 ⛳
Step 0350 | Time 159s | Train Loss: 6.1435 | Val BPB: 2.6694 ⛳
Step 0400 | Time 181s | Train Loss: 6.0490 | Val BPB: 2.7252 ⛳
Step 0450 | Time 204s | Train Loss: 6.2580 | Val BPB: 2.6844 ⛳
Step 0500 | Time 226s | Train Loss: 6.7366 | Val BPB: 2.6667 ⛳
Step 0550 | Time 249s | Train Loss: 6.1070 | Val BPB: 2.6770 ⛳
Step 0600 | Time 271s | Train Loss: 6.1023 | Val BPB: 2.6680 ⛳
Step 0650 | Time 294s | Train Loss: 7.1158 | Val BPB: 2.6698 ⛳
Step 0700 | Time 316s | Train Loss: 6.1919 | Val BPB: 2.6919 ⛳
Step 0750 | Time 338s | Train Loss: 6.2160 | Val BPB: 2.6585 ⛳
Step 0800 | Time 361s | Train Loss: 6.1988 | Val BPB: 2.6854 ⛳
Step 0850 | Time 383s | Train Loss: 6.2080 | Val BPB: 2.6751 ⛳
Step 0900 | Time 406s | Train Loss: 6.1793 | Val BPB: 2.6787 ⛳
Step 0950 | Time 428s | Train Loss: 6.1073 | Val BPB: 2.6438 ⛳
Step 1000 | Time 450s | Train Loss: 6.0260 | Val BPB: 2.6274 ⛳
Step 1050 | Time 473s | Train Loss: 6.0984 | Val BPB: 2.6307 ⛳
Step 1100 | Time 495s | Train Loss: 6.1011 | Val BPB: 2.6244 ⛳
Step 1150 | Time 518s | Train Loss: 5.9497 | Val BPB: 2.5936 ⛳
Step 1200 | Time 540s | Train Loss: 6.0033 | Val BPB: 2.6363 ⛳
Step 1250 | Time 562s | Train Loss: 5.8162 | Val BPB: 2.5498 ⛳

⏰ 10-Minute training time budget exhausted. Validating final model...
FINAL RESULT | Val BPB: 2.6034 🏆