feat: Add Andreas' first test submission #419
Conversation
Pull request overview
Adds a new non-record 16MB “first submission” entry under records/track_non_record_16mb, including the training script used, the captured training log, and the metadata needed to register the result.
Changes:
- Add a standalone training script (`train_gpt.py`) for this submission variant (10-layer model w/ bigram hash, SWA, mixed int5/int6 quantization).
- Add the full training output log (`train.log`) to document the run and reported metrics/sizes.
- Add submission metadata + minimal documentation (`submission.json`, `README.md`) and a local `requirements.txt`.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| records/track_non_record_16mb/first-submission/train_gpt.py | Training + quantization script for the submission artifact. |
| records/track_non_record_16mb/first-submission/train.log | Captured run log (includes code snapshot + metrics + artifact sizes). |
| records/track_non_record_16mb/first-submission/submission.json | Declares the submission’s reported metric(s), size, author, and date. |
| records/track_non_record_16mb/first-submission/requirements.txt | Dependency list for reproducing the run. |
| records/track_non_record_16mb/first-submission/README.md | Minimal explanation of what the submission is and how to set up deps. |
```python
INT8_KEEP_FLOAT_FP32_NAME_PATTERNS = tuple(
    pattern
    for pattern in os.environ.get(
        "INT8_KEEP_FLOAT_FP32_NAME_PATTERNS",
        ",".join(CONTROL_TENSOR_NAME_PATTERNS),
    ).split(",")
    if pattern
)
INT8_KEEP_FLOAT_MAX_NUMEL = 65_536
INT8_KEEP_FLOAT_STORE_DTYPE = torch.float16
INT8_PER_ROW_SCALE_DTYPE = torch.float16
INT8_CLIP_PERCENTILE = 99.99984
INT8_CLIP_Q = INT8_CLIP_PERCENTILE / 100.0


def tensor_nbytes(t: Tensor) -> int:
    return int(t.numel()) * int(t.element_size())
```
There are several quantization-related constants and helpers defined here that are never used in this script (e.g. INT8_KEEP_FLOAT_FP32_NAME_PATTERNS, INT8_KEEP_FLOAT_MAX_NUMEL, INT8_KEEP_FLOAT_STORE_DTYPE, tensor_nbytes). Consider removing unused definitions to reduce file size/complexity, especially since this script is intended to stay approachable for newcomers.
Suggested change:
```diff
-INT8_KEEP_FLOAT_FP32_NAME_PATTERNS = tuple(
-    pattern
-    for pattern in os.environ.get(
-        "INT8_KEEP_FLOAT_FP32_NAME_PATTERNS",
-        ",".join(CONTROL_TENSOR_NAME_PATTERNS),
-    ).split(",")
-    if pattern
-)
-INT8_KEEP_FLOAT_MAX_NUMEL = 65_536
-INT8_KEEP_FLOAT_STORE_DTYPE = torch.float16
 INT8_PER_ROW_SCALE_DTYPE = torch.float16
 INT8_CLIP_PERCENTILE = 99.99984
 INT8_CLIP_Q = INT8_CLIP_PERCENTILE / 100.0
-def tensor_nbytes(t: Tensor) -> int:
-    return int(t.numel()) * int(t.element_size())
```
```python
def next_batch(self, global_tokens: int, seq_len: int, grad_accum_steps: int) -> tuple[Tensor, Tensor]:
    local_tokens = global_tokens // (self.world_size * grad_accum_steps)
    per_rank_span = local_tokens + 1
    chunk = self.stream.take(per_rank_span * self.world_size)
    start = self.rank * per_rank_span
    local = chunk[start : start + per_rank_span].to(dtype=torch.int64)
    x = local[:-1].reshape(-1, seq_len)
    y = local[1:].reshape(-1, seq_len)
    return x.to(self.device, non_blocking=True), y.to(self.device, non_blocking=True)
```
DistributedTokenLoader.next_batch() reshapes local[:-1] to (-1, seq_len), but local_tokens = global_tokens // (world_size * grad_accum_steps) is not validated to be a multiple of seq_len. If TRAIN_BATCH_TOKENS is set to a value that isn’t divisible by WORLD_SIZE * GRAD_ACCUM_STEPS * TRAIN_SEQ_LEN, this will raise a runtime reshape error. Add an explicit validation (and a clear ValueError) that local_tokens % seq_len == 0 (and ideally that global_tokens is divisible by world_size * grad_accum_steps) before reshaping.
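A minimal sketch of such a guard, written here as a standalone helper (the helper name is hypothetical; the parameter names mirror `next_batch()`):

```python
def validate_batch_shape(global_tokens: int, world_size: int,
                         grad_accum_steps: int, seq_len: int) -> int:
    """Fail fast with a clear message instead of an opaque reshape error.

    Returns the per-rank token count if the configuration is consistent.
    """
    denom = world_size * grad_accum_steps
    if global_tokens % denom != 0:
        raise ValueError(
            f"global_tokens={global_tokens} must be divisible by "
            f"world_size * grad_accum_steps = {denom}"
        )
    local_tokens = global_tokens // denom
    if local_tokens % seq_len != 0:
        raise ValueError(
            f"per-rank tokens ({local_tokens}) must be a multiple of "
            f"seq_len={seq_len}"
        )
    return local_tokens
```

Calling this once at startup (when `TRAIN_BATCH_TOKENS` and friends are read) would surface a misconfiguration immediately rather than mid-run.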
```python
def __init__(self, bigram_vocab_size: int, bigram_dim: int, model_dim: int):
    super().__init__()
    self.bigram_vocab_size = bigram_vocab_size
    self.embed = nn.Embedding(bigram_vocab_size, bigram_dim)
    nn.init.zeros_(self.embed.weight)
    self.proj = CastedLinear(bigram_dim, model_dim, bias=False) if bigram_dim != model_dim else None
    if self.proj is not None:
        nn.init.zeros_(self.proj.weight)
    self.scale = nn.Parameter(torch.tensor(0.05, dtype=torch.float32))

def bigram_hash(self, tokens: Tensor) -> Tensor:
    t = tokens.to(torch.int32)
    mod = self.bigram_vocab_size - 1
    out = torch.empty_like(t)
    out[..., 0] = mod
    out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:], 27191 * t[..., :-1]) % mod
    return out.long()
```
BigramHashEmbedding.bigram_hash() uses mod = self.bigram_vocab_size - 1 and then does % mod. If bigram_vocab_size is set to 1 (or 0, though the module is only constructed for >0), this becomes a modulo-by-zero error. Add input validation in BigramHashEmbedding.__init__ (or when reading hyperparameters) to require bigram_vocab_size >= 2 when enabled.
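A sketch of the guard this comment asks for, as a standalone function (the helper name is hypothetical; in practice the check would sit in `BigramHashEmbedding.__init__` or wherever the hyperparameter is read):

```python
def check_bigram_vocab_size(bigram_vocab_size: int) -> None:
    """Reject sizes for which bigram_hash() would compute `% 0`.

    bigram_hash() reduces modulo (bigram_vocab_size - 1), so any value
    below 2 makes the modulus zero or negative.
    """
    if bigram_vocab_size < 2:
        raise ValueError(
            f"bigram_vocab_size must be >= 2 when enabled "
            f"(got {bigram_vocab_size}); bigram_hash() reduces modulo "
            f"bigram_vocab_size - 1"
        )
```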
```json
{
    "name": "Andreas Test Submission",
    "val_loss": 1.14276,
    "bytes_total": 15900000,
    "blurb": "Test submission based on the first place solution (10L Int5-MLP).",
    "author": "Andreas",
    "github_id": "andreasbis",
    "date": "2026-03-22"
}
```
This submission.json format doesn’t match the other entries in records/track_non_record_16mb, which include track and typically record both val_loss and val_bpb. Also, val_loss here is set to ~1.14, but the accompanying train.log shows val_loss ~1.929 and val_bpb ~1.143 after quant roundtrip, so the metric appears mislabeled. Recommend adding track (e.g. non-record-16mb), adding val_bpb, and setting val_loss/bytes_total to the exact values emitted by the training log.
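A rough sketch of what the aligned metadata could look like. The `val_loss`/`val_bpb` values below are the approximate figures quoted above from `train.log`, not exact numbers, and the field names follow this comment's recommendation; the authoritative schema and values should be taken from the other entries in the track and from the log itself:

```json
{
    "name": "Andreas Test Submission",
    "track": "non-record-16mb",
    "val_loss": 1.929,
    "val_bpb": 1.143,
    "bytes_total": 15900000,
    "blurb": "Test submission based on the first place solution (10L Int5-MLP).",
    "author": "Andreas",
    "github_id": "andreasbis",
    "date": "2026-03-22"
}
```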
```python
# fixes applied
# tuned
```
The trailing # fixes applied / # tuned comments look like leftover scratch notes rather than explanatory documentation. Consider removing them or replacing with a brief, specific note about what was changed/tuned (or link to the relevant result) so the file doesn’t accumulate ambiguous annotations.
Suggested change:
```diff
-# fixes applied
-# tuned
```
No description provided.