efficient recurrent gpt by GITHUB-SUBASHk · Pull Request #423 · openai/parameter-golf

GITHUB-SUBASHk · 2026-03-22T13:36:35Z

Key features

Recurrent transformer-style architecture
Low-rank attention projections
Rotary positional embeddings (RoPE)
SwiGLU feed-forward network
Token-frequency-weighted loss
Curriculum training (64-256)
Adaptive EMA smoothing
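
The feature list above names components without showing them. As a rough illustration only, here is a dependency-free Python sketch of what several of them commonly look like. It covers a low-rank projection (a d x d weight replaced by two factors of rank r), rotary embeddings applied to one vector at a given position, a SwiGLU feed-forward layer, and a "recurrent" loop that reuses one block several times with a residual add. All function names, the toy sizes, and the interleaved-pair RoPE convention are assumptions, not code from this PR.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix w (list of rows) by a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def low_rank_project(x, down, up):
    """Low-rank projection x -> up @ (down @ x); down is r x d, up is d_out x r."""
    return matvec(up, matvec(down, x))


def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2k], x[2k+1]) by angle pos * base ** (-2k / d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):  # i = 2k
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def silu(v):
    """Numerically stable SiLU (swish): v * sigmoid(v)."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)


def recurrent_apply(block, x, n_loops):
    """Reuse one block n_loops times with a residual add; no new weights per loop."""
    for _ in range(n_loops):
        x = [xi + fi for xi, fi in zip(x, block(x))]
    return x


def rand_matrix(rows, cols, scale=0.1):
    """Small Gaussian-initialised matrix as a list of rows."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: model dim 8, rank 2, FFN hidden 16.
random.seed(0)
d, r, hidden = 8, 2, 16
down, up = rand_matrix(r, d), rand_matrix(d, r)
w_gate, w_up, w_down = rand_matrix(hidden, d), rand_matrix(hidden, d), rand_matrix(d, hidden)

x = [random.gauss(0.0, 1.0) for _ in range(d)]
q = rope(low_rank_project(x, down, up), pos=3)  # a low-rank "query" at position 3
y = recurrent_apply(lambda h: swiglu(h, w_gate, w_up, w_down), x, n_loops=4)
```

Because RoPE rotates each pair by an angle proportional to position, the dot product of a rotated query and key depends only on their position difference, which is why it needs no learned parameters.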

Initial implementation of a recurrent block model with attention and feedforward layers, including data loading, training utilities, and loss computation.
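
The training utilities and loss computation are only named, not shown, in the PR description. The sketch below is one plausible reading in which every formula is an assumption. The token-frequency weight is taken as (count + 1)^-alpha rescaled to mean 1. The 64-256 curriculum is read as a sequence length that grows linearly from 64 to 256 in multiples of 64. "Adaptive EMA" is read as an exponential moving average of weights whose decay warms up as min(max_decay, (1 + step) / (10 + step)). The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def token_weights(token_ids, vocab_size, alpha=0.5):
    """Assumed inverse-frequency weights w_t ~ (count_t + 1) ** -alpha, rescaled to mean 1."""
    counts = Counter(token_ids)
    raw = [(counts[t] + 1) ** -alpha for t in range(vocab_size)]
    mean = sum(raw) / vocab_size
    return [w / mean for w in raw]


def log_softmax(logits):
    """Stable log-softmax using the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted mean negative log-likelihood: sum(w_y * nll) / sum(w_y)."""
    num = den = 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        num += w * -log_softmax(logits)[y]
        den += w
    return num / den


def curriculum_seq_len(step, total_steps, start=64, end=256, multiple=64):
    """Assumed linear length schedule from start to end, rounded down to a multiple."""
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    raw = start + progress * (end - start)
    return max(start, int(raw // multiple) * multiple)


def adaptive_ema_decay(step, max_decay=0.999):
    """Decay warms up from 0.1 at step 0 toward max_decay."""
    return min(max_decay, (1 + step) / (10 + step))


def ema_update(ema_params, params, step):
    """One EMA step over a flat list of parameters."""
    d = adaptive_ema_decay(step)
    return [d * e + (1.0 - d) * p for e, p in zip(ema_params, params)]


corpus = [5, 5, 5, 5, 1, 1, 2, 0]  # toy token stream (hypothetical)
weights = token_weights(corpus, vocab_size=8)
loss = weighted_cross_entropy([[2.0, 0.5, 0.1, 0.0, 0.0, 0.3, 0.0, 0.0]], [5], weights)
schedule = [curriculum_seq_len(s, 1000) for s in (0, 250, 500, 750, 1000)]  # [64, 64, 128, 192, 256]
ema = ema_update([0.0, 0.0], [1.0, 2.0], step=0)
```

The warm-up rule keeps the average from being dominated by the randomly initialised weights early in training. Whether the PR uses this rule, or smooths the loss rather than the weights, is not stated in the description.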

Expanded the README to provide detailed information about the Efficient Recurrent GPT model, including architecture, training strategy, model size, results, compliance, and structure.
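
The PR text does not give the model's dimensions or parameter count. As a back-of-the-envelope illustration of why weight sharing and low-rank attention matter for model size, the sketch below counts weights for hypothetical sizes (vocab 1024, d_model 512, SwiGLU hidden 1024, rank 64). Tied input/output embeddings are assumed, norm layers and biases are ignored, and RoPE contributes no learned parameters.

```python
def attention_params(d_model, rank=None):
    """Q, K, V, O projections: d*d each at full rank, or d*r + r*d each if factored."""
    per_proj = d_model * d_model if rank is None else 2 * d_model * rank
    return 4 * per_proj


def swiglu_params(d_model, d_hidden):
    """Gate, up, and down matrices of a SwiGLU FFN."""
    return 3 * d_model * d_hidden


def model_params(vocab, d_model, d_hidden, n_unique_blocks, rank=None, tied_embeddings=True):
    """Count weights; norms and biases ignored, RoPE has no learned parameters."""
    embeddings = vocab * d_model * (1 if tied_embeddings else 2)
    per_block = attention_params(d_model, rank) + swiglu_params(d_model, d_hidden)
    return embeddings + n_unique_blocks * per_block


# Hypothetical sizes, not taken from the PR.
unrolled = model_params(1024, 512, 1024, n_unique_blocks=8)         # 8 distinct full-rank blocks
shared = model_params(1024, 512, 1024, n_unique_blocks=1, rank=64)  # 1 low-rank block looped 8 times
print(unrolled, shared)  # 21495808 2359296
```

Looping the shared block eight times gives the same compute depth as eight distinct blocks but stores only one block's weights, so the loop count changes run time, not stored size.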

GITHUB-SUBASHk added 3 commits March 22, 2026 18:47

Add initial README for Pgs-1-test

8893b28

Add recurrent block model with training utilities

65c34ab

Revise README for Efficient Recurrent GPT model

403e9bd

GITHUB-SUBASHk wants to merge 3 commits into openai:main from GITHUB-SUBASHk:main

GITHUB-SUBASHk commented Mar 22, 2026
