
Efficient recurrent GPT #423

Open
GITHUB-SUBASHk wants to merge 3 commits into openai:main from GITHUB-SUBASHk:main

@GITHUB-SUBASHk

Key features

  1. Recurrent transformer-style architecture
  2. Low-rank attention projections
  3. Rotary positional embeddings (RoPE)
  4. SwiGLU feed-forward network
  5. Token-frequency-weighted loss
  6. Curriculum training (64-256)
  7. Adaptive EMA smoothing
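
Items 5 and 6 could look roughly like the minimal plain-Python sketch below. The description does not say how the loss weights are derived or what the 64-256 range refers to, so this assumes inverse-power frequency weights and a linear ramp of the training sequence length from 64 to 256 tokens. The function names (`curriculum_seq_len`, `frequency_weights`, `weighted_cross_entropy`) and the `power` parameter are hypothetical and are not taken from this PR's code.

```python
import math
from collections import Counter


def curriculum_seq_len(step, total_steps, start_len=64, end_len=256):
    """Assumed schedule: ramp sequence length linearly from start_len to end_len."""
    if total_steps <= 1:
        return end_len
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return int(round(start_len + frac * (end_len - start_len)))


def frequency_weights(token_ids, vocab_size, power=0.5):
    """Assumed weighting: weight[t] is proportional to count[t] ** -power.

    Weights are rescaled so that their average over the corpus tokens is 1,
    which keeps the loss on the same scale as the unweighted loss.
    Tokens that never occur get weight 0.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    counts = Counter(token_ids)
    raw = [counts[t] ** -power if counts[t] > 0 else 0.0 for t in range(vocab_size)]
    scale = len(token_ids) / sum(counts[t] * raw[t] for t in counts)
    return [r * scale for r in raw]


def weighted_cross_entropy(logits, targets, weights):
    """Mean over positions of weights[target] * -log softmax(logits)[target]."""
    total = 0.0
    for row, tgt in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += weights[tgt] * (log_z - row[tgt])
    return total / len(targets)


if __name__ == "__main__":
    corpus = [0, 0, 0, 1]  # toy corpus: token 0 is common, token 1 is rare
    w = frequency_weights(corpus, vocab_size=3)
    print("weights:", w)
    print("seq len at start/end:", curriculum_seq_len(0, 100), curriculum_seq_len(99, 100))
    print("loss:", weighted_cross_entropy([[0.0, 0.0, 0.0]], [1], w))
```

With `power=0.5` rare tokens are upweighted less aggressively than with pure inverse frequency (`power=1.0`); which setting, if either, matches the PR is not stated.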

Initial implementation of a recurrent block model with attention and feedforward layers, including data loading, training utilities, and loss computation.
Expanded the README to provide detailed information about the Efficient Recurrent GPT model, including architecture, training strategy, model size, results, compliance, and structure.