
Systematic throughput benchmarking to identify MLA-o advantages #8

@chrisjmccormick

Description


Current experiments show no throughput benefit for MLA-o. We need systematic benchmarking to find the model scales at which its computational benefits emerge.

Current status: no speed difference observed, even at sequence length 1024.

Tasks:

  • Create isolated attention layer benchmarking script
  • Test various configurations:
    • Number of heads (current: 8, try: 12, 16, 24)
    • Head sizes (current: 32, try: 64, 128)
    • Sequence lengths (128, 512, 1024, 2048, 4096)
    • Hidden dimensions
  • Benchmark throughput:
    • Training: time end-to-end training steps (e.g., 1K steps).
    • Inference: forward passes only; random weights are fine.
  • Document the crossover points where MLA-o becomes faster
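The configuration sweep above can be sketched as a small timing harness. This is a minimal sketch, not the repo's actual benchmark: `toy_attention` is a hypothetical stand-in, and a real run would call the repo's MLA-o and baseline attention modules (and, on GPU, use `torch.utils.benchmark` or CUDA-event timing with proper synchronization).

```python
# Minimal throughput-benchmark sketch for the config sweep described above.
# NOTE: toy_attention is a hypothetical placeholder; swap in the repo's
# MLA-o and baseline attention layers for real measurements.
import time
from itertools import product

def bench(fn, *args, warmup=2, iters=5):
    """Mean seconds per call, after warmup iterations."""
    for _ in range(warmup):
        fn(*args)
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters

def make_input(seq_len, hidden_dim):
    # Deterministic pseudo-random input; a real script would use torch.randn.
    return [[(i * j) % 7 / 7.0 for j in range(hidden_dim)] for i in range(seq_len)]

def toy_attention(x):
    # Cheap stand-in workload (dot products between neighboring rows).
    n = len(x)
    return sum(
        sum(a * b for a, b in zip(x[i], x[(i + 1) % n]))
        for i in range(n)
    )

# Truncated sweep over (num_heads, head_size, seq_len); extend per the task list.
results = {}
for heads, head_size, seq_len in product([8, 12], [32, 64], [128, 256]):
    x = make_input(seq_len, heads * head_size)
    results[(heads, head_size, seq_len)] = bench(toy_attention, x)

for cfg, secs in sorted(results.items()):
    print(f"heads={cfg[0]:>2} head_size={cfg[1]:>3} seq={cfg[2]:>4}: {secs:.6f} s/call")
```

Running both the MLA-o and baseline layers through the same harness, per configuration, is what lets the crossover points be read directly off the table of timings.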

Hypothesis: throughput benefits only appear at larger head counts and head sizes.
