[Feature] qwen3_32b_decode_mixed.py on A3/A5 platforms with good precision & performance #95

@zhangqi-chen

Description

Summary

Track the validation and performance of qwen3_32b_decode_mixed.py on both A3 and A5 platforms.

Motivation / Use Case

  1. The mixed-style decode kernel runs correctly on A3
  2. The mixed-style decode kernel runs correctly on A5
  3. Performance on both platforms is benchmarked and kept competitive with scope-specific implementations
  4. Regressions are caught early as the compiler and runtime evolve

Proposed API / Behavior

  • Validate numerical correctness on A3
  • Validate numerical correctness on A5
  • Ensure performance stays competitive with scope-specific implementations

Metadata

Assignees

Labels

No labels

Type

No type

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
