[Perf] Split scope1 projection accumulation in Qwen3 decode example #81

## Summary

Update the Qwen3 scope1 decode projection path to split the Q/K/V accumulation into per-hidden-block matmuls on the CUBE core, followed by a separate reduction on the VEC core.

## Motivation / Use Case

The current scope1 implementation still performs the Q, K, and V projection accumulation inside a single incore region (on the CUBE core) with repeated `pl.matmul_acc` calls.

We can replace that pattern with:
1. per-hidden-block `pl.matmul(...)` results written into preallocated partial buffers, and
2. a second incore pass that reduces those partials with `pl.add(...)`.
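To make the equivalence of the two patterns concrete, here is a minimal plain-Python sketch. The helpers `matmul`, `add`, and `full` are hypothetical stand-ins for `pl.matmul`, `pl.add`, and `pl.full`; they are not the real `pl` API, and the CUBE/VEC split is only indicated in comments. It assumes the hidden dimension is divisible by the block size.

```python
# Hypothetical sketch: plain-Python stand-ins, NOT the real pl.* API.
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of a [M x K] and b [K x N]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def full(rows: int, cols: int, value: float) -> Matrix:
    """Matrix filled with a constant (zero-initialised accumulator)."""
    return [[value] * cols for _ in range(rows)]


def col_block(x: Matrix, start: int, size: int) -> Matrix:
    """Columns [start, start + size) of the activations (one hidden block)."""
    return [row[start:start + size] for row in x]


def row_block(w: Matrix, start: int, size: int) -> Matrix:
    """Rows [start, start + size) of the weight (the matching hidden block)."""
    return w[start:start + size]


def accumulate_chain(x: Matrix, w: Matrix, block: int) -> Matrix:
    """Current pattern: one region with a running accumulator (matmul_acc style)."""
    acc = None
    for start in range(0, len(w), block):
        prod = matmul(col_block(x, start, block), row_block(w, start, block))
        acc = prod if acc is None else add(acc, prod)
    return acc


def split_then_reduce(x: Matrix, w: Matrix, block: int) -> Matrix:
    """Proposed pattern: independent per-block matmuls, then a separate reduction."""
    # Pass 1 (CUBE in the proposal): each hidden block produces its own partial.
    partials = [matmul(col_block(x, s, block), row_block(w, s, block))
                for s in range(0, len(w), block)]
    # Pass 2 (VEC in the proposal): zero-filled accumulator plus repeated adds.
    acc = full(len(x), len(w[0]), 0.0)
    for p in partials:
        acc = add(acc, p)
    return acc


x = [[1, 2, 3, 4], [5, 6, 7, 8]]                  # [batch=2 x hidden=4]
w = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 1, 0]]  # [hidden=4 x out=3]
print(split_then_reduce(x, w, block=2))  # [[4.0, 6.0, 3.0], [12.0, 14.0, 11.0]]
```

The reduction order is unchanged (partial 0, then partial 1, ...), so the only structural difference is that the per-block products no longer depend on each other and can be issued independently before the reduction pass.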

This makes the scope1 implementation more explicit, avoids a long accumulation chain inside a single incore region, and aligns scope1 with the direction of the recent Qwen3 decode refactoring in the repository.

## Proposed API / Behavior

No public API change is needed.

In `examples/models/qwen3/qwen3_32b_decode_scope1.py`, update `build_decode_projection_program()` so that:
- `q_partial`, `k_partial`, and `v_partial` are preallocated before the batch loop
- each hidden block computes its own `pl.matmul(...)` result
- partial results are assembled into the corresponding temporary buffer
- accumulation is done in a separate incore block using `pl.full(..., value=0.0)` plus repeated `pl.add(...)`
- the final `q_proj`, `k_proj`, and `v_proj` outputs keep the same shapes and function signature as today
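The control flow in the list above could look roughly like the following plain-Python sketch. The function name `project_qkv`, the `weights` dict, and the `hidden_block` / `batch_tile` parameters are hypothetical illustrations, not the actual signature of `build_decode_projection_program()`; the sketch assumes `hidden` is divisible by `hidden_block` and `batch` by `batch_tile`.

```python
# Hypothetical sketch of the proposed control flow; names and sizes are illustrative.


def matmul(a, b):
    """Dense product of a [M x K] and b [K x N]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def project_qkv(x, weights, hidden_block, batch_tile):
    """x: [batch x hidden]; weights: {"q": [hidden x q_out], "k": ..., "v": ...}."""
    batch, hidden = len(x), len(x[0])
    num_blocks = hidden // hidden_block
    # Preallocated once, before the batch loop: one partial slot per hidden block.
    partial = {name: [[[0.0] * len(w[0]) for _ in range(batch_tile)]
                      for _ in range(num_blocks)]
               for name, w in weights.items()}
    out = {name: [] for name in weights}
    for b0 in range(0, batch, batch_tile):
        x_tile = x[b0:b0 + batch_tile]
        # Pass 1: each hidden block computes its own product and writes it
        # into its preallocated slot (slice assignment reuses the buffer object).
        for blk in range(num_blocks):
            h0 = blk * hidden_block
            x_blk = [row[h0:h0 + hidden_block] for row in x_tile]
            for name, w in weights.items():
                partial[name][blk][:] = matmul(x_blk, w[h0:h0 + hidden_block])
        # Pass 2: zero-filled accumulator plus one add per partial.
        for name, w in weights.items():
            acc = [[0.0] * len(w[0]) for _ in range(len(x_tile))]
            for blk in range(num_blocks):
                acc = [[a + p for a, p in zip(ra, rp)]
                       for ra, rp in zip(acc, partial[name][blk])]
            out[name].extend(acc)
    # Output shapes match a direct x @ W for each projection.
    return out["q"], out["k"], out["v"]
```

Because the partial buffers live outside the batch loop, every batch tile reuses the same storage, and the outputs keep the same `[batch x out_dim]` shapes as a single fused projection.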

## Alternatives Considered

- Keep the current single-incore `matmul_acc` implementation
- Swap the N and K dimensions in the loop nest
- Tune the tile size along the K dimension
