Proposed PR branch: qe/issue-08-work-area-size-apis
Category: Feature Request (performance + usability)
Problem
- Callers need a deterministic way to size the LSTM/GRU work area so they can
preallocate it and avoid runtime allocator overhead in hot loops.
Justification / why this is needed
- Repeated allocation/free in inference loops increases tail latency and
wastes CPU time under load.
- A sizing helper gives frameworks (llama.cpp-style loops, ORT EPs, etc.) a
clean, deterministic preallocation flow.
- This is additive/opt-in: callers can keep current behavior.
Proposed change
- Add zdnn_get_lstm_work_area_size(...) and zdnn_get_gru_work_area_size(...).
- Add tests.
- (Optional follow-up) Add README guidance/examples for recommended caller preallocation.
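
The intended caller flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: the issue leaves the helper signatures as "(...)", so example_work_area_size, example_rnn_call, example_run_loop, their parameters, and the size formula are all hypothetical stand-ins. A real caller would also follow whatever alignment the library documents for the work area.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-in for a proposed sizing helper. The issue leaves the
 * real signature as "(...)", so these parameters and the formula are
 * placeholders, not zDNN's actual work-area requirement. */
static size_t example_work_area_size(uint32_t timesteps, uint32_t batch,
                                     uint32_t hidden) {
  if (timesteps == 0 || batch == 0 || hidden == 0) {
    return 0; /* invalid input: report 0 bytes, as the issue proposes */
  }
  return (size_t)timesteps * batch * hidden * sizeof(uint16_t);
}

/* HYPOTHETICAL stand-in for an LSTM/GRU call that accepts a caller-owned
 * work area instead of allocating one internally. */
static int example_rnn_call(void *work_area, size_t work_area_size) {
  return (work_area != NULL && work_area_size > 0) ? 0 : -1;
}

/* Size once, allocate once, reuse in the hot loop, free once.
 * Returns the number of successful iterations, or -1 on setup failure. */
static int example_run_loop(uint32_t timesteps, uint32_t batch,
                            uint32_t hidden, int iterations) {
  size_t size = example_work_area_size(timesteps, batch, hidden);
  if (size == 0) {
    return -1; /* invalid shape: nothing to allocate */
  }
  void *work_area = malloc(size);
  if (work_area == NULL) {
    return -1;
  }
  int ok = 0;
  for (int i = 0; i < iterations; i++) {
    /* No allocation or free inside the loop. */
    if (example_rnn_call(work_area, size) == 0) {
      ok++;
    }
  }
  free(work_area);
  return ok;
}
```

The point of the pattern is that the allocator is touched exactly twice per session, regardless of how many iterations run.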
Acceptance criteria
- Helper APIs return the required size in bytes, or 0 on invalid inputs.
- Tests cover representative shapes and invalid cases.
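
One possible shape for such tests is a small table of valid and invalid inputs. The sketch below is illustrative only: stub_work_area_size is a hypothetical placeholder for the proposed helpers (whose real signatures are not given in this issue), and the shapes and formula are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL placeholder for a proposed sizing helper; parameters and
 * formula are assumptions made for this sketch only. */
static size_t stub_work_area_size(uint32_t timesteps, uint32_t batch,
                                  uint32_t hidden) {
  if (timesteps == 0 || batch == 0 || hidden == 0) {
    return 0;
  }
  return (size_t)timesteps * batch * hidden * sizeof(uint16_t);
}

struct size_case {
  uint32_t timesteps;
  uint32_t batch;
  uint32_t hidden;
  int expect_zero; /* 1 if the input is invalid and 0 bytes is expected */
};

/* Runs every case and returns how many did not match expectations. */
static int example_count_failures(void) {
  static const struct size_case cases[] = {
      {1, 1, 1, 0},      /* smallest valid shape */
      {16, 4, 64, 0},    /* mid-sized shape */
      {128, 32, 512, 0}, /* larger shape */
      {0, 4, 64, 1},     /* invalid: zero timesteps */
      {16, 0, 64, 1},    /* invalid: zero batch */
      {16, 4, 0, 1},     /* invalid: zero hidden size */
  };
  int failures = 0;
  for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
    size_t got = stub_work_area_size(cases[i].timesteps, cases[i].batch,
                                     cases[i].hidden);
    int is_zero = (got == 0);
    if (is_zero != cases[i].expect_zero) {
      failures++;
    }
  }
  return failures;
}
```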
Test plan
- Run make test (expects tests/testDriver_work_area_size_apis.c to pass).
References
- Key files: zdnn/work_area.c, zdnn/zdnn.h, zdnn/zdnn.map,
tests/testDriver_work_area_size_apis.c.