[Vulkan] Skip 3x3 sym_eig tests + dedup OpTypeArray on NVIDIA#706
hughperkins wants to merge 1 commit into
… in pipeline creation
NVIDIA driver 580.76.05 SIGSEGVs in `libnvidia-gpucomp.so` / `libnvidia-glvkspirv.so`
during compute-pipeline creation for the fully-inlined `_sym_eig3x3` (Eigen3
`computeDirect` Cardano method + `dsyevq3` Givens-rotation fallback) shader. The
emitted SPIR-V is accepted by `spirv-val --target-env vulkan1.3` and round-trips
cleanly through `spirv-cross`, so the bug is in NVIDIA's SPIR-V → NVVM frontend,
not Quadrants codegen — `test_sym_eig_sort_order` already documents the same
crash and skips the n=3 case (see comment there).
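The two checks cited above can be repeated locally on a dumped shader. This is a minimal sketch. It assumes the module has been written to a `.spv` file and that `spirv-val` (SPIRV-Tools) and `spirv-cross` are on `PATH`. The file name and the helpers `validation_commands` / `run_validation` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def validation_commands(spv_path: Path) -> list:
    """Build the two checks cited in the PR: validate, then cross-compile."""
    return [
        # Validation against the Vulkan 1.3 environment (flag taken from the PR text).
        ["spirv-val", "--target-env", "vulkan1.3", str(spv_path)],
        # Round-trip: spirv-cross decompiles the module (to GLSL by default).
        ["spirv-cross", str(spv_path)],
    ]


def run_validation(spv_path: Path) -> bool:
    """Run each installed tool; return False if any of them reports an error."""
    ok = True
    for cmd in validation_commands(spv_path):
        if shutil.which(cmd[0]) is None:
            print(f"skipping {cmd[0]}: not found on PATH")
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"{cmd[0]} failed:\n{result.stderr}")
            ok = False
    return ok
```

Passing both checks shows that the module is well-formed SPIR-V. As described above, it does not rule out a crash inside a vendor's own compiler.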
Two changes:
1. `tests/python/test_eig.py` — skip the four affected tests on Vulkan
(`test_sym_eig3x3_identity_f{32,64}`, `test_sym_eig3x3_f{32,64}`) with a
matching comment pointing at the same pre-existing driver quirk. n=2 and n>=4
are unaffected.
2. `quadrants/codegen/spirv/spirv_ir_builder.{h,cpp}` — dedup `OpTypeArray`
declarations in `get_function_array_type` / `get_array_type`. The Jacobi path
was emitting six independent `float[3]` / `float[9]` types for the same local
SoA, which trips strict drivers (NVIDIA actually crashes in pipeline creation
on the duplicated-type variant — separate code path from the above, but same
blast radius) and leaves observable `_arr_float_uint_3_0` / `..._1` / `..._2`
aliases in `QD_DUMP_IR` and `spirv-cross` output. The Function-scope and
`ArrayStride`-decorated buffer variants use separate caches, because sharing one
cache would re-apply `ArrayStride` to Function-scope arrays and re-introduce
`VUID-StandaloneSpirv-None-10684`. This dedup is independent of the sym_eig
skip (on its own it isn't sufficient to make `_sym_eig3x3` compile on NVIDIA),
but it is a real bug fix in its own right.
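The two-cache dedup described in item 2 can be modelled in a few lines. This is a minimal Python sketch, not the actual C++ in `spirv_ir_builder`. The class `ArrayTypeCacheSketch`, the `stride` argument and the cache keys are assumptions made for illustration. `%uint_N` stands for the id of the constant holding the array length.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayTypeCacheSketch:
    """Hypothetical model of deduplicated OpTypeArray emission (not Quadrants code)."""

    next_id: int = 1
    # Emitted instructions as spirv-dis-style strings, in emission order.
    instructions: list = field(default_factory=list)
    # Two separate caches, mirroring the two variants described in the PR.
    function_cache: dict = field(default_factory=dict)  # (elem, length) -> id
    buffer_cache: dict = field(default_factory=dict)  # (elem, length, stride) -> id

    def _new_id(self) -> int:
        type_id = self.next_id
        self.next_id += 1
        return type_id

    def get_function_array_type(self, elem: str, length: int) -> int:
        """Function-scope array: emitted once per (elem, length), never decorated."""
        key = (elem, length)
        if key not in self.function_cache:
            type_id = self._new_id()
            # %uint_N stands for the id of an OpConstant holding N.
            self.instructions.append(f"%{type_id} = OpTypeArray %{elem} %uint_{length}")
            self.function_cache[key] = type_id
        return self.function_cache[key]

    def get_array_type(self, elem: str, length: int, stride: int) -> int:
        """Buffer array: emitted once per (elem, length, stride), with ArrayStride."""
        key = (elem, length, stride)
        if key not in self.buffer_cache:
            type_id = self._new_id()
            self.instructions.append(f"%{type_id} = OpTypeArray %{elem} %uint_{length}")
            self.instructions.append(f"OpDecorate %{type_id} ArrayStride {stride}")
            self.buffer_cache[key] = type_id
        return self.buffer_cache[key]
```

In this model, six requests for `float[3]` from the Function-scope path produce one `OpTypeArray` instead of six. Now suppose a single cache keyed only on `(elem, length)`. A Function-scope request made after a buffer request would then get back the decorated id, which is the `ArrayStride`-on-Function-scope problem that the separate caches avoid.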
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b18f47c31
```python
def _test_sym_eig3x3(dt, a00):
    if qd.lang.impl.current_cfg().arch == qd.vulkan:
```
Gate 3x3 skip to affected Vulkan drivers
The new `arch == qd.vulkan` guard skips these tests on every Vulkan implementation, but the failure described here is NVIDIA-driver-specific; in the same file, `_test_sym_eig_sort_order` notes that `n == 3` runs cleanly on AMD Vulkan. As written, AMD/Intel Vulkan runs will now always skip `_sym_eig3x3` coverage, so real regressions in the 3x3 path on unaffected Vulkan stacks can no longer be detected. Please narrow this skip to the problematic vendor/driver condition instead of all Vulkan backends.
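One way to act on this suggestion is to key the skip on the Vulkan vendor ID rather than on the backend alone. This is a minimal sketch. It assumes the test can obtain the device's `VkPhysicalDeviceProperties::vendorID` somehow; how Quadrants exposes that is not shown here, so `should_skip_sym_eig3x3` and its parameters are hypothetical. `0x10DE` is NVIDIA's PCI vendor ID. The only driver this PR confirms as affected is 580.76.05, so the sketch gates on vendor and leaves the version range open.

```python
from typing import Optional

# PCI vendor ID that NVIDIA devices report in VkPhysicalDeviceProperties.vendorID.
NVIDIA_VENDOR_ID = 0x10DE


def should_skip_sym_eig3x3(arch: str, vendor_id: Optional[int]) -> bool:
    """Skip only on the NVIDIA Vulkan stack known to crash in pipeline creation.

    `arch` and `vendor_id` are hypothetical inputs; `vendor_id` is None when
    the device vendor could not be determined.
    """
    if arch != "vulkan":
        return False
    if vendor_id is None:
        # Unknown vendor: stay conservative and keep the existing skip.
        return True
    return vendor_id == NVIDIA_VENDOR_ID
```

With this gate, AMD (`0x1002`) and Intel (`0x8086`) Vulkan devices would keep exercising the 3x3 path.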
Also, these should probably be xfail rather than skip anyway, I would think.
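If the skip becomes an xfail, one detail matters. The failure is a SIGSEGV in the driver, which terminates the Python process instead of raising an exception. An xfail that still runs the test body would therefore crash the run rather than record an expected failure. This is a minimal sketch. It assumes the caller has already computed whether the current device is affected; the helper `xfail_if_affected` is hypothetical. It would be called at the top of `_test_sym_eig3x3` in place of the skip.

```python
import pytest

XFAIL_REASON = (
    "NVIDIA Vulkan driver SIGSEGVs in compute-pipeline creation for _sym_eig3x3"
)


def xfail_if_affected(affected: bool) -> None:
    """Mark the current test as an expected failure without running the kernel.

    pytest.xfail() raises immediately, so the pipeline creation that would
    crash the driver is never reached. A SIGSEGV kills the interpreter and
    cannot be caught as an ordinary failure, so letting the body run under a
    plain xfail marker would not help.
    """
    if affected:
        pytest.xfail(XFAIL_REASON)
```

For a condition known at collection time, `@pytest.mark.xfail(condition, reason=..., run=False)` has the same effect. In both forms the body never runs, so a fixed driver would not surface as an XPASS. Detecting that would require running the kernel in a separate process.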