[CPU] Abort kernel execution on assertion failure instead of segfaulting #419
hughperkins wants to merge 7 commits into main
On CPU, when debug mode is enabled, out-of-bounds array accesses trigger a runtime assertion that records the error but allows execution to continue -- leading to a SIGSEGV before Python can retrieve the error.

Fix this by using setjmp/longjmp: each CPU task runner (range_for, struct_for, mesh_for, serial) sets up a jmp_buf via RuntimeContext, and the new quadrants_assert_format_ctx function longjmps back on failure. The existing check_runtime_error path then raises QuadrantsAssertionError.

GPU architectures are unaffected (they already kill threads via asm).

Made-with: Cursor
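The setjmp/longjmp approach described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchContext`, `sketch_assert`, `sketch_range_for`, and `sketch_write` are hypothetical stand-ins for RuntimeContext, quadrants_assert_format_ctx, a CPU task runner, and a JIT-compiled loop body respectively.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime context; field names are illustrative. */
typedef struct {
    jmp_buf assert_jmp;   /* where a failed assertion jumps back to */
    int error_recorded;   /* set once an assertion has failed */
} SketchContext;

typedef void (*TaskBody)(SketchContext *ctx, int i);

/* Records the failure and unwinds to the task runner instead of returning. */
static void sketch_assert(SketchContext *ctx, int condition) {
    if (!condition) {
        ctx->error_recorded = 1;
        longjmp(ctx->assert_jmp, 1);
    }
}

/* Runs body(i) for i in [begin, end) and returns the number of iterations
 * that completed before any assertion failure. */
static int sketch_range_for(SketchContext *ctx, int begin, int end, TaskBody body) {
    /* volatile because it is modified between setjmp and longjmp */
    volatile int completed = 0;
    if (setjmp(ctx->assert_jmp) != 0) {
        /* Reached via longjmp: abandon the remaining iterations. */
        return completed;
    }
    for (int i = begin; i < end; ++i) {
        body(ctx, i);
        completed = completed + 1;
    }
    return completed;
}

/* Illustrative loop body: a bounds-checked write into a 4-element buffer. */
static float sketch_data[4];

static void sketch_write(SketchContext *ctx, int i) {
    sketch_assert(ctx, i >= 0 && i < 4);
    sketch_data[i] = 1.0f;  /* never reached when the check fails */
}
```

Calling `sketch_range_for(&ctx, 0, 10, sketch_write)` stops at the first out-of-bounds index rather than writing past the buffer, and the caller can then inspect `error_recorded` to report the failure.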
    @qd.kernel
    def write_oob(a: qd.types.ndarray(dtype=qd.f32, ndim=1)):
        for i in range(10):
I don't think this test is valid, because this is a parallel loop? We should make it serial, I think?
Opus 4.6 review: Branch Problem it solves: On the CPU backend, when a runtime assertion fires (e.g., out-of-bounds ndarray access in debug mode), the assertion was logged but execution continued. The
Opinion: Overall this looks solid and ready for review/merge.
On Windows x64, longjmp performs SEH-based stack unwinding, which requires proper unwind tables (.pdata/.xdata) for every frame on the call stack. JIT-compiled code does not register these tables, so longjmp from JIT'd code crashes the process, causing all Windows OOB-check tests to fail with worker crashes.

Replace the mechanism: quadrants_assert_format_ctx now returns 1 on failure instead of calling longjmp, and the codegen emits a conditional ret-void after each assert call on CPU. Task runners check the cpu_assert_failed flag after each body call to break out of their loops.
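The replacement mechanism can be sketched in plain C as below. This is an illustration under assumptions, not the PR's implementation: `SketchFlagContext`, `sketch_assert_ctx`, `sketch_body`, and `sketch_serial_runner` are hypothetical names, and only the `cpu_assert_failed` flag name and the return-1-on-failure contract come from the commit message. The early `return` in `sketch_body` plays the role of the conditional ret-void that the codegen emits after each assert call.

```c
#include <stddef.h>

/* Hypothetical context; only the flag name is taken from the commit message. */
typedef struct {
    int cpu_assert_failed;
} SketchFlagContext;

/* Returns 1 on failure and 0 on success, with no non-local jump. */
static int sketch_assert_ctx(SketchFlagContext *ctx, int condition) {
    if (condition) {
        return 0;
    }
    ctx->cpu_assert_failed = 1;
    return 1;
}

static float sketch_buf[4];

/* Stand-in for a generated task body: bail out right after a failed assert. */
static void sketch_body(SketchFlagContext *ctx, int i) {
    if (sketch_assert_ctx(ctx, i >= 0 && i < 4)) {
        return;  /* corresponds to the emitted conditional ret-void */
    }
    sketch_buf[i] = 2.0f;
}

/* Stand-in for a task runner: check the flag after each body call.
 * Returns how many times the body was invoked. */
static int sketch_serial_runner(SketchFlagContext *ctx, int begin, int end) {
    int calls = 0;
    for (int i = begin; i < end; ++i) {
        sketch_body(ctx, i);
        ++calls;
        if (ctx->cpu_assert_failed) {
            break;  /* stop scheduling further iterations */
        }
    }
    return calls;
}
```

Because every frame returns normally, this variant does not depend on unwind tables being registered for the body, which is the property the Windows fix relies on.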
…assert # Conflicts: # quadrants/runtime/cpu/kernel_launcher.cpp
c552c09 to 0006e75
…assert # Conflicts: # quadrants/runtime/cpu/kernel_launcher.cpp