[CPU] Abort kernel execution on assertion failure instead of segfaulting #419

Draft
hughperkins wants to merge 7 commits into main from hp/cpu-longjmp-after-assert

@hughperkins
Collaborator

On CPU, when debug mode is enabled, out-of-bounds array accesses trigger a runtime assertion that records the error but allows execution to continue -- leading to a SIGSEGV before Python can retrieve the error.

Fix this by using setjmp/longjmp: each CPU task runner (range_for, struct_for, mesh_for, serial) sets up a jmp_buf via RuntimeContext, and the new quadrants_assert_format_ctx function longjmps back on failure. The existing check_runtime_error path then raises QuadrantsAssertionError.

GPU architectures are unaffected (they already kill threads via asm).

Made-with: Cursor

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough


@qd.kernel
def write_oob(a: qd.types.ndarray(dtype=qd.f32, ndim=1)):
    for i in range(10):

I don't think this test is valid, because this is a parallel loop? We should make it serial, I think?

@hughperkins
Collaborator Author

Opus 4.6 review:

Branch hp/cpu-longjmp-after-assert — Analysis

Problem it solves: On the CPU backend, when a runtime assertion fires (e.g., out-of-bounds ndarray access in debug mode), the assertion was logged but execution continued. The subsequent memory access would then dereference invalid memory and segfault, which is a terrible user experience: you get a crash instead of a clean Python-side AssertionError. On GPU this isn't a problem because asm("exit;") kills the thread immediately after the assertion.

How it works: The fix uses setjmp/longjmp to abort kernel execution on the CPU backend after an assertion failure:

  1. context.h — Adds a cpu_abort_jmp_buf pointer to RuntimeContext. When non-null, it points to a jmp_buf set up by the caller.
  2. kernel_launcher.cpp — Before running each task, sets up a setjmp guard. If longjmp fires, it breaks out of the task loop, skipping remaining tasks.
  3. runtime.cpp — Three parallel-for variants (struct-for, range-for, mesh-for) each get the same pattern: per-thread setjmp guard wrapping the task body. A new function
    quadrants_assert_format_ctx is added that calls the existing quadrants_assert_format (to print the error) and then longjmps back if the assertion failed.
  4. codegen_llvm.cpp — The codegen for AssertStmt now emits a call to quadrants_assert_format_ctx (passing the context) on CPU, while GPU still uses the original
    quadrants_assert_format (passing just the runtime).
  5. Tests — Four new tests in test_debug.py covering 1D, 2D, small-array OOB access (all expecting AssertionError instead of segfault), plus a sanity test that in-bounds access
    still works correctly.
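
The guard pattern described in the steps above can be sketched in isolation. This is a minimal illustration, not the code from the branch: `SketchContext`, `sketch_assert_ctx`, `sketch_task_body`, `sketch_run_task` and the `error_code` field are hypothetical stand-ins; only the field name `cpu_abort_jmp_buf` comes from the review. The task body is an ordinary function standing in for JIT-compiled code.

```cpp
#include <csetjmp>
#include <cstdio>

// Hypothetical stand-in for RuntimeContext. Only cpu_abort_jmp_buf is named
// in the review; error_code is an assumption standing in for the recorded error.
struct SketchContext {
  std::jmp_buf *cpu_abort_jmp_buf = nullptr;
  int error_code = 0;
};

// Hypothetical analogue of quadrants_assert_format_ctx: record the failure,
// then jump back to the guard if the caller installed one.
void sketch_assert_ctx(SketchContext *ctx, bool condition, const char *msg) {
  if (condition) {
    return;
  }
  std::fprintf(stderr, "assertion failed: %s\n", msg);
  ctx->error_code = 1;
  if (ctx->cpu_abort_jmp_buf != nullptr) {
    std::longjmp(*ctx->cpu_abort_jmp_buf, 1);
  }
}

// Stand-in for a JIT-compiled task body: bounds check, then the write.
void sketch_task_body(SketchContext *ctx, float *data, int size, int index) {
  sketch_assert_ctx(ctx, index >= 0 && index < size, "index out of bounds");
  data[index] = 1.0f;  // not reached when the assertion fails
}

// Guarded runner: returns true if the body completed, false if it was aborted.
bool sketch_run_task(SketchContext *ctx, float *data, int size, int index) {
  std::jmp_buf env;  // stack-allocated at the guard site
  ctx->cpu_abort_jmp_buf = &env;
  if (setjmp(env) != 0) {
    // Arrived here via longjmp: the body was abandoned mid-way.
    ctx->cpu_abort_jmp_buf = nullptr;
    return false;
  }
  sketch_task_body(ctx, data, size, index);
  ctx->cpu_abort_jmp_buf = nullptr;  // clear the guard once the task is done
  return true;
}
```

Calling `sketch_run_task` with an in-range index performs the write and returns true; with an out-of-range index the write is skipped, `error_code` is set, and the function returns false. The sketch keeps only trivially destructible objects between the `setjmp` and the `longjmp`, since jumping over frames that hold objects with non-trivial destructors is undefined behaviour in C++.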

Opinion:
This is a clean, well-scoped fix. The setjmp/longjmp pattern is the right approach here: it's the standard C/C++ mechanism for non-local jumps when you can't use exceptions (and in JIT'd code you can't). A few observations:
• Correctness looks good. The jmp_buf is stack-allocated at each guard site and cleaned up (= nullptr) after the task completes, so there's no dangling pointer risk. Each parallel thread gets its own jmp_buf via the per-thread RuntimeContext copy.
• Scope is minimal. Only CPU is affected; GPU codepath is untouched.
• One subtlety: longjmp out of JIT'd code skips any destructors or cleanup that might be on the JIT'd stack. For quadrants kernels this should be fine since the JIT'd code operates on raw buffers with no RAII, but it's worth keeping in mind if the JIT'd code ever acquires resources.
• The tests are thorough: they cover the exact segfault scenario (small array OOB), multi-dimensional access, and verify the happy path isn't broken by the setjmp machinery.

Overall this looks solid and ready for review/merge.

On Windows x64, longjmp performs SEH-based stack unwinding which requires
proper unwind tables (.pdata/.xdata) for every frame on the call stack.
JIT-compiled code does not register these tables, so longjmp from JIT'd
code crashes the process — causing all Windows OOB-check tests to fail
with worker crashes.

Replace the mechanism: quadrants_assert_format_ctx now returns 1 on
failure instead of calling longjmp, and the codegen emits a conditional
ret-void after each assert call on CPU.  Task runners check the
cpu_assert_failed flag after each body call to break out of their loops.
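
A rough sketch of the replacement mechanism, under stated assumptions: `FlagContext`, `flag_assert_ctx`, `flag_body` and `flag_range_for` are hypothetical names, and the `int` type of `cpu_assert_failed` is assumed (only the flag's name appears in the commit message). The early `return` in the body plays the role of the conditional ret-void that the codegen emits after each assert call.

```cpp
#include <cstdio>

// Hypothetical stand-in for RuntimeContext; the flag name is from the commit
// message, its type is an assumption.
struct FlagContext {
  int cpu_assert_failed = 0;
};

// Hypothetical analogue of the reworked quadrants_assert_format_ctx:
// returns 1 on failure instead of calling longjmp, 0 otherwise.
int flag_assert_ctx(FlagContext *ctx, bool condition, const char *msg) {
  if (condition) {
    return 0;
  }
  std::fprintf(stderr, "assertion failed: %s\n", msg);
  ctx->cpu_assert_failed = 1;
  return 1;
}

// Stand-in for one JIT-compiled loop body. The early return mirrors the
// conditional ret-void emitted after the assert call.
void flag_body(FlagContext *ctx, float *data, int size, int i) {
  if (flag_assert_ctx(ctx, i >= 0 && i < size, "index out of bounds")) {
    return;
  }
  data[i] = 1.0f;
}

// Serial range-for runner: checks the flag after each body call and stops.
// Returns the number of iterations that completed without a failed assert.
int flag_range_for(FlagContext *ctx, float *data, int size, int begin, int end) {
  int completed = 0;
  for (int i = begin; i < end; ++i) {
    flag_body(ctx, data, size, i);
    if (ctx->cpu_assert_failed) {
      break;  // skip the remaining iterations
    }
    ++completed;
  }
  return completed;
}
```

Because every frame returns normally, no stack unwinding is involved, which is what sidesteps the missing unwind tables on Windows x64. The cost is that the check has to be threaded through every call site: the body must return early and each runner must test the flag.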
…assert

# Conflicts:
#	quadrants/runtime/cpu/kernel_launcher.cpp
@hughperkins force-pushed the hp/cpu-longjmp-after-assert branch from c552c09 to 0006e75 on March 16, 2026 18:39
…assert

# Conflicts:
#	quadrants/runtime/cpu/kernel_launcher.cpp