Fix: dlmopen namespace isolation + in-process sim execution #447
hw-native-sys-bot wants to merge 1 commit into hw-native-sys:main
Code Review
This pull request introduces a new abstracted orchestration API for the host_build_graph runtime, replacing direct Runtime class access with an OrchestrationRuntime interface and a function-pointer table. This change decouples orchestration code from internal runtime headers. Additionally, the PR improves simulation task isolation in ci.py through reset-based isolation and enhances shared library loading in ChipWorker by utilizing dlmopen on Linux for better namespace separation. The AicpuExecutor was also updated to harden the handling of temporary shared objects. Feedback was provided regarding the removal of member clearing in AicpuExecutor::deinit, which could lead to dangling pointers if the instance is reused.
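The function-pointer-table idea described in the review summary can be illustrated with a minimal sketch. Only the name OrchestrationRuntime comes from the review text; the ops struct, its two entries, the `impl` field, and `bind_runtime` are hypothetical names chosen for illustration and are not taken from the PR.

```cpp
#include <cstdint>
#include <vector>

// Opaque handle seen by orchestration code (hypothetical layout).
struct OrchestrationRuntime;

// Hypothetical ops table: orchestration code calls through these pointers
// and never includes the runtime's internal headers.
struct OrchestrationRuntimeOps {
    int (*add_task)(OrchestrationRuntime* rt, std::uint64_t kernel_id);
    int (*task_count)(const OrchestrationRuntime* rt);
};

struct OrchestrationRuntime {
    const OrchestrationRuntimeOps* ops;  // function-pointer table
    void* impl;                          // runtime-private state, opaque here
};

// Thin wrappers that orchestration code would use.
inline int orch_add_task(OrchestrationRuntime* rt, std::uint64_t kernel_id) {
    return rt->ops->add_task(rt, kernel_id);
}
inline int orch_task_count(const OrchestrationRuntime* rt) {
    return rt->ops->task_count(rt);
}

// ---- Runtime side: the only place that knows the concrete type. ----
struct Runtime {
    std::vector<std::uint64_t> tasks;
};

static int runtime_add_task(OrchestrationRuntime* rt, std::uint64_t kernel_id) {
    auto* self = static_cast<Runtime*>(rt->impl);
    self->tasks.push_back(kernel_id);
    return static_cast<int>(self->tasks.size()) - 1;  // index of the new task
}

static int runtime_task_count(const OrchestrationRuntime* rt) {
    const auto* self = static_cast<const Runtime*>(rt->impl);
    return static_cast<int>(self->tasks.size());
}

static const OrchestrationRuntimeOps kRuntimeOps = {runtime_add_task,
                                                    runtime_task_count};

// Builds the handle that is passed across the SO boundary.
inline OrchestrationRuntime bind_runtime(Runtime& rt) {
    return OrchestrationRuntime{&kRuntimeOps, &rt};
}
```

With this shape, an orchestration SO only needs the two small struct definitions, so changes to the concrete Runtime class would not force it to be rebuilt.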
In AicpuExecutor::deinit, the clearing of orchestration-related members has been removed. Since the shared object is closed via dlclose elsewhere, pointers like orch_func_ become dangling. If the AicpuExecutor instance is intended to be reusable, not nulling out these members could lead to use-after-free vulnerabilities or other undefined behavior in subsequent operations.
It's safer to restore the clearing of these members to ensure the executor is reset to a clean and predictable state.
orch_func_ = nullptr;
orch_bind_runtime_ = nullptr;
orch_args_cached_ = nullptr;
orch_so_handle_ = nullptr;
orch_so_path_[0] = '\0';
ChipWorker: replace dlopen(RTLD_GLOBAL) with dlmopen(LM_ID_NEWLM) to load each host runtime SO into an independent linker namespace, eliminating cross-runtime symbol pollution when multiple runtimes run in the same process. Cache handles by path+mtime+size; never dlclose cached namespaces to avoid static TLS block exhaustion. At most 6 namespaces in CI (3 runtimes x 2 arches); production uses exactly 1.

CI sim: replace subprocess-per-runtime-group with in-process ChipWorker.reset() isolation. Serial: shared worker, init/reset per task. Parallel (--parallel): one dedicated ChipWorker per task.

AICPU device-side SO loading: encapsulate in platform layer via load_device_orch_so (sim: mkstemp unique name; onboard: getpid fixed path). aicpu_executor.cpp calls this instead of inline file I/O.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
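The namespace-per-runtime loading with a path/mtime/size cache might look roughly like the sketch below. It assumes glibc on Linux, where `dlmopen` and `LM_ID_NEWLM` are available. The function names `make_cache_key` and `load_isolated`, the exact key layout, and the choice of `RTLD_NOW | RTLD_LOCAL` are assumptions for illustration; this is not the ChipWorker code.

```cpp
#include <dlfcn.h>     // dlmopen, LM_ID_NEWLM (glibc extension)
#include <sys/stat.h>  // stat

#include <map>
#include <string>

// Builds a cache key from the path plus the file's mtime and size, so that a
// rebuilt SO at the same path is treated as a different library.
std::string make_cache_key(const std::string& path, const struct stat& st) {
    return path + "#" + std::to_string(static_cast<long long>(st.st_mtime)) +
           ":" + std::to_string(static_cast<long long>(st.st_size));
}

// Loads `path` into a fresh linker namespace, or returns the cached handle.
// Handles are intentionally never passed to dlclose (see the commit message:
// closing namespaces risks exhausting static TLS blocks).
// Not thread-safe: a real implementation would guard the cache with a mutex.
void* load_isolated(const std::string& path) {
    static std::map<std::string, void*> cache;

    struct stat st {};
    if (stat(path.c_str(), &st) != 0) {
        return nullptr;  // file missing or unreadable
    }

    const std::string key = make_cache_key(path, st);
    auto it = cache.find(key);
    if (it != cache.end()) {
        return it->second;  // reuse the existing namespace
    }

    // LM_ID_NEWLM asks the dynamic linker for a new, empty namespace, so the
    // library's symbols cannot collide with those of other runtimes.
    void* handle = dlmopen(LM_ID_NEWLM, path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle != nullptr) {
        cache.emplace(key, handle);
    }
    return handle;
}
```

Symbols would then be looked up with `dlsym(handle, "...")` as usual. Because each distinct key permanently consumes a namespace, the cache only stays bounded when the set of runtime SOs is small, which matches the "at most 6 in CI" figure above.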
Summary
ChipWorker: Replace dlopen(RTLD_GLOBAL) with dlmopen(LM_ID_NEWLM) to load each host runtime SO into an independent linker namespace, eliminating cross-runtime symbol pollution when multiple runtimes run in the same process. Cache handles by path#mtime:size; never dlclose cached namespaces to avoid static TLS block exhaustion. At most 6 namespaces in CI (3 runtimes x 2 arches); production uses exactly 1.

CI sim (ci.py): Replace subprocess-per-runtime-group with in-process ChipWorker.reset() isolation. Serial: shared worker, init/reset per task. Parallel (--parallel): one dedicated ChipWorker per task.

AICPU device-side SO loading: Encapsulate in the platform layer via load_device_orch_so (orch_so_loader.cpp). sim: mkstemp unique name; onboard: getpid fixed path. aicpu_executor.cpp calls this instead of inline file I/O.

Depends on: #449, #450
Testing
python ci.py -p a2a3sim -c 6622890 -t 600 --build-runtime (15/15 PASS)