Fix: dlmopen namespace isolation + in-process sim execution #447

Open
hw-native-sys-bot wants to merge 1 commit into hw-native-sys:main from hw-native-sys-bot:fix/dlmopen-namespace-isolation

@hw-native-sys-bot (Collaborator) commented Apr 3, 2026

Summary

  • ChipWorker dlmopen: Replace dlopen(RTLD_GLOBAL) with dlmopen(LM_ID_NEWLM) to load each host runtime SO into an independent linker namespace, eliminating cross-runtime symbol pollution when multiple runtimes run in the same process. Cache handles by path#mtime:size; never dlclose cached namespaces, to avoid static TLS block exhaustion. CI uses at most 6 namespaces (3 runtimes x 2 arches); production uses exactly 1.
  • CI sim in-process (ci.py): Replace subprocess-per-runtime-group with in-process ChipWorker.reset() isolation. Serial: shared worker, init/reset per task. Parallel (--parallel): one dedicated ChipWorker per task.
  • AICPU device-side SO loading: Encapsulate in platform layer via load_device_orch_so (orch_so_loader.cpp). sim: mkstemp unique name; onboard: getpid fixed path. aicpu_executor.cpp calls this instead of inline file I/O.
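
The handle caching described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's ChipWorker code: `make_cache_key`, `NamespaceCache`, and the injectable `Loader` are hypothetical names, and the key layout simply mirrors the `path#mtime:size` wording above. The default loader calls `dlmopen(LM_ID_NEWLM, ...)`, a glibc extension; the loader is injectable so the caching logic can be exercised without loading a real library.

```cpp
#include <dlfcn.h>     // dlmopen, LM_ID_NEWLM (glibc extension)
#include <sys/stat.h>  // stat
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: builds "path#mtime:size" so that a rebuilt SO
// (different mtime or size) maps to a different cache entry.
// Returns an empty string if the file cannot be stat'ed.
inline std::string make_cache_key(const std::string& path) {
    struct stat st{};
    if (stat(path.c_str(), &st) != 0) return {};
    return path + "#" + std::to_string(static_cast<long long>(st.st_mtime)) +
           ":" + std::to_string(static_cast<long long>(st.st_size));
}

// Hypothetical cache of namespace handles (an assumption for illustration).
class NamespaceCache {
public:
    using Loader = std::function<void*(const std::string&)>;

    // Default loader: open the SO in a brand-new linker namespace.
    static void* load_in_new_namespace(const std::string& path) {
        return dlmopen(LM_ID_NEWLM, path.c_str(), RTLD_NOW | RTLD_LOCAL);
    }

    explicit NamespaceCache(Loader loader = &NamespaceCache::load_in_new_namespace)
        : loader_(std::move(loader)) {
        assert(loader_ && "a loader is required");
    }

    // Returns the cached handle when path, mtime, and size are unchanged;
    // otherwise loads the SO once and remembers the handle.
    void* get(const std::string& path) {
        const std::string key = make_cache_key(path);
        if (key.empty()) return nullptr;
        auto it = handles_.find(key);
        if (it != handles_.end()) return it->second;
        void* handle = loader_(path);
        if (handle != nullptr) handles_.emplace(key, handle);
        return handle;
    }

    std::size_t size() const { return handles_.size(); }

    // Deliberately no dlclose anywhere: cached handles stay open for the
    // lifetime of the process, as the summary above describes.

private:
    Loader loader_;
    std::unordered_map<std::string, void*> handles_;
};
```

On glibc older than 2.34 this needs `-ldl` at link time. The dlmopen man page also documents a limit of 16 namespaces in the glibc implementation, which is one reason to reuse handles rather than create a new namespace on every load.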

Depends on: #449, #450

Testing

  • python ci.py -p a2a3sim -c 6622890 -t 600 --build-runtime — 15/15 PASS

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new abstracted orchestration API for the host_build_graph runtime, replacing direct Runtime class access with an OrchestrationRuntime interface and a function-pointer table. This change decouples orchestration code from internal runtime headers. Additionally, the PR improves simulation task isolation in ci.py through reset-based isolation and enhances shared library loading in ChipWorker by utilizing dlmopen on Linux for better namespace separation. The AicpuExecutor was also updated to harden the handling of temporary shared objects. Feedback was provided regarding the removal of member clearing in AicpuExecutor::deinit, which could lead to dangling pointers if the instance is reused.
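
For context on the temporary shared object handling mentioned above, the PR description says the sim path uses a unique mkstemp name. A minimal sketch of that idea, assuming the SO image is already in memory, could look like this. `write_unique_temp_so`, the `/tmp` default, and the `orch_so_` prefix are all hypothetical and are not taken from `load_device_orch_so`.

```cpp
#include <unistd.h>   // write, close, unlink
#include <cassert>
#include <cstddef>
#include <cstdlib>    // mkstemp
#include <string>
#include <vector>

// Hypothetical helper: writes an SO image to a uniquely named temp file and
// returns its path, or an empty string on failure.
inline std::string write_unique_temp_so(const void* data, std::size_t size,
                                        const std::string& dir = "/tmp") {
    assert(data != nullptr || size == 0);

    // mkstemp replaces the trailing XXXXXX in place, so it needs a
    // writable, NUL-terminated buffer.
    const std::string tmpl = dir + "/orch_so_XXXXXX";
    std::vector<char> buf(tmpl.begin(), tmpl.end());
    buf.push_back('\0');

    // Creates and opens the file atomically; two concurrent callers can
    // never end up with the same name.
    const int fd = mkstemp(buf.data());
    if (fd < 0) return {};

    const char* p = static_cast<const char*>(data);
    std::size_t left = size;
    while (left > 0) {
        const ssize_t n = write(fd, p, left);
        if (n <= 0) {            // treat any short-circuit as failure
            close(fd);
            unlink(buf.data());  // do not leave a partial file behind
            return {};
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    close(fd);
    return std::string(buf.data());
}
```

A caller would presumably pass the returned path to dlopen and unlink the file once it is no longer needed; how the PR handles cleanup is not stated here.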

orch_args_cached_ = nullptr;
orch_so_handle_ = nullptr;
orch_so_path_[0] = '\0';

security-high high

In AicpuExecutor::deinit, the clearing of orchestration-related members has been removed. Since the shared object is closed via dlclose elsewhere, pointers like orch_func_ become dangling. If the AicpuExecutor instance is intended to be reusable, not nulling out these members could lead to use-after-free vulnerabilities or other undefined behavior in subsequent operations.

It's safer to restore the clearing of these members to ensure the executor is reset to a clean and predictable state.

    orch_func_ = nullptr;
    orch_bind_runtime_ = nullptr;
    orch_args_cached_ = nullptr;
    orch_so_handle_ = nullptr;
    orch_so_path_[0] = '\0';
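
To illustrate why the reset matters, here is a small self-contained sketch. `ExecutorSketch` is a hypothetical stand-in for AicpuExecutor: the member names come from the suggestion above, but their types, the buffer size, and the `run` and `has_orchestration` helpers are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for AicpuExecutor (types are assumptions).
struct ExecutorSketch {
    using OrchFunc = void (*)(void*);

    OrchFunc orch_func_ = nullptr;
    OrchFunc orch_bind_runtime_ = nullptr;
    void* orch_args_cached_ = nullptr;
    void* orch_so_handle_ = nullptr;
    char orch_so_path_[256] = {};

    bool has_orchestration() const { return orch_func_ != nullptr; }

    // Calls the orchestration entry point only if one is currently loaded.
    bool run() {
        if (!has_orchestration()) return false;
        orch_func_(orch_args_cached_);
        return true;
    }

    // Clears every member that points into the (already closed) SO, so a
    // reused instance cannot call through a dangling pointer.
    void deinit() {
        orch_func_ = nullptr;
        orch_bind_runtime_ = nullptr;
        orch_args_cached_ = nullptr;
        orch_so_handle_ = nullptr;
        orch_so_path_[0] = '\0';
        assert(!has_orchestration());
    }
};
```

With the members cleared, a later `run()` on a reused instance returns `false` instead of jumping into unmapped code, which is the failure mode the comment above warns about.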

@hw-native-sys-bot force-pushed the fix/dlmopen-namespace-isolation branch 6 times, most recently from 0490e88 to 6504693 (April 3, 2026 12:49)
@hw-native-sys-bot changed the title from "Fix: dlmopen namespace isolation for in-process sim multi-runtime execution" to "Fix: dlmopen namespace isolation + in-process sim execution" (Apr 3, 2026)
@hw-native-sys-bot force-pushed the fix/dlmopen-namespace-isolation branch 6 times, most recently from d63b45b to a2cb97e (April 3, 2026 13:31)

ChipWorker: replace dlopen(RTLD_GLOBAL) with dlmopen(LM_ID_NEWLM) to load
each host runtime SO into an independent linker namespace, eliminating
cross-runtime symbol pollution when multiple runtimes run in the same process.
Cache handles by path+mtime+size; never dlclose cached namespaces to avoid
static TLS block exhaustion. At most 6 namespaces in CI (3 runtimes x 2
arches), production uses exactly 1.

CI sim: replace subprocess-per-runtime-group with in-process
ChipWorker.reset() isolation. Serial: shared worker, init/reset per task.
Parallel (--parallel): one dedicated ChipWorker per task.

AICPU device-side SO loading: encapsulate in platform layer via
load_device_orch_so (sim: mkstemp unique name; onboard: getpid fixed path).
aicpu_executor.cpp calls this instead of inline file I/O.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

@hw-native-sys-bot force-pushed the fix/dlmopen-namespace-isolation branch from a2cb97e to 41e449e (April 3, 2026 13:52)