Update hwloc CPU Binding Implementation #1226

Draft
bcmIntc wants to merge 1 commit into Sandia-OpenSHMEM:main from bcmIntc:bcm_hwloc_fixes_updated

Collaborator

bcmIntc commented Apr 2, 2026

Problem

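The previous hwloc CPU binding in shmem_internal_heap_postinit() was gated at compile time by the HWLOC_ENFORCE_SINGLE_SOCKET / HWLOC_ENFORCE_SINGLE_NUMA_NODE flags. When enabled, it unconditionally narrowed each PE's CPU affinity mask at startup, overriding any binding set by the job launcher.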

Solution

Replace the compile-time binding with an opt-in system controlled at runtime by the new SHMEM_CPU_PLACEMENT_POLICY environment variable (default: none).

Available policies:

Policy                    Behavior
none (default)            No changes; launcher owns CPU affinity entirely
best-memory / numa-local  Bind CPUs to the NUMA node of the process's current CPU location
socket-local              Bind CPUs to the socket of the process's current CPU location
balanced-numa             Round-robin PEs across NUMA nodes by local rank; also sets memory binding via hwloc_set_membind
best-network              Bind CPUs to the NUMA domain of the assigned NIC; applied inside assign_nic_with_hwloc() after NIC selection

SHMEM_DISABLE_CPU_BINDING=true suppresses all placement regardless of policy. The hwloc topology
initialization is preserved as it is still required by NIC affinity selection in transport_ofi.c.
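
To make the precedence between the two variables concrete, here is a minimal sketch of how a policy name could be resolved. The enum, parse_policy(), and effective_policy() are hypothetical names for illustration only; they are not taken from the PR, and the handling of an unrecognized name (returning PLACEMENT_UNKNOWN so the caller can warn) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the PR's internal representation is not shown. */
typedef enum {
    PLACEMENT_NONE,
    PLACEMENT_BEST_MEMORY,   /* also selected by the "numa-local" alias */
    PLACEMENT_SOCKET_LOCAL,
    PLACEMENT_BALANCED_NUMA,
    PLACEMENT_BEST_NETWORK,
    PLACEMENT_UNKNOWN        /* assumption: unrecognized names are reported */
} placement_policy_t;

/* Map a SHMEM_CPU_PLACEMENT_POLICY string to a policy value.
 * An unset or empty value falls back to the documented default, "none". */
static placement_policy_t parse_policy(const char *name)
{
    static const struct {
        const char *name;
        placement_policy_t policy;
    } table[] = {
        { "none",          PLACEMENT_NONE },
        { "best-memory",   PLACEMENT_BEST_MEMORY },
        { "numa-local",    PLACEMENT_BEST_MEMORY },
        { "socket-local",  PLACEMENT_SOCKET_LOCAL },
        { "balanced-numa", PLACEMENT_BALANCED_NUMA },
        { "best-network",  PLACEMENT_BEST_NETWORK },
    };

    if (name == NULL || name[0] == '\0')
        return PLACEMENT_NONE;

    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(name, table[i].name) == 0)
            return table[i].policy;
    }
    return PLACEMENT_UNKNOWN;
}

/* SHMEM_DISABLE_CPU_BINDING wins over any policy: when it is set,
 * the effective policy is always "none". The flag is passed as 0 or 1. */
static placement_policy_t effective_policy(const char *policy_name,
                                           int disable_cpu_binding)
{
    assert(disable_cpu_binding == 0 || disable_cpu_binding == 1);
    if (disable_cpu_binding)
        return PLACEMENT_NONE;
    return parse_policy(policy_name);
}
```

With this shape, effective_policy("best-network", 1) resolves to PLACEMENT_NONE, which matches the rule that SHMEM_DISABLE_CPU_BINDING=true suppresses placement regardless of policy.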

Files changed

  • src/init.c — removed old rebind logic; added apply_cpu_placement() dispatching on policy name
  • src/shmem_env_defs.h — added SHMEM_CPU_PLACEMENT_POLICY env var; updated SHMEM_DISABLE_CPU_BINDING description
  • src/transport_ofi.c — added best-network policy inside assign_nic_with_hwloc() where PCI topology is already in scope

Test coverage

…policies

Removes the HWLOC_ENFORCE_SINGLE_SOCKET / HWLOC_ENFORCE_SINGLE_NUMA_NODE
compile-time flags that unconditionally narrowed each PE's CPU affinity
mask at startup, overriding any binding set by the job launcher.

Replaces them with SHMEM_CPU_PLACEMENT_POLICY (default: none), a runtime
env var that lets users opt in to a specific placement strategy:

  none          - no changes; launcher owns CPU affinity entirely (default)
  best-memory   - bind CPUs to the NUMA node of the PE's current location
  numa-local    - alias for best-memory
  socket-local  - bind CPUs to the socket of the PE's current location
  balanced-numa - round-robin PEs across NUMA nodes by local rank;
                  also sets memory binding via hwloc_set_membind
  best-network  - bind CPUs to the NUMA domain of the assigned NIC;
                  applied in transport_ofi.c after NIC selection
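
As an illustration of the balanced-numa mapping, the sketch below assumes that "round-robin by local rank" means a simple modulo over the number of NUMA nodes on the host. The helper name balanced_numa_node_index() is hypothetical, and the actual selection logic in the PR may differ.

```c
#include <assert.h>

/* Hypothetical helper: choose a NUMA node index for a PE, given its
 * node-local rank and the number of NUMA nodes hwloc reports.
 * Assumes a plain modulo round-robin. */
static int balanced_numa_node_index(int local_rank, int num_numa_nodes)
{
    assert(local_rank >= 0);
    assert(num_numa_nodes > 0);
    return local_rank % num_numa_nodes;
}
```

On a host with two NUMA nodes, local ranks 0, 1, 2, 3 map to nodes 0, 1, 0, 1. The resulting index would presumably be used to look up the NUMA node object (for example with hwloc_get_obj_by_type()) before the CPU and memory binding calls.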

Adds SHMEM_DISABLE_CPU_BINDING (bool, default false) to suppress all
hwloc CPU placement regardless of policy.

The best-network policy is implemented in assign_nic_with_hwloc() in
transport_ofi.c, where the selected NIC's PCI bus attributes are already
in scope. All other policies are applied in a new apply_cpu_placement()
helper in init.c, called after hwloc_topology_load().

The hwloc topology object is preserved as it is still required by NIC
affinity selection in transport_ofi.c.

bcmIntc force-pushed the bcm_hwloc_fixes_updated branch from 0292757 to 9db3d46 on April 30, 2026 13:47
markbrown314 added this to the v1.6.0-perf-r2 milestone on May 4, 2026
markbrown314 self-requested a review on May 4, 2026 14:35
2 participants