@efaulhaber recently ran into a problem caused by the inheritance of thread pinning.
CUDA.jl spawns a worker thread to accelerate nonblocking synchronization:
# we don't know what the size of uv_thread_t is, so reserve enough space
tid = Ref{NTuple{32, UInt8}}(ntuple(i -> 0, 32))
cb = @cfunction(synchronization_worker, Cvoid, (Ptr{Cvoid},))
err = @ccall uv_thread_create(tid::Ptr{Cvoid}, cb::Ptr{Cvoid}, Ptr{Cvoid}(i)::Ptr{Cvoid})::Cint
err == 0 || Base.uv_error("uv_thread_create", err)
err = @ccall uv_thread_detach(tid::Ptr{Cvoid})::Cint
err == 0 || Base.uv_error("uv_thread_detach", err)
end
The thread spawned here inherits its affinity mask from its parent:
A new thread created by pthread_create(3) inherits a copy of its creator's CPU affinity mask.
https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
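To see which mask a newly created thread would inherit, one can read the calling thread's mask directly. The following is only a minimal sketch, assuming Linux with glibc, a little-endian machine, and at most 1024 CPUs. `allowed_cpus` is a hypothetical helper and not part of CUDA.jl or ThreadPinning.

```julia
# Hypothetical helper: CPU IDs the calling thread is allowed to run on.
# Assumes Linux/glibc, where cpu_set_t is a 1024-bit (128-byte) bitmask.
function allowed_cpus()
    mask = zeros(UInt8, 128)
    # pid 0 means "the calling thread".
    ret = @ccall sched_getaffinity(0::Cint, sizeof(mask)::Csize_t, mask::Ptr{UInt8})::Cint
    Base.systemerror("sched_getaffinity", ret < 0)
    # Assumes little-endian layout: CPU i is bit (i % 8) of byte (i ÷ 8).
    return [i for i in 0:(8 * length(mask) - 1)
            if ((mask[(i >> 3) + 1] >> (i & 7)) & 0x01) == 0x01]
end
```

Called from a thread that has been pinned to a single core, this should return a single CPU ID, and a thread created from there would start with a copy of that same mask.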
@efaulhaber was running
using ThreadPinning; pinthreads(:numa);
Contrary to my expectation, this did not set the affinity mask to the entire NUMA domain, but instead pinned each thread to a single core within the NUMA domain. My expectation stems from numactl (https://linux.die.net/man/8/numactl), which operates at the process level.
julia> ThreadPinningCore.getaffinity()
16-element Vector{Int8}:
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
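The difference between the two semantics can be sketched with plain vectors. This is only an illustration under assumed values: the topology (16 CPUs, two NUMA domains of 8) and the in-order core assignment are hypothetical and do not reflect how ThreadPinning actually orders its placement.

```julia
# Hypothetical topology: 16 CPUs in two NUMA domains of 8 CPUs each.
const NCPUS = 16
const NUMA_DOMAINS = [collect(0:7), collect(8:15)]

# Build a 0/1 affinity vector in the same shape as the output above.
affinity_mask(cpus) = Int8[c in cpus ? 1 : 0 for c in 0:(NCPUS - 1)]

# Process-level binding (numactl style): every thread may use the whole domain.
domain_mask = affinity_mask(NUMA_DOMAINS[1])

# Per-thread pinning: each thread is restricted to one core of the domain.
# The in-order assignment is an assumption made for illustration only.
pinned_masks = [affinity_mask([NUMA_DOMAINS[1][t]]) for t in 1:4]
```

A worker thread created by a thread holding `pinned_masks[1]` inherits that one-core mask and therefore competes with its parent for the same core, whereas with `domain_mask` it could be scheduled on any of the eight cores in the domain.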