[WIP] set accelerator environment variables from runtime env agent #63327
[WIP] set accelerator environment variables from runtime env agent #63327andrewsykim wants to merge 30 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request moves the configuration of resource-related environment variables, such as OMP_NUM_THREADS and accelerator visibility (e.g., CUDA_VISIBLE_DEVICES), from task execution time to worker startup. This is implemented by passing resource requirements and allocated instances to the RuntimeEnvAgent during worker creation. Feedback identifies several critical issues: an isolation bug in the Python agent where environment variables are popped instead of cleared, a missing check for specific allocated resource IDs when reusing workers, and concerns regarding strict resource equality checks and hardcoded prestart values which may significantly degrade worker reuse efficiency. Additionally, the reviewer questioned the language-specific logic that might leave non-Python workers without proper isolation.
| if (worker.GetResourceRequirements() != pop_worker_request.resource_requirements_) { | ||
| return WorkerUnfitForLeaseReason::OTHERS; | ||
| } |
There was a problem hiding this comment.
This check is missing a comparison for allocated_instances. Since environment variables like CUDA_VISIBLE_DEVICES are now set at worker startup based on the specific resource IDs allocated, a worker can only be reused if the new task is assigned the exact same resource IDs. Without this check, a task assigned GPU 1 might be scheduled on a worker process that was started with CUDA_VISIBLE_DEVICES=0.
Additionally, the strict equality check on ResourceSet will significantly reduce worker reuse. For example, a worker prestarted with 1.0 CPU will not be reused for a task requiring 0.5 CPU, even though the resulting OMP_NUM_THREADS would be identical. While strict equality is safe, it may lead to excessive worker process churn.
| @@ -238,6 +242,8 @@ class Worker : public std::enable_shared_from_this<Worker>, public WorkerInterfa | |||
| /// Saved process group id captured at registration time. Used for process-group | |||
| /// cleanup validation at disconnect/stop. | |||
| std::optional<pid_t> saved_pgid_ = std::nullopt; | |||
| /// Resource requirements of this worker process. | |||
| ResourceSet resource_requirements_; | |||
There was a problem hiding this comment.
| ResourceSet resource_requirements; | ||
| resource_requirements.Set(scheduling::ResourceID::CPU(), 1.0); |
There was a problem hiding this comment.
Hardcoding prestarted workers to 1.0 CPU, combined with the strict equality check in WorkerFitForLease, means these workers will only be used for tasks that request exactly 1.0 CPU. Tasks requesting any other amount (e.g., 0.1 or 2.0) will trigger the start of a new worker process, effectively negating the benefit of prestarting workers for a large class of tasks.
| @@ -463,7 +493,9 @@ async def _create_runtime_env_with_retry( | |||
| serialized_context = self._env_cache[serialized_env] | |||
| result = self._env_cache[serialized_env] | |||
| if result.success: | |||
| context = result.result | |||
| context_obj = RuntimeEnvContext.deserialize(result.result) | |||
| self._inject_resource_env_vars(context_obj, request) | |||
There was a problem hiding this comment.
It seemed cleaner to call _inject_resource_env_vars in _setup_runtime_env, but request.allocated_instances for GPUs was never set. I'm assuming this is called for worker prestart before the accelerator IDs are passed down
2593f18 to
1b857bd
Compare
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…ts or assigned accelerator IDs Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…est.cc Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…clock Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…accelerator env vars Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…ateRuntimeEnv Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Description
Closes #63138
Sets accelerator specific environment variables at runtime environment creation instead of from within worker processes. This is to avoid crashes caused by getenv/setenv thread safety issues (ex: #62028).
Related issues
Additional information