feat(models): vllm k8s support#305
Draft
benmccown wants to merge 5 commits into
Draft
Conversation
Signed-off-by: Ben McCown <bmccown@nvidia.com>
Fixes found validating the k8s vLLM path on a real cluster: - model-source and health-path are stored as annotations, not labels (their values contain '/', invalid for k8s label values). - puller Job runs the image entrypoint via container args, not command (command would override the entrypoint and exec the first token). - puller HF_ENDPOINT resolves the cluster-routable Files URL (service discovery / base_url), bypassing the in-process local-service routing that returns localhost (unreachable from the puller pod). - release the RWO weights volume before serving: delete the completed puller Job at P3 so its pod releases the volume attachment, avoiding a Multi-Attach error when the server pod schedules on another node. Only on the success path; a failed Job is kept for status/log reporting. - status reads the Deployment first (source of truth once created); a missing Job with the PVC still present resumes P3 instead of reporting LOST (prevents a re-pull loop via drift recovery). - pod securityContext: do not force runAsUser on the server pod (images like vLLM lack a passwd entry for uid 1000 -> getpwuid crash); the puller keeps fsGroup so it can write the freshly-provisioned PVC. - orphan listing tolerates NIM CRD 403 quietly on the vLLM-only path. Signed-off-by: Ben McCown <bmccown@nvidia.com>
Teardown deletes every resource a deployment could own, by name, across
both the operator path (NIMService/NIMCache CRs) and the directly-emitted
vLLM path (Deployment/Service/Job/PVC). delete_model_deployment has only
workspace/name (no engine/config -- it is also driven by orphan
reconciliation), so attempting all resource types by name is the correct,
idempotent, self-healing teardown.
Previously a single delete failure (notably a 403 deleting a NIMService
when the ServiceAccount lacked that RBAC on the vLLM-only path) raised
out of the routine and aborted the whole teardown, leaving the deployment
stuck in DELETING and re-erroring every reconcile.
Now each delete is independent and 404-tolerant ('already gone' is
success); non-404 failures are logged concisely (no stack trace) and
aggregated, so one resource's failure never blocks the others, and the
result is ERROR (visible/stuck) rather than a false DELETED that would
orphan cluster resources.
The platform Helm chart's models ServiceAccount must grant get/list/
watch/delete on apps.nvidia.com nimservices/nimcaches (in addition to the
core/apps/batch resources) so engine-agnostic teardown does not 403.
Signed-off-by: Ben McCown <bmccown@nvidia.com>
The vLLM puller + server pods previously inherited the NIM-oriented default securityContext (or none). Running the server as an arbitrary uid 1000 made torch/inductor crash at startup (getpass.getuser -> pwd.getpwuid: 'uid not found: 1000') because the vllm/vllm-openai image has no /etc/passwd entry for 1000, and running unset defaulted to root. Inspecting the image, its provisioned non-root user is 'vllm' (uid 2000, gid 0) with a real passwd entry. Add dedicated default_vllm_user_id (2000) / default_vllm_group_id (0) config and apply them to both the puller and the server: the server runs as a non-root user that resolves cleanly, and the puller writes the weights under the same uid/gid so the server can read them. Operators can override via config; we do not hardcode root. Signed-off-by: Ben McCown <bmccown@nvidia.com>
… raw-object migration When NIM is migrated off k8s-nim-operator onto the shared raw-object compilers (vllm_k8s_compiler), the securityContext uid/gid params are engine-specific and must not be shared: vLLM uses 2000/0 (its image's 'vllm' user, which has an /etc/passwd entry), while NIM images expect the operator's historical 1000/2000. Reusing default_vllm_user_id for NIM (or hardcoding either in the compiler) would reintroduce the getpwuid crash or break NIM. Add a FUTURE note in the compiler module docstring and cross-references on the NIM uid/gid config fields and the vLLM puller call site so a future implementer (possibly not us) does not hit this. Signed-off-by: Ben McCown <bmccown@nvidia.com>
Contributor
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.