Conversation
Warning: Rate limit exceeded

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source, and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
📝 Walkthrough

Adds a Virchow2 Ray Serve deployment with FastAPI ingress, tooling to download HuggingFace model snapshots, and a PVC for the cache; updates Dockerfiles and Ray service manifests; removes the deprecated `intra_op_num_threads` option from existing model configs.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant RayServe as Ray Serve (Virchow2)
    participant Provider as providers.huggingface
    participant PVC as HuggingFace PVC
    participant HFHub as HuggingFace Hub
    participant DownloaderJob as Downloader Job
    Client->>RayServe: POST LZ4-compressed images
    RayServe->>PVC: ensure model files available (HF_HOME)
    RayServe->>Provider: request repo_id path
    Provider-->>PVC: return local cache path (may be empty)
    alt cache miss
        RayServe->>DownloaderJob: (scheduled) run downloader job
        DownloaderJob->>HFHub: snapshot_download (with HF_TOKEN)
        HFHub-->>PVC: write model files
    end
    RayServe->>PVC: read model files
    RayServe->>RayServe: preprocess, batch, infer with timm model
    RayServe-->>Client: return embeddings
```
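As an aside on the cache-miss branch above: what makes the PVC shareable between the downloader Job and the Serve replicas is the standard huggingface_hub cache layout under `HF_HOME`. A minimal sketch of the path resolution (the helper name is mine, not from this PR; the `models--org--name` directory convention is the standard hub cache layout):

```python
import os

def hf_cache_dir(repo_id, hf_home=None):
    """Resolve the local hub cache directory for a repo, mirroring the
    models--{org}--{name} layout huggingface_hub uses under $HF_HOME/hub.
    Pure path arithmetic -- no network access, no download."""
    hf_home = hf_home or os.environ.get(
        "HF_HOME", os.path.expanduser("~/.cache/huggingface")
    )
    return os.path.join(hf_home, "hub", "models--" + repo_id.replace("/", "--"))

# A "cache miss" in the flow above is simply this directory being absent
# or empty, which is what the downloader Job exists to fix.
print(hf_cache_dir("paige-ai/Virchow2", hf_home="/mnt/hf"))
# → /mnt/hf/hub/models--paige-ai--Virchow2
```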
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the model serving infrastructure by integrating the Virchow2 foundation model and optimizing existing models for GPU performance. It establishes a robust framework for deploying advanced deep learning models, leveraging TensorRT for efficient inference and Hugging Face for model management. The changes also include necessary infrastructure updates for dependency management and persistent caching, ensuring a more efficient and scalable system.

Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces support for the Virchow2 foundation model within the Ray Serve infrastructure, which is a significant addition. The changes include a new Ray Serve deployment for Virchow2, updates to Dockerfiles for GPU support with necessary dependencies, and Kubernetes configurations for model downloading and caching. The refactoring of existing models to leverage GPU and TensorRT is a great performance enhancement. However, the pull request introduces critical security vulnerabilities by including hardcoded secrets in configuration files. These must be addressed by using a secure secret management solution like Kubernetes Secrets. Additionally, there are minor areas for improvement regarding Docker image consistency and file permissions.
```diff
 containers:
   - name: ray-worker
-    image: cerit.io/rationai/model-service:2.53.0-gpu
+    image: cerit.io/rationai/model-service:latest-gpu
```
✔️ New docker image was pushed
https://github.com/RationAI/model-service/actions/runs/23088868861/job/67069978707
Pull request overview
Adds support for the Virchow2 foundation model (paige-ai/Virchow2) as a new Ray Serve application, including offline Hugging Face caching and container/runtime updates needed to run the model in the existing model-service stack.
Changes:
- Introduces a new `Virchow2` Ray Serve deployment with a `/virchow2` route and Hugging Face cache mounting.
- Adds a Hugging Face model provider plus a Kubernetes Job to pre-download/cache the model for offline serving.
- Updates Docker images/dependencies and removes ONNX Runtime `intra_op_num_threads` configuration plumbing.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| ray-service.yaml | Adds Virchow2 Serve app, mounts HF cache PVC, updates worker/head images and GPU worker settings |
| pvc/huggingface-pvc.yaml | Adds a new RWX PVC for Hugging Face cache storage |
| providers/model_provider.py | Adds a Hugging Face provider helper for (offline) hub downloads |
| models/virchow2.py | New Ray Serve deployment implementing Virchow2 embedding inference via timm + torch |
| models/semantic_segmentation.py | Removes intra_op_num_threads wiring from config/session options |
| models/binary_classifier.py | Removes intra_op_num_threads wiring and reformats ORT call |
| misc/virchow2_downloader/virchow2_downloader_job.yaml | Adds a Job manifest intended to pre-populate the HF cache PVC |
| misc/virchow2_downloader/download_virchow2.py | Adds the downloader script used by the Job |
| docker/Dockerfile.gpu | Adds TensorRT/torch/timm/huggingface-hub dependencies for Virchow2 runtime |
| docker/Dockerfile.cpu | Reformats pip install (currently introduces a Dockerfile syntax issue) |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 3
🧹 Nitpick comments (4)
models/virchow2.py (1)
57-63: Avoid mutating `config["model"]` inside `reconfigure`.

Using `pop` on the incoming config introduces avoidable side effects and can make repeated reconfiguration brittle.

♻️ Proposed refactor

```diff
-        module_path, attr_name = config["model"].pop("_target_").split(":")
+        model_cfg = dict(config["model"])
+        module_path, attr_name = model_cfg.pop("_target_").split(":")
         provider = getattr(importlib.import_module(module_path), attr_name)
-        repo_id = config["model"]["repo_id"]
+        repo_id = model_cfg["repo_id"]
@@
-        provider(**config["model"])
+        provider(**model_cfg)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@models/virchow2.py` around lines 57-63: The code in reconfigure currently mutates config["model"] by calling pop("_target_"); instead, read the target into a local variable and avoid modifying the original dict: extract target = config["model"].get("_target_") (or target_str = config["model"]["_target_"]), split it into module_path and attr_name, import provider as you do now, and pass a shallow copy of the kwargs to provider (e.g., kwargs = dict(config["model"]); kwargs.pop("_target_", None); provider(**kwargs)). Keep repo_id and the logger usage the same but do not call pop on config["model"] itself.

misc/virchow2_downloader/virchow2_downloader_job.yaml (2)
30-33: Pin package versions for reproducibility.

The `pip install` command installs unpinned packages, which could lead to version drift between job runs or unexpected failures if a new incompatible version is released.

📦 Proposed fix

```diff
       args:
         - |
-          pip install --user --no-cache-dir huggingface_hub transformers torch timm
+          pip install --user --no-cache-dir huggingface_hub==0.27.0 transformers==4.47.0 torch==2.5.1 timm==1.0.12
           python3 /mnt/scripts/download_virchow2.py
```

Alternatively, consider building a dedicated downloader image with dependencies pre-installed to eliminate runtime network dependencies on PyPI.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@misc/virchow2_downloader/virchow2_downloader_job.yaml` around lines 30 - 33, The pip install line that runs in the job args ("pip install --user --no-cache-dir huggingface_hub transformers torch timm") is unpinned and should be changed to use explicit version pins (e.g., huggingface_hub==x.y.z, transformers==x.y.z, torch==x.y.z, timm==x.y.z) or replace the runtime install with a prebuilt downloader image that already contains those versions; update the args to install the pinned versions (or switch the job to use the built image) so that running python3 /mnt/scripts/download_virchow2.py uses deterministic dependency versions.
25-28: Consider enabling `readOnlyRootFilesystem: true` for defense-in-depth.

Since `/tmp` is already mounted as an `emptyDir` and `HOME` is set to `/tmp`, pip with `--user` will write packages there. The root filesystem can likely be made read-only, which reduces the attack surface if the container is compromised.

🛡️ Proposed fix

```diff
       securityContext:
         allowPrivilegeEscalation: false
         capabilities:
           drop: ["ALL"]
+        readOnlyRootFilesystem: true
```

Test that the job still completes successfully with `readOnlyRootFilesystem: true` enabled, as some Python operations may require writes outside `/tmp`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@misc/virchow2_downloader/virchow2_downloader_job.yaml` around lines 25-28: The pod spec's securityContext currently sets allowPrivilegeEscalation: false and drops all capabilities but doesn't set readOnlyRootFilesystem; add readOnlyRootFilesystem: true to the same securityContext to harden the container (refer to securityContext, allowPrivilegeEscalation, capabilities in the manifest), then ensure any writable paths (e.g., the existing emptyDir at /tmp and HOME=/tmp used for pip --user installs) remain mounted and writable; after the change run the virchow2_downloader job to verify it completes successfully and adjust mounts or HOME if any Python tooling needs additional writable paths.

ray-service.yaml (1)
256: Consider pinning the GPU worker image tag.

Using `latest-gpu` can lead to unexpected behavior if the image is updated while pods are rescheduled. The head and cpu-worker images use a pinned version (2.53.0), but the gpu-worker uses `latest-gpu`.

📌 Proposed fix

```diff
-        image: cerit.io/rationai/model-service:latest-gpu
+        image: cerit.io/rationai/model-service:2.53.0-gpu
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ray-service.yaml` at line 256, The gpu worker image is using the floating tag "cerit.io/rationai/model-service:latest-gpu" which can cause unpredictable rollouts; change the image reference for the GPU worker to a pinned version (match the head/cpu-worker pattern, e.g. replace "cerit.io/rationai/model-service:latest-gpu" with a specific tag like "cerit.io/rationai/model-service:2.53.0" or "cerit.io/rationai/model-service:2.53.0-gpu") in the container spec that defines the GPU worker so the image is immutable during reschedules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker/Dockerfile.cpu`:
- Around line 54-55: The final RUN pip install command in Dockerfile.cpu ends
with a dangling backslash which breaks Docker parsing; update the RUN
instruction (the "RUN pip install --no-cache-dir onnxruntime lz4 ratiopath \"
line) by removing the trailing backslash or adding the missing continuation
token/package so the command is a complete single-line install or a properly
continued multi-line install, ensuring the command terminates correctly.
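For reference, a well-formed version of that `RUN` instruction could look like the sketch below; the package names are taken from the comment above, and whether the install was meant to be single-line or continued is an assumption:

```dockerfile
# Option 1: terminate the command on a single line (no trailing backslash).
RUN pip install --no-cache-dir onnxruntime lz4 ratiopath

# Option 2: a properly continued multi-line install -- every backslash
# must be followed by a continuation line.
RUN pip install --no-cache-dir \
        onnxruntime \
        lz4 \
        ratiopath
```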
In `@misc/virchow2_downloader/download_virchow2.py`:
- Around line 41-43: The verification exception handler in download_virchow2.py
currently swallows errors with "except Exception as e: print(f\"Verification
warning: {e}\")"; change this so verification failures fail the job by either
re-raising the exception or exiting with a non-zero status (e.g., raise or call
sys.exit(1)), and include the original error message in the log; update the
except block that catches Exception as e to log the error and then raise e (or
call sys.exit(1)) so the process does not exit successfully on verification
failure.
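The fix described above boils down to "verification errors must fail the process". A hedged sketch, with a hypothetical `verify_snapshot` callable standing in for whatever integrity check `download_virchow2.py` actually performs:

```python
import sys

def run_verification(verify_snapshot):
    """Run a verification callable and fail loudly: log the error and
    exit non-zero instead of swallowing it, so the Kubernetes Job is
    marked failed on a bad download."""
    try:
        verify_snapshot()
    except Exception as e:
        print(f"Verification failed: {e}", file=sys.stderr)
        sys.exit(1)

def intact():
    pass  # stand-in for a successful integrity check

def corrupted():
    raise RuntimeError("missing weights file")

run_verification(intact)         # returns normally
try:
    run_verification(corrupted)  # logs to stderr, then exits non-zero
except SystemExit as e:
    print("would exit with code", e.code)  # → would exit with code 1
```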
In `@models/virchow2.py`:
- Around line 98-101: The current context manager unconditionally uses
torch.autocast(device_type="cuda", dtype=torch.float16) which breaks CPU-only
runs; modify the inference context in models/virchow2.py (the with
torch.inference_mode() block that currently nests torch.autocast(...)) to only
enable CUDA autocast when the selected runtime device is CUDA (e.g., check the
model/device variable or torch device.type), otherwise use a no-op context
(contextlib.nullcontext) or skip autocast; ensure you reference the same context
manager expression so CPU paths do not attempt CUDA autocast.
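The device gating described there can be sketched without pulling in torch at all; the `FakeAutocast` stand-in below is mine, and in the real `virchow2.py` the factory would be `torch.autocast(device_type="cuda", dtype=torch.float16)` nested under `torch.inference_mode()`:

```python
from contextlib import nullcontext

def inference_ctx(device_type, make_autocast):
    """Pick the mixed-precision context per device: autocast on CUDA,
    a no-op context everywhere else, so CPU-only runs never touch
    CUDA autocast."""
    return make_autocast() if device_type == "cuda" else nullcontext()

# Stand-in for torch.autocast, used only to show which branch is taken.
class FakeAutocast:
    def __enter__(self):
        print("autocast on")
        return self
    def __exit__(self, *exc):
        return False

with inference_ctx("cpu", FakeAutocast):
    print("cpu path: no autocast")   # nullcontext, FakeAutocast never entered
with inference_ctx("cuda", FakeAutocast):
    print("cuda path")               # "autocast on" is printed first
```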
---
Nitpick comments:
In `@misc/virchow2_downloader/virchow2_downloader_job.yaml`:
- Around line 30-33: The pip install line that runs in the job args ("pip
install --user --no-cache-dir huggingface_hub transformers torch timm") is
unpinned and should be changed to use explicit version pins (e.g.,
huggingface_hub==x.y.z, transformers==x.y.z, torch==x.y.z, timm==x.y.z) or
replace the runtime install with a prebuilt downloader image that already
contains those versions; update the args to install the pinned versions (or
switch the job to use the built image) so that running python3
/mnt/scripts/download_virchow2.py uses deterministic dependency versions.
- Around line 25-28: The pod spec's securityContext currently sets
allowPrivilegeEscalation: false and drops all capabilities but doesn't set
readOnlyRootFilesystem; add readOnlyRootFilesystem: true to the same
securityContext to harden the container (refer to securityContext,
allowPrivilegeEscalation, capabilities in the manifest), then ensure any
writable paths (e.g., the existing emptyDir at /tmp and HOME=/tmp used for pip
--user installs) remain mounted and writable; after the change run the
virchow2_downloader job to verify it completes successfully and adjust mounts or
HOME if any Python tooling needs additional writable paths.
In `@models/virchow2.py`:
- Around line 57-63: The code in reconfigure currently mutates config["model"]
by calling pop("_target_"); instead, read the target into a local variable and
avoid modifying the original dict: extract target =
config["model"].get("_target_") (or target_str = config["model"]["_target_"]),
split it into module_path and attr_name, import provider as you do now, and pass
a shallow copy of the kwargs to provider (e.g., kwargs = dict(config["model"]);
kwargs.pop("_target_", None); provider(**kwargs)). Keep repo_id and the logger
usage the same but do not call pop on config["model"] itself.
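The non-mutating pattern that prompt describes can be isolated into a small helper. This is a sketch under my own naming (the real `reconfigure` also logs and invokes the provider), with `os.path:join` used purely as a resolvable stand-in `_target_`:

```python
import importlib

def resolve_provider(model_section):
    """Resolve a provider from a Hydra-style `_target_` entry without
    mutating the caller's config dict (shallow copy, then pop)."""
    kwargs = dict(model_section)
    module_path, attr_name = kwargs.pop("_target_").split(":")
    provider = getattr(importlib.import_module(module_path), attr_name)
    return provider, kwargs

# Hypothetical config shape; values are placeholders.
config = {"model": {"_target_": "os.path:join", "repo_id": "paige-ai/Virchow2"}}
provider, kwargs = resolve_provider(config["model"])

# The original dict is untouched, so reconfigure can run repeatedly.
assert "_target_" in config["model"]
assert kwargs == {"repo_id": "paige-ai/Virchow2"}
```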
In `@ray-service.yaml`:
- Line 256: The gpu worker image is using the floating tag
"cerit.io/rationai/model-service:latest-gpu" which can cause unpredictable
rollouts; change the image reference for the GPU worker to a pinned version
(match the head/cpu-worker pattern, e.g. replace
"cerit.io/rationai/model-service:latest-gpu" with a specific tag like
"cerit.io/rationai/model-service:2.53.0" or
"cerit.io/rationai/model-service:2.53.0-gpu") in the container spec that defines
the GPU worker so the image is immutable during reschedules.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 745968a1-d5c6-4efc-bab4-bab8d3a8ff7b
📒 Files selected for processing (10)
- docker/Dockerfile.cpu
- docker/Dockerfile.gpu
- misc/virchow2_downloader/download_virchow2.py
- misc/virchow2_downloader/virchow2_downloader_job.yaml
- models/binary_classifier.py
- models/semantic_segmentation.py
- models/virchow2.py
- providers/model_provider.py
- pvc/huggingface-pvc.yaml
- ray-service.yaml
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ray-service.yaml`:
- Line 256: The image tag for the model service is using a mutable tag
("cerit.io/rationai/model-service:latest-gpu") which can drift from the pinned
rayVersion (rayVersion: 2.53.0); update the image field to a specific, versioned
tag that matches the pinned rayVersion pattern used elsewhere (follow the
pattern at the other image entries near the keys on lines where images are
pinned), e.g. replace "latest-gpu" with the explicit version string used for
rayVersion so the model-service image is deterministically tied to rayVersion.
- Around line 92-96: The working_dir currently points at a branch zip
(working_dir:
https://github.com/RationAI/model-service/archive/refs/heads/master.zip) which
is mutable; replace that URL with one that pins a specific immutable commit SHA
(e.g., https://github.com/RationAI/model-service/archive/<COMMIT_SHA>.zip), so
update the runtime_env.working_dir to reference the chosen commit SHA, and
ensure any deployment automation that updates the repo replaces the SHA
intentionally when you want a new release; look for runtime_env and working_dir
in the diff to locate the change.
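For illustration, the pinned form of that `runtime_env` entry would look like the fragment below; `<COMMIT_SHA>` is deliberately left as a placeholder for whichever commit gets released:

```yaml
runtime_env:
  # Immutable: a commit-SHA archive URL can never change contents,
  # unlike refs/heads/master.zip, which moves with every push.
  working_dir: https://github.com/RationAI/model-service/archive/<COMMIT_SHA>.zip
```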
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a0be4b40-3f9b-409a-989c-446443aa4506
📒 Files selected for processing (2)
- docker/Dockerfile.cpu
- ray-service.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- docker/Dockerfile.cpu
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This PR introduces support for the Virchow2 foundation model (paige-ai/Virchow2) within the Ray Serve infrastructure.
New Model Deployment: Added virchow2.py implementing the Virchow2 class as a Ray Serve deployment