Conversation
Warning: Rate limit exceeded

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source, and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
📝 Walkthrough

Adds a Virchow2 Ray Serve deployment with FastAPI ingress, tooling to download HuggingFace model snapshots, and a PVC for the cache; updates Dockerfiles and Ray service manifests; removes the deprecated `intra_op_num_threads` option from existing model configs.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant RayServe as Ray Serve (Virchow2)
    participant Provider as providers.huggingface
    participant PVC as HuggingFace PVC
    participant HFHub as HuggingFace Hub
    participant DownloaderJob as Downloader Job
    Client->>RayServe: POST LZ4-compressed images
    RayServe->>PVC: ensure model files available (HF_HOME)
    RayServe->>Provider: request repo_id path
    Provider-->>PVC: return local cache path (may be empty)
    alt cache miss
        RayServe->>DownloaderJob: (scheduled) run downloader job
        DownloaderJob->>HFHub: snapshot_download (with HF_TOKEN)
        HFHub-->>PVC: write model files
    end
    RayServe->>PVC: read model files
    RayServe->>RayServe: preprocess, batch, infer with timm model
    RayServe-->>Client: return embeddings
```
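As an aside on the cache-miss branch above: what makes the PVC shareable between the downloader Job and the Serve replicas is the standard huggingface_hub cache layout under `HF_HOME`. A minimal sketch of the path resolution (the helper name is mine, not from this PR; the `models--org--name` directory convention is the standard hub cache layout):

```python
import os

def hf_cache_dir(repo_id, hf_home=None):
    """Resolve the local hub cache directory for a repo, mirroring the
    models--{org}--{name} layout huggingface_hub uses under $HF_HOME/hub.
    Pure path arithmetic -- no network access, no download."""
    hf_home = hf_home or os.environ.get(
        "HF_HOME", os.path.expanduser("~/.cache/huggingface")
    )
    return os.path.join(hf_home, "hub", "models--" + repo_id.replace("/", "--"))

# A "cache miss" in the flow above is simply this directory being absent
# or empty, which is what the downloader Job exists to fix.
print(hf_cache_dir("paige-ai/Virchow2", hf_home="/mnt/hf"))
# → /mnt/hf/hub/models--paige-ai--Virchow2
```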
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the model serving infrastructure by integrating the Virchow2 foundation model and optimizing existing models for GPU performance. It establishes a robust framework for deploying advanced deep learning models, leveraging TensorRT for efficient inference and Hugging Face for model management. The changes also include necessary infrastructure updates for dependency management and persistent caching, ensuring a more efficient and scalable system.

Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces support for the Virchow2 foundation model within the Ray Serve infrastructure, which is a significant addition. The changes include a new Ray Serve deployment for Virchow2, updates to Dockerfiles for GPU support with necessary dependencies, and Kubernetes configurations for model downloading and caching. The refactoring of existing models to leverage GPU and TensorRT is a great performance enhancement. However, the pull request introduces critical security vulnerabilities by including hardcoded secrets in configuration files. These must be addressed by using a secure secret management solution like Kubernetes Secrets. Additionally, there are minor areas for improvement regarding Docker image consistency and file permissions.
```diff
 containers:
   - name: ray-worker
-    image: cerit.io/rationai/model-service:2.53.0-gpu
+    image: cerit.io/rationai/model-service:latest-gpu
```
✔️ New docker image was pushed
https://github.com/RationAI/model-service/actions/runs/23088868861/job/67069978707
Pull request overview
Adds support for the Virchow2 foundation model (paige-ai/Virchow2) as a new Ray Serve application, including offline Hugging Face caching and container/runtime updates needed to run the model in the existing model-service stack.
Changes:
- Introduces a new `Virchow2` Ray Serve deployment with a `/virchow2` route and Hugging Face cache mounting.
- Adds a Hugging Face model provider plus a Kubernetes Job to pre-download/cache the model for offline serving.
- Updates Docker images/dependencies and removes ONNX Runtime `intra_op_num_threads` configuration plumbing.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| ray-service.yaml | Adds Virchow2 Serve app, mounts HF cache PVC, updates worker/head images and GPU worker settings |
| pvc/huggingface-pvc.yaml | Adds a new RWX PVC for Hugging Face cache storage |
| providers/model_provider.py | Adds a Hugging Face provider helper for (offline) hub downloads |
| models/virchow2.py | New Ray Serve deployment implementing Virchow2 embedding inference via timm + torch |
| models/semantic_segmentation.py | Removes intra_op_num_threads wiring from config/session options |
| models/binary_classifier.py | Removes intra_op_num_threads wiring and reformats ORT call |
| misc/virchow2_downloader/virchow2_downloader_job.yaml | Adds a Job manifest intended to pre-populate the HF cache PVC |
| misc/virchow2_downloader/download_virchow2.py | Adds the downloader script used by the Job |
| docker/Dockerfile.gpu | Adds TensorRT/torch/timm/huggingface-hub dependencies for Virchow2 runtime |
| docker/Dockerfile.cpu | Reformats pip install (currently introduces a Dockerfile syntax issue) |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 3
🧹 Nitpick comments (4)
models/virchow2.py (1)
57-63: Avoid mutating `config["model"]` inside `reconfigure`.

Using `pop` on the incoming config introduces avoidable side effects and can make repeated reconfiguration brittle.

♻️ Proposed refactor

```diff
-        module_path, attr_name = config["model"].pop("_target_").split(":")
+        model_cfg = dict(config["model"])
+        module_path, attr_name = model_cfg.pop("_target_").split(":")
         provider = getattr(importlib.import_module(module_path), attr_name)
-        repo_id = config["model"]["repo_id"]
+        repo_id = model_cfg["repo_id"]
@@
-        provider(**config["model"])
+        provider(**model_cfg)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@models/virchow2.py` around lines 57-63: The code in reconfigure currently mutates config["model"] by calling pop("_target_"); instead, read the target into a local variable and avoid modifying the original dict: extract target = config["model"].get("_target_") (or target_str = config["model"]["_target_"]), split it into module_path and attr_name, import provider as you do now, and pass a shallow copy of the kwargs to provider (e.g., kwargs = dict(config["model"]); kwargs.pop("_target_", None); provider(**kwargs)). Keep repo_id and the logger usage the same but do not call pop on config["model"] itself.

misc/virchow2_downloader/virchow2_downloader_job.yaml (2)
30-33: Pin package versions for reproducibility.

The `pip install` command installs unpinned packages, which could lead to version drift between job runs or unexpected failures if a new incompatible version is released.

📦 Proposed fix

```diff
       args:
         - |
-          pip install --user --no-cache-dir huggingface_hub transformers torch timm
+          pip install --user --no-cache-dir huggingface_hub==0.27.0 transformers==4.47.0 torch==2.5.1 timm==1.0.12
           python3 /mnt/scripts/download_virchow2.py
```

Alternatively, consider building a dedicated downloader image with dependencies pre-installed to eliminate runtime network dependencies on PyPI.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@misc/virchow2_downloader/virchow2_downloader_job.yaml` around lines 30 - 33, The pip install line that runs in the job args ("pip install --user --no-cache-dir huggingface_hub transformers torch timm") is unpinned and should be changed to use explicit version pins (e.g., huggingface_hub==x.y.z, transformers==x.y.z, torch==x.y.z, timm==x.y.z) or replace the runtime install with a prebuilt downloader image that already contains those versions; update the args to install the pinned versions (or switch the job to use the built image) so that running python3 /mnt/scripts/download_virchow2.py uses deterministic dependency versions.
25-28: Consider enabling `readOnlyRootFilesystem: true` for defense-in-depth.

Since `/tmp` is already mounted as an `emptyDir` and `HOME` is set to `/tmp`, pip with `--user` will write packages there. The root filesystem can likely be made read-only, which reduces the attack surface if the container is compromised.

🛡️ Proposed fix

```diff
       securityContext:
         allowPrivilegeEscalation: false
         capabilities:
           drop: ["ALL"]
+        readOnlyRootFilesystem: true
```

Test that the job still completes successfully with `readOnlyRootFilesystem: true` enabled, as some Python operations may require writes outside `/tmp`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@misc/virchow2_downloader/virchow2_downloader_job.yaml` around lines 25-28: The pod spec's securityContext currently sets allowPrivilegeEscalation: false and drops all capabilities but doesn't set readOnlyRootFilesystem; add readOnlyRootFilesystem: true to the same securityContext to harden the container (refer to securityContext, allowPrivilegeEscalation, capabilities in the manifest), then ensure any writable paths (e.g., the existing emptyDir at /tmp and HOME=/tmp used for pip --user installs) remain mounted and writable; after the change run the virchow2_downloader job to verify it completes successfully and adjust mounts or HOME if any Python tooling needs additional writable paths.

ray-service.yaml (1)
256: Consider pinning the GPU worker image tag.

Using `latest-gpu` can lead to unexpected behavior if the image is updated while pods are rescheduled. The head and cpu-worker images use a pinned version (2.53.0), but the gpu-worker uses `latest-gpu`.

📌 Proposed fix

```diff
-        image: cerit.io/rationai/model-service:latest-gpu
+        image: cerit.io/rationai/model-service:2.53.0-gpu
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ray-service.yaml` at line 256, The gpu worker image is using the floating tag "cerit.io/rationai/model-service:latest-gpu" which can cause unpredictable rollouts; change the image reference for the GPU worker to a pinned version (match the head/cpu-worker pattern, e.g. replace "cerit.io/rationai/model-service:latest-gpu" with a specific tag like "cerit.io/rationai/model-service:2.53.0" or "cerit.io/rationai/model-service:2.53.0-gpu") in the container spec that defines the GPU worker so the image is immutable during reschedules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker/Dockerfile.cpu`:
- Around line 54-55: The final RUN pip install command in Dockerfile.cpu ends
with a dangling backslash which breaks Docker parsing; update the RUN
instruction (the "RUN pip install --no-cache-dir onnxruntime lz4 ratiopath \"
line) by removing the trailing backslash or adding the missing continuation
token/package so the command is a complete single-line install or a properly
continued multi-line install, ensuring the command terminates correctly.
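For reference, a well-formed version of that `RUN` instruction could look like the sketch below; the package names are taken from the comment above, and whether the install was meant to be single-line or continued is an assumption:

```dockerfile
# Option 1: terminate the command on a single line (no trailing backslash).
RUN pip install --no-cache-dir onnxruntime lz4 ratiopath

# Option 2: a properly continued multi-line install -- every backslash
# must be followed by a continuation line.
RUN pip install --no-cache-dir \
        onnxruntime \
        lz4 \
        ratiopath
```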
In `@misc/virchow2_downloader/download_virchow2.py`:
- Around line 41-43: The verification exception handler in download_virchow2.py
currently swallows errors with "except Exception as e: print(f\"Verification
warning: {e}\")"; change this so verification failures fail the job by either
re-raising the exception or exiting with a non-zero status (e.g., raise or call
sys.exit(1)), and include the original error message in the log; update the
except block that catches Exception as e to log the error and then raise e (or
call sys.exit(1)) so the process does not exit successfully on verification
failure.
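The fix described above boils down to "verification errors must fail the process". A hedged sketch, with a hypothetical `verify_snapshot` callable standing in for whatever integrity check `download_virchow2.py` actually performs:

```python
import sys

def run_verification(verify_snapshot):
    """Run a verification callable and fail loudly: log the error and
    exit non-zero instead of swallowing it, so the Kubernetes Job is
    marked failed on a bad download."""
    try:
        verify_snapshot()
    except Exception as e:
        print(f"Verification failed: {e}", file=sys.stderr)
        sys.exit(1)

def intact():
    pass  # stand-in for a successful integrity check

def corrupted():
    raise RuntimeError("missing weights file")

run_verification(intact)         # returns normally
try:
    run_verification(corrupted)  # logs to stderr, then exits non-zero
except SystemExit as e:
    print("would exit with code", e.code)  # → would exit with code 1
```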
In `@models/virchow2.py`:
- Around line 98-101: The current context manager unconditionally uses
torch.autocast(device_type="cuda", dtype=torch.float16) which breaks CPU-only
runs; modify the inference context in models/virchow2.py (the with
torch.inference_mode() block that currently nests torch.autocast(...)) to only
enable CUDA autocast when the selected runtime device is CUDA (e.g., check the
model/device variable or torch device.type), otherwise use a no-op context
(contextlib.nullcontext) or skip autocast; ensure you reference the same context
manager expression so CPU paths do not attempt CUDA autocast.
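The device gating described there can be sketched without pulling in torch at all; the `FakeAutocast` stand-in below is mine, and in the real `virchow2.py` the factory would be `torch.autocast(device_type="cuda", dtype=torch.float16)` nested under `torch.inference_mode()`:

```python
from contextlib import nullcontext

def inference_ctx(device_type, make_autocast):
    """Pick the mixed-precision context per device: autocast on CUDA,
    a no-op context everywhere else, so CPU-only runs never touch
    CUDA autocast."""
    return make_autocast() if device_type == "cuda" else nullcontext()

# Stand-in for torch.autocast, used only to show which branch is taken.
class FakeAutocast:
    def __enter__(self):
        print("autocast on")
        return self
    def __exit__(self, *exc):
        return False

with inference_ctx("cpu", FakeAutocast):
    print("cpu path: no autocast")   # nullcontext, FakeAutocast never entered
with inference_ctx("cuda", FakeAutocast):
    print("cuda path")               # "autocast on" is printed first
```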
---
Nitpick comments:
In `@misc/virchow2_downloader/virchow2_downloader_job.yaml`:
- Around line 30-33: The pip install line that runs in the job args ("pip
install --user --no-cache-dir huggingface_hub transformers torch timm") is
unpinned and should be changed to use explicit version pins (e.g.,
huggingface_hub==x.y.z, transformers==x.y.z, torch==x.y.z, timm==x.y.z) or
replace the runtime install with a prebuilt downloader image that already
contains those versions; update the args to install the pinned versions (or
switch the job to use the built image) so that running python3
/mnt/scripts/download_virchow2.py uses deterministic dependency versions.
- Around line 25-28: The pod spec's securityContext currently sets
allowPrivilegeEscalation: false and drops all capabilities but doesn't set
readOnlyRootFilesystem; add readOnlyRootFilesystem: true to the same
securityContext to harden the container (refer to securityContext,
allowPrivilegeEscalation, capabilities in the manifest), then ensure any
writable paths (e.g., the existing emptyDir at /tmp and HOME=/tmp used for pip
--user installs) remain mounted and writable; after the change run the
virchow2_downloader job to verify it completes successfully and adjust mounts or
HOME if any Python tooling needs additional writable paths.
In `@models/virchow2.py`:
- Around line 57-63: The code in reconfigure currently mutates config["model"]
by calling pop("_target_"); instead, read the target into a local variable and
avoid modifying the original dict: extract target =
config["model"].get("_target_") (or target_str = config["model"]["_target_"]),
split it into module_path and attr_name, import provider as you do now, and pass
a shallow copy of the kwargs to provider (e.g., kwargs = dict(config["model"]);
kwargs.pop("_target_", None); provider(**kwargs)). Keep repo_id and the logger
usage the same but do not call pop on config["model"] itself.
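The non-mutating pattern that prompt describes can be isolated into a small helper. This is a sketch under my own naming (the real `reconfigure` also logs and invokes the provider), with `os.path:join` used purely as a resolvable stand-in `_target_`:

```python
import importlib

def resolve_provider(model_section):
    """Resolve a provider from a Hydra-style `_target_` entry without
    mutating the caller's config dict (shallow copy, then pop)."""
    kwargs = dict(model_section)
    module_path, attr_name = kwargs.pop("_target_").split(":")
    provider = getattr(importlib.import_module(module_path), attr_name)
    return provider, kwargs

# Hypothetical config shape; values are placeholders.
config = {"model": {"_target_": "os.path:join", "repo_id": "paige-ai/Virchow2"}}
provider, kwargs = resolve_provider(config["model"])

# The original dict is untouched, so reconfigure can run repeatedly.
assert "_target_" in config["model"]
assert kwargs == {"repo_id": "paige-ai/Virchow2"}
```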
In `@ray-service.yaml`:
- Line 256: The gpu worker image is using the floating tag
"cerit.io/rationai/model-service:latest-gpu" which can cause unpredictable
rollouts; change the image reference for the GPU worker to a pinned version
(match the head/cpu-worker pattern, e.g. replace
"cerit.io/rationai/model-service:latest-gpu" with a specific tag like
"cerit.io/rationai/model-service:2.53.0" or
"cerit.io/rationai/model-service:2.53.0-gpu") in the container spec that defines
the GPU worker so the image is immutable during reschedules.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 745968a1-d5c6-4efc-bab4-bab8d3a8ff7b
📒 Files selected for processing (10)
- docker/Dockerfile.cpu
- docker/Dockerfile.gpu
- misc/virchow2_downloader/download_virchow2.py
- misc/virchow2_downloader/virchow2_downloader_job.yaml
- models/binary_classifier.py
- models/semantic_segmentation.py
- models/virchow2.py
- providers/model_provider.py
- pvc/huggingface-pvc.yaml
- ray-service.yaml
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ray-service.yaml`:
- Line 256: The image tag for the model service is using a mutable tag
("cerit.io/rationai/model-service:latest-gpu") which can drift from the pinned
rayVersion (rayVersion: 2.53.0); update the image field to a specific, versioned
tag that matches the pinned rayVersion pattern used elsewhere (follow the
pattern at the other image entries near the keys on lines where images are
pinned), e.g. replace "latest-gpu" with the explicit version string used for
rayVersion so the model-service image is deterministically tied to rayVersion.
- Around line 92-96: The working_dir currently points at a branch zip
(working_dir:
https://github.com/RationAI/model-service/archive/refs/heads/master.zip) which
is mutable; replace that URL with one that pins a specific immutable commit SHA
(e.g., https://github.com/RationAI/model-service/archive/<COMMIT_SHA>.zip), so
update the runtime_env.working_dir to reference the chosen commit SHA, and
ensure any deployment automation that updates the repo replaces the SHA
intentionally when you want a new release; look for runtime_env and working_dir
in the diff to locate the change.
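For illustration, the pinned form of that `runtime_env` entry would look like the fragment below; `<COMMIT_SHA>` is deliberately left as a placeholder for whichever commit gets released:

```yaml
runtime_env:
  # Immutable: a commit-SHA archive URL can never change contents,
  # unlike refs/heads/master.zip, which moves with every push.
  working_dir: https://github.com/RationAI/model-service/archive/<COMMIT_SHA>.zip
```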
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a0be4b40-3f9b-409a-989c-446443aa4506
📒 Files selected for processing (2)
- docker/Dockerfile.cpu
- ray-service.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- docker/Dockerfile.cpu
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This PR introduces support for the Virchow2 foundation model (paige-ai/Virchow2) within the Ray Serve infrastructure.
New Model Deployment: Added virchow2.py implementing the Virchow2 class as a Ray Serve deployment