Spilt ray serve #4

Draft

Jurgee wants to merge 3 commits into main from feature/spilt-ray-serve

Conversation

@Jurgee (Collaborator) commented Mar 13, 2026

No description provided.

@Jurgee Jurgee self-assigned this Mar 13, 2026
coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Free

Run ID: fa853b55-9917-47d1-96e1-1bb05e604fab

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Note

🎁 Summarized by CodeRabbit Free

The PR author is not assigned a seat. To perform a comprehensive line-by-line review, please assign a seat to the pull request author through the subscription management page by visiting https://app.coderabbit.ai/login.

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use your project's `ruff` configuration to improve the quality of Python code reviews.

Add a Ruff configuration file to your project to customize how CodeRabbit runs ruff.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes a robust and scalable infrastructure for deploying machine learning models using Ray Serve, orchestrated via Kustomize. The primary goal is to streamline the process of defining, configuring, and deploying various model services, ensuring efficient resource utilization and simplified management within a Kubernetes environment. By separating model definitions and dynamically merging them, the system becomes highly adaptable to evolving model landscapes and diverse computational needs.

Highlights

  • Modular Ray Serve Deployment: Introduced a new, modular deployment structure for Ray Serve applications using Kustomize, enabling flexible configuration and management of ML models.
  • Dynamic Model Configuration Merging: Implemented a Python script (merge_models.py) to dynamically combine multiple individual model definitions into a single Ray Serve configuration patch, simplifying the addition of new models.
  • CPU and GPU Worker Support: Configured distinct Kustomize components for defining both CPU and GPU worker groups within the Ray cluster, allowing for optimized resource allocation based on model requirements.
  • Pre-defined Model Services: Included initial definitions for several model services, such as semantic segmentation (episeg), heatmap builder, and prostate classifier, demonstrating the new modular approach.
Changelog
  • deploy.sh
    • Added a new shell script to automate the process of merging model configurations and applying Kustomize overlays for Ray Serve deployment.
  • kustomize/base/kustomization.yaml
    • Created a base Kustomization file to define the foundational resources for the Ray Serve setup.
  • kustomize/base/ray-service-base.yaml
    • Added the core RayService definition, establishing the basic structure for the model-splitting service with default head group specifications.
  • kustomize/components/cpu-workers/cpu-workers-patch.yaml
    • Introduced a Kustomize patch to add CPU worker group specifications, including resource limits, security contexts, and volume mounts, to the RayService.
  • kustomize/components/cpu-workers/kustomization.yaml
    • Created a Kustomization component to apply the CPU worker group patch to the RayService.
  • kustomize/components/gpu-workers/gpu-workers-patch.yaml
    • Introduced a Kustomize patch to add GPU worker group specifications, including GPU resource limits, node selectors, and volume mounts, to the RayService.
  • kustomize/components/gpu-workers/kustomization.yaml
    • Created a Kustomization component to apply the GPU worker group patch to the RayService.
  • kustomize/components/models/kustomization.yaml
    • Created a Kustomization component to apply the dynamically generated serve configuration patch to the RayService.
  • kustomize/components/models/merge_models.py
    • Added a Python script responsible for reading individual model definition YAML files, merging their 'applications' sections, and generating a unified 'serve-config-patch.yaml'.
  • kustomize/components/models/models-definitions/episeg.yaml
    • Added a YAML file defining the 'episeg-1' semantic segmentation model application, including its import path, route prefix, runtime environment, and deployment configurations.
  • kustomize/components/models/models-definitions/heatmap.yaml
    • Added a YAML file defining the 'heatmap-builder' application, specifying its import path, route prefix, runtime environment, and deployment settings.
  • kustomize/components/models/models-definitions/prostate.yaml
    • Added a YAML file defining the 'prostate-classifier-1' binary classification model application, detailing its import path, route prefix, runtime environment, and deployment parameters.
  • kustomize/components/models/serve-config-patch.yaml
    • Added the generated Kustomize patch containing the merged Ray Serve application configurations, which is produced by the merge_models.py script.
  • kustomize/overlays/kustomization.yaml
    • Created an overlay Kustomization file that combines the base RayService definition with the various component patches (models, CPU workers, GPU workers) for a complete deployment.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request sets up a Kustomize-based deployment for a Ray Serve application, splitting configurations for base, models, and workers. The structure is good, but there are several critical and high-severity issues to address. These include using a non-reproducible master branch for a dependency, hardcoding environment-specific proxy URLs across multiple files, and using unencrypted communication. There are also medium-severity maintainability concerns regarding a confusing workflow for a generated file and significant duplication in worker configurations. Addressing these will improve the deployment's robustness, security, and maintainability. Also, the pull request title seems to have a typo ('Spilt' instead of 'Split').

```yaml
import_path: models.semantic_segmentation:app
route_prefix: /episeg-1
runtime_env:
  working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip
```


critical

The working_dir points to a zip archive of the master branch. This is a critical issue for reproducibility and stability. Any change pushed to the master branch will be automatically picked up in deployments, potentially introducing breaking changes or bugs without a corresponding change in this repository. This also affects heatmap.yaml and prostate.yaml.

To ensure predictable and stable deployments, please change this to use an immutable git reference, such as a commit SHA or a tag. For example: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/v1.2.3/model-service-v1.2.3.zip

Comment on lines +41 to +42
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```


high

A hardcoded proxy URL is used. This value is repeated in cpu-workers-patch.yaml and gpu-workers-patch.yaml. Hardcoding environment-specific configuration makes the deployment less portable and harder to manage across different environments (e.g., dev, staging, prod). It also makes the configuration brittle, as any change to the proxy needs to be updated in multiple places.

Consider externalizing this configuration. For example, you could use a ConfigMap and envFrom to inject the proxy settings into your pods.
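A minimal sketch of the suggested pattern (the ConfigMap name is an illustrative assumption, not part of this PR):

```yaml
# Sketch only: centralize the proxy settings in one ConfigMap
# ("proxy-config" is an assumed name)…
apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy-config
data:
  HTTPS_PROXY: http://proxy.ics.muni.cz:3128
# …then each worker container references it via envFrom instead of a
# hardcoded env entry:
#   envFrom:
#     - configMapRef:
#         name: proxy-config
```

With this in place, the proxy value lives in one object per environment, and the three pod specs stay identical across dev/staging/prod.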

Comment on lines +27 to +28
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```


high

This hardcoded proxy URL is also present in ray-service-base.yaml and gpu-workers-patch.yaml. Hardcoding environment-specific configuration makes the deployment less portable and harder to manage.

To improve this, the proxy configuration should be defined in one place and injected into the containers. Using a ConfigMap with envFrom is a common pattern for this in Kubernetes.

Comment on lines +30 to +31
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```


high

This hardcoded proxy URL is repeated from other configuration files (ray-service-base.yaml and cpu-workers-patch.yaml). This duplication makes the configuration difficult to maintain.

It's recommended to centralize this configuration, for instance in a ConfigMap, and reference it in all the necessary pod specs.

```sh
set -e

python kustomize/components/models/merge_models.py
kubectl apply -k kustomize/overlays -n rationai-jobs-ns
```

(no newline at end of file)


medium

It's a good practice to end shell scripts with a newline character. Some tools and shells might behave unexpectedly without it.

Suggested change:

```sh
kubectl apply -k kustomize/overlays -n rationai-jobs-ns
```

(identical content, with a trailing newline added)

Comment on lines +3 to +65
```yaml
value:
  groupName: cpu-workers
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  template:
    spec:
      securityContext:
        fsGroupChangePolicy: OnRootMismatch
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ray-worker
          image: cerit.io/rationai/model-service:2.53.0
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 8
              memory: 16Gi
            requests:
              cpu: 8
              memory: 16Gi
          env:
            - name: HTTPS_PROXY
              value: http://proxy.ics.muni.cz:3128
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsUser: 1000
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "ray stop"]
          volumeMounts:
            - name: data
              mountPath: /mnt/data
            - name: public-data
              mountPath: /mnt/data/Public
            - name: projects
              mountPath: /mnt/projects
            - name: bioptic-tree
              mountPath: /mnt/bioptic_tree
            - name: trt-cache-volume
              mountPath: /mnt/cache

      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data-ro
        - name: public-data
          persistentVolumeClaim:
            claimName: rationai-data-ro-pvc-jobs
        - name: projects
          persistentVolumeClaim:
            claimName: projects-rw
        - name: bioptic-tree
          persistentVolumeClaim:
            claimName: bioptictree-ro
        - name: trt-cache-volume
          persistentVolumeClaim:
            claimName: tensorrt-cache-pvc
```


medium

There is a large amount of configuration duplicated between this file and kustomize/components/gpu-workers/gpu-workers-patch.yaml. Sections like securityContext, lifecycle, volumeMounts, and volumes are identical. This duplication increases the maintenance burden, as changes need to be made in multiple places, increasing the risk of inconsistencies.

Consider refactoring to reduce this duplication. For example, you could use kustomize's patches feature in an overlay to apply common settings to multiple resources.
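One possible shape of that refactor, as a sketch (the file names, the `Component` layout, and the RayService name `rayservice-model-split` are assumptions drawn from this PR, not a tested configuration): move the identical `securityContext`, `lifecycle`, `volumeMounts`, and `volumes` sections into a shared patch file and reference it alongside the group-specific patch in each worker component.

```yaml
# kustomize/components/cpu-workers/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patches:
  # group-specific settings: replicas, CPU/GPU limits, node selectors
  - path: cpu-workers-patch.yaml
    target:
      kind: RayService
      name: rayservice-model-split
  # shared settings: securityContext, lifecycle, volumeMounts, volumes
  # (hypothetical shared file, factored out of both worker patches)
  - path: ../common/worker-common-patch.yaml
    target:
      kind: RayService
      name: rayservice-model-split
```

The GPU component would reference the same `worker-common-patch.yaml`, so a change to the shared volumes or security settings is made once.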

Comment on lines +1 to +51
```python
import os

import yaml


script_dir = os.path.dirname(os.path.abspath(__file__))
models_definitions_dir = os.path.join(script_dir, "models-definitions")
output_file = os.path.join(script_dir, "serve-config-patch.yaml")

model_files = [f for f in os.listdir(models_definitions_dir) if f.endswith(".yaml")]

if not model_files:
    raise RuntimeError(f"No model definition files found in {models_definitions_dir}")

merged_applications = []

for file_name in sorted(model_files):
    file_path = os.path.join(models_definitions_dir, file_name)
    with open(file_path) as f:
        data = yaml.safe_load(f)
    if not data or "applications" not in data:
        raise RuntimeError(f"File {file_name} is missing 'applications' key")
    merged_applications.extend(data["applications"])

serve_config_str = yaml.dump({"applications": merged_applications}, sort_keys=False)


# Literal block scalar wrapper
class LiteralString(str):
    pass


def literal_presenter(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(LiteralString, literal_presenter)

patch = {
    "apiVersion": "ray.io/v1",
    "kind": "RayService",
    "metadata": {"name": "rayservice-model-split"},
    "spec": {"serveConfigV2": LiteralString(serve_config_str)},
}

with open(output_file, "w") as f:
    yaml.dump(patch, f, sort_keys=False)

print(f"Generated {output_file} from {len(model_files)} model files:")
for f in sorted(model_files):
    print(f" - {f}")
```


medium

This script generates serve-config-patch.yaml, which is also checked into version control. However, deploy.sh also runs this script before deployment. This dual approach is confusing and can lead to maintenance problems, such as manual edits being overwritten or outdated generated files being deployed.

To clarify the workflow, please choose one of these patterns:

  1. Fully generated (recommended for CI/CD): Add kustomize/components/models/serve-config-patch.yaml to your .gitignore file and let the deploy.sh script generate it on-the-fly during deployment.
  2. Developer-managed: Remove the execution of this script from deploy.sh. Developers will be responsible for running it locally to update serve-config-patch.yaml and committing the changes.
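If option 1 is chosen, the change could be as small as the following .gitignore entry (sketch; the path is taken from this PR's layout):

```gitignore
# Generated by merge_models.py during deploy.sh; do not commit
kustomize/components/models/serve-config-patch.yaml
```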

```yaml
memory: 12884901888  # 12 GiB
runtime_env:
  env_vars:
    MLFLOW_TRACKING_URI: http://mlflow.rationai-mlflow:5000
```


medium

The MLflow tracking URI is configured to use http, which means the communication is unencrypted. If any sensitive data is exchanged with the MLflow server, it could be exposed on the network. This also applies to prostate.yaml.

If the network between these services is not fully trusted, you should consider enabling TLS on the MLflow server and using an https URI.
