Spilt ray serve #4

Draft

Jurgee wants to merge 3 commits into main from feature/spilt-ray-serve

Conversation

@Jurgee (Collaborator) commented Mar 13, 2026

No description provided.

@Jurgee Jurgee self-assigned this Mar 13, 2026
coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Free

Run ID: fa853b55-9917-47d1-96e1-1bb05e604fab

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Note

🎁 Summarized by CodeRabbit Free

The PR author is not assigned a seat. To perform a comprehensive line-by-line review, please assign a seat to the pull request author through the subscription management page by visiting https://app.coderabbit.ai/login.

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use your project's `ruff` configuration to improve the quality of Python code reviews.

Add a Ruff configuration file to your project to customize how CodeRabbit runs ruff.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes a robust and scalable infrastructure for deploying machine learning models using Ray Serve, orchestrated via Kustomize. The primary goal is to streamline the process of defining, configuring, and deploying various model services, ensuring efficient resource utilization and simplified management within a Kubernetes environment. By separating model definitions and dynamically merging them, the system becomes highly adaptable to evolving model landscapes and diverse computational needs.

Highlights

  • Modular Ray Serve Deployment: Introduced a new, modular deployment structure for Ray Serve applications using Kustomize, enabling flexible configuration and management of ML models.
  • Dynamic Model Configuration Merging: Implemented a Python script (merge_models.py) to dynamically combine multiple individual model definitions into a single Ray Serve configuration patch, simplifying the addition of new models.
  • CPU and GPU Worker Support: Configured distinct Kustomize components for defining both CPU and GPU worker groups within the Ray cluster, allowing for optimized resource allocation based on model requirements.
  • Pre-defined Model Services: Included initial definitions for several model services, such as semantic segmentation (episeg), heatmap builder, and prostate classifier, demonstrating the new modular approach.
Changelog
  • deploy.sh
    • Added a new shell script to automate the process of merging model configurations and applying Kustomize overlays for Ray Serve deployment.
  • kustomize/base/kustomization.yaml
    • Created a base Kustomization file to define the foundational resources for the Ray Serve setup.
  • kustomize/base/ray-service-base.yaml
    • Added the core RayService definition, establishing the basic structure for the model-splitting service with default head group specifications.
  • kustomize/components/cpu-workers/cpu-workers-patch.yaml
    • Introduced a Kustomize patch to add CPU worker group specifications, including resource limits, security contexts, and volume mounts, to the RayService.
  • kustomize/components/cpu-workers/kustomization.yaml
    • Created a Kustomization component to apply the CPU worker group patch to the RayService.
  • kustomize/components/gpu-workers/gpu-workers-patch.yaml
    • Introduced a Kustomize patch to add GPU worker group specifications, including GPU resource limits, node selectors, and volume mounts, to the RayService.
  • kustomize/components/gpu-workers/kustomization.yaml
    • Created a Kustomization component to apply the GPU worker group patch to the RayService.
  • kustomize/components/models/kustomization.yaml
    • Created a Kustomization component to apply the dynamically generated serve configuration patch to the RayService.
  • kustomize/components/models/merge_models.py
    • Added a Python script responsible for reading individual model definition YAML files, merging their 'applications' sections, and generating a unified 'serve-config-patch.yaml'.
  • kustomize/components/models/models-definitions/episeg.yaml
    • Added a YAML file defining the 'episeg-1' semantic segmentation model application, including its import path, route prefix, runtime environment, and deployment configurations.
  • kustomize/components/models/models-definitions/heatmap.yaml
    • Added a YAML file defining the 'heatmap-builder' application, specifying its import path, route prefix, runtime environment, and deployment settings.
  • kustomize/components/models/models-definitions/prostate.yaml
    • Added a YAML file defining the 'prostate-classifier-1' binary classification model application, detailing its import path, route prefix, runtime environment, and deployment parameters.
  • kustomize/components/models/serve-config-patch.yaml
    • Added the generated Kustomize patch containing the merged Ray Serve application configurations, which is produced by the merge_models.py script.
  • kustomize/overlays/kustomization.yaml
    • Created an overlay Kustomization file that combines the base RayService definition with the various component patches (models, CPU workers, GPU workers) for a complete deployment.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request sets up a Kustomize-based deployment for a Ray Serve application, splitting configurations for base, models, and workers. The structure is good, but there are several critical and high-severity issues to address. These include using a non-reproducible master branch for a dependency, hardcoding environment-specific proxy URLs across multiple files, and using unencrypted communication. There are also medium-severity maintainability concerns regarding a confusing workflow for a generated file and significant duplication in worker configurations. Addressing these will improve the deployment's robustness, security, and maintainability. Also, the pull request title seems to have a typo ('Spilt' instead of 'Split').

```yaml
import_path: models.semantic_segmentation:app
route_prefix: /episeg-1
runtime_env:
  working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip
```


critical

The working_dir points to a zip archive of the master branch. This is a critical issue for reproducibility and stability. Any change pushed to the master branch will be automatically picked up in deployments, potentially introducing breaking changes or bugs without a corresponding change in this repository. This also affects heatmap.yaml and prostate.yaml.

To ensure predictable and stable deployments, please change this to use an immutable git reference, such as a commit SHA or a tag. For example: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/v1.2.3/model-service-v1.2.3.zip

Comment on lines +41 to +42
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```


high

A hardcoded proxy URL is used. This value is repeated in cpu-workers-patch.yaml and gpu-workers-patch.yaml. Hardcoding environment-specific configuration makes the deployment less portable and harder to manage across different environments (e.g., dev, staging, prod). It also makes the configuration brittle, as any change to the proxy needs to be updated in multiple places.

Consider externalizing this configuration. For example, you could use a ConfigMap and envFrom to inject the proxy settings into your pods.
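A minimal sketch of the suggested pattern (the ConfigMap name is an illustrative assumption, not part of this PR):

```yaml
# Sketch only: centralize the proxy settings in one ConfigMap
# ("proxy-config" is an assumed name)…
apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy-config
data:
  HTTPS_PROXY: http://proxy.ics.muni.cz:3128
# …then each worker container references it via envFrom instead of a
# hardcoded env entry:
#   envFrom:
#     - configMapRef:
#         name: proxy-config
```

With this in place, the proxy value lives in one object per environment, and the three pod specs stay identical across dev/staging/prod.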

Comment on lines +27 to +28
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```


high

This hardcoded proxy URL is also present in ray-service-base.yaml and gpu-workers-patch.yaml. Hardcoding environment-specific configuration makes the deployment less portable and harder to manage.

To improve this, the proxy configuration should be defined in one place and injected into the containers. Using a ConfigMap with envFrom is a common pattern for this in Kubernetes.

Comment on lines +30 to +31
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```


high

This hardcoded proxy URL is repeated from other configuration files (ray-service-base.yaml and cpu-workers-patch.yaml). This duplication makes the configuration difficult to maintain.

It's recommended to centralize this configuration, for instance in a ConfigMap, and reference it in all the necessary pod specs.

```sh
set -e

python kustomize/components/models/merge_models.py
kubectl apply -k kustomize/overlays -n rationai-jobs-ns
```

(no newline at end of file)


medium

It's a good practice to end shell scripts with a newline character. Some tools and shells might behave unexpectedly without it.

Suggested change:

```sh
kubectl apply -k kustomize/overlays -n rationai-jobs-ns
```

(identical content, with a trailing newline added)

Comment on lines +3 to +65
```yaml
value:
  groupName: cpu-workers
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  template:
    spec:
      securityContext:
        fsGroupChangePolicy: OnRootMismatch
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ray-worker
          image: cerit.io/rationai/model-service:2.53.0
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 8
              memory: 16Gi
            requests:
              cpu: 8
              memory: 16Gi
          env:
            - name: HTTPS_PROXY
              value: http://proxy.ics.muni.cz:3128
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsUser: 1000
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "ray stop"]
          volumeMounts:
            - name: data
              mountPath: /mnt/data
            - name: public-data
              mountPath: /mnt/data/Public
            - name: projects
              mountPath: /mnt/projects
            - name: bioptic-tree
              mountPath: /mnt/bioptic_tree
            - name: trt-cache-volume
              mountPath: /mnt/cache

      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data-ro
        - name: public-data
          persistentVolumeClaim:
            claimName: rationai-data-ro-pvc-jobs
        - name: projects
          persistentVolumeClaim:
            claimName: projects-rw
        - name: bioptic-tree
          persistentVolumeClaim:
            claimName: bioptictree-ro
        - name: trt-cache-volume
          persistentVolumeClaim:
            claimName: tensorrt-cache-pvc
```


medium

There is a large amount of configuration duplicated between this file and kustomize/components/gpu-workers/gpu-workers-patch.yaml. Sections like securityContext, lifecycle, volumeMounts, and volumes are identical. This duplication increases the maintenance burden, as changes need to be made in multiple places, increasing the risk of inconsistencies.

Consider refactoring to reduce this duplication. For example, you could use kustomize's patches feature in an overlay to apply common settings to multiple resources.
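One possible shape of that refactor, as a sketch (the file names, the `Component` layout, and the RayService name `rayservice-model-split` are assumptions drawn from this PR, not a tested configuration): move the identical `securityContext`, `lifecycle`, `volumeMounts`, and `volumes` sections into a shared patch file and reference it alongside the group-specific patch in each worker component.

```yaml
# kustomize/components/cpu-workers/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patches:
  # group-specific settings: replicas, CPU/GPU limits, node selectors
  - path: cpu-workers-patch.yaml
    target:
      kind: RayService
      name: rayservice-model-split
  # shared settings: securityContext, lifecycle, volumeMounts, volumes
  # (hypothetical shared file, factored out of both worker patches)
  - path: ../common/worker-common-patch.yaml
    target:
      kind: RayService
      name: rayservice-model-split
```

The GPU component would reference the same `worker-common-patch.yaml`, so a change to the shared volumes or security settings is made once.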

Comment on lines +1 to +51
```python
import os

import yaml


script_dir = os.path.dirname(os.path.abspath(__file__))
models_definitions_dir = os.path.join(script_dir, "models-definitions")
output_file = os.path.join(script_dir, "serve-config-patch.yaml")

model_files = [f for f in os.listdir(models_definitions_dir) if f.endswith(".yaml")]

if not model_files:
    raise RuntimeError(f"No model definition files found in {models_definitions_dir}")

merged_applications = []

for file_name in sorted(model_files):
    file_path = os.path.join(models_definitions_dir, file_name)
    with open(file_path) as f:
        data = yaml.safe_load(f)
    if not data or "applications" not in data:
        raise RuntimeError(f"File {file_name} is missing 'applications' key")
    merged_applications.extend(data["applications"])

serve_config_str = yaml.dump({"applications": merged_applications}, sort_keys=False)


# Literal block scalar wrapper
class LiteralString(str):
    pass


def literal_presenter(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(LiteralString, literal_presenter)

patch = {
    "apiVersion": "ray.io/v1",
    "kind": "RayService",
    "metadata": {"name": "rayservice-model-split"},
    "spec": {"serveConfigV2": LiteralString(serve_config_str)},
}

with open(output_file, "w") as f:
    yaml.dump(patch, f, sort_keys=False)

print(f"Generated {output_file} from {len(model_files)} model files:")
for f in sorted(model_files):
    print(f" - {f}")
```


medium

This script generates serve-config-patch.yaml, which is also checked into version control. However, deploy.sh also runs this script before deployment. This dual approach is confusing and can lead to maintenance problems, such as manual edits being overwritten or outdated generated files being deployed.

To clarify the workflow, please choose one of these patterns:

  1. Fully generated (recommended for CI/CD): Add kustomize/components/models/serve-config-patch.yaml to your .gitignore file and let the deploy.sh script generate it on-the-fly during deployment.
  2. Developer-managed: Remove the execution of this script from deploy.sh. Developers will be responsible for running it locally to update serve-config-patch.yaml and committing the changes.
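If option 1 is chosen, the change could be as small as the following .gitignore entry (sketch; the path is taken from this PR's layout):

```gitignore
# Generated by merge_models.py during deploy.sh; do not commit
kustomize/components/models/serve-config-patch.yaml
```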

```yaml
memory: 12884901888  # 12 GiB
runtime_env:
  env_vars:
    MLFLOW_TRACKING_URI: http://mlflow.rationai-mlflow:5000
```


medium

The MLflow tracking URI is configured to use http, which means the communication is unencrypted. If any sensitive data is exchanged with the MLflow server, it could be exposed on the network. This also applies to prostate.yaml.

If the network between these services is not fully trusted, you should consider enabling TLS on the MLflow server and using an https URI.
