Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request establishes a robust and scalable infrastructure for deploying machine learning models using Ray Serve, orchestrated via Kustomize. The primary goal is to streamline the process of defining, configuring, and deploying model services, ensuring efficient resource utilization and simplified management within a Kubernetes environment. By separating model definitions and dynamically merging them, the system becomes adaptable to evolving model landscapes and diverse computational needs.
Code Review
This pull request sets up a Kustomize-based deployment for a Ray Serve application, splitting configurations for base, models, and workers. The structure is good, but there are several critical and high-severity issues to address. These include using a non-reproducible master branch for a dependency, hardcoding environment-specific proxy URLs across multiple files, and using unencrypted communication. There are also medium-severity maintainability concerns regarding a confusing workflow for a generated file and significant duplication in worker configurations. Addressing these will improve the deployment's robustness, security, and maintainability. Also, the pull request title seems to have a typo ('Spilt' instead of 'Split').
```yaml
import_path: models.semantic_segmentation:app
route_prefix: /episeg-1
runtime_env:
  working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip
```
The working_dir points to a zip archive of the master branch. This is a critical issue for reproducibility and stability. Any change pushed to the master branch will be automatically picked up in deployments, potentially introducing breaking changes or bugs without a corresponding change in this repository. This also affects heatmap.yaml and prostate.yaml.
To ensure predictable and stable deployments, please change this to use an immutable git reference, such as a commit SHA or a tag. For example: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/v1.2.3/model-service-v1.2.3.zip
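As a sketch, the model definition's `runtime_env` would then look like this (`v1.2.3` is a placeholder tag taken from the suggestion above; substitute a real tag or commit SHA from the model-service repository):

```yaml
runtime_env:
  # Immutable ref: the deployed code only changes when this line changes.
  working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/v1.2.3/model-service-v1.2.3.zip
```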
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```
A hardcoded proxy URL is used. This value is repeated in cpu-workers-patch.yaml and gpu-workers-patch.yaml. Hardcoding environment-specific configuration makes the deployment less portable and harder to manage across different environments (e.g., dev, staging, prod). It also makes the configuration brittle, as any change to the proxy needs to be updated in multiple places.
Consider externalizing this configuration. For example, you could use a ConfigMap and envFrom to inject the proxy settings into your pods.
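A minimal sketch of that pattern, assuming an illustrative ConfigMap name (`proxy-config`):

```yaml
# ConfigMap holding the environment-specific proxy settings in one place.
apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy-config   # illustrative name
data:
  HTTP_PROXY: http://proxy.ics.muni.cz:3128
  HTTPS_PROXY: http://proxy.ics.muni.cz:3128
---
# In each pod spec, the inline env entries would then be replaced with:
#   containers:
#     - name: ray-worker
#       envFrom:
#         - configMapRef:
#             name: proxy-config
```

With this in place, a proxy change is a one-line edit to the ConfigMap instead of three edits across patch files.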
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```
This hardcoded proxy URL is also present in ray-service-base.yaml and gpu-workers-patch.yaml. Hardcoding environment-specific configuration makes the deployment less portable and harder to manage.
To improve this, the proxy configuration should be defined in one place and injected into the containers. Using a ConfigMap with envFrom is a common pattern for this in Kubernetes.
```yaml
- name: HTTPS_PROXY
  value: http://proxy.ics.muni.cz:3128
```
This hardcoded proxy URL is repeated from other configuration files (ray-service-base.yaml and cpu-workers-patch.yaml). This duplication makes the configuration difficult to maintain.
It's recommended to centralize this configuration, for instance in a ConfigMap, and reference it in all the necessary pod specs.
```sh
set -e

python kustomize/components/models/merge_models.py
kubectl apply -k kustomize/overlays -n rationai-jobs-ns
```
```yaml
value:
  groupName: cpu-workers
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  template:
    spec:
      securityContext:
        fsGroupChangePolicy: OnRootMismatch
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ray-worker
          image: cerit.io/rationai/model-service:2.53.0
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 8
              memory: 16Gi
            requests:
              cpu: 8
              memory: 16Gi
          env:
            - name: HTTPS_PROXY
              value: http://proxy.ics.muni.cz:3128
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsUser: 1000
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "ray stop"]
          volumeMounts:
            - name: data
              mountPath: /mnt/data
            - name: public-data
              mountPath: /mnt/data/Public
            - name: projects
              mountPath: /mnt/projects
            - name: bioptic-tree
              mountPath: /mnt/bioptic_tree
            - name: trt-cache-volume
              mountPath: /mnt/cache
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data-ro
        - name: public-data
          persistentVolumeClaim:
            claimName: rationai-data-ro-pvc-jobs
        - name: projects
          persistentVolumeClaim:
            claimName: projects-rw
        - name: bioptic-tree
          persistentVolumeClaim:
            claimName: bioptictree-ro
        - name: trt-cache-volume
          persistentVolumeClaim:
            claimName: tensorrt-cache-pvc
```
There is a large amount of configuration duplicated between this file and kustomize/components/gpu-workers/gpu-workers-patch.yaml. Sections like securityContext, lifecycle, volumeMounts, and volumes are identical. This duplication increases the maintenance burden, as changes need to be made in multiple places, increasing the risk of inconsistencies.
Consider refactoring to reduce this duplication. For example, you could use kustomize's patches feature in an overlay to apply common settings to multiple resources.
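One possible shape for this, sketched with an illustrative file name (`common-worker-patch.yaml`). Note that because the worker groups are entries in a list inside a single RayService resource, a strategic-merge patch may not merge them cleanly on a CRD, so a shared patch targeted at the RayService (or a JSON 6902 patch per group index) is one option:

```yaml
# kustomization.yaml (overlay) — illustrative sketch, not the repo's actual layout.
# Applies one shared settings file instead of duplicating securityContext,
# lifecycle, volumeMounts, and volumes in cpu-workers-patch.yaml and
# gpu-workers-patch.yaml.
patches:
  - path: common-worker-patch.yaml   # hypothetical file holding the shared fields
    target:
      kind: RayService
      name: rayservice-model-split
```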
```python
import os

import yaml

script_dir = os.path.dirname(os.path.abspath(__file__))
models_definitions_dir = os.path.join(script_dir, "models-definitions")
output_file = os.path.join(script_dir, "serve-config-patch.yaml")

model_files = [f for f in os.listdir(models_definitions_dir) if f.endswith(".yaml")]

if not model_files:
    raise RuntimeError(f"No model definition files found in {models_definitions_dir}")

merged_applications = []

for file_name in sorted(model_files):
    file_path = os.path.join(models_definitions_dir, file_name)
    with open(file_path) as f:
        data = yaml.safe_load(f)
    if not data or "applications" not in data:
        raise RuntimeError(f"File {file_name} is missing 'applications' key")
    merged_applications.extend(data["applications"])

serve_config_str = yaml.dump({"applications": merged_applications}, sort_keys=False)


# Literal block scalar wrapper
class LiteralString(str):
    pass


def literal_presenter(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(LiteralString, literal_presenter)

patch = {
    "apiVersion": "ray.io/v1",
    "kind": "RayService",
    "metadata": {"name": "rayservice-model-split"},
    "spec": {"serveConfigV2": LiteralString(serve_config_str)},
}

with open(output_file, "w") as f:
    yaml.dump(patch, f, sort_keys=False)

print(f"Generated {output_file} from {len(model_files)} model files:")
for f in sorted(model_files):
    print(f"  - {f}")
```
This script generates serve-config-patch.yaml, which is also checked into version control. However, deploy.sh also runs this script before deployment. This dual approach is confusing and can lead to maintenance problems, such as manual edits being overwritten or outdated generated files being deployed.
To clarify the workflow, please choose one of these patterns:
- Fully generated (recommended for CI/CD): Add `kustomize/components/models/serve-config-patch.yaml` to your `.gitignore` file and let the `deploy.sh` script generate it on-the-fly during deployment.
- Developer-managed: Remove the execution of this script from `deploy.sh`. Developers will be responsible for running it locally to update `serve-config-patch.yaml` and committing the changes.
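The `LiteralString` presenter in the script above is what makes PyYAML emit the nested serve config as a literal block scalar (`|`), keeping the embedded YAML readable inside `serveConfigV2`. The technique can be sketched in isolation:

```python
import yaml


# Wrapper type so the custom representer applies only to these strings,
# not to every str PyYAML dumps.
class LiteralString(str):
    pass


def literal_presenter(dumper, data):
    # style="|" requests a literal block scalar; the string should end
    # with a newline, otherwise PyYAML falls back to "|-" chomping.
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(LiteralString, literal_presenter)

doc = {"serveConfigV2": LiteralString("applications:\n- name: demo\n")}
print(yaml.dump(doc, sort_keys=False))
# serveConfigV2: |
#   applications:
#   - name: demo
```

Round-tripping the dump through `yaml.safe_load` returns the original embedded string unchanged, which is why the generated patch stays a faithful copy of the merged model definitions.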
```yaml
memory: 12884901888  # 12 GiB
runtime_env:
  env_vars:
    MLFLOW_TRACKING_URI: http://mlflow.rationai-mlflow:5000
```
The MLflow tracking URI is configured to use http, which means the communication is unencrypted. If any sensitive data is exchanged with the MLflow server, it could be exposed on the network. This also applies to prostate.yaml.
If the network between these services is not fully trusted, you should consider enabling TLS on the MLflow server and using an https URI.
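With TLS enabled on the MLflow server, the model definitions would only need the scheme changed; the port and certificate setup depend on how MLflow is deployed, so treat this as a sketch:

```yaml
runtime_env:
  env_vars:
    # Assumes the MLflow service terminates TLS on the same port.
    MLFLOW_TRACKING_URI: https://mlflow.rationai-mlflow:5000
```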