5 changes: 5 additions & 0 deletions deploy.sh
@@ -0,0 +1,5 @@
#!/bin/bash
set -e

python kustomize/components/models/merge_models.py
kubectl apply -k kustomize/overlays -n rationai-jobs-ns

medium

It's a good practice to end shell scripts with a newline character. Some tools and shells might behave unexpectedly without it.

Suggested change (the two lines are identical; the suggested edit only adds a trailing newline)
kubectl apply -k kustomize/overlays -n rationai-jobs-ns
kubectl apply -k kustomize/overlays -n rationai-jobs-ns

5 changes: 5 additions & 0 deletions kustomize/base/kustomization.yaml
@@ -0,0 +1,5 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- ray-service-base.yaml
58 changes: 58 additions & 0 deletions kustomize/base/ray-service-base.yaml
@@ -0,0 +1,58 @@
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-model-split
spec:
  serveConfigV2: ""
  rayClusterConfig:
    rayVersion: 2.53.0
    enableInTreeAutoscaling: true
    autoscalerOptions:
      idleTimeoutSeconds: 60
      securityContext:
        runAsUser: 1000
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]

    headGroupSpec:
      rayStartParams:
        num-cpus: "0"
        dashboard-host: "0.0.0.0"
      template:
        spec:
          securityContext:
            fsGroupChangePolicy: OnRootMismatch
            runAsNonRoot: true
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: ray-head
              image: rayproject/ray:2.53.0-py312
              imagePullPolicy: Always
              resources:
                limits:
                  cpu: 0
                  memory: 4Gi
                requests:
                  cpu: 0
                  memory: 4Gi
              env:
                - name: HTTPS_PROXY
                  value: http://proxy.ics.muni.cz:3128
Comment on lines +41 to +42

high

A hardcoded proxy URL is used. This value is repeated in cpu-workers-patch.yaml and gpu-workers-patch.yaml. Hardcoding environment-specific configuration makes the deployment less portable and harder to manage across different environments (e.g., dev, staging, prod). It also makes the configuration brittle, as any change to the proxy needs to be updated in multiple places.

Consider externalizing this configuration. For example, you could use a ConfigMap and envFrom to inject the proxy settings into your pods.
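
A minimal sketch of that pattern (the ConfigMap name `ray-proxy-config` is an assumption, not something defined in this PR):

```yaml
# Hypothetical ConfigMap holding the proxy setting in one place.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-proxy-config
data:
  HTTPS_PROXY: http://proxy.ics.muni.cz:3128
```

Each container spec could then replace its inline `env` entry with:

```yaml
envFrom:
  - configMapRef:
      name: ray-proxy-config
```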

              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
              securityContext:
                runAsUser: 1000
                allowPrivilegeEscalation: false
                capabilities:
                  drop: ["ALL"]

    workerGroupSpecs: []
65 changes: 65 additions & 0 deletions kustomize/components/cpu-workers/cpu-workers-patch.yaml
@@ -0,0 +1,65 @@
- op: add
  path: /spec/rayClusterConfig/workerGroupSpecs/-
  value:
    groupName: cpu-workers
    replicas: 0
    minReplicas: 0
    maxReplicas: 2
    template:
      spec:
        securityContext:
          fsGroupChangePolicy: OnRootMismatch
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        containers:
          - name: ray-worker
            image: cerit.io/rationai/model-service:2.53.0
            imagePullPolicy: Always
            resources:
              limits:
                cpu: 8
                memory: 16Gi
              requests:
                cpu: 8
                memory: 16Gi
            env:
              - name: HTTPS_PROXY
                value: http://proxy.ics.muni.cz:3128
Comment on lines +27 to +28

high

This hardcoded proxy URL is also present in ray-service-base.yaml and gpu-workers-patch.yaml. Hardcoding environment-specific configuration makes the deployment less portable and harder to manage.

To improve this, the proxy configuration should be defined in one place and injected into the containers. Using a ConfigMap with envFrom is a common pattern for this in Kubernetes.

            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop: ["ALL"]
              runAsUser: 1000
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh", "-c", "ray stop"]
            volumeMounts:
              - name: data
                mountPath: /mnt/data
              - name: public-data
                mountPath: /mnt/data/Public
              - name: projects
                mountPath: /mnt/projects
              - name: bioptic-tree
                mountPath: /mnt/bioptic_tree
              - name: trt-cache-volume
                mountPath: /mnt/cache

        volumes:
          - name: data
            persistentVolumeClaim:
              claimName: data-ro
          - name: public-data
            persistentVolumeClaim:
              claimName: rationai-data-ro-pvc-jobs
          - name: projects
            persistentVolumeClaim:
              claimName: projects-rw
          - name: bioptic-tree
            persistentVolumeClaim:
              claimName: bioptictree-ro
          - name: trt-cache-volume
            persistentVolumeClaim:
              claimName: tensorrt-cache-pvc
Comment on lines +3 to +65

medium

There is a large amount of configuration duplicated between this file and kustomize/components/gpu-workers/gpu-workers-patch.yaml. Sections like securityContext, lifecycle, volumeMounts, and volumes are identical. This duplication increases the maintenance burden, as changes need to be made in multiple places, increasing the risk of inconsistencies.

Consider refactoring to reduce this duplication. For example, you could use kustomize's patches feature in an overlay to apply common settings to multiple resources.
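
One possible shape for this (file names and the JSON-pointer index are illustrative, not part of this PR): a shared component that both worker components include alongside their own patch.

```yaml
# Hypothetical kustomize/components/common-worker/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

patches:
  - target:
      kind: RayService
      name: rayservice-model-split
    path: common-worker-patch.yaml
```

with `common-worker-patch.yaml` carrying, say, the shared `lifecycle` hook:

```yaml
# Sketch only: JSON 6902 patches address worker groups by index, so this
# would need one entry per group (0, 1, ...).
- op: add
  path: /spec/rayClusterConfig/workerGroupSpecs/0/template/spec/containers/0/lifecycle
  value:
    preStop:
      exec:
        command: ["/bin/sh", "-c", "ray stop"]
```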

8 changes: 8 additions & 0 deletions kustomize/components/cpu-workers/kustomization.yaml
@@ -0,0 +1,8 @@
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

patches:
  - target:
      kind: RayService
      name: rayservice-model-split
    path: cpu-workers-patch.yaml
68 changes: 68 additions & 0 deletions kustomize/components/gpu-workers/gpu-workers-patch.yaml
@@ -0,0 +1,68 @@
- op: add
  path: /spec/rayClusterConfig/workerGroupSpecs/-
  value:
    groupName: gpu-workers
    replicas: 0
    minReplicas: 0
    maxReplicas: 2
    template:
      spec:
        securityContext:
          fsGroupChangePolicy: OnRootMismatch
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        nodeSelector:
          nvidia.com/gpu.product: NVIDIA-A40
        containers:
          - name: ray-worker
            image: cerit.io/rationai/model-service:2.53.0-gpu
            imagePullPolicy: Always
            resources:
              limits:
                cpu: 8
                memory: 24Gi
                nvidia.com/gpu: 1
              requests:
                cpu: 8
                memory: 24Gi
            env:
              - name: HTTPS_PROXY
                value: http://proxy.ics.muni.cz:3128
Comment on lines +30 to +31

high

This hardcoded proxy URL is repeated from other configuration files (ray-service-base.yaml and cpu-workers-patch.yaml). This duplication makes the configuration difficult to maintain.

It's recommended to centralize this configuration, for instance in a ConfigMap, and reference it in all the necessary pod specs.

            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop: ["ALL"]
              runAsUser: 1000
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh", "-c", "ray stop"]
            volumeMounts:
              - name: data
                mountPath: /mnt/data
              - name: public-data
                mountPath: /mnt/data/Public
              - name: projects
                mountPath: /mnt/projects
              - name: bioptic-tree
                mountPath: /mnt/bioptic_tree
              - name: trt-cache-volume
                mountPath: /mnt/cache

        volumes:
          - name: data
            persistentVolumeClaim:
              claimName: data-ro
          - name: public-data
            persistentVolumeClaim:
              claimName: rationai-data-ro-pvc-jobs
          - name: projects
            persistentVolumeClaim:
              claimName: projects-rw
          - name: bioptic-tree
            persistentVolumeClaim:
              claimName: bioptictree-ro
          - name: trt-cache-volume
            persistentVolumeClaim:
              claimName: tensorrt-cache-pvc
8 changes: 8 additions & 0 deletions kustomize/components/gpu-workers/kustomization.yaml
@@ -0,0 +1,8 @@
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

patches:
  - target:
      kind: RayService
      name: rayservice-model-split
    path: gpu-workers-patch.yaml
8 changes: 8 additions & 0 deletions kustomize/components/models/kustomization.yaml
@@ -0,0 +1,8 @@
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

patches:
  - target:
      kind: RayService
      name: rayservice-model-split
    path: serve-config-patch.yaml
51 changes: 51 additions & 0 deletions kustomize/components/models/merge_models.py
@@ -0,0 +1,51 @@
import os

import yaml


script_dir = os.path.dirname(os.path.abspath(__file__))
models_definitions_dir = os.path.join(script_dir, "models-definitions")
output_file = os.path.join(script_dir, "serve-config-patch.yaml")

model_files = [f for f in os.listdir(models_definitions_dir) if f.endswith(".yaml")]

if not model_files:
    raise RuntimeError(f"No model definition files found in {models_definitions_dir}")

merged_applications = []

for file_name in sorted(model_files):
    file_path = os.path.join(models_definitions_dir, file_name)
    with open(file_path) as f:
        data = yaml.safe_load(f)
    if not data or "applications" not in data:
        raise RuntimeError(f"File {file_name} is missing 'applications' key")
    merged_applications.extend(data["applications"])

serve_config_str = yaml.dump({"applications": merged_applications}, sort_keys=False)


# Literal block scalar wrapper
class LiteralString(str):
    pass


def literal_presenter(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(LiteralString, literal_presenter)

patch = {
    "apiVersion": "ray.io/v1",
    "kind": "RayService",
    "metadata": {"name": "rayservice-model-split"},
    "spec": {"serveConfigV2": LiteralString(serve_config_str)},
}

with open(output_file, "w") as f:
    yaml.dump(patch, f, sort_keys=False)

print(f"Generated {output_file} from {len(model_files)} model files:")
for f in sorted(model_files):
    print(f"  - {f}")
Comment on lines +1 to +51

medium

This script generates serve-config-patch.yaml, which is also checked into version control. However, deploy.sh also runs this script before deployment. This dual approach is confusing and can lead to maintenance problems, such as manual edits being overwritten or outdated generated files being deployed.

To clarify the workflow, please choose one of these patterns:

  1. Fully generated (recommended for CI/CD): Add kustomize/components/models/serve-config-patch.yaml to your .gitignore file and let the deploy.sh script generate it on-the-fly during deployment.
  2. Developer-managed: Remove the execution of this script from deploy.sh. Developers will be responsible for running it locally to update serve-config-patch.yaml and committing the changes.
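
If option 1 is chosen, the change is a one-liner (path matches this PR's layout; run from the repository root):

```shell
# Keep only the generator under version control; deploy.sh already rebuilds
# serve-config-patch.yaml on every run, so the artifact can be ignored.
echo "kustomize/components/models/serve-config-patch.yaml" >> .gitignore
```

If the file was previously committed, it would also need `git rm --cached` once so the ignore rule takes effect.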

29 changes: 29 additions & 0 deletions kustomize/components/models/models-definitions/episeg.yaml
@@ -0,0 +1,29 @@
applications:
  - name: episeg-1
    import_path: models.semantic_segmentation:app
    route_prefix: /episeg-1
    runtime_env:
      working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip

critical

The working_dir points to a zip archive of the master branch. This is a critical issue for reproducibility and stability. Any change pushed to the master branch will be automatically picked up in deployments, potentially introducing breaking changes or bugs without a corresponding change in this repository. This also affects heatmap.yaml and prostate.yaml.

To ensure predictable and stable deployments, please change this to use an immutable git reference, such as a commit SHA or a tag. For example: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/v1.2.3/model-service-v1.2.3.zip

    deployments:
      - name: SemanticSegmentation
        max_ongoing_requests: 16
        max_queued_requests: 64
        autoscaling_config:
          min_replicas: 0
          max_replicas: 4
          target_ongoing_requests: 4
        ray_actor_options:
          num_cpus: 12
          memory: 12884901888 # 12 GiB
          runtime_env:
            env_vars:
              MLFLOW_TRACKING_URI: http://mlflow.rationai-mlflow:5000

medium

The MLflow tracking URI is configured to use http, which means the communication is unencrypted. If any sensitive data is exchanged with the MLflow server, it could be exposed on the network. This also applies to prostate.yaml.

If the network between these services is not fully trusted, you should consider enabling TLS on the MLflow server and using an https URI.

        user_config:
          tile_size: 1024
          mpp: 0.468
          max_batch_size: 2
          batch_wait_timeout_s: 0.5
          intra_op_num_threads: 11
          model:
            _target_: providers.model_provider:mlflow
            artifact_uri: mlflow-artifacts:/10/39f821ed5b964c71a603cc6db196f9fd/artifacts/checkpoints/epoch=19-step=32020/model.onnx/model.onnx
20 changes: 20 additions & 0 deletions kustomize/components/models/models-definitions/heatmap.yaml
@@ -0,0 +1,20 @@
applications:
  - name: heatmap-builder
    import_path: builders.heatmap_builder:app
    route_prefix: /heatmap-builder
    runtime_env:
      working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip
    deployments:
      - name: HeatmapBuilder
        max_ongoing_requests: 16
        max_queued_requests: 64
        autoscaling_config:
          min_replicas: 0
          max_replicas: 2
          target_ongoing_requests: 2
        ray_actor_options:
          num_cpus: 4
          memory: 12884901888 # 12 GiB
        user_config:
          num_threads: 4
          max_concurrent_tasks: 8
36 changes: 36 additions & 0 deletions kustomize/components/models/models-definitions/prostate.yaml
@@ -0,0 +1,36 @@
applications:
  - name: prostate-classifier-1
    import_path: models.binary_classifier:app
    route_prefix: /prostate-classifier-1
    runtime_env:
      working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip
    deployments:
      - name: BinaryClassifier
        max_ongoing_requests: 64
        max_queued_requests: 128
        autoscaling_config:
          min_replicas: 0
          max_replicas: 4
          target_ongoing_requests: 32
        ray_actor_options:
          num_cpus: 6
          memory: 6442450944
          runtime_env:
            env_vars:
              MLFLOW_TRACKING_URI: http://mlflow.rationai-mlflow:5000
        user_config:
          tile_size: 512
          max_batch_size: 32
          batch_wait_timeout_s: 0.5
          mean:
            - 228.5544
            - 178.8584
            - 219.8793
          std:
            - 27.8285
            - 51.4639
            - 26.4458
          intra_op_num_threads: 5
          model:
            _target_: providers.model_provider:mlflow
            artifact_uri: mlflow-artifacts:/65/aebc892f526047249b972f200bef4381/artifacts/checkpoints/epoch=0-step=6972/model.onnx