Split ray serve #4
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| python kustomize/components/models/merge_models.py | ||
| kubectl apply -k kustomize/overlays -n rationai-jobs-ns | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| apiVersion: kustomize.config.k8s.io/v1beta1 | ||
| kind: Kustomization | ||
|
|
||
| resources: | ||
| - ray-service-base.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| apiVersion: ray.io/v1 | ||
| kind: RayService | ||
| metadata: | ||
| name: rayservice-model-split | ||
| spec: | ||
| serveConfigV2: "" | ||
| rayClusterConfig: | ||
| rayVersion: 2.53.0 | ||
| enableInTreeAutoscaling: true | ||
| autoscalerOptions: | ||
| idleTimeoutSeconds: 60 | ||
| securityContext: | ||
| runAsUser: 1000 | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: ["ALL"] | ||
|
|
||
| headGroupSpec: | ||
| rayStartParams: | ||
| num-cpus: "0" | ||
| dashboard-host: "0.0.0.0" | ||
| template: | ||
| spec: | ||
| securityContext: | ||
| fsGroupChangePolicy: OnRootMismatch | ||
| runAsNonRoot: true | ||
| seccompProfile: | ||
| type: RuntimeDefault | ||
| containers: | ||
| - name: ray-head | ||
| image: rayproject/ray:2.53.0-py312 | ||
| imagePullPolicy: Always | ||
| resources: | ||
| limits: | ||
| cpu: 0 | ||
| memory: 4Gi | ||
| requests: | ||
| cpu: 0 | ||
| memory: 4Gi | ||
| env: | ||
| - name: HTTPS_PROXY | ||
| value: http://proxy.ics.muni.cz:3128 | ||
|
Comment on lines +41 to +42
A hardcoded proxy URL is used. This value is repeated in […]. Consider externalizing this configuration. For example, you could use a […]
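One possible way to act on the comment above, sketched in Python to match the repository's `merge_models.py`: keep the proxy URL in a single ConfigMap and have each container read it through `envFrom`. The ConfigMap name `proxy-config` and both helper names are hypothetical, and the manifests are plain dicts so the sketch needs only the standard library.

```python
import copy
import json

PROXY_URL = "http://proxy.ics.muni.cz:3128"  # value taken from the diff


def proxy_configmap(name="proxy-config"):
    """Build a ConfigMap manifest holding the proxy setting (name is hypothetical)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {"HTTPS_PROXY": PROXY_URL},
    }


def use_proxy_configmap(container, name="proxy-config"):
    """Return a copy of a container spec that reads HTTPS_PROXY from the ConfigMap."""
    updated = copy.deepcopy(container)
    # Drop the inline literal so the value is defined in one place only.
    env = [e for e in updated.get("env", []) if e.get("name") != "HTTPS_PROXY"]
    if env:
        updated["env"] = env
    else:
        updated.pop("env", None)
    ref = {"configMapRef": {"name": name}}
    env_from = updated.setdefault("envFrom", [])
    if ref not in env_from:
        env_from.append(ref)
    return updated


if __name__ == "__main__":
    head = {
        "name": "ray-head",
        "env": [{"name": "HTTPS_PROXY", "value": PROXY_URL}],
    }
    print(json.dumps(proxy_configmap(), indent=2))
    print(json.dumps(use_proxy_configmap(head), indent=2))
```

The same helper could be applied to the `ray-worker` containers, so changing the proxy would mean editing only the ConfigMap.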
||
| ports: | ||
| - containerPort: 6379 | ||
| name: gcs-server | ||
| - containerPort: 8265 | ||
| name: dashboard | ||
| - containerPort: 10001 | ||
| name: client | ||
| - containerPort: 8000 | ||
| name: serve | ||
| securityContext: | ||
| runAsUser: 1000 | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: ["ALL"] | ||
|
|
||
| workerGroupSpecs: [] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| - op: add | ||
| path: /spec/rayClusterConfig/workerGroupSpecs/- | ||
| value: | ||
| groupName: cpu-workers | ||
| replicas: 0 | ||
| minReplicas: 0 | ||
| maxReplicas: 2 | ||
| template: | ||
| spec: | ||
| securityContext: | ||
| fsGroupChangePolicy: OnRootMismatch | ||
| runAsNonRoot: true | ||
| seccompProfile: | ||
| type: RuntimeDefault | ||
| containers: | ||
| - name: ray-worker | ||
| image: cerit.io/rationai/model-service:2.53.0 | ||
| imagePullPolicy: Always | ||
| resources: | ||
| limits: | ||
| cpu: 8 | ||
| memory: 16Gi | ||
| requests: | ||
| cpu: 8 | ||
| memory: 16Gi | ||
| env: | ||
| - name: HTTPS_PROXY | ||
| value: http://proxy.ics.muni.cz:3128 | ||
|
Comment on lines +27 to +28
This hardcoded proxy URL is also present in […]. To improve this, the proxy configuration should be defined in one place and injected into the containers. Using a […]
||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: ["ALL"] | ||
| runAsUser: 1000 | ||
| lifecycle: | ||
| preStop: | ||
| exec: | ||
| command: ["/bin/sh", "-c", "ray stop"] | ||
| volumeMounts: | ||
| - name: data | ||
| mountPath: /mnt/data | ||
| - name: public-data | ||
| mountPath: /mnt/data/Public | ||
| - name: projects | ||
| mountPath: /mnt/projects | ||
| - name: bioptic-tree | ||
| mountPath: /mnt/bioptic_tree | ||
| - name: trt-cache-volume | ||
| mountPath: /mnt/cache | ||
|
|
||
| volumes: | ||
| - name: data | ||
| persistentVolumeClaim: | ||
| claimName: data-ro | ||
| - name: public-data | ||
| persistentVolumeClaim: | ||
| claimName: rationai-data-ro-pvc-jobs | ||
| - name: projects | ||
| persistentVolumeClaim: | ||
| claimName: projects-rw | ||
| - name: bioptic-tree | ||
| persistentVolumeClaim: | ||
| claimName: bioptictree-ro | ||
| - name: trt-cache-volume | ||
| persistentVolumeClaim: | ||
| claimName: tensorrt-cache-pvc | ||
|
Comment on lines +3 to +65
There is a large amount of configuration duplicated between this file and […]. Consider refactoring to reduce this duplication. For example, you could use kustomize's […]
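As an illustration of the duplication the comment above describes, here is a minimal sketch that generates both worker-group patches from one builder, so the shared settings are written once. It swaps in a small generator script rather than a kustomize feature; `worker_group_patch` is a hypothetical name, and volumes, volume mounts, `env`, and the `preStop` hook are left out for brevity.

```python
import json

PATCH_PATH = "/spec/rayClusterConfig/workerGroupSpecs/-"


def worker_group_patch(group_name, image, memory, gpus=0, node_selector=None):
    """Build one JSON 6902 'add' operation for a worker group.

    Settings common to both groups live here; only the arguments differ.
    """
    limits = {"cpu": 8, "memory": memory}
    if gpus:
        limits["nvidia.com/gpu"] = gpus
    pod_spec = {
        "securityContext": {
            "fsGroupChangePolicy": "OnRootMismatch",
            "runAsNonRoot": True,
            "seccompProfile": {"type": "RuntimeDefault"},
        },
        "containers": [
            {
                "name": "ray-worker",
                "image": image,
                "imagePullPolicy": "Always",
                "resources": {
                    "limits": limits,
                    "requests": {"cpu": 8, "memory": memory},
                },
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                    "runAsUser": 1000,
                },
            }
        ],
    }
    if node_selector:
        pod_spec["nodeSelector"] = dict(node_selector)
    return {
        "op": "add",
        "path": PATCH_PATH,
        "value": {
            "groupName": group_name,
            "replicas": 0,
            "minReplicas": 0,
            "maxReplicas": 2,
            "template": {"spec": pod_spec},
        },
    }


cpu_patch = worker_group_patch(
    "cpu-workers", "cerit.io/rationai/model-service:2.53.0", "16Gi"
)
gpu_patch = worker_group_patch(
    "gpu-workers",
    "cerit.io/rationai/model-service:2.53.0-gpu",
    "24Gi",
    gpus=1,
    node_selector={"nvidia.com/gpu.product": "NVIDIA-A40"},
)

if __name__ == "__main__":
    print(json.dumps([cpu_patch], indent=2))
    print(json.dumps([gpu_patch], indent=2))
```

With this shape, the only per-group inputs are the image tag, the memory size, the GPU count, and the node selector, which are the fields that differ between the two patch files in the diff.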
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| apiVersion: kustomize.config.k8s.io/v1alpha1 | ||
| kind: Component | ||
|
|
||
| patches: | ||
| - target: | ||
| kind: RayService | ||
| name: rayservice-model-split | ||
| path: cpu-workers-patch.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| - op: add | ||
| path: /spec/rayClusterConfig/workerGroupSpecs/- | ||
| value: | ||
| groupName: gpu-workers | ||
| replicas: 0 | ||
| minReplicas: 0 | ||
| maxReplicas: 2 | ||
| template: | ||
| spec: | ||
| securityContext: | ||
| fsGroupChangePolicy: OnRootMismatch | ||
| runAsNonRoot: true | ||
| seccompProfile: | ||
| type: RuntimeDefault | ||
| nodeSelector: | ||
| nvidia.com/gpu.product: NVIDIA-A40 | ||
| containers: | ||
| - name: ray-worker | ||
| image: cerit.io/rationai/model-service:2.53.0-gpu | ||
| imagePullPolicy: Always | ||
| resources: | ||
| limits: | ||
| cpu: 8 | ||
| memory: 24Gi | ||
| nvidia.com/gpu: 1 | ||
| requests: | ||
| cpu: 8 | ||
| memory: 24Gi | ||
| env: | ||
| - name: HTTPS_PROXY | ||
| value: http://proxy.ics.muni.cz:3128 | ||
|
Comment on lines +30 to +31
This hardcoded proxy URL is repeated from other configuration files ([…]). It's recommended to centralize this configuration, for instance in a […]
||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: ["ALL"] | ||
| runAsUser: 1000 | ||
| lifecycle: | ||
| preStop: | ||
| exec: | ||
| command: ["/bin/sh", "-c", "ray stop"] | ||
| volumeMounts: | ||
| - name: data | ||
| mountPath: /mnt/data | ||
| - name: public-data | ||
| mountPath: /mnt/data/Public | ||
| - name: projects | ||
| mountPath: /mnt/projects | ||
| - name: bioptic-tree | ||
| mountPath: /mnt/bioptic_tree | ||
| - name: trt-cache-volume | ||
| mountPath: /mnt/cache | ||
|
|
||
| volumes: | ||
| - name: data | ||
| persistentVolumeClaim: | ||
| claimName: data-ro | ||
| - name: public-data | ||
| persistentVolumeClaim: | ||
| claimName: rationai-data-ro-pvc-jobs | ||
| - name: projects | ||
| persistentVolumeClaim: | ||
| claimName: projects-rw | ||
| - name: bioptic-tree | ||
| persistentVolumeClaim: | ||
| claimName: bioptictree-ro | ||
| - name: trt-cache-volume | ||
| persistentVolumeClaim: | ||
| claimName: tensorrt-cache-pvc | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| apiVersion: kustomize.config.k8s.io/v1alpha1 | ||
| kind: Component | ||
|
|
||
| patches: | ||
| - target: | ||
| kind: RayService | ||
| name: rayservice-model-split | ||
| path: gpu-workers-patch.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| apiVersion: kustomize.config.k8s.io/v1alpha1 | ||
| kind: Component | ||
|
|
||
| patches: | ||
| - target: | ||
| kind: RayService | ||
| name: rayservice-model-split | ||
| path: serve-config-patch.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| import os | ||
|
|
||
| import yaml | ||
|
|
||
|
|
||
| script_dir = os.path.dirname(os.path.abspath(__file__)) | ||
| models_definitions_dir = os.path.join(script_dir, "models-definitions") | ||
| output_file = os.path.join(script_dir, "serve-config-patch.yaml") | ||
|
|
||
| model_files = [f for f in os.listdir(models_definitions_dir) if f.endswith(".yaml")] | ||
|
|
||
| if not model_files: | ||
| raise RuntimeError(f"No model definition files found in {models_definitions_dir}") | ||
|
|
||
| merged_applications = [] | ||
|
|
||
| for file_name in sorted(model_files): | ||
| file_path = os.path.join(models_definitions_dir, file_name) | ||
| with open(file_path) as f: | ||
| data = yaml.safe_load(f) | ||
| if not data or "applications" not in data: | ||
| raise RuntimeError(f"File {file_name} is missing 'applications' key") | ||
| merged_applications.extend(data["applications"]) | ||
|
|
||
| serve_config_str = yaml.dump({"applications": merged_applications}, sort_keys=False) | ||
|
|
||
|
|
||
| # Literal block scalar wrapper | ||
| class LiteralString(str): | ||
| pass | ||
|
|
||
|
|
||
| def literal_presenter(dumper, data): | ||
| return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|") | ||
|
|
||
|
|
||
| yaml.add_representer(LiteralString, literal_presenter) | ||
|
|
||
| patch = { | ||
| "apiVersion": "ray.io/v1", | ||
| "kind": "RayService", | ||
| "metadata": {"name": "rayservice-model-split"}, | ||
| "spec": {"serveConfigV2": LiteralString(serve_config_str)}, | ||
| } | ||
|
|
||
| with open(output_file, "w") as f: | ||
| yaml.dump(patch, f, sort_keys=False) | ||
|
|
||
| print(f"Generated {output_file} from {len(model_files)} model files:") | ||
| for f in sorted(model_files): | ||
| print(f" - {f}") | ||
|
Comment on lines +1 to +51
This script generates […]. To clarify the workflow, please choose one of these patterns: […]
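Whichever workflow is chosen, the merge step of the script could be isolated as a pure function, which makes it easy to test without touching the filesystem. The sketch below assumes the inputs are already parsed (for example by `yaml.safe_load`); `merge_applications` is a hypothetical name, and the duplicate-name check is an addition that is not in the PR.

```python
def merge_applications(documents):
    """Merge the 'applications' lists of parsed model-definition documents.

    `documents` maps a file name to the object parsed from that file.
    Files are processed in sorted name order, as in the script in this PR.
    """
    merged = []
    seen = {}  # application name -> file that first defined it
    for file_name in sorted(documents):
        data = documents[file_name]
        if not data or "applications" not in data:
            raise RuntimeError(f"File {file_name} is missing 'applications' key")
        for app in data["applications"]:
            name = app.get("name")
            # Assumption: application names must be unique across files.
            if name in seen:
                raise RuntimeError(
                    f"Application {name!r} is defined in both "
                    f"{seen[name]} and {file_name}"
                )
            seen[name] = file_name
            merged.append(app)
    return merged


if __name__ == "__main__":
    docs = {
        "heatmap.yaml": {"applications": [{"name": "heatmap-builder"}]},
        "episeg.yaml": {"applications": [{"name": "episeg-1"}]},
    }
    for app in merge_applications(docs):
        print(app["name"])
```

The file names in the usage example are placeholders; the PR does not show the actual names under `models-definitions`.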
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| applications: | ||
| - name: episeg-1 | ||
| import_path: models.semantic_segmentation:app | ||
| route_prefix: /episeg-1 | ||
| runtime_env: | ||
| working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip | ||
|
The […] To ensure predictable and stable deployments, please change this to use an immutable git reference, such as a commit SHA or a tag. For example: […]
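A check along the lines of this comment could be automated. The sketch below extracts the ref from a GitLab archive URL and flags branch-like refs; the set of "mutable" names is an assumption, both helper names are hypothetical, and the ref `0123abc` in the usage example is a placeholder rather than a real commit.

```python
import re

# Refs treated as mutable branches; this list is an assumption.
MUTABLE_REFS = {"master", "main", "develop", "HEAD"}

# Matches the ".../-/archive/<ref>/<file>.zip" shape used in the diff.
ARCHIVE_RE = re.compile(r"/-/archive/(?P<ref>[^/]+)/[^/]+\.zip$")


def archive_ref(url):
    """Return the git ref embedded in an archive URL, or None if it does not match."""
    match = ARCHIVE_RE.search(url)
    return match.group("ref") if match else None


def is_pinned(url):
    """True when the URL names a ref that is not a known mutable branch."""
    ref = archive_ref(url)
    return ref is not None and ref not in MUTABLE_REFS


def pinned_archive_url(project_url, ref):
    """Build an archive URL for a given tag or commit SHA."""
    base = project_url.rstrip("/")
    project = base.rsplit("/", 1)[-1]
    return f"{base}/-/archive/{ref}/{project}-{ref}.zip"


if __name__ == "__main__":
    current = (
        "https://gitlab.ics.muni.cz/rationai/infrastructure/model-service"
        "/-/archive/master/model-service-master.zip"
    )
    print(archive_ref(current), is_pinned(current))
    # "0123abc" is a placeholder ref, not a real commit.
    print(pinned_archive_url(
        "https://gitlab.ics.muni.cz/rationai/infrastructure/model-service",
        "0123abc",
    ))
```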
||
| deployments: | ||
| - name: SemanticSegmentation | ||
| max_ongoing_requests: 16 | ||
| max_queued_requests: 64 | ||
| autoscaling_config: | ||
| min_replicas: 0 | ||
| max_replicas: 4 | ||
| target_ongoing_requests: 4 | ||
| ray_actor_options: | ||
| num_cpus: 12 | ||
| memory: 12884901888 # 12 GiB | ||
| runtime_env: | ||
| env_vars: | ||
| MLFLOW_TRACKING_URI: http://mlflow.rationai-mlflow:5000 | ||
|
The MLflow tracking URI is configured to use […]. If the network between these services is not fully trusted, you should consider enabling TLS on the MLflow server and using an […]
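To find every place this applies, a small scan over the application definitions may help. The sketch below assumes `runtime_env.env_vars` is nested under `ray_actor_options` (the indentation is not visible in this view of the diff), and `plaintext_env_vars` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def plaintext_env_vars(application):
    """Yield (deployment, variable, value) for env vars holding an http:// URL."""
    for deployment in application.get("deployments", []):
        env_vars = (
            deployment.get("ray_actor_options", {})
            .get("runtime_env", {})
            .get("env_vars", {})
        )
        for key, value in env_vars.items():
            # Only the unencrypted scheme is flagged; https passes.
            if urlparse(str(value)).scheme == "http":
                yield deployment.get("name"), key, value


if __name__ == "__main__":
    app = {
        "name": "episeg-1",
        "deployments": [
            {
                "name": "SemanticSegmentation",
                "ray_actor_options": {
                    "runtime_env": {
                        "env_vars": {
                            "MLFLOW_TRACKING_URI": "http://mlflow.rationai-mlflow:5000"
                        }
                    }
                },
            }
        ],
    }
    for finding in plaintext_env_vars(app):
        print(finding)
```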
||
| user_config: | ||
| tile_size: 1024 | ||
| mpp: 0.468 | ||
| max_batch_size: 2 | ||
| batch_wait_timeout_s: 0.5 | ||
| intra_op_num_threads: 11 | ||
| model: | ||
| _target_: providers.model_provider:mlflow | ||
| artifact_uri: mlflow-artifacts:/10/39f821ed5b964c71a603cc6db196f9fd/artifacts/checkpoints/epoch=19-step=32020/model.onnx/model.onnx | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| applications: | ||
| - name: heatmap-builder | ||
| import_path: builders.heatmap_builder:app | ||
| route_prefix: /heatmap-builder | ||
| runtime_env: | ||
| working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip | ||
| deployments: | ||
| - name: HeatmapBuilder | ||
| max_ongoing_requests: 16 | ||
| max_queued_requests: 64 | ||
| autoscaling_config: | ||
| min_replicas: 0 | ||
| max_replicas: 2 | ||
| target_ongoing_requests: 2 | ||
| ray_actor_options: | ||
| num_cpus: 4 | ||
| memory: 12884901888 # 12 GiB | ||
| user_config: | ||
| num_threads: 4 | ||
| max_concurrent_tasks: 8 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| applications: | ||
| - name: prostate-classifier-1 | ||
| import_path: models.binary_classifier:app | ||
| route_prefix: /prostate-classifier-1 | ||
| runtime_env: | ||
| working_dir: https://gitlab.ics.muni.cz/rationai/infrastructure/model-service/-/archive/master/model-service-master.zip | ||
| deployments: | ||
| - name: BinaryClassifier | ||
| max_ongoing_requests: 64 | ||
| max_queued_requests: 128 | ||
| autoscaling_config: | ||
| min_replicas: 0 | ||
| max_replicas: 4 | ||
| target_ongoing_requests: 32 | ||
| ray_actor_options: | ||
| num_cpus: 6 | ||
| memory: 6442450944 | ||
| runtime_env: | ||
| env_vars: | ||
| MLFLOW_TRACKING_URI: http://mlflow.rationai-mlflow:5000 | ||
| user_config: | ||
| tile_size: 512 | ||
| max_batch_size: 32 | ||
| batch_wait_timeout_s: 0.5 | ||
| mean: | ||
| - 228.5544 | ||
| - 178.8584 | ||
| - 219.8793 | ||
| std: | ||
| - 27.8285 | ||
| - 51.4639 | ||
| - 26.4458 | ||
| intra_op_num_threads: 5 | ||
| model: | ||
| _target_: providers.model_provider:mlflow | ||
| artifact_uri: mlflow-artifacts:/65/aebc892f526047249b972f200bef4381/artifacts/checkpoints/epoch=0-step=6972/model.onnx |
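The `memory` values in the model definitions above are raw byte counts, and only some of them carry a `# 12 GiB` comment. A tiny helper, sketched here with the hypothetical name `gib`, shows how the figures are derived and could be used to double-check them.

```python
def gib(n):
    """Convert a whole number of GiB to bytes (1 GiB = 1024**3 bytes)."""
    return n * 1024**3


if __name__ == "__main__":
    print(gib(12))  # value used for episeg-1 and heatmap-builder
    print(gib(6))   # value used for prostate-classifier-1
```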
It's a good practice to end shell scripts with a newline character. Some tools and shells might behave unexpectedly without it.