Since beginning of this week, the cml runner launch command is failing to provision a VM.
The VM provisioning works, but the service is unable to start the VM gets terminated.
jobs:
deploy-runner:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- uses: iterative/setup-cml@v2
- name: deploy runner
env:
GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.CML_AI_GCLOUD_SERVICE_ACCOUNT_KEY }}
run: |
cml runner launch --cloud=gcp --cloud-type=g2-standard-16+nvidia-l4*1 --name=test-name --labels="val,bk0000,L4" --token=${{ secrets.CML_GITHUB_TOKEN }} --repo=https://github.com/***/**.git --cloud-hdd-size=200 --idle-timeout=1200 --log=debug --cloud-region=us-central1-a
Some logs from the serial console
cml.sh[39067]: {"level":"error","message":"terraform,version\n\t\n\t","stack":"Error: terraform,version\n\t\n\t\n at /snapshot/cml/src/utils.js:38:18\n at exithandler (node:child_process:406:5)\n at ChildProcess.errorhandler (node:child_process:418:5)\n at ChildProcess.emit (node:events:527:28)\n at Process.ChildProcess._handle.onexit (node:internal/child_process:289:12)\n at onErrorNT (node:internal/child_process:478:16)\n at processTicksAndRejections (node:internal/process/task_queues:83:21)"}
Jun 12 08:50:32 cml-test-name-mqmvh9j6-5383uaw2 systemd[1]: cml.service: Main process exited, code=exited, status=1/FAILURE
Jun 12 08:50:32 cml-test-name-mqmvh9j6-5383uaw2 systemd[1]: cml.service: Failed with result 'exit-code'.
Would there be a fix for this issue? Or shoudl we use our own startup scripts?
Any suggestions are much appreciated
Since beginning of this week, the cml runner launch command is failing to provision a VM.
The VM provisioning works, but the service is unable to start the VM gets terminated.
Some logs from the serial console
Would there be a fix for this issue? Or shoudl we use our own startup scripts?
Any suggestions are much appreciated