Commit 77f3be1
[CloudRift] Fix NTP clock skew breaking Docker; handle amd-smi 7.x output format (#3701)
CloudRift VMs boot with an incorrect RTC (about 1 hour ahead). When NTP
steps the clock backwards, Docker discards container exit events, leaving
containers stuck indefinitely as ghosts. Wait for NTP synchronization
before launching the shim to prevent this.
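A minimal sketch of such a wait, assuming a systemd host where `timedatectl show -p NTPSynchronized --value` prints `yes` once the clock is synchronized. The helper names, the 120 s upper bound and the 2 s poll interval are hypothetical and not taken from the commit. The deadline is measured with a monotonic clock so that the wait is not itself distorted by the backwards step it is waiting for.

```python
import subprocess
import time
from typing import Callable


def ntp_synchronized() -> bool:
    """Ask systemd whether the system clock is NTP-synchronized (assumes timedatectl)."""
    try:
        result = subprocess.run(
            ["timedatectl", "show", "-p", "NTPSynchronized", "--value"],
            capture_output=True, text=True, timeout=5, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # timedatectl missing or unresponsive: treat as "not synchronized yet".
        return False
    return result.stdout.strip() == "yes"


def wait_for_ntp_sync(
    check: Callable[[], bool] = ntp_synchronized,
    timeout: float = 120.0,   # hypothetical upper bound on the wait
    interval: float = 2.0,    # hypothetical poll interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the clock is synchronized; return False once the deadline passes.

    The deadline uses a monotonic clock, so a backwards step of the wall
    clock during the wait cannot stretch or shorten it.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run `wait_for_ntp_sync()` before starting the shim. Whether to proceed or abort when it returns `False` is a policy choice the commit message does not specify.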
Also handle both amd-smi output formats (flat array in ROCm 6.x,
wrapped {"gpu_data": [...]} in ROCm 7.x) and add a 2-minute timeout
to AMD GPU detection to prevent the shim from hanging indefinitely.
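A minimal sketch of format-tolerant parsing combined with a hard timeout. The `amd-smi static --json` invocation, the function names, and the choice to report "no AMD GPUs" when the tool fails or times out are assumptions for illustration, not necessarily how the shim does it; only the two JSON layouts and the 2-minute limit come from the commit message.

```python
import json
import subprocess
from typing import Any, Dict, List, Sequence

# 2-minute cap on detection, matching the timeout described in the commit message.
AMD_SMI_TIMEOUT_SECONDS = 120


def normalize_amd_smi_gpus(raw: str) -> List[Dict[str, Any]]:
    """Return the per-GPU records from amd-smi JSON output in either layout."""
    data = json.loads(raw)
    if isinstance(data, list):
        # ROCm 6.x: the top level is already the array of GPUs.
        return data
    if isinstance(data, dict) and isinstance(data.get("gpu_data"), list):
        # ROCm 7.x: the array is wrapped as {"gpu_data": [...]}.
        return data["gpu_data"]
    raise ValueError("unrecognized amd-smi JSON layout")


def detect_amd_gpus(
    cmd: Sequence[str] = ("amd-smi", "static", "--json"),  # assumed invocation
    timeout: float = AMD_SMI_TIMEOUT_SECONDS,
) -> List[Dict[str, Any]]:
    """Run amd-smi with a hard timeout and report no GPUs instead of hanging."""
    try:
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout, check=True
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # Tool missing, exited non-zero, or still running at the deadline
        # (subprocess.run kills the child before raising TimeoutExpired).
        return []
    return normalize_amd_smi_gpus(proc.stdout)
```

Checking for the `gpu_data` key explicitly, rather than taking the first list found in any object, means an unexpected future layout fails loudly instead of being silently misread.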
Co-authored-by: Andrey Cheptsov <andrey.cheptsov@github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parent: 30e90d3
3 files changed, 55 additions, 8 deletions, under:
- runner/internal/shim/host
- src/dstack/_internal/core/backends/cloudrift

File 1 of 3 (name not shown): 26 additions, 2 deletions
@@ -10,6 +10,7 @@
@@ -114,6 +115,11 @@
@@ -130,9 +136,27 @@
@@ -158,8 +182,8 @@

File 2 of 3 (name not shown): 14 additions, 5 deletions
@@ -72,12 +72,16 @@
@@ -97,9 +101,14 @@

File 3 of 3 (name not shown): 15 additions, 1 deletion
@@ -73,17 +73,31 @@