gh-runner: keep only actions runner configuration in external volume#53
gh-runner: keep only actions runner configuration in external volume#53lukkrusz wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the GitHub Actions runner Dockerfile to optimize the multi-stage build, trim run-stage dependencies, and introduce a persistent volume for runner configuration. However, several critical issues were identified: the configuration linking logic in entry.py can lead to permanent configuration loss on container restart, and PyOCD packs installed in the build stage are lost because they are not copied to the run stage. Additionally, it is recommended to combine the separate COPY instructions for udev rules to reduce image layers.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def link_config(): | ||
| os.makedirs(CONFIG_DIR, exist_ok=True) | ||
| for name in CONFIG_FILES: | ||
| link = os.path.join('/actions-runner', name) | ||
| target = os.path.join(CONFIG_DIR, name) | ||
| if os.path.islink(link) and os.readlink(link) == target: | ||
| continue | ||
| if os.path.lexists(link): | ||
| os.remove(link) | ||
| os.symlink(target, link) | ||
|
|
||
|
|
||
| API_URL = resolve_api_url() | ||
|
|
||
| link_config() | ||
|
|
||
| if not os.path.isfile('.runner'): | ||
| configure_runner() |
There was a problem hiding this comment.
Critical Bug: Configuration Loss on Container Restart
In the current implementation, link_config() is called at startup and immediately creates broken symlinks in /actions-runner pointing to non-existent files in /runner-config (since the volume is initially empty).
- First Run:
link_config()creates broken symlinks. Thenconfigure_runner()runs. If the runner configuration process or .NET runtime performs atomic writes (e.g., writing to a temporary file and renaming/moving it to.runner), the broken symlink is replaced by a regular file in/actions-runner, while/runner-config/.runnerremains empty. - Container Restart: On the next boot,
link_config()runs again. Since/actions-runner/.runneris now a regular file (not a symlink),os.path.islink(link)isFalse. The code then executesif os.path.lexists(link): os.remove(link)and deletes the newly created configuration file! Since the file was never successfully written to/runner-config, the configuration is permanently lost.
Solution
Implement a robust two-way sync/linking mechanism:
- If a regular file exists at the link path (e.g., newly created or updated during runtime), move it to the persistent volume.
- If the file exists in the persistent volume, ensure the symlink points to it.
- Call
link_config()both at startup and immediately afterconfigure_runner()to ensure newly created configurations are immediately persisted.
def link_config():
os.makedirs(CONFIG_DIR, exist_ok=True)
for name in CONFIG_FILES:
link = os.path.join('/actions-runner', name)
target = os.path.join(CONFIG_DIR, name)
if os.path.exists(link) and not os.path.islink(link):
if os.path.exists(target):
os.remove(target)
os.rename(link, target)
if os.path.exists(target):
if os.path.islink(link):
if os.readlink(link) == target:
continue
os.remove(link)
elif os.path.lexists(link):
os.remove(link)
os.symlink(target, link)
API_URL = resolve_api_url()
link_config()
if not os.path.isfile('.runner'):
configure_runner()
link_config()| # install all dependencies | ||
| RUN pip3 install --no-cache-dir pyocd requests && \ | ||
| pyocd pack install mcxn947 |
There was a problem hiding this comment.
High Severity: Missing PyOCD Packs in Run Stage
Moving pyocd pack install mcxn947 to the build stage means the pack is downloaded to the build stage's user home directory (typically /root/.cache/pyocd or /root/.local/share/cmsis-pack-manager). Since only /opt/venv is copied to the run stage, the downloaded CMSIS pack is lost, and PyOCD will not have the mcxn947 pack installed in the final image.
To fix this cleanly and robustly, define a system-wide CMSIS_PACK_ROOT environment variable in both stages and copy the downloaded packs to the run stage.
# install all dependencies
ENV CMSIS_PACK_ROOT=/opt/cmsis-packs
RUN pip3 install --no-cache-dir pyocd requests && \
pyocd pack install mcxn947
| COPY --from=build /opt/venv /opt/venv | ||
| ENV PATH="/opt/venv/bin:${PATH}" |
There was a problem hiding this comment.
| # get udev rules for hardware debuggers (JLink and OpenOCD) | ||
| COPY --from=build /etc/udev/rules.d/99-jlink.rules /etc/udev/rules.d/ | ||
| COPY --from=build /etc/udev/rules.d/60-openocd.rules /etc/udev/rules.d/ |
There was a problem hiding this comment.
Medium Severity: Combine COPY Instructions
Instead of copying the udev rules individually, you can copy the entire /etc/udev/rules.d/ directory in a single COPY instruction. This is cleaner, more idiomatic, and reduces the number of layers in the final image.
# get udev rules for hardware debuggers (JLink and OpenOCD)
COPY --from=build /etc/udev/rules.d/ /etc/udev/rules.d/
There was a problem hiding this comment.
/gemini This copy targets seem specific, as per the comment, so leaving as is.
There was a problem hiding this comment.
Understood. If the directory /etc/udev/rules.d/ contains other files that should not be included in the final image, then keeping the COPY instructions explicit is indeed the correct approach to avoid unintended side effects. Thank you for the clarification.
There was a problem hiding this comment.
you could still use 2 source files and 1 target dir. We need to optimize heavily the docker commands, as each RUN/COPY/ADD creates a new layer in the underlying FS.
Why did you move jlink installation from the run stage to the build stage? Is it needed for building something?
There was a problem hiding this comment.
You're right. I previously attempted a removal of wget from run stage, which I later reverted but kept this. I see now that it should be moved back to run stage as well. This Dockerfile is very much a draft still. Thanks for taking a look
8720401 to
7aeecb2
Compare
nalajcie
left a comment
There was a problem hiding this comment.
I know this is a draft, just pointing out some issues I've seen in the email notification
| # get udev rules for hardware debuggers (JLink and OpenOCD) | ||
| COPY --from=build /etc/udev/rules.d/99-jlink.rules /etc/udev/rules.d/ | ||
| COPY --from=build /etc/udev/rules.d/60-openocd.rules /etc/udev/rules.d/ |
There was a problem hiding this comment.
you could still use 2 source files and 1 target dir. We need to optimize heavily the docker commands, as each RUN/COPY/ADD creates a new layer in the underlying FS.
Why did you move jlink installation from the run stage to the build stage? Is it needed for building something?
| ## NEXT VERSION | ||
|
|
||
| ## 20260414 (latest) | ||
| ## 20260609 (latest) |
There was a problem hiding this comment.
you're not making a release, put the changes under "NEXT VERSION"
7aeecb2 to
4ce0600
Compare
Improvements made to optimize the Dockerfile and reduce the number of layers to minimum. Assisted-by: Claude Sonnet 4.6 YT: CI-346
4ce0600 to
772d033
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request improves the gh-runner Dockerfile by copying build artifacts in a single layer, trimming run-stage dependencies, and fixing package names. It also updates entry.py to persist only the runner configuration inside an external volume and improves error handling. The review feedback suggests stripping trailing slashes from the repository URL to avoid duplicate slashes in the API endpoint, and safely masking the registration token by matching exact arguments in the command list instead of performing a substring replacement on the joined command string.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| command = " ".join(exe).replace(token, "<TOKEN>") | ||
| print(f"{command} failed!") |
There was a problem hiding this comment.
Using .replace(token, "<TOKEN>") on the joined command string can accidentally replace substrings in other arguments (such as the runner name, labels, or paths) if they happen to contain the token string. A safer approach is to mask the token by matching exact arguments in the exe list before joining them.
| command = " ".join(exe).replace(token, "<TOKEN>") | |
| print(f"{command} failed!") | |
| command = " ".join("<TOKEN>" if arg == token else arg for arg in exe) | |
| print(f"{command} failed!") |
There was a problem hiding this comment.
I don't think that this would be possible, anything containing the token should be somewhat redacted
Create symlinks pointing to the external volume with config, to ensure that, when updated, the config persists, while the old binaries are removed. Assisted-by: Claude Opus 4.8 YT: CI-346
YT: CI-346
772d033 to
685a6c0
Compare
All boxes must be checked before marking the PR as ready for review.
Checklist
CHANGELOG.mdupdated