🛡️ Sentinel: Fix path traversal and environment leakage vulnerabilities#172
Conversation
- Mitigate path traversal risk in `heidi_engine/telemetry.py` and `heidi_engine/dashboard.py` by sanitizing `run_id` with `Path(run_id).name`. - Secure untrusted code execution in `scripts/03_unit_test_gate.py` by using a restricted `safe_env` (PATH, PYTHONPATH, HOME, LANG, PYTHONDONTWRITEBYTECODE) to prevent secret leakage from `os.environ`. - Fix `NameError` in `heidi_engine/telemetry.py` by removing a redundant and broken cache check in `get_state`. - Improve robustness of `save_jsonl` in `scripts/03_unit_test_gate.py` to handle paths without directory components. - Document vulnerabilities and preventions in `.jules/sentinel.md`.
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses critical security vulnerabilities related to path traversal and environment variable leakage, along with a functional bug causing a Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request addresses several security vulnerabilities and a bug. The path traversal fix in heidi_engine/dashboard.py and heidi_engine/telemetry.py is correctly implemented by sanitizing the run_id. The environment variable leakage in scripts/03_unit_test_gate.py is also well-handled by using a whitelisted safe_env. However, the fix for the NameError in heidi_engine/telemetry.py's get_state function, while solving the immediate crash, has introduced a performance regression by removing the caching layer entirely. I've provided a comment with a suggestion to reintroduce the caching logic in a correct and more efficient way.
| try: | ||
| with open(state_file) as f: | ||
| state = json.load(f) |
There was a problem hiding this comment.
While removing the buggy code block fixed the NameError, it also removed the caching mechanism entirely. This will cause a performance regression as get_state will now read from the disk on every call. The caching logic should be restored.
The original code was also redundant. A correct and efficient implementation should first check the cache, and if that fails, return a default state, as the cache's get method already attempts to read from the file. The logic after the if not state_file.exists(): check could be simplified to:
# Check thread-safe state cache. The .get() method will handle loading from file.
cached = _state_cache.get(run_id, state_file)
if cached:
return cached
# If cache is empty (file exists but is corrupt), return default state.
return get_default_state(run_id)This would replace the entire try...except block that currently starts here.
I have identified and fixed two critical security vulnerabilities and one functional bug:
Path Traversal Mitigation: The
get_run_dirfunction in bothheidi_engine/telemetry.pyandheidi_engine/dashboard.pypreviously used rawrun_idstrings to construct file paths. This allowed an attacker (or accidental misconfiguration) to provide an absolute path asrun_id, escaping the intendedruns/directory and potentially reading/writing sensitive files elsewhere on the system. I sanitized this by usingPath(run_id).name, which extracts only the final component of the path.Environment Variable Isolation: The
scripts/03_unit_test_gate.pyscript was passing the entire host environment (os.environ) to subprocesses running untrusted (synthetic) code. This could leak sensitive credentials likeOPENAI_API_KEY. I implemented asafe_envwhitelist that only provides necessary variables (PATH,PYTHONPATH,HOME,LANG, andPYTHONDONTWRITEBYTECODE) to the subprocess.NameError Bug Fix: In
heidi_engine/telemetry.py, a redundant cache check in theget_statefunction was referencing an undefined variabletarget_run_id, which would cause aNameErrorand potentially a Denial of Service for telemetry clients. I removed this redundant check.Journaling: I've documented these findings and their preventions in the Sentinel security journal (
.jules/sentinel.md).All changes were verified with targeted reproduction scripts and the full
pytestsuite.PR created automatically by Jules for task 7801631284483336633 started by @heidi-dang