refactor(git): improve code health in git.py (CodeScene issues)#1080
refactor(git): improve code health in git.py (CodeScene issues)#1080
Conversation
Address four CodeScene-flagged issues in dfetch/vcs/git.py: 1. Bumpy Road – _move_src_folder_up (complexity 10 → 4) Extract _collect_source_dirs() and _move_directory_contents() to eliminate three nested-conditional bumps. 2. Bumpy Road – create_diff (2 bumps) Extract _build_hash_args() and _build_ignore_args() so each conditional block has a single, named responsibility. 3. Code Duplication – get_username / get_useremail Introduce _get_git_config_value(key) helper; both methods now delegate to it, removing structural copy-paste. 4. Too Many Arguments – checkout_version (5 args → 1) Introduce CheckoutOptions dataclass; update the single call site in gitsubproject.py accordingly. All 50 existing tests continue to pass. https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
…lpers - Replace `for x in list[:1]:` anti-pattern with `if list: x = list[0]` - Fix misleading any_changes_or_untracked docstring (returns bool, not a list) - Trim Args/Returns blocks from 2-line private helpers (_build_hash_args, _build_ignore_args, _collect_source_dirs) — code already says it - Add 14 parametrized unit tests covering all edge cases of the three newly extracted pure helpers (_collect_source_dirs, _build_hash_args, _build_ignore_args) https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
…isort _move_directory_contents accepted `src` for warning messages but the warning already used `src_dir_path`. Dropped the dead parameter to satisfy pylint (10.00/10). Also applied isort to test file. https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
archive.py – _http_download (5 → 3 levels deep) Extract _stream_response_to_file() to handle the write loop, removing the with-open / while-chunk / if-hasher nesting from the redirect loop. svnsubproject.py – _fetch_impl for/break anti-pattern Replace `for file in ...: ...; break` (only ever processes the first element) with `license_files = ...; if license_files:` — same fix applied to _move_src_folder_up in the previous git.py refactor. archivesubproject.py – duplicated try-finally temp-file cleanup Both _download_and_compute_hash() and _fetch_impl() contained identical fd/mkstemp/try/finally/os.remove blocks. Extract a _temp_file() context manager so each caller becomes a clean `with _temp_file(...) as tmp_path:` block. Adds contextlib + Generator imports. https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReplaces manual temp-file handling with a Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Three private helpers extracted in earlier commits belong in util.py —
they are general filesystem utilities with no VCS-specific logic.
util.py — three additions:
temp_file(suffix)
Context manager that creates a temp file, yields its path, and
guarantees deletion on exit. Replaces the local _temp_file in
archivesubproject.py.
unique_parent_dirs(paths)
Returns the unique parent directories for a list of paths (dirs
are kept as-is; files resolve to their parent). Name describes
the result, not the mechanism. Replaces GitLocalRepo._collect_source_dirs
whose name described the git-specific context it was extracted from.
Parallels existing copy/move directory helpers.
move_directory_contents(src_dir, dest_dir)
Moves every entry from src_dir into dest_dir. Complements the
existing copy_directory_contents. The git-specific domain logic
(safe_rm of the top ancestor + FileNotFoundError warning) stays in
git.py's _move_src_folder_up where it belongs.
Call-site changes:
git.py — imports unique_parent_dirs + move_directory_contents;
removes two static methods; drops unused shutil import.
archivesubproject.py — imports temp_file; removes contextlib, Generator,
os, tempfile imports that were only needed locally.
tests/test_git_vcs.py — test_collect_source_dirs → test_unique_parent_dirs;
patch target updated to dfetch.util.util.os.path.isdir.
https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dfetch/vcs/git.py (1)
367-387:⚠️ Potential issue | 🔴 CriticalValidate
src-derived paths before move/delete operations.
glob(src)results are used directly formove_directory_contents(...)andsafe_rm(...). Asrccontaining traversal/absolute patterns can target paths outside the repo, causing unintended file moves/deletes.🔒 Proposed hardening patch
from dfetch.util.util import ( + check_no_path_traversal, in_directory, is_license_file, move_directory_contents, safe_rm, strip_glob_prefix, unique_parent_dirs, ) @@ def _move_src_folder_up(remote: str, src: str) -> None: @@ matched_paths = sorted(glob.glob(src)) + project_root = Path(".").resolve() if not matched_paths: logger.warning( f"The 'src:' filter '{src}' didn't match any files from '{remote}'" ) return + + for matched_path in matched_paths: + check_no_path_traversal(matched_path, project_root) dirs = unique_parent_dirs(matched_paths) @@ if dirs: try: - move_directory_contents(dirs[0], ".") - safe_rm(PurePath(dirs[0]).parts[0]) + check_no_path_traversal(dirs[0], project_root) + move_directory_contents(dirs[0], ".") + safe_rm(PurePath(dirs[0]).parts[0], within=project_root) except FileNotFoundError: logger.warning( f"The 'src:' filter '{dirs[0]}' didn't match any files from '{remote}'" )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/vcs/git.py` around lines 367 - 387, The code uses glob.glob(src) results (matched_paths) and then passes unique_parent_dirs(...) results directly to move_directory_contents(...) and safe_rm(...), which can allow traversal or absolute patterns in src to operate outside the repo; sanitize and validate those derived paths before mutating: normalize and resolve each matched path and the chosen directory (dirs[0]) against the repository root, reject or log+return if src is absolute or resolves outside the repo root, and only call move_directory_contents(dir, ".") and safe_rm(first_segment) when the resolved dir is a subpath of the repo; update the logic around matched_paths, unique_parent_dirs, move_directory_contents, and safe_rm to perform these checks and fail-safe early.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@dfetch/vcs/git.py`:
- Around line 367-387: The code uses glob.glob(src) results (matched_paths) and
then passes unique_parent_dirs(...) results directly to
move_directory_contents(...) and safe_rm(...), which can allow traversal or
absolute patterns in src to operate outside the repo; sanitize and validate
those derived paths before mutating: normalize and resolve each matched path and
the chosen directory (dirs[0]) against the repository root, reject or log+return
if src is absolute or resolves outside the repo root, and only call
move_directory_contents(dir, ".") and safe_rm(first_segment) when the resolved
dir is a subpath of the repo; update the logic around matched_paths,
unique_parent_dirs, move_directory_contents, and safe_rm to perform these checks
and fail-safe early.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2e39fd87-e0ab-4255-b571-fec2bb1f7e0c
📒 Files selected for processing (4)
dfetch/project/archivesubproject.pydfetch/util/util.pydfetch/vcs/git.pytests/test_git_vcs.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_git_vcs.py
move_directory_contents had no protection against absolute or traversal src patterns in dfetch.yaml. A malicious or misconfigured manifest entry such as `src: /etc` or `src: ../../sensitive` would cause shutil.move to relocate files from outside the checkout directory into the project destination before safe_rm's check_no_path_traversal could intervene on the deletion step. Two guards added to _move_src_folder_up: 1. Reject absolute src immediately (os.path.isabs) and log a warning. 2. For each path returned by glob.glob(src), resolve it and verify it is relative to the repo root (Path.is_relative_to). Out-of-root matches are logged and dropped; if no safe matches remain the function returns without calling move_directory_contents or safe_rm. Two regression tests added: - test_move_src_folder_up_rejects_absolute_src - test_move_src_folder_up_rejects_traversal_src https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 367-403: The cleanup step uses the original directory piece
(PurePath(dirs[0]).parts[0]) which can be a traversal token and call safe_rm
without restricting deletion scope; change the flow so that dirs is computed and
stored as resolved paths (use Path(...).resolve() for entries in dirs or compute
from safe_matched resolved values), then when calling safe_rm from the cleanup
after move_directory_contents, pass a validated/resolved directory and include
the within=repo_root (or otherwise assert the first part is not a parent
traversal) to ensure safe_rm only removes paths inside the repository; update
references to unique_parent_dirs, move_directory_contents and safe_rm
accordingly so cleanup uses the resolved safe path rather than the original
potentially traversal-containing segment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4bd6b595-89ad-4a1a-9177-a47dd9fe728e
📒 Files selected for processing (2)
dfetch/vcs/git.pytests/test_git_vcs.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_git_vcs.py
unique_parent_dirs returns strings from os.path.dirname which does not normalise '..'. If safe_matched contained 'some/../inside/file.c' (legitimate — resolves within repo_root), dirs[0] would be the string 'some/../inside' and PurePath(...).parts[0] would yield 'some', causing safe_rm to delete the wrong top-level directory instead of 'inside'. Fix: resolve every entry returned by unique_parent_dirs to an absolute canonical Path immediately. Downstream steps then use: move_directory_contents(str(chosen), ".") top = repo_root / chosen.relative_to(repo_root).parts[0] safe_rm(top, within=repo_root) This ensures: - The path fed to move_directory_contents has no '..' components. - The cleanup target is the first component *under* repo_root, derived from the resolved path, not from a raw string split. - safe_rm receives an explicit within=repo_root rather than relying on the implicit within="." default. Also removes the now-unused PurePath import. https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
…ative path The global git config had commit.gpgsign=true which caused all BDD test scenarios that create git repos to error with a signing server failure. Setting commit.gpgsign=false locally in create_repo() lets these tests run. Also corrects the 'Only considering files in' warning to show the relative path rather than the absolute resolved path, restoring the expected output for the multi-directory src filter scenario. https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 400-409: The code assumes chosen.relative_to(repo_root).parts has
at least one element which can raise IndexError when chosen equals repo_root;
after moving contents with move_directory_contents(str(chosen), ".") add a guard
using parts = chosen.relative_to(repo_root).parts and only compute top =
repo_root / parts[0] and call safe_rm(top, within=repo_root) when parts is
non-empty (or when chosen != repo_root); otherwise skip the safe_rm step.
Reference symbols: resolved_dirs, chosen, repo_root, move_directory_contents,
safe_rm.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4eba23c3-4b1d-4743-b7a0-8ad1f047a1ed
📒 Files selected for processing (2)
dfetch/vcs/git.pyfeatures/steps/git_steps.py
✅ Files skipped from review due to trivial changes (1)
- features/steps/git_steps.py
| if resolved_dirs: | ||
| chosen = resolved_dirs[0] | ||
| try: | ||
| for file_to_copy in os.listdir(src_dir_path): | ||
| shutil.move(src_dir_path + "/" + file_to_copy, ".") | ||
| safe_rm(PurePath(src_dir_path).parts[0]) | ||
| move_directory_contents(str(chosen), ".") | ||
| top = repo_root / chosen.relative_to(repo_root).parts[0] | ||
| safe_rm(top, within=repo_root) | ||
| except FileNotFoundError: | ||
| logger.warning( | ||
| f"The 'src:' filter '{src_dir_path}' didn't match any files from '{remote}'" | ||
| f"The 'src:' filter '{chosen}' didn't match any files from '{remote}'" | ||
| ) |
There was a problem hiding this comment.
Potential IndexError if the resolved source directory equals the repository root.
If src resolves to the repository root itself (e.g., src: "."), then chosen.relative_to(repo_root).parts returns an empty tuple, and accessing .parts[0] raises IndexError.
Consider adding a guard:
Proposed fix
if resolved_dirs:
chosen = resolved_dirs[0]
+ rel_parts = chosen.relative_to(repo_root).parts
+ if not rel_parts:
+ logger.warning(
+ f"The 'src:' filter '{src}' resolved to the repo root; nothing to move for '{remote}'"
+ )
+ return
try:
move_directory_contents(str(chosen), ".")
- top = repo_root / chosen.relative_to(repo_root).parts[0]
+ top = repo_root / rel_parts[0]
safe_rm(top, within=repo_root)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if resolved_dirs: | |
| chosen = resolved_dirs[0] | |
| try: | |
| for file_to_copy in os.listdir(src_dir_path): | |
| shutil.move(src_dir_path + "/" + file_to_copy, ".") | |
| safe_rm(PurePath(src_dir_path).parts[0]) | |
| move_directory_contents(str(chosen), ".") | |
| top = repo_root / chosen.relative_to(repo_root).parts[0] | |
| safe_rm(top, within=repo_root) | |
| except FileNotFoundError: | |
| logger.warning( | |
| f"The 'src:' filter '{src_dir_path}' didn't match any files from '{remote}'" | |
| f"The 'src:' filter '{chosen}' didn't match any files from '{remote}'" | |
| ) | |
| if resolved_dirs: | |
| chosen = resolved_dirs[0] | |
| rel_parts = chosen.relative_to(repo_root).parts | |
| if not rel_parts: | |
| logger.warning( | |
| f"The 'src:' filter '{src}' resolved to the repo root; nothing to move for '{remote}'" | |
| ) | |
| return | |
| try: | |
| move_directory_contents(str(chosen), ".") | |
| top = repo_root / rel_parts[0] | |
| safe_rm(top, within=repo_root) | |
| except FileNotFoundError: | |
| logger.warning( | |
| f"The 'src:' filter '{chosen}' didn't match any files from '{remote}'" | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dfetch/vcs/git.py` around lines 400 - 409, The code assumes
chosen.relative_to(repo_root).parts has at least one element which can raise
IndexError when chosen equals repo_root; after moving contents with
move_directory_contents(str(chosen), ".") add a guard using parts =
chosen.relative_to(repo_root).parts and only compute top = repo_root / parts[0]
and call safe_rm(top, within=repo_root) when parts is non-empty (or when chosen
!= repo_root); otherwise skip the safe_rm step. Reference symbols:
resolved_dirs, chosen, repo_root, move_directory_contents, safe_rm.
chosen.relative_to(repo_root).parts is empty when the resolved src directory is the repo root itself (e.g. src: .). Extracting parts into a variable and checking it before indexing prevents an IndexError and correctly skips the safe_rm step in that edge case. https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
Address four CodeScene-flagged issues in dfetch/vcs/git.py:
Bumpy Road – _move_src_folder_up (complexity 10 → 4)
Extract _collect_source_dirs() and _move_directory_contents() to
eliminate three nested-conditional bumps.
Bumpy Road – create_diff (2 bumps)
Extract _build_hash_args() and _build_ignore_args() so each
conditional block has a single, named responsibility.
Code Duplication – get_username / get_useremail
Introduce _get_git_config_value(key) helper; both methods now
delegate to it, removing structural copy-paste.
Too Many Arguments – checkout_version (5 args → 1)
Introduce CheckoutOptions dataclass; update the single call site
in gitsubproject.py accordingly.
All 50 existing tests continue to pass.
https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
Summary by CodeRabbit
Refactor
Bug Fixes
Tests