Skip to content

refactor(git): improve code health in git.py (CodeScene issues)#1080

Merged
spoorcc merged 11 commits intomainfrom
claude/refactor-git-module-S03ep
Mar 26, 2026
Merged

refactor(git): improve code health in git.py (CodeScene issues)#1080
spoorcc merged 11 commits intomainfrom
claude/refactor-git-module-S03ep

Conversation

@spoorcc
Copy link
Contributor

@spoorcc spoorcc commented Mar 25, 2026

Address four CodeScene-flagged issues in dfetch/vcs/git.py:

  1. Bumpy Road – _move_src_folder_up (complexity 10 → 4)
    Extract _collect_source_dirs() and _move_directory_contents() to
    eliminate three nested-conditional bumps.

  2. Bumpy Road – create_diff (2 bumps)
    Extract _build_hash_args() and _build_ignore_args() so each
    conditional block has a single, named responsibility.

  3. Code Duplication – get_username / get_useremail
    Introduce _get_git_config_value(key) helper; both methods now
    delegate to it, removing structural copy-paste.

  4. Too Many Arguments – checkout_version (5 args → 1)
    Introduce CheckoutOptions dataclass; update the single call site
    in gitsubproject.py accordingly.

All 50 existing tests continue to pass.

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC

Summary by CodeRabbit

  • Refactor

    • Safer temporary-file handling, consolidated HTTP download streaming, unified checkout options, and more robust repository-path moves for improved reliability.
  • Bug Fixes

    • SVN fetch now reliably exports the first detected license file.
  • Tests

    • Expanded coverage for VCS utilities, path-safety behavior, and argument-building helpers.

claude added 4 commits March 25, 2026 18:35
Address four CodeScene-flagged issues in dfetch/vcs/git.py:

1. Bumpy Road – _move_src_folder_up (complexity 10 → 4)
   Extract _collect_source_dirs() and _move_directory_contents() to
   eliminate three nested-conditional bumps.

2. Bumpy Road – create_diff (2 bumps)
   Extract _build_hash_args() and _build_ignore_args() so each
   conditional block has a single, named responsibility.

3. Code Duplication – get_username / get_useremail
   Introduce _get_git_config_value(key) helper; both methods now
   delegate to it, removing structural copy-paste.

4. Too Many Arguments – checkout_version (5 args → 1)
   Introduce CheckoutOptions dataclass; update the single call site
   in gitsubproject.py accordingly.

All 50 existing tests continue to pass.

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
…lpers

- Replace `for x in list[:1]:` anti-pattern with `if list: x = list[0]`
- Fix misleading any_changes_or_untracked docstring (returns bool, not a list)
- Trim Args/Returns blocks from 2-line private helpers (_build_hash_args,
  _build_ignore_args, _collect_source_dirs) — code already says it
- Add 14 parametrized unit tests covering all edge cases of the three
  newly extracted pure helpers (_collect_source_dirs, _build_hash_args,
  _build_ignore_args)

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
…isort

_move_directory_contents accepted `src` for warning messages but the
warning already used `src_dir_path`. Dropped the dead parameter to
satisfy pylint (10.00/10). Also applied isort to test file.

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
archive.py – _http_download (5 → 3 levels deep)
  Extract _stream_response_to_file() to handle the write loop, removing
  the with-open / while-chunk / if-hasher nesting from the redirect loop.

svnsubproject.py – _fetch_impl for/break anti-pattern
  Replace `for file in ...: ...; break` (only ever processes the first
  element) with `license_files = ...; if license_files:` — same fix
  applied to _move_src_folder_up in the previous git.py refactor.

archivesubproject.py – duplicated try-finally temp-file cleanup
  Both _download_and_compute_hash() and _fetch_impl() contained identical
  fd/mkstemp/try/finally/os.remove blocks. Extract a _temp_file()
  context manager so each caller becomes a clean `with _temp_file(...) as
  tmp_path:` block. Adds contextlib + Generator imports.

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Replaces manual temp-file handling with a temp_file context manager, centralizes HTTP streaming, adds git checkout options and several git utility helpers, tightens src-folder move safety, and adjusts SVN license export selection; tests added for new git utilities and protections.

Changes

Cohort / File(s) Summary
Git core & callers
dfetch/vcs/git.py, dfetch/project/gitsubproject.py, tests/test_git_vcs.py
Added CheckoutOptions and updated GitLocalRepo.checkout_version to accept it; added _get_git_config_value, _build_hash_args, _build_ignore_args; reworked _move_src_folder_up to filter/resolve candidates with unique_parent_dirs and use move_directory_contents with safety checks; updated gitsubproject caller and added tests for unique_parent_dirs, hash/ignore arg builders, and src-move path protections.
Utilities & archive streaming
dfetch/util/util.py, dfetch/vcs/archive.py, dfetch/project/archivesubproject.py
Added temp_file() context manager, unique_parent_dirs(), and move_directory_contents() utilities. Introduced _stream_response_to_file() to centralize response streaming with optional hashing. Replaced manual tempfile lifecycle in ArchiveSubProject with temp_file.
SVN license export behavior
dfetch/project/svnsubproject.py
Changed license export to pick the first detected license file via index access instead of iterating and breaking.
Test/fixtures
features/steps/git_steps.py
Disabled git commit GPG signing in test repo setup (git config commit.gpgsign false).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

development, testing, enhancement

Suggested reviewers

  • ben-edna
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the pull request, which is refactoring git.py to address CodeScene-identified code health issues.
Docstring Coverage ✅ Passed Docstring coverage is 81.48% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/refactor-git-module-S03ep

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Three private helpers extracted in earlier commits belong in util.py —
they are general filesystem utilities with no VCS-specific logic.

util.py — three additions:
  temp_file(suffix)
    Context manager that creates a temp file, yields its path, and
    guarantees deletion on exit.  Replaces the local _temp_file in
    archivesubproject.py.

  unique_parent_dirs(paths)
    Returns the unique parent directories for a list of paths (dirs
    are kept as-is; files resolve to their parent).  Name describes
    the result, not the mechanism.  Replaces GitLocalRepo._collect_source_dirs
    whose name described the git-specific context it was extracted from.
    Parallels existing copy/move directory helpers.

  move_directory_contents(src_dir, dest_dir)
    Moves every entry from src_dir into dest_dir.  Complements the
    existing copy_directory_contents.  The git-specific domain logic
    (safe_rm of the top ancestor + FileNotFoundError warning) stays in
    git.py's _move_src_folder_up where it belongs.

Call-site changes:
  git.py          — imports unique_parent_dirs + move_directory_contents;
                    removes two static methods; drops unused shutil import.
  archivesubproject.py — imports temp_file; removes contextlib, Generator,
                    os, tempfile imports that were only needed locally.
  tests/test_git_vcs.py — test_collect_source_dirs → test_unique_parent_dirs;
                    patch target updated to dfetch.util.util.os.path.isdir.

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dfetch/vcs/git.py (1)

367-387: ⚠️ Potential issue | 🔴 Critical

Validate src-derived paths before move/delete operations.

glob(src) results are used directly for move_directory_contents(...) and safe_rm(...). A src containing traversal/absolute patterns can target paths outside the repo, causing unintended file moves/deletes.

🔒 Proposed hardening patch
 from dfetch.util.util import (
+    check_no_path_traversal,
     in_directory,
     is_license_file,
     move_directory_contents,
     safe_rm,
     strip_glob_prefix,
     unique_parent_dirs,
 )
@@
     def _move_src_folder_up(remote: str, src: str) -> None:
@@
         matched_paths = sorted(glob.glob(src))
+        project_root = Path(".").resolve()
 
         if not matched_paths:
             logger.warning(
                 f"The 'src:' filter '{src}' didn't match any files from '{remote}'"
             )
             return
+
+        for matched_path in matched_paths:
+            check_no_path_traversal(matched_path, project_root)
 
         dirs = unique_parent_dirs(matched_paths)
@@
         if dirs:
             try:
-                move_directory_contents(dirs[0], ".")
-                safe_rm(PurePath(dirs[0]).parts[0])
+                check_no_path_traversal(dirs[0], project_root)
+                move_directory_contents(dirs[0], ".")
+                safe_rm(PurePath(dirs[0]).parts[0], within=project_root)
             except FileNotFoundError:
                 logger.warning(
                     f"The 'src:' filter '{dirs[0]}' didn't match any files from '{remote}'"
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 367 - 387, The code uses glob.glob(src)
results (matched_paths) and then passes unique_parent_dirs(...) results directly
to move_directory_contents(...) and safe_rm(...), which can allow traversal or
absolute patterns in src to operate outside the repo; sanitize and validate
those derived paths before mutating: normalize and resolve each matched path and
the chosen directory (dirs[0]) against the repository root, reject or log+return
if src is absolute or resolves outside the repo root, and only call
move_directory_contents(dir, ".") and safe_rm(first_segment) when the resolved
dir is a subpath of the repo; update the logic around matched_paths,
unique_parent_dirs, move_directory_contents, and safe_rm to perform these checks
and fail-safe early.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@dfetch/vcs/git.py`:
- Around line 367-387: The code uses glob.glob(src) results (matched_paths) and
then passes unique_parent_dirs(...) results directly to
move_directory_contents(...) and safe_rm(...), which can allow traversal or
absolute patterns in src to operate outside the repo; sanitize and validate
those derived paths before mutating: normalize and resolve each matched path and
the chosen directory (dirs[0]) against the repository root, reject or log+return
if src is absolute or resolves outside the repo root, and only call
move_directory_contents(dir, ".") and safe_rm(first_segment) when the resolved
dir is a subpath of the repo; update the logic around matched_paths,
unique_parent_dirs, move_directory_contents, and safe_rm to perform these checks
and fail-safe early.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2e39fd87-e0ab-4255-b571-fec2bb1f7e0c

📥 Commits

Reviewing files that changed from the base of the PR and between f07eec2 and 3a91f96.

📒 Files selected for processing (4)
  • dfetch/project/archivesubproject.py
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • tests/test_git_vcs.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_git_vcs.py

move_directory_contents had no protection against absolute or traversal
src patterns in dfetch.yaml.  A malicious or misconfigured manifest
entry such as `src: /etc` or `src: ../../sensitive` would cause
shutil.move to relocate files from outside the checkout directory into
the project destination before safe_rm's check_no_path_traversal could
intervene on the deletion step.

Two guards added to _move_src_folder_up:

1. Reject absolute src immediately (os.path.isabs) and log a warning.
2. For each path returned by glob.glob(src), resolve it and verify it
   is relative to the repo root (Path.is_relative_to).  Out-of-root
   matches are logged and dropped; if no safe matches remain the
   function returns without calling move_directory_contents or safe_rm.

Two regression tests added:
- test_move_src_folder_up_rejects_absolute_src
- test_move_src_folder_up_rejects_traversal_src

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
@spoorcc
Copy link
Contributor Author

spoorcc commented Mar 25, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 367-403: The cleanup step uses the original directory piece
(PurePath(dirs[0]).parts[0]) which can be a traversal token and call safe_rm
without restricting deletion scope; change the flow so that dirs is computed and
stored as resolved paths (use Path(...).resolve() for entries in dirs or compute
from safe_matched resolved values), then when calling safe_rm from the cleanup
after move_directory_contents, pass a validated/resolved directory and include
the within=repo_root (or otherwise assert the first part is not a parent
traversal) to ensure safe_rm only removes paths inside the repository; update
references to unique_parent_dirs, move_directory_contents and safe_rm
accordingly so cleanup uses the resolved safe path rather than the original
potentially traversal-containing segment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4bd6b595-89ad-4a1a-9177-a47dd9fe728e

📥 Commits

Reviewing files that changed from the base of the PR and between 3a91f96 and 4d43350.

📒 Files selected for processing (2)
  • dfetch/vcs/git.py
  • tests/test_git_vcs.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_git_vcs.py

unique_parent_dirs returns strings from os.path.dirname which does not
normalise '..'.  If safe_matched contained 'some/../inside/file.c'
(legitimate — resolves within repo_root), dirs[0] would be the string
'some/../inside' and PurePath(...).parts[0] would yield 'some', causing
safe_rm to delete the wrong top-level directory instead of 'inside'.

Fix: resolve every entry returned by unique_parent_dirs to an absolute
canonical Path immediately.  Downstream steps then use:

  move_directory_contents(str(chosen), ".")
  top = repo_root / chosen.relative_to(repo_root).parts[0]
  safe_rm(top, within=repo_root)

This ensures:
- The path fed to move_directory_contents has no '..' components.
- The cleanup target is the first component *under* repo_root, derived
  from the resolved path, not from a raw string split.
- safe_rm receives an explicit within=repo_root rather than relying on
  the implicit within="." default.

Also removes the now-unused PurePath import.

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
…ative path

The global git config had commit.gpgsign=true which caused all BDD test
scenarios that create git repos to error with a signing server failure.
Setting commit.gpgsign=false locally in create_repo() lets these tests run.

Also corrects the 'Only considering files in' warning to show the
relative path rather than the absolute resolved path, restoring the
expected output for the multi-directory src filter scenario.

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 400-409: The code assumes chosen.relative_to(repo_root).parts has
at least one element which can raise IndexError when chosen equals repo_root;
after moving contents with move_directory_contents(str(chosen), ".") add a guard
using parts = chosen.relative_to(repo_root).parts and only compute top =
repo_root / parts[0] and call safe_rm(top, within=repo_root) when parts is
non-empty (or when chosen != repo_root); otherwise skip the safe_rm step.
Reference symbols: resolved_dirs, chosen, repo_root, move_directory_contents,
safe_rm.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4eba23c3-4b1d-4743-b7a0-8ad1f047a1ed

📥 Commits

Reviewing files that changed from the base of the PR and between 000d2e3 and 26bc60e.

📒 Files selected for processing (2)
  • dfetch/vcs/git.py
  • features/steps/git_steps.py
✅ Files skipped from review due to trivial changes (1)
  • features/steps/git_steps.py

Comment on lines +400 to 409
if resolved_dirs:
chosen = resolved_dirs[0]
try:
for file_to_copy in os.listdir(src_dir_path):
shutil.move(src_dir_path + "/" + file_to_copy, ".")
safe_rm(PurePath(src_dir_path).parts[0])
move_directory_contents(str(chosen), ".")
top = repo_root / chosen.relative_to(repo_root).parts[0]
safe_rm(top, within=repo_root)
except FileNotFoundError:
logger.warning(
f"The 'src:' filter '{src_dir_path}' didn't match any files from '{remote}'"
f"The 'src:' filter '{chosen}' didn't match any files from '{remote}'"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential IndexError if the resolved source directory equals the repository root.

If src resolves to the repository root itself (e.g., src: "."), then chosen.relative_to(repo_root).parts returns an empty tuple, and accessing .parts[0] raises IndexError.

Consider adding a guard:

Proposed fix
         if resolved_dirs:
             chosen = resolved_dirs[0]
+            rel_parts = chosen.relative_to(repo_root).parts
+            if not rel_parts:
+                logger.warning(
+                    f"The 'src:' filter '{src}' resolved to the repo root; nothing to move for '{remote}'"
+                )
+                return
             try:
                 move_directory_contents(str(chosen), ".")
-                top = repo_root / chosen.relative_to(repo_root).parts[0]
+                top = repo_root / rel_parts[0]
                 safe_rm(top, within=repo_root)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if resolved_dirs:
chosen = resolved_dirs[0]
try:
for file_to_copy in os.listdir(src_dir_path):
shutil.move(src_dir_path + "/" + file_to_copy, ".")
safe_rm(PurePath(src_dir_path).parts[0])
move_directory_contents(str(chosen), ".")
top = repo_root / chosen.relative_to(repo_root).parts[0]
safe_rm(top, within=repo_root)
except FileNotFoundError:
logger.warning(
f"The 'src:' filter '{src_dir_path}' didn't match any files from '{remote}'"
f"The 'src:' filter '{chosen}' didn't match any files from '{remote}'"
)
if resolved_dirs:
chosen = resolved_dirs[0]
rel_parts = chosen.relative_to(repo_root).parts
if not rel_parts:
logger.warning(
f"The 'src:' filter '{src}' resolved to the repo root; nothing to move for '{remote}'"
)
return
try:
move_directory_contents(str(chosen), ".")
top = repo_root / rel_parts[0]
safe_rm(top, within=repo_root)
except FileNotFoundError:
logger.warning(
f"The 'src:' filter '{chosen}' didn't match any files from '{remote}'"
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 400 - 409, The code assumes
chosen.relative_to(repo_root).parts has at least one element which can raise
IndexError when chosen equals repo_root; after moving contents with
move_directory_contents(str(chosen), ".") add a guard using parts =
chosen.relative_to(repo_root).parts and only compute top = repo_root / parts[0]
and call safe_rm(top, within=repo_root) when parts is non-empty (or when chosen
!= repo_root); otherwise skip the safe_rm step. Reference symbols:
resolved_dirs, chosen, repo_root, move_directory_contents, safe_rm.

chosen.relative_to(repo_root).parts is empty when the resolved src
directory is the repo root itself (e.g. src: .). Extracting parts into
a variable and checking it before indexing prevents an IndexError and
correctly skips the safe_rm step in that edge case.

https://claude.ai/code/session_014ZRQGvyTdWBqoRjtu7gsSC
@spoorcc spoorcc merged commit bad6fa8 into main Mar 26, 2026
40 checks passed
@spoorcc spoorcc deleted the claude/refactor-git-module-S03ep branch March 26, 2026 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants