Fetch git submodule during update of git subproject#1017
Conversation
Walkthrough: Adds Git submodule discovery and recording of submodule dependencies in project metadata; changes subproject fetch APIs to return dependency lists; updates Git handling, reporting output, and filesystem utilities; expands BDD and unit tests and the example manifest; updates the changelog and formatting config.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as dfetch Client
    participant SubProj as SubProject.update()
    participant GitSub as GitSubProject._fetch_impl()
    participant GitRepo as GitLocalRepo
    participant Metadata as Metadata.fetched()
    Client->>SubProj: update(version)
    SubProj->>GitSub: _fetch_impl(version)
    GitSub->>GitRepo: checkout_version()
    GitRepo->>GitRepo: git submodule update --init --recursive
    GitRepo-->>GitSub: (sha, list[Submodule])
    GitSub->>GitSub: build list[Dependency]
    GitSub-->>SubProj: (Version, list[Dependency])
    SubProj->>Metadata: fetched(version, dependencies=[...])
    Metadata->>Metadata: store dependencies
    SubProj-->>Client: update complete
```

```mermaid
sequenceDiagram
    participant Reporter as stdout_reporter
    participant DLogger as DLogger
    participant Metadata as Metadata
    Reporter->>DLogger: print_info_line(project_name, "")
    DLogger-->>Reporter: project header
    Reporter->>Metadata: get metadata
    Metadata-->>Reporter: metadata + dependencies
    Reporter->>Reporter: print remote, branch, tag, revision...
    alt has dependencies
        Reporter->>Reporter: print "dependencies:" and per-dependency fields
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 passed checks
Force-pushed 126d91c to dda77ef
Force-pushed dda77ef to 5b52e24
Force-pushed 5b52e24 to 5148a28
Note: Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it. Generating unit tests... This may take up to 20 minutes.

✅ Created PR with unit tests: #1025
Force-pushed 5148a28 to ad901d7
Force-pushed ad901d7 to 3badd46
Force-pushed 3badd46 to 70f0adc
Force-pushed 70f0adc to 78711d1
Force-pushed 78711d1 to b81f411
Force-pushed b81f411 to 07b6553
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dfetch/vcs/git.py (1)
305-316: ⚠️ Potential issue | 🟠 Major

Apply the `src:` remap before persisting dependency paths. Line 305 snapshots submodules before lines 313-314 rewrite the checkout. If `src: vendor/lib` contains `vendor/lib/ext/zlib`, the exported tree ends up with `ext/zlib` but the dependency metadata still stores `vendor/lib/ext/zlib`; submodules outside `src` are also reported even though they were removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/vcs/git.py` around lines 305 - 316, The code captures submodules and current SHA before applying the src remap, causing dependency paths to persist pre-remap locations; move the src rewrite so it runs before snapshotting submodules and reading HEAD. Specifically, call self.move_src_folder_up(remote, src) (with the same parameters) before invoking self.submodules() and before the run_on_cmdline(... "git", "rev-parse", "HEAD") call so the exported tree and recorded submodule paths reflect the remapped src layout.
🧹 Nitpick comments (1)
dfetch/util/util.py (1)
45-51: Broken symlinks won't be removed.

`os.path.exists()` returns `False` for broken symlinks (since it follows the link to check target existence). If the intent is to remove the path regardless of whether it's a valid symlink, consider using `os.path.lexists()` instead, which returns `True` for broken symlinks. If broken symlinks should be silently ignored, the current implementation is correct.
♻️ Optional fix if broken symlinks should be removed
```diff
 def safe_rm(path: str | Path) -> None:
     """Delete an file or directory safely."""
-    if os.path.exists(path):
+    if os.path.lexists(path):
         if os.path.isdir(path):
             safe_rmtree(str(path))
         else:
             os.remove(path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/util/util.py` around lines 45 - 51, The safe_rm function uses os.path.exists which returns False for broken symlinks; change the existence check to os.path.lexists(path) so broken symlinks are detected and removed. In safe_rm (and when calling safe_rmtree/remove), keep the existing branch logic (os.path.isdir -> safe_rmtree, else -> os.remove) but gate it on os.path.lexists instead of os.path.exists so broken symlinks fall through to the os.remove path and are deleted.
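The difference the comment relies on is easy to demonstrate with a standalone sketch (plain stdlib, not dfetch's own helpers): create a dangling symlink and compare the two checks.

```python
import os
import tempfile

# Create a symlink whose target does not exist (a "broken" symlink).
workdir = tempfile.mkdtemp()
link = os.path.join(workdir, "dangling")
os.symlink(os.path.join(workdir, "missing-target"), link)

# exists() follows the link and reports on the (missing) target,
# while lexists() reports on the link entry itself.
print(os.path.exists(link))   # False - the target is gone
print(os.path.lexists(link))  # True - the link entry is still there

# Guarding deletion with exists() would therefore skip the broken link;
# lexists() lets it fall through to os.remove().
os.remove(link)
print(os.path.lexists(link))  # False
```

This is why a `safe_rm` gated on `os.path.exists()` silently leaves broken symlinks behind.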
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.rst`:
- Line 5: The changelog entry "* Fetch git submodules in git subproject at
pinned revision (`#1015`)" references the wrong issue number; verify whether the
correct reference is `#1017` (per PR metadata) or `#1013` (per commit summary) and
update that parenthetical to the confirmed issue number so the CHANGELOG.rst
entry matches the PR and commit message.
In `@dfetch/util/util.py`:
- Line 46: Fix the minor typo in the docstring that currently reads "Delete an
file or directory safely." by changing it to "Delete a file or directory
safely."; update the docstring text in dfetch/util/util.py where that string
appears so the grammar is correct.
In `@dfetch/vcs/git.py`:
- Around line 299-305: In submodules() fix nested-submodule resolution: stop
using sm_path and the single cached urls map; instead parse each toplevel's
.gitmodules to build a fresh urls mapping per toplevel, use the $displaypath
value (not $sm_path) when emitting/looking up submodule entries and when
constructing full paths, and ensure url lookups use the corresponding
toplevel-specific map so names/URLs come from the same .gitmodules file.
In `@features/fetch-git-repo-with-submodule.feature`:
- Line 70: The expected "last fetch" value in
features/fetch-git-repo-with-submodule.feature is hardcoded to "26/02/2026,
20:28:24" and must be aligned with the suite's fixed clock; update the "last
fetch" expectation (the line containing the literal last fetch timestamp) to use
the shared fixed timestamp "02/07/2021, 20:25:56" (the same value asserted in
features/list-projects.feature) so the scenario uses the common clock fixture.
In `@features/steps/generic_steps.py`:
- Around line 333-335: The step function step_impl for the "{name} exists" step
currently delegates to check_file_exists which uses os.path.isfile() and only
accepts regular files; update the behavior to check for any filesystem entry by
either (a) changing check_file_exists to use os.path.exists() instead of
os.path.isfile(), or (b) create/rename a helper like check_path_exists that uses
os.path.exists() and call that from step_impl so the step truly asserts path
existence (reference: step_impl and check_file_exists).
---
Outside diff comments:
In `@dfetch/vcs/git.py`:
- Around line 305-316: The code captures submodules and current SHA before
applying the src remap, causing dependency paths to persist pre-remap locations;
move the src rewrite so it runs before snapshotting submodules and reading HEAD.
Specifically, call self.move_src_folder_up(remote, src) (with the same
parameters) before invoking self.submodules() and before the run_on_cmdline(...
"git", "rev-parse", "HEAD") call so the exported tree and recorded submodule
paths reflect the remapped src layout.
---
Nitpick comments:
In `@dfetch/util/util.py`:
- Around line 45-51: The safe_rm function uses os.path.exists which returns
False for broken symlinks; change the existence check to os.path.lexists(path)
so broken symlinks are detected and removed. In safe_rm (and when calling
safe_rmtree/remove), keep the existing branch logic (os.path.isdir ->
safe_rmtree, else -> os.remove) but gate it on os.path.lexists instead of
os.path.exists so broken symlinks fall through to the os.remove path and are
deleted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c047622b-f1a4-4b2e-9810-993a1556fbe1
📒 Files selected for processing (16)
- CHANGELOG.rst
- dfetch/log.py
- dfetch/project/gitsubproject.py
- dfetch/project/metadata.py
- dfetch/project/subproject.py
- dfetch/project/svnsubproject.py
- dfetch/reporting/stdout_reporter.py
- dfetch/util/util.py
- dfetch/vcs/git.py
- example/dfetch.yaml
- features/fetch-git-repo-with-submodule.feature
- features/list-projects.feature
- features/steps/generic_steps.py
- features/steps/git_steps.py
- tests/test_report.py
- tests/test_subproject.py
Force-pushed 07b6553 to d996bda
Actionable comments posted: 3
🧹 Nitpick comments (1)
dfetch/project/subproject.py (1)
151-160: Confusing variable naming: `dependency` is a list but named singular, and shadows itself in a comprehension.

The variable `dependency` holds a `list[VcsDependency]` but uses a singular name. The list comprehension `for dependency in dependency` then shadows this variable, which works but is confusing and error-prone. Consider renaming to `vcs_deps` or `vcs_dependencies` for clarity, matching the naming pattern used in `GitSubProject._fetch_impl`.

♻️ Suggested fix for clarity

```diff
-        actually_fetched, dependency = self._fetch_impl(to_fetch)
+        actually_fetched, vcs_deps = self._fetch_impl(to_fetch)
         self._log_project(f"Fetched {actually_fetched}")
         applied_patches = self._apply_patches(patch_count)
         self.__metadata.fetched(
             actually_fetched,
             hash_=hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME]),
             patch_=applied_patches,
-            dependencies=[dependency.to_dependency() for dependency in dependency],
+            dependencies=[dep.to_dependency() for dep in vcs_deps],
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/project/subproject.py` around lines 151 - 160, The variable named `dependency` is actually a list and is shadowed in the comprehension; rename it to a plural name (e.g., `vcs_dependencies` or `vcs_deps`) where it's returned from _fetch_impl and where it's consumed before calling __metadata.fetched; update the call site that currently reads `dependency` to `vcs_dependencies` and change the list comprehension to `[d.to_dependency() for d in vcs_dependencies]`, ensuring references in the surrounding block (including the unpack from self._fetch_impl(to_fetch) and the dependencies argument in __metadata.fetched) use the new plural name consistently (follow the naming pattern used in GitSubProject._fetch_impl).
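The flagged shadowing is legal Python, because the iterable is evaluated once before the loop variable is bound, but it is easy to misread:

```python
dependency = [1, 2, 3]

# The comprehension iterates over the *outer* list, then rebinds the
# name `dependency` to each element on every iteration.
doubled = [dependency * 2 for dependency in dependency]
print(doubled)  # [2, 4, 6]

# The comprehension has its own scope, so the outer name is untouched.
print(dependency)  # [1, 2, 3]
```

It runs, but a reader has to know two scoping rules to see why, which is exactly why the rename is worthwhile.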
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/project/gitsubproject.py`:
- Around line 72-76: The checkout currently preserves ".gitmodules" via the
must_keeps argument in local_repo.checkout_version and then removes only the
top-level ".git", leaving each submodule's .git entries orphaned; modify the
post-checkout cleanup (around the local_repo.checkout_version call and the later
cleanup block that deletes the superproject .git) to recursively find and remove
any nested ".git" files or directories under the checked-out working tree
(including submodule paths) rather than only deleting the top-level .git,
ensuring you still keep ".gitmodules" long enough to initialize submodules
before removing all nested .git metadata.
In `@dfetch/vcs/git.py`:
- Around line 300-306: The submodule enumeration misses nested submodules and
the relative_to call can raise for non-ancestor src; update the call that
gathers submodules to request recursive enumeration (e.g., make
self.submodules() accept/forward a recursive=True flag or call the git command
with --recursive so nested/transitive submodules returned) and change the path
handling in checkout_version()/where Path(submodule.path).relative_to(Path(src))
is used to avoid ValueError by normalizing src to a directory (if src points to
a file, use src.parent) or by computing a safe relative path via
os.path.relpath/Path(...).relative_to inside a try/except fallback—apply these
changes to the functions self.submodules() and checkout_version() so nested
submodules are included and non-ancestor src inputs do not crash.
- Around line 314-317: The code that sets submodule.path using
Path(submodule.path).relative_to(Path(src)) fails for non-directory or glob
`src` values; update the logic in the loop that mutates submodule.path (in
dfetch/vcs/git.py) to mirror move_src_folder_up() behavior: if src is not a
directory (os.path.isdir(src) is False) replace src with its parent/dirname
before creating the Path for relative_to, and fall back to
os.path.relpath(submodule.path, src) or skip relative calculation if relative_to
still raises ValueError so submodules don't crash the checkout.
---
Nitpick comments:
In `@dfetch/project/subproject.py`:
- Around line 151-160: The variable named `dependency` is actually a list and is
shadowed in the comprehension; rename it to a plural name (e.g.,
`vcs_dependencies` or `vcs_deps`) where it's returned from _fetch_impl and where
it's consumed before calling __metadata.fetched; update the call site that
currently reads `dependency` to `vcs_dependencies` and change the list
comprehension to `[d.to_dependency() for d in vcs_dependencies]`, ensuring
references in the surrounding block (including the unpack from
self._fetch_impl(to_fetch) and the dependencies argument in __metadata.fetched)
use the new plural name consistently (follow the naming pattern used in
GitSubProject._fetch_impl).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: faf1d394-d889-40b1-9ec4-222dacf51e84
📒 Files selected for processing (15)
- CHANGELOG.rst
- dfetch/log.py
- dfetch/project/gitsubproject.py
- dfetch/project/metadata.py
- dfetch/project/subproject.py
- dfetch/project/svnsubproject.py
- dfetch/reporting/stdout_reporter.py
- dfetch/util/util.py
- dfetch/vcs/git.py
- features/fetch-git-repo-with-submodule.feature
- features/list-projects.feature
- features/steps/generic_steps.py
- features/steps/git_steps.py
- tests/test_report.py
- tests/test_subproject.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (5)
- features/fetch-git-repo-with-submodule.feature
- features/steps/git_steps.py
- features/steps/generic_steps.py
- dfetch/util/util.py
- tests/test_subproject.py
Force-pushed d996bda to e01a9a6
@coderabbitai resume

✅ Actions performed: Reviews resumed.
Actionable comments posted: 1
♻️ Duplicate comments (2)
dfetch/vcs/git.py (2)
327-331: ⚠️ Potential issue | 🟠 Major

Resolve `src` against the actual kept directory before rewriting submodule paths. This helper only rewrites when `PurePath.match()` succeeds at the same path depth. Cases like `src="src"` or `src="pkg/*"` leave nested submodules unchanged, and `checkout_version()` still returns those entries after `_move_src_folder_up()` reshapes the tree. `dfetch/project/gitsubproject.py:71-102` then records them as fetched dependencies, so metadata/reporting can drift from what is actually present.

Read-only verification

```bash
#!/bin/bash
python - <<'PY'
from pathlib import PurePath, Path

cases = [
    ("src", "src/child"),
    ("pkg/*", "pkg/dir/child"),
    ("pkg/Some*", "pkg/SomeDir/child"),
]
for src, existing in cases:
    src_path = PurePath(src)
    sub_path = PurePath(existing)
    prefix_len = 0
    for part in src_path.parts:
        if "*" in part:
            break
        prefix_len += 1
    rewritten = (
        str(Path(*sub_path.parts[prefix_len:]))
        if sub_path.match(str(src_path))
        else existing
    )
    print(f"{src!r} + {existing!r} -> {rewritten!r}")
PY
rg -n -C2 'checkout_version\(|for submodule in submodules|ProjectEntry\(' dfetch/project/gitsubproject.py
```

Also applies to: 339-359
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/vcs/git.py` around lines 327 - 331, The current rewrite loop uses the original src pattern to match submodule.path which fails when src is relative (e.g. "src" or "pkg/*") and the tree has been reshaped by _move_src_folder_up; fix by first resolving the src pattern against the actual kept-directory layout and then using that resolved pattern for matching and slicing. Concretely: compute a resolved_src (as a PurePath) by applying the same rewrite/kept-folder transformation you apply later (use _rewrite_path or the same logic that _move_src_folder_up uses) and then, in the loop over submodules, run submodule_path.match(str(resolved_src)) and slice parts relative to resolved_src when setting submodule.path; keep usages of _rewrite_path, _move_src_folder_up, submodule.path and checkout_version consistent so rewritten submodules match what _move_src_folder_up produces.
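The right-anchoring behaviour at the heart of this finding is standard `pathlib` semantics: a relative pattern is matched against the *tail* of the path, component by component, so a deeper path never matches a shallower pattern:

```python
from pathlib import PurePath

# Relative patterns match from the right, component by component.
print(PurePath("pkg/child").match("pkg/*"))      # True  - 2 components vs 2
print(PurePath("pkg/dir/child").match("pkg/*"))  # False - pattern too shallow
print(PurePath("src/child").match("src"))        # False - only 'child' is compared
```

This is why matching submodule paths against the raw `src` pattern misses anything nested more than one level below it.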
393-402: ⚠️ Potential issue | 🟠 Major

Only delete the matched `src` directories, not their shared parent. `PurePath(src_dir_path).parts[0]` is just the first path component. For a nested `src` like `foo/bar`, this removes `foo/` and any unrelated siblings under it after the move.

Suggested cleanup

```diff
-        for src_dir_path in unique_dirs[:1]:
-            try:
-                for file_to_copy in os.listdir(src_dir_path):
-                    shutil.move(src_dir_path + "/" + file_to_copy, ".")
-                safe_rmtree(PurePath(src_dir_path).parts[0])
-            except FileNotFoundError:
-                logger.warning(
-                    f"The 'src:' filter '{src_dir_path}' didn't match any files from '{remote}'"
-                )
-                continue
+        if not unique_dirs:
+            return
+
+        chosen_dir = unique_dirs[0]
+        try:
+            for file_to_copy in os.listdir(chosen_dir):
+                shutil.move(os.path.join(chosen_dir, file_to_copy), ".")
+            safe_rm(unique_dirs)
+        except FileNotFoundError:
+            logger.warning(
+                f"The 'src:' filter '{chosen_dir}' didn't match any files from '{remote}'"
+            )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/vcs/git.py` around lines 393 - 402, Loop currently deletes the first path component via PurePath(src_dir_path).parts[0], which can remove a parent and unrelated siblings; change the cleanup to only remove the matched directory itself by calling safe_rmtree(src_dir_path) (or an equivalent that accepts the full path) after moving files from src_dir_path. Update the block that iterates unique_dirs[:1] so safe_rmtree is invoked with src_dir_path (not PurePath(...).parts[0]) and keep the existing FileNotFoundError handling.
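The hazard is one line of `pathlib`:

```python
from pathlib import PurePath

# For a nested src, parts[0] is only the top-level ancestor; deleting it
# would take unrelated siblings of 'foo/bar' along with the target.
print(PurePath("foo/bar").parts[0])  # foo
```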
🧹 Nitpick comments (2)
pyproject.toml (1)
124-124: Align Black target with declared minimum Python support (3.10).

Black is configured to target `py313` while the project declares support for Python 3.10+. The CI matrix in `run.yml` actively tests against Python 3.10, so Black should target the minimum supported version to ensure consistent formatting guardrails across all supported versions.

Suggested config change

```diff
 [tool.black]
 extend-exclude = "sphinxcontrib_asciinema"
-target-version = ["py313"]
+target-version = ["py310"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 124, Black is configured to target py313 via the setting target-version = ["py313"]; update this to target the project's declared minimum Python version by changing the Black configuration to target py310 (i.e., replace the "py313" entry with "py310") so formatting rules match the lowest supported runtime used in CI and the project metadata.

dfetch/project/subproject.py (1)
151-160: Rename `dependency` to `dependencies` for clarity.

The variable is a list but uses singular naming, and the list comprehension shadows the outer variable (`for dependency in dependency`), which is confusing.

♻️ Suggested improvement

```diff
-        actually_fetched, dependency = self._fetch_impl(to_fetch)
+        actually_fetched, dependencies = self._fetch_impl(to_fetch)
         self._log_project(f"Fetched {actually_fetched}")
         applied_patches = self._apply_patches(patch_count)
         self.__metadata.fetched(
             actually_fetched,
             hash_=hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME]),
             patch_=applied_patches,
-            dependencies=[dependency.to_dependency() for dependency in dependency],
+            dependencies=[dep.to_dependency() for dep in dependencies],
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/project/subproject.py` around lines 151 - 160, The code mistakenly names a list variable "dependency" and then shadows it in the comprehension; change the tuple returned from self._fetch_impl(to_fetch) to (actually_fetched, dependencies) and use that plural name consistently (e.g., call self._apply_patches as before, then call self.__metadata.fetched(..., dependencies=[dep.to_dependency() for dep in dependencies]) ), updating any other references to the old singular name (and keeping hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME]) unchanged); this removes the shadowing and makes it clear the value is a list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/util/util.py`:
- Around line 45-57: The cwd-safety check in safe_rm is incorrect for absolute
inputs; update safe_rm to resolve/normalize each path to an absolute Path before
checking and use Path.cwd() as the base for is_relative_to: for each path in
safe_rm, call Path(path).resolve() (or .absolute()) and then verify
resolved_path.is_relative_to(Path.cwd()) before deleting (and continue to call
safe_rmtree or os.remove using the original or resolved path as appropriate).
---
Duplicate comments:
In `@dfetch/vcs/git.py`:
- Around line 327-331: The current rewrite loop uses the original src pattern to
match submodule.path which fails when src is relative (e.g. "src" or "pkg/*")
and the tree has been reshaped by _move_src_folder_up; fix by first resolving
the src pattern against the actual kept-directory layout and then using that
resolved pattern for matching and slicing. Concretely: compute a resolved_src
(as a PurePath) by applying the same rewrite/kept-folder transformation you
apply later (use _rewrite_path or the same logic that _move_src_folder_up uses)
and then, in the loop over submodules, run
submodule_path.match(str(resolved_src)) and slice parts relative to resolved_src
when setting submodule.path; keep usages of _rewrite_path, _move_src_folder_up,
submodule.path and checkout_version consistent so rewritten submodules match
what _move_src_folder_up produces.
- Around line 393-402: Loop currently deletes the first path component via
PurePath(src_dir_path).parts[0], which can remove a parent and unrelated
siblings; change the cleanup to only remove the matched directory itself by
calling safe_rmtree(src_dir_path) (or an equivalent that accepts the full path)
after moving files from src_dir_path. Update the block that iterates
unique_dirs[:1] so safe_rmtree is invoked with src_dir_path (not
PurePath(...).parts[0]) and keep the existing FileNotFoundError handling.
---
Nitpick comments:
In `@dfetch/project/subproject.py`:
- Around line 151-160: The code mistakenly names a list variable "dependency"
and then shadows it in the comprehension; change the tuple returned from
self._fetch_impl(to_fetch) to (actually_fetched, dependencies) and use that
plural name consistently (e.g., call self._apply_patches as before, then call
self.__metadata.fetched(..., dependencies=[dep.to_dependency() for dep in
dependencies]) ), updating any other references to the old singular name (and
keeping hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME])
unchanged); this removes the shadowing and makes it clear the value is a list.
In `@pyproject.toml`:
- Line 124: Black is configured to target py313 via the setting target-version =
["py313"]; update this to target the project's declared minimum Python version
by changing the Black configuration to target py310 (i.e., replace the "py313"
entry with "py310") so formatting rules match the lowest supported runtime used
in CI and the project metadata.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0adcf33d-8672-4206-9af5-ca6958e30891
📒 Files selected for processing (21)
- .github/workflows/release.yml
- CHANGELOG.rst
- dfetch/log.py
- dfetch/manifest/parse.py
- dfetch/project/gitsubproject.py
- dfetch/project/metadata.py
- dfetch/project/subproject.py
- dfetch/project/svnsubproject.py
- dfetch/reporting/stdout_reporter.py
- dfetch/util/util.py
- dfetch/vcs/git.py
- example/dfetch.yaml
- features/environment.py
- features/fetch-file-pattern-git.feature
- features/fetch-git-repo-with-submodule.feature
- features/list-projects.feature
- features/steps/generic_steps.py
- features/steps/git_steps.py
- pyproject.toml
- tests/test_report.py
- tests/test_subproject.py
✅ Files skipped from review due to trivial changes (2)
- .github/workflows/release.yml
- CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (4)
- dfetch/reporting/stdout_reporter.py
- tests/test_subproject.py
- features/steps/git_steps.py
- dfetch/log.py
Force-pushed a0a5aca to 6cac513
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/util/util.py`:
- Around line 147-152: The code currently treats symlinks like their targets
because it uses os.path.realpath()/os.path.isdir() after os.path.lexists(),
causing directory symlinks to be passed into safe_rmtree() instead of being
removed as symlinks. Update the removal logic in the block that checks
os.path.lexists(path) (used by safe_rm/prune_files_by_pattern) to first detect
os.path.islink(path) and, if true, remove the link itself (os.unlink or
os.remove) without resolving; only if not a symlink then call
check_no_path_traversal(path, base) and branch to os.path.isdir(path) →
safe_rmtree(...) or os.remove(...) as appropriate, so directory symlinks are not
rmtree'd.
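A minimal sketch of the symlink-first ordering this prompt asks for (an illustrative helper, not the project's actual `safe_rm`): `os.path.isdir()` follows symlinks, so a directory symlink would be handed to `rmtree` unless `islink()` is checked first.

```python
import os
import shutil

def remove_entry(path: str) -> None:
    """Remove a filesystem entry without following symlinks."""
    if os.path.islink(path):
        os.unlink(path)        # remove the link itself, never its target
    elif os.path.isdir(path):  # isdir() is True for a dir symlink too,
        shutil.rmtree(path)    # which is why islink() must come first
    elif os.path.exists(path):
        os.remove(path)
```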
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eeb3a053-6fff-41dc-8f26-ecdf998313eb
📒 Files selected for processing (22)
- CHANGELOG.rst
- dfetch/log.py
- dfetch/manifest/parse.py
- dfetch/project/archivesubproject.py
- dfetch/project/gitsubproject.py
- dfetch/project/metadata.py
- dfetch/project/subproject.py
- dfetch/project/svnsubproject.py
- dfetch/reporting/stdout_reporter.py
- dfetch/util/util.py
- dfetch/vcs/git.py
- example/dfetch.yaml
- features/environment.py
- features/fetch-file-pattern-git.feature
- features/fetch-git-repo-with-submodule.feature
- features/list-projects.feature
- features/steps/generic_steps.py
- features/steps/git_steps.py
- pyproject.toml
- tests/manifest_mock.py
- tests/test_report.py
- tests/test_subproject.py
✅ Files skipped from review due to trivial changes (7)
- pyproject.toml
- dfetch/log.py
- CHANGELOG.rst
- features/list-projects.feature
- example/dfetch.yaml
- features/fetch-git-repo-with-submodule.feature
- features/fetch-file-pattern-git.feature
🚧 Files skipped from review as they are similar to previous changes (9)
- features/environment.py
- dfetch/manifest/parse.py
- features/steps/generic_steps.py
- tests/test_report.py
- dfetch/reporting/stdout_reporter.py
- dfetch/project/gitsubproject.py
- features/steps/git_steps.py
- tests/test_subproject.py
- dfetch/vcs/git.py
Actionable comments posted: 1
♻️ Duplicate comments (3)
dfetch/vcs/git.py (2)
393-397: ⚠️ Potential issue | 🟠 Major

Delete the processed `src` directory, not its top-level ancestor. For nested selections like `src="pkg/subdir"`, `PurePath(src_dir_path).parts[0]` becomes `pkg`, so this cleanup can remove sibling content that was never moved. The delete target should be `src_dir_path` itself.

🛠️ Suggested fix

```diff
-            safe_rm(PurePath(src_dir_path).parts[0])
+            safe_rm(src_dir_path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/vcs/git.py` around lines 393 - 397, The loop over unique_dirs currently calls safe_rm(PurePath(src_dir_path).parts[0]) which removes the top-level ancestor (e.g., "pkg") and can delete siblings; change the cleanup to remove the processed directory itself by calling safe_rm on src_dir_path (or its normalized path) after moving files; update the call site near unique_dirs, src_dir_path and safe_rm in git.py to use src_dir_path as the delete target instead of PurePath(...).parts[0].
340-359: ⚠️ Potential issue | 🟠 Major

`_rewrite_path()` still misses common `src` patterns. `PurePath.match()` is right-anchored for relative patterns, so values like `src="pkg"` and nested matches under `pkg/*` usually leave `existing_path` unchanged. After `_move_src_folder_up()`, that stale path is what the caller persists as the submodule destination. Use the same resolved base directory that `_move_src_folder_up()` selects, then compute the rewritten path from that base instead of relying on `PurePath.match()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/vcs/git.py` around lines 340 - 359, The _rewrite_path function fails to handle src patterns that lack explicit wildcards (e.g., "pkg") because PurePath.match() is right-anchored; instead, resolve the same base directory that _move_src_folder_up uses and compute the rewritten path relative to that base. Change _rewrite_path to derive base_dir by taking src up to the first wildcard (or the whole src when no wildcard), normalize it (Path(...).as_posix()), and then if existing_path starts with that base_dir (or base_dir + separator) return the remainder (Path(*Path(existing_path).parts[len(base_dir.parts):]) as string); otherwise return existing_path. Ensure the logic mirrors the base-resolution used in _move_src_folder_up so nested matches like "pkg" correctly map to "pkg/sub/..." destinations.

dfetch/util/util.py (1)
137-155: ⚠️ Potential issue | 🟠 Major

Preserve the `within` guard when unlinking symlinks. The symlink branch skips `check_no_path_traversal()`, so `safe_rm()` can still remove an out-of-scope symlink path even when `within` is set. Validate the link's parent against `base` before unlinking it.

🛠️ Suggested fix

```diff
 def safe_rm(
     paths: str | Path | Sequence[str | Path],
     within: str | Path = ".",
 ) -> None:
     """Delete a file, directory or list of files/directories safely."""
     base = Path(within).resolve()
     paths_to_remove = (
         [paths] if isinstance(paths, str) or not isinstance(paths, Sequence) else paths
     )
     for path in paths_to_remove:
-        if os.path.lexists(path):
-            if os.path.islink(path):
-                os.unlink(path)
+        candidate = Path(path)
+        if os.path.lexists(candidate):
+            if os.path.islink(candidate):
+                check_no_path_traversal(candidate.parent, base)
+                candidate.unlink()
             else:
-                check_no_path_traversal(path, base)
-                if os.path.isdir(path):
-                    _safe_rmtree(str(path))
+                check_no_path_traversal(candidate, base)
+                if candidate.is_dir():
+                    _safe_rmtree(str(candidate))
                 else:
-                    os.remove(path)
+                    candidate.unlink()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/util/util.py` around lines 137 - 155, In safe_rm, the symlink branch currently calls os.unlink(path) without enforcing the within guard; before unlinking you must validate the symlink location against base by calling check_no_path_traversal(path, base) (or check the symlink's parent against base) so unlinking a symlink outside the allowed `within` is prevented; update the os.path.islink(path) branch in safe_rm to run check_no_path_traversal(path, base) prior to os.unlink(path) (keeping existing behavior for directories/files unchanged).
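For context, a guard like `check_no_path_traversal` can be sketched with `Path.resolve()` and `Path.is_relative_to()` (Python 3.9+). This is an assumption about its shape for illustration, not dfetch's actual implementation:

```python
from pathlib import Path

def check_no_path_traversal(path, base) -> None:
    """Hypothetical guard: raise if the resolved path escapes the base."""
    resolved = Path(path).resolve()
    if not resolved.is_relative_to(Path(base).resolve()):
        raise ValueError(f"{path} escapes {base}")
```

Calling such a guard on the symlink's *parent* (rather than on the resolved link target) keeps the check about where the entry lives, not where it points, which is what the fix above does.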
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 333-336: After pruning ignored paths with
safe_rm(glob.glob(ignore_path)), re-filter the in-memory submodules list so any
submodule whose repository-root path was removed by the ignore step is dropped
before returning/using it (this prevents checkout_version() from receiving stale
entries). Concretely, after the loop that calls safe_rm, replace or filter the
submodules variable (the list named submodules) to only include entries whose
repo-root/path still exists on disk (check the appropriate attribute on each
submodule object, e.g., path or repo_root, with os.path.exists) so callers like
checkout_version() only see surviving submodules.
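The re-filter step this comment asks for can be sketched as follows; the `path` attribute and the `SimpleNamespace` stand-ins are illustrative, not dfetch's actual submodule type:

```python
import os
import tempfile
from types import SimpleNamespace

def drop_removed_submodules(submodules):
    """Keep only submodules whose checkout path still exists on disk."""
    return [sm for sm in submodules if os.path.exists(sm.path)]

# Illustrative usage, with a temporary directory standing in for a repo:
with tempfile.TemporaryDirectory() as repo:
    kept = os.path.join(repo, "libA")
    os.mkdir(kept)
    submodules = [
        SimpleNamespace(path=kept),                       # still on disk
        SimpleNamespace(path=os.path.join(repo, "libB")), # pruned by the ignore step
    ]
    surviving = drop_removed_submodules(submodules)
    print([os.path.basename(sm.path) for sm in surviving])  # ['libA']
```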
---
Duplicate comments:
In `@dfetch/util/util.py`:
- Around line 137-155: In safe_rm, the symlink branch currently calls
os.unlink(path) without enforcing the within guard; before unlinking you must
validate the symlink location against base by calling
check_no_path_traversal(path, base) (or check the symlink's parent against base)
so unlinking a symlink outside the allowed `within` is prevented; update the
os.path.islink(path) branch in safe_rm to run check_no_path_traversal(path,
base) prior to os.unlink(path) (keeping existing behavior for directories/files
unchanged).
In `@dfetch/vcs/git.py`:
- Around line 393-397: The loop over unique_dirs currently calls
safe_rm(PurePath(src_dir_path).parts[0]) which removes the top-level ancestor
(e.g., "pkg") and can delete siblings; change the cleanup to remove the
processed directory itself by calling safe_rm on src_dir_path (or its normalized
path) after moving files; update the call site near unique_dirs, src_dir_path
and safe_rm in git.py to use src_dir_path as the delete target instead of
PurePath(...).parts[0].
- Around line 340-359: The _rewrite_path function fails to handle src patterns
that lack explicit wildcards (e.g., "pkg") because PurePath.match() is
right-anchored; instead, resolve the same base directory that
_move_src_folder_up uses and compute the rewritten path relative to that base.
Change _rewrite_path to derive base_dir by taking src up to the first wildcard
(or the whole src when no wildcard), normalize it (Path(...).as_posix()), and
then if existing_path starts with that base_dir (or base_dir + separator) return
the remainder (Path(*Path(existing_path).parts[len(base_dir.parts):]) as
string); otherwise return existing_path. Ensure the logic mirrors the
base-resolution used in _move_src_folder_up so nested matches like "pkg"
correctly map to "pkg/sub/..." destinations.
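Both halves of this finding can be illustrated with a small sketch: the right-anchoring of `PurePath.match()` that causes the miss, and the suggested base-directory rewrite. The function name and the wildcard-detection heuristic below are assumptions for illustration, not dfetch's actual implementation:

```python
from pathlib import PurePath

# PurePath.match() is right-anchored: a wildcard-free relative pattern only
# matches the *tail* of a path, so "pkg" never matches "pkg/sub/file.c".
assert not PurePath("pkg/sub/file.c").match("pkg")
assert PurePath("pkg/sub/file.c").match("sub/*.c")

def rewrite_path(existing_path: str, src: str) -> str:
    """Strip the wildcard-free base of `src` from `existing_path` (sketch)."""
    base_parts = []
    for part in PurePath(src).parts:
        if any(ch in part for ch in "*?["):  # stop at the first wildcard
            break
        base_parts.append(part)
    existing = PurePath(existing_path)
    if base_parts and existing.parts[: len(base_parts)] == tuple(base_parts):
        return PurePath(*existing.parts[len(base_parts):]).as_posix()
    return existing_path

print(rewrite_path("pkg/sub/file.c", "pkg"))      # sub/file.c
print(rewrite_path("pkg/sub/file.c", "pkg/*.c"))  # sub/file.c
print(rewrite_path("other/file.c", "pkg"))        # other/file.c
```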
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 497b9565-5676-4513-9fb9-989dd4e1cd1c
📒 Files selected for processing (3)
- dfetch/util/util.py
- dfetch/vcs/git.py
- features/environment.py
🚧 Files skipped from review as they are similar to previous changes (1)
- features/environment.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
dfetch/project/subproject.py (1)
132-148: Correct propagation of dependencies to metadata.

The `update()` method properly unpacks the fetch result and passes dependencies to `metadata.fetched()`.

Minor observation: `list(dependency)` on line 148 is redundant since `dependency` is already typed as `list[Dependency]` from the `_fetch_impl` return. This is harmless but could be simplified to just `dependencies=dependency`.

♻️ Optional simplification

```diff
 self.__metadata.fetched(
     actually_fetched,
     hash_=hash_directory(
         self.local_path,
         skiplist=[self.__metadata.FILENAME] + post_fetch_ignored,
     ),
     patch_=applied_patches,
-    dependencies=list(dependency),
+    dependencies=dependency,
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dfetch/project/subproject.py` around lines 132 - 148, The code redundantly wraps the already-typed list variable dependency when calling self.__metadata.fetched; in the update/_fetch flow replace list(dependency) with dependency so the call becomes self.__metadata.fetched(..., dependencies=dependency) (locate the call to self.__metadata.fetched and the variable dependency returned from _fetch_impl to make this change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 379-388: The call to safe_rm that currently passes
PurePath(src_dir_path).parts[0] should include the within parameter to constrain
deletion to the current working directory; update the code inside the
unique_dirs loop (where src_dir_path is used) to call
safe_rm(PurePath(src_dir_path).parts[0], within="." ) (or within=os.getcwd()) so
the removal is limited to the repo cwd, keeping behavior consistent with other
safe_rm uses and preventing accidental deletion outside the working directory.
---
Nitpick comments:
In `@dfetch/project/subproject.py`:
- Around line 132-148: The code redundantly wraps the already-typed list
variable dependency when calling self.__metadata.fetched; in the update/_fetch
flow replace list(dependency) with dependency so the call becomes
self.__metadata.fetched(..., dependencies=dependency) (locate the call to
self.__metadata.fetched and the variable dependency returned from _fetch_impl to
make this change).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fdf2df17-2746-4064-a8ce-27d1815b231b
📒 Files selected for processing (12)
- dfetch/commands/update.py
- dfetch/project/archivesubproject.py
- dfetch/project/gitsubproject.py
- dfetch/project/subproject.py
- dfetch/project/svnsubproject.py
- dfetch/reporting/stdout_reporter.py
- dfetch/util/util.py
- dfetch/vcs/git.py
- doc/internal.rst
- features/fetch-git-repo-with-submodule.feature
- tests/test_subproject.py
- tests/test_util.py
💤 Files with no reviewable changes (1)
- doc/internal.rst
✅ Files skipped from review due to trivial changes (2)
- dfetch/commands/update.py
- features/fetch-git-repo-with-submodule.feature
🚧 Files skipped from review as they are similar to previous changes (2)
- dfetch/project/archivesubproject.py
- tests/test_subproject.py
```python
for src_dir_path in unique_dirs[:1]:
    try:
        for file_to_copy in os.listdir(src_dir_path):
            shutil.move(src_dir_path + "/" + file_to_copy, ".")
        safe_rm(PurePath(src_dir_path).parts[0])
    except FileNotFoundError:
        logger.warning(
            f"The 'src:' filter '{src_dir_path}' didn't match any files from '{remote}'"
        )
        continue
```
Missing `within` parameter in `safe_rm` call.

On line 383, `safe_rm(PurePath(src_dir_path).parts[0])` is called without the `within` parameter. Since this deletes the first path component of the matched source directory, it should be constrained to the current working directory, for safety and consistency with the other `safe_rm` calls in this file.
🛡️ Proposed fix

```diff
- safe_rm(PurePath(src_dir_path).parts[0])
+ safe_rm(PurePath(src_dir_path).parts[0], within=".")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dfetch/vcs/git.py` around lines 379 - 388, The call to safe_rm that currently
passes PurePath(src_dir_path).parts[0] should include the within parameter to
constrain deletion to the current working directory; update the code inside the
unique_dirs loop (where src_dir_path is used) to call
safe_rm(PurePath(src_dir_path).parts[0], within="." ) (or within=os.getcwd()) so
the removal is limited to the repo cwd, keeping behavior consistent with other
safe_rm uses and preventing accidental deletion outside the working directory.
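To see why the unconstrained call is risky, note what `parts[0]` actually names; the path below is illustrative, and the commented-out `safe_rm` call is shown only to echo the proposed fix, not executed here:

```python
from pathlib import PurePath

src_dir_path = "pkg/sub"  # illustrative matched source directory
target = PurePath(src_dir_path).parts[0]
print(target)  # pkg — the whole top-level ancestor, siblings and all
# Constraining removal keeps it inside the working tree, per the fix:
# safe_rm(target, within=".")
```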
Summary by CodeRabbit
New Features
Improvements
Tests / Examples