
Fetch git submodule during update of git subproject #1017

Merged
spoorcc merged 15 commits into main from spoorcc/issue1013
Mar 23, 2026

Conversation

@spoorcc
Contributor

@spoorcc spoorcc commented Feb 23, 2026

Summary by CodeRabbit

  • New Features

    • Git submodule support: detects and fetches submodules (including nested) at pinned revisions and records them as dependencies in project metadata and reports
  • Improvements

    • Report output reformatted to a YAML-like structure and now lists dependency details (path, url, branch, tag, revision, source-type)
    • Better handling of src paths with spaces and warnings when a src glob matches multiple directories
    • Changelog updated for Release 0.13.0
  • Tests / Examples

    • Added end-to-end feature tests and an example manifest entry for submodule workflows

@spoorcc spoorcc added the enhancement (New feature or request) label on Feb 23, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

Walkthrough

Adds Git submodule discovery and recording of submodule dependencies in project metadata; changes subproject fetch APIs to return dependency lists; updates Git handling, reporting output, and filesystem utilities; expands BDD and unit tests and example manifest; updates changelog and formatting config.

Changes

Cohort / File(s) Summary
Metadata & Dependency Types
dfetch/project/metadata.py
Add Dependency TypedDict; persist dependencies in Metadata; extend fetched(...) signature, equality and dump behavior.
Subproject API & Implementations
dfetch/project/subproject.py, dfetch/project/gitsubproject.py, dfetch/project/svnsubproject.py, dfetch/project/archivesubproject.py
Change _fetch_impl to return tuple[Version, list[Dependency]]; propagate dependency lists through update flow and tests.
Git implementation & submodule handling
dfetch/vcs/git.py
Convert Submodule to dataclass; checkout_version() returns (sha, list[Submodule]); add sparse-checkout, src/ignore handling, submodule path rewriting, glob support, and safer cleanup.
Reporting / Logging
dfetch/reporting/stdout_reporter.py, dfetch/log.py
Reformat reporter output to YAML-like mapping; emit dependencies section when present; DLogger.print_info_line omits value line when info falsy.
Filesystem utilities
dfetch/util/util.py
safe_rm accepts single or sequence with within boundary, safety checks and type-specific removals; rename safe_rmtree_safe_rmtree; add strip_glob_prefix.
Tests, BDD features & Steps
features/*, features/steps/*, tests/*, features/environment.py, tests/manifest_mock.py
Add submodule-focused feature scenarios and step(s), update step implementations, assertions and mocks to new return types and logging calls, add filesystem-exists step and file-protocol allowance.
Manifest parsing, examples & docs
dfetch/manifest/parse.py, example/dfetch.yaml, CHANGELOG.rst, dfetch/commands/update.py, pyproject.toml, doc/internal.rst
Gracefully skip missing sub-manifests, add example TF-PSA-Crypto project, update changelog for 0.13.0, add docs note for submodule behavior, and set Black target-version = ["py313"].
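The reworked fetch contract summarized above can be sketched in miniature. The Dependency keys mirror the report fields listed earlier (path, url, branch, tag, revision), but the real dfetch Version type and the exact TypedDict definition are simplified here; the function body and example values are illustrative, not the actual implementation.

```python
from typing import TypedDict


class Dependency(TypedDict, total=False):
    """Simplified stand-in for the Dependency TypedDict added to metadata.py."""

    path: str
    url: str
    branch: str
    tag: str
    revision: str


def fetch_impl(version: str) -> tuple[str, list[Dependency]]:
    """Pretend _fetch_impl: returns the fetched version plus discovered submodules."""
    submodules: list[Dependency] = [
        {"path": "ext/zlib", "url": "https://example.com/zlib.git", "revision": "abc123"}
    ]
    return version, submodules


fetched, deps = fetch_impl("v1.2.3")
print(fetched)          # v1.2.3
print(deps[0]["path"])  # ext/zlib
```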

Sequence Diagram(s)

sequenceDiagram
    participant Client as dfetch Client
    participant SubProj as SubProject.update()
    participant GitSub as GitSubProject._fetch_impl()
    participant GitRepo as GitLocalRepo
    participant Metadata as Metadata.fetched()

    Client->>SubProj: update(version)
    SubProj->>GitSub: _fetch_impl(version)
    GitSub->>GitRepo: checkout_version()
    GitRepo->>GitRepo: git submodule update --init --recursive
    GitRepo-->>GitSub: (sha, list[Submodule])
    GitSub->>GitSub: build list[Dependency]
    GitSub-->>SubProj: (Version, list[Dependency])
    SubProj->>Metadata: fetched(version, dependencies=[...])
    Metadata->>Metadata: store dependencies
    SubProj-->>Client: update complete
sequenceDiagram
    participant Reporter as stdout_reporter
    participant DLogger as DLogger
    participant Metadata as Metadata

    Reporter->>DLogger: print_info_line(project_name, "")
    DLogger-->>Reporter: project header
    Reporter->>Metadata: get metadata
    Metadata-->>Reporter: metadata + dependencies
    Reporter->>Reporter: print remote, branch, tag, revision...
    alt has dependencies
        Reporter->>Reporter: print "dependencies:" and per-dependency fields
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • dfetch-org/dfetch#960 — overlaps changes to logger printing/formatting and related test updates.
  • dfetch-org/dfetch#931 — overlaps Metadata.fetched signature and metadata shape changes.
  • dfetch-org/dfetch#1058 — touches logging and git subproject fetching/cleanup behavior similar to this PR.

Suggested labels

documentation, testing

Suggested reviewers

  • ben-edna
🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: enabling git submodule fetching during git subproject updates, which is central to this PR's scope.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 81.03%, which meets the required threshold of 80.00%.




@coderabbitai
Contributor

coderabbitai bot commented Feb 26, 2026

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

@coderabbitai
Contributor

coderabbitai bot commented Feb 26, 2026

✅ Created PR with unit tests: #1025

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dfetch/vcs/git.py (1)

305-316: ⚠️ Potential issue | 🟠 Major

Apply the src: remap before persisting dependency paths.

Line 305 snapshots submodules before Lines 313-314 rewrite the checkout. If src: vendor/lib contains vendor/lib/ext/zlib, the exported tree ends up with ext/zlib but the dependency metadata still stores vendor/lib/ext/zlib; submodules outside src are also reported even though they were removed.
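The remap the comment expects is a plain pathlib operation; the src value and submodule path below are the hypothetical ones from the example (vendor/lib, vendor/lib/ext/zlib), not paths from the actual codebase.

```python
from pathlib import PurePath

# With src: vendor/lib, a submodule checked out at vendor/lib/ext/zlib
# should be recorded relative to the kept directory after the tree reshape.
src = PurePath("vendor/lib")
submodule = PurePath("vendor/lib/ext/zlib")
print(submodule.relative_to(src))  # ext/zlib
```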

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 305 - 316, The code captures submodules and
current SHA before applying the src remap, causing dependency paths to persist
pre-remap locations; move the src rewrite so it runs before snapshotting
submodules and reading HEAD. Specifically, call self.move_src_folder_up(remote,
src) (with the same parameters) before invoking self.submodules() and before the
run_on_cmdline(... "git", "rev-parse", "HEAD") call so the exported tree and
recorded submodule paths reflect the remapped src layout.
🧹 Nitpick comments (1)
dfetch/util/util.py (1)

45-51: Broken symlinks won't be removed.

The use of os.path.exists() returns False for broken symlinks (since it follows the link to check target existence). If the intent is to remove the path regardless of whether it's a valid symlink, consider using os.path.lexists() instead, which returns True for broken symlinks.

If broken symlinks should be silently ignored, the current implementation is correct.

♻️ Optional fix if broken symlinks should be removed
 def safe_rm(path: str | Path) -> None:
     """Delete an file or directory safely."""
-    if os.path.exists(path):
+    if os.path.lexists(path):
         if os.path.isdir(path):
             safe_rmtree(str(path))
         else:
             os.remove(path)
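The exists/lexists distinction is easy to verify with a throwaway dangling symlink (standard library only; note that creating symlinks may require extra privileges on Windows):

```python
import os
import tempfile

# os.path.exists() follows symlinks, so a dangling link appears absent,
# while os.path.lexists() reports the link entry itself.
with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "dangling")
    os.symlink(os.path.join(tmp, "missing-target"), link)

    print(os.path.exists(link))   # False: the target does not exist
    print(os.path.lexists(link))  # True: the link entry does

    os.remove(link)               # removal operates on the link itself
    print(os.path.lexists(link))  # False
```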
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/util/util.py` around lines 45 - 51, The safe_rm function uses
os.path.exists which returns False for broken symlinks; change the existence
check to os.path.lexists(path) so broken symlinks are detected and removed. In
safe_rm (and when calling safe_rmtree/remove), keep the existing branch logic
(os.path.isdir -> safe_rmtree, else -> os.remove) but gate it on os.path.lexists
instead of os.path.exists so broken symlinks fall through to the os.remove path
and are deleted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.rst`:
- Line 5: The changelog entry "* Fetch git submodules in git subproject at
pinned revision (`#1015`)" references the wrong issue number; verify whether the
correct reference is `#1017` (per PR metadata) or `#1013` (per commit summary) and
update that parenthetical to the confirmed issue number so the CHANGELOG.rst
entry matches the PR and commit message.

In `@dfetch/util/util.py`:
- Line 46: Fix the minor typo in the docstring that currently reads "Delete an
file or directory safely." by changing it to "Delete a file or directory
safely."; update the docstring text in dfetch/util/util.py where that string
appears so the grammar is correct.

In `@dfetch/vcs/git.py`:
- Around line 299-305: In submodules() fix nested-submodule resolution: stop
using sm_path and the single cached urls map; instead parse each toplevel's
.gitmodules to build a fresh urls mapping per toplevel, use the $displaypath
value (not $sm_path) when emitting/looking up submodule entries and when
constructing full paths, and ensure url lookups use the corresponding
toplevel-specific map so names/URLs come from the same .gitmodules file.

In `@features/fetch-git-repo-with-submodule.feature`:
- Line 70: The expected "last fetch" value in
features/fetch-git-repo-with-submodule.feature is hardcoded to "26/02/2026,
20:28:24" and must be aligned with the suite's fixed clock; update the "last
fetch" expectation (the line containing the literal last fetch timestamp) to use
the shared fixed timestamp "02/07/2021, 20:25:56" (the same value asserted in
features/list-projects.feature) so the scenario uses the common clock fixture.

In `@features/steps/generic_steps.py`:
- Around line 333-335: The step function step_impl for the "{name} exists" step
currently delegates to check_file_exists which uses os.path.isfile() and only
accepts regular files; update the behavior to check for any filesystem entry by
either (a) changing check_file_exists to use os.path.exists() instead of
os.path.isfile(), or (b) create/rename a helper like check_path_exists that uses
os.path.exists() and call that from step_impl so the step truly asserts path
existence (reference: step_impl and check_file_exists).

---

Outside diff comments:
In `@dfetch/vcs/git.py`:
- Around line 305-316: The code captures submodules and current SHA before
applying the src remap, causing dependency paths to persist pre-remap locations;
move the src rewrite so it runs before snapshotting submodules and reading HEAD.
Specifically, call self.move_src_folder_up(remote, src) (with the same
parameters) before invoking self.submodules() and before the run_on_cmdline(...
"git", "rev-parse", "HEAD") call so the exported tree and recorded submodule
paths reflect the remapped src layout.

---

Nitpick comments:
In `@dfetch/util/util.py`:
- Around line 45-51: The safe_rm function uses os.path.exists which returns
False for broken symlinks; change the existence check to os.path.lexists(path)
so broken symlinks are detected and removed. In safe_rm (and when calling
safe_rmtree/remove), keep the existing branch logic (os.path.isdir ->
safe_rmtree, else -> os.remove) but gate it on os.path.lexists instead of
os.path.exists so broken symlinks fall through to the os.remove path and are
deleted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c047622b-f1a4-4b2e-9810-993a1556fbe1

📥 Commits

Reviewing files that changed from the base of the PR and between 44290f0 and 07b6553.

📒 Files selected for processing (16)
  • CHANGELOG.rst
  • dfetch/log.py
  • dfetch/project/gitsubproject.py
  • dfetch/project/metadata.py
  • dfetch/project/subproject.py
  • dfetch/project/svnsubproject.py
  • dfetch/reporting/stdout_reporter.py
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • example/dfetch.yaml
  • features/fetch-git-repo-with-submodule.feature
  • features/list-projects.feature
  • features/steps/generic_steps.py
  • features/steps/git_steps.py
  • tests/test_report.py
  • tests/test_subproject.py

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
dfetch/project/subproject.py (1)

151-160: Confusing variable naming: dependency is a list but named singular, and shadows itself in comprehension.

The variable dependency holds a list[VcsDependency] but uses a singular name. The list comprehension for dependency in dependency then shadows this variable, which works but is confusing and error-prone.

Consider renaming to vcs_deps or vcs_dependencies for clarity, matching the naming pattern used in GitSubProject._fetch_impl.

♻️ Suggested fix for clarity
-            actually_fetched, dependency = self._fetch_impl(to_fetch)
+            actually_fetched, vcs_deps = self._fetch_impl(to_fetch)
         self._log_project(f"Fetched {actually_fetched}")

         applied_patches = self._apply_patches(patch_count)

         self.__metadata.fetched(
             actually_fetched,
             hash_=hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME]),
             patch_=applied_patches,
-            dependencies=[dependency.to_dependency() for dependency in dependency],
+            dependencies=[dep.to_dependency() for dep in vcs_deps],
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/project/subproject.py` around lines 151 - 160, The variable named
`dependency` is actually a list and is shadowed in the comprehension; rename it
to a plural name (e.g., `vcs_dependencies` or `vcs_deps`) where it's returned
from _fetch_impl and where it's consumed before calling __metadata.fetched;
update the call site that currently reads `dependency` to `vcs_dependencies` and
change the list comprehension to `[d.to_dependency() for d in
vcs_dependencies]`, ensuring references in the surrounding block (including the
unpack from self._fetch_impl(to_fetch) and the dependencies argument in
__metadata.fetched) use the new plural name consistently (follow the naming
pattern used in GitSubProject._fetch_impl).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/project/gitsubproject.py`:
- Around line 72-76: The checkout currently preserves ".gitmodules" via the
must_keeps argument in local_repo.checkout_version and then removes only the
top-level ".git", leaving each submodule's .git entries orphaned; modify the
post-checkout cleanup (around the local_repo.checkout_version call and the later
cleanup block that deletes the superproject .git) to recursively find and remove
any nested ".git" files or directories under the checked-out working tree
(including submodule paths) rather than only deleting the top-level .git,
ensuring you still keep ".gitmodules" long enough to initialize submodules
before removing all nested .git metadata.

In `@dfetch/vcs/git.py`:
- Around line 300-306: The submodule enumeration misses nested submodules and
the relative_to call can raise for non-ancestor src; update the call that
gathers submodules to request recursive enumeration (e.g., make
self.submodules() accept/forward a recursive=True flag or call the git command
with --recursive so nested/transitive submodules returned) and change the path
handling in checkout_version()/where Path(submodule.path).relative_to(Path(src))
is used to avoid ValueError by normalizing src to a directory (if src points to
a file, use src.parent) or by computing a safe relative path via
os.path.relpath/Path(...).relative_to inside a try/except fallback—apply these
changes to the functions self.submodules() and checkout_version() so nested
submodules are included and non-ancestor src inputs do not crash.
- Around line 314-317: The code that sets submodule.path using
Path(submodule.path).relative_to(Path(src)) fails for non-directory or glob
`src` values; update the logic in the loop that mutates submodule.path (in
dfetch/vcs/git.py) to mirror move_src_folder_up() behavior: if src is not a
directory (os.path.isdir(src) is False) replace src with its parent/dirname
before creating the Path for relative_to, and fall back to
os.path.relpath(submodule.path, src) or skip relative calculation if relative_to
still raises ValueError so submodules don't crash the checkout.
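The try/except fallback suggested here can be sketched as follows; the paths are made up. Path.relative_to() raises ValueError when src is not an ancestor, while os.path.relpath() always produces a relative path.

```python
import os
from pathlib import Path

sub, src = "other/child", "src"  # src is not an ancestor of sub
try:
    rel = str(Path(sub).relative_to(src))
except ValueError:
    # Fall back instead of crashing the checkout.
    rel = os.path.relpath(sub, src)
print(rel)  # ../other/child
```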

---

Nitpick comments:
In `@dfetch/project/subproject.py`:
- Around line 151-160: The variable named `dependency` is actually a list and is
shadowed in the comprehension; rename it to a plural name (e.g.,
`vcs_dependencies` or `vcs_deps`) where it's returned from _fetch_impl and where
it's consumed before calling __metadata.fetched; update the call site that
currently reads `dependency` to `vcs_dependencies` and change the list
comprehension to `[d.to_dependency() for d in vcs_dependencies]`, ensuring
references in the surrounding block (including the unpack from
self._fetch_impl(to_fetch) and the dependencies argument in __metadata.fetched)
use the new plural name consistently (follow the naming pattern used in
GitSubProject._fetch_impl).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: faf1d394-d889-40b1-9ec4-222dacf51e84

📥 Commits

Reviewing files that changed from the base of the PR and between 07b6553 and d996bda.

📒 Files selected for processing (15)
  • CHANGELOG.rst
  • dfetch/log.py
  • dfetch/project/gitsubproject.py
  • dfetch/project/metadata.py
  • dfetch/project/subproject.py
  • dfetch/project/svnsubproject.py
  • dfetch/reporting/stdout_reporter.py
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • features/fetch-git-repo-with-submodule.feature
  • features/list-projects.feature
  • features/steps/generic_steps.py
  • features/steps/git_steps.py
  • tests/test_report.py
  • tests/test_subproject.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (5)
  • features/fetch-git-repo-with-submodule.feature
  • features/steps/git_steps.py
  • features/steps/generic_steps.py
  • dfetch/util/util.py
  • tests/test_subproject.py

@spoorcc spoorcc force-pushed the spoorcc/issue1013 branch from d996bda to e01a9a6 on Mar 13, 2026, 19:57
@spoorcc
Contributor Author

spoorcc commented Mar 18, 2026

@coderabbitai resume

@coderabbitai
Contributor

coderabbitai bot commented Mar 18, 2026

✅ Actions performed

Reviews resumed.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
dfetch/vcs/git.py (2)

327-331: ⚠️ Potential issue | 🟠 Major

Resolve src against the actual kept directory before rewriting submodule paths.

This helper only rewrites when PurePath.match() succeeds at the same path depth. Cases like src="src" or src="pkg/*" leave nested submodules unchanged, and checkout_version() still returns those entries after _move_src_folder_up() reshapes the tree. dfetch/project/gitsubproject.py:71-102 then records them as fetched dependencies, so metadata/reporting can drift from what is actually present.

Read-only verification
#!/bin/bash
python - <<'PY'
from pathlib import PurePath, Path

cases = [
    ("src", "src/child"),
    ("pkg/*", "pkg/dir/child"),
    ("pkg/Some*", "pkg/SomeDir/child"),
]

for src, existing in cases:
    src_path = PurePath(src)
    sub_path = PurePath(existing)

    prefix_len = 0
    for part in src_path.parts:
        if "*" in part:
            break
        prefix_len += 1

    rewritten = (
        str(Path(*sub_path.parts[prefix_len:]))
        if sub_path.match(str(src_path))
        else existing
    )
    print(f"{src!r} + {existing!r} -> {rewritten!r}")
PY

rg -n -C2 'checkout_version\(|for submodule in submodules|ProjectEntry\(' dfetch/project/gitsubproject.py

Also applies to: 339-359

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 327 - 331, The current rewrite loop uses the
original src pattern to match submodule.path which fails when src is relative
(e.g. "src" or "pkg/*") and the tree has been reshaped by _move_src_folder_up;
fix by first resolving the src pattern against the actual kept-directory layout
and then using that resolved pattern for matching and slicing. Concretely:
compute a resolved_src (as a PurePath) by applying the same rewrite/kept-folder
transformation you apply later (use _rewrite_path or the same logic that
_move_src_folder_up uses) and then, in the loop over submodules, run
submodule_path.match(str(resolved_src)) and slice parts relative to resolved_src
when setting submodule.path; keep usages of _rewrite_path, _move_src_folder_up,
submodule.path and checkout_version consistent so rewritten submodules match
what _move_src_folder_up produces.

393-402: ⚠️ Potential issue | 🟠 Major

Only delete the matched src directories, not their shared parent.

PurePath(src_dir_path).parts[0] is just the first path component. For a nested src like foo/bar, this removes foo/ and any unrelated siblings under it after the move.

Suggested cleanup
-        for src_dir_path in unique_dirs[:1]:
-            try:
-                for file_to_copy in os.listdir(src_dir_path):
-                    shutil.move(src_dir_path + "/" + file_to_copy, ".")
-                safe_rmtree(PurePath(src_dir_path).parts[0])
-            except FileNotFoundError:
-                logger.warning(
-                    f"The 'src:' filter '{src_dir_path}' didn't match any files from '{remote}'"
-                )
-            continue
+        if not unique_dirs:
+            return
+
+        chosen_dir = unique_dirs[0]
+        try:
+            for file_to_copy in os.listdir(chosen_dir):
+                shutil.move(os.path.join(chosen_dir, file_to_copy), ".")
+            safe_rm(unique_dirs)
+        except FileNotFoundError:
+            logger.warning(
+                f"The 'src:' filter '{chosen_dir}' didn't match any files from '{remote}'"
+            )
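The hazard with parts[0] is easy to see in isolation; src_dir_path here is a hypothetical nested match, not a path from the repository.

```python
from pathlib import PurePath

src_dir_path = "foo/bar"  # hypothetical nested 'src:' match
# parts[0] is only the first component: deleting it wipes foo/ and
# any unrelated siblings, not just the directory that matched.
print(PurePath(src_dir_path).parts[0])  # foo
print(PurePath(src_dir_path).as_posix())  # foo/bar (the directory actually matched)
```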
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 393 - 402, Loop currently deletes the first
path component via PurePath(src_dir_path).parts[0], which can remove a parent
and unrelated siblings; change the cleanup to only remove the matched directory
itself by calling safe_rmtree(src_dir_path) (or an equivalent that accepts the
full path) after moving files from src_dir_path. Update the block that iterates
unique_dirs[:1] so safe_rmtree is invoked with src_dir_path (not
PurePath(...).parts[0]) and keep the existing FileNotFoundError handling.
🧹 Nitpick comments (2)
pyproject.toml (1)

124-124: Align Black target with declared minimum Python support (3.10).

Black is configured to target py313 while the project declares support for Python 3.10+. The CI matrix in run.yml actively tests against Python 3.10, so Black should target the minimum supported version to ensure consistent formatting guardrails across all supported versions.

Suggested config change
 [tool.black]
 extend-exclude = "sphinxcontrib_asciinema"
-target-version = ["py313"]
+target-version = ["py310"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 124, Black is configured to target py313 via the
setting target-version = ["py313"]; update this to target the project's declared
minimum Python version by changing the Black configuration to target py310
(i.e., replace the "py313" entry with "py310") so formatting rules match the
lowest supported runtime used in CI and the project metadata.
dfetch/project/subproject.py (1)

151-160: Rename dependency to dependencies for clarity.

The variable is a list but uses singular naming, and the list comprehension shadows the outer variable (for dependency in dependency), which is confusing.

♻️ Suggested improvement
-            actually_fetched, dependency = self._fetch_impl(to_fetch)
+            actually_fetched, dependencies = self._fetch_impl(to_fetch)
         self._log_project(f"Fetched {actually_fetched}")

         applied_patches = self._apply_patches(patch_count)

         self.__metadata.fetched(
             actually_fetched,
             hash_=hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME]),
             patch_=applied_patches,
-            dependencies=[dependency.to_dependency() for dependency in dependency],
+            dependencies=[dep.to_dependency() for dep in dependencies],
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/project/subproject.py` around lines 151 - 160, The code mistakenly
names a list variable "dependency" and then shadows it in the comprehension;
change the tuple returned from self._fetch_impl(to_fetch) to (actually_fetched,
dependencies) and use that plural name consistently (e.g., call
self._apply_patches as before, then call self.__metadata.fetched(...,
dependencies=[dep.to_dependency() for dep in dependencies]) ), updating any
other references to the old singular name (and keeping
hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME]) unchanged);
this removes the shadowing and makes it clear the value is a list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/util/util.py`:
- Around line 45-57: The cwd-safety check in safe_rm is incorrect for absolute
inputs; update safe_rm to resolve/normalize each path to an absolute Path before
checking and use Path.cwd() as the base for is_relative_to: for each path in
safe_rm, call Path(path).resolve() (or .absolute()) and then verify
resolved_path.is_relative_to(Path.cwd()) before deleting (and continue to call
safe_rmtree or os.remove using the original or resolved path as appropriate).
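A minimal version of the resolve-then-compare check described here, with a made-up helper name; Path.is_relative_to requires Python 3.9+, and the second result assumes the working directory is not the filesystem root.

```python
from pathlib import Path


def is_safe_to_delete(path: str) -> bool:
    """Only allow deletion of paths that resolve inside the current working directory."""
    return Path(path).resolve().is_relative_to(Path.cwd())


print(is_safe_to_delete("build/output"))  # True: resolves under cwd
print(is_safe_to_delete(".."))            # False: escapes the working tree
```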

---

Duplicate comments:
In `@dfetch/vcs/git.py`:
- Around line 327-331: The current rewrite loop uses the original src pattern to
match submodule.path which fails when src is relative (e.g. "src" or "pkg/*")
and the tree has been reshaped by _move_src_folder_up; fix by first resolving
the src pattern against the actual kept-directory layout and then using that
resolved pattern for matching and slicing. Concretely: compute a resolved_src
(as a PurePath) by applying the same rewrite/kept-folder transformation you
apply later (use _rewrite_path or the same logic that _move_src_folder_up uses)
and then, in the loop over submodules, run
submodule_path.match(str(resolved_src)) and slice parts relative to resolved_src
when setting submodule.path; keep usages of _rewrite_path, _move_src_folder_up,
submodule.path and checkout_version consistent so rewritten submodules match
what _move_src_folder_up produces.
- Around line 393-402: Loop currently deletes the first path component via
PurePath(src_dir_path).parts[0], which can remove a parent and unrelated
siblings; change the cleanup to only remove the matched directory itself by
calling safe_rmtree(src_dir_path) (or an equivalent that accepts the full path)
after moving files from src_dir_path. Update the block that iterates
unique_dirs[:1] so safe_rmtree is invoked with src_dir_path (not
PurePath(...).parts[0]) and keep the existing FileNotFoundError handling.

---

Nitpick comments:
In `@dfetch/project/subproject.py`:
- Around line 151-160: The code mistakenly names a list variable "dependency"
and then shadows it in the comprehension; change the tuple returned from
self._fetch_impl(to_fetch) to (actually_fetched, dependencies) and use that
plural name consistently (e.g., call self._apply_patches as before, then call
self.__metadata.fetched(..., dependencies=[dep.to_dependency() for dep in
dependencies]) ), updating any other references to the old singular name (and
keeping hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME])
unchanged); this removes the shadowing and makes it clear the value is a list.

In `@pyproject.toml`:
- Line 124: Black is configured to target py313 via the setting target-version =
["py313"]; update this to target the project's declared minimum Python version
by changing the Black configuration to target py310 (i.e., replace the "py313"
entry with "py310") so formatting rules match the lowest supported runtime used
in CI and the project metadata.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0adcf33d-8672-4206-9af5-ca6958e30891

📥 Commits

Reviewing files that changed from the base of the PR and between c45c695 and a0a5aca.

📒 Files selected for processing (21)
  • .github/workflows/release.yml
  • CHANGELOG.rst
  • dfetch/log.py
  • dfetch/manifest/parse.py
  • dfetch/project/gitsubproject.py
  • dfetch/project/metadata.py
  • dfetch/project/subproject.py
  • dfetch/project/svnsubproject.py
  • dfetch/reporting/stdout_reporter.py
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • example/dfetch.yaml
  • features/environment.py
  • features/fetch-file-pattern-git.feature
  • features/fetch-git-repo-with-submodule.feature
  • features/list-projects.feature
  • features/steps/generic_steps.py
  • features/steps/git_steps.py
  • pyproject.toml
  • tests/test_report.py
  • tests/test_subproject.py
✅ Files skipped from review due to trivial changes (2)
  • .github/workflows/release.yml
  • CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (4)
  • dfetch/reporting/stdout_reporter.py
  • tests/test_subproject.py
  • features/steps/git_steps.py
  • dfetch/log.py

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/util/util.py`:
- Around line 147-152: The code currently treats symlinks like their targets
because it uses os.path.realpath()/os.path.isdir() after os.path.lexists(),
causing directory symlinks to be passed into safe_rmtree() instead of being
removed as symlinks. Update the removal logic in the block that checks
os.path.lexists(path) (used by safe_rm/prune_files_by_pattern) to first detect
os.path.islink(path) and, if true, remove the link itself (os.unlink or
os.remove) without resolving; only if not a symlink then call
check_no_path_traversal(path, base) and branch to os.path.isdir(path) →
safe_rmtree(...) or os.remove(...) as appropriate, so directory symlinks are not
rmtree'd.
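The ordering the comment asks for can be sketched with the standard library alone (helper names like `safe_rmtree()` belong to dfetch; `os.rmdir` stands in for it here): check `islink()` before resolving anything, so a directory symlink is unlinked rather than recursed into.

```python
import os

def remove_path(path: str) -> None:
    """Sketch: symlink-aware removal, checking islink() before isdir()."""
    if not os.path.lexists(path):
        return
    if os.path.islink(path):
        # Remove the link itself without resolving -- never touch its target.
        os.unlink(path)
    elif os.path.isdir(path):
        os.rmdir(path)  # stand-in for the project's safe_rmtree()
    else:
        os.remove(path)
```

With this order, removing a symlink that points at a directory leaves the target directory untouched, which is exactly the behavior the current `realpath()`-based check loses.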

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eeb3a053-6fff-41dc-8f26-ecdf998313eb

📥 Commits

Reviewing files that changed from the base of the PR and between a0a5aca and 6cac513.

📒 Files selected for processing (22)
  • CHANGELOG.rst
  • dfetch/log.py
  • dfetch/manifest/parse.py
  • dfetch/project/archivesubproject.py
  • dfetch/project/gitsubproject.py
  • dfetch/project/metadata.py
  • dfetch/project/subproject.py
  • dfetch/project/svnsubproject.py
  • dfetch/reporting/stdout_reporter.py
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • example/dfetch.yaml
  • features/environment.py
  • features/fetch-file-pattern-git.feature
  • features/fetch-git-repo-with-submodule.feature
  • features/list-projects.feature
  • features/steps/generic_steps.py
  • features/steps/git_steps.py
  • pyproject.toml
  • tests/manifest_mock.py
  • tests/test_report.py
  • tests/test_subproject.py
✅ Files skipped from review due to trivial changes (7)
  • pyproject.toml
  • dfetch/log.py
  • CHANGELOG.rst
  • features/list-projects.feature
  • example/dfetch.yaml
  • features/fetch-git-repo-with-submodule.feature
  • features/fetch-file-pattern-git.feature
🚧 Files skipped from review as they are similar to previous changes (9)
  • features/environment.py
  • dfetch/manifest/parse.py
  • features/steps/generic_steps.py
  • tests/test_report.py
  • dfetch/reporting/stdout_reporter.py
  • dfetch/project/gitsubproject.py
  • features/steps/git_steps.py
  • tests/test_subproject.py
  • dfetch/vcs/git.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
dfetch/vcs/git.py (2)

393-397: ⚠️ Potential issue | 🟠 Major

Delete the processed src directory, not its top-level ancestor.

For nested selections like src="pkg/subdir", PurePath(src_dir_path).parts[0] becomes pkg, so this cleanup can remove sibling content that was never moved. The delete target should be src_dir_path itself.

🛠️ Suggested fix
-                safe_rm(PurePath(src_dir_path).parts[0])
+                safe_rm(src_dir_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 393 - 397, The loop over unique_dirs
currently calls safe_rm(PurePath(src_dir_path).parts[0]) which removes the
top-level ancestor (e.g., "pkg") and can delete siblings; change the cleanup to
remove the processed directory itself by calling safe_rm on src_dir_path (or its
normalized path) after moving files; update the call site near unique_dirs,
src_dir_path and safe_rm in git.py to use src_dir_path as the delete target
instead of PurePath(...).parts[0].
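The over-deletion is easy to reproduce with the standard library alone:

```python
from pathlib import PurePath

# For a nested src like "pkg/subdir", parts[0] is only the top-level
# ancestor -- removing it would take pkg's other children along with it.
assert PurePath("pkg/subdir").parts == ("pkg", "subdir")
assert PurePath("pkg/subdir").parts[0] == "pkg"
```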

340-359: ⚠️ Potential issue | 🟠 Major

_rewrite_path() still misses common src patterns.

PurePath.match() is right-anchored for relative patterns, so values like src="pkg" and nested matches under pkg/* usually leave existing_path unchanged. After _move_src_folder_up(), that stale path is what the caller persists as the submodule destination. Use the same resolved base directory that _move_src_folder_up() selects, then compute the rewritten path from that base instead of relying on PurePath.match().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 340 - 359, The _rewrite_path function fails
to handle src patterns that lack explicit wildcards (e.g., "pkg") because
PurePath.match() is right-anchored; instead, resolve the same base directory
that _move_src_folder_up uses and compute the rewritten path relative to that
base. Change _rewrite_path to derive base_dir by taking src up to the first
wildcard (or the whole src when no wildcard), normalize it
(Path(...).as_posix()), and then if existing_path starts with that base_dir (or
base_dir + separator) return the remainder
(Path(*Path(existing_path).parts[len(base_dir.parts):]) as string); otherwise
return existing_path. Ensure the logic mirrors the base-resolution used in
_move_src_folder_up so nested matches like "pkg" correctly map to "pkg/sub/..."
destinations.
dfetch/util/util.py (1)

137-155: ⚠️ Potential issue | 🟠 Major

Preserve the within guard when unlinking symlinks.

The symlink branch skips check_no_path_traversal(), so safe_rm() can still remove an out-of-scope symlink path even when within is set. Validate the link's parent against base before unlinking it.

🛠️ Suggested fix
 def safe_rm(
     paths: str | Path | Sequence[str | Path],
     within: str | Path = ".",
 ) -> None:
     """Delete a file, directory or list of files/directories safely."""
     base = Path(within).resolve()
     paths_to_remove = (
         [paths] if isinstance(paths, str) or not isinstance(paths, Sequence) else paths
     )
     for path in paths_to_remove:
-        if os.path.lexists(path):
-            if os.path.islink(path):
-                os.unlink(path)
+        candidate = Path(path)
+        if os.path.lexists(candidate):
+            if os.path.islink(candidate):
+                check_no_path_traversal(candidate.parent, base)
+                candidate.unlink()
             else:
-                check_no_path_traversal(path, base)
-                if os.path.isdir(path):
-                    _safe_rmtree(str(path))
+                check_no_path_traversal(candidate, base)
+                if candidate.is_dir():
+                    _safe_rmtree(str(candidate))
                 else:
-                    os.remove(path)
+                    candidate.unlink()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/util/util.py` around lines 137 - 155, In safe_rm, the symlink branch
currently calls os.unlink(path) without enforcing the within guard; before
unlinking you must validate the symlink location against base by calling
check_no_path_traversal(path, base) (or check the symlink's parent against base)
so unlinking a symlink outside the allowed `within` is prevented; update the
os.path.islink(path) branch in safe_rm to run check_no_path_traversal(path,
base) prior to os.unlink(path) (keeping existing behavior for directories/files
unchanged).
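A hedged sketch of the kind of guard `check_no_path_traversal()` is expected to provide (dfetch's actual signature is not shown in this review; the name `check_within` and its error type are illustrative): a candidate path must resolve to somewhere inside the allowed base.

```python
from pathlib import Path

def check_within(path, base):
    """Raise if `path` resolves outside of `base` (illustrative guard)."""
    resolved = Path(path).resolve()
    base_resolved = Path(base).resolve()
    if not resolved.is_relative_to(base_resolved):
        raise RuntimeError(f"{path} escapes {base_resolved}")
```

Applying such a guard to the symlink's parent directory before unlinking keeps the `within` constraint intact without resolving the link target itself.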
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 333-336: After pruning ignored paths with
safe_rm(glob.glob(ignore_path)), re-filter the in-memory submodules list so any
submodule whose repository-root path was removed by the ignore step is dropped
before returning/using it (this prevents checkout_version() from receiving stale
entries). Concretely, after the loop that calls safe_rm, replace or filter the
submodules variable (the list named submodules) to only include entries whose
repo-root/path still exists on disk (check the appropriate attribute on each
submodule object, e.g., path or repo_root, with os.path.exists) so callers like
checkout_version() only see surviving submodules.
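The re-filtering step described above can be sketched as follows (the `path` key is assumed from the prompt; the real submodule objects may expose it as an attribute instead):

```python
import os

def surviving_submodules(submodules):
    """Keep only submodules whose checkout path still exists on disk."""
    return [sub for sub in submodules if os.path.exists(sub["path"])]
```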

---

Duplicate comments:
In `@dfetch/util/util.py`:
- Around line 137-155: In safe_rm, the symlink branch currently calls
os.unlink(path) without enforcing the within guard; before unlinking you must
validate the symlink location against base by calling
check_no_path_traversal(path, base) (or check the symlink's parent against base)
so unlinking a symlink outside the allowed `within` is prevented; update the
os.path.islink(path) branch in safe_rm to run check_no_path_traversal(path,
base) prior to os.unlink(path) (keeping existing behavior for directories/files
unchanged).

In `@dfetch/vcs/git.py`:
- Around line 393-397: The loop over unique_dirs currently calls
safe_rm(PurePath(src_dir_path).parts[0]) which removes the top-level ancestor
(e.g., "pkg") and can delete siblings; change the cleanup to remove the
processed directory itself by calling safe_rm on src_dir_path (or its normalized
path) after moving files; update the call site near unique_dirs, src_dir_path
and safe_rm in git.py to use src_dir_path as the delete target instead of
PurePath(...).parts[0].
- Around line 340-359: The _rewrite_path function fails to handle src patterns
that lack explicit wildcards (e.g., "pkg") because PurePath.match() is
right-anchored; instead, resolve the same base directory that
_move_src_folder_up uses and compute the rewritten path relative to that base.
Change _rewrite_path to derive base_dir by taking src up to the first wildcard
(or the whole src when no wildcard), normalize it (Path(...).as_posix()), and
then if existing_path starts with that base_dir (or base_dir + separator) return
the remainder (Path(*Path(existing_path).parts[len(base_dir.parts):]) as
string); otherwise return existing_path. Ensure the logic mirrors the
base-resolution used in _move_src_folder_up so nested matches like "pkg"
correctly map to "pkg/sub/..." destinations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 497b9565-5676-4513-9fb9-989dd4e1cd1c

📥 Commits

Reviewing files that changed from the base of the PR and between 6cac513 and 582caae.

📒 Files selected for processing (3)
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • features/environment.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • features/environment.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
dfetch/project/subproject.py (1)

132-148: Correct propagation of dependencies to metadata.

The update() method properly unpacks the fetch result and passes dependencies to metadata.fetched().

Minor observation: list(dependency) on line 148 is redundant since dependency is already typed as list[Dependency] from the _fetch_impl return. This is harmless but could be simplified to just dependencies=dependency.

♻️ Optional simplification
         self.__metadata.fetched(
             actually_fetched,
             hash_=hash_directory(
                 self.local_path,
                 skiplist=[self.__metadata.FILENAME] + post_fetch_ignored,
             ),
             patch_=applied_patches,
-            dependencies=list(dependency),
+            dependencies=dependency,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/project/subproject.py` around lines 132 - 148, The code redundantly
wraps the already-typed list variable dependency when calling
self.__metadata.fetched; in the update/_fetch flow replace list(dependency) with
dependency so the call becomes self.__metadata.fetched(...,
dependencies=dependency) (locate the call to self.__metadata.fetched and the
variable dependency returned from _fetch_impl to make this change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 379-388: The call to safe_rm that currently passes
PurePath(src_dir_path).parts[0] should include the within parameter to constrain
deletion to the current working directory; update the code inside the
unique_dirs loop (where src_dir_path is used) to call
safe_rm(PurePath(src_dir_path).parts[0], within="." ) (or within=os.getcwd()) so
the removal is limited to the repo cwd, keeping behavior consistent with other
safe_rm uses and preventing accidental deletion outside the working directory.

---

Nitpick comments:
In `@dfetch/project/subproject.py`:
- Around line 132-148: The code redundantly wraps the already-typed list
variable dependency when calling self.__metadata.fetched; in the update/_fetch
flow replace list(dependency) with dependency so the call becomes
self.__metadata.fetched(..., dependencies=dependency) (locate the call to
self.__metadata.fetched and the variable dependency returned from _fetch_impl to
make this change).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fdf2df17-2746-4064-a8ce-27d1815b231b

📥 Commits

Reviewing files that changed from the base of the PR and between 582caae and 0ddf1dd.

📒 Files selected for processing (12)
  • dfetch/commands/update.py
  • dfetch/project/archivesubproject.py
  • dfetch/project/gitsubproject.py
  • dfetch/project/subproject.py
  • dfetch/project/svnsubproject.py
  • dfetch/reporting/stdout_reporter.py
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • doc/internal.rst
  • features/fetch-git-repo-with-submodule.feature
  • tests/test_subproject.py
  • tests/test_util.py
💤 Files with no reviewable changes (1)
  • doc/internal.rst
✅ Files skipped from review due to trivial changes (2)
  • dfetch/commands/update.py
  • features/fetch-git-repo-with-submodule.feature
🚧 Files skipped from review as they are similar to previous changes (2)
  • dfetch/project/archivesubproject.py
  • tests/test_subproject.py

Comment on lines +379 to +388
for src_dir_path in unique_dirs[:1]:
    try:
        for file_to_copy in os.listdir(src_dir_path):
            shutil.move(src_dir_path + "/" + file_to_copy, ".")
        safe_rm(PurePath(src_dir_path).parts[0])
    except FileNotFoundError:
        logger.warning(
            f"The 'src:' filter '{src_dir_path}' didn't match any files from '{remote}'"
        )
        continue

⚠️ Potential issue | 🟡 Minor

Missing within parameter in safe_rm call.

On line 383, safe_rm(PurePath(src_dir_path).parts[0]) is called without the within parameter. Since this deletes the first path component of the matched source directory, the deletion should be constrained to the current working directory, consistent with the other safe_rm calls in this file.

🛡️ Proposed fix
-                safe_rm(PurePath(src_dir_path).parts[0])
+                safe_rm(PurePath(src_dir_path).parts[0], within=".")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 379 - 388, The call to safe_rm that currently
passes PurePath(src_dir_path).parts[0] should include the within parameter to
constrain deletion to the current working directory; update the code inside the
unique_dirs loop (where src_dir_path is used) to call
safe_rm(PurePath(src_dir_path).parts[0], within="." ) (or within=os.getcwd()) so
the removal is limited to the repo cwd, keeping behavior consistent with other
safe_rm uses and preventing accidental deletion outside the working directory.

@spoorcc spoorcc merged commit bc58e98 into main Mar 23, 2026
38 checks passed
@spoorcc spoorcc deleted the spoorcc/issue1013 branch March 23, 2026 05:52

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants