From cbd6e003dbb130f2d226672a55716cabc76cdca2 Mon Sep 17 00:00:00 2001 From: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Date: Mon, 25 May 2026 15:12:24 +0300 Subject: [PATCH 1/6] fix(artifactory): deterministic boundary probe for nested GitLab paths Mirrors the native GitLab resolver pattern -- but uses HEAD on Artifactory archive URLs as the existence signal, so no separate metadata API is needed. Covers both routing modes: Mode 1 -- explicit FQDN deps (host/artifactory/key/owner/repo/...). Mode 2 -- bare shorthand under PROXY_REGISTRY_URL+PROXY_REGISTRY_ONLY. The resolver walks candidate (owner, repo, virtual_path) splits shallow-first and locks in the first one whose archive responds 2xx-3xx. For unambiguous paths it returns the parse-time ref unchanged. When every candidate is rejected it raises -- distinguishing "missing repo" from auth (401/403) errors so users get an actionable hint instead of silently anchoring on a wrong guess. ``allow_redirects=False`` on the HEAD call keeps the Bearer token from leaking cross-host. Removes the parse-time marker-segment heuristic (_VIRTUAL_PATH_ROOT_SEGMENTS / _ARTIFACTORY_VIRTUAL_MARKERS / _ARTIFACTORY_VIRTUAL_FILE_EXTENSIONS) that previously decided the boundary based on hard-coded directory names like ``skills/`` or ``prompts/``. Parse-time now defaults to all-as-repo with a structural file-extension rule on the last segment; the install-time resolver is the authoritative source of the boundary. Also fixes a pre-existing URL-roundtrip bug where ``to_github_url`` -> ``parse`` folded the ``artifactory/`` prefix into ``repo_url``, causing the downloader to build double-prefixed archive URLs and 404. ``_validate_url_repo_path`` now strips the prefix before returning the bare ``owner/repo`` slug. --- CHANGELOG.md | 10 + .../content/docs/enterprise/registry-proxy.md | 127 +++++ src/apm_cli/commands/install.py | 6 +- src/apm_cli/deps/artifactory_orchestrator.py | 11 +- src/apm_cli/install/artifactory_resolver.py | 275 ++++++++++ src/apm_cli/install/package_resolution.py | 29 +- src/apm_cli/models/dependency/reference.py | 84 ++- src/apm_cli/utils/github_host.py | 100 +++- tests/unit/test_artifactory_support.py | 482 ++++++++++++++++++ 9 files changed, 1103 insertions(+), 21 deletions(-) create mode 100644 src/apm_cli/install/artifactory_resolver.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 1110f2352..e2985904c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,8 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added + +- Deterministic Artifactory boundary probe: at install time, `_resolve_artifactory_boundary` HEAD-probes the candidate archive URLs and rebuilds the dependency reference at the proxy-verified split. Covers both explicit-FQDN (`host/artifactory/key/owner/repo/...`) and bare-shorthand-under-proxy deps. Distinguishes "missing repo" from auth (401/403) errors; uses `allow_redirects=False` so the bearer token can't leak cross-host. Mirrors the native GitLab probing pattern but without a separate metadata API. + +### Changed + +- Parse-time Artifactory boundary detection no longer uses directory-marker heuristics (`skills/`, `prompts/`, `agents/`, `collections/`, `instructions/`). The install-time resolver is authoritative for the (owner, repo, virtual_path) split; parse-time defaults to a simple all-as-repo / structural-file-extension rule. The `_VIRTUAL_PATH_ROOT_SEGMENTS`, `_ARTIFACTORY_VIRTUAL_MARKERS`, and `_ARTIFACTORY_VIRTUAL_FILE_EXTENSIONS` constants are removed. + ### Fixed +- `apm install` through a registry proxy now supports nested-group repos (3+ path segments, e.g. `group/subgroup/project`). Previously every trailing segment past `owner/repo` was treated as an in-repo virtual sub-path, so the downloader requested the wrong archive URL and the install failed with HTTP 404 from the proxy. Behavior is gated on `PROXY_REGISTRY_ONLY` so direct (non-proxy) installs keep the legacy two-segment shape. Affects `parse_artifactory_path`, `build_artifactory_archive_url`, `_detect_virtual_package`, the shorthand resolvers in `DependencyReference`, and `ArtifactoryOrchestrator._split_owner_repo` / `download_subdirectory`. +- URL-form Artifactory deps no longer round-trip with the `artifactory/` prefix folded into `repo_url`. The duplicated prefix caused the downloader to construct double-prefixed archive URLs (`/artifactory/key/artifactory/key/owner/repo/...`) and 404. `_validate_url_repo_path` now strips the Artifactory VCS prefix before returning the bare `owner/repo` slug; the prefix is still recovered separately via `_extract_artifactory_prefix`. - Copilot, Codex, Cursor, Claude, Windsurf, OpenCode, and Gemini adapters handle MCP v0.1 `runtimeArguments`/`packageArguments` with `variables` (no `type` key), matching the VS Code fix from #1444. (#1461, closes #1452, thanks @sergio-sisternes-epam) ## [0.14.2] - 2026-05-22 diff --git a/docs/src/content/docs/enterprise/registry-proxy.md b/docs/src/content/docs/enterprise/registry-proxy.md index c8da3f50b..69c0366fb 100644 --- a/docs/src/content/docs/enterprise/registry-proxy.md +++ b/docs/src/content/docs/enterprise/registry-proxy.md @@ -95,6 +95,129 @@ rewrites `apm.lock.yaml`. | `apm marketplace` (`marketplace.json` fetch) | Yes; falls back to GitHub Contents API unless `PROXY_REGISTRY_ONLY=1` | | Policy file fetch (`apm-policy.yml`) | No -- uses the GitHub API directly | +### Nested-group repos (GitLab subgroups behind the proxy) + +GitHub uses a fixed `owner/repo` shape, but GitLab projects can sit at any +subgroup depth (e.g. `group/subgroup/project`). When +`PROXY_REGISTRY_ONLY=1` is set, APM treats path segments past the second +as part of the repo slug; the real boundary between repo path and +in-repo virtual sub-path is then settled at install time by the same +deterministic boundary probe used for explicit FQDN deps (see +[Explicit Artifactory FQDN](#explicit-artifactory-fqdn-deterministic-boundary-probe) +below): + +```yaml +# apm.yml -- 3-segment GitLab project behind a registry proxy +dependencies: + apm: + - group/subgroup/project#main # resolves to the full nested path + - group/sub-a/sub-b/project#v1.2.0 # arbitrary depth supported +``` + +Virtual sub-paths under nested-group repos work via the probe: parse +defaults to all-as-repo, then the install-time resolver HEAD-probes +candidate splits against the proxy and rebuilds the dependency +reference at the first split whose archive responds 2xx-3xx: + +```yaml +dependencies: + apm: + # The probe walks shallow-first and lands on the real boundary -- + # ``group/subgroup/project`` is the repo, ``skills/`` is the + # virtual sub-path -- no marker-segment heuristic involved. + - group/subgroup/project/skills/ + # Files ending in ``.prompt.md`` / ``.instructions.md`` / + # ``.chatmode.md`` / ``.agent.md`` are structurally a virtual file + # at parse time; the probe still confirms which directory the file + # sits under is part of the repo path. + - group/subgroup/project/.prompt.md +``` + +Probe authentication matches the URL being probed: bare-shorthand deps +(Mode 2) use the proxy's own bearer token from `PROXY_REGISTRY_TOKEN`, +while explicit-FQDN deps (Mode 1) use the per-host auth resolver -- in +both cases the audience matches the probed URL, never the upstream Git +host. + +#### Trade-off: lockfile env-dependence + +The fold-into-repo behavior is gated on `PROXY_REGISTRY_ONLY` to keep the +legacy two-segment shape for direct installs. Consequence: the same +shorthand parses differently with vs. without the env set. For a team +that always runs through the proxy, this is invisible. For a mixed CI +fleet, expect lockfile drift if some agents have the env and others +don't -- pin the env in the same place you pin Python and APM versions. + +#### Configuring the upstream remote (GitLab) + +When the proxy fronts a private GitLab instance, the proxy itself must +authenticate upstream -- the client (APM) does not need a token if the +proxy is configured to accept anonymous reads on its API. + +In the Artifactory UI, for the remote pointing at GitLab: + +| Field | Value | +|---|---| +| URL | `https://` (no path prefix) | +| Repository Path Prefix | *blank* (any value gets prepended to every upstream request) | +| Username | empty *or* the GitLab username | +| Password / Token | the raw GitLab PAT value -- no `PRIVATE-TOKEN:` prefix | +| Token Authentication | enable when the password is a GitLab PAT | +| VCS Provider | `GitLab` | + +The PAT must carry **`read_repository`** scope -- `read_api` alone does +not permit `/-/archive/` downloads. Verify directly against GitLab +before saving on the remote: + +```bash +curl -sI -H "PRIVATE-TOKEN: $PAT" \ + "https://///-/archive//-.zip" \ + | head -3 +# Want: HTTP/1.1 200 OK + Content-Type: application/zip +``` + +#### Default branch gotcha + +APM defaults to `main` when no ref is provided. GitLab projects whose +default branch is still `master` will return HTTP 404 for every archive +URL APM tries. Pin the ref in `apm.yml` (`#master`) when the +project hasn't been renamed. + +#### Explicit Artifactory FQDN: deterministic boundary probe + +When a dep is written with the full proxy URL -- +`/artifactory///[/]` -- parse time gives a +simple `owner / first-segment / rest-as-virtual` split. The real +boundary is settled at install time by an authoritative resolver that +mirrors APM's native GitLab probing pattern, without a separate metadata +API: + +1. Enumerate every plausible `(owner, repo, virtual_path)` split + shallow-first. +2. `HEAD` each candidate's archive URL on the proxy (no follow on + redirects, so the bearer token can't leak cross-host). +3. The first candidate that responds 2xx-3xx wins; the dependency + reference is rebuilt at that boundary and persisted to `apm.yml` as + a structured `git:` + `path:` entry. + +If every candidate is rejected the resolver raises -- there is no +silent fallback to the parse-time guess: + +| Result | Behaviour | +|---|---| +| Single candidate (e.g. `host/artifactory/key/owner/repo`) | Parse-time ref returned unchanged; no HEAD probe issued. | +| All candidates `4xx` (excluding 401/403) | `ValueError: ... did not resolve to a reachable repository archive` | +| All candidates `401`/`403` | `ValueError: ... authentication problem, not a missing repo` -- check the token's read scope. | + +To opt out of probing -- e.g. when the proxy is offline at install time +or when you want a deterministic byte-for-byte string -- use the +explicit `//` boundary marker, which short-circuits the resolver to a +single candidate: + +```text +/artifactory////// +``` + When a surface is not proxy-routed and `PROXY_REGISTRY_ONLY=1`, APM aborts rather than silently fetching direct. @@ -141,6 +264,10 @@ and `apm cache clean`. | `git clone` hangs through the proxy | `HTTPS_PROXY` not set in the env that runs `git` | Export it in the shell that invokes `apm install`; CI secrets often miss this | | `DeprecationWarning: ARTIFACTORY_BASE_URL is deprecated` | Legacy env names | Rename to `PROXY_REGISTRY_*` | | Plaintext-token warning on proxy startup | Token sent over `http://` | Use `https://`, or set `PROXY_REGISTRY_ALLOW_HTTP=1` if the link is internal-only | +| `Invalid zip archive` with a body that starts `` and is ~17KB | Upstream returned a sign-in page; proxy cached the HTML | Configure upstream credentials on the registry remote, purge the cache, then refetch | +| 3-segment dep (`group/sub/project`) fails with HTTP 404 from the proxy | APM treated `project` as a virtual sub-path | Set `PROXY_REGISTRY_ONLY=1`; see [Nested-group repos](#nested-group-repos-gitlab-subgroups-behind-the-proxy) | +| HTTP 404 on every ref of an existing GitLab project | Default branch is `master`, APM defaults to `main` | Pin the ref: `#master` in `apm.yml` | +| Upstream URL in `X-Artifactory-Origin-Remote-Path` has a duplicated group name | The remote's "Repository Path Prefix" is prepending a segment that's also in the request | Clear the prefix field on the remote | For fully disconnected CI (no proxy reach at all), build a bundle on a connected host with `apm pack` and restore offline. See diff --git a/src/apm_cli/commands/install.py b/src/apm_cli/commands/install.py index f81f828b5..b6ee6a403 100644 --- a/src/apm_cli/commands/install.py +++ b/src/apm_cli/commands/install.py @@ -12,6 +12,7 @@ import click +from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary from apm_cli.install.errors import ( AuthenticationError, DirectDependencyError, @@ -436,11 +437,12 @@ def warning_handler(msg): # Canonicalize input try: - dep_ref, direct_gitlab_virtual_resolved = resolve_parsed_dependency_reference( + dep_ref, direct_virtual_resolved = resolve_parsed_dependency_reference( package, marketplace_dep_ref, dependency_reference_cls=DependencyReference, try_resolve_gitlab_direct_shorthand=_try_resolve_gitlab_direct_shorthand, + resolve_artifactory_boundary=_resolve_artifactory_boundary, auth_resolver=auth_resolver, verbose=bool(logger and logger.verbose), ) @@ -458,7 +460,7 @@ def warning_handler(msg): _seen.add(_s) _normalized.append(_s) dep_ref.skill_subset = _normalized - if marketplace_dep_ref is not None or direct_gitlab_virtual_resolved: + if marketplace_dep_ref is not None or direct_virtual_resolved: _apm_yml_entries[canonical] = dependency_reference_to_yaml_entry(dep_ref) except ValueError as e: reason = str(e) diff --git a/src/apm_cli/deps/artifactory_orchestrator.py b/src/apm_cli/deps/artifactory_orchestrator.py index cd7f0b616..ab07ed59d 100644 --- a/src/apm_cli/deps/artifactory_orchestrator.py +++ b/src/apm_cli/deps/artifactory_orchestrator.py @@ -175,12 +175,15 @@ def _resolve_host_prefix( @staticmethod def _split_owner_repo(dep_ref: DependencyReference) -> tuple[str, str]: repo_parts = dep_ref.repo_url.split("/") - if len(repo_parts) < 2 or not repo_parts[0] or not repo_parts[1]: + if len(repo_parts) < 2 or not all(repo_parts): raise ValueError( f"Invalid Artifactory repo reference '{dep_ref.repo_url}': " "expected 'owner/repo' format" ) - return repo_parts[0], repo_parts[1] + # Owner is the top-level namespace; the remainder of the path is the + # project slug. For GitLab projects behind an Artifactory VCS proxy + # the slug can include subgroups (e.g. ``group/subgroup/project``). + return repo_parts[0], "/".join(repo_parts[1:]) @staticmethod def _progress(progress_obj, progress_task_id, *, completed: int, total: int = 100) -> None: @@ -257,7 +260,9 @@ def download_subdirectory( subdir_path = dep_ref.virtual_path repo_parts = dep_ref.repo_url.split("/") owner = repo_parts[0] - repo = repo_parts[1] if len(repo_parts) > 1 else repo_parts[0] + # Preserve subgroup nesting (GitLab via proxy) by folding everything + # past the owner into the repo slug. + repo = "/".join(repo_parts[1:]) if len(repo_parts) > 1 else repo_parts[0] host, prefix, scheme = proxy_info self._progress(progress_obj, progress_task_id, completed=10) diff --git a/src/apm_cli/install/artifactory_resolver.py b/src/apm_cli/install/artifactory_resolver.py new file mode 100644 index 000000000..ae31769c4 --- /dev/null +++ b/src/apm_cli/install/artifactory_resolver.py @@ -0,0 +1,275 @@ +"""Authoritative Artifactory boundary resolution for install-time validation. + +Native GitLab support (see :mod:`apm_cli.install.gitlab_resolver`) probes the +GitLab API to find the real repository boundary inside a nested subgroup path. +For deps routed through a JFrog Artifactory VCS proxy the proxy itself serves +archive ZIPs at deterministic URLs, so the archive *is* the existence signal: +``HEAD`` the candidate archive URL, treat ``2xx-3xx`` as existence proof. + +The resolver covers both routing modes uniformly: + +* **Mode 1** -- explicit FQDN deps (``host/artifactory/key/owner/repo/...``). + Host and prefix come from the dependency reference itself. +* **Mode 2** -- bare shorthand (``owner/repo/...``) routed through the proxy + by ``PROXY_REGISTRY_URL`` + ``PROXY_REGISTRY_ONLY=1``. Host and prefix come + from the registry config. The rebuilt ref stays bare shorthand form (no + ``host``/``artifactory_prefix`` set) so identity and lockfile shape are + preserved. + +The resolver is **deterministic**: + +* For unambiguous paths (a single plausible owner/repo split), it returns the + parse-time dependency reference unchanged. +* For ambiguous paths it walks candidate splits shallow-first and locks in the + first one whose archive responds 2xx-3xx. No "best-effort fallback" -- if + every candidate is rejected by the proxy, the function raises and the + install pipeline surfaces a clear error (and distinguishes "missing repo" + from "auth problem"). + +Drift from this contract has historically masked broken deps by silently +falling back to a guess. Keep the resolver definite. +""" + +from __future__ import annotations + +import dataclasses + +import requests + +from apm_cli.core.auth import AuthResolver +from apm_cli.models.apm_package import DependencyReference +from apm_cli.utils.github_host import ( + build_artifactory_archive_url, + iter_artifactory_boundary_candidates, + sanitize_token_url_in_message, +) + +_ARTIFACTORY_BOUNDARY_UNRESOLVED = ( + "Artifactory host/path did not resolve to a reachable repository archive. " + "Verify the proxy URL, ref, and that the upstream project exists; the " + "``//`` notation can mark the repo/virtual boundary explicitly when the " + "proxy is unavailable." +) + +_ARTIFACTORY_BOUNDARY_AUTH = ( + "Artifactory proxy rejected the probe for every candidate boundary " + "(401/403). This is an authentication problem, not a missing repo: " + "check that the configured token has read access to the proxy." +) + + +class _CandidateStatus: + """Probe outcome for one candidate boundary.""" + + EXISTS = "exists" # at least one URL shape returned 2xx + MISSING = "missing" # every URL shape returned 4xx other than 401/403 + AUTH = "auth" # only 401/403 seen -- cannot tell if the repo exists + + +def _candidate_archive_status( + host: str, + prefix: str, + owner: str, + repo: str, + ref: str, + headers: dict[str, str], + verify, + timeout: int = 15, +) -> str: + """Classify one candidate's existence by HEAD-probing every archive URL shape. + + Returns one of :class:`_CandidateStatus` constants. Distinguishing + ``AUTH`` from ``MISSING`` is deliberate: a misconfigured token should + surface a different error than a wrong owner/repo split. + """ + urls = build_artifactory_archive_url(host, prefix, owner, repo, ref) + saw_auth = False + for url in urls: + try: + # ``allow_redirects=False`` keeps the Bearer token from leaking to + # any host the proxy might redirect us to. 3xx is still existence + # proof -- the server confirmed the resource by issuing a redirect. + r = requests.head( + url, headers=headers, timeout=timeout, verify=verify, allow_redirects=False + ) + except requests.RequestException: + continue + if 200 <= r.status_code < 400: + return _CandidateStatus.EXISTS + if r.status_code in (401, 403): + saw_auth = True + return _CandidateStatus.AUTH if saw_auth else _CandidateStatus.MISSING + + +def _proxy_probe_headers( + dep_ref: DependencyReference, + auth_resolver, + is_mode_1: bool, +) -> dict[str, str]: + """Build the HEAD-probe headers for the right authentication audience. + + * Mode 1 (explicit FQDN): the dep's host *is* the proxy host, so the + per-host auth resolver already returns the right token. + * Mode 2 (bare shorthand under the proxy): the dep's logical host is + typically ``github.com`` and the auth resolver would hand back a + GitHub token -- wrong audience. Use the proxy's own bearer token + from :class:`RegistryConfig` instead. + """ + headers: dict[str, str] = {"User-Agent": "apm-cli"} + if is_mode_1: + ctx = auth_resolver.resolve_for_dep(dep_ref) + if ctx and getattr(ctx, "token", None): + headers["Authorization"] = f"Bearer {ctx.token}" + return headers + # Mode 2: proxy-scoped token only. + from ..deps.registry_proxy import RegistryConfig + + cfg = RegistryConfig.from_env() + if cfg is not None: + headers.update(cfg.get_headers()) + return headers + + +def _proxy_routing_target( + dep_ref: DependencyReference, +) -> tuple[str, str, bool] | None: + """Return ``(host, prefix, is_mode_1)`` if *dep_ref* routes through the proxy. + + * Mode 1 (explicit FQDN): host and prefix come from the dependency ref. + * Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``): host and prefix + come from the registry-proxy config. + + Returns ``None`` when no proxy routing applies (regular GitHub/ADO deps) + or when host/prefix are not real strings (e.g. mocked dependency refs + in unit tests). + """ + if dep_ref.is_artifactory(): + host = dep_ref.host + prefix = dep_ref.artifactory_prefix + if isinstance(host, str) and isinstance(prefix, str) and host and prefix: + return (host, prefix, True) + return None + from ..deps.artifactory_orchestrator import ArtifactoryRouter + + if not ArtifactoryRouter.should_use_proxy(dep_ref): + return None + cfg = ArtifactoryRouter.parse_proxy_config() + if cfg is None: + return None + host, prefix, _scheme = cfg + if isinstance(host, str) and isinstance(prefix, str) and host and prefix: + return (host, prefix, False) + return None + + +def _resolve_artifactory_boundary( + package: str, + auth_resolver, + verbose: bool = False, + *, + dep_ref: DependencyReference | None = None, +) -> DependencyReference: + """Definitively resolve the (owner, repo, virtual_path) boundary on the proxy. + + Returns the rebuilt :class:`DependencyReference` with the proxy-verified + boundary, or *dep_ref* unchanged when there is nothing to disambiguate + (single candidate, or the dep doesn't route through the proxy at all). + Raises ``ValueError`` if every candidate is rejected by the proxy. + """ + if auth_resolver is None: + auth_resolver = AuthResolver() + + if dep_ref is None: + dep_ref = DependencyReference.parse(package) + + target = _proxy_routing_target(dep_ref) + if target is None: + # Not routed through the proxy -- nothing for this resolver to do. + return dep_ref + host, prefix, is_mode_1 = target + + # Strip any inlined ``user:pass@host`` credentials before echoing the + # package string back in error messages. Deferred until after the + # routing-target check so non-proxy deps short-circuit before touching + # the host (callers occasionally pass mocked dep_refs whose ``host`` is + # not a real string). + safe_package = sanitize_token_url_in_message(package, host=host) + + prefix_segs = prefix.split("/") + tail = dep_ref.repo_url or "" + if dep_ref.virtual_path: + tail = f"{tail}/{dep_ref.virtual_path}" + tail_segs = [s for s in tail.split("/") if s] + path_segments = [*prefix_segs, *tail_segs] + + candidates = list(iter_artifactory_boundary_candidates(path_segments)) + if not candidates: + raise ValueError( + f"Artifactory dep '{safe_package}' has no plausible owner/repo split" + ) + if len(candidates) == 1: + # Single candidate -- the parse-time dep_ref is already definitive. + return dep_ref + + headers = _proxy_probe_headers(dep_ref, auth_resolver, is_mode_1) + verify = True + + ref = dep_ref.reference or "main" + + all_auth = True + for cand_prefix, cand_owner, cand_repo, cand_virtual in candidates: + if verbose: + path_suffix = f" [path: {cand_virtual}]" if cand_virtual else "" + print( + f" artifactory-resolve: probing {host}/{cand_prefix}/{cand_owner}" + f"/{cand_repo}#{ref}{path_suffix}" + ) + status = _candidate_archive_status( + host, cand_prefix, cand_owner, cand_repo, ref, headers, verify + ) + if status == _CandidateStatus.EXISTS: + # If the probe confirms the parse-time boundary, return the + # original ref so the install pipeline doesn't re-serialize + # a structurally unchanged dep as a different shape. + if ( + dep_ref.repo_url == f"{cand_owner}/{cand_repo}" + and (dep_ref.virtual_path or None) == cand_virtual + ): + return dep_ref + return _rebuild_dep_ref(dep_ref, host, cand_prefix, cand_owner, cand_repo, cand_virtual, is_mode_1) + if status == _CandidateStatus.MISSING: + all_auth = False + + # Every candidate was rejected. Auth-only rejection deserves a different + # error than "the repo doesn't exist" so the user gets an actionable hint. + if all_auth: + raise ValueError(f"{_ARTIFACTORY_BOUNDARY_AUTH} (package: {safe_package})") + raise ValueError(f"{_ARTIFACTORY_BOUNDARY_UNRESOLVED} (package: {safe_package})") + + +def _rebuild_dep_ref( + dep_ref: DependencyReference, + host: str, + prefix: str, + owner: str, + repo: str, + virtual_path: str | None, + is_mode_1: bool, +) -> DependencyReference: + """Rebuild *dep_ref* at the probed boundary, preserving non-boundary fields. + + Mode 1 keeps the FQDN+prefix on the rebuilt ref (it was there to begin + with). Mode 2 keeps the rebuilt ref as bare shorthand so identity and + lockfile shape stay stable -- the proxy is still routed via env, not via + embedded host/prefix. + """ + if is_mode_1: + return DependencyReference.from_artifactory_boundary_probe( + host, prefix, owner, repo, virtual_path, dep_ref.reference + ) + return dataclasses.replace( + dep_ref, + repo_url=f"{owner}/{repo}", + virtual_path=virtual_path, + is_virtual=bool(virtual_path), + ) diff --git a/src/apm_cli/install/package_resolution.py b/src/apm_cli/install/package_resolution.py index 791e184e3..260c2bd21 100644 --- a/src/apm_cli/install/package_resolution.py +++ b/src/apm_cli/install/package_resolution.py @@ -40,14 +40,20 @@ def resolve_parsed_dependency_reference( try_resolve_gitlab_direct_shorthand: Callable[..., Any], auth_resolver: Any, verbose: bool, + resolve_artifactory_boundary: Callable[..., Any] | None = None, ) -> tuple[Any, bool]: """Parse or probe *package* into a ``DependencyReference``. - Returns ``(dep_ref, direct_gitlab_virtual_resolved)`` where the second flag - is True when GitLab direct shorthand probing produced a virtual path entry. + Returns ``(dep_ref, direct_virtual_resolved)`` where the second flag is + True when a probe (GitLab shorthand or Artifactory boundary) rebuilt the + dep ref, so the install pipeline persists it as a structured entry. + + For Artifactory deps the optional ``resolve_artifactory_boundary`` is + authoritative: it returns the proxy-verified boundary or raises -- there + is no silent fallback to the parse-time guess. Raises: - ValueError: When GitLab shorthand probing is required but fails to resolve. + ValueError: When GitLab or Artifactory probing fails to resolve. """ dep_ref = ( marketplace_dep_ref @@ -66,8 +72,21 @@ def resolve_parsed_dependency_reference( if resolved is None: raise ValueError(_GITLAB_DIRECT_SHORTHAND_UNRESOLVED) dep_ref = resolved - direct_gitlab_virtual_resolved = bool(dep_ref.is_virtual and dep_ref.virtual_path) - return dep_ref, direct_gitlab_virtual_resolved + direct_virtual_resolved = bool(dep_ref.is_virtual and dep_ref.virtual_path) + return dep_ref, direct_virtual_resolved + if marketplace_dep_ref is None and resolve_artifactory_boundary is not None: + # The resolver decides its own applicability -- it short-circuits for + # deps that don't route through the Artifactory proxy. When it rebuilds + # the dep_ref, the canonical shorthand can't round-trip the verified + # boundary, so persist as a structured ``git:`` + ``path:`` entry. + resolved = resolve_artifactory_boundary( + package, + auth_resolver, + verbose=verbose, + dep_ref=dep_ref, + ) + if resolved is not dep_ref: + return resolved, True return dep_ref, False diff --git a/src/apm_cli/models/dependency/reference.py b/src/apm_cli/models/dependency/reference.py index 6b2829980..9ed5dd193 100644 --- a/src/apm_cli/models/dependency/reference.py +++ b/src/apm_cli/models/dependency/reference.py @@ -760,6 +760,26 @@ def from_gitlab_shorthand_probe( is_virtual=True, ) + @classmethod + def from_artifactory_boundary_probe( + cls, + host: str, + prefix: str, + owner: str, + repo: str, + virtual_path: str | None, + reference: str | None, + ) -> "DependencyReference": + """Build a dependency ref for a resolved Artifactory boundary probe.""" + return cls( + repo_url=f"{owner}/{repo}", + host=host, + reference=reference, + virtual_path=virtual_path, + is_virtual=bool(virtual_path), + artifactory_prefix=prefix, + ) + @classmethod def _gitlab_shorthand_repo_segment_count( cls, @@ -804,6 +824,39 @@ def _gitlab_shorthand_repo_segment_count( return n + @classmethod + def _bare_shorthand_repo_segment_count(cls, path_segments: list[str]) -> int: + """Return how many leading segments belong to the repo path for bare shorthand. + + For ``owner/repo[/...]`` shorthand without an FQDN, the default is 2 + segments (GitHub convention). When registry-only mode is active, the + proxy may be fronting a host that allows nested namespaces (GitLab + subgroups) -- parse defaults to **all-as-repo** so the deterministic + boundary probe in :mod:`apm_cli.install.artifactory_resolver` can + rebuild the dependency reference at the proxy-verified split. + + The only parse-time inference kept is **structural**: a path whose + last segment ends in a virtual file extension + (``.prompt.md``/``.instructions.md``/``.chatmode.md``/``.agent.md``) + is by shape a virtual file dep -- the file is the last segment and + the repo is everything before it. This is not a directory-marker + heuristic; the file extension is the type. The shallower boundary + (when the file lives under a known directory like ``prompts/``) is + settled by the probe, not by a marker list. + """ + n = len(path_segments) + if n < 3: + return 2 + + from ...deps.registry_proxy import is_enforce_only + + if not is_enforce_only(): + return 2 + + if any(path_segments[-1].endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS): + return n - 1 + return n + @classmethod def _detect_virtual_package(cls, dependency_str: str): """Detect whether *dependency_str* refers to a virtual package. @@ -890,7 +943,13 @@ def _detect_virtual_package(cls, dependency_str: str): else: min_base_segments = len(path_segments) else: - min_base_segments = 2 + # Bare shorthand (no FQDN). Default GitHub-style: owner/repo plus + # any tail is treated as a virtual sub-path. But when registry-only + # mode is active, the proxy may be fronting a GitLab instance where + # the project lives at an arbitrary subgroup depth -- fold non-marker + # segments into the repo path instead of mis-classifying them as + # virtual sub-paths (see issue: nested GitLab subgroup support). + min_base_segments = cls._bare_shorthand_repo_segment_count(path_segments) min_virtual_segments = min_base_segments + 1 @@ -1065,6 +1124,17 @@ def _resolve_virtual_shorthand_repo(cls, repo_url, validated_host, virtual_path= "Invalid Azure DevOps virtual package format: expected at least org/project/repo/path" ) repo_url = "/".join(parts[:3]) + elif validated_host is None and virtual_path: + # Bare shorthand under registry-only mode may carry a nested + # repo path (GitLab subgroup via proxy). Trust the boundary + # already chosen by ``_bare_shorthand_repo_segment_count`` -- + # everything before the virtual tail belongs to the repo. + vparts = [p for p in virtual_path.split("/") if p] + tail = len(vparts) + if tail > 0 and len(parts) > tail + 1: + repo_url = "/".join(parts[: len(parts) - tail]) + else: + repo_url = "/".join(parts[:2]) else: repo_url = "/".join(parts[:2]) @@ -1113,6 +1183,10 @@ def _resolve_shorthand_to_parsed_url(cls, repo_url, host): user_repo = "/".join(parts[:3]) elif host and not is_github_hostname(host) and not is_azure_devops_hostname(host): user_repo = "/".join(parts) + elif len(parts) >= 3 and cls._bare_shorthand_repo_segment_count(parts) > 2: + # Registry-only mode allows nested-group repo paths + # (GitLab via proxy). Keep the full multi-segment path. + user_repo = "/".join(parts[: cls._bare_shorthand_repo_segment_count(parts)]) else: user_repo = "/".join(parts[:2]) else: @@ -1245,6 +1319,14 @@ def _validate_url_repo_path(cls, parsed_url) -> tuple[str, str | None]: raise ValueError( f"Invalid repository path: expected at least 'user/repo', got '{path}'" ) + # Strip the Artifactory VCS prefix so ``repo_url`` is the bare + # ``owner/repo`` -- otherwise URL round-trip through + # ``to_github_url`` -> ``parse`` would carry the prefix in the + # repo_url and the orchestrator would double-prefix download URLs. + # The prefix itself is recovered separately via + # :meth:`_extract_artifactory_prefix`. + if is_artifactory_path(path_parts): + path_parts = path_parts[2:] for pp in path_parts: if any(pp.endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS): raise ValueError( diff --git a/src/apm_cli/utils/github_host.py b/src/apm_cli/utils/github_host.py index 3031115de..5e515f040 100644 --- a/src/apm_cli/utils/github_host.py +++ b/src/apm_cli/utils/github_host.py @@ -610,25 +610,101 @@ def is_artifactory_path(path_segments: list) -> bool: return len(path_segments) >= 4 and path_segments[0].lower() == "artifactory" +def iter_artifactory_boundary_candidates(path_segments: list, shape_filter=None): + """Yield ``(prefix, owner, repo, virtual_path)`` candidates shallow-first. + + Mirrors :meth:`DependencyReference.iter_gitlab_direct_shorthand_boundary_candidates`: + enumerate every plausible (owner, repo) split and let the caller probe each + one against the Artifactory proxy. The probe (HEAD on the archive URL) + decides the real boundary; this iterator only proposes candidates. + + If *shape_filter* is provided, candidates whose ``virtual_path`` fails the + filter are skipped. The candidate with no virtual path (``k == n``) is + always yielded as the all-as-repo fallback so callers that need a + deterministic answer (no probing) can pick it. + + The ``//`` empty-segment notation explicitly marks the repo / virtual + boundary and short-circuits the iterator to a single candidate. + + Returns nothing for non-Artifactory paths. + """ + if not is_artifactory_path(path_segments): + return + repo_key = path_segments[1] + prefix = f"artifactory/{repo_key}" + remaining = path_segments[2:] + if not remaining: + return + owner = remaining[0] + after_owner = remaining[1:] + n = len(after_owner) + if n == 0: + return + + if "" in after_owner: + empty_idx = after_owner.index("") + repo_parts = after_owner[:empty_idx] + suffix_parts = [s for s in after_owner[empty_idx + 1 :] if s] + if repo_parts: + yield ( + prefix, + owner, + "/".join(repo_parts), + "/".join(suffix_parts) if suffix_parts else None, + ) + return + + for k in range(1, n + 1): + repo = "/".join(after_owner[:k]) + suffix_parts = after_owner[k:] + suffix = "/".join(suffix_parts) if suffix_parts else None + if suffix is not None and shape_filter is not None and not shape_filter(suffix): + continue + yield (prefix, owner, repo, suffix) + + def parse_artifactory_path(path_segments: list) -> tuple: - """Parse Artifactory path into (prefix, owner, repo, virtual_path). + """Parse Artifactory path into ``(prefix, owner, repo, virtual_path)``. - Input: ['artifactory', 'github', 'microsoft', 'apm-sample-package'] - Output: ('artifactory/github', 'microsoft', 'apm-sample-package', None) + Parse-time output is intentionally simple and unambiguous: ``owner`` is the + first segment after ``artifactory/{key}``, ``repo`` is the next segment, + and any further segments become ``virtual_path``. The authoritative + boundary -- needed for nested GitLab subgroup paths behind the Artifactory + proxy -- is determined by :func:`apm_cli.install.artifactory_resolver.\ +_resolve_artifactory_boundary`, which probes archive URLs and rebuilds the + dependency reference at the verified boundary. - Input: ['artifactory', 'github', 'owner', 'repo', 'skills', 'review'] - Output: ('artifactory/github', 'owner', 'repo', 'skills/review') + The ``//`` notation (empty segment) is honored as an explicit, deterministic + boundary marker so users can opt out of probing. Returns None if not a valid Artifactory path. """ if not is_artifactory_path(path_segments): return None repo_key = path_segments[1] - remaining = path_segments[2:] prefix = f"artifactory/{repo_key}" + remaining = path_segments[2:] + if not remaining: + return None owner = remaining[0] - repo = remaining[1] - virtual_path = "/".join(remaining[2:]) if len(remaining) > 2 else None + after_owner = remaining[1:] + if not after_owner: + return None + + if "" in after_owner: + empty_idx = after_owner.index("") + repo_parts = after_owner[:empty_idx] + suffix_parts = [s for s in after_owner[empty_idx + 1 :] if s] + if repo_parts: + return ( + prefix, + owner, + "/".join(repo_parts), + "/".join(suffix_parts) if suffix_parts else None, + ) + + repo = after_owner[0] + virtual_path = "/".join(after_owner[1:]) if len(after_owner) > 1 else None return (prefix, owner, repo, virtual_path) @@ -661,11 +737,15 @@ def build_artifactory_archive_url( Tuple of URLs to try in order """ base = f"{scheme}://{host}/{prefix}/{owner}/{repo}" + # GitLab archive filenames use only the project basename, even when the + # project sits inside a subgroup (e.g. ``group/sub/pkg`` becomes + # ``pkg-{ref}.zip``). ``rsplit`` keeps the flat case unchanged. + repo_basename = repo.rsplit("/", 1)[-1] return ( # GitHub-style: /archive/refs/heads/{ref}.zip f"{base}/archive/refs/heads/{ref}.zip", - # GitLab-style: /-/archive/{ref}/{repo}-{ref}.zip - f"{base}/-/archive/{ref}/{repo}-{ref}.zip", + # GitLab-style: /-/archive/{ref}/{basename}-{ref}.zip + f"{base}/-/archive/{ref}/{repo_basename}-{ref}.zip", # GitHub-style tags fallback f"{base}/archive/refs/tags/{ref}.zip", # codeload.github.com-style: /zip/refs/heads/{ref} diff --git a/tests/unit/test_artifactory_support.py b/tests/unit/test_artifactory_support.py index 0f3e14611..8fad78874 100644 --- a/tests/unit/test_artifactory_support.py +++ b/tests/unit/test_artifactory_support.py @@ -107,6 +107,122 @@ def test_different_repo_key(self): result = parse_artifactory_path(["artifactory", "my-proxy", "team", "project"]) assert result[0] == "artifactory/my-proxy" + def test_nested_gitlab_subgroup_simple_parse(self): + """Parse is intentionally simple/shallow; the install-time resolver is authoritative. + + For ``apm/acme-group/shared-modules/pkg-utils`` parse picks + ``owner=acme-group, repo=shared-modules`` and folds the rest into + ``virtual_path``. ``_resolve_artifactory_boundary`` then HEAD-probes + archive URLs and rebuilds the ref at the proxy-verified boundary. + """ + result = parse_artifactory_path( + ["artifactory", "apm", "acme-group", "shared-modules", "pkg-utils"] + ) + assert result == ("artifactory/apm", "acme-group", "shared-modules", "pkg-utils") + + def test_nested_subgroup_with_marker_simple_parse(self): + """Marker-like segments are no longer parse-time signals; the resolver decides.""" + result = parse_artifactory_path( + ["artifactory", "apm", "group", "subgroup", "repo", "skills", "foo"] + ) + assert result == ("artifactory/apm", "group", "subgroup", "repo/skills/foo") + + def test_nested_subgroup_with_virtual_file_extension(self): + """Virtual file extensions are no longer parse-time boundary signals either.""" + result = parse_artifactory_path( + ["artifactory", "apm", "group", "subgroup", "repo", "rules.prompt.md"] + ) + assert result == ("artifactory/apm", "group", "subgroup", "repo/rules.prompt.md") + + def test_double_slash_subdirectory_notation(self): + """The ``//subdir`` notation explicitly marks the repo/virtual boundary.""" + # Mirrors ``art.example.com/artifactory/github/owner/repo//subdir``. + result = parse_artifactory_path( + ["artifactory", "github", "owner", "repo", "", "subdir"] + ) + assert result[0] == "artifactory/github" + assert result[1] == "owner" + assert result[2] == "repo" + assert result[3] == "subdir" + + +class TestIterArtifactoryBoundaryCandidates: + """Test the candidate iterator that backs the install-time probe.""" + + def test_flat_path_yields_single_candidate(self): + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "github", "owner", "repo"] + ) + ) + assert candidates == [("artifactory/github", "owner", "repo", None)] + + def test_nested_path_yields_shallow_to_deep(self): + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "apm", "group", "sub", "repo"] + ) + ) + # Shallowest (k=1) yielded first; deepest (all-as-repo) last. + assert candidates == [ + ("artifactory/apm", "group", "sub", "repo"), + ("artifactory/apm", "group", "sub/repo", None), + ] + + def test_shape_filter_accepts_matching_suffixes(self): + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + # Filter only accepts suffixes ending in ``.prompt.md``. + def only_prompt_md(v): + return v.endswith(".prompt.md") + + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "apm", "g", "a", "b", "rules.prompt.md"], + shape_filter=only_prompt_md, + ) + ) + # after_owner = [a, b, rules.prompt.md]: + # k=1 suffix='b/rules.prompt.md' -> ends in .prompt.md, accepted. + # k=2 suffix='rules.prompt.md' -> ends in .prompt.md, accepted. + # k=3 suffix=None -> filter bypassed (no-suffix fallback). + assert ("artifactory/apm", "g", "a", "b/rules.prompt.md") in candidates + assert ("artifactory/apm", "g", "a/b", "rules.prompt.md") in candidates + assert ("artifactory/apm", "g", "a/b/rules.prompt.md", None) in candidates + + def test_shape_filter_actually_rejects(self): + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + # Reject every non-empty suffix. + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "apm", "g", "a", "b"], + shape_filter=lambda _v: False, + ) + ) + # All k rebuild at depth.""" + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/apm/acme-group/shared-modules/pkg-utils" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + ok_url = "/artifactory/apm/acme-group/shared-modules/pkg-utils/" + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", + side_effect=self._fake_head(ok_url), + ): + resolved = _resolve_artifactory_boundary(package, auth, verbose=False) + + assert resolved.host == "art.example.com" + assert resolved.artifactory_prefix == "artifactory/apm" + assert resolved.repo_url == "acme-group/shared-modules/pkg-utils" + assert resolved.virtual_path is None + assert resolved.is_artifactory() + + def test_resolver_picks_shallower_boundary_when_subpath_is_virtual(self): + """Shallow boundary 200s -> lock in shallow + virtual_path.""" + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/apm/group/repo/skills/foo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + ok_url = "/artifactory/apm/group/repo/" + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + # Only the shallowest archive URL hits 200; deeper paths 404. + resp.status_code = ( + 200 + if ok_url in url and "/group/repo/skills" not in url + else 404 + ) + return resp + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", + side_effect=_head, + ): + resolved = _resolve_artifactory_boundary(package, auth, verbose=False) + + assert resolved.repo_url == "group/repo" + assert resolved.virtual_path == "skills/foo" + assert resolved.is_virtual + + def test_resolver_no_op_for_non_proxy_routed_dep(self): + """Non-Artifactory deps without proxy mode are a no-op (unchanged ref). + + The resolver covers both routing modes (Mode 1 FQDN and Mode 2 bare + shorthand under ``PROXY_REGISTRY_ONLY``); deps that don't route through + the proxy at all (e.g. a plain ``github.com`` dep with no proxy env) + should pass through unchanged so the resolver can be called blindly + from the install pipeline. + """ + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + auth = Mock() + # Clear any proxy env to make sure Mode 2 doesn't kick in here either. + with patch.dict(os.environ, {}, clear=True): + dep = DependencyReference.parse("github.com/owner/repo") + resolved = _resolve_artifactory_boundary( + "github.com/owner/repo", auth, verbose=False, dep_ref=dep + ) + assert resolved is dep + + def test_resolver_raises_when_no_candidate_matches(self): + """All-404 must raise -- deterministic failure, no silent fallback.""" + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + def _all_404(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + resp.status_code = 404 + return resp + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_404 + ): + with pytest.raises(ValueError, match="did not resolve"): + _resolve_artifactory_boundary(package, auth, verbose=False) + + def test_resolver_distinguishes_auth_from_missing(self): + """All-401 must raise an auth-specific error, not a "not found" one. + + A misconfigured token returning 401 for every probe must surface as an + auth problem so the user knows to fix credentials, not the dep path. + """ + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token="bad") + + def _all_401(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + resp.status_code = 401 + return resp + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_401 + ): + with pytest.raises(ValueError, match="authentication problem"): + _resolve_artifactory_boundary(package, auth, verbose=False) + + def test_resolver_returns_unchanged_for_unambiguous_path(self): + """A 4-segment path has only one candidate -- return the parse-time ref unchanged.""" + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/github/owner/repo" + dep_ref = DependencyReference.parse(package) + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + with patch( + "apm_cli.install.artifactory_resolver.requests.head" + ) as mock_head: + resolved = _resolve_artifactory_boundary( + package, auth, verbose=False, dep_ref=dep_ref + ) + + assert resolved is dep_ref + mock_head.assert_not_called() + + def test_resolver_handles_mode_2_bare_shorthand(self): + """Bare shorthand under ``PROXY_REGISTRY_ONLY`` is probed via the registry config. + + The rebuilt ref stays bare (no ``host`` / ``artifactory_prefix`` set); + the proxy is still routed via env, not embedded in the dep ref. + """ + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "group/sub/repo/skills/foo" + with patch.dict( + os.environ, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, + clear=True, + ): + dep = DependencyReference.parse(package) + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + # Only the shallow boundary (group/sub/repo) exists. + ok_marker = "/artifactory/apm/group/sub/repo/" + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + resp.status_code = ( + 200 + if ok_marker in url and "/group/sub/repo/skills" not in url + else 404 + ) + return resp + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", side_effect=_head + ): + resolved = _resolve_artifactory_boundary( + package, auth, verbose=False, dep_ref=dep + ) + + assert resolved.repo_url == "group/sub/repo" + assert resolved.virtual_path == "skills/foo" + assert resolved.is_virtual + # Bare shorthand form preserved -- proxy routing is via env, not + # embedded as artifactory_prefix on the ref. + assert resolved.artifactory_prefix is None + assert resolved.is_artifactory() is False + + +# ── ArtifactoryOrchestrator: multi-segment repo support ── + + +class TestArtifactoryOrchestratorNestedRepo: + """``ArtifactoryOrchestrator`` keeps nested-group repo paths intact.""" + + def test_split_owner_repo_two_segments(self): + """Owner/repo split is unchanged for flat 2-segment paths.""" + from apm_cli.deps.artifactory_orchestrator import ArtifactoryOrchestrator + + dep = DependencyReference.parse( + "art.example.com/artifactory/github/owner/repo" + ) + owner, repo = ArtifactoryOrchestrator._split_owner_repo(dep) + assert owner == "owner" + assert repo == "repo" + + def test_split_owner_repo_nested_subgroup_after_probe(self): + """After a probe-rebuilt ref, subgroup segments are folded into ``repo``.""" + from apm_cli.deps.artifactory_orchestrator import ArtifactoryOrchestrator + + # Simulate the post-probe dep_ref directly (avoids needing a real proxy). + dep = DependencyReference.from_artifactory_boundary_probe( + host="art.example.com", + prefix="artifactory/apm", + owner="acme-group", + repo="shared-modules/pkg-utils", + virtual_path=None, + reference=None, + ) + owner, repo = ArtifactoryOrchestrator._split_owner_repo(dep) + assert owner == "acme-group" + assert repo == "shared-modules/pkg-utils" + # ── token_manager.py: Artifactory token support ── @@ -790,6 +1269,9 @@ def test_no_corporate_values_in_source(self): src_dir / "models" / "dependency.py", src_dir / "commands" / "install.py", src_dir / "core" / "token_manager.py", + src_dir / "install" / "artifactory_resolver.py", + src_dir / "install" / "package_resolution.py", + src_dir / "models" / "dependency" / "reference.py", ] forbidden = ["checkpoint", "chkp"] for py_file in target_files: From 385af9ead49edefd2149ad245288c86a76f2c929 Mon Sep 17 00:00:00 2001 From: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Date: Mon, 25 May 2026 16:47:36 +0300 Subject: [PATCH 2/6] fix(artifactory): address Copilot review feedback on PR #1472 - parse_artifactory_path: reject ``owner//virtual`` (empty segment before the explicit boundary) instead of falling through and returning an empty ``repo`` slug. - _proxy_routing_target: thread the proxy ``scheme`` (https/http) through the resolver into ``build_artifactory_archive_url`` so installs that intentionally route through an ``http://`` proxy (under ``PROXY_REGISTRY_ALLOW_HTTP=1``) probe over the same transport. - _CandidateStatus.EXISTS: update inline comment to say ``2xx or 3xx`` (redirect-inclusive), matching the actual classification in ``_candidate_archive_status``. - CHANGELOG: split the parse-time-behavior note into explicit-FQDN vs bare-shorthand modes (the all-as-repo default applies only under ``PROXY_REGISTRY_ONLY``; Mode 1 stays shallow). - package_resolution: clarify the ``direct_virtual_resolved`` docstring -- GitLab path gates on ``is_virtual``, Artifactory path gates on probe rebuild; both signal "persist as structured ``git:`` + ``path:`` entry". - Add two regression tests for the empty-segment edge case (parser + iterator). --- CHANGELOG.md | 6 ++++- src/apm_cli/install/artifactory_resolver.py | 28 ++++++++++++--------- src/apm_cli/install/package_resolution.py | 13 ++++++++-- src/apm_cli/utils/github_host.py | 18 +++++++------ tests/unit/test_artifactory_support.py | 24 ++++++++++++++++++ 5 files changed, 67 insertions(+), 22 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index e2985904c..1d79f4a95 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,7 +13,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Changed -- Parse-time Artifactory boundary detection no longer uses directory-marker heuristics (`skills/`, `prompts/`, `agents/`, `collections/`, `instructions/`). The install-time resolver is authoritative for the (owner, repo, virtual_path) split; parse-time defaults to a simple all-as-repo / structural-file-extension rule. The `_VIRTUAL_PATH_ROOT_SEGMENTS`, `_ARTIFACTORY_VIRTUAL_MARKERS`, and `_ARTIFACTORY_VIRTUAL_FILE_EXTENSIONS` constants are removed. +- Parse-time Artifactory boundary detection no longer uses directory-marker heuristics (`skills/`, `prompts/`, `agents/`, `collections/`, `instructions/`). The install-time resolver is authoritative for the (owner, repo, virtual_path) split. Parse-time defaults differ by mode: + - **Explicit FQDN** (`host/artifactory/key/owner/repo/...`): `parse_artifactory_path` stays intentionally shallow -- `owner` = first segment after the prefix, `repo` = next segment, remainder = `virtual_path`. + - **Bare shorthand under `PROXY_REGISTRY_ONLY`**: `_bare_shorthand_repo_segment_count` defaults to all-as-repo with a structural file-extension rule on the last segment (a path ending in `.prompt.md`/`.instructions.md`/`.chatmode.md`/`.agent.md` is by shape a virtual file; everything before it is the repo). + + Both modes converge at install time: `_resolve_artifactory_boundary` HEAD-probes candidate splits and rebuilds the dependency reference at the proxy-verified split. The `_VIRTUAL_PATH_ROOT_SEGMENTS`, `_ARTIFACTORY_VIRTUAL_MARKERS`, and `_ARTIFACTORY_VIRTUAL_FILE_EXTENSIONS` constants are removed. ### Fixed diff --git a/src/apm_cli/install/artifactory_resolver.py b/src/apm_cli/install/artifactory_resolver.py index ae31769c4..85620bd82 100644 --- a/src/apm_cli/install/artifactory_resolver.py +++ b/src/apm_cli/install/artifactory_resolver.py @@ -61,7 +61,7 @@ class _CandidateStatus: """Probe outcome for one candidate boundary.""" - EXISTS = "exists" # at least one URL shape returned 2xx + EXISTS = "exists" # at least one URL shape returned 2xx or 3xx (existence proof, redirect inclusive) MISSING = "missing" # every URL shape returned 4xx other than 401/403 AUTH = "auth" # only 401/403 seen -- cannot tell if the repo exists @@ -75,6 +75,7 @@ def _candidate_archive_status( headers: dict[str, str], verify, timeout: int = 15, + scheme: str = "https", ) -> str: """Classify one candidate's existence by HEAD-probing every archive URL shape. @@ -82,7 +83,7 @@ def _candidate_archive_status( ``AUTH`` from ``MISSING`` is deliberate: a misconfigured token should surface a different error than a wrong owner/repo split. """ - urls = build_artifactory_archive_url(host, prefix, owner, repo, ref) + urls = build_artifactory_archive_url(host, prefix, owner, repo, ref, scheme=scheme) saw_auth = False for url in urls: try: @@ -132,12 +133,15 @@ def _proxy_probe_headers( def _proxy_routing_target( dep_ref: DependencyReference, -) -> tuple[str, str, bool] | None: - """Return ``(host, prefix, is_mode_1)`` if *dep_ref* routes through the proxy. +) -> tuple[str, str, str, bool] | None: + """Return ``(host, prefix, scheme, is_mode_1)`` if *dep_ref* routes through the proxy. - * Mode 1 (explicit FQDN): host and prefix come from the dependency ref. - * Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``): host and prefix - come from the registry-proxy config. + * Mode 1 (explicit FQDN): host and prefix come from the dependency ref; + scheme is always ``https`` (Mode 1 deps reject ``http://`` upstream). + * Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``): host, prefix, + and scheme come from the registry-proxy config, so installs that + intentionally route through an ``http://`` proxy (isolated networks + using ``PROXY_REGISTRY_ALLOW_HTTP=1``) probe over the same transport. Returns ``None`` when no proxy routing applies (regular GitHub/ADO deps) or when host/prefix are not real strings (e.g. mocked dependency refs @@ -147,7 +151,7 @@ def _proxy_routing_target( host = dep_ref.host prefix = dep_ref.artifactory_prefix if isinstance(host, str) and isinstance(prefix, str) and host and prefix: - return (host, prefix, True) + return (host, prefix, "https", True) return None from ..deps.artifactory_orchestrator import ArtifactoryRouter @@ -156,9 +160,9 @@ def _proxy_routing_target( cfg = ArtifactoryRouter.parse_proxy_config() if cfg is None: return None - host, prefix, _scheme = cfg + host, prefix, scheme = cfg if isinstance(host, str) and isinstance(prefix, str) and host and prefix: - return (host, prefix, False) + return (host, prefix, scheme or "https", False) return None @@ -186,7 +190,7 @@ def _resolve_artifactory_boundary( if target is None: # Not routed through the proxy -- nothing for this resolver to do. return dep_ref - host, prefix, is_mode_1 = target + host, prefix, scheme, is_mode_1 = target # Strip any inlined ``user:pass@host`` credentials before echoing the # package string back in error messages. Deferred until after the @@ -225,7 +229,7 @@ def _resolve_artifactory_boundary( f"/{cand_repo}#{ref}{path_suffix}" ) status = _candidate_archive_status( - host, cand_prefix, cand_owner, cand_repo, ref, headers, verify + host, cand_prefix, cand_owner, cand_repo, ref, headers, verify, scheme=scheme ) if status == _CandidateStatus.EXISTS: # If the probe confirms the parse-time boundary, return the diff --git a/src/apm_cli/install/package_resolution.py b/src/apm_cli/install/package_resolution.py index 260c2bd21..84b2119ee 100644 --- a/src/apm_cli/install/package_resolution.py +++ b/src/apm_cli/install/package_resolution.py @@ -45,8 +45,17 @@ def resolve_parsed_dependency_reference( """Parse or probe *package* into a ``DependencyReference``. Returns ``(dep_ref, direct_virtual_resolved)`` where the second flag is - True when a probe (GitLab shorthand or Artifactory boundary) rebuilt the - dep ref, so the install pipeline persists it as a structured entry. + True when the dep should be persisted as a structured ``git:`` + ``path:`` + entry in ``apm.yml`` (the canonical shorthand cannot round-trip the probed + boundary). The two probe paths gate this flag differently: + + * **GitLab shorthand** -- True only when the resolved ref is a virtual + package (``is_virtual and virtual_path``); a probe that lands on a bare + repo with no virtual path stays in canonical shorthand form. + * **Artifactory boundary** -- True whenever the probe rebuilt the ref + (parse-time guess differed from the proxy-verified split); a probe that + merely confirms the parse-time boundary keeps the original ref so + apm.yml stays in its existing shape. For Artifactory deps the optional ``resolve_artifactory_boundary`` is authoritative: it returns the proxy-verified boundary or raises -- there diff --git a/src/apm_cli/utils/github_host.py b/src/apm_cli/utils/github_host.py index 5e515f040..c41fed8c2 100644 --- a/src/apm_cli/utils/github_host.py +++ b/src/apm_cli/utils/github_host.py @@ -695,13 +695,17 @@ def parse_artifactory_path(path_segments: list) -> tuple: empty_idx = after_owner.index("") repo_parts = after_owner[:empty_idx] suffix_parts = [s for s in after_owner[empty_idx + 1 :] if s] - if repo_parts: - return ( - prefix, - owner, - "/".join(repo_parts), - "/".join(suffix_parts) if suffix_parts else None, - ) + if not repo_parts: + # ``owner//virtual`` has no segments before the explicit boundary, + # so there is no repo to install -- reject as invalid rather than + # falling through and returning ``repo=''``. + return None + return ( + prefix, + owner, + "/".join(repo_parts), + "/".join(suffix_parts) if suffix_parts else None, + ) repo = after_owner[0] virtual_path = "/".join(after_owner[1:]) if len(after_owner) > 1 else None diff --git a/tests/unit/test_artifactory_support.py b/tests/unit/test_artifactory_support.py index 8fad78874..7a548e111 100644 --- a/tests/unit/test_artifactory_support.py +++ b/tests/unit/test_artifactory_support.py @@ -145,6 +145,30 @@ def test_double_slash_subdirectory_notation(self): assert result[2] == "repo" assert result[3] == "subdir" + def test_empty_segment_before_boundary_returns_none(self): + """``owner//virtual`` has no repo before the explicit boundary -> invalid. + + Regression guard: previously fell through to ``repo=''``, producing a + malformed dep ref that broke downstream URL construction. + """ + assert ( + parse_artifactory_path( + ["artifactory", "github", "owner", "", "subdir"] + ) + is None + ) + + def test_iterator_yields_nothing_for_empty_segment_before_boundary(self): + """Same edge case at the iterator layer -- no candidate is yielded.""" + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "github", "owner", "", "subdir"] + ) + ) + assert candidates == [] + class TestIterArtifactoryBoundaryCandidates: """Test the candidate iterator that backs the install-time probe.""" From 4ae18cab25f01a27227209dec4a0241ca3bc0f93 Mon Sep 17 00:00:00 2001 From: Copilot <223556219+Copilot@users.noreply.github.com> Date: Wed, 27 May 2026 18:18:55 +0200 Subject: [PATCH 3/6] test(artifactory): add regression traps for boundary resolver Adds focused regression-trap tests for the new install-pipeline boundary resolver (#1472) covering the contracts a future refactor could silently break: - allow_redirects=False on every HEAD probe (token-leak guard against proxy-issued cross-host redirects) - Mode 2 Authorization header is sourced from RegistryConfig (proxy bearer), never from AuthResolver (which returns the github.com PAT and would leak it to the proxy host) - 403 (like 401) classifies as AUTH; mixed 401/non-auth-4xx demotes the candidate set to MISSING so the user-facing error stays accurate (avoids misreporting a missing repo as auth failure) - 429 and other non-{401,403} 4xx are MISSING, not AUTH (guards against accidental broadening of the AUTH set) - _split_owner_repo subgroup folding for 3+ segment GitLab boundaries (group/sub1/sub2/repo correctly folds to owner=group, repo=sub1/sub2/repo) Both mutation-break gates were exercised: flipping allow_redirects to True and rewiring Mode 2 to AuthResolver each made the corresponding test fail; restoring the source code made them pass. Also applies ruff format pass to bring three files in line with the canonical formatter (pre-existing drift on the PR branch that would have failed CI lint check). Co-authored-by: chkp-roniz Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- src/apm_cli/install/artifactory_resolver.py | 12 +- .../deps/test_artifactory_orchestrator.py | 43 ++++ .../unit/install/test_artifactory_resolver.py | 208 ++++++++++++++++++ tests/unit/test_artifactory_support.py | 81 +++---- 4 files changed, 288 insertions(+), 56 deletions(-) create mode 100644 tests/unit/install/test_artifactory_resolver.py diff --git a/src/apm_cli/install/artifactory_resolver.py b/src/apm_cli/install/artifactory_resolver.py index 85620bd82..ed7c60f73 100644 --- a/src/apm_cli/install/artifactory_resolver.py +++ b/src/apm_cli/install/artifactory_resolver.py @@ -61,7 +61,9 @@ class _CandidateStatus: """Probe outcome for one candidate boundary.""" - EXISTS = "exists" # at least one URL shape returned 2xx or 3xx (existence proof, redirect inclusive) + EXISTS = ( + "exists" # at least one URL shape returned 2xx or 3xx (existence proof, redirect inclusive) + ) MISSING = "missing" # every URL shape returned 4xx other than 401/403 AUTH = "auth" # only 401/403 seen -- cannot tell if the repo exists @@ -208,9 +210,7 @@ def _resolve_artifactory_boundary( candidates = list(iter_artifactory_boundary_candidates(path_segments)) if not candidates: - raise ValueError( - f"Artifactory dep '{safe_package}' has no plausible owner/repo split" - ) + raise ValueError(f"Artifactory dep '{safe_package}' has no plausible owner/repo split") if len(candidates) == 1: # Single candidate -- the parse-time dep_ref is already definitive. return dep_ref @@ -240,7 +240,9 @@ def _resolve_artifactory_boundary( and (dep_ref.virtual_path or None) == cand_virtual ): return dep_ref - return _rebuild_dep_ref(dep_ref, host, cand_prefix, cand_owner, cand_repo, cand_virtual, is_mode_1) + return _rebuild_dep_ref( + dep_ref, host, cand_prefix, cand_owner, cand_repo, cand_virtual, is_mode_1 + ) if status == _CandidateStatus.MISSING: all_auth = False diff --git a/tests/unit/deps/test_artifactory_orchestrator.py b/tests/unit/deps/test_artifactory_orchestrator.py index 99e7954b5..a07a93328 100644 --- a/tests/unit/deps/test_artifactory_orchestrator.py +++ b/tests/unit/deps/test_artifactory_orchestrator.py @@ -246,3 +246,46 @@ def test_archive_runtime_error_propagates(self, tmp_path): ) with pytest.raises(RuntimeError, match=r"404"): orch.download_package(dep, tmp_path / "pkg") + + +# --------------------------------------------------------------------------- +# ArtifactoryOrchestrator._split_owner_repo subgroup folding +# --------------------------------------------------------------------------- + + +class TestSplitOwnerRepoSubgroupFolding: + """``_split_owner_repo`` must fold every segment past the owner into ``repo``. + + Regression trap for nested GitLab subgroup paths behind an Artifactory + proxy (#1498): a 4+ segment ``group/sub1/sub2/repo`` boundary -- as + rebuilt by the boundary resolver -- must produce ``owner=group`` and + ``repo=sub1/sub2/repo``. Truncating to two segments here would silently + install the wrong upstream project. + """ + + @pytest.mark.parametrize( + "repo_url,expected_owner,expected_repo", + [ + ("group/repo", "group", "repo"), + ("group/sub/repo", "group", "sub/repo"), + ("group/sub1/sub2/repo", "group", "sub1/sub2/repo"), + ("group/a/b/c/d/repo", "group", "a/b/c/d/repo"), + ], + ) + def test_subgroup_folding(self, repo_url, expected_owner, expected_repo): + dep = DependencyReference.from_artifactory_boundary_probe( + host="art.example.com", + prefix="artifactory/apm", + owner=repo_url.split("/", 1)[0], + repo=repo_url.split("/", 1)[1], + virtual_path=None, + reference=None, + ) + owner, repo = ArtifactoryOrchestrator._split_owner_repo(dep) + assert (owner, repo) == (expected_owner, expected_repo) + + def test_single_segment_rejected(self): + """A bare repo with no owner segment is malformed and must raise.""" + bad = types.SimpleNamespace(repo_url="onlyrepo") + with pytest.raises(ValueError, match=r"expected 'owner/repo' format"): + ArtifactoryOrchestrator._split_owner_repo(bad) diff --git a/tests/unit/install/test_artifactory_resolver.py b/tests/unit/install/test_artifactory_resolver.py new file mode 100644 index 000000000..f2f45b61a --- /dev/null +++ b/tests/unit/install/test_artifactory_resolver.py @@ -0,0 +1,208 @@ +"""Regression-trap tests for ``apm_cli.install.artifactory_resolver``. + +This file extends the broader Artifactory-boundary coverage in +``tests/unit/test_artifactory_support.py`` with focused regression traps for +the security and audience-separation contracts that a refactor could +silently break: + +* ``allow_redirects=False`` on every HEAD probe (token-leak guard against + proxy-issued cross-host redirects). +* Mode 2 ``Authorization`` header is sourced from + :class:`apm_cli.deps.registry_proxy.RegistryConfig` (the proxy bearer), + NOT from :class:`apm_cli.core.auth.AuthResolver` (which would hand back + the github.com PAT and leak it to the proxy host). +* 403 -- like 401 -- is classified as ``AUTH``; mixed 401/404 demotes the + candidate set to ``MISSING`` so the user-facing error stays accurate. + +These are paired with mutation-break gates in the originating shepherd +session: each test was confirmed to FAIL when the contract under test was +intentionally removed in the source. +""" + +from __future__ import annotations + +import os +from unittest.mock import MagicMock, Mock, patch + +import pytest + +from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary +from apm_cli.models.apm_package import DependencyReference + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def _make_head_recorder(status_code: int = 404): + """Build a fake ``requests.head`` that records every call. + + Returns ``(recorder_list, fake_head)``. Each entry in ``recorder_list`` + is the keyword-argument dict the resolver passed to ``requests.head``. + """ + recorded: list[dict] = [] + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + recorded.append( + { + "url": url, + "headers": dict(headers or {}), + "timeout": timeout, + "verify": verify, + "allow_redirects": allow_redirects, + } + ) + resp = MagicMock() + resp.status_code = status_code + return resp + + return recorded, _head + + +# --------------------------------------------------------------------------- +# allow_redirects=False mutation-break guard +# --------------------------------------------------------------------------- + + +class TestRedirectLeakGuard: + """``allow_redirects=False`` must hold for every probe. + + Flipping it to ``True`` would let the proxy redirect the resolver to a + different host while the Bearer token rides along -- a cross-host token + leak. This is the single highest-severity invariant in the module. + """ + + def test_every_probe_disables_redirects(self): + """Probe every candidate of an ambiguous Mode 1 dep; assert each call + was made with ``allow_redirects=False``.""" + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + recorded, fake_head = _make_head_recorder(status_code=404) + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=fake_head): + with pytest.raises(ValueError): + _resolve_artifactory_boundary(package, auth, verbose=False) + + # Sanity: the resolver actually probed something (otherwise the + # assertion below would pass vacuously and the mutation-break gate + # would be useless). + assert recorded, "expected at least one HEAD probe" + for call in recorded: + assert call["allow_redirects"] is False, ( + "allow_redirects must be False on every probe to prevent " + "cross-host Bearer-token leakage via proxy-issued redirects" + ) + + +# --------------------------------------------------------------------------- +# Mode 2 token-audience guard +# --------------------------------------------------------------------------- + + +class TestModeTwoTokenAudience: + """Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``) must source the + probe ``Authorization`` header from :class:`RegistryConfig.from_env`, + NOT from :class:`AuthResolver` (which would return the github.com PAT + and leak it to the proxy host). + """ + + def test_authorization_header_is_proxy_bearer_not_github_pat(self): + """Both a github.com PAT (via AuthResolver) and a proxy bearer (via + ``PROXY_REGISTRY_TOKEN``) are set. The HEAD probe must carry the + proxy bearer, never the PAT. + """ + package = "group/sub/repo" + with patch.dict( + os.environ, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + "PROXY_REGISTRY_TOKEN": "proxy-bearer-PROXY", + }, + clear=True, + ): + dep = DependencyReference.parse(package) + + # Wire AuthResolver to hand back a github.com PAT. If the + # resolver wrongly consults it, this token will end up in the + # outgoing Authorization header. + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token="github-pat-LEAK") + + recorded, fake_head = _make_head_recorder(status_code=404) + with patch( + "apm_cli.install.artifactory_resolver.requests.head", + side_effect=fake_head, + ): + with pytest.raises(ValueError): + _resolve_artifactory_boundary(package, auth, verbose=False, dep_ref=dep) + + assert recorded, "expected at least one HEAD probe" + for call in recorded: + auth_header = call["headers"].get("Authorization", "") + assert "github-pat-LEAK" not in auth_header, ( + "Mode 2 must not carry the github.com PAT to the proxy host" + ) + assert auth_header == "Bearer proxy-bearer-PROXY", ( + f"Mode 2 Authorization must come from RegistryConfig (got {auth_header!r})" + ) + # And the github PAT path must never have been consulted. + auth.resolve_for_dep.assert_not_called() + + +# --------------------------------------------------------------------------- +# 401 vs 403 vs other-4xx error discrimination +# --------------------------------------------------------------------------- + + +class TestErrorDiscrimination: + """401/403 -> AUTH; any non-auth 4xx demotes the candidate set to MISSING.""" + + def _run_with_status_map(self, status_for_url): + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token="t") + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + resp.status_code = status_for_url(url) + return resp + + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_head): + with pytest.raises(ValueError) as excinfo: + _resolve_artifactory_boundary(package, auth, verbose=False) + return str(excinfo.value) + + def test_all_403_raises_auth_specific_error(self): + """403 must classify as AUTH, same as 401.""" + msg = self._run_with_status_map(lambda _url: 403) + assert "authentication problem" in msg + + def test_mixed_401_and_404_demotes_to_unresolved(self): + """If any candidate returns a non-auth 4xx, the set is MISSING -- the + error must be the unresolved-boundary one, not the auth-specific one. + Otherwise a real 404 on one of several candidates would be misreported + as an auth failure and the user would chase the wrong fix. + """ + # First call (shallow candidate) returns 401, all others 404. + state = {"calls": 0} + + def status_for(url: str) -> int: + state["calls"] += 1 + return 401 if state["calls"] == 1 else 404 + + msg = self._run_with_status_map(status_for) + assert "did not resolve" in msg + assert "authentication problem" not in msg + + def test_429_is_not_auth(self): + """A non-auth status code (e.g. 429 Too Many Requests) returned on + the HEAD probe must not be classified as AUTH; the response code + only means ``MISSING`` because the resolver could not get an + existence proof. Guards against accidental broadening of the + AUTH set beyond {401, 403}. + """ + msg = self._run_with_status_map(lambda _url: 429) + assert "did not resolve" in msg + assert "authentication problem" not in msg diff --git a/tests/unit/test_artifactory_support.py b/tests/unit/test_artifactory_support.py index 7a548e111..c96c7c024 100644 --- a/tests/unit/test_artifactory_support.py +++ b/tests/unit/test_artifactory_support.py @@ -137,9 +137,7 @@ def test_nested_subgroup_with_virtual_file_extension(self): def test_double_slash_subdirectory_notation(self): """The ``//subdir`` notation explicitly marks the repo/virtual boundary.""" # Mirrors ``art.example.com/artifactory/github/owner/repo//subdir``. - result = parse_artifactory_path( - ["artifactory", "github", "owner", "repo", "", "subdir"] - ) + result = parse_artifactory_path(["artifactory", "github", "owner", "repo", "", "subdir"]) assert result[0] == "artifactory/github" assert result[1] == "owner" assert result[2] == "repo" @@ -151,21 +149,14 @@ def test_empty_segment_before_boundary_returns_none(self): Regression guard: previously fell through to ``repo=''``, producing a malformed dep ref that broke downstream URL construction. """ - assert ( - parse_artifactory_path( - ["artifactory", "github", "owner", "", "subdir"] - ) - is None - ) + assert parse_artifactory_path(["artifactory", "github", "owner", "", "subdir"]) is None def test_iterator_yields_nothing_for_empty_segment_before_boundary(self): """Same edge case at the iterator layer -- no candidate is yielded.""" from apm_cli.utils.github_host import iter_artifactory_boundary_candidates candidates = list( - iter_artifactory_boundary_candidates( - ["artifactory", "github", "owner", "", "subdir"] - ) + iter_artifactory_boundary_candidates(["artifactory", "github", "owner", "", "subdir"]) ) assert candidates == [] @@ -177,9 +168,7 @@ def test_flat_path_yields_single_candidate(self): from apm_cli.utils.github_host import iter_artifactory_boundary_candidates candidates = list( - iter_artifactory_boundary_candidates( - ["artifactory", "github", "owner", "repo"] - ) + iter_artifactory_boundary_candidates(["artifactory", "github", "owner", "repo"]) ) assert candidates == [("artifactory/github", "owner", "repo", None)] @@ -187,9 +176,7 @@ def test_nested_path_yields_shallow_to_deep(self): from apm_cli.utils.github_host import iter_artifactory_boundary_candidates candidates = list( - iter_artifactory_boundary_candidates( - ["artifactory", "apm", "group", "sub", "repo"] - ) + iter_artifactory_boundary_candidates(["artifactory", "apm", "group", "sub", "repo"]) ) # Shallowest (k=1) yielded first; deepest (all-as-repo) last. assert candidates == [ @@ -478,7 +465,10 @@ def test_three_segment_shorthand_kept_whole(self): """``group/subgroup/repo`` is a nested-group project, not a subdir.""" with patch.dict( os.environ, - {"PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", "PROXY_REGISTRY_ONLY": "1"}, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, clear=False, ): dep = DependencyReference.parse("acme-group/shared-modules/pkg-utils") @@ -499,7 +489,10 @@ def test_three_segment_with_marker_segment_defers_to_resolver(self): """ with patch.dict( os.environ, - {"PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", "PROXY_REGISTRY_ONLY": "1"}, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, clear=False, ): dep = DependencyReference.parse("group/sub/repo/skills/cve-patcher") @@ -518,7 +511,10 @@ def test_three_segment_with_virtual_file_ext_still_virtual(self): """ with patch.dict( os.environ, - {"PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", "PROXY_REGISTRY_ONLY": "1"}, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, clear=False, ): dep = DependencyReference.parse("group/sub/repo/rules.prompt.md") @@ -530,7 +526,10 @@ def test_two_segment_shorthand_unchanged(self): """Two-segment shorthand keeps the GitHub ``owner/repo`` shape.""" with patch.dict( os.environ, - {"PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", "PROXY_REGISTRY_ONLY": "1"}, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, clear=False, ): dep = DependencyReference.parse("microsoft/apm-sample-package") @@ -611,11 +610,7 @@ def test_resolver_picks_shallower_boundary_when_subpath_is_virtual(self): def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): resp = MagicMock() # Only the shallowest archive URL hits 200; deeper paths 404. - resp.status_code = ( - 200 - if ok_url in url and "/group/repo/skills" not in url - else 404 - ) + resp.status_code = 200 if ok_url in url and "/group/repo/skills" not in url else 404 return resp with patch( @@ -661,9 +656,7 @@ def _all_404(url, headers=None, timeout=None, verify=None, allow_redirects=None) resp.status_code = 404 return resp - with patch( - "apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_404 - ): + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_404): with pytest.raises(ValueError, match="did not resolve"): _resolve_artifactory_boundary(package, auth, verbose=False) @@ -684,9 +677,7 @@ def _all_401(url, headers=None, timeout=None, verify=None, allow_redirects=None) resp.status_code = 401 return resp - with patch( - "apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_401 - ): + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_401): with pytest.raises(ValueError, match="authentication problem"): _resolve_artifactory_boundary(package, auth, verbose=False) @@ -699,12 +690,8 @@ def test_resolver_returns_unchanged_for_unambiguous_path(self): auth = Mock() auth.resolve_for_dep.return_value = Mock(token=None) - with patch( - "apm_cli.install.artifactory_resolver.requests.head" - ) as mock_head: - resolved = _resolve_artifactory_boundary( - package, auth, verbose=False, dep_ref=dep_ref - ) + with patch("apm_cli.install.artifactory_resolver.requests.head") as mock_head: + resolved = _resolve_artifactory_boundary(package, auth, verbose=False, dep_ref=dep_ref) assert resolved is dep_ref mock_head.assert_not_called() @@ -736,18 +723,12 @@ def test_resolver_handles_mode_2_bare_shorthand(self): def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): resp = MagicMock() resp.status_code = ( - 200 - if ok_marker in url and "/group/sub/repo/skills" not in url - else 404 + 200 if ok_marker in url and "/group/sub/repo/skills" not in url else 404 ) return resp - with patch( - "apm_cli.install.artifactory_resolver.requests.head", side_effect=_head - ): - resolved = _resolve_artifactory_boundary( - package, auth, verbose=False, dep_ref=dep - ) + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_head): + resolved = _resolve_artifactory_boundary(package, auth, verbose=False, dep_ref=dep) assert resolved.repo_url == "group/sub/repo" assert resolved.virtual_path == "skills/foo" @@ -768,9 +749,7 @@ def test_split_owner_repo_two_segments(self): """Owner/repo split is unchanged for flat 2-segment paths.""" from apm_cli.deps.artifactory_orchestrator import ArtifactoryOrchestrator - dep = DependencyReference.parse( - "art.example.com/artifactory/github/owner/repo" - ) + dep = DependencyReference.parse("art.example.com/artifactory/github/owner/repo") owner, repo = ArtifactoryOrchestrator._split_owner_repo(dep) assert owner == "owner" assert repo == "repo" From 8e50d415ebd87402841d59093dd2442fff677c3c Mon Sep 17 00:00:00 2001 From: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Date: Mon, 22 Jun 2026 21:43:36 +0300 Subject: [PATCH 4/6] fix(marketplace): propagate marketplace --ref to string sources; fix GitLab slash-ref proxy filenames Two related fixes for marketplace ref handling: 1. `apm install plugin@marketplace` now fetches from the registered --ref branch instead of silently falling back to the repo default branch. `resolve_marketplace_plugin` was not propagating `source.ref` downstream. Covers GitHub-family hosts (ref appended to canonical) and GitLab hosts (ref injected into DependencyReference). Guards skip main/HEAD and respect explicit version_spec overrides. Mirrors microsoft/apm#1824. 2. GitLab archive URLs with slash-containing branch names (e.g. feat/my-feature) now produce Artifactory-proxyable filenames: slash preserved in the path segment, replaced with '-' in the filename. Previously PROXY_REGISTRY_ONLY=1 installs returned HTTP 404 because the filename contained a literal slash. This is the proxy scenario absent from microsoft/apm#1824. 9 regression tests added; 0 pre-existing failures introduced. Co-Authored-By: Claude Sonnet 4.6 --- CHANGELOG.md | 2 + src/apm_cli/marketplace/resolver.py | 57 +++++-- src/apm_cli/utils/github_host.py | 5 +- .../marketplace/test_marketplace_resolver.py | 144 ++++++++++++++++++ tests/unit/test_artifactory_support.py | 41 +++++ 5 files changed, 235 insertions(+), 14 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 84afc46cc..b7b9ec1c7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -28,6 +28,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - URL-form Artifactory deps no longer round-trip with the `artifactory/` prefix folded into `repo_url`. The duplicated prefix caused the downloader to construct double-prefixed archive URLs (`/artifactory/key/artifactory/key/owner/repo/...`) and 404. `_validate_url_repo_path` now strips the Artifactory VCS prefix before returning the bare `owner/repo` slug; the prefix is still recovered separately via `_extract_artifactory_prefix`. (#1472) - `apm install --update` now re-resolves direct git-source semver dependencies. Previously, when the dependency's install path already existed on disk, the BFS resolver short-circuited and `--update` was a silent no-op for git-semver refs; the lockfile kept the previously-resolved tag. - `policy.dependencies.require_pinned_constraint: true` no longer misclassifies the npm- and cargo-style explicit-equality form `=1.2.3` as `BARE_BRANCH`. Both `1.2.3` and `=1.2.3` are now recognized as pinned constraints; the pip-style `==1.2.3` form is still rejected (not part of node-semver). Follow-up to #1494 / #1505. +- `apm install plugin@marketplace` now correctly fetches from the marketplace's registered `--ref` branch instead of silently falling back to the repository's default branch. Root cause: `resolve_marketplace_plugin` did not propagate `source.ref` to downstream resolution calls. Covers both GitHub-family hosts (ref appended to canonical as `#ref`) and GitLab-hosted marketplaces (ref injected into `DependencyReference`). Guards prevent double-injection when a plugin's own dict source already carries an explicit `ref`, and skip `main`/`HEAD` as implicit defaults. (mirrors [microsoft/apm#1824](https://github.com/microsoft/apm/pull/1824)) +- GitLab archive URLs with slash-containing branch names (e.g. `feat/my-feature`) now produce correctly-formed Artifactory-proxyable filenames. The slash is preserved in the path segment (as the GitLab archive API expects) but replaced with `-` in the archive filename, matching GitLab's own naming convention. Previously, `PROXY_REGISTRY_ONLY=1` installs from such branches returned HTTP 404 from the proxy because the generated filename contained a literal slash. This is the proxy scenario not covered by [microsoft/apm#1824](https://github.com/microsoft/apm/pull/1824). ## [0.15.0] - 2026-05-27 diff --git a/src/apm_cli/marketplace/resolver.py b/src/apm_cli/marketplace/resolver.py index 0dc250998..1df3739db 100644 --- a/src/apm_cli/marketplace/resolver.py +++ b/src/apm_cli/marketplace/resolver.py @@ -792,8 +792,15 @@ def _emit_warning(msg: str) -> None: plugin, plugin_root=manifest.plugin_root ) if in_repo_path: + # Fall back to the marketplace's registered ref when neither a caller-supplied + # version_spec nor the plugin's own path_ref is available (string sources always + # return path_ref=None from _extract_in_repo_path_and_ref). Exclude "main" / + # "HEAD" to match the GitHub-family propagation guard and avoid redundant refs. + _gitlab_ref = version_spec or path_ref or ( + source.ref if source.ref not in ("main", "HEAD") else None + ) dep_ref = _gitlab_in_marketplace_dependency_reference( - source, in_repo_path, version_spec or path_ref + source, in_repo_path, _gitlab_ref ) canonical = dep_ref.to_canonical() @@ -830,18 +837,42 @@ def _emit_warning(msg: str) -> None: plugin, source, canonical, dep_ref ) - # ---- Raw ref override ---- - # When version_spec is provided it is treated as a raw git ref that - # overrides whatever ref came from the marketplace source field. - if version_spec and dep_ref is None: - base = canonical.split("#", 1)[0] - canonical = f"{base}#{version_spec}" - logger.debug( - "Using raw git ref '%s' for %s@%s", - version_spec, - plugin_name, - marketplace_name, - ) + # ---- Ref override / propagation ---- + # Priority: + # 1. Explicit version_spec from the caller (``apm install plugin@mkt#v2``) — always + # wins; strips any existing embedded ref and replaces it. + # 2. Marketplace registered ``--ref`` propagated to in-marketplace *string* sources. + # ``resolve_plugin_source`` has no knowledge of the marketplace's branch, so the + # canonical for a relative string source (e.g. ``"./plugins/foo"``) carries no + # ``#ref``. Without this injection the downloader silently falls back to the + # repo's default branch instead of the registered branch. + # Guard: only when no ref is already embedded in the canonical (dict sources that + # declare their own ref embed it during ``_resolve_github_source``), only for + # in-marketplace string sources, and only for non-default refs (``main`` / + # ``HEAD`` are implicit — embedding them would be redundant noise). + if dep_ref is None: + if version_spec: + base = canonical.split("#", 1)[0] + canonical = f"{base}#{version_spec}" + logger.debug( + "Using raw git ref '%s' for %s@%s (version_spec override)", + version_spec, + plugin_name, + marketplace_name, + ) + elif ( + "#" not in canonical + and isinstance(plugin.source, str) + and _is_in_marketplace_source(plugin, source) + and source.ref not in ("main", "HEAD") + ): + canonical = f"{canonical}#{source.ref}" + logger.debug( + "Propagated marketplace ref '%s' to string source canonical for %s@%s", + source.ref, + plugin_name, + marketplace_name, + ) # ---- Ref immutability check (advisory) ---- # Record the plugin -> ref mapping (scoped by version) and warn if diff --git a/src/apm_cli/utils/github_host.py b/src/apm_cli/utils/github_host.py index c41fed8c2..f490b92a1 100644 --- a/src/apm_cli/utils/github_host.py +++ b/src/apm_cli/utils/github_host.py @@ -749,7 +749,10 @@ def build_artifactory_archive_url( # GitHub-style: /archive/refs/heads/{ref}.zip f"{base}/archive/refs/heads/{ref}.zip", # GitLab-style: /-/archive/{ref}/{basename}-{ref}.zip - f"{base}/-/archive/{ref}/{repo_basename}-{ref}.zip", + # GitLab keeps the raw ref (including slashes) as a path segment but replaces + # slashes with dashes in the archive *filename*. E.g. branch + # ``feat/my-feature`` → ``.../-/archive/feat/my-feature/repo-feat-my-feature.zip``. + f"{base}/-/archive/{ref}/{repo_basename}-{ref.replace('/', '-')}.zip", # GitHub-style tags fallback f"{base}/archive/refs/tags/{ref}.zip", # codeload.github.com-style: /zip/refs/heads/{ref} diff --git a/tests/unit/marketplace/test_marketplace_resolver.py b/tests/unit/marketplace/test_marketplace_resolver.py index 252fa035f..34c454613 100644 --- a/tests/unit/marketplace/test_marketplace_resolver.py +++ b/tests/unit/marketplace/test_marketplace_resolver.py @@ -1432,3 +1432,147 @@ def test_fqdn_shorthand_without_git_path_misclassifies(self, monkeypatch): assert good.is_virtual is True assert good.repo_url == "epm-ease/ai-apm-registry" assert good.virtual_path == "agents/reverse-architect" + + +class TestMarketplaceRegisteredRefPropagation: + """Registered marketplace ``--ref`` must propagate to plugin canonicals. + + When a marketplace is added with ``apm marketplace add --ref feat/x owner/repo``, + subsequent ``apm install plugin@marketplace`` must download the plugin from + ``feat/x`` — not from the repo's default branch. This applies to both: + + - GitHub-family marketplaces (no ``dependency_reference``; ref embedded in canonical) + - GitLab / non-GitHub marketplaces (ref embedded in ``dependency_reference``) + """ + + @staticmethod + def _github_source(ref: str = "main") -> MarketplaceSource: + return MarketplaceSource( + name="my-marketplace", + owner="acme", + repo="plugins", + host="github.com", + ref=ref, + ) + + @staticmethod + def _gitlab_source(ref: str = "main") -> MarketplaceSource: + return MarketplaceSource( + name="my-marketplace", + owner="acme", + repo="plugins", + host="gitlab.com", + ref=ref, + ) + + @staticmethod + def _manifest(plugin: MarketplacePlugin) -> MarketplaceManifest: + return MarketplaceManifest(name="my-marketplace", plugins=(plugin,), plugin_root="") + + # ── GitHub-family: ref embedded in canonical ────────────────────────────── + + @patch("apm_cli.marketplace.resolver.fetch_or_cache") + @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") + def test_github_string_source_registered_ref_appended(self, mock_get, mock_fetch): + """Non-main marketplace ref must appear as ``#ref`` suffix in the canonical.""" + plugin = MarketplacePlugin(name="my-plugin", source="./plugins/my-plugin") + mock_get.return_value = self._github_source(ref="feat/my-feature") + mock_fetch.return_value = self._manifest(plugin) + + result = resolve_marketplace_plugin("my-plugin", "my-marketplace") + + assert result.canonical == "acme/plugins/plugins/my-plugin#feat/my-feature" + assert result.dependency_reference is None + + @patch("apm_cli.marketplace.resolver.fetch_or_cache") + @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") + def test_github_string_source_main_ref_not_appended(self, mock_get, mock_fetch): + """Default ``main`` ref must NOT be embedded (implicit; avoids cluttering canonicals).""" + plugin = MarketplacePlugin(name="my-plugin", source="./plugins/my-plugin") + mock_get.return_value = self._github_source(ref="main") + mock_fetch.return_value = self._manifest(plugin) + + result = resolve_marketplace_plugin("my-plugin", "my-marketplace") + + assert "#" not in result.canonical + + @patch("apm_cli.marketplace.resolver.fetch_or_cache") + @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") + def test_github_string_source_head_ref_not_appended(self, mock_get, mock_fetch): + """``HEAD`` ref, like ``main``, must not be embedded.""" + plugin = MarketplacePlugin(name="my-plugin", source="./plugins/my-plugin") + mock_get.return_value = self._github_source(ref="HEAD") + mock_fetch.return_value = self._manifest(plugin) + + result = resolve_marketplace_plugin("my-plugin", "my-marketplace") + + assert "#" not in result.canonical + + @patch("apm_cli.marketplace.resolver.fetch_or_cache") + @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") + def test_github_string_source_version_spec_overrides_registered_ref( + self, mock_get, mock_fetch + ): + """Explicit ``version_spec`` (``apm install plugin@mkt#v2``) must win over source.ref.""" + plugin = MarketplacePlugin(name="my-plugin", source="./plugins/my-plugin") + mock_get.return_value = self._github_source(ref="feat/my-feature") + mock_fetch.return_value = self._manifest(plugin) + + result = resolve_marketplace_plugin( + "my-plugin", "my-marketplace", version_spec="v2.0.0" + ) + + assert result.canonical.endswith("#v2.0.0") + assert "feat/my-feature" not in result.canonical + + @patch("apm_cli.marketplace.resolver.fetch_or_cache") + @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") + def test_github_dict_source_registered_ref_not_propagated(self, mock_get, mock_fetch): + """Dict sources with an explicit ``ref`` must not be overwritten by source.ref.""" + plugin = MarketplacePlugin( + name="dict-plugin", + source={"type": "github", "repo": "acme/plugins", "path": "p", "ref": "v1.0.0"}, + ) + mock_get.return_value = self._github_source(ref="feat/my-feature") + mock_fetch.return_value = self._manifest(plugin) + + result = resolve_marketplace_plugin("dict-plugin", "my-marketplace") + + assert result.canonical.endswith("#v1.0.0"), ( + "Explicit plugin ref must survive; marketplace ref must not overwrite it" + ) + + # ── GitLab: ref embedded in dependency_reference ───────────────────────── + + @patch("apm_cli.marketplace.resolver.fetch_or_cache") + @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") + def test_gitlab_string_source_registered_ref_propagated(self, mock_get, mock_fetch): + """GitLab monorepo: marketplace ref must reach dep_ref when path_ref is None.""" + plugin = MarketplacePlugin(name="my-plugin", source="registry/my-plugin") + mock_get.return_value = self._gitlab_source(ref="feat/my-feature") + mock_fetch.return_value = self._manifest(plugin) + + result = resolve_marketplace_plugin("my-plugin", "my-marketplace") + + assert result.dependency_reference is not None + dep = result.dependency_reference + assert dep.reference == "feat/my-feature", ( + "Registered marketplace ref must be threaded into the DependencyReference" + ) + assert dep.virtual_path == "registry/my-plugin" + + @patch("apm_cli.marketplace.resolver.fetch_or_cache") + @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") + def test_gitlab_string_source_main_ref_not_propagated(self, mock_get, mock_fetch): + """Default ``main`` ref on a GitLab marketplace must not produce a dep_ref ref field.""" + plugin = MarketplacePlugin(name="my-plugin", source="registry/my-plugin") + mock_get.return_value = self._gitlab_source(ref="main") + mock_fetch.return_value = self._manifest(plugin) + + result = resolve_marketplace_plugin("my-plugin", "my-marketplace") + + dep = result.dependency_reference + assert dep is not None + assert dep.reference is None, ( + "Default 'main' ref must not be embedded in DependencyReference" + ) diff --git a/tests/unit/test_artifactory_support.py b/tests/unit/test_artifactory_support.py index c96c7c024..9b283a31a 100644 --- a/tests/unit/test_artifactory_support.py +++ b/tests/unit/test_artifactory_support.py @@ -335,6 +335,47 @@ def test_gitlab_nested_subgroup_archive_uses_repo_basename(self): "GitLab archive filename must use the repo basename, not the full slug" ) + def test_gitlab_slash_ref_filename_normalised(self): + """GitLab archive filename must replace slashes with dashes for slash-containing refs. + + A branch like ``feat/my-slash-branch`` must produce the URL + ``/-/archive/feat/my-slash-branch/repo-feat-my-slash-branch.zip`` + — the slash stays in the *path segment* (how GitLab addresses the archive) but is + replaced with a dash in the *filename* (how GitLab names the zip). + + Without this fix the filename would contain a literal slash + (``repo-feat/my-slash-branch.zip``) which is not a valid filename, causing + Artifactory to return 404 when ``PROXY_REGISTRY_ONLY=1`` suppresses the direct-API + fallback. + """ + ref = "feat/my-slash-branch" + urls = build_artifactory_archive_url( + "art.example.com", "artifactory/gitlab", "owner", "repo", ref=ref + ) + # The GitLab path segment must retain the slash (so Artifactory can proxy it). + assert any(f"/-/archive/{ref}/" in u for u in urls), ( + "GitLab path segment must preserve the raw ref including the slash" + ) + # The archive filename must have the slash replaced with a dash. + assert any(u.endswith("/-/archive/feat/my-slash-branch/repo-feat-my-slash-branch.zip") for u in urls), ( + "GitLab archive filename must replace '/' with '-' in the ref portion" + ) + # The GitHub-style candidates must be unchanged (slashes are valid there). + assert any("/archive/refs/heads/feat/my-slash-branch.zip" in u for u in urls), ( + "GitHub-style /archive/refs/heads/{ref}.zip must be unchanged for slash refs" + ) + + def test_non_slash_ref_gitlab_filename_unchanged(self): + """Single-component refs (no slash) must produce the same GitLab filename as before.""" + for ref in ("main", "v1.2.3", "my-feature"): + urls = build_artifactory_archive_url( + "art.example.com", "artifactory/github", "owner", "repo", ref=ref + ) + safe_ref = ref.replace("/", "-") + assert any(f"/-/archive/{ref}/repo-{safe_ref}.zip" in u for u in urls), ( + f"Non-slash ref '{ref}' must produce GitLab filename repo-{safe_ref}.zip" + ) + # ── apm_package.py: DependencyReference Artifactory parsing ── From ec3cdbc103b5a90c144c5893ffcd40f70e3da101 Mon Sep 17 00:00:00 2001 From: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Date: Mon, 22 Jun 2026 22:00:37 +0300 Subject: [PATCH 5/6] style: replace non-ASCII em-dashes, arrows, and box-drawing chars with ASCII Addresses Copilot review comments on PR #1880. Repo encoding rules require source files to stay within printable ASCII to avoid Windows cp1252 UnicodeEncodeError in some environments. - github_host.py: U+2192 arrow (->) - resolver.py: two U+2014 em-dashes (--) - test_marketplace_resolver.py: U+2014 em-dash and two U+2500 separators - test_artifactory_support.py: U+2014 em-dash and three U+2500 separators Co-Authored-By: Claude Sonnet 4.6 --- src/apm_cli/marketplace/resolver.py | 4 ++-- src/apm_cli/utils/github_host.py | 2 +- tests/unit/marketplace/test_marketplace_resolver.py | 6 +++--- tests/unit/test_artifactory_support.py | 8 ++++---- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/apm_cli/marketplace/resolver.py b/src/apm_cli/marketplace/resolver.py index 1df3739db..58d19b5dd 100644 --- a/src/apm_cli/marketplace/resolver.py +++ b/src/apm_cli/marketplace/resolver.py @@ -839,7 +839,7 @@ def _emit_warning(msg: str) -> None: # ---- Ref override / propagation ---- # Priority: - # 1. Explicit version_spec from the caller (``apm install plugin@mkt#v2``) — always + # 1. Explicit version_spec from the caller (``apm install plugin@mkt#v2``) -- always # wins; strips any existing embedded ref and replaces it. # 2. Marketplace registered ``--ref`` propagated to in-marketplace *string* sources. # ``resolve_plugin_source`` has no knowledge of the marketplace's branch, so the @@ -849,7 +849,7 @@ def _emit_warning(msg: str) -> None: # Guard: only when no ref is already embedded in the canonical (dict sources that # declare their own ref embed it during ``_resolve_github_source``), only for # in-marketplace string sources, and only for non-default refs (``main`` / - # ``HEAD`` are implicit — embedding them would be redundant noise). + # ``HEAD`` are implicit -- embedding them would be redundant noise). if dep_ref is None: if version_spec: base = canonical.split("#", 1)[0] diff --git a/src/apm_cli/utils/github_host.py b/src/apm_cli/utils/github_host.py index f490b92a1..855271790 100644 --- a/src/apm_cli/utils/github_host.py +++ b/src/apm_cli/utils/github_host.py @@ -751,7 +751,7 @@ def build_artifactory_archive_url( # GitLab-style: /-/archive/{ref}/{basename}-{ref}.zip # GitLab keeps the raw ref (including slashes) as a path segment but replaces # slashes with dashes in the archive *filename*. E.g. branch - # ``feat/my-feature`` → ``.../-/archive/feat/my-feature/repo-feat-my-feature.zip``. + # ``feat/my-feature`` -> ``.../-/archive/feat/my-feature/repo-feat-my-feature.zip``. f"{base}/-/archive/{ref}/{repo_basename}-{ref.replace('/', '-')}.zip", # GitHub-style tags fallback f"{base}/archive/refs/tags/{ref}.zip", diff --git a/tests/unit/marketplace/test_marketplace_resolver.py b/tests/unit/marketplace/test_marketplace_resolver.py index 34c454613..34579b9fd 100644 --- a/tests/unit/marketplace/test_marketplace_resolver.py +++ b/tests/unit/marketplace/test_marketplace_resolver.py @@ -1439,7 +1439,7 @@ class TestMarketplaceRegisteredRefPropagation: When a marketplace is added with ``apm marketplace add --ref feat/x owner/repo``, subsequent ``apm install plugin@marketplace`` must download the plugin from - ``feat/x`` — not from the repo's default branch. This applies to both: + ``feat/x`` -- not from the repo's default branch. This applies to both: - GitHub-family marketplaces (no ``dependency_reference``; ref embedded in canonical) - GitLab / non-GitHub marketplaces (ref embedded in ``dependency_reference``) @@ -1469,7 +1469,7 @@ def _gitlab_source(ref: str = "main") -> MarketplaceSource: def _manifest(plugin: MarketplacePlugin) -> MarketplaceManifest: return MarketplaceManifest(name="my-marketplace", plugins=(plugin,), plugin_root="") - # ── GitHub-family: ref embedded in canonical ────────────────────────────── + # -- GitHub-family: ref embedded in canonical --------------------------------- @patch("apm_cli.marketplace.resolver.fetch_or_cache") @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") @@ -1542,7 +1542,7 @@ def test_github_dict_source_registered_ref_not_propagated(self, mock_get, mock_f "Explicit plugin ref must survive; marketplace ref must not overwrite it" ) - # ── GitLab: ref embedded in dependency_reference ───────────────────────── + # -- GitLab: ref embedded in dependency_reference ---------------------------- @patch("apm_cli.marketplace.resolver.fetch_or_cache") @patch("apm_cli.marketplace.resolver.get_marketplace_by_name") diff --git a/tests/unit/test_artifactory_support.py b/tests/unit/test_artifactory_support.py index 9b283a31a..f2ccc4dc6 100644 --- a/tests/unit/test_artifactory_support.py +++ b/tests/unit/test_artifactory_support.py @@ -340,7 +340,7 @@ def test_gitlab_slash_ref_filename_normalised(self): A branch like ``feat/my-slash-branch`` must produce the URL ``/-/archive/feat/my-slash-branch/repo-feat-my-slash-branch.zip`` - — the slash stays in the *path segment* (how GitLab addresses the archive) but is + -- the slash stays in the *path segment* (how GitLab addresses the archive) but is replaced with a dash in the *filename* (how GitLab names the zip). Without this fix the filename would contain a literal slash @@ -600,7 +600,7 @@ def test_three_segment_shorthand_without_proxy_unchanged(self): assert dep.virtual_path == "some-subdir" -# ── Artifactory boundary probe (mirrors native GitLab probing pattern) ── +# -- Artifactory boundary probe (mirrors native GitLab probing pattern) -- class TestArtifactoryBoundaryResolver: @@ -780,7 +780,7 @@ def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): assert resolved.is_artifactory() is False -# ── ArtifactoryOrchestrator: multi-segment repo support ── +# -- ArtifactoryOrchestrator: multi-segment repo support -- class TestArtifactoryOrchestratorNestedRepo: @@ -813,7 +813,7 @@ def test_split_owner_repo_nested_subgroup_after_probe(self): assert repo == "shared-modules/pkg-utils" -# ── token_manager.py: Artifactory token support ── +# -- token_manager.py: Artifactory token support -- class TestArtifactoryTokenManager: From 73d736044475fb52914ee27939bad653216e0aad Mon Sep 17 00:00:00 2001 From: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Date: Mon, 22 Jun 2026 22:08:24 +0300 Subject: [PATCH 6/6] fix(artifactory): demote mixed 401+404 candidate to MISSING not AUTH _candidate_archive_status() was returning AUTH whenever any URL shape returned 401/403, even if other shapes returned non-auth 4xx (e.g. 404). This misreported a boundary-resolution failure as an authentication failure, giving users a misleading error message. Per the module spec: AUTH is returned only when *every* observed response was 401/403 with no non-auth 4xx seen. A mix means at least one URL shape was definitively absent -- the candidate is MISSING and the unresolved- boundary error path is triggered instead. Added saw_non_auth_4xx flag; AUTH gate now requires `saw_auth and not saw_non_auth_4xx`. Addresses Copilot review comment on PR #1880. Co-Authored-By: Claude Sonnet 4.6 --- src/apm_cli/install/artifactory_resolver.py | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/apm_cli/install/artifactory_resolver.py b/src/apm_cli/install/artifactory_resolver.py index af245831b..1648e2dec 100644 --- a/src/apm_cli/install/artifactory_resolver.py +++ b/src/apm_cli/install/artifactory_resolver.py @@ -93,6 +93,7 @@ def _candidate_archive_status( """ urls = build_artifactory_archive_url(host, prefix, owner, repo, ref, scheme=scheme) saw_auth = False + saw_non_auth_4xx = False saw_any_response = False last_exc: BaseException | None = None for url in urls: @@ -111,9 +112,14 @@ def _candidate_archive_status( return _CandidateStatus.EXISTS, None if r.status_code in (401, 403): saw_auth = True + else: + saw_non_auth_4xx = True if not saw_any_response: return _CandidateStatus.INCONCLUSIVE, last_exc - if saw_auth: + # AUTH only when *every* response was 401/403 -- no non-auth 4xx seen. + # A mixed 401+404 means at least one URL shape was definitively absent; + # report MISSING so the user-facing error stays accurate (per module spec). + if saw_auth and not saw_non_auth_4xx: return _CandidateStatus.AUTH, None return _CandidateStatus.MISSING, None