Add mirror command and API for selective package mirroring
Add a `proxy mirror` CLI command and `/api/mirror` API endpoints that
pre-populate the cache from various input sources: individual PURLs,
SBOM files (CycloneDX and SPDX), or full registry enumeration.
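
As an illustration of the PURL input, here is a minimal sketch of splitting a package URL into its parts and telling versioned from unversioned input. It follows the general `pkg:type/namespace/name@version` layout of the PURL format; the `PURL` type and `ParsePURL` function are hypothetical names for this example and are not taken from the change itself. Qualifiers and subpath are dropped for brevity.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// PURL holds the components this sketch cares about. The type and its field
// names are illustrative, not the proxy's own.
type PURL struct {
	Type      string
	Namespace string
	Name      string
	Version   string // empty for an unversioned PURL
}

// ParsePURL splits pkg:type/namespace/name@version?qualifiers#subpath.
// Qualifiers and subpath are discarded to keep the example short.
func ParsePURL(s string) (PURL, error) {
	if !strings.HasPrefix(s, "pkg:") {
		return PURL{}, fmt.Errorf("not a purl: %q", s)
	}
	rest := strings.TrimPrefix(s, "pkg:")
	rest, _, _ = strings.Cut(rest, "#")
	rest, _, _ = strings.Cut(rest, "?")
	rest = strings.Trim(rest, "/")

	typ, path, ok := strings.Cut(rest, "/")
	if !ok || typ == "" || path == "" {
		return PURL{}, fmt.Errorf("purl needs a type and a name: %q", s)
	}
	p := PURL{Type: strings.ToLower(typ)}

	// Treat '@' as the version separator only when it follows the last '/',
	// so an unencoded npm scope such as "@babel/core" is not misread.
	if at := strings.LastIndex(path, "@"); at > strings.LastIndex(path, "/") {
		p.Version, path = path[at+1:], path[:at]
	}
	if slash := strings.LastIndex(path, "/"); slash >= 0 {
		p.Namespace, path = path[:slash], path[slash+1:]
	}
	p.Name = path
	if p.Name == "" {
		return PURL{}, fmt.Errorf("purl has no name: %q", s)
	}

	// Components may be percent-encoded (for example %40 for '@').
	for _, field := range []*string{&p.Namespace, &p.Name, &p.Version} {
		decoded, err := url.PathUnescape(*field)
		if err != nil {
			return PURL{}, fmt.Errorf("bad escape in %q: %w", s, err)
		}
		*field = decoded
	}
	return p, nil
}
```

An empty `Version` marks an unversioned PURL; how the mirror resolves such a PURL to concrete versions is not described here.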
The mirror reuses the existing handler.Proxy.GetOrFetchArtifact()
pipeline so cached artifacts are identical to those fetched on demand.
A bounded worker pool controls download parallelism.
Metadata caching is opt-in via `cache_metadata: true` in config (or
PROXY_CACHE_METADATA=true). The mirror command always enables it. When
enabled, upstream metadata responses are stored for offline fallback
with ETag-based conditional revalidation.
New internal/mirror package with Source interface, PURLSource,
SBOMSource, RegistrySource, and async JobStore. New metadata_cache
database table for offline metadata serving.
On PostgreSQL, `INTEGER PRIMARY KEY` becomes `SERIAL`, `DATETIME` becomes `TIMESTAMP`, `INTEGER DEFAULT 0` booleans become `BOOLEAN DEFAULT FALSE`, and size/count columns use `BIGINT`.
@@ -277,6 +291,12 @@ Version age filtering for supply chain attack mitigation. Configurable at global

 Package metadata enrichment. Fetches license, description, homepage, repository URL, and vulnerability data from upstream registries. Powers the `/api/` endpoints and the web UI's package detail pages.

+### `internal/mirror`
+
+Selective package mirroring for pre-populating the proxy cache. Supports multiple input sources: individual PURLs (versioned or unversioned), CycloneDX/SPDX SBOM files, and full registry enumeration. Uses a bounded worker pool backed by `errgroup` to download artifacts in parallel, reusing `handler.Proxy.GetOrFetchArtifact()` for the actual fetch-and-cache work.
+
+The package also provides a `MetadataCache` for storing raw upstream metadata blobs so the proxy can serve metadata responses offline. The `JobStore` manages async mirror jobs exposed via the `/api/mirror` endpoints.
+
 ### `internal/config`

 Configuration loading.
@@ -326,10 +346,11 @@ Eviction can be implemented as:
 - Ensures clients fetch artifacts through proxy
 - Alternative: Let clients fetch directly, miss cache opportunity

-**Why not cache metadata?**
+**Why not cache metadata (by default)?**
 - Simplicity - no invalidation logic needed
 - Fresh data - new versions visible immediately
 - Metadata is small, upstream fetch is fast
+- Set `cache_metadata: true` or use the mirror command to enable metadata caching for offline use via the `metadata_cache` table

 **Why stream artifacts?**
 - Memory efficient - don't load large files into RAM
docs/configuration.md (34 additions, 0 deletions)
@@ -211,6 +211,40 @@ Resolution order: package override, then ecosystem override, then global default

 Currently supported for npm, PyPI, pub.dev, and Composer. These ecosystems include publish timestamps in their metadata. Other ecosystems (Go, Cargo, RubyGems) would require extra API calls and are not yet supported.

+## Metadata Caching
+
+By default the proxy fetches metadata fresh from upstream on every request. Enable `cache_metadata` to store metadata responses in the database and storage backend for offline fallback. When upstream is unreachable, the proxy serves the last cached copy. ETag-based revalidation avoids re-downloading unchanged metadata.
+
+```yaml
+cache_metadata: true
+```
+
+Or via environment variable: `PROXY_CACHE_METADATA=true`.
+
+The `proxy mirror` command always enables metadata caching regardless of this setting.
+
+## Mirror Command
+
+The `proxy mirror` command pre-populates the cache from various sources. It accepts the same storage and database flags as `serve`.
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--sbom` | | Path to CycloneDX or SPDX SBOM file |
+| `--registry` | | Ecosystem name for full registry mirror |
+| `--concurrency` | `4` | Number of parallel downloads |
+| `--dry-run` | `false` | Show what would be mirrored without downloading |
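
For the `--sbom` input in the table above, a sketch of how package URLs might be pulled out of an SBOM. It assumes the JSON encodings only: CycloneDX lists a `purl` on each entry in `components` (which may nest), and SPDX carries PURLs in `externalRefs` entries whose `referenceType` is `purl`. `purlsFromSBOM` and the struct names are hypothetical and are not the `SBOMSource` implementation from this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdxComponent models the part of a CycloneDX component this sketch reads.
type cdxComponent struct {
	PURL       string         `json:"purl"`
	Components []cdxComponent `json:"components"` // nested components
}

// sbomDoc overlays the CycloneDX and SPDX JSON fields needed to find PURLs.
type sbomDoc struct {
	BOMFormat   string         `json:"bomFormat"`   // "CycloneDX"
	Components  []cdxComponent `json:"components"`  // CycloneDX
	SPDXVersion string         `json:"spdxVersion"` // e.g. "SPDX-2.3"
	Packages    []struct {
		ExternalRefs []struct {
			Type    string `json:"referenceType"`
			Locator string `json:"referenceLocator"`
		} `json:"externalRefs"`
	} `json:"packages"` // SPDX
}

// purlsFromSBOM returns every package URL found in a JSON SBOM document,
// in document order.
func purlsFromSBOM(data []byte) ([]string, error) {
	var doc sbomDoc
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("parse sbom: %w", err)
	}

	var out []string
	switch {
	case doc.BOMFormat == "CycloneDX":
		var walk func([]cdxComponent)
		walk = func(cs []cdxComponent) {
			for _, c := range cs {
				if c.PURL != "" {
					out = append(out, c.PURL)
				}
				walk(c.Components)
			}
		}
		walk(doc.Components)
	case doc.SPDXVersion != "":
		for _, pkg := range doc.Packages {
			for _, ref := range pkg.ExternalRefs {
				if ref.Type == "purl" && ref.Locator != "" {
					out = append(out, ref.Locator)
				}
			}
		}
	default:
		return nil, fmt.Errorf("unrecognised SBOM format")
	}
	return out, nil
}
```

Components without a PURL are skipped, since there is nothing to resolve against a registry.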