GEP-0043: Spegel Registry Support in the registry-cache extension #44
dimitar-kostadinov wants to merge 14 commits into
rfranzke
left a comment
Apart from the inline comments, can you explain the purpose of the registry-cache and registry-mirror extensions when we have registry-spegel? Are they still of any use, and if so, when and how?
| ... | ||
| ``` | ||
| Spegel `bootstrapper` is deployed into the Shoot namespace in the Seed cluster. In the Shoot cluster nodes the `containerd`, `spegel`, and `kubelet` systemd units are started. | ||
| When `spegel` is starting, it gets the bootstrap peers from the `bootstrapper` and registers for image related events in `containerd`. It also lists existing content in the image store and advertises it to the DHT. |
What happens when kubelet already starts to pull an image while Spegel has not yet contacted (or is still in the process of contacting) the bootstrapper?
In this case, Spegel returns 404 and containerd falls back to the upstream registry:
Feb 23 13:25:24 spegel[768]: {"time":"2026-02-23T13:25:24.027277587Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/httpx.(*ServeMux).Handle.func1.1","file":"/build/pkg/httpx/mux.go","line":80},"msg":"","err":"MANIFEST_UNKNOWN could not find peer for registry.k8s.io/pause:3.10","path":"/v2/pause/manifests/3.10","status":404,"method":"HEAD","latency":"1.100958ms","ip":"::1","handler":"mirror","registry":"registry.k8s.io"}
Feb 23 13:25:24 spegel[768]: {"time":"2026-02-23T13:25:24.074244712Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/httpx.(*ServeMux).Handle.func1.1","file":"/build/pkg/httpx/mux.go","line":80},"msg":"","err":"MANIFEST_UNKNOWN could not find peer for registry.k8s.io/pause:3.10","path":"/v2/pause/manifests/3.10","status":404,"method":"HEAD","latency":"706.125µs","ip":"::1","handler":"mirror","registry":"registry.k8s.io"}
Feb 23 13:25:25 spegel[768]: {"time":"2026-02-23T13:25:25.441465754Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/routing.ensureOnline.func2","file":"/build/pkg/routing/p2p.go","line":460},"msg":"failed to run bootstrap","err":"Get \"https://sp-local--local.ingress.local.seed.local.gardener.cloud/bootstrap-nodes\": tls: failed to verify certificate: x509: certificate is valid for ingress.local, not sp-local--local.ingress.local.seed.local.gardener.cloud","attempts":6}
Feb 23 13:25:24 containerd[623]: time="2026-02-23T13:25:24.035447878Z" level=debug msg=resolving host="localhost:5500"
Feb 23 13:25:24 containerd[623]: time="2026-02-23T13:25:24.035467170Z" level=debug msg="do request" host="localhost:5500" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/v2.1.3 request.method=HEAD url="http://localhost:5500/v2/pause/manifests/3.10?ns=registry.k8s.io"
Feb 23 13:25:24 containerd[623]: time="2026-02-23T13:25:24.036964795Z" level=debug msg="fetch response received" host="localhost:5500" response.header.content-length=127 response.header.content-type=application/json response.header.date="Mon, 23 Feb 2026 13:25:24 GMT" response.status="404 Not Found" url="http://localhost:5500/v2/pause/manifests/3.10?ns=registry.k8s.io"
Feb 23 13:25:24 containerd[623]: time="2026-02-23T13:25:24.036991462Z" level=info msg="trying next host after status: 404 Not Found" host="localhost:5500"
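The fallback visible in the logs above can be sketched as follows. This is a simplified illustration in Python, not containerd's actual resolver: the host list, the `resolve` helper and the `fake_fetch` stub are assumptions made for the example, and only the 404 case shown in the logs is modelled.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical host order: the local Spegel mirror first, then the upstream registry.
HOSTS = ["http://localhost:5500", "https://registry.k8s.io"]


def resolve(hosts: Iterable[str], fetch: Callable[[str], int]) -> Optional[Tuple[str, int]]:
    """Return (host, status) for the first host that does not answer 404."""
    for host in hosts:
        status = fetch(host)
        if status == 404:
            # Mirrors the log line "trying next host after status: 404 Not Found".
            continue
        return host, status
    return None


def fake_fetch(host: str) -> int:
    """Stub: the mirror has no peer for the manifest yet, the upstream has it."""
    return 404 if "localhost" in host else 200


result = resolve(HOSTS, fake_fetch)
```

With the stub above, the mirror is skipped and the request ends up at the upstream registry, which matches the behaviour described in the reply.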
| - service | ||
| - ingress | ||
|
|
||
| The `spegel` registries send a GET request to `https://<ingress-host>/bootstrap-nodes` to get the bootstrap peers. Traffic is encrypted using mTLS. |
Did you think about certificate rotation? How would this work?
The secrets manager is used for certificate creation and rotation: https://github.com/dimitar-kostadinov/gardener-extension-registry-cache/blob/6f243bbe7f4f3667b3fff7806343c60f3079f408/pkg/secrets/config.go#L84
An E2E test for rotation will be implemented.
Can you elaborate? When will the certificate be replaced? Automatically? Manually on user demand? What if there is an issue with rolling out the new certificate to the worker nodes?
The certificates are rotated when a Certificate Authorities rotation is triggered (presumably on user demand).
I tested the following scenario in a local setup:
- annotate the Shoot with `gardener.cloud/operation=rotate-ca-start` and ensure the Spegel bootstrap works
- annotate the Shoot with `gardener.cloud/operation=rotate-ca-complete` and ensure the Spegel bootstrap works
There was an issue using the Shoot CA, but I switched to caBundle and now it works without problems.
If for some reason the certificates are not propagated to the nodes, Spegel will not be able to join the p2p cluster and the images will be pulled from the upstream registries.
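The behaviour described here (an mTLS request to `/bootstrap-nodes`, with a fallback to upstream pulls when the certificates are unusable) could look roughly like the sketch below. It is a Python illustration only, not Spegel's implementation: `try_bootstrap`, its parameters and the file paths are hypothetical, and the response body is returned unparsed because its format is not specified in this thread.

```python
import ssl
import urllib.request
from typing import Optional


def bootstrap_url(ingress_host: str) -> str:
    """Build the bootstrap endpoint URL for a given ingress host."""
    return f"https://{ingress_host}/bootstrap-nodes"


def try_bootstrap(ingress_host: str, ca_file: str, cert_file: str,
                  key_file: str, timeout: float = 5.0) -> Optional[bytes]:
    """Fetch the bootstrap peers over mTLS; return None if that is not possible.

    None stands for "could not join the p2p cluster", in which case images
    would simply be pulled from the upstream registries.
    """
    try:
        # Trust only the given CA bundle and present a client certificate.
        ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
        with urllib.request.urlopen(bootstrap_url(ingress_host),
                                    timeout=timeout, context=ctx) as resp:
            return resp.read()
    except OSError:
        # Covers missing files, TLS verification failures, HTTP and network errors.
        return None
```

Treating every failure as "no peers" keeps the node functional during a botched rotation, at the cost of losing the cache until valid certificates arrive.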
Fair enough. I guess it would make sense to use a dedicated CA for Spegel, but that's a detail.
> Fair enough. I guess it would make sense to use a dedicated CA for Spegel, but that's a detail.
Yes, currently the common ca-extension-registry-cache CA is used for both registry-cache and registry-spegel. I'll set up a dedicated CA for Spegel.
ScheererJ
left a comment
Thanks for proposing a solution to the limitations of the current registry cache extension.
|
|
||
| ### Risks, Downsides and Trade-offs | ||
|
|
||
| The external `Bootstrapper` should be contributed to Spegel. |
What are the alternatives? That is, if upstream does not want the additional bootstrapper, do we need to fork Spegel?
Yes, the extension types are still valid. They can work together with the Spegel p2p image cache.
|
|
/kind enhancement |
timebertt
left a comment
Thanks for the proposal. Using Spegel via an extension that integrates well with the shoot's/nodes' lifecycle seems like a great addition to the Gardener project.
Looking forward to this contribution!
| The content discovery in a Kubernetes cluster is based on [Kademlia DHT][content-provider-routing]. When an image content (blob, manifest or index) is available in the containerd image store, Spegel adds its `digest` to the DHT provider store, announcing that the node provides the content corresponding to the `digest`. Then, when the same `digest` is needed by another node, Spegel searches the DHT for peers that provide the content and pulls it from them. | ||
|
|
||
| The straightforward way to deploy Spegel to a Kubernetes cluster is by using the provided helm chart. However, this has some [drawbacks](https://spegel.dev/docs/faq/#what-should-i-do-if-other-pods-are-scheduled-on-new-nodes-before-spegel) when a new node joins the cluster. | ||
| Our goal is to be able to use Spegel for all images pulled from the kubelet, including the `registry.k8s.io/pause` image. Therefore it was decided to [run][run-spegel-on-host] Spegel registry as a systemd unit service on the node. This requires contributing a new Spegel `bootstrapper` or extending the existing [HTTP bootstrapper](https://github.com/spegel-org/spegel/blob/6f02215fa3fc1d3bbdb11fa62dfa7c07dbe3b7c2/pkg/routing/bootstrap.go#L131-L135). |
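The content discovery described in the excerpt above can be illustrated with a toy provider store. This is a sketch in Python under strong simplifications, not Spegel's or libp2p's implementation: all nodes know each other, there is no networking or record expiry, and `ToyDHT`, `node_id` and the parameter `k` are names invented for the example. It only shows the Kademlia idea of storing provider records on the nodes whose IDs are closest (by XOR distance) to the key.

```python
import hashlib
from typing import Dict, List, Set


def node_id(name: str) -> int:
    """Map a node name or digest to a 256-bit identifier."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, nodes: List[str], k: int = 2) -> None:
        self.k = k
        # Each node holds provider records: digest -> set of providing nodes.
        self.records: Dict[str, Dict[str, Set[str]]] = {n: {} for n in nodes}

    def _closest(self, digest: str) -> List[str]:
        """Return the k nodes whose IDs are closest to the key by XOR distance."""
        key = node_id(digest)
        return sorted(self.records, key=lambda n: node_id(n) ^ key)[: self.k]

    def advertise(self, provider: str, digest: str) -> None:
        """Announce that `provider` holds the content identified by `digest`."""
        for holder in self._closest(digest):
            self.records[holder].setdefault(digest, set()).add(provider)

    def find_providers(self, digest: str) -> Set[str]:
        """Ask the nodes closest to the key which peers provide the content."""
        found: Set[str] = set()
        for holder in self._closest(digest):
            found |= self.records[holder].get(digest, set())
        return found


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.advertise("node-a", "sha256:abc")
```

After the `advertise` call, any node looking up `sha256:abc` learns that `node-a` provides it and can pull the content from that peer; a digest nobody advertised yields no providers, which corresponds to the 404 and upstream fallback case.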
> Our goal is to be able to use Spegel for all images pulled from the kubelet, including the `registry.k8s.io/pause` image.
That's a pretty high bar, which comes at a price.
We could address many of the mentioned drawbacks by running the Spegel DaemonSet in the host network, tolerating the `node.gardener.cloud/critical-components-not-ready` taint, and labelling the Spegel DaemonSet itself with `node.gardener.cloud/critical-component=true`, correct?
With this, we wouldn't ensure that all images (especially those of other node-critical components) are pulled through Spegel. However, we would have better visibility in Kubernetes itself than when running as a systemd unit.
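The DaemonSet variant suggested in this comment could be expressed roughly as the manifest fragment below, built as a Python dictionary for illustration. Only the host network setting, the toleration key and the critical-component label come from the comment; the name, namespace, `app` label and image parameter are placeholders, and applying the label to the pod template as well is an assumption.

```python
TAINT_KEY = "node.gardener.cloud/critical-components-not-ready"
CRITICAL_LABEL = {"node.gardener.cloud/critical-component": "true"}


def spegel_daemonset(image: str) -> dict:
    """Build a minimal DaemonSet manifest for the suggested setup (hypothetical names)."""
    labels = {"app": "spegel", **CRITICAL_LABEL}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "spegel", "namespace": "kube-system", "labels": labels},
        "spec": {
            "selector": {"matchLabels": {"app": "spegel"}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Run in the host network so the local mirror is reachable from containerd.
                    "hostNetwork": True,
                    # Allow scheduling while the node still carries the not-ready taint.
                    "tolerations": [{"key": TAINT_KEY, "operator": "Exists"}],
                    "containers": [{"name": "spegel", "image": image}],
                },
            },
        },
    }


manifest = spegel_daemonset("example.invalid/spegel:placeholder")
```

This only captures the scheduling-related settings under discussion; ports, volumes for the containerd socket and registry configuration are left out.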
Our goal is to be able to cache as many images as we can.
Since Spegel is tightly coupled to containerd, we could later decide to include it in the machine images, which would also allow caching the images pulled by the gardener-node-agent; see this comment for details.
Additionally, Spegel recommends using nidhogg to taint nodes. This way image pulls go through Spegel (except for spegel and pause images).
> Additionally, Spegel recommends using nidhogg to taint nodes. This way image pulls go through Spegel (except for spegel and pause images).
gardener-resource-manager's node critical components controller seems to provide similar benefits.
Can you outline the differences to nidhogg?
If that's the recommended way of running Spegel, can we do it the same way or adapt our components according to your goal?
| The `spegel` binary will be provided to the node in the `/opt/bin/` folder via the `OperatingSystemConfig` mutation. The same goes for the `spegel_metrics.sh` script used to write metrics to the node-exporter's textfile collector. | ||
| The same approach is used for systemd units: | ||
|
|
||
| - `spegel.service` systemd unit service depends on containerd service and must be started before the kubelet service: |
Running Spegel as a systemd unit has significant drawbacks: no visibility in Kubernetes (think DaemonSet/pod status, logs, etc.), harder debugging (e.g., no port-forward), no monitoring (shoot system components condition), no autoscaling, no consideration of resource requests out of the box, etc.
| reviewers: | ||
| - "@timebertt" | ||
| - "@ScheererJ" | ||
| - "@rfranzke" | ||
| approvers: | ||
| - "@timebertt" | ||
| - "@ScheererJ" | ||
| - "@rfranzke" |
Do these fields represent the reviewers/approvers of the proposal or the follow-up implementation?
If it is about the implementation, I guess that would be me as a maintainer of the registry-cache extension/repo.
My understanding was that these lists represent the reviewers/approvers of the proposal, not necessarily the persons implementing it.
+1 for the question. We could add a new section to the proposal describing:
|
|
I have implemented PoCs based on
Note: In the first variant bootstrap run will be successful once the
Note: In both variants, the ReadinessProbe of the DaemonSet pod is NOT executed because the pod becomes ready when there is more than 1 peer in the p2p cluster.
The following test scenario is run for both variants:
Here are the test results:
Note: Results may vary depending on when Spegel is ready and the content is advertised. |
|
@dimitar-kostadinov What is your conclusion out of this? How do you propose to continue? |
I need a little more time to prepare another PoC, after which we can arrange a second round to make the final decision. |
|
Fair enough, then let's mark this PR as |
| - `http_request_duration_seconds` - The latency of the HTTP requests - histogram type. | ||
| - `spegel_mirror_requests_total` - Total number of mirror requests - counter type. | ||
| - `spegel_resolve_duration_seconds` - The duration for router to resolve a peer - histogram type. | ||
| - `spegel_advertised_keys` - Number of keys advertised to be available - gauge type. |
Please outline how the cache hit/miss ratio (i.e., cache efficiency) can be measured/observed.
Here is a sample query grouped by registry:
sum by (registry) (spegel_mirror_requests_total{cache="hit"})
/
sum by (registry) (spegel_mirror_requests_total{cache="miss"})
I'll try to prepare a monitoring dashboard next week.
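Note that the query above yields the hit-to-miss ratio; a hit ratio in the sense of cache efficiency would divide the hits by hits plus misses. The difference can be sketched in Python as follows, assuming counter samples labelled with `registry` and `cache` as in the query. The `cache_ratios` helper and the sample values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def cache_ratios(samples: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, Optional[float]]]:
    """Aggregate (registry, cache, value) samples into per-registry ratios."""
    totals: Dict[str, Dict[str, float]] = defaultdict(dict)
    for registry, cache, value in samples:
        totals[registry][cache] = totals[registry].get(cache, 0.0) + value

    ratios: Dict[str, Dict[str, Optional[float]]] = {}
    for registry, by_cache in totals.items():
        hit = by_cache.get("hit", 0.0)
        miss = by_cache.get("miss", 0.0)
        ratios[registry] = {
            # What the query above computes: hits divided by misses.
            "hit_to_miss": hit / miss if miss else None,
            # Cache efficiency: share of mirror requests served from peers.
            "hit_ratio": hit / (hit + miss) if (hit + miss) else None,
        }
    return ratios


# Made-up sample values for a single registry.
example = cache_ratios([("registry.k8s.io", "hit", 30), ("registry.k8s.io", "miss", 10)])
```

For the made-up samples, 30 hits and 10 misses give a hit-to-miss ratio of 3.0 but a hit ratio of 0.75, so a dashboard should state which of the two it shows.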
|
For completeness, I have prepared a PoC in which all images, including
The same test scenario is used as in this comment.
Note: An additional 5 sec of wait time is added to the |
Summary
[option 1] [current proposal] Spegel as systemd unit service with the External Bootstrapper (*)
Pros:
Cons:
[option 2] Spegel as DaemonSet with the upstream DNS Bootstrapper
Pros:
Cons:
[option 3] Spegel as DaemonSet with the External Bootstrapper (*)
Pros:
Cons:
[option 4] Spegel as systemd unit service with the Spegel binary delivered by the OS with the External Bootstrapper (*)
Pros:
Cons:
(*) The External Bootstrapper is to be contributed.
TL;DR
We prefer option 1, which is described in the current revision of the proposal.
/hold cancel
@ScheererJ, can we schedule TSC round 2 for this topic? Thanks in advance! |
|
@dimitar-kostadinov Did you get clarity on #44 (comment)? I guess this would be relevant to decide how to proceed. If the Spegel community is not interested in accepting this, we cannot go with your preferred option. |
I still don't get why you don't want to start simple with the "Spegel as DaemonSet with the upstream DNS Bootstrapper" option. The listed cons are only relevant for node bootstrapping. However, most nodes live long enough to observe updates to the gardener-node-agent binary, kube-system images, etc. Even with this approach, we will be able to benefit from images cached by Spegel.
To me, it doesn't make sense to continue with the document's current state until you've convinced me why we shouldn't use option 2 as the first iteration of this extension. |
|
The Gardener project currently lacks enough active contributors to adequately respond to all PRs.
You can:
/lifecycle stale |
|
/remove-lifecycle stale |
|
@dimitar-kostadinov raised a PR to Spegel some time ago to add support for an external bootstrapper: spegel-org/spegel#1250. Let's see if and when this contribution is accepted, which would unblock option 1 from #44 (comment). |
|
The Gardener project currently lacks enough active contributors to adequately respond to all PRs.
You can:
/lifecycle stale |
|
/remove-lifecycle stale |
Spegel Registry Support in the registry-cache extension (moved from Add technical steering proposal for `Spegel` registry extension documentation#814 because the process has changed)