Skip to content

Enable dynamic changes in the router for non-kube environments #2356

Open
nluaces wants to merge 25 commits intoskupperproject:mainfrom
nluaces:2337-system-adaptor-process
Open

Enable dynamic changes in the router for non-kube environments #2356
nluaces wants to merge 25 commits intoskupperproject:mainfrom
nluaces:2337-system-adaptor-process

Conversation

@nluaces
Copy link
Member

@nluaces nluaces commented Jan 18, 2026

Implements #2337
Branch based on #2334, the proper commits that implement this issue are bb8ab35 and the following.

flowchart TD
   id1([new listener resource is created])-->|triggers|InputResourceHandler-->Q{site exists?}
   Q -->|Yes| A[Refresh Router Config]
   Q -->|No| B[Bootstrap]
Loading
flowchart LR
  SystemAdaptorHandler-->|has a callback|RouterStatusHandler
  SystemAdaptorHandler-->|spawns|processRouterConfig-->id2([reconciliates the router config file with the router])
Loading

The InputResourceHandler was modified to not bootstrap in case a site is already created in the namespace.
It also includes validations for commands start, stop and reload, that should be disabled if the dynamic changes are happening.

@nluaces nluaces self-assigned this Jan 18, 2026
@nluaces nluaces changed the title Enable dynamic changes in the router for non-kube environments (wip) Enable dynamic changes in the router for non-kube environments Feb 1, 2026
@nluaces nluaces marked this pull request as ready for review February 1, 2026 17:31
Copy link
Member

@fgiorgetti fgiorgetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression so far is that it is on the right track.
Really nice work @nluaces!

A few other comments I have so far:

  • The site name in the router config is changing (we need to check if this will cause issues to the console or vanflow metrics)
  • As I already added, the system controller update and the update of the router container for the controlled namespaces is something we need to figure out still


// the container endpoint is mapped to the podman socket inside the container
if api.IsRunningInContainer() {
endpoint = "unix:///var/run/podman.sock"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If running in a container, the endpoint should be provided through:

os.Getenv("CONTAINER_ENDPOINT")

Then, if the endpoint is not set, you can use the defaults you have below.

Maybe we could have a function under pkg/nonkube/api/environment.go that takes the platform as an argument and returns the correct container endpoint, eliminating the dups we have (I should have done that earlier).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When skupper system install runs, it creates a volume bind between with CONTAINER_ENDPOINT env variable and a volume destination that starts with /var/run/ (I believe that the bootstrap script works like that as well):

//To mount a volume as a bind, the host path must be specified in the Name field
//instead of the Source field. If the values Name/Destination are empty, volumes will be ignored, not mounted,
//and the system-controller container will fail to start.
mounts := []container.Volume{}
volumeDestination := fmt.Sprintf("/var/run/%s.sock", platform)
if strings.HasPrefix(config.containerEndpoint, "unix://") {
socketPath := strings.TrimPrefix(config.containerEndpoint, "unix://")
mounts = append(mounts, container.Volume{
Name: socketPath,
Destination: volumeDestination,
Mode: "z",
RW: true,
})
} else if strings.HasPrefix(config.containerEndpoint, "/") {
mounts = append(mounts, container.Volume{
Name: config.containerEndpoint,
Destination: volumeDestination,
Mode: "z",
RW: true,
})
}

If running inside the container we use the env variable instead of the mapped endpoint, I think it is not going to work as expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you brought up a good point. For auto-reloads, the system-controller container must be aware of all endpoints on the host machine, beforehands. So when the system-controller container is created, it needs to know the PODMAN_CONTAINER_ENDPOINT as well as the DOCKER_CONTAINER_ENDPOINT.

This way if the system controller is running with podman, but a docker site is created, the system controller already knows what is the endpoint to map, from the host machine into the target router container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or... we could simply argue that if the system controller is using podman, it will only initialize podman sites.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a system controller can operate with different system platforms it would be really nice, I would prefer though to have this change implemented in a different pull request, what do you think?

defer h.lock.Unlock()

_, err := os.Stat(api.GetInternalOutputPath(h.namespace, api.RuntimeSiteStatePath))
if err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe an extra validation is needed here.
If the namespace has already been initialized, the controller needs to check if the
router container is using the correct image. If not, then we need to "reload" the site.

This will be needed in cases when the system controller is updated.

By the way, we still need to handle system controller updates through the CLI.
I am not sure if it could be part of the system install command, through a flag,
or if it should be a separate command like reinstall.

Anyway, I believe we need to think about it, as it will be needed as existing sites
cannot be reloaded when using automated reloads.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the namespace has already been initialized, the controller needs to check if the
router container is using the correct image.

I need some clarification, what is considered the correct image? the one specified in the env variable? Or simply check that there is a router running for that namespace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can handle updates separately from this initial PR, so it will be simpler to evaluate.
Let's just raise an enhancement issue for now, so we can think more about it.
Or we could do something similar to what the kube controller is doing, which consists of comparing
the version defined at the controller with the version defined to a given site/namespace. If controller
has a newer version, then we could use the same router image we use when a new site is created.
But again, better to keep it separate. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you on keeping this change separately, we need to define as well the CLI improvement for this.

@nluaces nluaces force-pushed the 2337-system-adaptor-process branch from 0eb07fd to d4bc626 Compare February 10, 2026 10:22
@nluaces
Copy link
Member Author

nluaces commented Feb 10, 2026

  • The site name in the router config is changing (we need to check if this will cause issues to the console or vanflow metrics)

I checked that the name of the site was not changing when the controller just refresh (for example, when a listener has been created); do you have any examples for me to understand this better?

@nluaces nluaces requested a review from fgiorgetti February 10, 2026 16:02
defer h.lock.Unlock()

_, err := os.Stat(api.GetInternalOutputPath(h.namespace, api.RuntimeSiteStatePath))
if err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can handle updates separately from this initial PR, so it will be simpler to evaluate.
Let's just raise an enhancement issue for now, so we can think more about it.
Or we could do something similar to what the kube controller is doing, which consists of comparing
the version defined at the controller with the version defined to a given site/namespace. If controller
has a newer version, then we could use the same router image we use when a new site is created.
But again, better to keep it separate. WDYT?


// the container endpoint is mapped to the podman socket inside the container
if api.IsRunningInContainer() {
endpoint = "unix:///var/run/podman.sock"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you brought up a good point. For auto-reloads, the system-controller container must be aware of all endpoints on the host machine, beforehands. So when the system-controller container is created, it needs to know the PODMAN_CONTAINER_ENDPOINT as well as the DOCKER_CONTAINER_ENDPOINT.

This way if the system controller is running with podman, but a docker site is created, the system controller already knows what is the endpoint to map, from the host machine into the target router container.


//If there is no site configured, the namespace needs to be removed
if err != nil || siteState == nil || siteState.Site == nil {
err = h.tearDownNamespace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I believe we should not teardown the namespace if an error occurred.

For example, if you have a site named "mysite" running on the default namespace and you accidentally create another site named "mysite2", site state loader will fail, as 2 sites have been defined, which is acceptable.

But then, if you remove the mysite2, to fix the situation, it will teardown this running site, which is not desirable, leaving it in a bad state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

@nluaces nluaces linked an issue Feb 12, 2026 that may be closed by this pull request
@nluaces nluaces force-pushed the 2337-system-adaptor-process branch from 9c2e556 to d63b383 Compare February 13, 2026 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create system-adaptor that process dynamic changes in the router

2 participants