Skip to content

feat(network-agent): make host port allocator range configurable#576

Open
dushulin wants to merge 1 commit into
TencentCloud:masterfrom
dushulin:feat/upstream-network-agent-host-port-configurable
Open

feat(network-agent): make host port allocator range configurable#576
dushulin wants to merge 1 commit into
TencentCloud:masterfrom
dushulin:feat/upstream-network-agent-host-port-configurable

Conversation

@dushulin

Copy link
Copy Markdown
Contributor

Expose host_port_min / host_port_max in the cubelet TOML so the port pool can be sized to the cluster instead of the hard-coded 20000-29999. With the default 10k ports and 4 exposed ports per sandbox, a single node caps at ~2500 sandboxes regardless of the cpu/mem quotas. Letting the operator widen the range removes that ceiling without code changes.

Defaults stay 20000-29999, so existing deployments keep the same behavior. Validation rejects min > max and min < 1024.

Expose host_port_min / host_port_max in the cubelet TOML so the
port pool can be sized to the cluster instead of the hard-coded
20000-29999. With the default 10k ports and 4 exposed ports per
sandbox, a single node caps at ~2500 sandboxes regardless of the
cpu/mem quotas. Letting the operator widen the range removes that
ceiling without code changes.

Defaults stay 20000-29999, so existing deployments keep the same
behavior. Validation rejects min > max and min < 1024.
@dushulin dushulin requested a review from fslongjin as a code owner June 16, 2026 04:20
if min > max {
return nil, fmt.Errorf("invalid host port range: min=%d > max=%d", min, max)
}
if min < 1024 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The min < 1024 check is insufficient to prevent overlap with the system ephemeral port range. The service sets /proc/sys/net/ipv4/ip_local_port_range to 10000-19999 at local_service.go:139, so a configured value like HostPortMin=15000 passes validation but overlaps with the ephemeral pool. This can cause silent intermittent connection failures when the kernel assigns an outbound source port that collides with a sandbox port-mapping.

Consider either:
(a) raising the floor from 1024 to 20000 (matching the ephemeral range's upper bound), or
(b) reading /proc/sys/net/ipv4/ip_local_port_range at construction time and explicitly rejecting any overlap.

}
if min < 1024 {
return nil, fmt.Errorf("invalid host port range: min=%d must be >= 1024", min)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no validation that the configured range has usable capacity. A configuration of HostPortMin=30000, HostPortMax=30000 (single port) passes both checks but will exhaust on the first allocation if that single port is reserved. This becomes a self-inflicted denial-of-service for sandbox creation — and it happens silently at startup because no error is returned until runtime.

Consider adding a minimum range width check, e.g. if max-min < 16 (or whatever minimum makes sense for your deployment), to fail fast with a clear error during initialization.

Comment on lines +172 to +177
if networkCfg.HostPortMin != 0 {
base.HostPortMin = networkCfg.HostPortMin
}
if networkCfg.HostPortMax != 0 {
base.HostPortMax = networkCfg.HostPortMax
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The != 0 checks are applied independently to each field. If an operator sets only host_port_min = 50000 in TOML (leaving host_port_max unset → zero), the result is HostPortMin=50000, HostPortMax=29999 (the default survives), which then fails in newPortAllocator with "invalid host port range: min=50000 > max=29999". The error is caught and the service won't start, but the message may confuse an operator who only set one value.

Consider either: (a) requiring both fields to be set when either is present, or (b) improving the error message in newPortAllocator to clearly show that only one value was overridden.

return nil, err
}
ports, err := newPortAllocator()
ports, err := newPortAllocator(cfg.HostPortMin, cfg.HostPortMax)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the ephemeral port range (/proc/sys/net/ipv4/ip_local_port_range) is set at line 139 — after the port allocator is constructed here. This means the allocator can never validate its range against the ephemeral pool during construction. If you enhance overlap detection, move the sysctl write above this line so the allocator can read the final ip_local_port_range value at startup.

@chenhengqi chenhengqi self-assigned this Jun 16, 2026

@chenhengqi chenhengqi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bot has some comments, please address.

a single node caps at ~2500 sandboxes regardless of the cpu/mem quotas

Please share your use case where 2500 sandboxes is not enough for a single node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants