
Stabilize the local gateway front door around port 26306 #1232

Closed
chumyin wants to merge 7 commits into eastreams:dev from
chumyin:chumyin/gateway-26306-maturation-20260412

Conversation

@chumyin
Collaborator

@chumyin chumyin commented Apr 12, 2026

Summary

  • Problem:
    • the gateway owner contract existed, but the loopback control surface still defaulted to an ephemeral port
  • Why it matters:
    • a stable local gateway boundary is easier to operate, easier to bootstrap local UI/tooling against, and closer to the intended single gateway noun
  • What changed:
    • added a first-class gateway config seam with default port 26306
    • wired gateway runtime startup so the control surface resolves precedence as CLI --port > env LOONGCLAW_GATEWAY_PORT > config [gateway].port > default 26306
    • preserved explicit ephemeral mode via --port 0
    • updated docs and gateway regression tests
  • What did not change (scope boundary):
    • no remote bind expansion
    • no ACP contract redesign
    • no new public gateway surface beyond the existing localhost control plane
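The precedence rule in "What changed" can be sketched as a short resolver. This is a minimal illustration, not the code in crates/app/src/config/runtime.rs: the name `resolve_gateway_port`, the string-typed env input, and the choice to reject an unparsable LOONGCLAW_GATEWAY_PORT (rather than fall through to config) are assumptions.

```rust
/// Built-in default for the loopback control surface (from the PR description).
const DEFAULT_GATEWAY_PORT: u16 = 26306;

/// Hypothetical resolver: CLI --port > LOONGCLAW_GATEWAY_PORT > [gateway].port > 26306.
/// A port of 0 is passed through unchanged so the OS assigns an ephemeral port.
fn resolve_gateway_port(
    cli_port: Option<u16>,
    env_port: Option<&str>,
    config_port: Option<u16>,
) -> Result<u16, String> {
    // An explicit CLI value always wins, including the --port 0 escape hatch.
    if let Some(port) = cli_port {
        return Ok(port);
    }
    // The env value is only consulted when no CLI value was given; a typo is
    // reported instead of silently ignored (an assumption of this sketch).
    if let Some(raw) = env_port {
        return raw
            .trim()
            .parse::<u16>()
            .map_err(|err| format!("invalid LOONGCLAW_GATEWAY_PORT {raw:?}: {err}"));
    }
    Ok(config_port.unwrap_or(DEFAULT_GATEWAY_PORT))
}
```

Checking the CLI layer before parsing the env value means a malformed environment variable cannot block an operator who passes --port explicitly.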

Linked Issues

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Security hardening
  • CI / workflow / release

Touched Areas

  • Kernel / policy / approvals
  • Contracts / protocol / spec
  • Daemon / CLI / install
  • Providers / routing
  • Tools
  • Browser automation
  • Channels / integrations
  • ACP / conversation / session runtime
  • Memory / context assembly
  • Config / migration / onboarding
  • Docs / contributor workflow
  • CI / release / workflows

Risk Track

  • Track A (routine / low-risk)
  • Track B (higher-risk / policy-impacting)

If Track B, fill these in:

  • Risk notes:
    • changes the default gateway control-surface port from ephemeral to stable loopback 26306
    • introduces explicit port precedence and an ephemeral escape hatch
  • Rollout / guardrails:
    • loopback-only bind stays unchanged
    • local clients still discover the effective binding from persisted owner state
    • --port 0 preserves explicit ephemeral behavior for tests and labs
  • Rollback path:
    • revert the gateway port resolver and docs while keeping owner-state discovery intact

Validation

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test --workspace --locked
  • cargo test --workspace --all-features --locked
  • Relevant architecture / dep-graph / docs checks for touched areas
  • Additional scenario, benchmark, or manual checks when behavior changed
  • If this changes config/env fallback, limits, or defaults: include before/after behavior and regression coverage for explicit path, fallback path, and boundary values
  • If tests mutate process-global env: document how state is restored or serialized
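For tests that set LOONGCLAW_GATEWAY_PORT, a drop guard is one way to restore state. A minimal sketch; `EnvVarGuard` is a hypothetical helper, not an existing LoongClaw type. It restores the previous value but does not make concurrent mutation safe, so such tests still need serializing, as the `--test-threads=1` commands in this PR do.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical test helper: sets an env var and restores the previous value on drop.
struct EnvVarGuard {
    key: &'static str,
    previous: Option<OsString>,
}

impl EnvVarGuard {
    fn set(key: &'static str, value: &str) -> Self {
        let previous = env::var_os(key);
        // `set_var` is an `unsafe fn` in the 2024 edition because it can race with
        // other threads reading the environment; the block also compiles on 2021.
        unsafe { env::set_var(key, value) };
        Self { key, previous }
    }
}

impl Drop for EnvVarGuard {
    fn drop(&mut self) {
        // Put back whatever was there before, or remove the key if it was unset.
        match self.previous.take() {
            Some(value) => unsafe { env::set_var(self.key, value) },
            None => unsafe { env::remove_var(self.key) },
        }
    }
}
```

Usage inside a test body: `let _guard = EnvVarGuard::set("LOONGCLAW_GATEWAY_PORT", "4000");` keeps the override alive until the end of the scope.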

Commands and evidence:

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test -p loongclaw gateway_ -- --test-threads=1
cargo test -p loongclaw-app gateway_ -- --test-threads=1
./scripts/check_architecture_boundaries.sh
LOONGCLAW_RELEASE_DOCS_STRICT=1 ./scripts/check-docs.sh
cargo test --workspace --locked --quiet
cargo test --workspace --all-features --locked --quiet

Validation notes:

  • cargo fmt passed.
  • cargo clippy passed.
  • Targeted gateway and config regression suites passed.
  • ./scripts/check_architecture_boundaries.sh passed.
  • ./scripts/check-docs.sh failed on pre-existing release debug/trace artifact gaps unrelated to this slice.
  • cargo test --workspace --locked --quiet still fails on a pre-existing loongclaw-app full-suite instability (observed failures included conversation::announce::tests::delegate_announce_queue_batches_children_completed_within_debounce_window on the feature branch and safe-lane session-dispatcher tests on the unmodified base worktree).
  • cargo test --workspace --all-features --locked --quiet still fails on a pre-existing loongclaw-app shell-exec test family unrelated to the gateway slice.

User-visible / Operator-visible Changes

  • loongclaw gateway run now defaults to 127.0.0.1:26306 instead of an ephemeral loopback port.
  • Operators can override with --port, LOONGCLAW_GATEWAY_PORT, or config [gateway].port.
  • --port 0 remains available for explicit ephemeral lab/test runs.
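--port 0 works because the OS picks a free port when a listener binds to port 0, so the effective port is only known after binding and must be read back before it is recorded for local clients. A std-only sketch; `bind_control_surface` is a hypothetical name, and the real gateway presumably uses an async listener rather than `std::net::TcpListener`.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};

/// Hypothetical helper: binds the control surface on loopback only and returns
/// the listener plus the effective address to record for local clients.
fn bind_control_surface(port: u16) -> io::Result<(TcpListener, SocketAddr)> {
    // Loopback-only bind, matching the "no remote bind expansion" scope boundary.
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, port))?;
    // With port 0 the OS assigns a free port, so read back what was actually bound.
    let effective = listener.local_addr()?;
    Ok((listener, effective))
}
```

Two listeners bound to port 0 at the same time receive different ports, which is why integration tests that request port 0 can run in parallel without colliding on 26306.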

Failure Recovery

  • Fast rollback or disable path:
    • revert this PR to restore ephemeral default binding
    • operators can also force an alternate port with --port or LOONGCLAW_GATEWAY_PORT if 26306 is occupied locally
  • Observable failure symptoms reviewers should watch for:
    • gateway status reports the wrong persisted port
    • local clients fail to discover the token or loopback endpoint
    • targeted gateway tests regress on port precedence or explicit ephemeral mode
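When 26306 is already taken, the bind fails with `io::ErrorKind::AddrInUse`. A sketch of turning that into an actionable message that names the override knobs; `bind_with_hint` and the message wording are assumptions, not the gateway's actual error text.

```rust
use std::io;
use std::net::{Ipv4Addr, TcpListener};

/// Hypothetical wrapper: maps "address in use" to a message naming the overrides.
fn bind_with_hint(port: u16) -> Result<TcpListener, String> {
    TcpListener::bind((Ipv4Addr::LOCALHOST, port)).map_err(|err| {
        if err.kind() == io::ErrorKind::AddrInUse {
            format!(
                "127.0.0.1:{port} is already in use; pass --port <N> or set \
                 LOONGCLAW_GATEWAY_PORT to pick another port"
            )
        } else {
            // Other failures (permissions, etc.) are reported as-is.
            format!("failed to bind 127.0.0.1:{port}: {err}")
        }
    })
}
```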

Reviewer Focus

  • crates/app/src/config/runtime.rs
  • crates/daemon/src/gateway/control.rs
  • crates/daemon/src/gateway/service.rs
  • crates/daemon/tests/integration/gateway_owner_state.rs
  • crates/daemon/tests/integration/gateway_api_turn.rs
  • README.md
  • docs/product-specs/channel-setup.md

The gateway owner contract already exposed a loopback control surface, but the
listener defaulted to an ephemeral port. This change promotes a stable default
loopback endpoint at 127.0.0.1:26306 while preserving the persisted owner-state
discovery contract and an explicit ephemeral escape hatch.

The resolver now follows one clear precedence path for the control surface:
CLI --port, then LOONGCLAW_GATEWAY_PORT, then config [gateway].port, then the
built-in 26306 default. Gateway startup and docs now describe that contract,
and gateway integration tests explicitly request port 0 so parallel test runs do
not fight over the new stable default.

Constraint: Gateway control-surface exposure stays loopback-only in this slice
Constraint: Existing gateway integration tests still need an explicit ephemeral override to avoid port collisions
Rejected: Keep the ephemeral default forever | leaves the gateway without a stable local front door
Rejected: Expand into remote bind/auth work in the same slice | widens the security and product scope beyond the port contract
Confidence: high
Scope-risk: moderate
Directive: Treat --port 0 as a deliberate lab/test escape hatch, not the normal operator path
Tested: cargo fmt --all -- --check
Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings
Tested: cargo test -p loongclaw-app gateway_ -- --test-threads=1
Tested: cargo test -p loongclaw gateway_ -- --test-threads=1
Tested: ./scripts/check_architecture_boundaries.sh
Not-tested: cargo test --workspace --locked --quiet | blocked by pre-existing loongclaw-app full-suite instability (observed in both the feature branch and the unmodified base worktree)
Not-tested: cargo test --workspace --all-features --locked --quiet | blocked by pre-existing loongclaw-app shell-exec failures unrelated to gateway changes
Not-tested: LOONGCLAW_RELEASE_DOCS_STRICT=1 ./scripts/check-docs.sh | blocked by pre-existing missing release debug/trace artifacts
@github-actions github-actions Bot added documentation Improvements or additions to documentation. spec Architecture boundaries, product specs, and design docs. daemon Daemon binary, CLI entrypoints, and install flow. config Runtime config parsing, schema, and defaults. docs Contributor docs, references, and issue/PR guidance. size: M Medium pull request: 201-500 changed lines. labels Apr 12, 2026
chumyin added 3 commits April 13, 2026 18:34
The initial gateway-26306 slice stabilized the default loopback port, but the
surrounding product and design docs still relied too much on README/context and
there was no direct test proving that a configured [gateway].port value reaches
the running control surface.

This follow-up fills those gaps. It adds resolver coverage for the config-driven
port path, adds an integration test proving gateway owner state reflects a
configured gateway port without overrides, and extends the local product-control
plane and Web UI docs so the stable localhost front door is described in the
right contract surfaces instead of only in the README.

Constraint: The gateway control surface remains loopback-only and uses persisted owner state as output truth
Rejected: Add only README notes | leaves the product/design contracts under-documented
Confidence: high
Scope-risk: narrow
Directive: Keep gateway port behavior documented in the control-plane and Web UI contracts whenever the bootstrap path changes
Tested: cargo fmt --all -- --check
Tested: cargo test -p loongclaw gateway_control_listener_port_ -- --test-threads=1
Tested: cargo test -p loongclaw gateway_owner_state_uses_configured_gateway_port_when_no_override_is_present -- --test-threads=1
Tested: ./scripts/check_architecture_boundaries.sh
Not-tested: cargo test --workspace --locked --quiet | still blocked by pre-existing loongclaw-app full-suite instability outside this slice
Not-tested: cargo test --workspace --all-features --locked --quiet | still blocked by pre-existing loongclaw-app shell-exec failures outside this slice

The gateway now has a stable default port, but operator-facing status still did
not explain whether the running control surface came from the built-in default,
a config override, or an env/CLI override. That made debugging override-heavy
setups harder than necessary.

This change persists and renders the control-surface port source, extends the
resolver to classify default/config/env/cli/ephemeral-cli selection, and locks
that contract in with focused gateway tests plus a config-backed integration
assertion. The product control-plane docs now also state that operator surfaces
should explain the effective port source.
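The default/config/env/cli/ephemeral-cli classification can be modeled as an enum resolved alongside the port and rendered in status text. A sketch: only the five categories come from the commit message; the enum name, label spellings, env input already parsed to `u16`, and the status line format are assumptions.

```rust
use std::fmt;

/// Where the effective control-surface port came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PortSource {
    Default,
    Config,
    Env,
    Cli,
    EphemeralCli,
}

impl fmt::Display for PortSource {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let label = match self {
            PortSource::Default => "default",
            PortSource::Config => "config",
            PortSource::Env => "env",
            PortSource::Cli => "cli",
            PortSource::EphemeralCli => "ephemeral-cli",
        };
        f.write_str(label)
    }
}

/// Hypothetical classifier mirroring the CLI > env > config > default precedence.
fn classify_port(cli: Option<u16>, env: Option<u16>, config: Option<u16>) -> (u16, PortSource) {
    match (cli, env, config) {
        (Some(0), _, _) => (0, PortSource::EphemeralCli),
        (Some(port), _, _) => (port, PortSource::Cli),
        (None, Some(port), _) => (port, PortSource::Env),
        (None, None, Some(port)) => (port, PortSource::Config),
        (None, None, None) => (26306, PortSource::Default),
    }
}

/// Renders one status line; `effective_port` is the port actually bound.
fn render_port_line(effective_port: u16, source: PortSource) -> String {
    format!("control surface: 127.0.0.1:{effective_port} (port source: {source})")
}
```

Rendering the effective port rather than the requested one matters for ephemeral-cli, where the requested value is 0.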

Constraint: Gateway owner-state remains the durable runtime truth for the active binding
Rejected: Add more bootstrap/discovery behavior first | source visibility is a smaller, safer operator-contract improvement for this slice
Confidence: high
Scope-risk: narrow
Directive: When gateway bind precedence changes, update both the persisted status schema and operator-facing docs together
Tested: cargo fmt --all -- --check
Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings
Tested: cargo test -p loongclaw gateway_ -- --test-threads=1
Not-tested: cargo test --workspace --locked --quiet | still blocked by pre-existing loongclaw-app full-suite instability outside this slice
Not-tested: cargo test --workspace --all-features --locked --quiet | still blocked by pre-existing loongclaw-app shell-exec failures outside this slice

The gateway now has a stable default loopback port, but the local client still
started by reading owner-state files first. This change makes default discovery
try the stable 127.0.0.1:26306 front door with the local bearer token before it
falls back to persisted owner-state, so the bootstrap path now matches the
product contract while preserving override and ephemeral-port support.
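The front-door-first order can be sketched with the probe and the owner-state read passed in as closures, which keeps the example free of HTTP and file I/O. `discover_gateway`, `DiscoverySource`, and the closure shapes are hypothetical; presenting the local bearer token is left to the `probe` closure here.

```rust
use std::net::{Ipv4Addr, SocketAddr};

const DEFAULT_FRONT_DOOR_PORT: u16 = 26306;

/// Records where the endpoint was found so operator surfaces can show it.
#[derive(Debug, PartialEq, Eq)]
enum DiscoverySource {
    DefaultFrontDoor,
    OwnerState,
}

/// Hypothetical discovery: probe the stable front door first, then fall back to the
/// persisted owner-state binding (needed for override and ephemeral ports).
fn discover_gateway(
    probe: impl Fn(SocketAddr) -> bool,
    read_owner_state: impl FnOnce() -> Option<SocketAddr>,
) -> Option<(SocketAddr, DiscoverySource)> {
    let front_door = SocketAddr::from((Ipv4Addr::LOCALHOST, DEFAULT_FRONT_DOOR_PORT));
    if probe(front_door) {
        return Some((front_door, DiscoverySource::DefaultFrontDoor));
    }
    // Owner state is only read when the front door did not answer.
    read_owner_state().map(|addr| (addr, DiscoverySource::OwnerState))
}
```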

The change also extends gateway owner-state integration coverage around explicit
CLI and ephemeral port sources so the active runtime contract remains visible
and regression-resistant.

Constraint: Default bootstrap must stay loopback-only and continue using the local token file
Constraint: Override and ephemeral ports still need owner-state fallback because they intentionally diverge from the stable default front door
Rejected: Keep discovery file-first | leaves the stable default port as documentation-only behavior instead of a real bootstrap path
Confidence: high
Scope-risk: moderate
Directive: Keep default bootstrap front-door-first unless the gateway auth/bootstrap contract is deliberately redesigned end to end
Tested: cargo fmt --all -- --check
Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings
Tested: cargo test -p loongclaw gateway_local_discovery_ -- --test-threads=1
Tested: cargo test -p loongclaw gateway_ -- --test-threads=1
Not-tested: cargo test --workspace --locked --quiet | still blocked by pre-existing loongclaw-app full-suite instability outside the gateway slice
Not-tested: cargo test --workspace --all-features --locked --quiet | still blocked by pre-existing loongclaw-app shell-exec failures outside the gateway slice
@github-actions github-actions Bot added size: L Large pull request: 501-1000 changed lines. and removed size: M Medium pull request: 201-500 changed lines. labels Apr 14, 2026
The gateway had matured into a stable local front door, but trusted operators
still had to leave that surface to inspect or resolve device pairing requests.
This change reuses the existing control-plane pairing registry and protocol
shapes so the gateway can act as the local pairing inbox for operator review
and approval without dragging in the larger remote connection handshake.

The local client can now bootstrap against the default front door, expose where
that discovery came from, and call pairing list/resolve routes on the gateway.
The gateway control surface now serves pairing request listing and pairing
resolution through the same loopback bearer boundary as the rest of the local
operator API.
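Serving pairing list/resolve behind the same loopback bearer boundary means every route runs the same header check. A minimal sketch; `check_bearer` and `AuthError` are hypothetical, and the constant-time comparison is a defensive choice of this sketch, not something the PR states.

```rust
/// Why a request was rejected at the bearer boundary.
#[derive(Debug, PartialEq, Eq)]
enum AuthError {
    MissingBearer,
    WrongToken,
}

/// Compares equal-length byte strings without short-circuiting on the first
/// mismatch; differing lengths return early.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Hypothetical check applied to every localhost control route, pairing included.
fn check_bearer(authorization: Option<&str>, expected_token: &str) -> Result<(), AuthError> {
    let token = authorization
        .and_then(|value| value.strip_prefix("Bearer "))
        .ok_or(AuthError::MissingBearer)?;
    if constant_time_eq(token.as_bytes(), expected_token.as_bytes()) {
        Ok(())
    } else {
        Err(AuthError::WrongToken)
    }
}
```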

Constraint: Pairing stays on the localhost operator bearer surface in this slice
Constraint: The remote challenge/connect handshake remains out of scope for this gateway increment
Rejected: Rebuild pairing storage inside the gateway | duplicates the existing control-plane pairing registry and persistence rules
Rejected: Pull the full remote control-plane connect path into gateway now | too wide for the current local maturity slice
Confidence: high
Scope-risk: moderate
Directive: Keep gateway pairing routes reusing the control-plane pairing registry and protocol payloads unless the trust model itself is redesigned
Tested: cargo fmt --all -- --check
Tested: cargo test -p loongclaw gateway_api_pairing -- --test-threads=1
Tested: cargo test -p loongclaw gateway_ -- --test-threads=1
Tested: cargo clippy -p loongclaw --all-targets --all-features -- -D warnings
Tested: ./scripts/check_architecture_boundaries.sh
Not-tested: cargo test --workspace --locked --quiet | still blocked by pre-existing loongclaw-app full-suite instability outside the gateway slice
Not-tested: cargo test --workspace --all-features --locked --quiet | still blocked by pre-existing loongclaw-app shell-exec failures outside the gateway slice
@github-actions github-actions Bot added size: XL Very large pull request: more than 1000 changed lines. and removed size: L Large pull request: 501-1000 changed lines. labels Apr 17, 2026
chumyin added 2 commits April 16, 2026 20:13
The gateway can now list and resolve local pairing requests, but operator-facing
summary surfaces still hid whether pairing work was pending or whether any
trusted devices had already been approved. This change projects pairing state
into the gateway operator summary and status text so the local front door shows
both runtime health and current trust posture.

The summary is intentionally small: pending pairing request count, approved
device count, and the latest pairing activity timestamp. It reuses the existing
control-plane pairing registry instead of inventing another store, and it keeps
operator-facing status aligned with the pairing routes added in the previous
slice.
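The rollup is a single pass over the registry. A sketch with hypothetical `PairingRecord`/`PairingStatus` types standing in for the control-plane registry entries (their real shape is not shown in this PR), timestamps as Unix milliseconds. Counting rejected records as activity is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PairingStatus {
    Pending,
    Approved,
    Rejected,
}

/// Hypothetical stand-in for a control-plane pairing registry entry.
struct PairingRecord {
    status: PairingStatus,
    updated_at_ms: u64,
}

/// Pending count, approved count, and latest activity time.
#[derive(Debug, Default, PartialEq, Eq)]
struct PairingSummary {
    pending_requests: usize,
    approved_devices: usize,
    last_activity_ms: Option<u64>,
}

fn summarize_pairing(records: &[PairingRecord]) -> PairingSummary {
    let mut summary = PairingSummary::default();
    for record in records {
        match record.status {
            PairingStatus::Pending => summary.pending_requests += 1,
            PairingStatus::Approved => summary.approved_devices += 1,
            PairingStatus::Rejected => {}
        }
        // Every record counts as activity, whatever its status (None < Some).
        summary.last_activity_ms = summary.last_activity_ms.max(Some(record.updated_at_ms));
    }
    summary
}
```

An empty registry yields zero counts and no activity timestamp, which a status renderer can show as "no pairing activity" rather than a fake time.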

Constraint: Pairing summary remains a local operator read model and does not widen the remote trust surface
Rejected: Add full device inventory rows first | the count/activity rollup is the smaller stable step that closes the operator visibility gap
Confidence: high
Scope-risk: narrow
Directive: Keep gateway operator summary and status text in sync with pairing workflow changes so trust posture stays visible from one surface
Tested: cargo fmt --all -- --check
Tested: cargo test -p loongclaw gateway_read_model_operator_summary_keeps_owner_control_and_runtime_rollups -- --test-threads=1
Tested: cargo test -p loongclaw render_status_cli_text_surfaces_drill_down_recipes -- --test-threads=1
Tested: cargo clippy -p loongclaw --all-targets --all-features -- -D warnings
Tested: ./scripts/check_architecture_boundaries.sh
Not-tested: cargo test --workspace --locked --quiet | still blocked by pre-existing loongclaw-app full-suite instability outside the gateway slice
Not-tested: cargo test --workspace --all-features --locked --quiet | still blocked by pre-existing loongclaw-app shell-exec failures outside the gateway slice

The gateway can now bootstrap locally and resolve pairing requests, but the
operator summary still hid whether pairing work was pending or whether any
devices had already been approved. This change projects pairing trust posture
into the gateway operator summary and status text using the existing control-
plane pairing registry as the source of truth.

The summary intentionally stays compact: pending pairing count, approved device
count, and latest pairing activity time. That gives operators one high-signal
view of runtime health plus trust posture without introducing a second device
inventory authority.

Constraint: Pairing summary remains a local operator read model and does not widen the remote trust surface
Rejected: Jump straight to a full approved-device inventory UI | the compact trust posture rollup is the smaller stable step that closes the operator visibility gap first
Confidence: high
Scope-risk: narrow
Directive: Keep gateway operator summary and pairing workflow routes aligned so trust posture is visible from the same front door
Tested: cargo fmt --all -- --check
Tested: cargo test -p loongclaw gateway_read_model_operator_summary_keeps_owner_control_and_runtime_rollups -- --test-threads=1
Tested: cargo test -p loongclaw render_status_cli_text_surfaces_drill_down_recipes -- --test-threads=1
Tested: cargo clippy -p loongclaw --all-targets --all-features -- -D warnings
Tested: ./scripts/check_architecture_boundaries.sh
Not-tested: cargo test --workspace --locked --quiet | still blocked by pre-existing loongclaw-app full-suite instability outside the gateway slice
Not-tested: cargo test --workspace --all-features --locked --quiet | still blocked by pre-existing loongclaw-app shell-exec failures outside the gateway slice
chumyin added a commit that referenced this pull request Apr 22, 2026
… surfaces

This change makes the status surface consume the running localhost gateway's
operator summary when the gateway belongs to the same config, so local status
stops rebuilding a lossy duplicate summary with forced zero pairing and node
counts. It also types the node inventory client surface and adds regression
coverage proving that status and local client consumers stay aligned with the
live gateway contract.
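The "forced zero" problem is easier to see with `Option`: a locally rebuilt summary that cannot see pairing or node state should report those counts as unknown, not 0. A sketch; `OperatorSummary`, `StatusView`, and identifying "the same config" by comparing config paths are all hypothetical.

```rust
/// Counts as reported by the running gateway's operator summary.
#[derive(Debug, Clone, PartialEq, Eq)]
struct OperatorSummary {
    pending_pairings: u32,
    approved_devices: u32,
    nodes: u32,
}

/// What `status` renders; `None` means "not known from this vantage point".
#[derive(Debug, PartialEq, Eq)]
struct StatusView {
    from_live_gateway: bool,
    pending_pairings: Option<u32>,
    approved_devices: Option<u32>,
    nodes: Option<u32>,
}

/// Hypothetical: use the live summary only when the gateway was started from the
/// same config as this status invocation; otherwise leave counts unknown.
fn build_status_view(
    live: Option<(OperatorSummary, String)>, // (summary, gateway's config path)
    local_config_path: &str,
) -> StatusView {
    match live {
        Some((summary, gateway_config)) if gateway_config == local_config_path => StatusView {
            from_live_gateway: true,
            pending_pairings: Some(summary.pending_pairings),
            approved_devices: Some(summary.approved_devices),
            nodes: Some(summary.nodes),
        },
        _ => StatusView {
            from_live_gateway: false,
            pending_pairings: None,
            approved_devices: None,
            nodes: None,
        },
    }
}
```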

Constraint: The next stage prioritizes broader local platform adoption before relay-pairing or wider trust-surface expansion
Rejected: Add a new status-specific aggregate client surface | one real consumer can use the existing operator summary plus typed node inventory
Rejected: Force bootstrap-first gateway discovery inside async status collection | the blocking bootstrap lane is not async-safe in this path yet
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep status and other local operator surfaces consuming the daemon-owned gateway contract instead of rebuilding gateway summaries locally
Tested: cargo check --workspace --locked --quiet
Tested: cargo test -p loong --test integration status_cli_ -- --nocapture
Tested: cargo test -p loong --test integration gateway_owner_state_local_client_channels_and_operator_summary_keep_plugin_backed_parity -- --nocapture
Tested: cargo test -p loong status_cli::tests::render_status_cli_text_surfaces_drill_down_recipes -- --nocapture
Tested: cargo clippy -p loong --tests --no-deps -- -D warnings
Not-tested: Full workspace all-features test matrix
Related: #1232
Related: #1300
Related: #1377
chumyin added a commit that referenced this pull request Apr 22, 2026
…atures

This change folds the localhost gateway handlers through shared request/session
helpers so the control code stays easier to evolve without drifting auth,
status, ACP, or pairing behavior. It also adds regression coverage around the
negative paths that the refactor now centralizes, including missing control
bearer auth, missing pairing session tokens, and pairing routes exercised
without an event bus.
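Centralizing those negative paths can take the form of one preflight helper that every handler calls before route-specific work. A sketch; `preflight`, `RequestParts`, `RouteNeeds`, `ControlError`, and especially the HTTP status codes are assumptions. The commit does not say whether pairing routes without an event bus reject or degrade; this sketch rejects.

```rust
/// Failures the shared helper reports, one per negative path named in the commit.
#[derive(Debug, PartialEq, Eq)]
enum ControlError {
    MissingControlBearer,
    MissingPairingSessionToken,
    EventBusUnavailable,
}

impl ControlError {
    /// Assumed HTTP mapping for this sketch only.
    fn http_status(&self) -> u16 {
        match self {
            ControlError::MissingControlBearer => 401,
            ControlError::MissingPairingSessionToken => 401,
            ControlError::EventBusUnavailable => 503,
        }
    }
}

/// Hypothetical view of the request fields the helper inspects.
struct RequestParts<'a> {
    control_bearer: Option<&'a str>,
    pairing_session_token: Option<&'a str>,
}

/// What a route depends on; pairing routes would set both flags.
struct RouteNeeds {
    pairing_session: bool,
    event_bus: bool,
}

/// Presence checks only; token validation is omitted for brevity.
fn preflight(req: &RequestParts<'_>, needs: &RouteNeeds, event_bus_ready: bool) -> Result<(), ControlError> {
    // Auth is checked before any route-specific dependency.
    if req.control_bearer.is_none() {
        return Err(ControlError::MissingControlBearer);
    }
    if needs.pairing_session && req.pairing_session_token.is_none() {
        return Err(ControlError::MissingPairingSessionToken);
    }
    if needs.event_bus && !event_bus_ready {
        return Err(ControlError::EventBusUnavailable);
    }
    Ok(())
}
```

Keeping the check order in one place is what stops individual handlers from drifting, for example one route checking the event bus before auth.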

Constraint: Phase 2 must preserve the localhost gateway contract while internal control flow is consolidated
Rejected: Fold this into a broader operator-surface redesign | would mix read-model evolution with handler cleanup and slow stabilization
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep new gateway control helpers and regression tests aligned before expanding remote relay pairing
Tested: cargo check --workspace --locked --quiet
Tested: cargo clippy -p loong --tests --no-deps -- -D warnings
Tested: cargo test -p loong --test integration gateway_pairing_ -- --nocapture
Tested: cargo test -p loong --test integration gateway_owner_state_localhost_control_surface_requires_auth_and_stops_runtime -- --nocapture
Tested: cargo test -p loong gateway_nodes_ -- --nocapture
Tested: cargo test -p loong gateway_read_model_operator_summary_keeps_owner_control_and_runtime_rollups -- --nocapture
Not-tested: Full workspace all-features test matrix
Related: #1232
Related: #1300
@chumyin
Collaborator Author

chumyin commented Apr 22, 2026

Closing this draft because the gateway front-door work landed through the cleaned-up successor stack in #1373, with the follow-up maturity slice merged in #1377.

@chumyin chumyin closed this Apr 22, 2026

Labels

config Runtime config parsing, schema, and defaults. daemon Daemon binary, CLI entrypoints, and install flow. docs Contributor docs, references, and issue/PR guidance. documentation Improvements or additions to documentation. size: XL Very large pull request: more than 1000 changed lines. spec Architecture boundaries, product specs, and design docs.

Development

Successfully merging this pull request may close these issues.

[Feature]: Stabilize the loopback gateway control surface around port 26306

1 participant