Dev #22

Merged
ParsaKSH merged 4 commits into main from dev on Mar 29, 2026

@ParsaKSH
Owner

fix: prevent health probe interference with TunnelPool persistent connections

Root cause: health checker created SEPARATE TCP connections to instances
for framing protocol probes, concurrent with TunnelPool's persistent
connection. This likely disrupted the DNS tunnel, breaking the shared
persistent connection that all packet_split traffic flows through.

  • Skip probeFramingProtocol when TunnelPool has active connection to instance
  • Add keepalive frames (10s interval) to detect dead tunnels and keep DNS sessions alive
  • Add max-age (3min) forced reconnect to prevent long-lived connection degradation
  • Reduce stale threshold from 20s to 15s for faster dead tunnel detection
  • Ignore ConnID 0 (keepalive) on both client and CentralServer

after minutes

sendFrame held state.mu during sourceConn.WriteFrame (no timeout), and
handleData held state.mu during target.Write. When tunnel TCP writes got
slow, all frame dispatch for that tunnel froze — SYN frames couldn't be
processed, so new connections failed while existing ones continued.

- Add 10s write timeout to sourceConn.WriteFrame
- Refactor sendFrame to pick source under lock, write outside lock
- Refactor handleData/handleFIN to drain reorderer under lock, write
  outside
- Add 10s write timeout for upstream (Xray) writes

handleData wrote to upstream (Xray) synchronously inside the sequential
frame dispatch loop. A slow upstream write blocked ALL frame processing
on that tunnel, including SYN frames for new connections, so new
connections failed while existing ones continued.

Each connState now has a writeCh + upstreamWriter goroutine. handleData
inserts into the reorderer and sends chunks to writeCh non-blocking,
then returns immediately so the frame loop can process the next frame.
@ParsaKSH merged commit 94c4d86 into main on Mar 29, 2026
18 checks passed