fix: prevent indefinite hangs during TLS operations (#819)#958
fix: prevent indefinite hangs during TLS operations (#819)#958Gengyscan wants to merge 2 commits intoprojectdiscovery:mainfrom
Conversation
…#819) Root cause: two concurrency bugs in ztls client caused indefinite hangs when scanning large target lists. 1. tlsHandshakeWithTimeout evaluated Handshake() synchronously as a send-expression in the select statement. Go evaluates the send value before entering select, so the timeout branch could never fire. Fix: run Handshake() in a goroutine, use a buffered errChan, and close the TLS connection on timeout to unblock any pending I/O. 2. EnumerateCiphers used context.TODO() which provides no deadline. Fix: create a context.WithTimeout derived from the configured Timeout value (guarded for Timeout==0 to avoid instant expiry). Fixes projectdiscovery#819
Neo - PR Security ReviewNo security issues found Highlights
Hardening Notes
Comment |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughHandshake logic was reworked to run TLS handshakes in a goroutine with an optional timeout-aware context, propagate the raw underlying connection for best-effort cleanup on timeout, treat Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as EnumerateCiphers / Caller
participant CtxMgr as Context (with optional timeout)
participant Gor as Handshake Goroutine
participant TLS as tls.Conn
participant Raw as raw net.Conn
Caller->>CtxMgr: create ctx (options.Timeout?) + cancel
Caller->>Gor: call tlsHandshakeWithTimeout(TLS, Raw, ctx)
Gor->>TLS: invoke Handshake() (in goroutine)
Gor-->>Caller: send result / error via channel
alt Handshake completes before deadline
Caller->>CtxMgr: cancel ctx
Caller->>Raw: close (as before)
Caller-->>Caller: return success (treat ErrCertsOnly as success)
else Context deadline reached / timeout
CtxMgr->>Raw: best-effort check for concurrent result
CtxMgr->>Raw: close rawConn
Caller-->>Caller: return timeout error (tagged "ztls")
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/claim #819 |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
pkg/tlsx/ztls/ztls.go (2)
332-356: Well-implemented fix for the root cause of indefinite hangs.The implementation correctly addresses the original issue where
Handshake()was being evaluated synchronously in the select expression. Key strengths:
- Buffered channel (capacity 1) ensures the goroutine can always complete without blocking
- Non-blocking fairness check (lines 346-353) prevents discarding a result that arrives simultaneously with the deadline
- Connection close (line 354) unblocks any pending I/O operations in the handshake goroutine
Minor style note: Go convention places
ctx context.Contextas the first parameter. Consider reordering for consistency with standard library patterns.♻️ Optional: Reorder parameters to follow Go convention
-func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error { +func (c *Client) tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn) error {And update the call sites:
- err = c.tlsHandshakeWithTimeout(tlsConn, ctx) + err = c.tlsHandshakeWithTimeout(ctx, tlsConn)- if err := c.tlsHandshakeWithTimeout(conn, handshakeCtx); err == nil { + if err := c.tlsHandshakeWithTimeout(handshakeCtx, conn); err == nil {,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/ztls/ztls.go` around lines 332 - 356, The function tlsHandshakeWithTimeout currently takes (tlsConn *tls.Conn, ctx context.Context); change its signature to place ctx context.Context as the first parameter (tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn)) and update all call sites to pass the context first, ensuring any references to the function (including method receiver Client) are adjusted accordingly; rebuild and run tests to confirm no remaining compile errors from the signature change.
260-272: LGTM - Timeout handling mirrors the pattern in tls.go.The context creation and cancellation pattern is consistent with the standard TLS client implementation.
Same optional suggestion applies here: consider using
defer cancel()immediately after creation for safety.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/ztls/ztls.go` around lines 260 - 272, The handshake context/cancel pattern should use defer to ensure the timeout cancel is always run: after creating handshakeCtx and cancel (when c.options.Timeout != 0) call defer cancel() so the cancel is guaranteed even on early returns; remove or avoid the later explicit cancel() call to prevent double-cancel confusion, and keep the existing conn.Close()/GetHandshakeLog/enumeratedCiphers logic and the call to c.tlsHandshakeWithTimeout unchanged.pkg/tlsx/tls/tls.go (1)
239-251: LGTM - Timeout handling for cipher enumeration is correctly implemented.The context-based timeout pattern is consistent with
ConnectWithOptionsand uses the standardHandshakeContextmethod which properly respects context cancellation. The nil-check before callingcancel()handles theTimeout==0case correctly.Minor suggestion: Using
defer cancel()immediately after creation is more idiomatic and guards against future code changes that might introduce early returns.♻️ Optional: Use defer for cancel
handshakeCtx := context.Background() var cancel context.CancelFunc if c.options.Timeout != 0 { handshakeCtx, cancel = context.WithTimeout(handshakeCtx, time.Duration(c.options.Timeout)*time.Second) + defer cancel() } if err := conn.HandshakeContext(handshakeCtx); err == nil { ciphersuite := conn.ConnectionState().CipherSuite enumeratedCiphers = append(enumeratedCiphers, tls.CipherSuiteName(ciphersuite)) } - if cancel != nil { - cancel() - },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tlsx/tls/tls.go` around lines 239 - 251, The timeout context and cancel handling around conn.HandshakeContext should use an immediate defer for cancel to be idiomatic and prevent future early-return leaks: after creating handshakeCtx and cancel via context.WithTimeout (the handshakeCtx, cancel variables), call defer cancel() right away (guarding only when cancel is non-nil) instead of calling cancel() later; keep using conn.HandshakeContext(handshakeCtx) and conn.ConnectionState() as-is and ensure _ = conn.Close() remains after the handshake.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 239-251: The timeout context and cancel handling around
conn.HandshakeContext should use an immediate defer for cancel to be idiomatic
and prevent future early-return leaks: after creating handshakeCtx and cancel
via context.WithTimeout (the handshakeCtx, cancel variables), call defer
cancel() right away (guarding only when cancel is non-nil) instead of calling
cancel() later; keep using conn.HandshakeContext(handshakeCtx) and
conn.ConnectionState() as-is and ensure _ = conn.Close() remains after the
handshake.
In `@pkg/tlsx/ztls/ztls.go`:
- Around line 332-356: The function tlsHandshakeWithTimeout currently takes
(tlsConn *tls.Conn, ctx context.Context); change its signature to place ctx
context.Context as the first parameter (tlsHandshakeWithTimeout(ctx
context.Context, tlsConn *tls.Conn)) and update all call sites to pass the
context first, ensuring any references to the function (including method
receiver Client) are adjusted accordingly; rebuild and run tests to confirm no
remaining compile errors from the signature change.
- Around line 260-272: The handshake context/cancel pattern should use defer to
ensure the timeout cancel is always run: after creating handshakeCtx and cancel
(when c.options.Timeout != 0) call defer cancel() so the cancel is guaranteed
even on early returns; remove or avoid the later explicit cancel() call to
prevent double-cancel confusion, and keep the existing
conn.Close()/GetHandshakeLog/enumeratedCiphers logic and the call to
c.tlsHandshakeWithTimeout unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e70e067f-532c-431c-9cc5-dadb7fa01d64
📒 Files selected for processing (2)
pkg/tlsx/tls/tls.gopkg/tlsx/ztls/ztls.go
…fairness The ztls handshake fix from the previous commit only covers the zcrypto TLS path. The standard crypto/tls EnumerateCiphers also called conn.Handshake() directly without any timeout context — the same class of indefinite hang. Fix: use conn.HandshakeContext() with a timeout derived from the configured Timeout value (guarded for Timeout==0). Also fix a deadlock in the ztls timeout path: zcrypto's Conn.Close() acquires the handshake mutex, which the blocked Handshake() goroutine already holds. Closing the TLS conn directly causes a deadlock. Fix: close the raw TCP connection instead, which unblocks the I/O read that the handshake is stuck on. Add a non-blocking select fairness check in the ctx.Done() branch: prefer a completed handshake result over a timeout when both are ready. Add TestHandshakeTimeout regression test: starts a TCP server that accepts but never responds, verifies the handshake returns within the configured timeout rather than hanging indefinitely. Fixes projectdiscovery#819
3842cd9 to
432d670
Compare
Updated: Deadlock fix + tls cipher timeout + regression testSummary of changes (2 commits):Commit 1 — ztls handshake goroutine + ztls cipher timeout
Commit 2 — tls cipher timeout + deadlock fix + regression test
Deadlock detail` Fix: close rawConn instead → net.Read returns error → Handshake() returns → goroutine exits Files modified
|
|
Thanks for the contribution! This has been resolved by #949 which was merged into dev. Your analysis of the synchronous handshake bug was spot on. The fix had the same |
Summary
Fixes #819 — tlsx hangs indefinitely when scanning large target lists (30k+ hosts).
Root Cause
Two concurrency bugs in the ztls client:
Bug 1:
tlsHandshakeWithTimeoutnever times outGo evaluates the send-expression (
tlsConn.Handshake()) before entering theselectstatement. This means the function blocks on the handshake synchronously, and thectx.Done()branch can never fire. If a remote host is slow or unresponsive, the goroutine hangs forever.Bug 2:
EnumerateCiphersusescontext.TODO()context.TODO()has no deadline, so even iftlsHandshakeWithTimeoutworked correctly, cipher enumeration would never time out.Fix
Fix 1: Run handshake in a goroutine
selectcan race it against the context deadlineerrChanis buffered (capacity 1) so the goroutine won't leak if the timeout fires firsttlsConn.Close()is called to unblock any pendingRead/Writein the handshake goroutineFix 2: Replace
context.TODO()with proper timeoutUses the same
Timeout != 0guard pattern already present inConnectWithOptions(line 118) to avoid creating a context that expires instantly when no timeout is configured.Testing
pkg/tlsx/ztls/ztls.goConnectWithOptionsalready uses the same timeout guard)Summary by CodeRabbit
Bug Fixes
Tests
Chores