Skip to content

fix: prevent indefinite hangs during TLS operations (#819)#958

Closed
Gengyscan wants to merge 2 commits intoprojectdiscovery:mainfrom
Gengyscan:fix/prevent-tls-hangs-819
Closed

fix: prevent indefinite hangs during TLS operations (#819)#958
Gengyscan wants to merge 2 commits intoprojectdiscovery:mainfrom
Gengyscan:fix/prevent-tls-hangs-819

Conversation

@Gengyscan
Copy link
Copy Markdown

@Gengyscan Gengyscan commented Mar 13, 2026

Summary

Fixes #819 — tlsx hangs indefinitely when scanning large target lists (30k+ hosts).

Root Cause

Two concurrency bugs in the ztls client:

Bug 1: tlsHandshakeWithTimeout never times out

select {
case <-ctx.Done():
    return errorutil.NewWithTag("ztls", "timeout while attempting handshake")
case errChan <- tlsConn.Handshake():  // ← evaluated BEFORE select
}

Go evaluates the send-expression (tlsConn.Handshake()) before entering the select statement. This means the function blocks on the handshake synchronously, and the ctx.Done() branch can never fire. If a remote host is slow or unresponsive, the goroutine hangs forever.

Bug 2: EnumerateCiphers uses context.TODO()

if err := c.tlsHandshakeWithTimeout(conn, context.TODO()); err == nil {

context.TODO() has no deadline, so even if tlsHandshakeWithTimeout worked correctly, cipher enumeration would never time out.

Fix

Fix 1: Run handshake in a goroutine

errChan := make(chan error, 1)  // buffered to prevent goroutine leak
go func() {
    errChan <- tlsConn.Handshake()
}()
select {
case err := <-errChan:
    // handle result
case <-ctx.Done():
    _ = tlsConn.Close()  // unblock pending I/O in the goroutine
    return timeout error
}
  • Handshake runs in a separate goroutine so select can race it against the context deadline
  • errChan is buffered (capacity 1) so the goroutine won't leak if the timeout fires first
  • On timeout, tlsConn.Close() is called to unblock any pending Read/Write in the handshake goroutine

Fix 2: Replace context.TODO() with proper timeout

handshakeCtx := context.Background()
var cancel context.CancelFunc
if c.options.Timeout != 0 {
    handshakeCtx, cancel = context.WithTimeout(handshakeCtx,
        time.Duration(c.options.Timeout)*time.Second)
}

Uses the same Timeout != 0 guard pattern already present in ConnectWithOptions (line 118) to avoid creating a context that expires instantly when no timeout is configured.

Testing

  • The fix is minimal and contained to pkg/tlsx/ztls/ztls.go
  • Both changes follow existing patterns in the codebase (ConnectWithOptions already uses the same timeout guard)
  • The goroutine pattern follows standard Go concurrency idioms for timeout-aware I/O

Summary by CodeRabbit

  • Bug Fixes

    • Improved TLS handshake timeout handling to avoid hangs and better respect configured timeouts.
    • Enhanced treatment of certificate-only handshake responses to prevent false errors and spurious connection failures.
    • Ensured underlying connections are closed promptly to avoid deadlocks.
  • Tests

    • Added a regression test verifying handshake timeouts complete reliably against non‑responsive servers.
  • Chores

    • Internal API surface for handshake logic adjusted (may affect integrations).

…#819)

Root cause: two concurrency bugs in ztls client caused indefinite hangs
when scanning large target lists.

1. tlsHandshakeWithTimeout evaluated Handshake() synchronously as a
   send-expression in the select statement. Go evaluates the send value
   before entering select, so the timeout branch could never fire.
   Fix: run Handshake() in a goroutine, use a buffered errChan, and
   close the TLS connection on timeout to unblock any pending I/O.

2. EnumerateCiphers used context.TODO() which provides no deadline.
   Fix: create a context.WithTimeout derived from the configured
   Timeout value (guarded for Timeout==0 to avoid instant expiry).

Fixes projectdiscovery#819
@neo-by-projectdiscovery-dev
Copy link
Copy Markdown

neo-by-projectdiscovery-dev bot commented Mar 13, 2026

Neo - PR Security Review

No security issues found

Highlights

  • Fixes goroutine blocking issue by running TLS handshake in separate goroutine, allowing timeout to work correctly
  • Replaces context.TODO() with proper timeout context in EnumerateCiphers to enable timeout protection
  • Uses buffered channel (capacity 1) to prevent goroutine leaks when timeout fires first
Hardening Notes
  • Consider adding validation in clients.Options to reject negative or excessively large Timeout values (e.g., > 300 seconds) to prevent configuration errors
  • In EnumerateCiphers at line 266, consider adding nil check for h1.ServerHello before accessing CipherSuite to prevent potential panics in edge cases

Comment @pdneo help for available commands. · Open in Neo

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6b136700-872e-46b2-b751-0475c4f239bc

📥 Commits

Reviewing files that changed from the base of the PR and between 3842cd9 and 432d670.

📒 Files selected for processing (3)
  • pkg/tlsx/tls/tls.go
  • pkg/tlsx/ztls/timeout_test.go
  • pkg/tlsx/ztls/ztls.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/tlsx/tls/tls.go

Walkthrough

Handshake logic was reworked to run TLS handshakes in a goroutine with an optional timeout-aware context, propagate the raw underlying connection for best-effort cleanup on timeout, treat tls.ErrCertsOnly as success, and add a regression test verifying the handshake timeout behavior.

Changes

Cohort / File(s) Summary
ztls handshake & cipher enumeration
pkg/tlsx/ztls/ztls.go
EnumerateCiphers and ConnectWithOptions now build an optional timeout context and pass the underlying rawConn plus the *tls.Conn and ctx into tlsHandshakeWithTimeout. tlsHandshakeWithTimeout signature changed to accept (tlsConn *tls.Conn, rawConn net.Conn, ctx context.Context). Handshake runs in a goroutine; tls.ErrCertsOnly is treated as success; on timeout the underlying rawConn is closed and a timeout error tagged with "ztls" is returned. Context cancelation and connection cleanup paths were tightened.
tls handshake with context
pkg/tlsx/tls/tls.go
Replaced direct Handshake calls with HandshakeContext using an optional timeout context created from options. Handshake runs with cancel semantics and preserves cipher enumeration and connection close behavior.
timeout regression test
pkg/tlsx/ztls/timeout_test.go
Adds TestHandshakeTimeout: spins a blackhole TCP listener, performs a handshake with a short deadline via tlsHandshakeWithTimeout, asserts a timeout error is returned and the call completes within an observable window (no indefinite hang).

Sequence Diagram(s)

sequenceDiagram
    participant Caller as EnumerateCiphers / Caller
    participant CtxMgr as Context (with optional timeout)
    participant Gor as Handshake Goroutine
    participant TLS as tls.Conn
    participant Raw as raw net.Conn

    Caller->>CtxMgr: create ctx (options.Timeout?) + cancel
    Caller->>Gor: call tlsHandshakeWithTimeout(TLS, Raw, ctx)
    Gor->>TLS: invoke Handshake() (in goroutine)
    Gor-->>Caller: send result / error via channel
    alt Handshake completes before deadline
        Caller->>CtxMgr: cancel ctx
        Caller->>Raw: close (as before)
        Caller-->>Caller: return success (treat ErrCertsOnly as success)
    else Context deadline reached / timeout
        CtxMgr->>Raw: best-effort check for concurrent result
        CtxMgr->>Raw: close rawConn
        Caller-->>Caller: return timeout error (tagged "ztls")
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nibbled at timeouts, hopped into the net,
Spawned a tiny goroutine, waited, then reset.
If handshakes stall or servers won’t play,
I close the raw tunnel and hop on my way.
Hooray for tidy handshakes — carrots for the day! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix: prevent indefinite hangs during TLS operations' directly and clearly describes the main change—addressing indefinite hangs in TLS operations as documented in issue #819.
Linked Issues check ✅ Passed The PR implements all primary coding objectives: (1) tlsHandshakeWithTimeout now runs handshakes in a goroutine with timeout via context, (2) EnumerateCiphers uses context.WithTimeout instead of context.TODO(), and (3) timeout errors are properly returned to unblock operations.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing TLS hangs: modifications in pkg/tlsx/ztls/ztls.go (core fix), pkg/tlsx/tls/tls.go (consistent handshake timing), and pkg/tlsx/ztls/timeout_test.go (regression test). No unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Gengyscan
Copy link
Copy Markdown
Author

/claim #819

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
pkg/tlsx/ztls/ztls.go (2)

332-356: Well-implemented fix for the root cause of indefinite hangs.

The implementation correctly addresses the original issue where Handshake() was being evaluated synchronously in the select expression. Key strengths:

  1. Buffered channel (capacity 1) ensures the goroutine can always complete without blocking
  2. Non-blocking fairness check (lines 346-353) prevents discarding a result that arrives simultaneously with the deadline
  3. Connection close (line 354) unblocks any pending I/O operations in the handshake goroutine

Minor style note: Go convention places ctx context.Context as the first parameter. Consider reordering for consistency with standard library patterns.

♻️ Optional: Reorder parameters to follow Go convention
-func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error {
+func (c *Client) tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn) error {

And update the call sites:

-	err = c.tlsHandshakeWithTimeout(tlsConn, ctx)
+	err = c.tlsHandshakeWithTimeout(ctx, tlsConn)
-	if err := c.tlsHandshakeWithTimeout(conn, handshakeCtx); err == nil {
+	if err := c.tlsHandshakeWithTimeout(handshakeCtx, conn); err == nil {

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 332 - 356, The function
tlsHandshakeWithTimeout currently takes (tlsConn *tls.Conn, ctx
context.Context); change its signature to place ctx context.Context as the first
parameter (tlsHandshakeWithTimeout(ctx context.Context, tlsConn *tls.Conn)) and
update all call sites to pass the context first, ensuring any references to the
function (including method receiver Client) are adjusted accordingly; rebuild
and run tests to confirm no remaining compile errors from the signature change.

260-272: LGTM - Timeout handling mirrors the pattern in tls.go.

The context creation and cancellation pattern is consistent with the standard TLS client implementation.

Same optional suggestion applies here: consider using defer cancel() immediately after creation for safety.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/ztls/ztls.go` around lines 260 - 272, The handshake context/cancel
pattern should use defer to ensure the timeout cancel is always run: after
creating handshakeCtx and cancel (when c.options.Timeout != 0) call defer
cancel() so the cancel is guaranteed even on early returns; remove or avoid the
later explicit cancel() call to prevent double-cancel confusion, and keep the
existing conn.Close()/GetHandshakeLog/enumeratedCiphers logic and the call to
c.tlsHandshakeWithTimeout unchanged.
pkg/tlsx/tls/tls.go (1)

239-251: LGTM - Timeout handling for cipher enumeration is correctly implemented.

The context-based timeout pattern is consistent with ConnectWithOptions and uses the standard HandshakeContext method which properly respects context cancellation. The nil-check before calling cancel() handles the Timeout==0 case correctly.

Minor suggestion: Using defer cancel() immediately after creation is more idiomatic and guards against future code changes that might introduce early returns.

♻️ Optional: Use defer for cancel
 		handshakeCtx := context.Background()
 		var cancel context.CancelFunc
 		if c.options.Timeout != 0 {
 			handshakeCtx, cancel = context.WithTimeout(handshakeCtx, time.Duration(c.options.Timeout)*time.Second)
+			defer cancel()
 		}
 		if err := conn.HandshakeContext(handshakeCtx); err == nil {
 			ciphersuite := conn.ConnectionState().CipherSuite
 			enumeratedCiphers = append(enumeratedCiphers, tls.CipherSuiteName(ciphersuite))
 		}
-		if cancel != nil {
-			cancel()
-		}

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/tlsx/tls/tls.go` around lines 239 - 251, The timeout context and cancel
handling around conn.HandshakeContext should use an immediate defer for cancel
to be idiomatic and prevent future early-return leaks: after creating
handshakeCtx and cancel via context.WithTimeout (the handshakeCtx, cancel
variables), call defer cancel() right away (guarding only when cancel is
non-nil) instead of calling cancel() later; keep using
conn.HandshakeContext(handshakeCtx) and conn.ConnectionState() as-is and ensure
_ = conn.Close() remains after the handshake.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/tlsx/tls/tls.go`:
- Around line 239-251: The timeout context and cancel handling around
conn.HandshakeContext should use an immediate defer for cancel to be idiomatic
and prevent future early-return leaks: after creating handshakeCtx and cancel
via context.WithTimeout (the handshakeCtx, cancel variables), call defer
cancel() right away (guarding only when cancel is non-nil) instead of calling
cancel() later; keep using conn.HandshakeContext(handshakeCtx) and
conn.ConnectionState() as-is and ensure _ = conn.Close() remains after the
handshake.

In `@pkg/tlsx/ztls/ztls.go`:
- Around line 332-356: The function tlsHandshakeWithTimeout currently takes
(tlsConn *tls.Conn, ctx context.Context); change its signature to place ctx
context.Context as the first parameter (tlsHandshakeWithTimeout(ctx
context.Context, tlsConn *tls.Conn)) and update all call sites to pass the
context first, ensuring any references to the function (including method
receiver Client) are adjusted accordingly; rebuild and run tests to confirm no
remaining compile errors from the signature change.
- Around line 260-272: The handshake context/cancel pattern should use defer to
ensure the timeout cancel is always run: after creating handshakeCtx and cancel
(when c.options.Timeout != 0) call defer cancel() so the cancel is guaranteed
even on early returns; remove or avoid the later explicit cancel() call to
prevent double-cancel confusion, and keep the existing
conn.Close()/GetHandshakeLog/enumeratedCiphers logic and the call to
c.tlsHandshakeWithTimeout unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e70e067f-532c-431c-9cc5-dadb7fa01d64

📥 Commits

Reviewing files that changed from the base of the PR and between 761a872 and 3842cd9.

📒 Files selected for processing (2)
  • pkg/tlsx/tls/tls.go
  • pkg/tlsx/ztls/ztls.go

…fairness

The ztls handshake fix from the previous commit only covers the zcrypto
TLS path. The standard crypto/tls EnumerateCiphers also called
conn.Handshake() directly without any timeout context — the same class
of indefinite hang.

Fix: use conn.HandshakeContext() with a timeout derived from the
configured Timeout value (guarded for Timeout==0).

Also fix a deadlock in the ztls timeout path: zcrypto's Conn.Close()
acquires the handshake mutex, which the blocked Handshake() goroutine
already holds. Closing the TLS conn directly causes a deadlock.
Fix: close the raw TCP connection instead, which unblocks the I/O
read that the handshake is stuck on.

Add a non-blocking select fairness check in the ctx.Done() branch:
prefer a completed handshake result over a timeout when both are ready.

Add TestHandshakeTimeout regression test: starts a TCP server that
accepts but never responds, verifies the handshake returns within the
configured timeout rather than hanging indefinitely.

Fixes projectdiscovery#819
@Gengyscan Gengyscan force-pushed the fix/prevent-tls-hangs-819 branch from 3842cd9 to 432d670 Compare March 17, 2026 18:42
@Gengyscan
Copy link
Copy Markdown
Author

Updated: Deadlock fix + tls cipher timeout + regression test

Summary of changes (2 commits):

Commit 1 — ztls handshake goroutine + ztls cipher timeout

  • lsHandshakeWithTimeout: move Handshake() into goroutine with buffered errChan
  • EnumerateCiphers (ztls): replace context.TODO() with context.WithTimeout

Commit 2 — tls cipher timeout + deadlock fix + regression test

  • EnumerateCiphers (tls): replace bare Handshake() with HandshakeContext() + timeout
  • Deadlock fix: zcrypto's Conn.Close() acquires the handshake mutex, which the blocked Handshake() goroutine already holds. Calling lsConn.Close() on timeout causes a deadlock (confirmed via test). Fix: close the raw TCP connection instead, which unblocks the I/O read and lets the handshake goroutine exit.
  • Select fairness: on ctx.Done(), try errChan non-blocking first — prefer a completed result over a timeout when both fire simultaneously.
  • Regression test: TestHandshakeTimeout — TCP server that accepts but never responds, verifies handshake returns within 500ms timeout rather than hanging.

Deadlock detail

`
goroutine A: Handshake() → readFromUntil() → BLOCKED on net.Read (holds handshakeMutex)
goroutine B: <-ctx.Done() → tlsConn.Close() → Lock(handshakeMutex) → DEADLOCK

Fix: close rawConn instead → net.Read returns error → Handshake() returns → goroutine exits
`

Files modified

  • pkg/tlsx/ztls/ztls.go — handshake goroutine, rawConn close, cipher timeout, select fairness
  • pkg/tlsx/tls/tls.go — HandshakeContext() with timeout for cipher enumeration
  • pkg/tlsx/ztls/timeout_test.go — regression test (new file)

@Mzack9999
Copy link
Copy Markdown
Member

Thanks for the contribution! This has been resolved by #949 which was merged into dev. Your analysis of the synchronous handshake bug was spot on. The fix had the same tlsConn.Close() deadlock risk as other attempts — the raw TCP connection needs to be closed instead to safely unblock zcrypto's Handshake(). Closing as superseded.

@Mzack9999 Mzack9999 closed this Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tlsx hangs indefinitely for some hosts

2 participants