Skip to content

fix: prevent indefinite hangs during TLS operations#891

Closed
robutrader wants to merge 2 commits intoprojectdiscovery:mainfrom
robutrader:fix/indefinite-hang-timeout
Closed

fix: prevent indefinite hangs during TLS operations#891
robutrader wants to merge 2 commits intoprojectdiscovery:mainfrom
robutrader:fix/indefinite-hang-timeout

Conversation

@robutrader
Copy link
Copy Markdown

@robutrader robutrader commented Feb 4, 2026

Fixes #819

Summary

This PR fixes the issue where tlsx hangs indefinitely when processing large target lists (~25k+ hosts). The output gets cut mid-line (e.g., mid-JSON key) because the process hangs during a write operation.

Root Causes & Fixes

1. Broken timeout in ztls handshake

The tlsHandshakeWithTimeout function ran the handshake inline in a select statement instead of a goroutine, making the timeout ineffective. The handshake would block forever for unresponsive servers.

Fix: Run handshake in a goroutine so the select can properly choose between handshake completion and context timeout. Close the connection on timeout to unblock the goroutine.

2. Missing timeout in cipher enumeration

Both ztls and tls EnumerateCiphers methods called Handshake() without any timeout context, causing indefinite hangs during cipher enumeration.

Fix: Added proper timeout context using c.options.Timeout for all cipher enumeration handshakes.

3. No periodic flush for file output

The file writer only flushed on Close(), meaning data buffered in memory would be lost if the process hung or crashed.

Fix: Added periodic flush mechanism triggered every 100 writes to minimize data loss on unexpected termination.

Testing

  • Added unit tests for timeout cancellation behavior
  • Added unit tests for periodic flush mechanism
  • All tests pass with race detector enabled
  • go vet passes with no issues

Summary by CodeRabbit

  • New Features

    • Periodic flushing for file output to reduce data loss on hangs or crashes.
    • Context-aware TLS handshakes with per-attempt timeouts to avoid indefinite hangs.
  • Tests

    • Tests verifying periodic flush behavior and write counting.
    • Tests validating TLS handshake timeouts, cancellation behavior, and no-deadlock operation.

Fixes projectdiscovery#819

## Summary
This PR fixes the issue where tlsx hangs indefinitely when processing
large target lists (~25k+ hosts). The output gets cut mid-line
(e.g., mid-JSON key) because the process hangs during a write operation.

## Root Causes & Fixes

### 1. Broken timeout in ztls handshake
The tlsHandshakeWithTimeout function ran the handshake inline in a
select statement instead of a goroutine, making the timeout ineffective.
The handshake would block forever for unresponsive servers.

**Fix:** Run handshake in a goroutine so the select can properly choose
between handshake completion and context timeout. Close the connection
on timeout to unblock the goroutine.

### 2. Missing timeout in cipher enumeration
Both ztls and tls EnumerateCiphers methods called Handshake() without
any timeout context, causing indefinite hangs during cipher enumeration.

**Fix:** Added proper timeout context using c.options.Timeout for all
cipher enumeration handshakes.

### 3. No periodic flush for file output
The file writer only flushed on Close(), meaning data buffered in memory
would be lost if the process hung or crashed.

**Fix:** Added periodic flush mechanism triggered every 100 writes to
minimize data loss on unexpected termination.

## Testing
- Added unit tests for timeout cancellation behavior
- Added unit tests for periodic flush mechanism
- All tests pass with race detector enabled
- go vet passes with no issues
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 4, 2026

Walkthrough

Adds a periodic flush counter to the file writer (flush every 100 writes) and implements context-aware TLS handshakes with per-iteration deadlines and goroutine-driven handshake execution to avoid indefinite hangs.

Changes

Cohort / File(s) Summary
File Writer — periodic flush & tests
pkg/output/file_writer.go, pkg/output/file_writer_test.go
Adds flushThreshold (100) and an atomic writeCount to fileWriter; increments count on each write and flushes when threshold is reached. New tests validate periodic flushing, write count, and final flush on Close.
TLS handshake timeout handling & tests
pkg/tlsx/tls/tls.go, pkg/tlsx/ztls/ztls.go, pkg/tlsx/ztls/timeout_test.go
Replaces direct Handshake calls with context-aware HandshakeContext or tlsHandshakeWithTimeout which runs handshakes in a goroutine, uses per-iteration deadlines from options.Timeout, cancels contexts after attempts, closes connections on timeout, and continues enumeration on failures. Adds tests for timeout/cancellation behavior.

Sequence Diagram(s)

sequenceDiagram
    actor Enum as EnumerateCiphers
    participant Ctx as Context (with timeout)
    participant Handler as tlsHandshakeWithTimeout
    participant GR as Goroutine (handshake)
    participant Conn as TLS Conn

    Enum->>Ctx: Create per-iteration context (options.Timeout)
    Enum->>Handler: Call tlsHandshakeWithTimeout(conn, ctx)

    Handler->>GR: Spawn goroutine to run conn.Handshake()
    par
        GR->>Conn: Perform Handshake
        Conn-->>GR: Handshake result (success / error)
    and
        Handler->>Ctx: Monitor ctx.Done()
        Ctx-->>Handler: Timeout / cancel signal
    end

    alt Handshake completes first
        GR-->>Handler: return result
        Handler-->>Enum: success -> record cipher / or error -> continue
    else Context times out first
        Handler->>Conn: Close connection to unblock goroutine
        Handler-->>Enum: return timeout error -> continue enumeration
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I thump my paw and count each write in tune,

Every hundred hops I flush beneath the moon.
Context winds the handshake, timers keep things kind,
No more frozen lines — smooth output you will find.
Hooray for careful hops and deadlines well-designed!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: prevent indefinite hangs during TLS operations' accurately describes the main change—addressing indefinite hangs in TLS operations with timeout support.
Linked Issues check ✅ Passed All objectives from issue #819 are met: TLS handshakes now respect timeouts via context, cipher enumeration applies configured timeouts, and periodic file flushing prevents data loss.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the hang issue: timeout handling in TLS operations and periodic file flushing to prevent data loss.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pkg/tlsx/tls/tls.go`:
- Around line 239-245: The loop creates a timeout context with ctx, cancel :=
context.WithTimeout(...) and defers cancel(), which retains all timers until the
function exits; change this so you do not defer cancel() inside the loop—only
call cancel() explicitly at the end of each iteration after the handshake
completes (and only when c.options.Timeout != 0 / cancel is non-nil), ensuring
you don't shadow variables (use the existing ctx variable or a new scoped one)
so each iteration cancels its timer immediately instead of accumulating
goroutines.

In `@pkg/tlsx/ztls/ztls.go`:
- Around line 260-266: The current cipher-enumeration handshake uses
context.WithTimeout and defers cancel(), which delays timer cancelation until
function return; instead, inside the cipher enumeration loop create the
per-iteration context (ctx) and cancel func when c.options.Timeout != 0, pass
ctx into tlsHandshakeWithTimeout, and explicitly call cancel() immediately after
the tlsHandshakeWithTimeout call (regardless of outcome) rather than deferring
it; adjust use of ctx/tlsHandshakeWithTimeout/c.options.Timeout so each
iteration releases its timer promptly.
🧹 Nitpick comments (1)
pkg/output/file_writer_test.go (1)

1-77: Consider adding edge case tests.

The tests effectively cover the happy path for periodic flush, write counting, and close behavior. For more comprehensive coverage, consider adding:

  1. A test verifying the exact line count in the flushed file matches the number of writes.
  2. A test for multiple flush cycles (e.g., 250 writes should trigger 2 flushes).

Address CodeRabbit review feedback: defer cancel() in loops keeps
all timers alive until function exit. Call cancel() explicitly
after each handshake iteration to release resources immediately.

Affected files:
- pkg/tlsx/tls/tls.go
- pkg/tlsx/ztls/ztls.go
@Mzack9999
Copy link
Copy Markdown
Member

Thanks for the contribution! This has been resolved by #949 which was merged into dev. Your fix had good scope covering the handshake and cipher enumeration in both TLS backends. The remaining issue was closing tlsConn instead of the raw TCP connection on timeout, which risks deadlocking on zcrypto's internal mutex. Closing as superseded.

@Mzack9999 Mzack9999 closed this Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tlsx hangs indefinitely for some hosts

3 participants