
fix(js): prevent pool slot starvation under load #6896

Open
AuditeMarlow wants to merge 1 commit into projectdiscovery:dev from guardian360:fix/js-pool-slot-starvation

Conversation


AuditeMarlow commented Feb 13, 2026

Proposed changes

Partially addresses #6894 — JS protocol templates silently drop all matches
under load. This PR fixes pool slot starvation (the primary cause); a follow-up
is needed to propagate execution contexts into JS library network calls.

Root cause: zombie goroutines from timed-out JS executions hold pool slots
indefinitely. compiler.go creates a 20s deadline context but passes the
original (no-deadline) context to ExecuteProgram. Both pooljsc.Add() and
ephemeraljsc.Add() block with context.Background(), so goroutines waiting
for a slot never fail fast. Once a slot is acquired, JS library network calls
use context.TODO() — they hang on non-matching targets with no deadline.
ExecFuncWithTwoReturns abandons the caller after 20s, but the goroutine
continues holding its pool slot via defer Done(). Both pools (80 pooled + 20
non-pooled slots) fill with zombies; subsequent executions time out waiting for
slots.

Three changes fix slot lifecycle management:

  1. Propagate the deadline context (compiler.go): set opts.Context = ctx
    before calling ExecuteProgram so the 20s deadline flows through to both
    execution paths.

  2. Fail-fast slot acquisition (pool.go, non-pool.go): replace Add()
    with AddWithContext(ctx). Goroutines waiting for a slot return immediately
    when the deadline expires instead of blocking indefinitely.

  3. Watchdog goroutine (pool.go, non-pool.go): a background goroutine
    listens on ctx.Done() and releases the pool slot when the deadline expires,
    even if the zombie is still running. An atomic.Bool ensures exactly one
    Done() call between the watchdog and the normal defer path.

Note: this PR does not address context.TODO() usage in the 16 JS library
files. That is a follow-up concern. The watchdog already prevents permanent pool
starvation; context propagation into JS libraries would make zombies exit faster.

Proof

Setup

1. Start a Redis container (no password):

docker run -d --name redis-repro -p 6379:6379 redis:latest

2. Save as listeners.py and start 25 TCP listeners (15s response delay):

Each listener accepts a connection, waits 15 seconds, then responds with
garbage. The 15s delay is within Nuclei's default JS execution timeout (20s)
but long enough to hold pool slots. Open ports cause isPortOpen pre-conditions
to pass, so JS templates execute against all listeners and consume pool slots.

listeners.py
#!/usr/bin/env python3
"""Start TCP listeners that accept connections but don't speak Redis."""
import socket
import threading
import signal
import sys
import time

PORT_START = 20000
COUNT = int(sys.argv[1]) if len(sys.argv) > 1 else 25
DELAY = 15

def handle_conn(conn):
    try:
        conn.recv(1024)
        time.sleep(DELAY)
        conn.sendall(b"-ERR not redis\r\n")
    except Exception:
        pass
    finally:
        conn.close()

def listener(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('127.0.0.1', port))
    s.listen(5)
    while True:
        try:
            conn, _ = s.accept()
            threading.Thread(target=handle_conn, args=(conn,), daemon=True).start()
        except OSError:
            break

for p in range(PORT_START, PORT_START + COUNT):
    threading.Thread(target=listener, args=(p,), daemon=True).start()

print(f"Listening on {COUNT} ports ({PORT_START}-{PORT_START + COUNT - 1}), delay={DELAY}s")
signal.pause()

$ python3 listeners.py 25

3. Generate target list (25 haystack + 1 Redis last):

python3 -c "
for i in range(20000, 20025):
    print(f'localhost:{i}')
print('localhost:6379')
" > targets.txt

Baseline — single target (identical on both dev and fix)

$ nuclei -t ~/nuclei-templates/javascript/ -target localhost:6379 -no-mhe -silent
[redis-default-logins] [javascript] [high] localhost:6379 [passwords=""]
[redis-default-logins] [javascript] [high] localhost:6379 [passwords="iamadmin"]
[redis-default-logins] [javascript] [high] localhost:6379 [passwords="root"]
[redis-default-logins] [javascript] [high] localhost:6379 [passwords="password"]
[redis-default-logins] [javascript] [high] localhost:6379 [passwords="admin"]
[redis-info] [javascript] [info] localhost:6379
[snmpv3-detect] [javascript] [info] localhost:6379

7 matches.

Under load — dev branch (567794f), 3 consecutive runs

$ nuclei -t ~/nuclei-templates/javascript/ -l targets.txt -no-mhe -silent | wc -l
0
$ nuclei -t ~/nuclei-templates/javascript/ -l targets.txt -no-mhe -silent | wc -l
0
$ nuclei -t ~/nuclei-templates/javascript/ -l targets.txt -no-mhe -silent | wc -l
0

0 matches every time. All results silently dropped.

Under load — this branch, 3 consecutive runs

$ nuclei -t ~/nuclei-templates/javascript/ -l targets.txt -no-mhe -silent | wc -l
15
$ nuclei -t ~/nuclei-templates/javascript/ -l targets.txt -no-mhe -silent | wc -l
33
$ nuclei -t ~/nuclei-templates/javascript/ -l targets.txt -no-mhe -silent | wc -l
9

9–33 matches per run (vs 0 on dev). All three runs find redis-info
(Export, pooled path). redis-default-logins (non-Export, 20-slot path)
recovered in 2 of 3 runs. The remaining variance is caused by context.TODO()
in JS library network calls — zombie goroutines still hold slots for up to 15s
(the listener delay) before completing, limiting throughput on the crowded
20-slot non-pooled path. That is a follow-up fix.

Unit tests

Three new tests in pkg/js/compiler/pool_test.go verify the fix mechanics:

  • TestAddWithContextRespectsDeadline — slot acquisition fails fast when
    deadline expires (vs blocking indefinitely with Add())
  • TestWatchdogReleasesSlotOnDeadline — watchdog frees a slot held by a zombie,
    allowing a new execution to acquire it
  • TestPoolExhaustionRecovery — full starvation/recovery cycle: all slots
    filled with zombies, deadlines expire, subsequent acquisitions succeed
$ go test -race -v -run 'TestAddWithContext|TestWatchdog|TestPoolExhaustion' ./pkg/js/compiler/...
=== RUN   TestAddWithContextRespectsDeadline
--- PASS: TestAddWithContextRespectsDeadline (0.05s)
=== RUN   TestWatchdogReleasesSlotOnDeadline
--- PASS: TestWatchdogReleasesSlotOnDeadline (0.07s)
=== RUN   TestPoolExhaustionRecovery
--- PASS: TestPoolExhaustionRecovery (0.20s)
PASS

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • Improvements

    • Script execution now respects outer cancellation and deadlines, making timeouts more predictable.
    • Resource pool handling and cleanup are more robust, ensuring slots are reliably released when executions are cancelled or time out.
  • Tests

    • Added tests validating deadline-aware acquisitions, watchdog-driven slot release on deadline, and pool exhaustion recovery.

auto-assign bot requested a review from dwisiswant0 on February 13, 2026 at 12:51

coderabbitai bot commented Feb 13, 2026

Walkthrough

Propagates caller context into JS execution by setting opts.Context, replaces pool Add/Done with context-aware AddWithContext, and adds atomic guards plus watchdog goroutines to ensure pooled and non-pooled slots are released exactly once on cancellation or deadline.

Changes

  • Compiler entry (pkg/js/compiler/compiler.go): sets opts.Context = ctx before invoking ExecuteProgram so the inner execution observes outer cancellation/deadlines.
  • Non-pooled execution (pkg/js/compiler/non-pool.go): replaces Add()/Done() with AddWithContext(opts.Context), handles acquisition errors, and adds a sync/atomic flag, a done channel, and a cancellation watcher to ensure Done() is called exactly once.
  • Pooled execution (pkg/js/compiler/pool.go): adds sync/atomic, stores ctx in the runtime context, uses AddWithContext(opts.Context) with error handling, and introduces an atomic slotReleased flag, a watchdog goroutine, and deterministic deferred cleanup to guarantee a single Done() invocation and close the watcher channel.
  • Tests (pkg/js/compiler/pool_test.go): adds tests validating AddWithContext deadline behavior, watchdog-driven slot release on deadline, and pool exhaustion/recovery with zombie workers.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Executor
    participant Pool
    participant Watchdog
    participant Cleanup

    Client->>Executor: ExecuteProgram(request)
    Executor->>Executor: derive ctx (with timeout/deadline)
    Executor->>Pool: AddWithContext(opts.Context)
    alt slot acquired
        Pool-->>Executor: slot acquired
        Executor->>Watchdog: start cancellation watcher (goroutine)
        Executor->>Executor: run JS runtime using opts.Context
        par normal run
            Executor->>Executor: execute script
        and cancellation path
            Watchdog->>Watchdog: wait for ctx.Done() or done signal
            alt ctx.Done()
                Watchdog->>Pool: call Done() (CAS guard ensures single call)
            end
        end
        Executor->>Cleanup: deferred cleanup
        Cleanup->>Pool: call Done() if not already (CAS)
        Cleanup->>Watchdog: close done channel
        Cleanup-->>Client: return result
    else AddWithContext fails
        Pool-->>Executor: error (deadline/full)
        Executor-->>Client: return error
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I set a context, snug and bright,
Watched slots free when day turned night,
Atomic paws and a tiny bell,
Deadlines rang, the pool did swell,
Hooray — the rabbit hops, all's right! 🎉

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 66.67%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3)
  • Description Check: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: the title 'fix(js): prevent pool slot starvation under load' directly and clearly summarizes the main change.
  • Merge Conflict Detection: no merge conflicts detected when merging into dev.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
pkg/js/compiler/pool.go (1)

145-172: Watchdog pattern is correct; consider extracting the shared helper.

The CAS-guarded watchdog + defer cleanup is identical in both pool.go and non-pool.go. The logic is sound, but the duplication could be eliminated with a small helper, e.g.:

♻️ Optional: extract shared watchdog helper
// acquireSlot acquires a pool slot respecting ctx and returns a release
// function that is safe to call exactly once (idempotent).  A background
// watchdog releases the slot if ctx expires before release is called.
func acquireSlot(ctx context.Context, pool *syncutil.AdaptiveWaitGroup) (release func(), err error) {
	if err := pool.AddWithContext(ctx); err != nil {
		return nil, err
	}
	var released atomic.Bool
	done := make(chan struct{})
	go func() {
		select {
		case <-ctx.Done():
			if released.CompareAndSwap(false, true) {
				pool.Done()
			}
		case <-done:
		}
	}()
	return func() {
		close(done)
		if released.CompareAndSwap(false, true) {
			pool.Done()
		}
	}, nil
}

Usage in both files would reduce to:

release, err := acquireSlot(opts.Context, pooljsc)
if err != nil {
    return nil, err
}
defer release()
pkg/js/compiler/pool_test.go (1)

118-127: Zombie goroutines linger for 10 s after the test returns.

The sleeping goroutines outlive the test function. While not a correctness issue (they're goroutine-safe), they add 10 s of background noise to the test process. Consider using a context-aware sleep so they exit promptly:

♻️ Suggested improvement
-		// Zombie worker: blocks for 10s simulating a hung network call.
-		go func() {
-			defer func() {
-				close(done)
-				if released.CompareAndSwap(false, true) {
-					pool.Done()
-				}
-			}()
-			time.Sleep(10 * time.Second)
-		}()
+		// Zombie worker: blocks simulating a hung network call, but
+		// exits promptly once the test's context is canceled.
+		zombieCtx := ctx // capture per-iteration
+		go func() {
+			defer func() {
+				close(done)
+				if released.CompareAndSwap(false, true) {
+					pool.Done()
+				}
+			}()
+			select {
+			case <-time.After(10 * time.Second):
+			case <-zombieCtx.Done():
+				// Still sleep briefly so we don't beat the watchdog.
+				time.Sleep(50 * time.Millisecond)
+			}
+		}()


AuditeMarlow force-pushed the fix/js-pool-slot-starvation branch from 1618147 to 4a26c99 on February 13, 2026 at 13:07. The updated commit message:
Zombie goroutines from timed-out JS executions held pool slots
indefinitely: Add() blocked with context.Background(), and defer Done()
only ran when the goroutine eventually completed. Under load, both
pools (80 pooled + 20 non-pooled slots) filled with zombies, silently
dropping all subsequent matches.

Three changes fix slot lifecycle management:

1. Propagate the 20s deadline context into ExecuteProgram (compiler.go)
   so both execution paths can respect the deadline.

2. Replace Add() with AddWithContext(ctx) in both pool.go and
   non-pool.go so goroutines waiting for a slot fail fast when the
   deadline expires instead of blocking indefinitely.

3. Add a watchdog goroutine that releases the slot when the deadline
   expires, even if the zombie is still running. An atomic.Bool
   ensures exactly one Done() call between the watchdog and the
   normal defer path.
