Fix segfaults caused by race condition between ntlm close handler and client conn cleanup #3

Draft · Copilot wants to merge 2 commits into master from copilot/fix-nginx-segfault-issue

Conversation


Copilot AI commented May 5, 2026

When a client connection closes, ngx_http_upstream_client_conn_cleanup posts the peer read event and then immediately moves the cache item to the free queue. When that posted event later fires, ngx_http_upstream_ntlm_close_handler operates on an item that is already in the free queue, removing and re-inserting it, which corrupts the queue and leads to segfaults.

Changes

  • ngx_http_upstream_client_conn_cleanup: Replace the post-event approach with a direct, atomic close sequence — set item->peer_connection = NULL first (acts as a sentinel), move item to free queue, then call ngx_http_upstream_ntlm_close() directly. No posted event, no window for the race.

  • ngx_http_upstream_ntlm_close_handler (close: label): Add a guard on item->peer_connection == NULL — if the cleanup handler already processed this item, just close the connection and return without touching the queue.

// cleanup handler — atomic, no posted event
if (item->peer_connection != NULL) {
    c = item->peer_connection;
    item->peer_connection = NULL;          // sentinel for close_handler
    ngx_queue_remove(&item->queue);
    ngx_queue_insert_head(&item->conf->free, &item->queue);
    ngx_http_upstream_ntlm_close(c);
}

// close_handler — guard against double-processing
item = c->data;
if (item->peer_connection == NULL) {
    ngx_http_upstream_ntlm_close(c);      // cleanup already moved the item
    return;
}
Original prompt

Problem

When using this module with nginx 1.28.3, segfaults occur in ngx_http_upstream_ntlm_close_handler and ngx_http_upstream_client_conn_cleanup due to a race condition between these two functions manipulating the same cache item's queue membership and peer connection.

Root Cause

When a client connection closes, ngx_http_upstream_client_conn_cleanup (line 371):

  1. Sets item->peer_connection->read->timedout = 1
  2. Posts item->peer_connection->read to ngx_posted_events
  3. Removes item from the cache queue and inserts it into the free queue (lines 385-386)

Later, when the posted event fires, ngx_http_upstream_ntlm_close_handler (line 394) runs and:

  1. Gets item = c->data — the same item that's now already in the free queue
  2. Sets item->peer_connection = NULL
  3. Calls ngx_queue_remove(&item->queue) — removes from the free queue (corruption!)
  4. Inserts into conf->free again — double insertion / queue corruption

This causes three types of crashes:

  • Segfault in ngx_http_upstream_ntlm_close_handler accessing corrupted queue/item data
  • Segfault in ngx_http_upstream_handler because queue corruption causes a freed/reused connection to still be found in cache
  • Segfault in ngx_http_upstream_client_conn_cleanup because item->peer_connection->read is already freed

Fix

  1. In ngx_http_upstream_client_conn_cleanup: Instead of posting an event and then moving the queue item (which creates the race), close the peer connection directly and move the queue item to free in one atomic operation. Set item->peer_connection = NULL before closing so the close handler can detect it's already been handled.

  2. In ngx_http_upstream_ntlm_close_handler: Add a guard at the close: label — if item->peer_connection is already NULL, it means the cleanup handler already processed this item, so just close the connection and return without touching the queue.

Patched ngx_http_upstream_client_conn_cleanup:

static void ngx_http_upstream_client_conn_cleanup(void *data) {
    ngx_http_upstream_ntlm_cache_t *item = data;
    ngx_connection_t *c;

    ngx_log_debug2(
        NGX_LOG_DEBUG_HTTP, ngx_cycle->log, 0,
        "ntlm client connection closed %p, dropping peer connection %p",
        item->client_connection, item->peer_connection);

    if (item->peer_connection != NULL) {
        c = item->peer_connection;

        // Null out peer_connection BEFORE closing so the close_handler
        // won't double-process this item
        item->peer_connection = NULL;

        // Remove from cache queue and move to free queue
        ngx_queue_remove(&item->queue);
        ngx_queue_insert_head(&item->conf->free, &item->queue);

        // Close the peer connection directly instead of posting an event
        ngx_http_upstream_ntlm_close(c);
    }
}

Patched ngx_http_upstream_ntlm_close_handler (close label):

close:

    ngx_log_debug2(NGX_LOG_DEBUG_HTTP, ev->log, 0,
                   "ntlm close peer connection %p, timeout %ui", c,
                   c->read->timedout);

    item = c->data;
    conf = item->conf;

    // Guard: if peer_connection is already NULL, the cleanup handler
    // already processed this item — just close the connection.
    if (item->peer_connection == NULL) {
        ngx_http_upstream_ntlm_close(c);
        return;
    }

    item->peer_connection = NULL;
    ngx_http_upstream_ntlm_close(c);

    ngx_queue_remove(&item->queue);
    ngx_queue_insert_head(&conf->free, &item->queue);

Note: ngx_http_upstream_ntlm_close is static; if its definition appears after ngx_http_upstream_client_conn_cleanup in the file, it needs a forward declaration (or the function order adjusted) so that the cleanup function can call it.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: We are using nginx (https://github.com/nginx/nginx/tree/release-1.28.3) with this module.
Some segfaults occur. Here are some backtraces:

core-nginx-11-65534-65534-614067-1777025800:
[New LWP 614067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ngx_http_upstream_ntlm_close_handler (ev=) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:463
#0 ngx_http_upstream_ntlm_close_handler (ev=) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:463
#1 0x0000561a3085cb58 in ngx_event_process_posted (cycle=cycle@entry=0x561a48238c60, posted=0x561a309494f0 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2 0x0000561a3085c695 in ngx_process_events_and_timers (cycle=cycle@entry=0x561a48238c60) at src/event/ngx_event.c:263
#3 0x0000561a30865d55 in ngx_worker_process_cycle (cycle=cycle@entry=0x561a48238c60, data=data@entry=0x0) at src/os/unix/ngx_process_cycle.c:721
#4...

This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Fix segmentation fault in nginx due to race condition" to "Fix segfaults caused by race condition between ntlm close handler and client conn cleanup" May 5, 2026
Copilot AI requested a review from dsabotta May 5, 2026 09:42
Copilot AI added a commit that referenced this pull request May 7, 2026
- Rewrites ngx_http_upstream_ntlm_module.c with four documented security invariants:
  I1. Atomic item release via ngx_http_upstream_ntlm_item_release() helper
  I2. c->read->data (not c->data) for cache item storage on idle connections
  I3. OOM guard: cleanup-add failure aborts cache insertion (session-hijack fix)
  I4. Stale-credential eviction on re-auth with established session

- Fixes from all open PRs (#3, #6, #7):
  - Synchronous cleanup handler (no posted events)
  - Session-hijack via cleanup-OOM (critical)
  - Stale-credential reuse on established connections (high)
  - client_connection NULL in eviction path
  - Fault-injection macro NGX_NTLM_TEST_CLEANUP_NULL for testing

- Adds ntlm_time and ntlm_requests directives
- Adds notify peer callback pass-through
- Updates README.md for v2 with security section
- Updates Docker files to nginx 1.28.0 / alpine 3.20
- Adds TEST 5 (stale credential eviction) and TEST 6 (OOM guard, SKIP)
  to t/001-sanity.t

Agent-Logs-Url: https://github.com/Securepoint/nginx-ntlm-module/sessions/3343cf49-d4a7-4188-89b8-5a89fad4a7fe

Co-authored-by: matthias-lay <163420385+matthias-lay@users.noreply.github.com>
