Fix segfaults caused by race condition between ntlm close handler and client conn cleanup #3
Draft
… conn cleanup

Agent-Logs-Url: https://github.com/Securepoint/nginx-ntlm-module/sessions/39515b70-c60c-4797-87d5-a048779c5e98
Co-authored-by: dsabotta <262123149+dsabotta@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix segmentation fault in nginx due to race condition" to "Fix segfaults caused by race condition between ntlm close handler and client conn cleanup" on May 5, 2026.
Copilot AI added a commit that referenced this pull request on May 7, 2026:
- Rewrites ngx_http_upstream_ntlm_module.c with four documented security invariants:
  - I1. Atomic item release via ngx_http_upstream_ntlm_item_release() helper
  - I2. c->read->data (not c->data) for cache item storage on idle connections
  - I3. OOM guard: cleanup-add failure aborts cache insertion (session-hijack fix)
  - I4. Stale-credential eviction on re-auth with established session
- Fixes from all open PRs (#3, #6, #7):
  - Synchronous cleanup handler (no posted events)
  - Session-hijack via cleanup-OOM (critical)
  - Stale-credential reuse on established connections (high)
  - client_connection NULL in eviction path
- Fault-injection macro NGX_NTLM_TEST_CLEANUP_NULL for testing
- Adds ntlm_time and ntlm_requests directives
- Adds notify peer callback pass-through
- Updates README.md for v2 with security section
- Updates Docker files to nginx 1.28.0 / alpine 3.20
- Adds TEST 5 (stale credential eviction) and TEST 6 (OOM guard, SKIP) to t/001-sanity.t

Agent-Logs-Url: https://github.com/Securepoint/nginx-ntlm-module/sessions/3343cf49-d4a7-4188-89b8-5a89fad4a7fe
Co-authored-by: matthias-lay <163420385+matthias-lay@users.noreply.github.com>
When a client connection closes, `ngx_http_upstream_client_conn_cleanup` posted the peer read event and then immediately moved the cache item to the free queue. When that posted event later fired, `ngx_http_upstream_ntlm_close_handler` operated on an item already in the free queue, removing and re-inserting it and causing queue corruption and segfaults.

Changes

- `ngx_http_upstream_client_conn_cleanup`: Replace the post-event approach with a direct, atomic close sequence: set `item->peer_connection = NULL` first (it acts as a sentinel), move the item to the free queue, then call `ngx_http_upstream_ntlm_close()` directly. No posted event, no window for the race.
- `ngx_http_upstream_ntlm_close_handler` (`close:` label): Add a guard on `item->peer_connection == NULL`; if the cleanup handler already processed this item, just close the connection and return without touching the queue.

Original prompt
Problem
When using this module with nginx 1.28.3, segfaults occur in `ngx_http_upstream_ntlm_close_handler` and `ngx_http_upstream_client_conn_cleanup` due to a race condition between these two functions manipulating the same cache item's queue membership and peer connection.

Root Cause
When a client connection closes, `ngx_http_upstream_client_conn_cleanup` (line 371):

- sets `item->peer_connection->read->timedout = 1`
- posts `item->peer_connection->read` to `ngx_posted_events`
- removes `item` from the cache queue and inserts it into the free queue (lines 385-386)

Later, when the posted event fires, `ngx_http_upstream_ntlm_close_handler` (line 394) runs and:

- reads `item = c->data`, the same item that is now already in the free queue
- sets `item->peer_connection = NULL`
- calls `ngx_queue_remove(&item->queue)`, which removes it from the free queue (corruption!)
- inserts it into `conf->free` again: double insertion / queue corruption

This causes three types of crashes:

- in `ngx_http_upstream_ntlm_close_handler`, accessing corrupted queue/item data
- in `ngx_http_upstream_handler`, because queue corruption causes a freed/reused connection to still be found in the cache
- in `ngx_http_upstream_client_conn_cleanup`, because `item->peer_connection->read` is already freed
Fix

In `ngx_http_upstream_client_conn_cleanup`: Instead of posting an event and then moving the queue item (which creates the race), close the peer connection directly and move the queue item to free in one atomic operation. Set `item->peer_connection = NULL` before closing so the close handler can detect it's already been handled.

In `ngx_http_upstream_ntlm_close_handler`: Add a guard at the `close:` label: if `item->peer_connection` is already NULL, the cleanup handler already processed this item, so just close the connection and return without touching the queue.

Patched `ngx_http_upstream_client_conn_cleanup`:
Patched `ngx_http_upstream_ntlm_close_handler` (`close:` label):
Note: `ngx_http_upstream_ntlm_close` is declared `static` below `ngx_http_upstream_client_conn_cleanup` in the file, so it may need to be forward-declared, or the function order adjusted, so that the cleanup function can call it.

The following is the prior conversation context from the user's chat exploration (may be truncated):
User: We are using nginx (https://github.com/nginx/nginx/tree/release-1.28.3) with this module.
Some segfaults occur. Here are some backtraces:
core-nginx-11-65534-65534-614067-1777025800:
[New LWP 614067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ngx_http_upstream_ntlm_close_handler (ev=) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:463
#0 ngx_http_upstream_ntlm_close_handler (ev=) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:463
#1 0x0000561a3085cb58 in ngx_event_process_posted (cycle=cycle@entry=0x561a48238c60, posted=0x561a309494f0 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2 0x0000561a3085c695 in ngx_process_events_and_timers (cycle=cycle@entry=0x561a48238c60) at src/event/ngx_event.c:263
#3 0x0000561a30865d55 in ngx_worker_process_cycle (cycle=cycle@entry=0x561a48238c60, data=data@entry=0x0) at src/os/unix/ngx_process_cycle.c:721
#4...
This pull request was created from Copilot chat.