**Streaming: connection force-closed (TCP FIN) after [DONE] SSE event because chunked terminator is not drained — regression from 6132922c** #3440

Description

@CH3CHO

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

When calling client.chat.completions.create(..., stream=True) (or client.responses.create(..., stream=True)), Stream.__stream__ (and AsyncStream.__stream__) breaks out of its iteration loop as soon as the underlying SSE decoder yields a data: [DONE] event, and then immediately invokes response.close() (sync) or await response.aclose() (async) inside the try/finally.

The problem is that the underlying httpx.Response.iter_bytes() may not have been read to EOF at the moment of [DONE]. Concretely, the HTTP/1.1 chunked transfer encoding terminator (0\r\n\r\n) may still be in flight (on the wire, or buffered in the kernel or in h11's receive buffer) and has not yet been parsed by the h11 state machine. When their_state is not yet h11.DONE, response.close() flows through httpcore/h11 and takes the "destroy the connection" branch, emitting an immediate TCP FIN, instead of the graceful "back to pool (IDLE)" branch.
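To make the framing concrete, here is a toy model of the situation, not httpcore's or h11's actual code. ChunkedBodyTracker and close_action are hypothetical names; the state labels "SEND_BODY" and "DONE" are borrowed from h11 only for readability. It shows that the chunk carrying data: [DONE] can be fully parsed while the body as a whole is still incomplete, because the zero-length terminator chunk has not arrived.

```python
class ChunkedBodyTracker:
    """Toy incremental parser for a chunked body (no trailers or extensions)."""

    def __init__(self) -> None:
        self._buf = b""
        self.state = "SEND_BODY"   # label borrowed from h11; this is not h11
        self.payload = b""

    def feed(self, data: bytes) -> None:
        self._buf += data
        while self.state != "DONE":
            head, sep, rest = self._buf.partition(b"\r\n")
            if not sep:
                return                      # chunk-size line not complete yet
            size = int(head, 16)
            if size == 0:
                if not rest.startswith(b"\r\n"):
                    return                  # final CRLF of terminator missing
                self._buf = rest[2:]
                self.state = "DONE"
                return
            if len(rest) < size + 2:
                return                      # chunk data or its CRLF missing
            self.payload += rest[:size]
            self._buf = rest[size + 2:]


def close_action(tracker: ChunkedBodyTracker) -> str:
    """Toy version of the reuse-or-destroy decision made at close time."""
    return "return to pool" if tracker.state == "DONE" else "destroy connection"


event = b"data: [DONE]\n\n"
tracker = ChunkedBodyTracker()
tracker.feed(b"%x\r\n%s\r\n" % (len(event), event))  # [DONE] chunk arrives
print(close_action(tracker))                          # terminator not seen yet
tracker.feed(b"0\r\n\r\n")                            # terminator arrives
print(close_action(tracker))
```

Closing between the two feed() calls is the window this issue is about.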

This produces two classes of observable failure downstream:

  1. Upstream proxy (envoy / nginx / custom gateway) logs show a spike of downstream_remote_disconnect, especially for responses whose body ends in [DONE] followed shortly by the chunked terminator (the typical LLM streaming response shape).
  2. The client occasionally raises httpcore.RemoteProtocolError / httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read) on the next read, because the underlying stream was terminated before h11 finished parsing the body.

A tcpdump / wireshark capture shows: the client emits TCP FIN immediately after receiving data: [DONE], while the server-side chunked terminator (0\r\n\r\n) is still being flushed, so the terminator bytes are lost.

Expected behavior

Stream should fully drain the underlying iter_bytes() / aiter_bytes() after observing [DONE] — including consuming the HTTP/1.1 chunked terminator 0\r\n\r\n — so that h11's their_state advances to DONE before response.close() is called. With their_state == h11.DONE, httpcore/h11 takes the graceful close path (back to IDLE, connection returns to the pool) rather than the destructive one.
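A minimal sketch of the drain-then-close ordering described here, using toy stand-ins rather than the library's real classes. FakeResponse, iter_events, stream, and the drain flag are all hypothetical and exist only for illustration; body_complete plays the role of "their_state is DONE". The drain runs only after [DONE] has been observed, so a consumer that abandons the stream early still gets an immediate close.

```python
from typing import Iterator, List, Optional


class FakeResponse:
    """Hypothetical stand-in for httpx.Response; not the real class."""

    def __init__(self, chunks: List[bytes]) -> None:
        self._chunks = chunks
        self.body_complete = False                 # True once iter_bytes() hits EOF
        self.closed_cleanly: Optional[bool] = None

    def iter_bytes(self) -> Iterator[bytes]:
        for chunk in self._chunks:
            yield chunk
        self.body_complete = True                  # only reached when fully drained

    def close(self) -> None:
        # Toy reuse-or-destroy decision: clean only if the body was read to EOF.
        self.closed_cleanly = self.body_complete


def iter_events(byte_iter: Iterator[bytes]) -> Iterator[str]:
    """Very small SSE-like decoder: yields the payload of each data: line."""
    for raw in byte_iter:
        for line in raw.decode().splitlines():
            if line.startswith("data: "):
                yield line[len("data: "):]


def stream(response: FakeResponse, drain: bool) -> Iterator[str]:
    events = iter_events(response.iter_bytes())
    try:
        for data in events:
            if data.startswith("[DONE]"):
                break
            yield data
        if drain:
            # Keep pulling until the byte iterator itself is exhausted.
            for _ in events:
                pass
    finally:
        response.close()


for drain in (False, True):
    resp = FakeResponse([b"data: hello\n\n", b"data: [DONE]\n\n"])
    list(stream(resp, drain=drain))
    print(drain, resp.closed_cleanly)
```

In this toy, the last byte chunk has already been yielded when [DONE] is seen, yet the byte iterator has not reached EOF; one more pull is needed before close() sees a complete body.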

Actual behavior

Sync path, src/openai/_streaming.py (current main, ~line 106–110):

finally:
    # Ensure the response is closed even if the consumer doesn't read all data
    response.close()

Once the for sse in iterator: loop reaches if sse.data.startswith("[DONE]"): break, control jumps directly into the finally block and response.close() is called without consuming any remaining bytes from self._iter_events() / self.response.iter_bytes().

Async path, src/openai/_streaming.py (current main, ~line 218–220):

finally:
    # Ensure the response is closed even if the consumer doesn't read all data
    await response.aclose()

Same problem: async for sse in iterator: breaks on [DONE] and immediately calls aclose().

Root cause

  1. Stream._iter_events() is a thin yield from self._decoder.iter_bytes(self.response.iter_bytes()) chain. The SSE decoder yields [DONE] as soon as it sees the SSE event; the underlying byte iterator (response.iter_bytes()) is not exhausted at that point.
  2. __stream__ consumes iterator = self._iter_events() once in the for sse in iterator: loop, then breaks on [DONE]. No one continues calling __next__() afterwards.
  3. The generator unwinds, the finally block runs response.close().
  4. httpcore's HTTP11Connection._response_closed() checks self._h11_state.their_state is h11.DONE. Because the chunked terminator has not yet been parsed, their_state is still SEND_BODY, the condition fails, and the connection is destroyed.
  5. HTTP11Connection.close() calls self._network_stream.close() → socket close → TCP FIN is sent upstream while the server may still be flushing the chunked terminator → downstream_remote_disconnect on the proxy.

Regression history (upstream)

  • 7e2b2544 (2023-11) — "fix(client): correctly flush the stream response body" — originally implemented the correct drain behavior.
  • 6132922c (2025-10-29) — "fix(client): close streams without requiring full consumption" — removed the drain. This is the regression point.
  • f6552d76 (2025-12-01) — wrapped the close in try/finally but did not restore the drain.
  • 6d9262d5 (current main, release 2.44.0) — regression is still present.

To Reproduce

  1. Stand up any OpenAI-compatible endpoint behind a reverse proxy that records downstream_remote_disconnect (envoy / nginx / any HTTP/1.1 chunked-forwarding gateway).
  2. Issue a streaming chat completion whose response contains several SSE events followed by data: [DONE] and the chunked terminator 0\r\n\r\n.
  3. Capture traffic with tcpdump / wireshark on the client side, or watch the proxy's downstream_remote_disconnect counter.
  4. Run the minimal repro script in the Code snippets section.
  5. Observe: the client emits FIN shortly after [DONE]; the chunked terminator is not fully drained; the proxy records downstream_remote_disconnect.
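For a proxy-free local check, the response shape from step 2 can be approximated with a toy server that delays the chunked terminator and records whether the client closed first. This is a sketch under assumptions, not part of the original repro: read_request, serve_once, start_server, and the client_closed_early key are hypothetical names, the server handles a single connection, and it ignores the request path and body.

```python
import socket
import threading
import time


def read_request(conn: socket.socket) -> None:
    """Read one HTTP/1.1 request (headers plus Content-Length body) and discard it."""
    data = b""
    while b"\r\n\r\n" not in data:
        part = conn.recv(65536)
        if not part:
            return
        data += part
    head, _, body = data.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    while len(body) < length:
        part = conn.recv(65536)
        if not part:
            return
        body += part


def serve_once(listener: socket.socket, delay: float, result: dict) -> None:
    """Send a chunked SSE response whose terminator is delayed by `delay` seconds."""
    conn, _ = listener.accept()
    listener.close()
    with conn:
        read_request(conn)
        event = b"data: [DONE]\n\n"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/event-stream\r\n"
            b"Transfer-Encoding: chunked\r\n\r\n"
            + b"%x\r\n%s\r\n" % (len(event), event)
        )
        time.sleep(delay)          # the terminator is "in flight" during this window
        conn.settimeout(0.5)
        try:
            # An empty read means the client sent FIN before the terminator.
            result["client_closed_early"] = conn.recv(1) == b""
        except socket.timeout:
            result["client_closed_early"] = False
        except OSError:            # e.g. connection reset by the client
            result["client_closed_early"] = True
        try:
            conn.sendall(b"0\r\n\r\n")
        except OSError:
            pass


def start_server(delay: float = 0.2):
    """Start the one-shot server on an ephemeral port; returns (port, result, thread)."""
    listener = socket.create_server(("127.0.0.1", 0))
    result: dict = {}
    thread = threading.Thread(
        target=serve_once, args=(listener, delay, result), daemon=True
    )
    thread.start()
    return listener.getsockname()[1], result, thread
```

One way to use it: call start_server(), point OPENAI_BASE_URL at http://127.0.0.1:<port>/v1, run the repro script, join the thread, and inspect result["client_closed_early"]. The response here contains only the [DONE] event, so the client loop yields no chunks.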

If you want a more aggressive local repro, monkey-patch httpx.Response.close to print the call stack:

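import traceback
import httpx

# Keep the original method so the patch can delegate to it.
orig_close = httpx.Response.close

def traced_close(self):
    # Show who is closing the (sync) response.
    traceback.print_stack()
    return orig_close(self)

httpx.Response.close = traced_close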

You will see response.close() invoked from Stream.__stream__'s finally block while self.response.iter_bytes() is still suspended inside iter_chunks / aiter_chunks and has not yet delivered the chunked terminator to h11.

Code snippets

Minimal repro (Python, against any OpenAI-compatible streaming endpoint behind a chunked HTTP/1.1 proxy):


"""
Repro: Stream.__stream__ does not drain iter_bytes() after [DONE],
so httpcore destroys the connection and the client sends TCP FIN
before the chunked terminator is consumed.

Observable symptoms:
  - Wireshark: client emits FIN immediately after [DONE]; chunked
    terminator is not yet consumed.
  - Proxy logs: spike of `downstream_remote_disconnect`.
"""
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "sk-fake"),
    base_url=os.environ.get("OPENAI_BASE_URL"),  # routed through envoy/gateway
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    stream=True,
    messages=[{"role": "user", "content": "Write a short poem."}],
)

# Normal consumption
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
# generator exit → __stream__ finally → response.close() is called while
# iter_bytes() still has the chunked terminator 0\r\n\r\n buffered →
# httpcore destroys the connection → TCP FIN.
print()

OS

Windows 11 + Rancher Desktop 1.22.3 + Higress 2.2.2

Python version

Python v3.11.x (also reproducible with any v3.9+ supported by openai-python main / 2.44.0).

Library version

openai v2.44.0 (main branch HEAD 6d9262d5); identical behavior reproducible on locally installed openai 2.15.0.
