Confirm this is an issue with the Python library and not an underlying OpenAI API
Describe the bug
When calling client.chat.completions.create(..., stream=True) (or client.responses.create(..., stream=True)), Stream.__stream__ (and AsyncStream.__stream__) breaks out of its iteration loop as soon as the underlying SSE decoder yields a data: [DONE] event, and then immediately invokes response.close() (sync) or await response.aclose() (async) inside the try/finally.
The problem is that the underlying httpx.Response.iter_bytes() may not have been read to EOF at the moment of [DONE]. Concretely, the HTTP/1.1 chunked transfer encoding terminator (0\r\n\r\n) may still be in flight (on the wire or buffered in the kernel/h11) and has not yet been delivered to the h11 state machine. When their_state is not yet h11.DONE, response.close() flows through httpcore → h11 and takes the "destroy the connection" branch — emitting an immediate TCP FIN — instead of the graceful "back to pool (IDLE)" branch.
This produces two classes of observable failure downstream:
- Upstream proxy (envoy / nginx / custom gateway) logs show a spike of downstream_remote_disconnect, especially for responses whose body ends in [DONE] followed shortly by the chunked terminator (the typical LLM streaming response shape).
- The client side occasionally raises httpcore.RemoteProtocolError / httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read) on the next read, because the underlying stream was terminated before h11 finished parsing the body.
A tcpdump / wireshark capture shows: the client emits TCP FIN immediately after receiving data: [DONE], while the server-side chunked terminator (0\r\n\r\n) is still being flushed, so the terminator bytes are lost.
Expected behavior
Stream should fully drain the underlying iter_bytes() / aiter_bytes() after observing [DONE] — including consuming the HTTP/1.1 chunked terminator 0\r\n\r\n — so that h11's their_state advances to DONE before response.close() is called. With their_state == h11.DONE, httpcore/h11 takes the graceful close path (back to IDLE, connection returns to the pool) rather than the destructive one.
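A minimal sketch of the drain-then-close ordering described above. It is not SDK code: FakeResponse, stream_events, and the drain flag are hypothetical names, and the trailing empty item stands in for whatever the transport still has to deliver after [DONE]. The sketch drains only when [DONE] was actually seen, so a consumer that abandons the stream early still gets an immediate close.

```python
from typing import Iterable, Iterator, List, Optional


class FakeResponse:
    """Hypothetical stand-in for httpx.Response; records state at close()."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = iter(lines)
        self.exhausted = False  # True once the body iterator reached EOF
        self.exhausted_at_close: Optional[bool] = None  # snapshot taken by close()

    def iter_lines(self) -> Iterator[str]:
        for line in self._lines:
            yield line
        self.exhausted = True  # reached only after the last line was read

    def close(self) -> None:
        self.exhausted_at_close = self.exhausted


def stream_events(response: FakeResponse, drain: bool) -> Iterator[str]:
    """Yield payloads until [DONE]; optionally drain the body before closing."""
    iterator = response.iter_lines()
    saw_done = False
    try:
        for line in iterator:
            if line.startswith("[DONE]"):
                saw_done = True
                break
            yield line
        if drain and saw_done:
            # Keep pulling until EOF so the transport sees the end of the body.
            for _ in iterator:
                pass
    finally:
        response.close()


def run(drain: bool) -> FakeResponse:
    """Consume a canned stream and return the response for inspection."""
    response = FakeResponse(["hello", "world", "[DONE]", ""])
    events: List[str] = list(stream_events(response, drain=drain))
    assert events == ["hello", "world"]
    return response
```

Calling run(drain=False) models the current behavior (close happens with the body iterator still suspended), while run(drain=True) models the expected one.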
Actual behavior
Sync path, src/openai/_streaming.py (current main, ~line 106–110):
    finally:
        # Ensure the response is closed even if the consumer doesn't read all data
        response.close()
After for sse in iterator: hits if sse.data.startswith("[DONE]"): break, control jumps directly into the finally and response.close() is called without consuming any remaining bytes from self._iter_events() / self.response.iter_bytes().
Async path, src/openai/_streaming.py (current main, ~line 218–220):
    finally:
        # Ensure the response is closed even if the consumer doesn't read all data
        await response.aclose()
Same problem: async for sse in iterator: breaks on [DONE] and immediately calls aclose().
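For the async path, one possible shape is a drain with a time bound, so that a server which never sends the terminator cannot hang the caller. This is only a sketch under assumptions: body, drain, consume, and the timeout parameter are illustrative names, the async generator stands in for response.aiter_bytes(), and the time bound is a suggestion rather than something the SDK did before the regression.

```python
import asyncio
from typing import AsyncIterator, List


async def body(chunks: List[bytes], delay: float) -> AsyncIterator[bytes]:
    """Hypothetical byte stream; `delay` simulates a late chunked terminator."""
    for chunk in chunks:
        yield chunk
    await asyncio.sleep(delay)  # EOF is only reached after this pause


async def drain(iterator: AsyncIterator[bytes]) -> None:
    """Read and discard everything that is left in the iterator."""
    async for _ in iterator:
        pass


async def consume(chunks: List[bytes], delay: float, timeout: float) -> bool:
    """Read until [DONE], then drain with a time bound. True if fully drained."""
    iterator = body(chunks, delay)
    drained = False
    try:
        async for chunk in iterator:
            if chunk.startswith(b"data: [DONE]"):
                break
        try:
            await asyncio.wait_for(drain(iterator), timeout=timeout)
            drained = True
        except asyncio.TimeoutError:
            drained = False  # give up; the connection would not be reusable
    finally:
        await iterator.aclose()  # stands in for `await response.aclose()`
    return drained
```

With a generous timeout the drain completes and the close would take the graceful path; with a timeout shorter than the delay the sketch falls back to the current close-immediately behavior.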
Root cause
- Stream._iter_events() is a thin yield from self._decoder.iter_bytes(self.response.iter_bytes()) chain. The SSE decoder yields [DONE] as soon as it sees the SSE event; the underlying byte iterator (response.iter_bytes()) is not exhausted at that point.
- __stream__ consumes iterator = self._iter_events() once in the for sse in iterator: loop, then breaks on [DONE]. Nothing calls __next__() afterwards.
- The generator unwinds, the finally block runs response.close().
- httpcore's HTTP11Connection._response_closed() checks self._h11_state.their_state is h11.DONE. Because the chunked terminator has not yet been parsed, their_state is still SEND_BODY, the condition fails, and the connection is destroyed.
- HTTP11Connection.close() calls self._network_stream.close() → socket close → TCP FIN is sent upstream while the server may still be flushing the chunked terminator → downstream_remote_disconnect on the proxy.
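The state dependence in the chain above can be illustrated with a toy model. This is neither h11's nor httpcore's implementation: ToyChunkedReader, TheirState, and on_response_closed are hypothetical names, and the parser ignores chunk extensions and trailers. It only shows that the reuse-or-destroy decision flips once the 0\r\n\r\n terminator has been fed to the parser.

```python
from enum import Enum


class TheirState(Enum):
    SEND_BODY = "SEND_BODY"
    DONE = "DONE"


class ToyChunkedReader:
    """Toy incremental parser for a chunked body (no extensions, no trailers)."""

    def __init__(self) -> None:
        self._buffer = b""
        self.their_state = TheirState.SEND_BODY
        self.body = b""

    def receive_data(self, data: bytes) -> None:
        self._buffer += data
        while self.their_state is TheirState.SEND_BODY:
            head, sep, rest = self._buffer.partition(b"\r\n")
            if not sep:
                return  # chunk-size line is incomplete
            size = int(head.decode("ascii"), 16)
            if size == 0:
                if not rest.startswith(b"\r\n"):
                    return  # terminator is not complete yet
                self._buffer = rest[2:]
                self.their_state = TheirState.DONE
                return
            if len(rest) < size + 2:
                return  # chunk data (plus its trailing CRLF) is incomplete
            self.body += rest[:size]
            self._buffer = rest[size + 2:]


def on_response_closed(reader: ToyChunkedReader) -> str:
    """Mirror the decision described above: reuse only if the body is complete."""
    return "IDLE" if reader.their_state is TheirState.DONE else "CLOSED"
```

Feeding the chunk that carries data: [DONE] leaves the toy reader in SEND_BODY, so on_response_closed returns "CLOSED"; feeding the terminator afterwards moves it to DONE and the result becomes "IDLE".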
Regression history (upstream)
7e2b2544 (2023-11) — "fix(client): correctly flush the stream response body" — originally implemented the correct drain behavior.
6132922c (2025-10-29) — "fix(client): close streams without requiring full consumption" — removed the drain. This is the regression point.
f6552d76 (2025-12-01) — wrapped the close in try/finally but did not restore the drain.
6d9262d5 (current main, release 2.44.0) — regression is still present.
To Reproduce
- Stand up any OpenAI-compatible endpoint behind a reverse proxy that records downstream_remote_disconnect (envoy / nginx / any HTTP/1.1 chunked-forwarding gateway).
- Issue a streaming chat completion whose response contains several SSE events followed by data: [DONE] and the chunked terminator 0\r\n\r\n.
- Capture traffic with tcpdump / wireshark on the client side, or watch the proxy's downstream_remote_disconnect counter.
- Run the minimal repro script in the Code snippets section.
- Observe: the client emits FIN shortly after [DONE]; the chunked terminator is not fully drained; the proxy records downstream_remote_disconnect.
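If no gateway is at hand, a stdlib-only stub can stand in for the upstream in the steps above. This is a rough sketch under assumptions: the handler name, port 8765, the 0.2 s delay, and the placeholder chunk payload are all made up for illustration. It talks to the client directly, so it can show the early close but not a proxy's downstream_remote_disconnect counter. It sends the SSE events, waits, and only then tries to send the chunked terminator, reporting whether the client had already closed.

```python
import select
import socket
import socketserver
import time
from typing import Iterable, List

TERMINATOR = b"0\r\n\r\n"


def chunk(data: bytes) -> bytes:
    """Frame one HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def sse_frames(events: Iterable[str]) -> List[bytes]:
    """Chunk-encode each SSE event, then a final data: [DONE] event."""
    payloads = [f"data: {event}\n\n".encode() for event in events]
    payloads.append(b"data: [DONE]\n\n")
    return [chunk(payload) for payload in payloads]


def peer_closed(conn: socket.socket) -> bool:
    """True if the peer already sent FIN or RST (non-blocking peek)."""
    readable, _, _ = select.select([conn], [], [], 0)
    if not readable:
        return False
    try:
        return conn.recv(1, socket.MSG_PEEK) == b""
    except OSError:
        return True


class DelayedTerminatorHandler(socketserver.StreamRequestHandler):
    terminator_delay = 0.2  # seconds; assumed value, raise it to widen the race

    def handle(self) -> None:
        length = 0
        while True:  # read the request head, remembering Content-Length
            line = self.rfile.readline()
            if line in (b"\r\n", b""):
                break
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        self.rfile.read(length)  # discard the JSON request body
        self.wfile.write(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/event-stream\r\n"
            b"Transfer-Encoding: chunked\r\n\r\n"
        )
        for frame in sse_frames(['{"choices": []}']):  # placeholder payload
            self.wfile.write(frame)
        self.wfile.flush()
        time.sleep(self.terminator_delay)
        if peer_closed(self.connection):
            print("client closed before the chunked terminator was sent")
        else:
            self.wfile.write(TERMINATOR)
            print("terminator sent; client was still reading")


def serve(port: int = 8765) -> None:
    """Blocking; point OPENAI_BASE_URL at http://127.0.0.1:<port>/v1."""
    with socketserver.TCPServer(("127.0.0.1", port), DelayedTerminatorHandler) as server:
        server.serve_forever()
```

Run serve() in one terminal and the repro script in another; the message printed by the handler indicates which side of the race the client landed on.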
If you want a more aggressive local repro, monkey-patch httpx.Response.close to print the call stack:
    import traceback
    import httpx

    orig_close = httpx.Response.close

    def traced_close(self):
        # Print the stack of whoever closes the response, then close as usual.
        traceback.print_stack()
        return orig_close(self)

    httpx.Response.close = traced_close
You will see response.close() invoked from Stream.__stream__'s finally block while self.response.iter_bytes() is still suspended inside iter_chunks / aiter_chunks and has not yet delivered the chunked terminator to h11.
Code snippets
Minimal repro (Python, against any OpenAI-compatible streaming endpoint behind a chunked HTTP/1.1 proxy):
    """
    Repro: Stream.__stream__ does not drain iter_bytes() after [DONE],
    so httpcore destroys the connection and the client sends TCP FIN
    before the chunked terminator is consumed.

    Observable symptoms:
      - Wireshark: client emits FIN immediately after [DONE]; chunked
        terminator is not yet consumed.
      - Proxy logs: spike of `downstream_remote_disconnect`.
    """
    import os

    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ.get("OPENAI_API_KEY", "sk-fake"),
        base_url=os.environ.get("OPENAI_BASE_URL"),  # routed through envoy/gateway
    )

    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        stream=True,
        messages=[{"role": "user", "content": "Write a short poem."}],
    )

    # Normal consumption
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

    # generator exit → __stream__ finally → response.close() is called while
    # iter_bytes() still has the chunked terminator 0\r\n\r\n buffered →
    # httpcore destroys the connection → TCP FIN.
    print()
OS
Windows 11 + Rancher Desktop 1.22.3 + Higress 2.2.2
Python version
Python v3.11.x (also reproducible with any v3.9+ supported by openai-python main / 2.44.0).
Library version
openai v2.44.0 (main branch HEAD 6d9262d5); identical behavior reproducible on locally installed openai 2.15.0.