**Streaming: connection force-closed (TCP FIN) after `[DONE]` SSE event because chunked terminator is not drained — regression from `6132922c`**

### Confirm this is an issue with the Python library and not an underlying OpenAI API

- [x] This is an issue with the Python library

### Describe the bug

When calling `client.chat.completions.create(..., stream=True)` (or `client.responses.create(..., stream=True)`), `Stream.__stream__` (and `AsyncStream.__stream__`) breaks out of its iteration loop as soon as the underlying SSE decoder yields a `data: [DONE]` event, and then immediately invokes `response.close()` (sync) or `await response.aclose()` (async) inside the `try/finally`.

The problem is that **the underlying `httpx.Response.iter_bytes()` may not have been read to EOF** at the moment of `[DONE]`. Concretely, the HTTP/1.1 chunked transfer encoding terminator (`0\r\n\r\n`) may still be in flight (on the wire or buffered in the kernel/h11) and has not yet been delivered to the h11 state machine. When `their_state` is not yet `h11.DONE`, `response.close()` flows through `httpcore` → `h11` and takes the "destroy the connection" branch — emitting an immediate TCP FIN — instead of the graceful "back to pool (IDLE)" branch.

This produces two classes of observable failure downstream:

1. **Upstream proxy** (envoy / nginx / custom gateway) logs show a spike of `downstream_remote_disconnect`, especially for responses whose body ends in `[DONE]` followed shortly by the chunked terminator (the typical LLM streaming response shape).
2. **Client side** occasionally raises `httpcore.RemoteProtocolError` / `httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)` on the next read, because the underlying stream was terminated before h11 finished parsing the body.

A `tcpdump` / `wireshark` capture shows: the client emits TCP FIN immediately after receiving `data: [DONE]`, while the server-side chunked terminator (`0\r\n\r\n`) is still being flushed, so the terminator bytes are lost.

**Expected behavior**

`Stream` should fully drain the underlying `iter_bytes()` / `aiter_bytes()` after observing `[DONE]` — including consuming the HTTP/1.1 chunked terminator `0\r\n\r\n` — so that h11's `their_state` advances to `DONE` *before* `response.close()` is called. With `their_state == h11.DONE`, httpcore/h11 takes the graceful close path (back to `IDLE`, connection returns to the pool) rather than the destructive one.
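The expected drain can be illustrated with a minimal, stdlib-only sketch. Everything in it is a hypothetical stand-in: `FakeResponse`, `iter_events`, and `stream` are illustrative names, not the library's classes, and "the byte iterator reached EOF" is used as a proxy for "h11 has seen the chunked terminator".

```python
from typing import Iterator, List, Optional


class FakeResponse:
    """Hypothetical stand-in for httpx.Response (not the real class)."""

    def __init__(self, chunks: List[bytes]) -> None:
        self._chunks = chunks
        self.body_exhausted = False  # proxy for "terminator has been parsed"
        self.closed_after_eof: Optional[bool] = None  # set by close()

    def iter_bytes(self) -> Iterator[bytes]:
        for chunk in self._chunks:
            yield chunk
        # Reached only if the consumer keeps iterating past the last chunk.
        self.body_exhausted = True

    def close(self) -> None:
        # Record whether close() happened before or after EOF.
        self.closed_after_eof = self.body_exhausted


def iter_events(response: FakeResponse) -> Iterator[str]:
    """Toy SSE decoder: yields the payload of each 'data: ' line."""
    for chunk in response.iter_bytes():
        for line in chunk.decode().splitlines():
            if line.startswith("data: "):
                yield line[len("data: "):]


def stream(response: FakeResponse, drain: bool) -> Iterator[str]:
    iterator = iter_events(response)
    try:
        for data in iterator:
            if data.startswith("[DONE]"):
                break
            yield data
        if drain:
            # Consume whatever follows [DONE] so the byte iterator hits EOF.
            for _ in iterator:
                pass
    finally:
        response.close()


chunks = [b"data: hello\n\n", b"data: [DONE]\n\n"]
no_drain = FakeResponse(chunks)
events_no_drain = list(stream(no_drain, drain=False))
with_drain = FakeResponse(chunks)
events_with_drain = list(stream(with_drain, drain=True))
```

In this sketch both variants yield the same events, but only the draining variant calls `close()` after the byte iterator has reached EOF. The drain sits after the loop rather than in `finally`, so a consumer that abandons the stream early still gets an immediate close without further reads.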

**Actual behavior**

Sync path, `src/openai/_streaming.py` (current main, ~line 106–110):

```python
finally:
    # Ensure the response is closed even if the consumer doesn't read all data
    response.close()
```

After `for sse in iterator:` hits `if sse.data.startswith("[DONE]"): break`, control jumps directly into the `finally` and `response.close()` is called **without** consuming any remaining bytes from `self._iter_events()` / `self.response.iter_bytes()`.

Async path, `src/openai/_streaming.py` (current main, ~line 218–220):

```python
finally:
    # Ensure the response is closed even if the consumer doesn't read all data
    await response.aclose()
```

Same problem: `async for sse in iterator:` breaks on `[DONE]` and immediately calls `aclose()`.

**Root cause**

1. `Stream._iter_events()` is a thin `yield from self._decoder.iter_bytes(self.response.iter_bytes())` chain. The SSE decoder yields `[DONE]` as soon as it sees the SSE event; the underlying byte iterator (`response.iter_bytes()`) is *not* exhausted at that point.
2. `__stream__` consumes `iterator = self._iter_events()` once in the `for sse in iterator:` loop, then breaks on `[DONE]`. No one continues calling `__next__()` afterwards.
3. The generator unwinds, the `finally` block runs `response.close()`.
4. httpcore's `HTTP11Connection._response_closed()` checks `self._h11_state.their_state is h11.DONE`. Because the chunked terminator has not yet been parsed, `their_state` is still `SEND_BODY`, the condition fails, and the connection is destroyed.
5. `HTTP11Connection.close()` calls `self._network_stream.close()` → socket close → TCP FIN is sent upstream while the server may still be flushing the chunked terminator → `downstream_remote_disconnect` on the proxy.
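The timing in steps 4 and 5 can be illustrated with a toy model of chunked framing. This is a sketch under stated assumptions, not h11 or httpcore code: `ChunkedBodyTracker` and `close_outcome` are hypothetical names, the parser ignores chunk extensions and trailers, and `close_outcome` only mirrors the two branches described above.

```python
class ChunkedBodyTracker:
    """Toy incremental parser for an HTTP/1.1 chunked body.

    Hypothetical illustration only: no chunk extensions, no trailers.
    """

    def __init__(self) -> None:
        self._buf = b""
        self.payload = b""
        self.done = False  # True only once "0\r\n\r\n" has been parsed

    def feed(self, data: bytes) -> None:
        self._buf += data
        while not self.done:
            head, sep, rest = self._buf.partition(b"\r\n")
            if not sep:
                return  # chunk-size line is still incomplete
            size = int(head, 16)
            if len(rest) < size + 2:
                return  # chunk data plus its trailing CRLF not here yet
            if size == 0:
                self.done = True  # zero-length chunk ends the body
            self.payload += rest[:size]
            self._buf = rest[size + 2:]


def close_outcome(body_done: bool) -> str:
    """Mirrors the two branches described in step 4 (illustrative only)."""
    return "return to pool" if body_done else "destroy connection"


tracker = ChunkedBodyTracker()
# One chunk of 0xe = 14 bytes carrying the final SSE event.
tracker.feed(b"e\r\ndata: [DONE]\n\n\r\n")
outcome_at_done = close_outcome(tracker.done)  # close() called right here
# The terminator arrives as a separate chunk, possibly in a later TCP segment.
tracker.feed(b"0\r\n\r\n")
outcome_after_terminator = close_outcome(tracker.done)
```

The point of the model is that the `[DONE]` payload is fully available to the SSE decoder before the body is complete at the HTTP framing level, so a close issued at that moment sees an unfinished body.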

**Regression history (upstream)**

- `7e2b2544` (2023-11) — "fix(client): correctly flush the stream response body" — originally implemented the correct drain behavior.
- `6132922c` (2025-10-29) — "fix(client): close streams without requiring full consumption" — **removed** the drain. This is the regression point.
- `f6552d76` (2025-12-01) — wrapped the close in `try/finally` but did **not** restore the drain.
- `6d9262d5` (current main, release 2.44.0) — regression is still present.

### To Reproduce

1. Stand up any OpenAI-compatible endpoint behind a reverse proxy that records `downstream_remote_disconnect` (envoy / nginx / any HTTP/1.1 chunked-forwarding gateway).
2. Issue a streaming chat completion whose response contains several SSE events followed by `data: [DONE]` and the chunked terminator `0\r\n\r\n`.
3. Capture traffic with `tcpdump` / `wireshark` on the client side, or watch the proxy's `downstream_remote_disconnect` counter.
4. Run the minimal repro script in the **Code snippets** section.
5. Observe: the client emits FIN shortly after `[DONE]`; the chunked terminator is not fully drained; the proxy records `downstream_remote_disconnect`.

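To confirm locally where the close originates, monkey-patch `httpx.Response.close` to print the call stack (the async path needs the same treatment for `httpx.Response.aclose`):

```python
import traceback

import httpx

orig_close = httpx.Response.close  # keep the original so the wrapper can delegate

def traced_close(self):
    traceback.print_stack()  # show who is closing the response
    return orig_close(self)

httpx.Response.close = traced_close
```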

You will see `response.close()` invoked from `Stream.__stream__`'s `finally` block while `self.response.iter_bytes()` is still suspended inside `iter_chunks` / `aiter_chunks` and has not yet delivered the chunked terminator to h11.

### Code snippets

Minimal repro (Python, against any OpenAI-compatible streaming endpoint behind a chunked HTTP/1.1 proxy):

```python
"""
Repro: Stream.__stream__ does not drain iter_bytes() after [DONE],
so httpcore destroys the connection and the client sends TCP FIN
before the chunked terminator is consumed.

Observable symptoms:
  - Wireshark: client emits FIN immediately after [DONE]; chunked
    terminator is not yet consumed.
  - Proxy logs: spike of `downstream_remote_disconnect`.
"""
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "sk-fake"),
    base_url=os.environ.get("OPENAI_BASE_URL"),  # routed through envoy/gateway
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    stream=True,
    messages=[{"role": "user", "content": "Write a short poem."}],
)

# Normal consumption
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
# generator exit → __stream__ finally → response.close() is called while
# iter_bytes() still has the chunked terminator 0\r\n\r\n buffered →
# httpcore destroys the connection → TCP FIN.
print()
```

### OS

Windows 11 + Rancher Desktop 1.22.3 + Higress 2.2.2

### Python version

Python v3.11.x (also reproducible with any v3.9+ supported by openai-python main / 2.44.0).

### Library version

openai v2.44.0 (main branch HEAD `6d9262d5`); identical behavior reproducible on locally installed openai 2.15.0.

Streaming: connection force-closed (TCP FIN) after `[DONE]` SSE event because chunked terminator is not drained — regression from `6132922c` #3440
