
Handle TransferEncodingError and ClientConnectorError as graceful network disconnects #2

Open
livepeer-tessa wants to merge 1 commit into main from fix/transfer-encoding-error-graceful-disconnect

@livepeer-tessa

Summary

Fixes daydreamlive/scope#805: the media input loop and the control channel fail with TransferEncodingError: 400 when an orchestrator truncates the trickle connection mid-stream, and teardown then cascades into ClientConnectorError.

Root Cause

When an orchestrator goes down or restarts during an active session, aiohttp raises ClientPayloadError on the open HTTP response, wrapping an underlying TransferEncodingError. This is a network-level disconnect, not an application error, but the generic except Exception handler caught it and re-raised it as LivepeerGatewayError, which propagated up to livepeer_app.py as an ERROR.

Teardown then tries to DELETE/close the trickle channels, but the orchestrator is already unreachable, so each attempt logs a ClientConnectorError, also at ERROR level.

Changes

channel_reader.py

  • Add an aiohttp.ClientPayloadError handler to both ChannelReader and JSONLReader, placed before the generic except Exception clause so the broader clause does not catch it first
  • Log at WARNING level and return, ending iteration without raising, so callers see a clean EOF rather than an exception
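A minimal sketch of the reader-side pattern, not the actual ChannelReader/JSONLReader code. It assumes a hypothetical read_line coroutine that returns the next JSONL line or None at EOF; read_jsonl_messages is an illustrative name, and LivepeerGatewayError is redefined as a stand-in. If aiohttp is not installed, a placeholder ClientPayloadError is defined so the sketch still runs.

```python
import logging
from typing import AsyncIterator, Awaitable, Callable, Optional

try:
    from aiohttp import ClientPayloadError
except ImportError:  # placeholder so the sketch runs without aiohttp installed
    class ClientPayloadError(Exception):
        pass

logger = logging.getLogger("trickle.reader")


class LivepeerGatewayError(Exception):
    """Stand-in for the gateway's own error type."""


async def read_jsonl_messages(
    read_line: Callable[[], Awaitable[Optional[bytes]]],
) -> AsyncIterator[bytes]:
    """Yield lines until EOF, treating a truncated payload as a clean EOF.

    read_line is a hypothetical coroutine returning the next line, or None at EOF.
    """
    try:
        while True:
            line = await read_line()
            if line is None:  # normal end of stream
                return
            yield line
    except ClientPayloadError as exc:
        # Network-level disconnect: this clause must precede `except Exception`,
        # otherwise the broader clause would catch it first.
        logger.warning("Trickle JSONL channel disconnected (network): %r", exc)
        return  # ends iteration; the caller sees a normal EOF
    except Exception as exc:
        # Anything else is still treated as an application error.
        raise LivepeerGatewayError(f"Trickle JSONL subscription error: {exc!r}") from exc
```

Python checks except clauses top to bottom, so the handler order is what matters: the narrower ClientPayloadError clause has to come first, or except Exception intercepts it and the error is wrapped as before.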

trickle_publisher.py

  • Demote ClientConnectorError in _run_delete from ERROR to DEBUG
  • When the orchestrator is already down, connection-refused errors during trickle DELETE are expected and not actionable
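A sketch of the DELETE demotion, assuming a hypothetical send_delete coroutine that performs the HTTP DELETE; run_delete stands in for _run_delete and does not reflect its real signature. The log messages follow the Before/After lines in this description. A placeholder ClientConnectorError is defined only when aiohttp is unavailable.

```python
import logging
from typing import Awaitable, Callable

try:
    from aiohttp import ClientConnectorError
except ImportError:  # placeholder so the sketch runs without aiohttp installed
    class ClientConnectorError(OSError):
        pass

logger = logging.getLogger("trickle.publisher")


async def run_delete(url: str, send_delete: Callable[[str], Awaitable[None]]) -> bool:
    """Best-effort trickle DELETE during teardown; returns True on success.

    send_delete is a hypothetical coroutine that issues the DELETE request.
    """
    try:
        await send_delete(url)
        return True
    except ClientConnectorError:
        # Orchestrator already gone: connection refused is expected here.
        logger.debug("Trickle DELETE: orchestrator unreachable (suppressed) url=%s", url)
    except Exception:
        # Any other failure is still surfaced at ERROR, with a traceback.
        logger.exception("Trickle DELETE exception url=%s", url)
    return False
```

Suppressing only ClientConnectorError, rather than every exception, keeps other DELETE failures visible at ERROR.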

Behaviour Before/After

Before:

ERROR - Media input loop failed: Response payload is not completed: <TransferEncodingError: 400, ...>
ERROR - Control channel subscription error: Trickle JSONL subscription error: ClientPayloadError: ...>
ERROR - Trickle DELETE exception url=...
ERROR - Trickle DELETE exception url=...

After:

WARNING - Trickle JSONL channel disconnected (network): TransferEncodingError: ...
DEBUG   - Trickle DELETE: orchestrator unreachable (suppressed) url=...

The worker shuts down cleanly instead of propagating errors.

Related

…work disconnects

When an orchestrator truncates a trickle transfer mid-stream (incomplete
transfer encoding), aiohttp raises ClientPayloadError wrapping an underlying
TransferEncodingError; the "400" in its message is aiohttp's internal error
code for that exception, not an HTTP status from the orchestrator. Previously
this propagated as an application error in both ChannelReader and JSONLReader,
causing noisy ERROR-level logs and unclean session teardown.

Changes:
- channel_reader.py: Catch aiohttp.ClientPayloadError in both ChannelReader
  and JSONLReader before the generic Exception handler. Log at WARNING and
  return cleanly instead of wrapping as LivepeerGatewayError. This stops the
  error from bubbling up to livepeer_app.py's control channel handler.
- trickle_publisher.py: Demote ClientConnectorError in _run_delete from ERROR
  to DEBUG. When the orchestrator is already down, connection-refused errors
  during trickle DELETE teardown are expected and not actionable.

Fixes: daydreamlive/scope#805
Related: daydreamlive/scope#771 (similar but for EOFError on clean disconnect)
Signed-off-by: livepeer-tessa <livepeer-tessa@users.noreply.github.com>
livepeer-tessa pushed a commit to daydreamlive/scope that referenced this pull request Apr 2, 2026
…input loop

When an orchestrator truncates the trickle connection mid-stream, aiohttp
raises ClientPayloadError (wrapping a TransferEncodingError). Previously this
was caught by the broad 'except Exception' handler and logged at ERROR level,
causing noisy logs and unclean teardown.

- Catch aiohttp.ClientPayloadError before the generic handler; log at WARNING
  and let the finally block run the normal media_output.close() path
- Suppress ClientConnectorError during media_output.close() (logged at DEBUG)
  when the orchestrator is already unreachable at teardown time

The deeper fix (in livepeer-python-gateway channel_reader.py / trickle_publisher.py)
ensures the control channel subscription also terminates cleanly without raising.
See: livepeer/livepeer-python-gateway#2

Fixes: #805
Related: #771
Signed-off-by: livepeer-robot <robot@livepeer.org>
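The media input loop change in the scope commit above can be sketched roughly as follows. This is not the scope code: it assumes a hypothetical frames async iterator of media chunks and a media_output object with write() and close() coroutines (write() and media_input_loop are illustrative names). Placeholder exception classes are defined only when aiohttp is missing.

```python
import logging
from typing import AsyncIterator, Protocol

try:
    from aiohttp import ClientConnectorError, ClientPayloadError
except ImportError:  # placeholders so the sketch runs without aiohttp installed
    class ClientPayloadError(Exception):
        pass

    class ClientConnectorError(OSError):
        pass

logger = logging.getLogger("scope.media_input")


class MediaOutput(Protocol):
    """Hypothetical interface for the media output sink."""

    async def write(self, frame: bytes) -> None: ...

    async def close(self) -> None: ...


async def media_input_loop(frames: AsyncIterator[bytes], media_output: MediaOutput) -> None:
    try:
        async for frame in frames:
            await media_output.write(frame)
    except ClientPayloadError as exc:
        # Orchestrator truncated the stream: a network disconnect, not a bug.
        logger.warning("Media input disconnected (network): %r", exc)
    except Exception:
        # Genuine failures stay at ERROR, with a traceback.
        logger.exception("Media input loop failed")
    finally:
        # Runs on every exit path, so normal teardown always happens.
        try:
            await media_output.close()
        except ClientConnectorError:
            # Orchestrator already unreachable at teardown: expected, not actionable.
            logger.debug("media_output.close(): orchestrator unreachable (suppressed)")
```

Keeping close() in the finally block means the truncated-stream branch falls through to the same shutdown path as a normal end of stream; only the connection-refused error from an already-dead orchestrator is suppressed.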