[fal.ai/livepeer-staging] Trickle publisher logs ERROR on session teardown — 404 'Stream not found' after WebSocket disconnect #846

@livepeer-tessa

Summary

When a WebSocket client disconnects from the scope-livepeer-staging app, the trickle_publisher keeps POSTing to trickle channels that the orchestrator has already closed. The POSTs receive 404 responses, which are logged at ERROR level instead of being handled as a normal teardown condition.

Error Logs (Grafana Loki, 2026-04-05 ~19:37 UTC)

2026-04-05 19:37:35,376 - livepeer_gateway.trickle_publisher - ERROR - Trickle POST failed url=https://orch-staging-1.daydream.monster:8935/ai/trickle/301bf130-1-out/1699 status=404 body='Stream not found\n'
2026-04-05 19:37:35,376 - livepeer_gateway.trickle_publisher - ERROR - Trickle publisher channel does not exist url=https://orch-staging-1.daydream.monster:8935/ai/trickle/301bf130-1-out
2026-04-05 19:37:36,512 - livepeer_gateway.trickle_publisher - ERROR - Trickle POST failed url=https://orch-staging-1.daydream.monster:8935/ai/trickle/301bf130-events/360 status=404 body='Stream not found\n'
2026-04-05 19:37:36,512 - livepeer_gateway.trickle_publisher - ERROR - Trickle publisher channel does not exist url=https://orch-staging-1.daydream.monster:8935/ai/trickle/301bf130-events

App: github_f1lhgmk5v76a0ev1w0u378by-scope-livepeer-staging
Job: 11ff1c22-00e3-47ad-b728-e6ce555b9c82
Node: f59b33dc-21f8-be39-6eb3-4c1286ad2f16

Teardown Sequence

The log lines immediately preceding the errors show that this is a graceful disconnect, not a crash:

  1. 19:37:33 - INFO: connection closed — WebSocket disconnects cleanly
  2. Session stats log normally (1338 segments consumed, 1699 segments started, 0 failures)
  3. 19:37:35–36 — Trickle publisher still running, POSTs to now-defunct channels → 404 → ERROR

Root Cause

After the WebSocket connection closes, the trickle publisher is still active and attempts to POST to channels (301bf130-1-out, 301bf130-events) that have already been torn down on the orchestrator. A 404 during teardown is expected and should be treated as a signal that the channel is gone — not an error condition.

Expected Behavior

  • 404 responses from the orchestrator during session teardown should be handled gracefully — log at DEBUG or INFO level, not ERROR
  • The publisher should detect the connection closed signal and stop attempting to publish
  • This is analogous to how "Error in input loop" (connection closure) is already handled as a non-error
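
A minimal sketch of the second point, assuming the publisher runs as an asyncio coroutine and the WebSocket handler can share an asyncio.Event with it. The names publish_segments, post, and closed are hypothetical and are not taken from livepeer_gateway:

```python
import asyncio


async def publish_segments(segments, post, closed: asyncio.Event) -> int:
    """Post segments until `closed` is set; return how many were sent.

    `post` is a hypothetical coroutine standing in for the trickle POST.
    """
    sent = 0
    for segment in segments:
        # Check the shared signal before every POST so that no request
        # is started once the WebSocket side has reported a disconnect.
        if closed.is_set():
            break
        await post(segment)
        sent += 1
    return sent


async def demo() -> int:
    closed = asyncio.Event()

    async def fake_post(segment):
        # Simulate the WebSocket disconnecting while segment 2 is in flight.
        if segment == 2:
            closed.set()

    return await publish_segments(range(5), fake_post, closed)
```

Here the WebSocket handler would call closed.set() when it logs "connection closed". A POST that is already in flight can still get a 404, so this reduces the noise but does not replace the log-level handling.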

Suggested Fix

In livepeer_gateway/trickle_publisher.py, the Trickle POST failed handler should distinguish between:

  • 404 during active session → ERROR (something is wrong)
  • 404 after connection closed / teardown → INFO/DEBUG (expected, channel already gone)
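
The distinction above could be sketched roughly as follows. This is not the existing handler: post_failure_level, log_post_failure, and the tearing_down flag are hypothetical names, and how the publisher learns that teardown has started is left open.

```python
import logging

logger = logging.getLogger("livepeer_gateway.trickle_publisher")


def post_failure_level(status: int, tearing_down: bool) -> int:
    """Pick a log level for a failed trickle POST (hypothetical helper)."""
    # A 404 after the connection has closed is expected: the orchestrator
    # has already removed the channel.
    if status == 404 and tearing_down:
        return logging.DEBUG
    # Everything else, including a 404 during an active session, stays ERROR.
    return logging.ERROR


def log_post_failure(url: str, status: int, body: str, tearing_down: bool) -> None:
    """Log a failed POST at the level chosen by post_failure_level."""
    level = post_failure_level(status, tearing_down)
    logger.log(level, "Trickle POST failed url=%s status=%s body=%r", url, status, body)
```

Keeping the level decision in a small pure function makes it easy to unit test without an orchestrator.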

Alternatively, the trickle publisher should be shut down before/when the WebSocket closes, preventing it from POSTing to already-closed channels.
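
A rough sketch of that ordering, assuming the session is driven by asyncio and the publisher runs as its own task. run_session, websocket_loop, publisher_loop, and close_channels are hypothetical stand-ins, not names from the codebase:

```python
import asyncio
import contextlib


async def run_session(websocket_loop, publisher_loop, close_channels) -> None:
    """Run the publisher alongside the WebSocket and stop it first on teardown."""
    publisher = asyncio.create_task(publisher_loop())
    try:
        await websocket_loop()
    finally:
        # Stop the publisher before the channels are torn down, so it
        # cannot POST to a channel that no longer exists.
        publisher.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await publisher
        await close_channels()


async def demo() -> list:
    events = []

    async def websocket_loop():
        await asyncio.sleep(0.01)
        events.append("websocket closed")

    async def publisher_loop():
        try:
            while True:
                await asyncio.sleep(0.001)
        finally:
            events.append("publisher stopped")

    async def close_channels():
        events.append("channels closed")

    await run_session(websocket_loop, publisher_loop, close_channels)
    return events
```

This only helps if the gateway controls when the channels are closed. If the orchestrator tears them down on its own as soon as the client disconnects, the 404 can still occur and the log-level change is still needed.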

Labels: bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions