feat(novita): 1h async media poll budget for a single generation #11

Merged: duanbing merged 1 commit into main from feat/novita-async-1h-timeout on Jun 1, 2026

@duanbing duanbing commented Jun 1, 2026

What

Splits the overloaded REQUEST_TIMEOUT constant in the Novita provider:

  • REQUEST_TIMEOUT (300s) — stays the per-HTTP-request timeout (create-task submit, single poll fetch).
  • ASYNC_TASK_TIMEOUT (3600s / 1h) — new constant, the total poll-loop deadline for one async media generation.
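The split between the two constants can be sketched as a poll loop with a total deadline and a separate per-request timeout. This is a minimal Python sketch, not the provider's actual code: `poll_until_done`, `fetch_status`, `AsyncTaskTimeout`, and `POLL_INTERVAL` are hypothetical names, and clamping each request's timeout to the remaining budget is an assumption rather than something the PR states.

```python
import time
from typing import Callable, Optional

REQUEST_TIMEOUT = 300       # seconds, per HTTP request (value from the PR)
ASYNC_TASK_TIMEOUT = 3600   # seconds, total poll-loop deadline (value from the PR)
POLL_INTERVAL = 5.0         # seconds between polls (hypothetical value)


class AsyncTaskTimeout(Exception):
    """Raised when the task does not finish within the total budget."""


def poll_until_done(
    fetch_status: Callable[[float], Optional[str]],
    *,
    total_timeout: float = ASYNC_TASK_TIMEOUT,
    request_timeout: float = REQUEST_TIMEOUT,
    poll_interval: float = POLL_INTERVAL,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until fetch_status returns a result or the total budget is spent.

    fetch_status(timeout) performs one poll request with the given
    per-request timeout and returns None while the task is still running.
    """
    deadline = clock() + total_timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise AsyncTaskTimeout(
                f"task did not complete within {total_timeout:.0f}s"
            )
        # Assumption: a single request never outlives the overall deadline.
        result = fetch_status(min(request_timeout, remaining))
        if result is not None:
            return result
        # Sleep for the poll interval, but not past the deadline.
        sleep(min(poll_interval, max(0.0, deadline - clock())))
```

Injecting `clock` and `sleep` keeps the loop testable without waiting in real time; a caller would normally leave both at their defaults.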

Why

Heavy models (e.g. sora_2_pro_i2v) routinely render past the old 300s budget, producing spurious "Novita async task … did not complete within 300s" failures even though the upstream render eventually succeeds.

On the RouterBase side this generation is now single-attempt, no-retry (video_submitter MAX_ATTEMPTS=1), so this is the one and only budget for a request — one upstream Novita task, no throwaway re-submissions spawning fresh task ids.

Paired change

Requires the RouterBase parent PR that:

  • pins this submodule commit,
  • sets MAX_ATTEMPTS=1 and raises the worker reaper above 1h (STUCK_TIMEOUT_MINUTES=70),
  • adds 60s TCP keepalive on the backend→gateway client (the connection sits idle for up to 1h while the gateway polls Novita),
  • surfaces in-flight generations in /logs.
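The relationship between the reaper threshold and the new poll budget can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the worst-case model (one create-task submit that uses its full per-request timeout, followed by the full poll-loop budget) is an assumption, not something stated in either PR.

```python
REQUEST_TIMEOUT = 300        # seconds, per HTTP request (value from the PR)
ASYNC_TASK_TIMEOUT = 3600    # seconds, total poll-loop deadline (value from the PR)
STUCK_TIMEOUT_MINUTES = 70   # worker reaper threshold (value from the PR)


def worst_case_generation_seconds(request_timeout: int, async_task_timeout: int) -> int:
    """Assumed upper bound: a slow submit plus the full poll-loop budget."""
    return request_timeout + async_task_timeout


def reaper_headroom_seconds(
    stuck_timeout_minutes: int, request_timeout: int, async_task_timeout: int
) -> int:
    """Seconds between the assumed worst case and the reaper firing.

    A negative value would mean the reaper could kill a generation
    that is still inside its budget.
    """
    worst_case = worst_case_generation_seconds(request_timeout, async_task_timeout)
    return stuck_timeout_minutes * 60 - worst_case


headroom = reaper_headroom_seconds(
    STUCK_TIMEOUT_MINUTES, REQUEST_TIMEOUT, ASYNC_TASK_TIMEOUT
)
print(headroom)  # 70 * 60 - (300 + 3600) = 300
```

Under this model the 70-minute reaper leaves 300 seconds of margin. If the real loop lets a final poll request start just before the deadline, the margin would be smaller.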

Risk

A single submission now holds an idle connection to the gateway open for up to 1h. Backend-side keepalive mitigates conntrack/idle drops, but connection counts at high concurrency are worth watching on the first real 1h render.
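For reference, TCP keepalive on a long-idle connection can be enabled at the socket level roughly as follows. This is a Python sketch under stated assumptions: `enable_keepalive` is a hypothetical helper, the parent PR's client may configure keepalive through its HTTP library instead, and applying 60 seconds to both the idle time and the probe interval is a guess at what "60s TCP keepalive" means. The `TCP_KEEPIDLE` and `TCP_KEEPINTVL` options are platform-dependent, so the code guards for their presence.

```python
import socket

KEEPALIVE_SECONDS = 60  # value from the PR; its exact meaning there is assumed


def enable_keepalive(sock: socket.socket, seconds: int = KEEPALIVE_SECONDS) -> None:
    """Turn on TCP keepalive so idle connections still exchange probes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time before the first probe (not defined on every platform).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, seconds)
    # Interval between subsequent probes (not defined on every platform).
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, seconds)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    enable_keepalive(sock)
    keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
finally:
    sock.close()
```

Periodic probes keep stateful middleboxes from treating the connection as idle, which is the conntrack/idle-drop failure mode described above.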

🤖 Generated with Claude Code

…tion

Split the overloaded REQUEST_TIMEOUT const: it stays the per-HTTP-request
timeout (create-task submit, single poll fetch); a new ASYNC_TASK_TIMEOUT
(3600s) governs the total poll-loop deadline for one async media generation.

Heavy models (e.g. sora_2_pro_i2v) routinely render past the old 300s budget,
producing spurious "did not complete within 300s" timeouts. RouterBase no
longer retries these (MAX_ATTEMPTS=1), so this is the single, final budget —
one upstream task per request, no throwaway re-submissions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 1, 2026

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the Contributor License Agreement (CLA) and hereby sign the CLA.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@duanbing duanbing merged commit 6877b93 into main Jun 1, 2026
6 of 7 checks passed
@duanbing duanbing deleted the feat/novita-async-1h-timeout branch June 1, 2026 22:40