
Preserve OpenAI cached_tokens + Wan 2.7 R2V media types #10

Merged
duanbing merged 2 commits into main from novita/wan-r2v-media-types
May 29, 2026

Conversation

@duanbing

RouterBase fork patches for the chat-LLM + media catalogue work upstream in RouterBase/RouterBase.

1. Preserve OpenAI cached_tokens through Usage (6498b88)

RouterBase routes chat LLMs (OpenAI / Anthropic / Gemini, all via Novita's OpenAI-compatible endpoint) through this gateway and needs the prompt-cache read count to bill cache reads at the discounted rate and show users their savings. Stock TensorZero (TZ) normalizes provider usage to {input_tokens, output_tokens} and drops prompt_tokens_details.cached_tokens.

  • Usage gains cached_tokens: Option<u32> (ts-bindings + skip_serializing_if/default).
  • OpenAI provider parses prompt_tokens_details.cached_tokens in OpenAIUsage → Usage.
  • Threaded through Usage::zero() and the streaming / cross-inference aggregators (sum, None-as-0).
  • ~38 files (mostly mechanical constructor updates). Anthropic/Bedrock/Vertex native paths leave it None (out of scope — our Anthropic models use the openai-compat path).
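
A minimal sketch of the aggregation rule described in the bullets above, using only the standard library. The struct is a simplified stand-in, not the actual tensorzero-core type: the plain `u32` token fields, the `combine` and `sum_usage` names, and the choice that `Usage::zero()` starts with `cached_tokens: None` are assumptions made for illustration. The real field also carries serialization attributes (skip-if-none / default) that are omitted here.

```rust
/// Simplified stand-in for the gateway's normalized usage record (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    /// Prompt-cache read count; `None` when the provider path does not report it.
    pub cached_tokens: Option<u32>,
}

impl Usage {
    /// Assumption: the zero value carries no cache information at all.
    pub fn zero() -> Self {
        Usage { input_tokens: 0, output_tokens: 0, cached_tokens: None }
    }

    /// Hypothetical helper: sums two records, treating a missing count as 0.
    /// Assumption: the result stays `None` only if neither side reported a count.
    pub fn combine(self, other: Usage) -> Usage {
        let cached_tokens = match (self.cached_tokens, other.cached_tokens) {
            (None, None) => None,
            (a, b) => Some(a.unwrap_or(0).saturating_add(b.unwrap_or(0))),
        };
        Usage {
            input_tokens: self.input_tokens.saturating_add(other.input_tokens),
            output_tokens: self.output_tokens.saturating_add(other.output_tokens),
            cached_tokens,
        }
    }
}

/// Hypothetical aggregator over streamed chunks or multiple inferences.
pub fn sum_usage<I: IntoIterator<Item = Usage>>(items: I) -> Usage {
    items.into_iter().fold(Usage::zero(), Usage::combine)
}
```

Under these assumptions, summing a record with `Some(8)` and one with `None` yields `Some(8)`, while summing only `None` records keeps `None`, so callers can still distinguish "no cache data" from "zero cache reads".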

cargo check --package tensorzero-core (lib) clean; no new clippy warnings.

2. Wan 2.7 R2V media item types (54277f0)

Pre-existing fork commit (carried on this branch): fixes the Wan 2.7 reference-to-video request shape to match the upstream enum.

Notes

  • Consumed by RouterBase/RouterBase PR #103 (submodule pointer bumped to 6498b88).
  • The chat path forwards a per-user prompt_cache_key via extra_body; this PR only handles surfacing the cache usage back to the caller.

🤖 Generated with Claude Code

duanbing and others added 2 commits May 20, 2026 13:37
The Wan 2.7 R2V (`/v3/async/wan2.7-r2v`) endpoint requires each item
in the `media` array to carry a `type` value from the enum:
  - `reference_image`
  - `reference_video`
  - `first_frame`

We were sending `image` and `video`, which Novita rejects with the
generic "failed to exec task" 500 — every R2V submission via the
playground / legacy `image_urls`+`video_urls` shape was failing
silently for that reason.

Two changes in `build_body`:

1. Repack each `image_urls[]` URL as `{type: "reference_image", url}`
   and each `video_urls[]` URL as `{type: "reference_video", url}`.
   No way to express `first_frame` or per-item `reference_voice`
   from the legacy flat shape — callers who want those use the new
   pass-through path below.

2. Pass `media` through the allowed-fields whitelist for the R2V
   shape so direct API callers / a future media-editor UI can
   submit the rich shape (`[{type, url, reference_voice?}, ...]`)
   verbatim. The `!body.contains_key("media")` guard in the repack
   block ensures the pass-through wins when both shapes are present.

Also cap the synthesised `media` array at 5 items to match Novita's
documented ceiling (combined images+videos ≤ 5), so users who upload
more get a deterministic truncate-from-front rather than a 422.
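
The repack, pass-through guard, and cap described above can be sketched roughly as follows. This is an illustration, not the fork's `build_body`: `MediaItem`, `build_media`, and `MAX_MEDIA_ITEMS` are hypothetical names, the JSON body is replaced by plain structs to stay dependency-free, and two behaviours are assumed rather than confirmed by the commit message: images are placed before videos, and the cap keeps the first five items.

```rust
/// Hypothetical item in the R2V `media` array; `kind` stands in for the JSON `type` key.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MediaItem {
    pub kind: String,
    pub url: String,
}

/// Documented ceiling on combined images + videos.
const MAX_MEDIA_ITEMS: usize = 5;

/// Builds the `media` array for an R2V request.
/// If the caller already supplied a rich `media` array, it is used verbatim
/// (the pass-through wins); otherwise the legacy flat URL lists are repacked.
pub fn build_media(
    passthrough: Option<Vec<MediaItem>>,
    image_urls: &[String],
    video_urls: &[String],
) -> Vec<MediaItem> {
    if let Some(media) = passthrough {
        return media; // the cap applies only to the synthesised array
    }
    let images = image_urls.iter().map(|u| MediaItem {
        kind: "reference_image".to_string(),
        url: u.clone(),
    });
    let videos = video_urls.iter().map(|u| MediaItem {
        kind: "reference_video".to_string(),
        url: u.clone(),
    });
    // Assumption: images first, then videos, keeping the first five items.
    images.chain(videos).take(MAX_MEDIA_ITEMS).collect()
}
```
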

TensorZero normalized provider usage to {input_tokens, output_tokens}
and dropped OpenAI's prompt_tokens_details.cached_tokens. RouterBase
routes chat LLMs (incl. Claude/Gemini via Novita's OpenAI-compat
endpoint) through this gateway and needs the prompt-cache read count to
bill cache reads at the discounted rate and show users their savings.

- Add `cached_tokens: Option<u32>` to Usage (ts-bindings + skip-if-none).
- Parse prompt_tokens_details.cached_tokens in the OpenAI provider's
  OpenAIUsage → Usage conversion.
- Thread through Usage::zero() and the streaming/cross-inference
  aggregators (sum, treating None as 0). Anthropic/Bedrock/Vertex native
  paths leave it None (out of scope; our Anthropic models use the
  openai-compat path).
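
As an illustration of the OpenAIUsage → Usage conversion in the list above, here is a dependency-free sketch. The struct shapes are assumptions modelled on the OpenAI usage payload (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`); the actual tensorzero-core types are deserialized from JSON and may differ in field types and optionality.

```rust
/// Assumed shape of the nested `prompt_tokens_details` object.
pub struct PromptTokensDetails {
    pub cached_tokens: Option<u32>,
}

/// Assumed shape of the provider-side usage payload.
pub struct OpenAIUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    /// Absent when the endpoint does not report prompt-cache details.
    pub prompt_tokens_details: Option<PromptTokensDetails>,
}

/// Simplified stand-in for the gateway's normalized usage record.
#[derive(Debug, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cached_tokens: Option<u32>,
}

impl From<OpenAIUsage> for Usage {
    fn from(u: OpenAIUsage) -> Self {
        Usage {
            input_tokens: u.prompt_tokens,
            output_tokens: u.completion_tokens,
            // Keep `None` when either the details object or the count is missing,
            // so callers can tell "not reported" apart from "zero cache reads".
            cached_tokens: u.prompt_tokens_details.and_then(|d| d.cached_tokens),
        }
    }
}
```
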

cargo check --package tensorzero-core (lib) clean; no new clippy warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the Contributor License Agreement (CLA) and hereby sign the CLA.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@duanbing duanbing merged commit 4a182b3 into main May 29, 2026
6 of 7 checks passed
@duanbing duanbing deleted the novita/wan-r2v-media-types branch May 29, 2026 00:03