ImageUsage.num_images counts response candidates, not generated images #274

Description

@Kamilbenkirane

Summary

ImageUsage.num_images (emitted on celeste image-generation spans as celeste.usage.num_images) reflects the count of response candidates returned by the provider, not the count of actually-generated images. When a provider returns a candidate with no image content (e.g. Gemini's IMAGE_OTHER finish reason), num_images still reports 1.

Reproducer

A gemini-3.1-flash-image-preview streaming call that resolves with finish reason IMAGE_OTHER (Gemini's "image not generated, non-safety reason") emits a span like:

{
  "name": "celeste.images gemini-3.1-flash-image-preview",
  "attributes": {
    "gen_ai.usage.input_tokens": 15,
    "gen_ai.usage.total_tokens": 15,
    "celeste.usage.num_images": 1,
    "gen_ai.response.finish_reasons": ["IMAGE_OTHER"]
  }
}

Here output_tokens is absent (no encoded image bytes were returned), total_tokens equals input_tokens, and the finish reason flags a failure, yet num_images: 1 suggests an image was produced. Compare with a successful generation on the same model:

{
  "gen_ai.usage.input_tokens": 263,
  "gen_ai.usage.output_tokens": 1350,
  "celeste.usage.num_images": 1,
  "gen_ai.response.finish_reasons": ["STOP"]
}

The two are observationally indistinguishable on num_images alone.
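
Until num_images is corrected, a consumer of these spans could lean on the other attributes shown above to tell the two cases apart. The following is only a sketch: `span_produced_image` is a hypothetical helper, it assumes the span attributes are available as a plain dict, and it assumes that `STOP` plus a positive `output_tokens` is a reasonable proxy for "an image was returned", which the two example spans suggest but do not prove.

```python
from __future__ import annotations


def span_produced_image(attributes: dict) -> bool:
    """Heuristic (assumption): an image exists only if the span reports
    output tokens and a STOP finish reason. Hypothetical helper, not part
    of celeste."""
    finish_reasons = attributes.get("gen_ai.response.finish_reasons", [])
    output_tokens = attributes.get("gen_ai.usage.output_tokens", 0)
    return output_tokens > 0 and "STOP" in finish_reasons


# The two spans from this issue, reduced to their attributes.
failed_span = {
    "gen_ai.usage.input_tokens": 15,
    "gen_ai.usage.total_tokens": 15,
    "celeste.usage.num_images": 1,
    "gen_ai.response.finish_reasons": ["IMAGE_OTHER"],
}
successful_span = {
    "gen_ai.usage.input_tokens": 263,
    "gen_ai.usage.output_tokens": 1350,
    "celeste.usage.num_images": 1,
    "gen_ai.response.finish_reasons": ["STOP"],
}
```

This is a consumer-side workaround only; it does not change what num_images reports.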

Why it matters

Cost analytics and per-call billing dashboards keyed on num_images will count failed-but-billable image-generation attempts as if they had produced images. For per-image-priced models (Imagen) the dollar impact is direct. For token-priced image models (Gemini's flash-image-preview) num_images is the only modality-specific count emitted today.

Suggested fix

num_images should count candidates whose content actually contains a non-empty image artifact. The Stream's _aggregate_usage (or wherever num_images is computed in the image modality) should iterate response candidates and only increment when an image was returned. IMAGE_OTHER / safety-blocked / empty candidates should be excluded.
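
A minimal sketch of the counting rule described above. The `Part` and `Candidate` types and the `count_generated_images` function are hypothetical stand-ins invented for illustration; they are not celeste's or Gemini's actual classes, and the real candidate shape may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-ins for provider response types (assumption).
@dataclass
class Part:
    image_bytes: bytes | None = None
    text: str | None = None


@dataclass
class Candidate:
    finish_reason: str
    parts: list[Part] = field(default_factory=list)


def count_generated_images(candidates: list[Candidate]) -> int:
    """Count candidates that carry at least one non-empty image payload.

    Candidates with no parts, only text parts, or empty image bytes
    (e.g. IMAGE_OTHER or safety-blocked results) contribute nothing.
    """
    return sum(
        1
        for candidate in candidates
        if any(part.image_bytes for part in candidate.parts)
    )
```

Counting on the presence of image bytes rather than on the finish reason avoids maintaining a list of failure reasons per provider, at the cost of assuming every successful candidate exposes its image payload at aggregation time.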

Discovered during real-call validation of #271 / #273 against Gemini.
