Summary
ImageUsage.num_images (emitted on celeste image-generation spans as celeste.usage.num_images) reflects the count of response candidates returned by the provider, not the count of actually-generated images. When a provider returns a candidate with no image content (e.g. Gemini's IMAGE_OTHER finish reason), num_images still reports 1.
Reproducer
A gemini-3.1-flash-image-preview streaming call that resolves with finish reason IMAGE_OTHER (Gemini's "image not generated, non-safety reason") emits a span like:
{
  "name": "celeste.images gemini-3.1-flash-image-preview",
  "attributes": {
    "gen_ai.usage.input_tokens": 15,
    "gen_ai.usage.total_tokens": 15,
    "celeste.usage.num_images": 1,
    "gen_ai.response.finish_reasons": ["IMAGE_OTHER"]
  }
}
output_tokens is absent (no encoded image bytes were returned), total_tokens equals input_tokens, and the finish reason flags a failure. Yet num_images: 1 suggests an image was produced. Compare this with a successful generation on the same model:
{
  "gen_ai.usage.input_tokens": 263,
  "gen_ai.usage.output_tokens": 1350,
  "celeste.usage.num_images": 1,
  "gen_ai.response.finish_reasons": ["STOP"]
}
The two are observationally indistinguishable on num_images alone.
Why it matters
Cost analytics and per-call billing dashboards keyed on num_images will count failed (but still billable) image-generation attempts as if they had produced images. For per-image-priced models (Imagen), the dollar impact is direct. For token-priced image models (Gemini's flash-image-preview), num_images is the only modality-specific count emitted today.
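To make the over-count concrete, here is a small sketch that aggregates the two spans from the Reproducer the way a dashboard might. The helper names are hypothetical, and treating STOP as the only "success" finish reason is an assumption made purely for this illustration. It is a possible interim filter for consumers, not a substitute for fixing num_images at the source.

```python
# Attribute sets from the two Reproducer spans, trimmed to the fields used here.
spans = [
    {"celeste.usage.num_images": 1, "gen_ai.response.finish_reasons": ["IMAGE_OTHER"]},
    {"celeste.usage.num_images": 1, "gen_ai.response.finish_reasons": ["STOP"]},
]


def naive_image_count(spans):
    """Hypothetical: what a dashboard keyed on num_images alone would report."""
    return sum(s.get("celeste.usage.num_images", 0) for s in spans)


def image_count_excluding_failures(spans, ok_reasons=frozenset({"STOP"})):
    """Hypothetical interim heuristic: skip spans with any finish reason outside ok_reasons.

    Spans with no finish reasons recorded are counted (all() of an empty list is True).
    """
    return sum(
        s.get("celeste.usage.num_images", 0)
        for s in spans
        if all(r in ok_reasons for r in s.get("gen_ai.response.finish_reasons", []))
    )


print(naive_image_count(spans))               # 2
print(image_count_excluding_failures(spans))  # 1
```

For a per-image-priced model, the naive total would bill for two images where only one was produced.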
Suggested fix
num_images should count candidates whose content actually contains a non-empty image artifact. The Stream's _aggregate_usage (or wherever num_images is computed in the image modality) should iterate response candidates and only increment when an image was returned. IMAGE_OTHER / safety-blocked / empty candidates should be excluded.
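A minimal sketch of the suggested counting rule follows. Candidate, ImagePart, has_image and count_generated_images are hypothetical stand-ins, not celeste's or Gemini's actual types. The real candidate shape and the signature of _aggregate_usage aren't shown in this issue, so the wiring into the image modality is left out. The key point is that the count depends on whether non-empty image bytes are present, not on the finish reason, so IMAGE_OTHER, safety-blocked and empty candidates all contribute 0.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ImagePart:
    # Hypothetical: one image artifact inside a candidate's content.
    data: Optional[bytes] = None


@dataclass
class Candidate:
    # Hypothetical stand-in for a provider response candidate.
    finish_reason: str
    parts: List[ImagePart] = field(default_factory=list)


def has_image(candidate: Candidate) -> bool:
    """True if at least one part carries non-empty image bytes (None and b"" are ignored)."""
    return any(part.data for part in candidate.parts)


def count_generated_images(candidates: Iterable[Candidate]) -> int:
    """Count only candidates that actually returned an image, whatever their finish reason."""
    return sum(1 for c in candidates if has_image(c))


failed = Candidate(finish_reason="IMAGE_OTHER")
succeeded = Candidate(finish_reason="STOP", parts=[ImagePart(data=b"\x89PNG...")])
print(count_generated_images([failed]))             # 0
print(count_generated_images([succeeded]))          # 1
print(count_generated_images([failed, succeeded]))  # 1
```

Keying on content rather than on a list of failure finish reasons means new or unfamiliar finish reasons cannot silently inflate the count.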
Discovered during real-call validation of #271 / #273 against Gemini.