ImageUsage.num_images counts response candidates, not generated images #274

## Summary

`ImageUsage.num_images` (emitted on celeste image-generation spans as `celeste.usage.num_images`) reflects the count of response candidates returned by the provider, not the count of actually-generated images. When a provider returns a candidate with no image content (e.g. Gemini's `IMAGE_OTHER` finish reason), `num_images` still reports `1`.

## Reproducer

A `gemini-3.1-flash-image-preview` streaming call that resolves with finish reason `IMAGE_OTHER` (Gemini's "image not generated, non-safety reason") emits a span like:

```json
{
  "name": "celeste.images gemini-3.1-flash-image-preview",
  "attributes": {
    "gen_ai.usage.input_tokens": 15,
    "gen_ai.usage.total_tokens": 15,
    "celeste.usage.num_images": 1,
    "gen_ai.response.finish_reasons": ["IMAGE_OTHER"]
  }
}
```

`output_tokens` is absent (no encoded image bytes were returned), `total_tokens == input_tokens`, and the finish reason flags a failure, yet `num_images: 1` suggests an image was produced. Compare with a successful generation on the same model:

```json
{
  "gen_ai.usage.input_tokens": 263,
  "gen_ai.usage.output_tokens": 1350,
  "celeste.usage.num_images": 1,
  "gen_ai.response.finish_reasons": ["STOP"]
}
```

The two are observationally indistinguishable on `num_images` alone.
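Until `num_images` is fixed, a span consumer can tell the two cases apart by looking at the other attributes. The following is a minimal sketch of such a check, based only on the two spans shown above. The function name `span_produced_image` is hypothetical, and the rule (positive `output_tokens`, no `IMAGE_OTHER` finish reason) is a heuristic assumption, not a documented contract of celeste or Gemini.

```python
def span_produced_image(attributes: dict) -> bool:
    """Heuristic: did this image-generation span actually yield an image?

    Assumes the attribute keys seen in the spans above. A missing
    `gen_ai.usage.output_tokens` is treated as zero output.
    """
    finish_reasons = attributes.get("gen_ai.response.finish_reasons", [])
    output_tokens = attributes.get("gen_ai.usage.output_tokens", 0)
    return output_tokens > 0 and "IMAGE_OTHER" not in finish_reasons


# Attributes copied from the two spans in this issue.
failed_span = {
    "gen_ai.usage.input_tokens": 15,
    "gen_ai.usage.total_tokens": 15,
    "celeste.usage.num_images": 1,
    "gen_ai.response.finish_reasons": ["IMAGE_OTHER"],
}
successful_span = {
    "gen_ai.usage.input_tokens": 263,
    "gen_ai.usage.output_tokens": 1350,
    "celeste.usage.num_images": 1,
    "gen_ai.response.finish_reasons": ["STOP"],
}
```

Both spans report `celeste.usage.num_images: 1`, but only `successful_span` passes this check.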

## Why it matters

Cost analytics and per-call billing dashboards that key on `num_images` will over-count: failed-but-billable image-generation attempts are reported as if they had produced images. For per-image-priced models (Imagen) the dollar impact is direct. For token-priced image models (Gemini's flash-image-preview) `num_images` is the only modality-specific count emitted today, so consumers have no other image count to cross-check against.

## Suggested fix

`num_images` should count candidates whose content actually contains a non-empty image artifact. The Stream's `_aggregate_usage` (or wherever `num_images` is computed in the image modality) should iterate response candidates and only increment when an image was returned. `IMAGE_OTHER` / safety-blocked / empty candidates should be excluded.
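A minimal sketch of the counting rule proposed above. The `Part` and `Candidate` dataclasses and the `count_generated_images` helper are hypothetical stand-ins for illustration; they are not celeste's actual types, and the real response shape inside `_aggregate_usage` may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    """Hypothetical content part; image_bytes is None or empty when no image came back."""
    image_bytes: Optional[bytes] = None


@dataclass
class Candidate:
    """Hypothetical response candidate with a finish reason and content parts."""
    finish_reason: str
    parts: List[Part] = field(default_factory=list)


def count_generated_images(candidates: List[Candidate]) -> int:
    """Count candidates that carry at least one non-empty image artifact.

    Candidates with no parts, or with only empty image payloads, are
    excluded regardless of their finish reason.
    """
    return sum(
        1
        for candidate in candidates
        if any(part.image_bytes for part in candidate.parts)
    )


# One real image, one IMAGE_OTHER candidate with no content, one empty payload.
candidates = [
    Candidate("STOP", [Part(image_bytes=b"\x89PNG...")]),
    Candidate("IMAGE_OTHER", []),
    Candidate("STOP", [Part(image_bytes=b"")]),
]
num_images = count_generated_images(candidates)
```

Keying on the presence of image bytes rather than on the finish reason means the count stays correct even for finish reasons not listed in this issue.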

Discovered during real-call validation of #271 / #273 against Gemini.
