Add troubleshooting guide for timeseries with NaN #164

@sed-i

Reported by @jameinel on Matrix.

When timeseries contain NaNs, for example in the output of

avg_over_time(
  juju_apiserver_request_duration_seconds{
    juju_controller="$controller",
    juju_unit=~"$controller_host",
    method="FullStatus",
    quantile="0.99",
    error_code!~"not found|unauthorized access"
  }[5m]
)

then aggregation operations over them produce unexpected results, because a single NaN input makes the whole aggregate NaN:

avg without (version, error_code) (
  avg_over_time(
    juju_apiserver_request_duration_seconds{
      juju_controller="$controller",
      juju_unit=~"$controller_host",
      method="FullStatus",
      quantile="0.99",
      error_code!~"not found|unauthorized access"
    }[5m]
  )
)

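We only get the expected results once the NaNs are filtered out, e.g. with

avg(avg_over_time((juju_apiserver_request_duration_seconds > 0.0001)[1h:]))

This works because every comparison with NaN evaluates to false, so the "> 0.0001" filter drops the NaN samples before they are averaged.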
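If the goal is to drop only the NaN samples, rather than every value at or below an arbitrary threshold, a self-comparison can act as the filter: NaN is the only value that is not equal to itself, so "x == x" keeps every real sample and discards NaNs. A minimal sketch, with the selector trimmed to two matchers for brevity (the matcher choice is illustrative, not taken from an existing dashboard):

```promql
# Self-comparison: NaN == NaN is false, so NaN samples are dropped,
# while every real value equals itself and is kept.
avg(
  avg_over_time(
    (
      juju_apiserver_request_duration_seconds{method="FullStatus", quantile="0.99"}
        ==
      juju_apiserver_request_duration_seconds{method="FullStatus", quantile="0.99"}
    )[1h:]
  )
)
```

Unlike the "> 0.0001" threshold, this does not also discard legitimate values at or below 0.0001 s.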

We should document:

  • How to avoid NaNs at instrumentation time.
  • How to handle potential NaNs in dashboard and alert expressions.
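For dashboards and alerts it may also help to surface which series are NaN instead of silently filtering them. Since "x != x" is true only for NaN, the sketch below counts NaN-valued series per unit; the selector and the "by (juju_unit)" grouping are illustrative assumptions, not an established convention:

```promql
# NaN != NaN is true, while every real value equals itself,
# so the comparison keeps only series whose current sample is NaN.
count by (juju_unit) (
  juju_apiserver_request_duration_seconds{juju_controller="$controller", quantile="0.99"}
    !=
  juju_apiserver_request_duration_seconds{juju_controller="$controller", quantile="0.99"}
)
```

If used as an alert expression, it returns a result only while at least one series is NaN, since aggregating an empty vector yields no output.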
