feat(loadtest): migrate metrics to OpenTelemetry #3236
amir-deris wants to merge 1 commit into main
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3236 +/- ##
==========================================
+ Coverage 59.26% 59.28% +0.01%
==========================================
Files 2068 2069 +1
Lines 169695 169742 +47
==========================================
+ Hits 100572 100624 +52
+ Misses 60334 60321 -13
- Partials 8789 8797 +8
if err != nil {
	panic(err)
}
defer func() { _ = shutdownOtel(context.Background()) }()
You'd want to create a context here that respects termination signals. This relates to the comment below on contexts. Example:
ctx, cancel := signal.NotifyContext(parent, os.Interrupt, syscall.SIGTERM)
This gives you a signal-aware context and a cancel function that you can call to make the goroutines using that context stop gracefully.
That is great feedback. I will work on fixing the context usage.
provider := otelmetric.NewMeterProvider(otelmetric.WithReader(exporter))
otel.SetMeterProvider(provider)

meter := provider.Meter("loadtest")
Definitely not blocking.
It may be useful to support a dimension parameter, so that you can run a load test with some ID and then filter metrics by that ID across load test runs. You could pass this in via a parameter or an environment variable.
I could see this helping users maintain their sanity when comparing load test results. I sadly know this from painful experience.
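One way this could look, sketched with the standard library only. The environment variable `LOADTEST_RUN_ID`, the `run_id` label, and the helpers `resolveRunID` and `withRunID` are all hypothetical names; the resulting label map would still need to be converted into OTel attributes, which is not shown here.

```go
package main

import (
	"fmt"
	"os"
)

// runIDEnv is a hypothetical environment variable name, not one defined by the PR.
const runIDEnv = "LOADTEST_RUN_ID"

// resolveRunID prefers an explicit value (for example from a CLI flag) and
// falls back to the environment. An empty result means no run ID is configured.
func resolveRunID(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	return getenv(runIDEnv)
}

// withRunID returns a copy of labels with a run_id dimension added when one is
// configured. The input map is never mutated.
func withRunID(labels map[string]string, runID string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	if runID != "" {
		out["run_id"] = runID
	}
	return out
}

func main() {
	runID := resolveRunID("", os.Getenv)
	// "example" is a placeholder msg_type value, not one taken from the PR.
	labels := withRunID(map[string]string{"msg_type": "example"}, runID)
	fmt.Println(labels)
}
```

Each distinct run ID creates a new time series per metric, so an ID per test run is reasonable while an ID per request would not be.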
	loadtestTpsGauge.Record(ctx, float64(value), loadtestMetricOpts(msgType))
}

func loadtestPrometheusGatherer() prometheus.Gatherer {

	loadtestProduceCounter.Add(ctx, 1, loadtestMetricOpts(msgType))
}

func incrConsumerEventCount(msgType string) {
For my own understanding, what does msgType usually contain? Any examples?
func setThroughputMetricByType(metricName string, value float32, msgType string) {
	ctx := context.Background()
	// Legacy keys were sei, loadtest, tps, <metricName>. Loadtest only passes "tps".
	if metricName != "tps" {
Should this just return an error, since loadtest only supports tps? As written it fails silently, which might be intended, so I'm double-checking.
|
Since the |
Summary
Replaces the loadtest package's legacy utils/metrics and sei-cosmos/telemetry instrumentation with OpenTelemetry (OTel), exporting metrics via the standard Prometheus bridge.
- otel_metrics.go: initializes a process-scoped OTel MeterProvider backed by a dedicated Prometheus registry. Instruments three metrics: sei_loadtest_produce_count, sei_loadtest_consume_count, and sei_loadtest_tps_tps.
- Replaces metrics.IncrProducerEventCount / IncrConsumerEventCount / SetThroughputMetricByType calls in loadtest_client.go and main.go with package-local wrappers over the OTel counters/gauge.
- metrics.go: removes the old telemetry.Metrics-based handler; the /metrics endpoint now serves from the OTel-registered Prometheus gatherer via promhttp.
- run() initializes the OTel provider on startup and defers a clean shutdown.
- Adds scrape tests (otel_metrics_scrape_test.go) to verify metrics are exposed correctly.
Test plan
- go test ./loadtest/... passes, including new scrape tests
- /metrics endpoint exposes sei_loadtest_* metrics with msg_type, service, and host labels