Skip to content

feat(loadtest): migrate metrics to OpenTelemetry#3236

Closed
amir-deris wants to merge 1 commit intomainfrom
amir/plt-262-migrate-loadtest-metrics-to-OTEL
Closed

feat(loadtest): migrate metrics to OpenTelemetry#3236
amir-deris wants to merge 1 commit intomainfrom
amir/plt-262-migrate-loadtest-metrics-to-OTEL

Conversation

@amir-deris
Copy link
Copy Markdown
Contributor

@amir-deris amir-deris commented Apr 13, 2026

Summary

Replaces the loadtest package's legacy utils/metrics and sei-cosmos/telemetry instrumentation with OpenTelemetry (OTel), exporting metrics via the standard Prometheus bridge.

  • Adds otel_metrics.go: initializes a process-scoped OTel MeterProvider backed by a dedicated Prometheus registry. Instruments three metrics: sei_loadtest_produce_count, sei_loadtest_consume_count, and sei_loadtest_tps_tps.
  • Replaces metrics.IncrProducerEventCount / IncrConsumerEventCount / SetThroughputMetricByType calls in loadtest_client.go and main.go with package-local wrappers over the OTel counters/gauge.
  • Simplifies metrics.go: removes the old telemetry.Metrics-based handler; the /metrics endpoint now serves from the OTel-registered Prometheus gatherer via promhttp.
  • run() initializes the OTel provider on startup and defers a clean shutdown.
  • Adds scrape-level tests (otel_metrics_scrape_test.go) to verify metrics are exposed correctly.

Test plan

  • go test ./loadtest/... passes, including new scrape tests
  • Loadtest binary runs and /metrics endpoint exposes sei_loadtest_* metrics with msg_type, service, and host labels

@github-actions
Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedApr 13, 2026, 10:27 PM

@github-actions
Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedApr 13, 2026, 10:27 PM

@amir-deris amir-deris changed the title used otel in loadtest package feat(loadtest): migrate metrics to OpenTelemetry Apr 13, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 54.28571% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.28%. Comparing base (21433ce) to head (a5732c9).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
loadtest/otel_metrics.go 65.51% 12 Missing and 8 partials ⚠️
loadtest/main.go 0.00% 5 Missing ⚠️
loadtest/metrics.go 0.00% 5 Missing ⚠️
loadtest/loadtest_client.go 0.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3236      +/-   ##
==========================================
+ Coverage   59.26%   59.28%   +0.01%     
==========================================
  Files        2068     2069       +1     
  Lines      169695   169742      +47     
==========================================
+ Hits       100572   100624      +52     
+ Misses      60334    60321      -13     
- Partials     8789     8797       +8     
Flag Coverage Δ
sei-chain-pr 4.86% <54.28%> (?)
sei-db 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
loadtest/loadtest_client.go 0.00% <0.00%> (ø)
loadtest/main.go 2.97% <0.00%> (+2.97%) ⬆️
loadtest/metrics.go 0.00% <0.00%> (ø)
loadtest/otel_metrics.go 65.51% <65.51%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@amir-deris amir-deris requested review from bdchatham and masih April 13, 2026 22:42
if err != nil {
panic(err)
}
defer func() { _ = shutdownOtel(context.Background()) }()
Copy link
Copy Markdown

@bdchatham bdchatham Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd want to create a context here that respects termination signals. This relates to the comment below on contexts. Example:

ctx, cancel := signal.NotifyContext(parent, os.Interrupt, syscall.SIGTERM)

This gives you a signal-aware context and a cancel method you can call if you want to manually trigger the threads using that context to gracefully stop.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great feedback. I will work on fixing the context usage.

provider := otelmetric.NewMeterProvider(otelmetric.WithReader(exporter))
otel.SetMeterProvider(provider)

meter := provider.Meter("loadtest")
Copy link
Copy Markdown

@bdchatham bdchatham Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not blocking

It may be useful to support dimensions parameters so that you can run a load test with some ID and you are able to filter metrics across load test data by it. You could pass this in via parameter or env.

I could see this helping users maintain their sanity when comparing between load test results. I sadly know from painful experience.

loadtestTpsGauge.Record(ctx, float64(value), loadtestMetricOpts(msgType))
}

func loadtestPrometheusGatherer() prometheus.Gatherer {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is this needed?

loadtestProduceCounter.Add(ctx, 1, loadtestMetricOpts(msgType))
}

func incrConsumerEventCount(msgType string) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding, what does msgType usually contain? Any examples?

func setThroughputMetricByType(metricName string, value float32, msgType string) {
ctx := context.Background()
// Legacy keys were sei, loadtest, tps, <metricName>. Loadtest only passes "tps".
if metricName != "tps" {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just return an error since loadtest only supports tps? This will fail silently which might be intended so double checking

@amir-deris
Copy link
Copy Markdown
Contributor Author

Since the loadtest package is deprecated and we are going to improve sei-load repository instead, declining this PR.

@amir-deris amir-deris closed this Apr 14, 2026
@amir-deris amir-deris deleted the amir/plt-262-migrate-loadtest-metrics-to-OTEL branch April 14, 2026 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants