Add OTLP HTTP MetricExporter max_export_batch_size #4576
tammy-baylis-swi wants to merge 33 commits into open-telemetry:main
Conversation
I think the aiohttp-client test failure is a hiccup from the recent release, not from changes in this PR.
# used to write batched pb2 objects for export when finalized
split_resource_metrics = []

for resource_metrics in metrics_data.resource_metrics:
This borrows from the 4-deep for-each loop approach in the gRPC exporter.
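The 4-deep walk mentioned above can be sketched as follows. This is a simplified stand-in, not the exporter's actual code: plain dicts replace the real pb2/SDK classes, and batches are flat lists of data points rather than rebuilt MetricsData objects.

```python
def split_by_data_points(resource_metrics_list, max_export_batch_size):
    """Walk resource_metrics -> scope_metrics -> metrics -> data_points,
    yielding a batch whenever the running data-point count hits the limit."""
    batch = []
    for rm in resource_metrics_list:
        for sm in rm["scope_metrics"]:
            for metric in sm["metrics"]:
                for dp in metric["data_points"]:
                    batch.append(dp)
                    if len(batch) >= max_export_batch_size:
                        yield batch
                        batch = []
    if batch:
        # flush any remainder smaller than the batch size
        yield batch
```

With five data points and a batch size of 2, this yields batches of 2, 2, and 1.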
for resource_metrics in metrics_data.resource_metrics:
    split_scope_metrics = []
    split_resource_metrics.append(
The split_* lists store references because the HTTP protobuf representations of ResourceMetrics, ScopeMetrics, etc. are not replaceable like the gRPC data classes. I'm not sure if there's a better way -- doing CopyFrom and clear is still quite involved.
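The reference-vs-copy distinction at issue here can be shown with a plain-Python stand-in (hypothetical, not the real pb2 classes): appending a message to a list stores a reference, so later mutation of the original is visible through the list, whereas an explicit copy (the CopyFrom-style alternative) decouples the two at the cost of extra bookkeeping.

```python
import copy

scope_metrics = {"metrics": []}           # stand-in for a pb2 ScopeMetrics
split_scope_metrics = [scope_metrics]     # stores a reference
snapshot = [copy.deepcopy(scope_metrics)] # stores an independent copy

scope_metrics["metrics"].append("sum_metric")

print(split_scope_metrics[0]["metrics"])  # ['sum_metric'] - sees the mutation
print(snapshot[0]["metrics"])             # [] - unaffected
```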
...orter-otlp-proto-http/src/opentelemetry/exporter/otlp/proto/http/metric_exporter/__init__.py
    )
)

def _get_split_resource_metrics_pb2(
This helper is used to write actual pb2 objects before each yield back to the export caller.
if split_resp.ok:
    export_result = MetricExportResult.SUCCESS
elif self._retryable(split_resp):
Wouldn't we need to remove the success/failure batches from the entire list when we attempt a retry? For example: batch size of 1, and there are 2 metric datas. The first one returns retryable and the second one returns failure. When the loop returns to the outer loop, we don't want to retry the one that failed again, right?
Good call: no, we don't want to retry a non-retryable failure!
I reorganized the unbatched-vs-batched handling and, for the latter, made the batch the outer loop and the delay the inner loop, with breaks for clarity, in f1ef6c4.
Hi @lzchen any other changes to suggest?
Working on some conflict resolution with changes introduced in #4564
Hi @lzchen and @open-telemetry/python-approvers, please may I have a new review of this PR. In 25e8be9 I resolved merge conflicts, which required some redesign of HTTP metrics batching to match the timeout updates in #4564. That was the main change since last approval. The last Cassandra and Celery instrumentor test failures seem unrelated and cite commit hashes like similar failures in other PRs.
...orter-otlp-proto-http/src/opentelemetry/exporter/otlp/proto/http/metric_exporter/__init__.py
pmcollins left a comment
Added a couple of comments. LGTM otherwise after checks pass.
...orter-otlp-proto-http/src/opentelemetry/exporter/otlp/proto/http/metric_exporter/__init__.py
Hi @open-telemetry/python-approvers, any other revisions needed for this one?
Hi again @open-telemetry/python-approvers, @open-telemetry/python-maintainers, please could I have a review of this?
Hi again, please may I have a review.
42Questions left a comment
Two comments re implementation.
for split_metrics_data in split_metrics_batches:
    export_result = self._export_with_retries(
        split_metrics_data.SerializeToString(),
        deadline_sec,
    )
Is it ok for data loss to happen here?
For example, if there are two batches and the first one fails and the second one succeeds.
I can see that there is a test that confirms the behaviour, test_export_retries_with_batching_failure_first; I would just like to make sure that it's intentional.
It looks like the gRPC MetricExporter returns failure if any batch fails - we should probably be consistent between the exporters.
Thank you both. In 6f5efbf I've set it to fail fast instead.
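The fail-fast behaviour adopted here can be sketched as follows (hypothetical names; `export_one_batch` stands in for the per-batch retry logic): the outer loop over batches stops at the first failure, so later batches are not sent and the export as a whole reports failure, matching the gRPC exporter.

```python
def export_in_batches(export_one_batch, batches):
    """Export batches in order, failing fast: the first batch that does not
    succeed aborts the export and the remaining batches are not attempted."""
    for batch in batches:
        if export_one_batch(batch) != "SUCCESS":
            return "FAILURE"
    return "SUCCESS"
```

For example, with three batches where the second fails, the third is never attempted.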
    return True

def _split_metrics_data(
This seems quite verbose; are the if/elif chains needed?
I think all metric types (sum, histogram, gauge, etc.) have a data_points field within their data container; could we use WhichOneof() and CopyFrom() to handle them agnostically?
Yes, you're right! In c5ddba8 I cut out several elif branches using WhichOneof(). CopyFrom didn't quite work, but getattr does.
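The WhichOneof-plus-getattr pattern being discussed can be illustrated with a minimal stand-in for a pb2 Metric (the real class is generated from metrics.proto; FakeMetric below is hypothetical). `WhichOneof("data")` names whichever data container is set (sum, gauge, histogram, ...), and `getattr` fetches it, so every metric type's data_points are reachable without an if/elif chain per type.

```python
class FakeMetric:
    """Minimal stand-in mimicking a pb2 Metric's oneof 'data' field."""

    def __init__(self, kind, data_points):
        self._kind = kind
        # attach a container object under the oneof field name, e.g. .sum
        setattr(self, kind, type("Data", (), {"data_points": data_points})())

    def WhichOneof(self, oneof_name):
        # real pb2 messages return the name of the set oneof field, or None
        assert oneof_name == "data"
        return self._kind


def data_points_of(metric):
    # type-agnostic access: no if/elif over sum/gauge/histogram/...
    data = getattr(metric, metric.WhichOneof("data"))
    return data.data_points
```

The same two lines work whether the metric carries a sum, a gauge, or a histogram.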
MikeGoldsmith left a comment
Looks good. Left one comment regarding reporting failures across batches.
Description
Adds support for HTTP OTLPMetricExporter configurable max_export_batch_size, like the gRPC OTLPMetricExporter already does (completed through issue #2710 with PR #2809). This is currently much longer than the gRPC version because:
Fixes #4577
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
add(1) calls to each, max_export_batch_size=2 and endpoint as Collector (debug).

Does This PR Require a Contrib Repo Change?
Checklist: