feat(telemetry): implement app-extended-heartbeat event#301

Open
khanayan123 wants to merge 6 commits into main from ayan.khan/add-extended-heartbeat

Conversation

@khanayan123 khanayan123 commented Mar 31, 2026

Summary

Implement the app-extended-heartbeat telemetry event for the C++ tracer.

Motivation

Long-running services (24h+) currently only report their configuration state via the initial app-started event. If the backend misses or loses that event, there's no way to recover visibility into the SDK's configuration. The app-extended-heartbeat event solves this by re-sending the full configuration payload every 24h, ensuring reliable state reporting for long-running instances.

Implementation

The event fires periodically (default 24h) and includes the full configuration payload, matching app-started. The interval is configurable via DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL (integer seconds) so that system tests can validate parity using shorter intervals; production always uses the 24h default.
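
For illustration, resolving the interval could look roughly like the sketch below. It is not the code from this PR: the helper names are hypothetical, and falling back to the default on empty, non-numeric, or non-positive input is an assumption about what "validation" means here.

```cpp
#include <charconv>
#include <chrono>
#include <cstdlib>
#include <optional>
#include <string_view>
#include <system_error>

// Default taken from the PR description: 24h expressed in seconds.
constexpr std::chrono::seconds kDefaultExtendedHeartbeat{86400};

// Hypothetical helper: parse a strictly positive integer number of seconds.
// Returns std::nullopt for empty, non-numeric, or non-positive input.
std::optional<std::chrono::seconds> parse_interval_seconds(
    std::string_view text) {
  long long value = 0;
  const char* first = text.data();
  const char* last = first + text.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc{} || ptr != last || value <= 0) {
    return std::nullopt;
  }
  return std::chrono::seconds{value};
}

// Hypothetical helper: resolve the interval from a raw env var value
// (nullptr when the variable is unset). Assumption: invalid values fall
// back to the default instead of producing an error.
std::chrono::seconds resolve_extended_heartbeat_interval(const char* raw) {
  if (raw == nullptr) return kDefaultExtendedHeartbeat;
  return parse_interval_seconds(raw).value_or(kDefaultExtendedHeartbeat);
}

// Reads the variable named in this PR from the process environment.
std::chrono::seconds extended_heartbeat_interval_from_env() {
  return resolve_extended_heartbeat_interval(
      std::getenv("DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL"));
}
```

A system test would export a small value such as 5 before starting the process, while an unset variable yields the 24h default.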

Changes

  • include/datadog/environment.h: Declare DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL env var (INT, default 86400)
  • include/datadog/telemetry/configuration.h: Add extended_heartbeat_interval_seconds to Configuration and extended_heartbeat_interval to FinalizedConfiguration
  • src/datadog/telemetry/configuration.cpp: Parse env var and finalize interval with validation
  • src/datadog/telemetry/telemetry_impl.h: Declare extended_heartbeat_payload() method
  • src/datadog/telemetry/telemetry_impl.cpp: Schedule recurring extended heartbeat task; build payload with full configuration
  • supported-configurations.json: Updated supported configurations manifest
  • test/telemetry/test_configuration.cpp: Test for env var parsing and default value
  • test/telemetry/test_telemetry.cpp: Test that extended heartbeat includes configuration payload

Related

Add support for the app-extended-heartbeat telemetry event per the
telemetry v2 API spec. The event fires periodically (default 24h) and
includes the full configuration payload, matching app-started.

The interval is configurable via DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL
(integer seconds) to enable system testing with shorter intervals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pr-commenter bot commented Mar 31, 2026

Benchmarks

Benchmark execution time: 2026-03-31 05:04:20

Comparing candidate commit 6766649 in PR branch ayan.khan/add-extended-heartbeat with baseline commit 910e3d5 in branch main.

Found 0 performance improvements and 1 performance regression! Performance is the same for 0 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
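
The significance rule described above can be written as a small predicate. This is a minimal sketch, assuming the threshold forms a symmetric band around 0%; `ConfidenceInterval` and `is_significant` are hypothetical names and not part of the benchmarking platform.

```cpp
// Bounds of the confidence interval over the relative difference of means,
// expressed in percent (candidate vs. baseline).
struct ConfidenceInterval {
  double lower_pct;
  double upper_pct;
};

// A change counts as significant when the whole interval lies outside the
// band [-threshold_pct, +threshold_pct].
// Assumption: the threshold applies symmetrically in both directions.
bool is_significant(const ConfidenceInterval& ci, double threshold_pct) {
  const bool entirely_above = ci.lower_pct > threshold_pct;
  const bool entirely_below = ci.upper_pct < -threshold_pct;
  return entirely_above || entirely_below;
}
```

With the 1% threshold drawn in the second diagram, the interval (1.3%, 3.1%) is significant, while the interval (-0.6%, +1.2%) from the first diagram is not, because it straddles 0%.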

scenario:BM_TraceTinyCCSource

  • 🟥 execution_time [+3.079ms; +3.510ms] or [+4.050%; +4.618%]

khanayan123 and others added 2 commits March 31, 2026 00:19
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eartbeat default

Move extended heartbeat scheduling after metrics to preserve the
positional task order expected by FakeEventScheduler in tests
(heartbeat=0, metrics=1). Add default value check for
extended_heartbeat_interval in test_configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The FakeEventScheduler used positional indexing to identify callbacks,
which broke when the extended heartbeat task was added. Use interval
duration to distinguish metrics (<=60s) from extended heartbeat
(>60s) callbacks instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
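
To illustrate the idea in the commit message above, here is a sketch of a test double that classifies callbacks by interval instead of by position. It is not the repository's FakeEventScheduler: the type, its members, and the choice to keep short-interval callbacks in registration order are assumptions made for this example. Only the 60s cutoff comes from the commit message.

```cpp
#include <chrono>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fake scheduler used in tests.
struct IntervalKeyedScheduler {
  using Callback = std::function<void()>;

  // Tasks scheduled with an interval of 60s or less, in registration order.
  std::vector<Callback> short_interval_callbacks;
  // The single task scheduled with an interval above 60s.
  Callback extended_heartbeat_callback;

  void schedule_recurring(std::chrono::seconds interval, Callback callback) {
    if (interval > std::chrono::seconds{60}) {
      extended_heartbeat_callback = std::move(callback);
    } else {
      short_interval_callbacks.push_back(std::move(callback));
    }
  }
};
```

Because the long-interval task never enters the positional list, adding or reordering it does not shift the indices that existing tests rely on.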
datadog-prod-us1-6 bot commented Mar 31, 2026

🎯 Code Coverage (details)
Patch Coverage: 75.76%
Overall Coverage: 90.74% (-0.09%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 6766649

…load

Add test that creates a telemetry instance with configuration, triggers
the extended heartbeat, and verifies the payload contains the expected
configuration entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@khanayan123 khanayan123 marked this pull request as ready for review March 31, 2026 15:17
@khanayan123 khanayan123 requested review from a team as code owners March 31, 2026 15:17
@khanayan123 khanayan123 requested review from cataphract and removed request for a team March 31, 2026 15:17
MACRO(DD_VERSION, STRING, "") \
MACRO(DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED, BOOLEAN, true) \
MACRO(DD_TELEMETRY_HEARTBEAT_INTERVAL, DECIMAL, 10) \
MACRO(DD_TELEMETRY_EXTENDED_HEARTBEAT_INTERVAL, INT, 86400) \
Contributor

maybe this could be a decimal, so that later on you don't have to static_cast(*maybe_value), but can just deal with a double
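
One reading of this suggestion, sketched under assumptions: if the value is parsed as a double number of seconds, it can be converted to a clock duration in one step without an integer cast. `to_interval` is a hypothetical helper, not an existing function in the tracer.

```cpp
#include <chrono>

// Hypothetical helper: convert fractional seconds to the steady clock's
// native duration type. std::chrono::duration<double> represents seconds
// with a floating-point count, so no integer cast of the input is needed.
std::chrono::steady_clock::duration to_interval(double seconds) {
  return std::chrono::duration_cast<std::chrono::steady_clock::duration>(
      std::chrono::duration<double>(seconds));
}
```

The default of 86400 then maps to 24 hours, and sub-second values such as 0.5 become usable for fast tests.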

std::string Telemetry::extended_heartbeat_payload() {
  auto configuration_json = nlohmann::json::array();

  for (const auto& product : config_.products) {
Contributor

config_.products would not reflect runtime configuration changes (e.g., via remote config) in the extended heartbeat, as this field is never updated. You'd need to introduce a new field to track config changes, I think, like so: https://github.com/DataDog/dd-trace-cpp/pull/289/changes#diff-8e4b8c344253799b7a41954c017a79f7b026dca44849dc2dec9460120dc57a53R807
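
A rough sketch of what such a tracking field could look like, independent of the linked PR (whose implementation is not reproduced here). `ConfigEntry` and `ConfigSnapshot` are hypothetical types, and keying the snapshot by configuration name is an assumption.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified view of one reported configuration value.
struct ConfigEntry {
  std::string name;
  std::string value;
  std::string origin;  // e.g. "env_var" or "remote_config"
};

// Keeps the most recent entry per configuration name, so a periodic report
// can include values that changed after startup.
class ConfigSnapshot {
 public:
  // Record a startup value or a later change; the newest entry wins.
  void record(ConfigEntry entry) {
    std::string key = entry.name;
    latest_.insert_or_assign(std::move(key), std::move(entry));
  }

  // Current state, ordered by configuration name.
  std::vector<ConfigEntry> all() const {
    std::vector<ConfigEntry> result;
    result.reserve(latest_.size());
    for (const auto& [name, entry] : latest_) {
      result.push_back(entry);
    }
    return result;
  }

 private:
  std::map<std::string, ConfigEntry> latest_;
};
```

The extended heartbeat would then iterate over `all()` rather than over the startup-only product list.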

    for (const auto& [_, config_metadatas] : product.configurations) {
      for (const auto& config_metadata : config_metadatas) {
        configuration_json.emplace_back(
            generate_configuration_field(config_metadata));
Contributor

Not sure you want to call this function here, as it will increment the seq-id as if a new configuration had been added. Maybe you want to split the function in two, one part that encodes the field and one that increments the seq-id for new configs, like so: https://github.com/DataDog/dd-trace-cpp/pull/289/changes#diff-8e4b8c344253799b7a41954c017a79f7b026dca44849dc2dec9460120dc57a53R800
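
The split could be sketched as follows. This is not the tracer's generate_configuration_field: all types and method names are hypothetical, and the per-name sequence counter is an assumption about the bookkeeping.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the tracer's types.
struct ConfigMetadata {
  std::string name;
  std::string value;
};

struct EncodedField {
  std::string name;
  std::string value;
  std::uint64_t seq_id;
};

class ConfigFieldEncoder {
 public:
  // New or changed configuration: bump the sequence id, then encode.
  EncodedField encode_new(const ConfigMetadata& metadata) {
    const std::uint64_t seq_id = ++seq_ids_[metadata.name];
    return encode(metadata, seq_id);
  }

  // Re-report of known state (e.g. a periodic heartbeat): encode with the
  // current sequence id and leave the counter untouched.
  EncodedField encode_existing(const ConfigMetadata& metadata) const {
    const auto found = seq_ids_.find(metadata.name);
    const std::uint64_t seq_id = found == seq_ids_.end() ? 0 : found->second;
    return encode(metadata, seq_id);
  }

 private:
  // Pure encoding step with no side effects.
  static EncodedField encode(const ConfigMetadata& metadata,
                             std::uint64_t seq_id) {
    return EncodedField{metadata.name, metadata.value, seq_id};
  }

  // Assumption: one counter per configuration name.
  std::map<std::string, std::uint64_t> seq_ids_;
};
```

With this shape, building the extended heartbeat payload would call `encode_existing` only, so repeated heartbeats do not look like configuration changes to the backend.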

}
}

auto extended_hb_msg = nlohmann::json{
Contributor

REQUIRE(find_payload(message_batch["payload"], "app-heartbeat"));
}

SECTION("generates an extended heartbeat with configuration") {
Contributor

If you change the above to also capture configuration changes, maybe you want to test it later with remote config changes, like so: https://github.com/DataDog/dd-trace-cpp/pull/289/changes#diff-6a69962f102d55319c4c00418c82707a9c13a11fe0f75195e6b60fd50da7627aR914
