
fix(openfeature): improve FFE eval metrics cross-tracer consistency#4590

Open
sameerank wants to merge 3 commits into main from sameerank/FFL-1972/fix-flag-evaluation-metrics-consistency-issues

Conversation

@sameerank (Contributor) commented Mar 24, 2026

What does this PR do?

Fixes consistency issues in FFE flag evaluation metrics to align with OpenFeature telemetry conventions and other SDK implementations:

  1. Map errNoConfiguration to PROVIDER_NOT_READY error code

    • Previously returned GENERAL, now returns provider_not_ready
    • Aligns with the Python, Ruby, Java, .NET, and JS implementations; verified by Test_FFE_Eval_No_Config_Loaded in system-tests
  2. Return TYPE_MISMATCH for NUMERIC→INTEGER conversion

    • Previously returned parse_error when evaluating a NUMERIC flag as INTEGER
    • Now returns type_mismatch to align with libdatadog FFE behavior
    • libdatadog treats NUMERIC and INTEGER as incompatible types (bitwise type check)
  3. Add "unknown" fallback for empty reason values

    • Matches the OpenFeature SDK telemetry convention for missing reasons

  4. Use raw lowercase error codes directly

    • Removed errorCodeToTag() helper function
    • OpenFeature ErrorCode values are already snake_case (e.g., FLAG_NOT_FOUND)
    • Just lowercase them directly for the metric tag, similar to dd-trace-py
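The tag-derivation rules in items 3 and 4 can be sketched as below. This is an illustrative sketch only; `errorCodeTag` and `reasonTag` are hypothetical names, not the actual dd-trace-go identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// errorCodeTag lowercases an OpenFeature ErrorCode for use as a metric tag.
// ErrorCode values are already snake_case (e.g. "FLAG_NOT_FOUND"), so no
// mapping helper is needed.
func errorCodeTag(code string) string {
	return strings.ToLower(code)
}

// reasonTag falls back to "unknown" when the evaluation reason is empty,
// per the OpenFeature telemetry convention for missing reasons.
func reasonTag(reason string) string {
	if reason == "" {
		return "unknown"
	}
	return strings.ToLower(reason)
}

func main() {
	fmt.Println(errorCodeTag("PROVIDER_NOT_READY")) // provider_not_ready
	fmt.Println(reasonTag(""))                      // unknown
}
```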

Motivation

FFL-1972 - Cross-tracer consistency for FFE eval metrics

Changes in dd-trace-py diverged from dd-trace-go, prompting this consistency fix: DataDog/dd-trace-py#17029

System-tests PR that enforces that result.reason and error.type are consistent: DataDog/system-tests#6545

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • New code is free of linting errors.
  • New code doesn't break existing tests.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • All generated files are up to date.
  • Non-trivial go.mod changes reviewed by @DataDog/dd-trace-go-guild.

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 60.67%. Comparing base (7fbda1c) to head (7712585).

Files with missing lines   Patch %   Lines
openfeature/provider.go    50.00%    1 Missing ⚠️

Additional details and impacted files

Files with missing lines          Coverage Δ
openfeature/flageval_metrics.go   85.36% <100.00%> (-8.12%) ⬇️
openfeature/provider.go           74.24% <50.00%> (-0.76%) ⬇️

... and 265 files with indirect coverage changes


datadog-official bot commented Mar 24, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 88.89%
Overall Coverage: 60.03% (-0.06%)

🔗 Commit SHA: 7712585

pr-commenter bot commented Mar 24, 2026

Benchmarks

Benchmark execution time: 2026-03-28 03:21:46

Comparing candidate commit 7712585 in PR branch sameerank/FFL-1972/fix-flag-evaluation-metrics-consistency-issues with baseline commit 7fbda1c in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 217 metrics, 7 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
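The significance rule described above (a change counts as significant only when the whole CI lies outside the threshold) can be expressed as a small predicate. This is a minimal sketch of that rule, not the benchmarking platform's actual code; the function name and signature are assumptions.

```go
package main

import "fmt"

// significant reports whether a confidence interval [ciLower, ciUpper] over
// the relative difference of means lies entirely outside ±threshold
// (e.g. threshold = 0.01 for a 1% SIGNIFICANT_IMPACT_THRESHOLD).
func significant(ciLower, ciUpper, threshold float64) bool {
	return ciLower > threshold || ciUpper < -threshold
}

func main() {
	// First diagram: CI (-0.6%, +1.2%) straddles the threshold, so not significant.
	fmt.Println(significant(-0.006, 0.012, 0.01)) // false
	// Second diagram: CI (1.3%, 3.1%) is entirely above +1%, so significantly worse.
	fmt.Println(significant(0.013, 0.031, 0.01)) // true
}
```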

Comment on lines +361 to +363
// NUMERIC and INTEGER are distinct types; reject float-to-int conversion
// to align with libdatadog FFE which treats them as incompatible types.
conversionErr = fmt.Errorf("%w: flag %q is NUMERIC but INTEGER was requested", errTypeMismatch, flagKey)
Member

This should normally never happen. The fact that it does means we've ignored the type field and are now trying to cast arbitrary values to integer. NUMERIC → INTEGER is one conversion you just fixed, but there's also JSON, which may contain arbitrary values including integers (probably worth adding to the test cases).

Return TYPE_MISMATCH for NUMERIC→INTEGER conversion

  • Previously returned parse_error when evaluating a NUMERIC flag as INTEGER
  • Now returns type_mismatch to align with libdatadog FFE behavior

We should return parse_error when configuration's value does not match the type declared in configuration (e.g. when variant type is set to integer but the value is string). Do we still handle this case?

This commit fixes three consistency issues in FFE flag evaluation metrics
to align with the OpenFeature telemetry conventions and other SDK implementations:

1. Map errNoConfiguration to PROVIDER_NOT_READY error code
   Previously returned GENERAL, now returns provider_not_ready to match
   Python, Ruby, Java, .NET, and JS implementations.

2. Add "unknown" fallback for empty reason values
   Matches the OpenFeature SDK telemetry convention for missing reasons.

3. Use raw lowercase error codes directly
   Remove errorCodeToTag() helper function since OpenFeature ErrorCode
   values are already snake_case (e.g., FLAG_NOT_FOUND). Just lowercase
   them directly for the metric tag.

FFL-1972 #close
NUMERIC and INTEGER are distinct flag types. Attempting to evaluate a NUMERIC
flag as INTEGER should return TypeMismatch (not ParseError) to align with
libdatadog FFE which treats them as incompatible types.
@sameerank force-pushed the sameerank/FFL-1972/fix-flag-evaluation-metrics-consistency-issues branch from 6ebda74 to 7712585 on March 28, 2026 02:55