feat(openfeature): emit server-side EVP flagevaluation by leoromanovsky · Pull Request #11639 · DataDog/dd-trace-java

leoromanovsky · 2026-06-12T21:44:59Z

Motivation

Customers need consistent server-side feature-flag evaluation visibility across supported runtimes so rollout behavior can be correlated with application behavior in APM and Event Platform. This Java contribution adds that server-side flagevaluation signal for Java OpenFeature evaluations while preserving the existing OTel feature_flag.evaluations path and the existing exposure telemetry path.

Changes

- Adds the Java EVP flagevaluation writer path behind DD_FLAGGING_EVALUATION_COUNTS_ENABLED.
- Reuses the existing Event Platform publisher and Agent plumbing instead of adding a separate EVP writer stack.
- Routes batches through the Agent v2 EVP proxy endpoint and singular route segment: /evp_proxy/v2/api/v2/flagevaluation.
- Disables response compression for this flagevaluation EVP path, matching the merged Go reference behavior.
- Preserves the canonical batch shape with top-level context and flagEvaluations.
- Keeps targeting_key as the dedicated event field and removes duplicate targetingKey from context.evaluation for flagevaluation events.
- Moves aggregation off the synchronous OpenFeature hook path: the hook captures a bounded scalar snapshot and offers it to a bounded worker queue.
- Aggregates by schema-visible dimensions only: flag key, variant key, allocation key, runtime-default state, error message, real targeting rule key when available, and pruned context.
- Uses flush time for the envelope timestamp, while preserving evaluation-entry time for first_evaluation and last_evaluation.
- Keeps the existing Java OTel flag-evaluation hook unchanged.
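A minimal sketch of the hook-to-worker hand-off described above, assuming a plain `ArrayBlockingQueue`: `offer` returns `false` instead of blocking when the queue is full, so the evaluation thread never waits and overflow becomes a counted drop. `EvalSnapshot` and `SnapshotQueue` are hypothetical names, not the PR's classes, and the snapshot fields are an illustrative subset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical scalar-only snapshot taken on the evaluation thread. */
final class EvalSnapshot {
  final String flagKey;
  final String variantKey; // null when the runtime default was used
  final String allocationKey;
  final long evalTimeMs;

  EvalSnapshot(String flagKey, String variantKey, String allocationKey, long evalTimeMs) {
    this.flagKey = flagKey;
    this.variantKey = variantKey;
    this.allocationKey = allocationKey;
    this.evalTimeMs = evalTimeMs;
  }
}

/** Hypothetical bounded hand-off between the OpenFeature hook and an aggregation worker. */
final class SnapshotQueue {
  private final BlockingQueue<EvalSnapshot> queue;
  private final AtomicLong dropped = new AtomicLong();

  SnapshotQueue(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  /** Hook side: offer() returns false instead of blocking when full, so we count a drop. */
  void offer(EvalSnapshot snapshot) {
    if (!queue.offer(snapshot)) {
      dropped.incrementAndGet();
    }
  }

  /** Worker side: moves up to maxItems snapshots (FIFO) into a list for aggregation. */
  List<EvalSnapshot> drain(int maxItems) {
    List<EvalSnapshot> batch = new ArrayList<>();
    queue.drainTo(batch, maxItems);
    return batch;
  }

  long droppedCount() {
    return dropped.get();
  }
}
```

The worker thread would call `drain` on its flush cadence and aggregate the returned snapshots; the hook itself only allocates one small object and performs one non-blocking `offer`.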

Decisions

- Reuse the existing Event Platform publisher and Agent EVP proxy plumbing instead of introducing a Java-specific HTTP writer, so flagevaluation delivery follows the same Agent discovery, headers, and lifecycle model as the existing feature-flagging EVP paths.
- Use the Agent v2 EVP proxy route for this track and disable response compression, matching the merged Go implementation and avoiding the generic latest EVP endpoint behavior for this flagevaluation path.
- Emit one batched FlagEvaluationsRequest per flush, with a top-level context object and a flagEvaluations array, instead of sending one HTTP request per evaluation.
- Keep the OpenFeature hook cheap: capture only a bounded scalar snapshot on the evaluation thread, then enqueue to a bounded worker queue for aggregation and flush.
- Define aggregation cardinality only from fields serialized to the worker contract; OpenFeature reason is not a hidden aggregate key because it is not a worker field.
- Treat targeting_key as the dedicated event field and keep it out of context.evaluation so the same identity is not encoded twice.
- Use a two-tier aggregation model: full-fidelity buckets first, degraded buckets without targeting key/context after cap pressure, then counted drops if both tiers are exhausted.
- Preserve evaluation-time bounds in each aggregate row while using flush time for the batch/event timestamp, matching the backend distinction between when evaluations happened and when the SDK sent the batch.
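The two-tier model in the decisions above can be sketched as a pair of capped maps. This is an illustration under stated assumptions: `TwoTierAggregator` is hypothetical, the caps are constructor parameters rather than the PR's constants, and the key omits the runtime-default, error-message, and rule-key dimensions for brevity. Each row keeps min/max evaluation times so the first/last bounds survive aggregation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical two-tier aggregator; caps are parameters, not the PR's constants. */
final class TwoTierAggregator {
  /** Aggregation key; the degraded tier stores null targetingKey/contextKey. */
  static final class Key {
    final String flagKey;
    final String variantKey;
    final String allocationKey;
    final String targetingKey;
    final String contextKey;

    Key(String flagKey, String variantKey, String allocationKey, String targetingKey, String contextKey) {
      this.flagKey = flagKey;
      this.variantKey = variantKey;
      this.allocationKey = allocationKey;
      this.targetingKey = targetingKey;
      this.contextKey = contextKey;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Key)) return false;
      Key k = (Key) o;
      return Objects.equals(flagKey, k.flagKey)
          && Objects.equals(variantKey, k.variantKey)
          && Objects.equals(allocationKey, k.allocationKey)
          && Objects.equals(targetingKey, k.targetingKey)
          && Objects.equals(contextKey, k.contextKey);
    }

    @Override
    public int hashCode() {
      return Objects.hash(flagKey, variantKey, allocationKey, targetingKey, contextKey);
    }
  }

  /** Per-bucket count plus evaluation-time bounds (min first, max last). */
  static final class Row {
    long count;
    long firstMs = Long.MAX_VALUE;
    long lastMs = Long.MIN_VALUE;

    void add(long evalMs) {
      count++;
      firstMs = Math.min(firstMs, evalMs);
      lastMs = Math.max(lastMs, evalMs);
    }
  }

  final Map<Key, Row> full = new HashMap<>();
  final Map<Key, Row> degraded = new HashMap<>();
  long dropped;
  private final int fullCap;
  private final int degradedCap;

  TwoTierAggregator(int fullCap, int degradedCap) {
    this.fullCap = fullCap;
    this.degradedCap = degradedCap;
  }

  void record(String flag, String variant, String alloc, String targetingKey, String contextKey, long evalMs) {
    Key fullKey = new Key(flag, variant, alloc, targetingKey, contextKey);
    Row row = full.get(fullKey);
    if (row == null && full.size() < fullCap) {
      row = new Row();
      full.put(fullKey, row);
    }
    if (row == null) {
      // Cap pressure: fall back to a coarser bucket without targeting key and context.
      Key degradedKey = new Key(flag, variant, alloc, null, null);
      row = degraded.get(degradedKey);
      if (row == null && degraded.size() < degradedCap) {
        row = new Row();
        degraded.put(degradedKey, row);
      }
    }
    if (row == null) {
      dropped++; // both tiers exhausted: count the drop instead of growing memory
      return;
    }
    row.add(evalMs);
  }
}
```

Existing buckets keep absorbing evaluations even after a tier is full; only a new key that fits in neither tier is dropped, so memory stays bounded while counts for already-seen dimensions stay exact.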

Validation Evidence

Dogfooding App

ffe-dogfooding app-java was rebuilt with local Java artifacts and reached PROVIDER_READY.
Evaluated ffe-dogfooding-string-flag through the Java dogfooding app 15 times total: 5 evaluations for each public-safe targeting key:
- java-batch-evp-20260622T233009Z-alpha
- java-batch-evp-20260622T233009Z-bravo
- java-batch-evp-20260622T233009Z-charlie
App-side result: all 15 evaluations returned variant_2 and allocation allocation-override-392dd7c149f8.

System Tests

Companion draft PR: Enable EVP flagevaluation system tests for Java (DataDog/system-tests#7185)

Staging End-To-End

Dogfooding ran without the local mock-intake EVP tee/proxy, so the Agent sent EVP traffic through the normal backend path.
Retriever staging query returned 3 flagevaluation rows for the exact targeting keys above, confirming that the SDK aggregated and batched the evaluations instead of producing one backend row per evaluation.
Each row had flag.key=ffe-dogfooding-string-flag, variant.key=variant_2, allocation.key=allocation-override-392dd7c149f8, and evaluation_count=5.
The three rows shared the same flush timestamp (1782171009940, Unix epoch milliseconds), while each row preserved distinct first_evaluation and last_evaluation bounds for the five evaluations in that aggregate.
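The staging numbers are what aggregation followed by a single flush would produce: 15 evaluations over 3 targeting keys collapse into 3 rows with a count of 5 each, all carrying one batch timestamp while each row keeps its own evaluation-time bounds. A minimal sketch of that flush step, assuming a hypothetical `FlushSketch` aggregator keyed only by targeting key (flag, variant, and allocation were identical in this run):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical flush: one batch timestamp, per-row evaluation bounds. */
final class FlushSketch {
  static final class Row {
    final String targetingKey;
    long count;
    long firstMs = Long.MAX_VALUE;
    long lastMs = Long.MIN_VALUE;

    Row(String targetingKey) {
      this.targetingKey = targetingKey;
    }
  }

  static final class Batch {
    final long timestampMs; // flush time, shared by every row in the batch
    final List<Row> rows;

    Batch(long timestampMs, List<Row> rows) {
      this.timestampMs = timestampMs;
      this.rows = rows;
    }
  }

  private final Map<String, Row> rows = new LinkedHashMap<>();

  /** Aggregates one evaluation; other dimensions are assumed identical, as in the staging run. */
  void record(String targetingKey, long evalMs) {
    Row row = rows.computeIfAbsent(targetingKey, Row::new);
    row.count++;
    row.firstMs = Math.min(row.firstMs, evalMs);
    row.lastMs = Math.max(row.lastMs, evalMs);
  }

  /** Emits every row under a single flush timestamp and resets the aggregation state. */
  Batch flush(long flushTimeMs) {
    Batch batch = new Batch(flushTimeMs, new ArrayList<>(rows.values()));
    rows.clear();
    return batch;
  }
}
```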

…EVP writer
- Tests for identical-event aggregation (count 2, first<=last min/max)
- Test for type-tagged canonical key distinguishing int vs string
- Tests for global/degraded cap overflow and drop-counted overflow
- Test for absent variant -> runtimeDefaultUsed
- Tests for 256-field/256-char context pruning
- Tests for flush posting to 'flagevaluations' with required JSON fields

…nWriterImpl two-tier EVP writer
- FlagEvalEvent (bootstrap): lightweight data record for hook->writer channel
- FlagEvaluationWriter (bootstrap): interface with enqueue/start/close
- FlagEvaluationWriterImpl (lib): two-tier aggregation with frozen contract constants
- GLOBAL_CAP=131072 / PER_FLAG_CAP=10000 / DEGRADED_CAP=32768
- Canonical context key: sorted, type-tagged, length-delimited (no hash; reviewer concern #3)
- Context pruning: <=256 fields, string values <=256 chars (reviewer concern #1)
- Min/max first/last eval time (reviewer concern #4)
- Absent variant -> runtimeDefaultUsed (reviewer concern #5)
- Drop-counted overflow (reviewer concern #8)
- Posts to 'flagevaluations' via BackendApiFactory(EVENT_PLATFORM)
- Flush interval 10s (differs from exposure 1s)
- Add FEATURE_FLAG_EVALUATION_PROCESSOR thread to AgentThreadFactory
- Register/deregister writer with FeatureFlaggingGateway
- FlagEvaluationWriterImplTest: all 10 unit tests GREEN
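The canonical context key described in this commit (sorted, type-tagged, length-delimited, no hash) can be sketched as follows. The type tags and the `len:value` encoding are illustrative assumptions, not the PR's exact format, and `CanonicalContextKey` is a hypothetical name. The point is that `42` and `"42"` yield different keys, and field boundaries cannot be confused because every key and value is length-prefixed.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical canonical key: sorted by field name, type-tagged, length-delimited. */
final class CanonicalContextKey {
  static String of(Map<String, Object> attrs) {
    StringBuilder sb = new StringBuilder();
    // TreeMap sorts by field name so insertion order never changes the key (keys must be non-null).
    for (Map.Entry<String, Object> e : new TreeMap<String, Object>(attrs).entrySet()) {
      appendField(sb, e.getKey());
      Object value = e.getValue();
      sb.append(typeTag(value)); // distinguishes 42 (int) from "42" (string)
      appendField(sb, String.valueOf(value));
    }
    return sb.toString();
  }

  private static char typeTag(Object v) {
    if (v == null) return 'n';
    if (v instanceof String) return 's';
    if (v instanceof Boolean) return 'b';
    if (v instanceof Integer || v instanceof Long) return 'i';
    if (v instanceof Double || v instanceof Float) return 'd';
    return 'o'; // any other type
  }

  /** Length prefix makes the encoding unambiguous without hashing. */
  private static void appendField(StringBuilder sb, String s) {
    sb.append(s.length()).append(':').append(s);
  }
}
```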

- Test: enqueue called with flagKey, variant, reason (lowercased), allocationKey
- Test: evalTimeMs from dd.eval.timestamp_ms metadata (reviewer concern #4)
- Test: evalTimeMs fallback to hook-fire time when metadata absent
- Test: null value -> null variant (runtime default; reviewer concern #5)
- Test: only enqueue called, no inline aggregation (reviewer concern #7)
- Test: writer=null is no-op (killswitch off)
- Test: targetingKey extracted from evaluation context

- FlagEvalEVPHook: Hook<T> with finallyAfter() doing cheap capture only
- Reads allocationKey from metadata.getString('allocationKey')
- Reads eval-time from metadata.getLong('dd.eval.timestamp_ms'), fallback to hook-fire time
- Null value -> null variant (runtime default; reviewer concern #5)
- Lowercases reason string
- Resolves writer lazily from FeatureFlaggingGateway (test: injected directly)
- No aggregation on hook thread (reviewer concern #7)
- DDEvaluator: stamp dd.eval.timestamp_ms in flag metadata at resolution point
- Provider.getProviderHooks(): returns [OTel FlagEvalHook, EVP FlagEvalEVPHook]
- OTel path preserved byte-for-byte (PRES-01 non-regression)
- FeatureFlaggingSystem: create + start FlagEvaluationWriterImpl behind killswitch
- DD_FLAGGING_EVALUATION_COUNTS_ENABLED=false disables EVP path only
- Default: enabled (EVP path on)
- ProviderTest: updated to expect 2 hooks (OTel + EVP)
- FlagEvalEVPHookTest: all 8 unit tests GREEN
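The cheap capture steps listed in this commit can be sketched as small pure functions. A plain `Map` stands in for the OpenFeature flag metadata object, and `HookCapture` is a hypothetical name; only the `dd.eval.timestamp_ms` key comes from the commit message.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helpers mirroring the cheap capture steps listed above. */
final class HookCapture {
  static final String EVAL_TIMESTAMP_KEY = "dd.eval.timestamp_ms";

  /** Prefer the timestamp stamped at resolution time; otherwise use the hook-fire time. */
  static long evalTimeMs(Map<String, ?> flagMetadata, long hookFireTimeMs) {
    Object stamped = flagMetadata == null ? null : flagMetadata.get(EVAL_TIMESTAMP_KEY);
    return stamped instanceof Long ? (Long) stamped : hookFireTimeMs;
  }

  /** Lowercases the reason with a fixed locale so the result does not depend on the JVM default. */
  static String normalizeReason(String reason) {
    return reason == null ? null : reason.toLowerCase(Locale.ROOT);
  }

  /** A missing variant is recorded as a runtime default. */
  static boolean runtimeDefaultUsed(String variant) {
    return variant == null;
  }
}
```

Using `Locale.ROOT` avoids locale-sensitive surprises (for example the Turkish dotless i) when lowercasing enum-like reason strings.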

dd-octo-sts · 2026-06-12T22:06:16Z

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite	Status
Startup	🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

Scenario	Candidate	master	Δ (95% CI of mean)
startup:insecure-bank:iast:Agent	13.91 s	13.89 s	[-0.6%; +0.8%] (no difference)
startup:insecure-bank:tracing:Agent	12.87 s	12.96 s	[-1.5%; -0.0%] (maybe better)
startup:petclinic:appsec:Agent	16.80 s	16.16 s	[-0.5%; +8.4%] (no difference)
startup:petclinic:iast:Agent	16.79 s	16.85 s	[-1.3%; +0.5%] (no difference)
startup:petclinic:profiling:Agent	16.70 s	16.85 s	[-2.0%; +0.3%] (no difference)
startup:petclinic:sca:Agent	16.84 s	16.75 s	[-0.4%; +1.6%] (no difference)
startup:petclinic:tracing:Agent	15.94 s	16.02 s	[-1.3%; +0.3%] (no difference)

Commit: 4a93c611 · CI Pipeline · Benchmarking Platform UI

Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

… pruning, shutdown drain, and schema validation
- variant sourced from details.getVariant() (not the evaluated value)
- error object captured from evaluation details, schema {"message":...}
- killswitch DD_FLAGGING_EVALUATION_COUNTS_ENABLED via the tracer config system
- deterministic context pruning (sort before cut), stored pruned attrs
- observable queue-overflow drop counter; close() drains and final-flushes
- JMH hot-path benchmark; structural validation against vendored worker schema
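Deterministic pruning (sort before cut) can be sketched with a `TreeMap`, so the fields that survive do not depend on the source map's iteration order. `ContextPruner` is a hypothetical name; the limits are parameters, with 256 fields and 256 characters being the values mentioned in the earlier writer commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical deterministic pruner: sort first, then cut. */
final class ContextPruner {
  static Map<String, Object> prune(Map<String, Object> attrs, int maxFields, int maxStringChars) {
    Map<String, Object> pruned = new LinkedHashMap<>();
    // Sorting before cutting means the same input always keeps the same fields,
    // regardless of the source map's iteration order.
    for (Map.Entry<String, Object> e : new TreeMap<String, Object>(attrs).entrySet()) {
      if (pruned.size() >= maxFields) {
        break;
      }
      Object value = e.getValue();
      if (value instanceof String && ((String) value).length() > maxStringChars) {
        // Truncates by UTF-16 code unit; a production version may want to avoid splitting surrogate pairs.
        value = ((String) value).substring(0, maxStringChars);
      }
      pruned.put(e.getKey(), value);
    }
    return pruned;
  }
}
```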

… evaluation writer

The flag evaluation serializer's lastTicks and globalFullCount fields are confined to the single serializer thread but are written from more than one method, which SpotBugs flags as AT_NONATOMIC_64BIT_PRIMITIVE and AT_STALE_THREAD_WRITE_OF_PRIMITIVE. Annotate both fields with @SuppressFBWarnings and a thread-confinement justification, matching the existing convention used across the tracer.

dd-octo-sts · 2026-06-23T00:23:25Z

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d4244f8ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-23T00:30:07Z

+    final FlagEvaluationWriter w =
+        injectedWriter != null ? injectedWriter : FeatureFlaggingGateway.getFlagEvalWriter();


Guard EVP hook lookup against missing bootstrap

When dd-openfeature is used with an older or absent tracer bootstrap on the classpath, resolving the newly added FlagEvaluationWriter/FeatureFlaggingGateway.getFlagEvalWriter() path can throw a LinkageError/NoSuchMethodError here. Because this lookup happens before the try block and the catch only handles Exception, the optional EVP hook can break flag evaluation instead of becoming the intended no-op; this needs the same kind of LinkageError guard used around the OTel hook.
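The guard this review asks for hinges on the fact that `NoSuchMethodError` and `NoClassDefFoundError` are subclasses of `LinkageError`, which is an `Error`, so `catch (Exception e)` never sees them. A minimal sketch of a lookup wrapper, assuming a hypothetical `OptionalLookup` helper (not the tracer's code):

```java
import java.util.function.Supplier;

/** Hypothetical guard for an optional lookup that depends on bootstrap classes. */
final class OptionalLookup {
  /**
   * Returns the lookup result, or null if the lookup fails to link. LinkageError (the parent of
   * NoSuchMethodError and NoClassDefFoundError) is an Error, so catch (Exception e) misses it.
   */
  static <T> T orNull(Supplier<T> lookup) {
    try {
      return lookup.get();
    } catch (LinkageError e) {
      return null; // treat a missing or older bootstrap like "no writer available"
    }
  }
}
```

A `null` result would then follow the same no-op path as `writer == null` (killswitch off). In a real hook the `try` has to enclose the call site whose linkage fails, which is why the review points at the lookup sitting before the existing `try` block.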

chatgpt-codex-connector · 2026-06-23T00:30:07Z

      String evpProxyEndpoint = featuresDiscovery.getEvpProxyEndpoint();
+      if (preferredEvpProxyEndpoint != null
+          && featuresDiscovery.supportsEvpProxyEndpoint(preferredEvpProxyEndpoint)) {
+        evpProxyEndpoint = preferredEvpProxyEndpoint;


Do not fall back to v4 for v2-only flagevaluation

When the flag-evaluation writer requests V2_EVP_PROXY_ENDPOINT, Agents that advertise only /evp_proxy/v4/ leave evpProxyEndpoint at the discovered v4 value because this condition simply skips the preference. The writer then posts to /evp_proxy/v4/api/v2/flagevaluation, despite this signal being documented in the new writer as using /evp_proxy/v2/api/v2/flagevaluation; those environments will send the new telemetry to the wrong Agent route instead of disabling it when v2 is unavailable.
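The fix this review suggests amounts to treating the v2 route as required for this signal: if the Agent does not advertise it, disable the signal instead of keeping the discovered default. A minimal sketch, assuming hypothetical endpoint strings of the form `evp_proxy/vN/` and a hypothetical `EvpEndpointChoice` helper:

```java
import java.util.Set;

/** Hypothetical endpoint choice for a signal that must not fall back to another EVP proxy version. */
final class EvpEndpointChoice {
  static final String FLAG_EVALUATION_ROUTE = "api/v2/flagevaluation";

  /**
   * Returns the endpoint to use, or null to disable the signal. With preferredRequired=false this
   * keeps the fallback behavior shown in the reviewed diff.
   */
  static String choose(Set<String> advertised, String discovered, String preferred, boolean preferredRequired) {
    if (preferred != null && advertised.contains(preferred)) {
      return preferred;
    }
    if (preferred != null && preferredRequired) {
      return null; // preferred version unavailable: disable rather than misroute
    }
    return discovered;
  }

  /** Builds the full request path, or null when the signal is disabled. */
  static String path(String endpoint) {
    return endpoint == null ? null : "/" + endpoint + FLAG_EVALUATION_ROUTE;
  }
}
```

With `preferredRequired` set to `false` the method reproduces the fallback in the reviewed diff; the review's concern is that the flagevaluation signal should behave as if it were `true`.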

leoromanovsky added 5 commits June 12, 2026 15:31

style(02-05): apply google-java-format via spotlessApply

fcc884e

- Apply project-required code style (google-java-format) to all new files in this plan before submitting

chore(openfeature): remove internal review annotations from EVP flage…

78bbaba

…valuation code

leoromanovsky added 2 commits June 13, 2026 11:27

chore(openfeature): remove internal planning annotations

c8c95a5

leoromanovsky changed the title ~~[FFL-2446] dd-trace-java: emit EVP flagevaluation (Phase 2 fan-out)~~ feat(openfeature): emit server-side EVP flagevaluation Jun 14, 2026

leoromanovsky added 3 commits June 15, 2026 11:29

fix(openfeature): align EVP flagevaluation aggregation with worker sc…

8ef8ea4

…hema

Merge remote-tracking branch 'origin/master' into workspace/flag-eval…

8823846

…uations-cross-sdk

# Conflicts:
# products/feature-flagging/feature-flagging-lib/gradle.lockfile

leoromanovsky mentioned this pull request Jun 17, 2026

Enable EVP flagevaluation system tests for Java DataDog/system-tests#7158

Closed

leoromanovsky added 4 commits June 19, 2026 16:24

Merge remote-tracking branch 'origin/master' into workspace/flag-eval…

96a00db

…uations-cross-sdk

Align flagevaluation EVP contract

8bd298b

Fix flagevaluation writer test formatting

afe4d56

fix(openfeature): satisfy flagevaluation spotless

7f0cf1f

leoromanovsky mentioned this pull request Jun 22, 2026

Enable EVP flagevaluation system tests for Java DataDog/system-tests#7185

Draft

leoromanovsky added 5 commits June 22, 2026 15:39

Remove flagevaluation schema validator fixture

cc3c75e

Share feature flag EVP publishing

9cc14b3

Merge remote-tracking branch 'origin/master' into workspace/flag-eval…

fd4dcf6

…uations-cross-sdk

# Conflicts:
# products/feature-flagging/feature-flagging-lib/gradle.lockfile

Merge remote-tracking branch 'origin/master' into workspace/flag-eval…

74f1407

…uations-cross-sdk

fix(openfeature): align flagevaluation EVP transport

3d4244f

leoromanovsky marked this pull request as ready for review June 23, 2026 00:23

leoromanovsky requested review from a team as code owners June 23, 2026 00:23

leoromanovsky removed the request for review from a team June 23, 2026 00:23

leoromanovsky requested review from PerfectSlayer, bric3, dd-oleksii and typotter and removed request for a team June 23, 2026 00:23

leoromanovsky requested a review from manuel-alvarez-alvarez June 23, 2026 00:25

leoromanovsky added type: enhancement Enhancements and improvements comp: openfeature OpenFeature labels Jun 23, 2026

chatgpt-codex-connector Bot reviewed Jun 23, 2026

leoromanovsky added 3 commits June 23, 2026 10:50

fix(openfeature): bound flagevaluation evp payloads

64ede83

fix(openfeature): emit flagevaluation telemetry counters

033a471

refactor(openfeature): share EVP limits and clarify hooks

c211de3

leoromanovsky requested a review from a team as a code owner June 23, 2026 19:41

leoromanovsky added 4 commits June 23, 2026 16:01

refactor(openfeature): clarify flag eval logging hook

64f5951

style(feature-flagging): fix spotless formatting

6e699cf

style(openfeature): format flag eval logging test

15a04d7

test(evpproxy): use shared EVP header constant

4a93c61

leoromanovsky wants to merge 27 commits into master from leo.romanovsky/ffl-2446-evp-flagevaluation-java
