feat(openfeature): emit server-side EVP flagevaluation #11639

Open
leoromanovsky wants to merge 27 commits into master from leo.romanovsky/ffl-2446-evp-flagevaluation-java

leoromanovsky commented Jun 12, 2026

Motivation

Customers need consistent server-side feature-flag evaluation visibility across supported runtimes so rollout behavior can be correlated with application behavior in APM and Event Platform. This Java contribution adds that server-side flagevaluation signal for Java OpenFeature evaluations while preserving the existing OTel feature_flag.evaluations path and the existing exposure telemetry path.

Changes

  • Adds the Java EVP flagevaluation writer path behind DD_FLAGGING_EVALUATION_COUNTS_ENABLED.
  • Reuses the existing Event Platform publisher and Agent plumbing instead of adding a separate EVP writer stack.
  • Routes batches through the Agent v2 EVP proxy endpoint, using the singular route segment: /evp_proxy/v2/api/v2/flagevaluation.
  • Disables response compression for this flagevaluation EVP path, matching the merged Go reference behavior.
  • Preserves the canonical batch shape with top-level context and flagEvaluations.
  • Keeps targeting_key as the dedicated event field and removes duplicate targetingKey from context.evaluation for flagevaluation events.
  • Moves aggregation off the synchronous OpenFeature hook path: the hook captures a bounded scalar snapshot and offers it to a bounded worker queue.
  • Aggregates by schema-visible dimensions only: flag key, variant key, allocation key, runtime-default state, error message, real targeting rule key when available, and pruned context.
  • Uses flush time for the envelope timestamp, while preserving evaluation-entry time for first_evaluation and last_evaluation.
  • Keeps the existing Java OTel flag-evaluation hook unchanged.
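
The hook-to-worker handoff described in the list above (a scalar snapshot offered to a bounded queue) could look roughly like the following. This is a minimal sketch, not the PR's code: `BoundedEvalQueue`, `Snapshot`, and `droppedCount` are hypothetical names, and the real writer carries more fields.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch; class and field names are hypothetical, not taken from the PR. */
final class BoundedEvalQueue {
  /** Immutable scalar snapshot captured on the evaluation thread. */
  static final class Snapshot {
    final String flagKey;
    final String variant;
    final long evalTimeMs;

    Snapshot(String flagKey, String variant, long evalTimeMs) {
      this.flagKey = flagKey;
      this.variant = variant;
      this.evalTimeMs = evalTimeMs;
    }
  }

  private final BlockingQueue<Snapshot> queue;
  private final AtomicLong dropped = new AtomicLong();

  BoundedEvalQueue(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  /** Called from the hook: never blocks, and counts a drop when the queue is full. */
  void enqueue(String flagKey, String variant, long evalTimeMs) {
    if (!queue.offer(new Snapshot(flagKey, variant, evalTimeMs))) {
      dropped.incrementAndGet();
    }
  }

  /** Called from the worker thread; returns null when the queue is empty. */
  Snapshot poll() {
    return queue.poll();
  }

  long droppedCount() {
    return dropped.get();
  }
}
```

Using the non-blocking `offer` rather than `put` keeps the evaluation thread from stalling when the worker falls behind; the cost is that overflow is lost, which is why the sketch counts it.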

Decisions

  • Reuse the existing Event Platform publisher and Agent EVP proxy plumbing instead of introducing a Java-specific HTTP writer, so flagevaluation delivery follows the same Agent discovery, headers, and lifecycle model as the existing feature-flagging EVP paths.
  • Use the Agent v2 EVP proxy route for this track and disable response compression, matching the merged Go implementation and avoiding the generic latest EVP endpoint behavior for this flagevaluation path.
  • Emit one batched FlagEvaluationsRequest per flush, with a top-level context object and a flagEvaluations array, instead of sending one HTTP request per evaluation.
  • Keep the OpenFeature hook cheap: capture only a bounded scalar snapshot on the evaluation thread, then enqueue to a bounded worker queue for aggregation and flush.
  • Define aggregation cardinality only from fields serialized to the worker contract; OpenFeature reason is not a hidden aggregate key because it is not a worker field.
  • Treat targeting_key as the dedicated event field and keep it out of context.evaluation so the same identity is not encoded twice.
  • Use a two-tier aggregation model: full-fidelity buckets first, degraded buckets without targeting key/context after cap pressure, then counted drops if both tiers are exhausted.
  • Preserve evaluation-time bounds in each aggregate row while using flush time for the batch/event timestamp, matching the backend distinction between when evaluations happened and when the SDK sent the batch.
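
The two-tier aggregation decision above can be sketched as follows. This is an illustration under assumptions, not the PR's `FlagEvaluationWriterImpl`: the class name, the tiny caps used in examples, and the reduced key (flag, variant, targeting key) are all hypothetical; the real aggregate key has more dimensions.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch; names and key shape are hypothetical, not the PR's implementation. */
final class TwoTierAggregator {
  private final int fullCap;
  private final int degradedCap;
  private final Map<String, Long> full = new HashMap<>();
  private final Map<String, Long> degraded = new HashMap<>();
  private long dropped;

  TwoTierAggregator(int fullCap, int degradedCap) {
    this.fullCap = fullCap;
    this.degradedCap = degradedCap;
  }

  void record(String flagKey, String variant, String targetingKey) {
    // Tier 1: full-fidelity bucket, including the targeting key.
    String fullKey = key(flagKey, variant, targetingKey);
    if (full.containsKey(fullKey) || full.size() < fullCap) {
      full.merge(fullKey, 1L, Long::sum);
      return;
    }
    // Tier 2: degraded bucket without the targeting key.
    String degradedKey = key(flagKey, variant);
    if (degraded.containsKey(degradedKey) || degraded.size() < degradedCap) {
      degraded.merge(degradedKey, 1L, Long::sum);
      return;
    }
    // Both tiers exhausted: count the drop instead of growing memory.
    dropped++;
  }

  /** Length-prefixed join so that ("ab", "c") and ("a", "bc") cannot collide. */
  private static String key(String... parts) {
    StringBuilder sb = new StringBuilder();
    for (String part : parts) {
      String text = String.valueOf(part);
      sb.append(text.length()).append(':').append(text);
    }
    return sb.toString();
  }

  int fullBuckets() { return full.size(); }
  int degradedBuckets() { return degraded.size(); }
  long dropped() { return dropped; }

  long degradedCount(String flagKey, String variant) {
    return degraded.getOrDefault(key(flagKey, variant), 0L);
  }
}
```

Existing buckets keep counting after a cap is reached; only new keys spill to the next tier, so memory stays bounded while hot keys retain full fidelity.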

Validation Evidence

Dogfooding App

  • ffe-dogfooding app-java was rebuilt with local Java artifacts and reached PROVIDER_READY.
  • Evaluated ffe-dogfooding-string-flag through the Java dogfooding app 15 times total: 5 evaluations for each public-safe targeting key:
    • java-batch-evp-20260622T233009Z-alpha
    • java-batch-evp-20260622T233009Z-bravo
    • java-batch-evp-20260622T233009Z-charlie
  • App-side result: all 15 evaluations returned variant_2 and allocation allocation-override-392dd7c149f8.

System Tests

Staging End-To-End

  • Dogfooding ran without the local mock-intake EVP tee/proxy, so the Agent sent EVP traffic through the normal backend path.
  • Retriever staging query returned 3 flagevaluation rows for the exact targeting keys above, proving SDK aggregation/batching instead of one backend row per evaluation.
  • Each row had flag.key=ffe-dogfooding-string-flag, variant.key=variant_2, allocation.key=allocation-override-392dd7c149f8, and evaluation_count=5.
  • The three rows shared the same flush timestamp (1782171009940), while each row preserved distinct first_evaluation and last_evaluation bounds for the five evaluations in that aggregate.
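
The distinction observed above, one shared flush timestamp but per-row evaluation bounds, follows from tracking a running minimum and maximum per aggregate. A minimal sketch, assuming a hypothetical `AggregateRow` class (the PR's actual row type is not shown in this conversation):

```java
/** Illustrative sketch; the class and accessor names are hypothetical. */
final class AggregateRow {
  private long evaluationCount;
  private long firstEvaluation = Long.MAX_VALUE;
  private long lastEvaluation = Long.MIN_VALUE;

  /** Folds one evaluation into the row; evaluations may arrive out of order. */
  void add(long evalTimeMs) {
    evaluationCount++;
    firstEvaluation = Math.min(firstEvaluation, evalTimeMs);
    lastEvaluation = Math.max(lastEvaluation, evalTimeMs);
  }

  long evaluationCount() { return evaluationCount; }
  long firstEvaluation() { return firstEvaluation; }
  long lastEvaluation() { return lastEvaluation; }
}
```

The envelope timestamp would be taken separately, once per flush, which is why all rows in a batch can share it while their bounds differ.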

…EVP writer

- Tests for identical-event aggregation (count 2, first<=last min/max)
- Test for type-tagged canonical key distinguishing int vs string
- Tests for global/degraded cap overflow and drop-counted overflow
- Test for absent variant -> runtimeDefaultUsed
- Tests for 256-field/256-char context pruning
- Tests for flush posting to 'flagevaluations' with required JSON fields
…nWriterImpl two-tier EVP writer

- FlagEvalEvent (bootstrap): lightweight data record for hook->writer channel
- FlagEvaluationWriter (bootstrap): interface with enqueue/start/close
- FlagEvaluationWriterImpl (lib): two-tier aggregation with frozen contract constants
  - GLOBAL_CAP=131072 / PER_FLAG_CAP=10000 / DEGRADED_CAP=32768
  - Canonical context key: sorted, type-tagged, length-delimited (no hash; reviewer concern #3)
  - Context pruning: <=256 fields, string values <=256 chars (reviewer concern #1)
  - Min/max first/last eval time (reviewer concern #4)
  - Absent variant -> runtimeDefaultUsed (reviewer concern #5)
  - Drop-counted overflow (reviewer concern #8)
  - Posts to 'flagevaluations' via BackendApiFactory(EVENT_PLATFORM)
  - Flush interval 10s (differs from exposure 1s)
- Add FEATURE_FLAG_EVALUATION_PROCESSOR thread to AgentThreadFactory
- Register/deregister writer with FeatureFlaggingGateway
- FlagEvaluationWriterImplTest: all 10 unit tests GREEN
- Test: enqueue called with flagKey, variant, reason (lowercased), allocationKey
- Test: evalTimeMs from dd.eval.timestamp_ms metadata (reviewer concern #4)
- Test: evalTimeMs fallback to hook-fire time when metadata absent
- Test: null value -> null variant (runtime default; reviewer concern #5)
- Test: only enqueue called, no inline aggregation (reviewer concern #7)
- Test: writer=null is no-op (killswitch off)
- Test: targetingKey extracted from evaluation context
- FlagEvalEVPHook: Hook<T> with finallyAfter() doing cheap capture only
  - Reads allocationKey from metadata.getString('allocationKey')
  - Reads eval-time from metadata.getLong('dd.eval.timestamp_ms'), fallback to hook-fire time
  - Null value -> null variant (runtime default; reviewer concern #5)
  - Lowercases reason string
  - Resolves writer lazily from FeatureFlaggingGateway (test: injected directly)
  - No aggregation on hook thread (reviewer concern #7)
- DDEvaluator: stamp dd.eval.timestamp_ms in flag metadata at resolution point
- Provider.getProviderHooks(): returns [OTel FlagEvalHook, EVP FlagEvalEVPHook]
  - OTel path preserved byte-for-byte (PRES-01 non-regression)
- FeatureFlaggingSystem: create + start FlagEvaluationWriterImpl behind killswitch
  - DD_FLAGGING_EVALUATION_COUNTS_ENABLED=false disables EVP path only
  - Default: enabled (EVP path on)
- ProviderTest: updated to expect 2 hooks (OTel + EVP)
- FlagEvalEVPHookTest: all 8 unit tests GREEN
- Apply project-required code style (google-java-format) to all new files
  in this plan before submitting
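
The commit notes above describe the canonical context key as sorted, type-tagged, and length-delimited, with no hashing. One way such a key could be built is sketched below. The tag characters, the `CanonicalContextKey` name, and the exact encoding are assumptions for illustration; the PR's actual format is not shown here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch; the encoding and names are hypothetical, not the PR's format. */
final class CanonicalContextKey {
  /** Assumes non-null attribute names; values may be null. */
  static String of(Map<String, Object> context) {
    StringBuilder sb = new StringBuilder();
    // TreeMap iteration makes the key independent of insertion order.
    for (Map.Entry<String, Object> e : new TreeMap<>(context).entrySet()) {
      appendField(sb, 'k', e.getKey());
      Object v = e.getValue();
      if (v instanceof String) {
        appendField(sb, 's', (String) v);
      } else if (v instanceof Boolean) {
        appendField(sb, 'b', v.toString());
      } else if (v instanceof Number) {
        appendField(sb, 'n', v.toString());
      } else {
        appendField(sb, 'z', ""); // null or unsupported type
      }
    }
    return sb.toString();
  }

  /** Writes tag, length, ':' and text, so field boundaries are unambiguous. */
  private static void appendField(StringBuilder sb, char tag, String text) {
    sb.append(tag).append(text.length()).append(':').append(text);
  }
}
```

With this encoding the integer `1` and the string `"1"` produce different keys, which is the property the int-versus-string test in the earlier commit checks for.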

dd-octo-sts Bot commented Jun 12, 2026

🟢 Java Benchmark SLOs — All performance SLOs passed

| Suite | Status |
| --- | --- |
| Startup | 🟢 pass |

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
| Scenario | Candidate | master | Δ (95% CI of mean) |
| --- | --- | --- | --- |
| startup:insecure-bank:iast:Agent | 13.91 s | 13.89 s | [-0.6%; +0.8%] (no difference) |
| startup:insecure-bank:tracing:Agent | 12.87 s | 12.96 s | [-1.5%; -0.0%] (maybe better) |
| startup:petclinic:appsec:Agent | 16.80 s | 16.16 s | [-0.5%; +8.4%] (no difference) |
| startup:petclinic:iast:Agent | 16.79 s | 16.85 s | [-1.3%; +0.5%] (no difference) |
| startup:petclinic:profiling:Agent | 16.70 s | 16.85 s | [-2.0%; +0.3%] (no difference) |
| startup:petclinic:sca:Agent | 16.84 s | 16.75 s | [-0.4%; +1.6%] (no difference) |
| startup:petclinic:tracing:Agent | 15.94 s | 16.02 s | [-1.3%; +0.3%] (no difference) |

Commit: 4a93c611 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

… pruning, shutdown drain, and schema validation

- variant sourced from details.getVariant() (not the evaluated value)
- error object captured from evaluation details, schema {"message":...}
- killswitch DD_FLAGGING_EVALUATION_COUNTS_ENABLED via the tracer config system
- deterministic context pruning (sort before cut), stored pruned attrs
- observable queue-overflow drop counter; close() drains and final-flushes
- JMH hot-path benchmark; structural validation against vendored worker schema
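
Deterministic pruning ("sort before cut") from the list above can be sketched as below. The 256-field and 256-character limits come from the earlier commit notes; whether over-long strings are truncated or dropped is not stated in this conversation, so truncation here is an assumption, as is the `ContextPruner` name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch; truncation of long strings is an assumption, not confirmed by the PR. */
final class ContextPruner {
  static final int MAX_FIELDS = 256;
  static final int MAX_STRING_LENGTH = 256;

  /** Assumes non-null attribute names. */
  static Map<String, Object> prune(Map<String, Object> attributes) {
    Map<String, Object> pruned = new LinkedHashMap<>();
    // Sort first so the same input always keeps the same fields, whatever its iteration order.
    for (Map.Entry<String, Object> e : new TreeMap<>(attributes).entrySet()) {
      if (pruned.size() >= MAX_FIELDS) {
        break;
      }
      Object value = e.getValue();
      if (value instanceof String && ((String) value).length() > MAX_STRING_LENGTH) {
        value = ((String) value).substring(0, MAX_STRING_LENGTH);
      }
      pruned.put(e.getKey(), value);
    }
    return pruned;
  }
}
```

Without the sort, two hosts evaluating the same context could keep different subsets of fields and therefore land in different aggregation buckets.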
leoromanovsky changed the title from "[FFL-2446] dd-trace-java: emit EVP flagevaluation (Phase 2 fan-out)" to "feat(openfeature): emit server-side EVP flagevaluation" Jun 14, 2026
… evaluation writer

The flag evaluation serializer's lastTicks and globalFullCount fields are
confined to the single serializer thread but are written from more than one
method, which SpotBugs flags as AT_NONATOMIC_64BIT_PRIMITIVE and
AT_STALE_THREAD_WRITE_OF_PRIMITIVE. Annotate both fields with
@SuppressFBWarnings and a thread-confinement justification, matching the
existing convention used across the tracer.
…uations-cross-sdk

# Conflicts:
#	products/feature-flagging/feature-flagging-lib/gradle.lockfile
leoromanovsky marked this pull request as ready for review June 23, 2026 00:23
leoromanovsky requested review from a team as code owners June 23, 2026 00:23
leoromanovsky removed the request for review from a team June 23, 2026 00:23
leoromanovsky requested review from PerfectSlayer, bric3, dd-oleksii and typotter and removed request for a team June 23, 2026 00:23
dd-octo-sts Bot commented Jun 23, 2026

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

  • Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

leoromanovsky added the "type: enhancement" and "comp: openfeature" labels Jun 23, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d4244f8ae

Comment on lines +61 to +62
final FlagEvaluationWriter w =
    injectedWriter != null ? injectedWriter : FeatureFlaggingGateway.getFlagEvalWriter();

[P2] Guard EVP hook lookup against missing bootstrap

When dd-openfeature is used with an older or absent tracer bootstrap on the classpath, resolving the newly added FlagEvaluationWriter/FeatureFlaggingGateway.getFlagEvalWriter() path can throw a LinkageError/NoSuchMethodError here. Because this lookup happens before the try block and the catch only handles Exception, the optional EVP hook can break flag evaluation instead of becoming the intended no-op; this needs the same kind of LinkageError guard used around the OTel hook.
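
The kind of guard this comment asks for might look like the sketch below. It is self-contained for illustration: `GuardedLookup` and `resolveOrNull` are hypothetical names, and the `Supplier` stands in for the `FeatureFlaggingGateway.getFlagEvalWriter()` call, which is not reproduced here. The point is only that `LinkageError` (the parent of `NoSuchMethodError` and `NoClassDefFoundError`) is an `Error`, so a `catch (Exception e)` does not cover it.

```java
import java.util.function.Supplier;

/** Illustrative sketch; names are hypothetical and the Supplier stands in for the gateway lookup. */
final class GuardedLookup {
  /**
   * Runs the lookup inside the guarded region and degrades to null (a no-op writer)
   * when the bootstrap classes are missing or older than expected.
   */
  static <T> T resolveOrNull(Supplier<T> lookup) {
    try {
      return lookup.get();
    } catch (LinkageError | RuntimeException e) {
      // LinkageError covers NoSuchMethodError and NoClassDefFoundError.
      return null;
    }
  }
}
```

A caller would then treat a null writer exactly like the killswitch-off case and skip the enqueue.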

Comment on lines 56 to +59
String evpProxyEndpoint = featuresDiscovery.getEvpProxyEndpoint();
if (preferredEvpProxyEndpoint != null
    && featuresDiscovery.supportsEvpProxyEndpoint(preferredEvpProxyEndpoint)) {
  evpProxyEndpoint = preferredEvpProxyEndpoint;

[P2] Do not fall back to v4 for v2-only flagevaluation

When the flag-evaluation writer requests V2_EVP_PROXY_ENDPOINT, Agents that advertise only /evp_proxy/v4/ leave evpProxyEndpoint at the discovered v4 value because this condition simply skips the preference. The writer then posts to /evp_proxy/v4/api/v2/flagevaluation, despite this signal being documented in the new writer as using /evp_proxy/v2/api/v2/flagevaluation; those environments will send the new telemetry to the wrong Agent route instead of disabling it when v2 is unavailable.
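
One possible shape of the behavior requested here, treating the requested endpoint as required rather than preferred, is sketched below. `EvpEndpointSelector`, `select`, and the set of advertised endpoints are hypothetical stand-ins for the discovery API; the fix actually adopted in the PR may differ.

```java
import java.util.Set;

/** Illustrative sketch; names and signature are hypothetical, not the tracer's discovery API. */
final class EvpEndpointSelector {
  /**
   * Returns the endpoint to use, or null when a required endpoint is not advertised,
   * so the caller can disable the signal instead of posting to a different proxy version.
   */
  static String select(Set<String> advertised, String discovered, String required) {
    if (required == null) {
      return discovered; // no constraint: keep the generic discovered endpoint
    }
    return advertised.contains(required) ? required : null;
  }
}
```

With an Agent that advertises only `/evp_proxy/v4/`, a request for `/evp_proxy/v2/` yields null in this sketch, rather than silently reusing the v4 route.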

leoromanovsky requested a review from a team as a code owner June 23, 2026 19:41