Overhaul set benchmarks: split Immutable / SingleThreaded, add Set.copyOf #11721

Draft
dougqh wants to merge 1 commit into master from dougqh/set-benchmark

@dougqh dougqh commented Jun 23, 2026

Contributor

What

Overhauls the internal-api set membership benchmarks, mirroring the map-benchmark overhaul (#11679). Replaces the single SetBenchmark with two classes that each pick the correct threading model for their use case (@State scope can't vary by @Param, so one class can't host both):

  • ImmutableSetBenchmark — fixed, read-only membership shared across threads (@State(Scope.Benchmark); sharing is realistic and contention-free since nothing mutates). Compares array / sortedArray / HashSet / TreeSet / Set.copyOf (the JDK's compact, array-backed ImmutableCollections.SetN, via CollectionUtils.tryMakeImmutableSet — what the agent actually uses for fixed config sets). hit/miss split, per-thread lookup cursor. Sets in the tracer skew strongly toward this shape.
  • SingleThreadedSetBenchmark — per-thread mutable lifecycle (@State(Scope.Thread)): create/clone + contains/iterate, plus a Collections.synchronizedSet case for the uncontended synchronization tax (each thread owns its set → monitor only ever locked by one thread → the biased-locking story, read across JVM versions). Unsynchronized HashSet is the in-harness baseline.
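For readers unfamiliar with the five candidates that ImmutableSetBenchmark compares, here is a minimal plain-Java sketch of the lookups, outside any JMH harness. The class name, field names, and sample keys are hypothetical and are not taken from the PR. `Set.copyOf` is called directly (Java 10+) rather than through CollectionUtils.tryMakeImmutableSet, whose signature is not shown here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: the five membership candidates, not the benchmark itself. */
final class MembershipCandidates {
  // Hypothetical sample keys; the real benchmark's data is not shown in this PR.
  static final String[] ARRAY = {"http", "grpc", "jdbc", "kafka", "redis"};
  static final String[] SORTED_ARRAY = sortedCopy(ARRAY);
  static final Set<String> HASH_SET = new HashSet<>(Arrays.asList(ARRAY));
  static final Set<String> TREE_SET = new TreeSet<>(Arrays.asList(ARRAY));
  // Requires Java 10+; called directly here instead of via the agent's helper.
  static final Set<String> COPY_OF = Set.copyOf(Arrays.asList(ARRAY));

  static String[] sortedCopy(String[] in) {
    String[] out = in.clone();
    Arrays.sort(out);
    return out;
  }

  /** Linear scan over the unsorted array. */
  static boolean containsArray(String key) {
    for (String s : ARRAY) {
      if (s.equals(key)) {
        return true;
      }
    }
    return false;
  }

  /** Binary search over the sorted copy. */
  static boolean containsSortedArray(String key) {
    return Arrays.binarySearch(SORTED_ARRAY, key) >= 0;
  }

  /** Runs the same lookup against every candidate, in the order listed above. */
  static boolean[] lookupAll(String key) {
    return new boolean[] {
      containsArray(key),
      containsSortedArray(key),
      HASH_SET.contains(key),
      TREE_SET.contains(key),
      COPY_OF.contains(key)
    };
  }

  public static void main(String[] args) {
    System.out.println("hit:  " + Arrays.toString(lookupAll("kafka")));
    System.out.println("miss: " + Arrays.toString(lookupAll("mongo")));
  }
}
```

All five structures answer the same question; the benchmark only measures how quickly each one answers it for hits and for misses.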

Why

The old SetBenchmark used a shared mutable rotation counter under @Threads(8) (turning fast structures into a contention measurement) and had a contains_treeSet that actually queried HASH_SET. The split fixes both, and the Set.copyOf case answers a real question: for our ~10 fixed static final HashSet config sites, is the JDK's compact immutable set better than HashSet on speed/footprint?
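The difference between a shared rotation counter and a per-thread cursor can be sketched in plain Java as follows. This is an assumption-laden illustration, not the old or new benchmark code: the class names and keys are hypothetical, and an `AtomicInteger` stands in for whatever shared counter the old SetBenchmark used.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: two ways to rotate through lookup keys. */
final class LookupCursors {
  // Hypothetical lookup keys.
  static final String[] KEYS = {"a", "b", "c"};

  /**
   * Shared by all threads: every call writes the same counter, so under
   * several threads the measurement includes contention on that counter.
   */
  static final class SharedCursor {
    private final AtomicInteger index = new AtomicInteger();

    String next() {
      // floorMod keeps the index valid even after the counter overflows.
      return KEYS[Math.floorMod(index.getAndIncrement(), KEYS.length)];
    }
  }

  /** Owned by one thread: a plain int field with no cross-thread writes. */
  static final class OwnedCursor {
    private int index;

    String next() {
      String key = KEYS[index];
      index = (index + 1 == KEYS.length) ? 0 : index + 1;
      return key;
    }
  }

  public static void main(String[] args) {
    OwnedCursor cursor = new OwnedCursor();
    for (int i = 0; i < 4; i++) {
      System.out.println(cursor.next());
    }
  }
}
```

In a JMH harness, the owned cursor would typically live in a `@State(Scope.Thread)` object, so each benchmark thread rotates through the keys independently.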

Notes

  • Run at default JVM flags, across versions. Set.copyOf only materializes the compact SetN on Java 10+ (falls back to HashSet pre-10); the synchronizedSet biased-locking delta shows up between Java 11 and 17, since biased locking is disabled by default from JDK 15 onward. Result blocks are intentionally empty pending a fresh multi-JVM run.
  • StringIndex (Add StringIndex: a generic open-addressed string set #11660) rows fold into these later — kept out so this lands independent of that data structure.
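The version-dependent fallback described in the first note could look roughly like the sketch below. It is a hypothetical helper written for illustration and is not the implementation of CollectionUtils.tryMakeImmutableSet; the class and method names are invented. It looks up `Set.copyOf` reflectively so that the same code would also load on a pre-10 runtime.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not the agent's CollectionUtils. */
final class ImmutableSetFallback {

  /**
   * Returns Set.copyOf(elements) when the runtime provides it (Java 10+),
   * otherwise a plain HashSet copy. Note that any failure inside copyOf
   * (for example a null element) also falls back to HashSet in this sketch.
   */
  @SuppressWarnings("unchecked")
  static <T> Set<T> copyOfOrHashSet(Collection<? extends T> elements) {
    try {
      Method copyOf = Set.class.getMethod("copyOf", Collection.class);
      return (Set<T>) copyOf.invoke(null, elements);
    } catch (ReflectiveOperationException e) {
      // Pre-10 runtime (or a failed call): fall back to a mutable HashSet.
      return new HashSet<>(elements);
    }
  }

  public static void main(String[] args) {
    Set<String> set = copyOfOrHashSet(Arrays.asList("a", "b", "c"));
    // The concrete class depends on the JVM version the code runs on.
    System.out.println(set.getClass().getName());
    System.out.println(set.contains("b"));
  }
}
```

Because the concrete class differs by runtime, a benchmark row labelled Set.copyOf only measures the compact set on JVMs where the lookup succeeds, which is why the note asks for runs across versions.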

🤖 Generated with Claude Code

…pyOf

Mirror the map-benchmark overhaul for sets. Replace the single SetBenchmark
(shared mutable counter under @Threads(8); contains_treeSet bug that queried
HASH_SET) with two classes that each pick the right threading model:

  - ImmutableSetBenchmark: fixed read-only membership shared across threads
    (@State(Scope.Benchmark)); array / sortedArray / HashSet / TreeSet /
    Set.copyOf (the JDK compact SetN the agent actually uses for config sets,
    via CollectionUtils.tryMakeImmutableSet). hit/miss split, per-thread cursor.
  - SingleThreadedSetBenchmark: per-thread mutable lifecycle
    (@State(Scope.Thread)); create/clone + contains/iterate, plus a
    Collections.synchronizedSet case for the uncontended synchronization tax
    (per-thread => bias never revoked; biased-locking story across JVMs).

StringIndex rows fold in later. Result blocks empty pending a fresh multi-JVM run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dougqh added a commit that referenced this pull request Jun 23, 2026
…edMapBenchmark

StringIndex's benchmark integration is moving to the dedicated benchmark PRs
(set overhaul #11721, map overhaul #11679) and will be folded in there later.
Revert both benchmark files to master so this PR is purely the StringIndex data
structure + tests. Avoids the #11679/#11721 deletions-vs-edits conflicts too.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dd-octo-sts Bot commented Jun 23, 2026

Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

| Suite | Status |
|---|---|
| Startup | 🟢 pass |

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
| Scenario | Candidate | master | Δ (95% CI of mean) |
|---|---|---|---|
| startup:insecure-bank:iast:Agent | 13.97 s | 13.98 s | [-1.0%; +0.8%] (no difference) |
| startup:insecure-bank:tracing:Agent | 12.88 s | 12.98 s | [-1.7%; +0.0%] (no difference) |
| startup:petclinic:appsec:Agent | 16.92 s | 16.74 s | [+0.2%; +1.9%] (maybe worse) |
| startup:petclinic:iast:Agent | 16.85 s | 16.98 s | [-1.6%; +0.1%] (no difference) |
| startup:petclinic:profiling:Agent | 16.58 s | 16.93 s | [-2.9%; -1.2%] (significantly better) |
| startup:petclinic:sca:Agent | 16.89 s | 16.72 s | [+0.0%; +2.1%] (maybe worse) |
| startup:petclinic:tracing:Agent | 15.98 s | 16.04 s | [-1.4%; +0.6%] (no difference) |

Commit: 24a9fdcc · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.
