Skip to content

feat: Phase 1 & 2 — Read/Write Splitting Configuration Infrastructure & Query Routing#459

Draft
Copilot wants to merge 27 commits intomainfrom
copilot/implement-read-write-splitting-phase-1
Draft

feat: Phase 1 & 2 — Read/Write Splitting Configuration Infrastructure & Query Routing#459
Copilot wants to merge 27 commits intomainfrom
copilot/implement-read-write-splitting-phase-1

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 25, 2026

  • Add INFO investigation logs (TODO: change to DEBUG) in resolveReadReplicaDataSource for each routing decision branch
  • Add INFO investigation logs in Session.hasActiveTransaction (result + primary connection state)
  • Add INFO investigation logs in Session.getOrCreateReplicaConnection (new vs reused, which datasource)
  • Add INFO investigation logs in ReadWriteDataSourceRegistry.isStickyActive (elapsed/timeout values)
  • Add INFO investigation log in ReadWriteDataSourceRegistry.markWrite (when sticky is marked, with timeout)
  • Upgrade routing-decision log in executeQueryInternal from debug to INFO with TODO
  • Verified build compiles cleanly and checkstyle passes
  • Implement dual-lazy connection design: Session holds primaryDataSource/replicaDataSource, connections acquired lazily
  • Fix driver property forwarding (ojp.* and *.ojp.* from info map)
  • Fix getOrCreateReplicaConnection(DataSource fallback) overload for sessions created without a replica datasource
  • Root cause identified: connectionHashMap[clientUUID] is stale (points to replica's connHash)
  • Fix: Add createSession(clientUUID, connHash, Connection) and createSession(clientUUID, connHash, DataSource, DataSource) overloads
  • Update SessionConnectionHelper replica and primary paths to pass sessionInfo.getConnHash()
  • Update StartTransactionAction to pass sessionInfo.getConnHash()
  • Convert all [RW-SPLIT] investigation logs from INFO to DEBUG and remove all associated TODO comments
  • Run and verify failing tests pass with the fix — all 10 H2ReadWriteSplittingEndToEndTest pass, 730 server unit tests pass, checkstyle clean
  • Remove spurious throws InterruptedException from shouldExpireAfterTimeoutElapses test method
  • Clarify LEAST_CONNECTIONS documentation — note it falls back to ROUND_ROBIN; RANDOM is fully implemented
  • Add OjpSystemPropertiesBridgeTest coverage for all new read/write splitting properties (kebab-case conversion, camelCase passthrough, end-to-end system property forwarding)
  • Remove stale "(2) callProxy swallows exceptions" point from disabled test Javadoc (fixed in main)
  • Simplify testAfterTransactionCommit_ReadsGoToPrimary_WithNoStickySession Javadoc — remove setAutoCommit(true) propagation explanation
  • Remove connection.setAutoCommit(true) line and its comment from test body
  • Extract ParsedConfig holder + parseReadWriteConfig helper in ReadWriteDataSourceManagerconvertPropertiesToJava is now called from a single place

Copilot AI and others added 3 commits April 25, 2026 12:33
- Add readwrite/ package: ReadWriteConfiguration, ReadWriteConfigurationParser,
  ReadWriteDataSourceRegistry, ReadWriteDataSourceManager
- Wire registry into ActionContext, StatementServiceImpl, ConnectAction
- Update DatasourcePropertiesLoader to forward full .ojp.* keys to server
- Add unit tests for all 3 server config classes
- Add H2ReadWriteSplittingEndToEndTest integration test

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/f22a5174-b9f5-4721-829c-9d776642a2bd

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
@rrobetti
Copy link
Copy Markdown
Contributor

@copilot before continuing merge main branch into this branch.

…d-write-splitting-phase-1

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 25, 2026

@copilot before continuing merge main branch into this branch.

Done — merged main (commit fb68830) into this branch. Clean merge, no conflicts. Only .github/copilot-instructions.md was brought in from main.

@rrobetti
Copy link
Copy Markdown
Contributor

@copilot now fix the compilations issues in the code with the new .github/copilot-instructions.md I am hoping you will be able to use java 21 to compile.

…ionContext constructor

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/2f02f94b-5063-4f3a-9eae-4286c2b1e2cf

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 25, 2026

@copilot now fix the compilations issues in the code with the new .github/copilot-instructions.md I am hoping you will be able to use java 21 to compile.

Fixed in commit 6862076. The compilation errors were in two existing tests (ConnectActionConcurrencyTest and HandleUnpooledXAConnectionActionTest) that weren't updated when ReadWriteDataSourceRegistry was added as the 8th parameter to the ActionContext constructor. Both tests now pass the new parameter. BUILD SUCCESS confirmed with Java 21.

Copy link
Copy Markdown
Contributor

@rrobetti rrobetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot execute phase 2 as well as the small fixes in my comments. Make sure you follow sonar link standards.


@Test
void testValidation_NullPrimaryName() {
assertThrows(NullPointerException.class, () -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 4afb87e. All 5 assertThrows lambdas in ReadWriteConfigurationTest now contain only a single invocation that can throw — the builder is constructed outside the lambda and only builder::build (the method reference) is passed to assertThrows.


@Test
void testValidation_EmptyPrimaryName() {
assertThrows(IllegalArgumentException.class, () -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 4afb87e.


@Test
void testValidation_EnabledWithoutReplicas() {
assertThrows(IllegalArgumentException.class, () -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 4afb87e.


@Test
void testValidation_NegativeStickySessionSeconds() {
assertThrows(IllegalArgumentException.class, () -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 4afb87e.


@Test
void testValidation_DuplicateReplicas() {
assertThrows(IllegalArgumentException.class, () -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 4afb87e.

…r, sticky session, query routing

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/b0f48dbd-89f2-4f81-b0df-4caee408691b

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 25, 2026

@copilot execute phase 2 as well as the small fixes in my comments. Make sure you follow sonar link standards.

Done in commit 4afb87e. Phase 2 routing is implemented:

New classes:

  • ReadWriteSqlClassifier — classifies SQL as READ (SELECT/WITH/EXPLAIN/SHOW/DESCRIBE/DESC) or WRITE using proper word-boundary detection
  • ReadReplicaSelector — selects a replica using ROUND_ROBIN, RANDOM, or LEAST_CONNECTIONS (falls back to ROUND_ROBIN in Phase 2)

Extended:

  • ReadWriteDataSourceRegistry — sticky session tracking (markWrite, isStickyActive) and strategy storage (registerStrategy, getStrategy)
  • ReadWriteDataSourceManager — registers replica selection strategy during setup
  • SessionConnectionHelper — overloaded sessionConnection accepting an optional DataSource override for replica routing
  • ExecuteQueryAction — routes stateless read-only SQL to the selected replica; skips routing when inside a transaction or sticky session is active
  • ExecuteUpdateAction — marks sticky session after every successful write

Unit tests: 730 server tests pass. Sonar-flagged issues also addressed: integer overflow in ReadReplicaSelector, DESC/DESCRIBE word-boundary consistency, and removed fully-qualified java.util.List reference.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • 999.999.999.999
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260425131714688_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-25T13-17-14_643-jvmRun1 surefire-20260425131714688_1tmp surefire_0-20260425131714688_2tmp (dns block)
    • Triggering command: `/usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260425132153917_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-25T13-21-53_855-jvmRun1 surefire-20260425132153917_1tmp surefire_0-20260425132153917_2tmp STCONTAINER_GUIDE.md on_plans/SCHEMA_git -action-pattern-show -use-hikaricp.md68620762dd053b75b722f1d3d58009d6c0ca32a1:ojp-server/src/main/java/org/openjproxy/grpc/server/action/transaction/ExecuteQueryAction.java -use-java.md -use�� -use-grpc.md selector, sticky session, query routing

Agent-Logs-Url: REDACTED -use-caffeine-for-caching.md -adopt-spi-patte/opt/hostedtoolcache/CodeQL/2.25.1/x64/codeql/java/tools/autobuild.sh -implement-jdbc-interface.md DECIMAL_WIRE_FORMAT.md ltinode-architecture.md` (dns block)

  • invalid-ip
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260425131714688_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-25T13-17-14_643-jvmRun1 surefire-20260425131714688_1tmp surefire_0-20260425131714688_2tmp (dns block)
    • Triggering command: `/usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260425132153917_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-25T13-21-53_855-jvmRun1 surefire-20260425132153917_1tmp surefire_0-20260425132153917_2tmp STCONTAINER_GUIDE.md on_plans/SCHEMA_git -action-pattern-show -use-hikaricp.md68620762dd053b75b722f1d3d58009d6c0ca32a1:ojp-server/src/main/java/org/openjproxy/grpc/server/action/transaction/ExecuteQueryAction.java -use-java.md -use�� -use-grpc.md selector, sticky session, query routing

Agent-Logs-Url: REDACTED -use-caffeine-for-caching.md -adopt-spi-patte/opt/hostedtoolcache/CodeQL/2.25.1/x64/codeql/java/tools/autobuild.sh -implement-jdbc-interface.md DECIMAL_WIRE_FORMAT.md ltinode-architecture.md` (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

…eger overflow in round-robin, remove FQ List type

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/b0f48dbd-89f2-4f81-b0df-4caee408691b

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copilot AI changed the title feat: Phase 1 — Read/Write Splitting Configuration Infrastructure feat: Phase 1 & 2 — Read/Write Splitting Configuration Infrastructure & Query Routing Apr 25, 2026
Copilot AI requested a review from rrobetti April 25, 2026 13:24
@rrobetti
Copy link
Copy Markdown
Contributor

@copilot investigate and root cause, report with level of confidence, suggest solution or solutions, do not change code yet:

Error: Failures:
Error: H2ReadWriteSplittingEndToEndTest.testMultipleReads_AllGoToReplica:217 SELECT #0 should route to replica (id=2) ==> expected: <2> but was: <1>
Error: H2ReadWriteSplittingEndToEndTest.testSelectGoesToReplica_WithoutTransaction:194 SELECT outside transaction should route to replica (id=2) ==> expected: <2> but was: <1>
Error: H2ReadWriteSplittingEndToEndTest.testWriteThenRead_WithStickySession_SeesWrite:389 After sticky session expires, read should go to replica and not see the write ==> expected: <0> but was: <1>
Error: H2ReadWriteSplittingEndToEndTest.testWriteThenRead_WithoutStickySession_DoesNotSeeWrite:328 Replica should not yet have the row just inserted into the primary ==> expected: <0> but was: <1>
[INFO]
Error: Tests run: 996, Failures: 4, Errors: 0, Skipped: 682

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 25, 2026

@copilot investigate and root cause, report with level of confidence, suggest solution or solutions, do not change code yet:

Error: Failures:
...

Root cause analysis — two distinct bugs:


Root Cause 1 — Default sticky-session of 5 seconds (confidence: ~95%)

Affects: testSelectGoesToReplica_WithoutTransaction, testMultipleReads_AllGoToReplica, testWriteThenRead_WithoutStickySession_DoesNotSeeWrite

In ReadWriteConfigurationParser.parseForPrimary (line 117):

int stickySessionSeconds = getIntProperty(properties, prefix + STICKY_SESSION_SUFFIX, 5);

The default is 5 seconds, applied even when *.ojp.readwrite.stickySessionSeconds is not set. The test's primaryProps() never sets rw_e2e_ds.ojp.readwrite.stickySessionSeconds, so a 5-second sticky window is silently active.

Exact flow:

  1. setupDatabases() executes DROP TABLE, CREATE TABLE, INSERT via ExecuteUpdateAction → each calls markWrite("rw_e2e_ds")lastWriteTimestamps["rw_e2e_ds"] = now.
  2. Milliseconds later the test SELECT runs → isStickyActive("rw_e2e_ds") returns true (elapsed < 5000 ms) → resolveReadReplicaDataSource returns null → routes to primary → was: <1>.

testWriteThenRead_WithoutStickySession_DoesNotSeeWrite is the same: the test's own INSERT marks sticky, and the immediately-following SELECT is still inside the 5-second window.


Root Cause 2 — Session UUID created during a sticky-window SELECT leaks into post-expiry requests (confidence: ~90%)

Affects: testWriteThenRead_WithStickySession_SeesWrite line 389

ExecuteQueryAction.executeQueryInternal always calls sessionConnection with startSessionIfNone=true (line 74):

ConnectionSessionDTO dto = sessionConnection(actionContext, request.getSession(), true, replicaDs);

This creates a server-side session even for stateless (autoCommit) SELECT queries. The session UUID is sent back in the OpResult, and the driver stores it (Statement.java:499: this.connection.setSession(response.getSession())).

Exact flow:

  1. INSERTstartSessionIfNone=false → no session UUID returned. Sticky marked at T=0.
  2. Immediate SELECT (sticky active) → resolveReadReplicaDataSource returns null (sticky active) → sessionConnection(startSessionIfNone=true, null)creates a PRIMARY session → returns sessionUUID="<primary-uuid>" → driver stores it.
  3. Sleep 3.5 s → sticky expires.
  4. Second SELECT → driver sends sessionUUID="<primary-uuid>"StringUtils.isNotBlank(sessionUUID) is trueresolveReadReplicaDataSource short-circuits and returns nullsessionConnection reuses the existing PRIMARY session → returns count=1. Expected: 0 (from replica).

Suggested solutions

Root Cause 1 — change the default sticky timeout to 0:

// parseForPrimary — default 5 → 0 (opt-in, not opt-out)
int stickySessionSeconds = getIntProperty(properties, prefix + STICKY_SESSION_SUFFIX, 0);

Sticky sessions are a read-your-writes guarantee that should be explicitly configured. Having a non-zero default silently poisons all un-configured datasources.

Root Cause 2 — do not start a session for stateless SELECT:

// executeQueryInternal — startSessionIfNone=false when routing to replica (or unrouted stateless query)
ConnectionSessionDTO dto = sessionConnection(actionContext, request.getSession(), false, replicaDs);

Stateless autoCommit SELECTs don't need a persistent server-side session: handleResultSet already fully streams all rows in the initial response. With false, no session UUID is created, so no UUID contaminates the driver for subsequent requests. Existing-session (transaction) queries already arrive with a non-blank UUID, so they take the existing-session branch unaffected.

@rrobetti
Copy link
Copy Markdown
Contributor

@copilot did you set the logs added to DEBUG, if not do it now and remove the TODOs associated with them.

…ove TODO comments

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/b35c4e92-c392-4c8a-8977-dc828b4ef54a

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 26, 2026

@copilot did you set the logs added to DEBUG, if not do it now and remove the TODOs associated with them.

Done in 80e3730. All [RW-SPLIT] investigation logs in Session, ExecuteQueryAction, and ReadWriteDataSourceRegistry have been changed from log.info to log.debug, and all associated // TODO: change to DEBUG before merging comments have been removed.

Copy link
Copy Markdown
Contributor

@rrobetti rrobetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Do the replica datasource connection pool in ojp server grows and shrinks the same way the primary do whenever an ojp dies or comes back online?

Comment thread documents/configuration/ojp-jdbc-configuration.md Outdated
@rrobetti
Copy link
Copy Markdown
Contributor

@copilot answer my latest comments

@rrobetti
Copy link
Copy Markdown
Contributor

@copilot Do the replica datasource connection pool in ojp server grows and shrinks the same way the primary do whenever an ojp dies or comes back online?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot answer my latest comments

The previously failing tests are now passing. I ran the full H2ReadWriteSplittingEndToEndTest suite locally and all 10 tests pass (including testMultipleReads_AllGoToReplica, testWriteThenRead_WithStickySession_SeesWrite, and testAfterTransactionCommit_ReadsGoToReplica_WithNoStickySession). The 730 server unit tests also pass and Checkstyle is clean.

The root cause fix in ceba2f2 (passing sessionInfo.getConnHash() directly instead of the stale connectionHashMap lookup) resolved the routing failures. The subsequent commit 80e3730 converted all [RW-SPLIT] logs from INFO to DEBUG and removed the associated TODO comments as requested. The CI action_required status on 80e3730c is the human approval gate, not a test failure.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • 999.999.999.999
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260427052402007_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-27T05-24-01_535-jvmRun1 surefire-20260427052402007_1tmp surefire_0-20260427052402007_2tmp /protoc-4.34.1-linux-x86_64.exe core.hooksPath (dns block)
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260427052503135_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-27T05-25-03_076-jvmRun1 surefire-20260427052503135_1tmp surefire_0-20260427052503135_2tmp time/copilot-developer-action-main/dist/ripgrep/bin/linux-x64/rg gpg.program (dns block)
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260427053208363_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-27T05-32-08_301-jvmRun1 surefire-20260427053208363_1tmp surefire_0-20260427053208363_2tmp 4/bin/javadoc (dns block)
  • invalid-ip
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260427052402007_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-27T05-24-01_535-jvmRun1 surefire-20260427052402007_1tmp surefire_0-20260427052402007_2tmp /protoc-4.34.1-linux-x86_64.exe core.hooksPath (dns block)
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260427052503135_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-27T05-25-03_076-jvmRun1 surefire-20260427052503135_1tmp surefire_0-20260427052503135_2tmp time/copilot-developer-action-main/dist/ripgrep/bin/linux-x64/rg gpg.program (dns block)
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260427053208363_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-27T05-32-08_301-jvmRun1 surefire-20260427053208363_1tmp surefire_0-20260427053208363_2tmp 4/bin/javadoc (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot Do the replica datasource connection pool in ojp server grows and shrinks the same way the primary do whenever an ojp dies or comes back online?

Yes, the replica pool follows the same grow/shrink lifecycle as the primary pool. Both are HikariCP pools created via ConnectionPoolProviderRegistry.createDataSource(poolConfig).

When an OJP client dies (session terminates): Session.terminate() calls replicaConnection.close() followed by primaryConnection.close(). For HikariCP, close() on a proxied connection returns it to the pool — it is not a real close. So dying sessions return their borrowed connections to both the primary and replica pools. HikariCP then evicts idle connections down to minIdle after idleTimeout, exactly the same for both pools.

When an OJP client comes back online: It borrows a connection from whichever pool already exists. ConnectAction has an idempotency check — if the primary pool already exists in datasourceMap it is reused. ReadWriteDataSourceManager.setupReadWriteSplitting has the same guard — if the primary is already mapped and replicas are registered, it is a no-op. So both pools are simply borrowed from.

Structural difference worth noting: The primary pool is keyed by connHash in datasourceMap. The replica pool is keyed by primaryName (datasource name) in ReadWriteDataSourceRegistry.replicaMap. Neither pool is created or destroyed based on individual client events — both are server-lifetime pools. There is currently no explicit close/destroy path for either pool when clients come and go; pool size is managed entirely by HikariCP's internal eviction logic (idleTimeout, maxLifetime, minIdle).

@copilot Do the replica datasource connection pool in ojp server grows and shrinks the same way the primary do whenever an ojp dies or comes back online?

Yes — the replica pool behaves exactly the same as the primary pool from HikariCP's perspective, because both are HikariCP pools created via the same ConnectionPoolProviderRegistry.createDataSource(PoolConfig) path.

Session termination (OJP client disconnects): When a session terminates (gracefully or via SessionCleanupTask), replicaConnection.close() and primaryConnection.close() are called. Both calls return the checked-out HikariCP proxy connections back to their respective pools. The pool objects themselves are not destroyed — they remain alive at the server level.

Pool growth/shrink: Both pools are managed solely by HikariCP using the configured maxPoolSize/minIdle/idleTimeout/maxLifetime settings. HikariCP grows the pool on demand (up to maxPoolSize) and evicts idle connections (down to minIdle) after idleTimeoutMs. This behaviour is identical for primary and replica.

Server restart: If the OJP server process is restarted, both the primary pool (in datasourceMap) and the replica pool (in ReadWriteDataSourceRegistry.replicaMap) are destroyed and recreated when the first client with R/W splitting configured connects.

One asymmetry to note: The replica pool size is configured independently via replicaName.ojp.pool.* properties, so it can have a different maxPoolSize/minIdle than the primary — but the same HikariCP growth/shrink mechanics apply regardless of those values.

@rrobetti
Copy link
Copy Markdown
Contributor

@copilot find reseolve my comments, you did not act on those 2 comments.

only ROUND_ROBIN is implemented or did you implement all these?

Remove the declaration of thrown exception 'java.lang.InterruptedException', as it cannot be thrown from method's body.
Check warning: Remove the declaration of thrown exception 'java.lang.InterruptedException', as it cannot be thrown from method's body.

…erTimeoutElapses test

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/1649625d-ea5d-431a-95aa-55bf588c287f

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot find reseolve my comments, you did not act on those 2 comments.

only ROUND_ROBIN is implemented or did you implement all these?
...

Strategy question: All three enum values are handled in ReadReplicaSelector.select():

  • ROUND_ROBIN — fully implemented via the roundRobin() method (cycles through replicas with an AtomicInteger counter)
  • RANDOM — fully implemented via ThreadLocalRandom.current().nextInt(replicas.size())
  • LEAST_CONNECTIONS — falls through to the default branch (ROUND_ROBIN) for now; the comment in the code notes that metrics-based selection is planned for a future phase when pool utilization metrics are available

throws InterruptedException warning: Fixed in 6925229 — removed the spurious throws InterruptedException declaration from shouldExpireAfterTimeoutElapses() in ReadWriteDataSourceRegistryTest.java. The method body only calls registerStickyTimeout, markWrite, and assertFalse, none of which can throw it.

@rrobetti
Copy link
Copy Markdown
Contributor

@copilot you were suppose to find and resolve my comments:
#459 (comment)
#459 (comment)

…fully implemented

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/07ba46cb-1778-411f-be74-4b5a7e2ffb1d

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot you were suppose to find and resolve my comments:
#459 (comment)
#459 (comment)

Both comments have been addressed:

r3142000016 (throws InterruptedException): Already removed in commit 6925229shouldExpireAfterTimeoutElapses no longer declares throws InterruptedException.

r3143895227 (strategy documentation): Updated in 18d8313. ROUND_ROBIN and RANDOM are both fully implemented. LEAST_CONNECTIONS is accepted as a value but currently falls back to ROUND_ROBIN; the documentation now says so explicitly:

Replica selection strategy: ROUND_ROBIN or RANDOM. LEAST_CONNECTIONS is accepted but currently falls back to ROUND_ROBIN; metrics-based selection is planned for a future phase.

@rrobetti
Copy link
Copy Markdown
Contributor

@copilot Are all the new properties working in the spring boot starter ojp module? will all they be correctly parsed?

…ting properties

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/e06b19fd-a052-4b11-852a-96f35ad351c1

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot Are all the new properties working in the spring boot starter ojp module? will all they be correctly parsed?

Yes — all new read/write splitting properties are correctly parsed and forwarded by the Spring Boot starter, with no code changes needed.

How it works: OjpSystemPropertiesBridge.toSystemPropertyKey() already handles any key containing .ojp. generically (via the OJP_INFIX = ".ojp." check). Since every new property follows the {datasourceName}.ojp.readwrite.* or {datasourceName}.ojp.connection.* pattern, they are all picked up automatically:

  • primary.ojp.readwrite.enabled=true → system property primary.ojp.readwrite.enabled
  • primary.ojp.readwrite.replica-selection-strategy=ROUND_ROBINprimary.ojp.readwrite.replicaSelectionStrategy (kebab→camelCase)
  • primary.ojp.readwrite.sticky-session-seconds=5primary.ojp.readwrite.stickySessionSeconds
  • replica1.ojp.readwrite.role=replicareplica1.ojp.readwrite.role
  • replica1.ojp.connection.url=jdbc:…replica1.ojp.connection.url

The camelCase keys match exactly what ReadWriteConfigurationParser and ReadWriteDataSourceManager look up on the server side.

Added 3 explicit tests to OjpSystemPropertiesBridgeTest (commit a804375) to document and guard this behaviour: key conversion for camelCase passthrough, kebab-case conversion, and the full end-to-end system property forwarding for a primary + replica setup.

* include: (1) the {@link com.openjproxy.grpc.TransactionInfo} embedded in
* {@link com.openjproxy.grpc.SessionInfo} would become stale after a {@code callResource}
* invocation (the response does not update {@code TransactionInfo}); (2) the
* {@code callProxy} helper currently swallows exceptions silently, meaning a failed
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot this has been fixed in the main branch, no longer swallows exceptions, remove this line.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 1e6b0cc. Point (2) about callProxy silently swallowing exceptions is gone; the remaining points are renumbered to (1)–(3).

/**
* Documents the current actual behavior: after an explicit transaction is committed and the
* client calls {@code setAutoCommit(true)}, reads continue to go to the <em>primary</em>
* because {@code setAutoCommit(true)} is not propagated to the server.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot remove teh explanations that set setCommit(true) is not propagated to server, this is a different test, we are not going to call setCommit(true) here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1e6b0cc. The Javadoc for testAfterTransactionCommit_ReadsGoToPrimary_WithNoStickySession now only explains that reads stay on primary because the connection is in autoCommit=false mode — all references to setAutoCommit(true) propagation are removed.

// setAutoCommit(true) is NOT propagated to the server. The server's physical connection
// remains in autoCommit=false mode, so hasActiveTransaction() still returns true and
// the SELECT below is routed to the primary rather than the replica.
connection.setAutoCommit(true);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot remove this line and comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 1e6b0cc. The connection.setAutoCommit(true) call and its preceding comment are both gone from the test body.

}
// conn remains null — caller must resolve the connection from the session
} else {
// Primary path: eager acquisition as before
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot why do we still need this path? I thought we would always use lazy acquisition from now on.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The eager primary path is still needed because every other action handler that calls sessionConnectionExecuteUpdateAction, CallResourceAction, CreateLobAction, FetchNextRowsAction — immediately calls dto.getConnection() after the call returns. If we removed the eager acquisition and left conn = null for the primary path, all of those handlers would NPE.

The lazy optimisation only pays off for the replica path: when a SELECT is routed to a replica, the primary connection is never allocated at all. For the primary path the connection is needed immediately on every request, so eager vs lazy makes no practical difference.

We could unify both paths by having every action handler call session.getConnection() instead of dto.getConnection(), but that is a cross-cutting refactor that would touch every action. Happy to do it in a follow-up if you'd like to clean this up.

…l per review feedback

- Remove stale point (2) "callProxy swallows exceptions" from disabled test Javadoc
  (fixed in main branch — callProxy no longer swallows exceptions silently)
- Simplify testAfterTransactionCommit_ReadsGoToPrimary_WithNoStickySession Javadoc
  by removing setAutoCommit(true) propagation explanation (not relevant to this test)
- Remove connection.setAutoCommit(true) call and its inline comment from same test
  (test documents autoCommit=false behavior, not setAutoCommit(true) propagation)

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/93f6e4d5-1a3b-4b60-8447-bc17def1f9aa

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
}

// Parse configuration
Properties props = convertPropertiesToJava(connectionDetails.getPropertiesList());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot I see this convertPropertiesToJava multiple times, can it not be wrapped in a method and cached the first time it is converted?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 07c94c5. Added a private ParsedConfig holder class and a parseReadWriteConfig(ConnectionDetails, String) helper method that converts gRPC properties to Java once and returns both the ReadWriteConfiguration and the Properties in the holder. Both isReadWriteSplittingEnabled and setupReadWriteSplitting now call this single helper — convertPropertiesToJava is invoked from exactly one place.

Copilot AI and others added 2 commits April 28, 2026 17:03
…n ReadWriteDataSourceManager

convertPropertiesToJava is now called from a single private method (parseReadWriteConfig) that
both converts and parses, returning the result in a private ParsedConfig holder class. Both public
methods (isReadWriteSplittingEnabled and setupReadWriteSplitting) call this helper, eliminating
the duplicated convert+parse pattern. All 730 server unit tests pass; checkstyle clean.

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/8346b297-6986-4fe4-bac1-9a1fc4e3977a

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 28, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • 999.999.999.999
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260428170252321_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-28T17-02-52_259-jvmRun1 surefire-20260428170252321_1tmp surefire_0-20260428170252321_2tmp (dns block)
  • checkstyle.org
    • Triggering command: /opt/hostedtoolcache/CodeQL/2.25.1/x64/codeql/tools/linux64/java/bin/java /opt/hostedtoolcache/CodeQL/2.25.1/x64/codeql/tools/linux64/java/bin/java -jar /opt/hostedtoolcache/CodeQL/2.25.1/x64/codeql/xml/tools/xml-extractor.jar --fileList=/tmp/codeql-scratch-0fe3065af824c7c2/dbs/java/working/files-to-index17852378063508920137.list --sourceArchiveDir=/tmp/codeql-scratch-0fe3065af824c7c2/dbs/java/src --outputDir=/tmp/codeql-scratch-0fe3065af824c7c2/dbs/java/trap/java (dns block)
  • invalid-ip
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java -jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire/surefirebooter-20260428170252321_3.jar /home/REDACTED/work/ojp/ojp/ojp-server/target/surefire 2026-04-28T17-02-52_259-jvmRun1 surefire-20260428170252321_1tmp surefire_0-20260428170252321_2tmp (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants