feat: Phase 1 & 2 — Read/Write Splitting Configuration Infrastructure & Query Routing#459
feat: Phase 1 & 2 — Read/Write Splitting Configuration Infrastructure & Query Routing#459
Conversation
- Add readwrite/ package: ReadWriteConfiguration, ReadWriteConfigurationParser, ReadWriteDataSourceRegistry, ReadWriteDataSourceManager - Wire registry into ActionContext, StatementServiceImpl, ConnectAction - Update DatasourcePropertiesLoader to forward full .ojp.* keys to server - Add unit tests for all 3 server config classes - Add H2ReadWriteSplittingEndToEndTest integration test Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/f22a5174-b9f5-4721-829c-9d776642a2bd Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/f22a5174-b9f5-4721-829c-9d776642a2bd Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
…ctAction Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/f22a5174-b9f5-4721-829c-9d776642a2bd Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
|
@copilot before continuing merge main branch into this branch. |
…d-write-splitting-phase-1 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Done — merged |
|
@copilot now fix the compilations issues in the code with the new .github/copilot-instructions.md I am hoping you will be able to use java 21 to compile. |
…ionContext constructor Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/2f02f94b-5063-4f3a-9eae-4286c2b1e2cf Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Fixed in commit |
|
|
||
| @Test | ||
| void testValidation_NullPrimaryName() { | ||
| assertThrows(NullPointerException.class, () -> { |
There was a problem hiding this comment.
@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.
There was a problem hiding this comment.
Fixed in commit 4afb87e. All 5 assertThrows lambdas in ReadWriteConfigurationTest now contain only a single invocation that can throw — the builder is constructed outside the lambda and only builder::build (the method reference) is passed to assertThrows.
|
|
||
| @Test | ||
| void testValidation_EmptyPrimaryName() { | ||
| assertThrows(IllegalArgumentException.class, () -> { |
There was a problem hiding this comment.
@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.
There was a problem hiding this comment.
Fixed in commit 4afb87e.
|
|
||
| @Test | ||
| void testValidation_EnabledWithoutReplicas() { | ||
| assertThrows(IllegalArgumentException.class, () -> { |
There was a problem hiding this comment.
@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.
There was a problem hiding this comment.
Fixed in commit 4afb87e.
|
|
||
| @Test | ||
| void testValidation_NegativeStickySessionSeconds() { | ||
| assertThrows(IllegalArgumentException.class, () -> { |
There was a problem hiding this comment.
@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.
There was a problem hiding this comment.
Fixed in commit 4afb87e.
|
|
||
| @Test | ||
| void testValidation_DuplicateReplicas() { | ||
| assertThrows(IllegalArgumentException.class, () -> { |
There was a problem hiding this comment.
@copilot Refactor the code of the lambda to have only one invocation possibly throwing a runtime exception.
There was a problem hiding this comment.
Fixed in commit 4afb87e.
…r, sticky session, query routing Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/b0f48dbd-89f2-4f81-b0df-4caee408691b Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Done in commit New classes:
Extended:
Unit tests: 730 server tests pass. Sonar-flagged issues also addressed: integer overflow in Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
Agent-Logs-Url: REDACTED -use-caffeine-for-caching.md -adopt-spi-patte/opt/hostedtoolcache/CodeQL/2.25.1/x64/codeql/java/tools/autobuild.sh -implement-jdbc-interface.md DECIMAL_WIRE_FORMAT.md ltinode-architecture.md` (dns block)
Agent-Logs-Url: REDACTED -use-caffeine-for-caching.md -adopt-spi-patte/opt/hostedtoolcache/CodeQL/2.25.1/x64/codeql/java/tools/autobuild.sh -implement-jdbc-interface.md DECIMAL_WIRE_FORMAT.md ltinode-architecture.md` (dns block)
|
…eger overflow in round-robin, remove FQ List type Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/b0f48dbd-89f2-4f81-b0df-4caee408691b Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
|
@copilot investigate and root cause, report with level of confidence, suggest solution or solutions, do not change code yet: Error: Failures: |
Root cause analysis — two distinct bugs: Root Cause 1 — Default sticky-session of 5 seconds (confidence: ~95%)Affects: In int stickySessionSeconds = getIntProperty(properties, prefix + STICKY_SESSION_SUFFIX, 5);The default is 5 seconds, applied even when Exact flow:
Root Cause 2 — Session UUID created during a sticky-window SELECT leaks into post-expiry requests (confidence: ~90%)Affects:
ConnectionSessionDTO dto = sessionConnection(actionContext, request.getSession(), true, replicaDs);This creates a server-side session even for stateless (autoCommit) SELECT queries. The session UUID is sent back in the Exact flow:
Suggested solutionsRoot Cause 1 — change the default sticky timeout to 0: // parseForPrimary — default 5 → 0 (opt-in, not opt-out)
int stickySessionSeconds = getIntProperty(properties, prefix + STICKY_SESSION_SUFFIX, 0);Sticky sessions are a read-your-writes guarantee that should be explicitly configured. Having a non-zero default silently poisons all un-configured datasources. Root Cause 2 — do not start a session for stateless SELECT: // executeQueryInternal — startSessionIfNone=false when routing to replica (or unrouted stateless query)
ConnectionSessionDTO dto = sessionConnection(actionContext, request.getSession(), false, replicaDs);Stateless autoCommit SELECTs don't need a persistent server-side session: |
|
@copilot did you set the logs added to DEBUG, if not do it now and remove the TODOs associated with them. |
…ove TODO comments Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/b35c4e92-c392-4c8a-8977-dc828b4ef54a Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Done in |
|
@copilot answer my latest comments |
|
@copilot Do the replica datasource connection pool in ojp server grows and shrinks the same way the primary do whenever an ojp dies or comes back online? |
The previously failing tests are now passing. I ran the full The root cause fix in Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
Yes, the replica pool follows the same grow/shrink lifecycle as the primary pool. Both are HikariCP pools created via When an OJP client dies (session terminates): When an OJP client comes back online: It borrows a connection from whichever pool already exists. Structural difference worth noting: The primary pool is keyed by
Yes — the replica pool behaves exactly the same as the primary pool from HikariCP's perspective, because both are HikariCP pools created via the same Session termination (OJP client disconnects): When a session terminates (gracefully or via Pool growth/shrink: Both pools are managed solely by HikariCP using the configured Server restart: If the OJP server process is restarted, both the primary pool (in One asymmetry to note: The replica pool size is configured independently via |
|
@copilot find reseolve my comments, you did not act on those 2 comments. only ROUND_ROBIN is implemented or did you implement all these? Remove the declaration of thrown exception 'java.lang.InterruptedException', as it cannot be thrown from method's body. |
…erTimeoutElapses test Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/1649625d-ea5d-431a-95aa-55bf588c287f Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Strategy question: All three enum values are handled in
|
|
@copilot you were suppose to find and resolve my comments: |
…fully implemented Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/07ba46cb-1778-411f-be74-4b5a7e2ffb1d Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Both comments have been addressed:
|
|
@copilot Are all the new properties working in the spring boot starter ojp module? will all they be correctly parsed? |
…ting properties Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/e06b19fd-a052-4b11-852a-96f35ad351c1 Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Yes — all new read/write splitting properties are correctly parsed and forwarded by the Spring Boot starter, with no code changes needed. How it works:
The camelCase keys match exactly what Added 3 explicit tests to |
| * include: (1) the {@link com.openjproxy.grpc.TransactionInfo} embedded in | ||
| * {@link com.openjproxy.grpc.SessionInfo} would become stale after a {@code callResource} | ||
| * invocation (the response does not update {@code TransactionInfo}); (2) the | ||
| * {@code callProxy} helper currently swallows exceptions silently, meaning a failed |
There was a problem hiding this comment.
@copilot this has been fixed in the main branch, no longer swallows exceptions, remove this line.
There was a problem hiding this comment.
Removed in 1e6b0cc. Point (2) about callProxy silently swallowing exceptions is gone; the remaining points are renumbered to (1)–(3).
| /** | ||
| * Documents the current actual behavior: after an explicit transaction is committed and the | ||
| * client calls {@code setAutoCommit(true)}, reads continue to go to the <em>primary</em> | ||
| * because {@code setAutoCommit(true)} is not propagated to the server. |
There was a problem hiding this comment.
@copilot remove teh explanations that set setCommit(true) is not propagated to server, this is a different test, we are not going to call setCommit(true) here
There was a problem hiding this comment.
Done in 1e6b0cc. The Javadoc for testAfterTransactionCommit_ReadsGoToPrimary_WithNoStickySession now only explains that reads stay on primary because the connection is in autoCommit=false mode — all references to setAutoCommit(true) propagation are removed.
| // setAutoCommit(true) is NOT propagated to the server. The server's physical connection | ||
| // remains in autoCommit=false mode, so hasActiveTransaction() still returns true and | ||
| // the SELECT below is routed to the primary rather than the replica. | ||
| connection.setAutoCommit(true); |
There was a problem hiding this comment.
Removed in 1e6b0cc. The connection.setAutoCommit(true) call and its preceding comment are both gone from the test body.
| } | ||
| // conn remains null — caller must resolve the connection from the session | ||
| } else { | ||
| // Primary path: eager acquisition as before |
There was a problem hiding this comment.
@copilot why do we still need this path? I thought we would always use lazy acquisition from now on.
There was a problem hiding this comment.
The eager primary path is still needed because every other action handler that calls sessionConnection — ExecuteUpdateAction, CallResourceAction, CreateLobAction, FetchNextRowsAction — immediately calls dto.getConnection() after the call returns. If we removed the eager acquisition and left conn = null for the primary path, all of those handlers would NPE.
The lazy optimisation only pays off for the replica path: when a SELECT is routed to a replica, the primary connection is never allocated at all. For the primary path the connection is needed immediately on every request, so eager vs lazy makes no practical difference.
We could unify both paths by having every action handler call session.getConnection() instead of dto.getConnection(), but that is a cross-cutting refactor that would touch every action. Happy to do it in a follow-up if you'd like to clean this up.
…l per review feedback - Remove stale point (2) "callProxy swallows exceptions" from disabled test Javadoc (fixed in main branch — callProxy no longer swallows exceptions silently) - Simplify testAfterTransactionCommit_ReadsGoToPrimary_WithNoStickySession Javadoc by removing setAutoCommit(true) propagation explanation (not relevant to this test) - Remove connection.setAutoCommit(true) call and its inline comment from same test (test documents autoCommit=false behavior, not setAutoCommit(true) propagation) Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/93f6e4d5-1a3b-4b60-8447-bc17def1f9aa Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
| } | ||
|
|
||
| // Parse configuration | ||
| Properties props = convertPropertiesToJava(connectionDetails.getPropertiesList()); |
There was a problem hiding this comment.
@copilot I see this convertPropertiesToJava multiple times, can it not be wrapped in a method and cached the first time it is converted?
There was a problem hiding this comment.
Done in 07c94c5. Added a private ParsedConfig holder class and a parseReadWriteConfig(ConnectionDetails, String) helper method that converts gRPC properties to Java once and returns both the ReadWriteConfiguration and the Properties in the holder. Both isReadWriteSplittingEnabled and setupReadWriteSplitting now call this single helper — convertPropertiesToJava is invoked from exactly one place.
…n ReadWriteDataSourceManager convertPropertiesToJava is now called from a single private method (parseReadWriteConfig) that both converts and parses, returning the result in a private ParsedConfig holder class. Both public methods (isReadWriteSplittingEnabled and setupReadWriteSplitting) call this helper, eliminating the duplicated convert+parse pattern. All 730 server unit tests pass; checkstyle clean. Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/8346b297-6986-4fe4-bac1-9a1fc4e3977a Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/8346b297-6986-4fe4-bac1-9a1fc4e3977a Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
|
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|



resolveReadReplicaDataSourcefor each routing decision branchSession.hasActiveTransaction(result + primary connection state)Session.getOrCreateReplicaConnection(new vs reused, which datasource)ReadWriteDataSourceRegistry.isStickyActive(elapsed/timeout values)ReadWriteDataSourceRegistry.markWrite(when sticky is marked, with timeout)executeQueryInternalfrom debug to INFO with TODOSessionholdsprimaryDataSource/replicaDataSource, connections acquired lazilyojp.*and*.ojp.*frominfomap)getOrCreateReplicaConnection(DataSource fallback)overload for sessions created without a replica datasourceconnectionHashMap[clientUUID]is stale (points to replica's connHash)createSession(clientUUID, connHash, Connection)andcreateSession(clientUUID, connHash, DataSource, DataSource)overloadsSessionConnectionHelperreplica and primary paths to passsessionInfo.getConnHash()StartTransactionActionto passsessionInfo.getConnHash()[RW-SPLIT]investigation logs from INFO to DEBUG and remove all associated TODO commentsH2ReadWriteSplittingEndToEndTestpass, 730 server unit tests pass, checkstyle cleanthrows InterruptedExceptionfromshouldExpireAfterTimeoutElapsestest methodLEAST_CONNECTIONSdocumentation — note it falls back to ROUND_ROBIN; RANDOM is fully implementedOjpSystemPropertiesBridgeTestcoverage for all new read/write splitting properties (kebab-case conversion, camelCase passthrough, end-to-end system property forwarding)testAfterTransactionCommit_ReadsGoToPrimary_WithNoStickySessionJavadoc — remove setAutoCommit(true) propagation explanationconnection.setAutoCommit(true)line and its comment from test bodyParsedConfigholder +parseReadWriteConfighelper inReadWriteDataSourceManager—convertPropertiesToJavais now called from a single place