Question #74

Thrameos · 2026-04-24T07:04:31Z

Thrameos
Apr 24, 2026

Why not build off jpype.dbapi2? Jaydebe is legacy code and doesn't have marshalling.

HenryNebula · 2026-04-24T12:32:21Z

HenryNebula
Apr 24, 2026
Maintainer

Good question - a few years back when I was testing and comparing the performance between JPype and JayDeBeApi, the former was faster but was still one order of magnitude slower than an arrow based conversion. Therefore, I created this fork from JayDeBeApi intended to test out if the conversion between Java and Python objects is the bottleneck (which turns out to be the case). I haven't tested the latest build of JPype to see if performance improves over year, but that could be something I can add to the benchmark suite.

Also I thought jpype.dbapi2 is pretty much self-contained? So not really sure how to build off from there when the backbone of data conversion under the hood is fundamentally different. But would love to hear more on that in case I miss something here.

0 replies

Thrameos · 2026-04-24T13:58:54Z

Thrameos
Apr 24, 2026
Author

It is connected to jpype, but every call used it available to users. Meaning if you have a database like object pipeline you can fork just that dbapi2 file and create your own interface. But if you are not using jpype I wont help you unless you copy the convert api.

I did make an attempt to optimize speed, though to really crank it up you need a jar component to collect and block transfer. JayDeBeAPI just pulls everything as object which I thought I benchmarked and was 5 times slower. I added fast caching which bumps up the speed for primitives but that was still pre panama vh handles. But I did need to give some of that speed back for safety marshalling.

If you have a block collect and then pull columns of like information which is what jpype specializes in then you get rid if that jni per call tax. My benchmarks had a 10 times speedup as we get to direct buffer exchange. Unfortunately, that is were the funding agency direction changed. So I closed the work and published what I had which was elementwise exchange.

I hope that helps.

0 replies

Thrameos · 2026-04-24T14:11:31Z

Thrameos
Apr 24, 2026
Author

I found the benchmark data that the arrow folks worked on.

https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html

0 replies

Thrameos · 2026-04-24T17:57:42Z

Thrameos
Apr 24, 2026
Author

Ultimately the cost really comes down to how many transactions one needs to perform. Python is very heavy, but JNI also has a tax. I am working on Panama based solution to try to improve marshalling, but even then JPype only 50% heavier than a marshal to Graalvm so I am running near the floor of what JVM is able to do at boundaries. That is why the only way to get real speed is to either marshal an entire column of data at once, or use Panama to push a MemorySegment into a struct directly rowwise. This would save one round trip per element. So if you are pushing 20 columns out that JNI tax is becomes very grueling (given you must pay both tax collectors). In other words currently in jpype you fetch a row handle (one python object), then pass it back to get the first item (two objects born), then the next. So you end up with 3*(N+1) JNI trips (isinstance, invoke, exception check) and 2*N+1 Python objects (handle + N*handle+N*index) plus the type checks and you are paying for a python loop range. A column wise pull takes about 5 calls to setup the dispatch, and then once buffer transfer request (M*python objects born) so about a 2 to 5 times speedup. For a primitive transfer you dodge all the python object creates, meaning new Java transfer speed. So I was at arrow level speed and that was without panama support. My J2NI replacement cuts off the Java tax a bit, but you still have than indirect costs (which may eat most of the dispatch savings on the trampoline). The key advantage though is with direct control I could fetch rowwise into buffers as an option as one transaction. I am currently reviewing the needs for JPype in a Post Java27 world where they start locking down the JNI layer. Feel free to participate our discussion. I will ping you when I have benchmarking numbers. Hopefully this helps you in your design.

0 replies

HenryNebula · 2026-04-25T13:27:36Z

HenryNebula
Apr 25, 2026
Maintainer

Hi @Thrameos,

Thank you so much for the detailed explanation - this really helped me understand the underlying cost structure. I want to share some benchmark results that validate your analysis, and clarify a bit about our current design.

Your JNI per-call tax theory is spot-on. We ran a column-scaling benchmark (1M rows, variable columns) and the results align closely with your formula (using our benchmark code in the repo)

Columns	Drop-in (tuples)	Native (Arrow)	Speedup
4	5.74s	2.35s	2.4x
20	26.20s	10.68s	2.5x
40	56.41s	19.36s	2.9x

The speedup of Native over Drop-in actually grows with column count - exactly as you predicted about the JNI tax becoming "very grueling" with more columns. Profiling confirms ~80% of Drop-in execution time is JNI boundary cost (55% Python object creation + 25% JPype bridge overhead).

On the Java-side optimization - we're already doing something similar. Our architecture uses a JAR component that performs JDBC → Arrow conversion entirely in Java, so data crosses the JNI boundary as columnar Arrow RecordBatches rather than individual Java objects. This is essentially the "block collect" approach you described, and it's what makes our Native mode approach Psycopg2-level speed (only ~1.15x slower at 40 columns).

To clarify on the performance comparison: Native mode (i.e. returning pyarrow records directly) is actually our fastest path - the Drop-in mode is slower specifically because it converts Arrow RecordBatches into Python tuples for DB-API 2.0 compatibility. That conversion requires creating a Python object for every cell (~40M PyObject_Alloc calls for 1M rows × 40 cols), which is an irreducible CPython runtime cost. We've tested alternatives (to_pydict() is 1.3x faster, but still fundamentally limited) and concluded that this overhead is the price of DB-API compatibility.

So our design intentionally offers both paths: Drop-in for users who need a seamless fetchall() replacement, and Native for users who can work with Arrow directly (or convert to pandas/polars on their own terms, which is common for Python data analytics workload). Your analysis gives us confidence that this is the right tradeoff.

Really appreciate the Uwe Korn blog link too - that was actually one of the original inspirations for this project. I'd be very interested to follow your Panama work. Please do ping me when you have benchmarking numbers - I'm curious whether that could help reduce the remaining ~15% gap between Native Arrow and Psycopg2.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question #74

Uh oh!

{{title}}

Uh oh!

Replies: 5 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Question #74

Uh oh!

Thrameos Apr 24, 2026

Replies: 5 comments

Uh oh!

HenryNebula Apr 24, 2026 Maintainer

Uh oh!

Thrameos Apr 24, 2026 Author

Uh oh!

Thrameos Apr 24, 2026 Author

Uh oh!

Thrameos Apr 24, 2026 Author

Uh oh!

HenryNebula Apr 25, 2026 Maintainer

Thrameos
Apr 24, 2026

HenryNebula
Apr 24, 2026
Maintainer

Thrameos
Apr 24, 2026
Author

Thrameos
Apr 24, 2026
Author

Thrameos
Apr 24, 2026
Author

HenryNebula
Apr 25, 2026
Maintainer