Replies: 5 comments
-
Good question - a few years back, when I was testing and comparing the performance of JPype and JayDeBeApi, the former was faster but still an order of magnitude slower than an Arrow-based conversion. I therefore created this fork of JayDeBeApi to test whether the conversion between Java and Python objects is the bottleneck (which turned out to be the case). I haven't tested the latest build of JPype to see whether performance has improved over the years, but that is something I could add to the benchmark suite. Also, I thought jpype.dbapi2 was pretty much self-contained? So I'm not really sure how to build off from there when the backbone of the data conversion under the hood is fundamentally different. But I would love to hear more on that in case I'm missing something here.
-
It is connected to JPype, but every call it makes is available to users. Meaning, if you have a database-like object pipeline, you can fork just that dbapi2 file and create your own interface. But if you are not using JPype it won't help you unless you copy the convert API. I did make an attempt to optimize speed, though to really crank it up you need a JAR component to collect and block-transfer. JayDeBeApi just pulls everything as Object, which I believe I benchmarked at 5 times slower. I added fast caching, which bumps up the speed for primitives, but that was still pre-Panama VarHandles. I did need to give some of that speed back for safety marshalling, though. If you do a block collect and then pull columns of like information - which is what JPype specializes in - you get rid of that JNI per-call tax. My benchmarks showed a 10x speedup once we get to direct buffer exchange. Unfortunately, that is where the funding agency's direction changed, so I closed the work and published what I had, which was the elementwise exchange. I hope that helps.
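To make the "block collect" idea above concrete, here is a minimal pure-Python toy that counts simulated boundary crossings; the `FakeResultSet`, `fetch_elementwise`, and `fetch_blockwise` names are illustrative stand-ins, not APIs from JPype or JayDeBeApi:

```python
class FakeResultSet:
    """Stand-in for a JDBC ResultSet; counts simulated JNI crossings."""

    def __init__(self, columns):
        self.columns = columns          # dict of column name -> list of values
        self.n_rows = len(next(iter(columns.values())))
        self.crossings = 0

    def get_object(self, row, col_name):
        # Elementwise path: one boundary crossing per cell.
        self.crossings += 1
        return self.columns[col_name][row]

    def get_column_block(self, col_name):
        # Blockwise path: one crossing transfers an entire column.
        self.crossings += 1
        return list(self.columns[col_name])


def fetch_elementwise(rs):
    """JayDeBeApi-style: pull every cell as an individual Object."""
    names = list(rs.columns)
    return [tuple(rs.get_object(r, c) for c in names)
            for r in range(rs.n_rows)]


def fetch_blockwise(rs):
    """Block-collect style: one transfer per column, then zip into rows."""
    blocks = [rs.get_column_block(c) for c in rs.columns]
    return list(zip(*blocks))


data = {"id": [1, 2, 3], "name": ["a", "b", "c"]}

rs1 = FakeResultSet(data)
rows1 = fetch_elementwise(rs1)   # 3 rows * 2 cols = 6 crossings

rs2 = FakeResultSet(data)
rows2 = fetch_blockwise(rs2)     # 2 columns = 2 crossings

assert rows1 == rows2            # same rows, far fewer crossings
```

The crossing counts here are a cartoon (a real JNI call involves several trips, as discussed below in the thread), but the scaling is the point: elementwise cost grows with rows × columns, blockwise with columns only.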
-
I found the benchmark data that the Arrow folks worked on.
-
Ultimately the cost really comes down to how many transactions one needs to perform. Python is very heavy, but JNI also has a tax. I am working on a Panama-based solution to try to improve marshalling, but even then JPype is only about 50% heavier than a marshal to GraalVM, so I am running near the floor of what the JVM is able to do at its boundaries. That is why the only way to get real speed is either to marshal an entire column of data at once, or to use Panama to push a MemorySegment into a struct directly, rowwise. That would save one round trip per element. So if you are pushing 20 columns out, that JNI tax becomes very grueling (given you must pay both tax collectors).
In other words, currently in JPype you fetch a row handle (one Python object), then pass it back to get the first item (two objects born), then the next. So you end up with 3*(N+1) JNI trips (isinstance, invoke, exception check) and 2*N+1 Python objects (the handle, plus N item handles and N indices), plus the type checks, and you are paying for a Python loop over a range. A columnwise pull takes about 5 calls to set up the dispatch, and then one buffer transfer request (M Python objects born), so about a 2 to 5 times speedup. For a primitive transfer you dodge all the Python object creation, meaning near Java transfer speed. So I was at Arrow-level speed, and that was without Panama support. My J2NI replacement cuts off the Java tax a bit, but you still have the indirect costs (which may eat most of the dispatch savings on the trampoline). The key advantage, though, is that with direct control I could fetch rowwise into buffers as an option, as one transaction.
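As a rough sanity check, the trip and object counts above can be encoded directly; this is a hedged sketch of the cost model as I read it (the function names are mine):

```python
def rowwise_costs(n_cols, n_rows):
    """Elementwise fetch: per row, 3*(N+1) JNI trips
    (isinstance, invoke, exception check for the row handle
    and for each of the N items) and 2*N+1 Python objects
    (row handle + N item handles + N index objects)."""
    trips = n_rows * 3 * (n_cols + 1)
    objects = n_rows * (2 * n_cols + 1)
    return trips, objects


def columnwise_trips(n_cols):
    """Columnwise fetch: ~5 dispatch-setup calls plus one
    buffer transfer request per column, independent of row count."""
    return 5 + n_cols


# 20 columns: even a single row pays 63 trips and makes 41 objects
# elementwise, while the columnwise path is a flat 25 trips for any
# number of rows.
print(rowwise_costs(20, 1))       # (63, 41)
print(rowwise_costs(20, 1_000))   # (63000, 41000)
print(columnwise_trips(20))       # 25
```

Under this model the elementwise/columnwise trip ratio grows linearly with row count, which is why the crossover to "Arrow-level speed" shows up so quickly on bulk fetches.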
I am currently reviewing the needs of JPype in a post-Java 27 world, where they start locking down the JNI layer. Feel free to participate in our discussion. I will ping you when I have benchmarking numbers.
Hopefully this helps you in your design.
-
Hi @Thrameos, Thank you so much for the detailed explanation - this really helped me understand the underlying cost structure. I want to share some benchmark results that validate your analysis, and clarify a bit about our current design. Your JNI per-call tax theory is spot-on. We ran a column-scaling benchmark (1M rows, variable columns) and the results align closely with your formula (using our benchmark code in the repo)
The speedup of Native over Drop-in actually grows with column count - exactly as you predicted about the JNI tax becoming "very grueling" with more columns. Profiling confirms ~80% of Drop-in execution time is JNI boundary cost (55% Python object creation + 25% JPype bridge overhead). On the Java-side optimization - we're already doing something similar. Our architecture uses a JAR component that performs JDBC → Arrow conversion entirely in Java, so data crosses the JNI boundary as columnar Arrow RecordBatches rather than individual Java objects. This is essentially the "block collect" approach you described, and it's what makes our Native mode approach Psycopg2-level speed (only ~1.15x slower at 40 columns). To clarify on the performance comparison: Native mode (i.e. returning pyarrow records directly) is actually our fastest path - the Drop-in mode is slower specifically because it converts Arrow RecordBatches into Python tuples for DB-API 2.0 compatibility. That conversion requires creating a Python object for every cell (~40M PyObject_Alloc calls for 1M rows × 40 cols), which is an irreducible CPython runtime cost. We've tested alternatives (to_pydict() is 1.3x faster, but still fundamentally limited) and concluded that this overhead is the price of DB-API compatibility. So our design intentionally offers both paths: Drop-in for users who need a seamless fetchall() replacement, and Native for users who can work with Arrow directly (or convert to pandas/polars on their own terms, which is common for Python data analytics workload). Your analysis gives us confidence that this is the right tradeoff. Really appreciate the Uwe Korn blog link too - that was actually one of the original inspirations for this project. I'd be very interested to follow your Panama work. Please do ping me when you have benchmarking numbers - I'm curious whether that could help reduce the remaining ~15% gap between Native Arrow and Psycopg2. |
-
Why not build off jpype.dbapi2? JayDeBeApi is legacy code and doesn't have marshalling.