Unsupported Apache Arrow large utf-8 vu data format on direct path load #573

@jaredschwartz-ofs

Description

  1. What versions are you using?
Oracle DB: 19.29.0.0.0
Running in thin mode
platform.platform: Windows-11-10.0.26200-SP0
sys.maxsize > 2**32: True
platform.python_version: 3.13.1
  1. Is it an error or a hang or a crash?
Error
  1. What error(s) or behavior you are seeing?
    I'm getting a NotSupportedError (DPY-3032) when using the direct path load functionality on data in the Apache Arrow "vu" format. Note that in the Arrow C data interface "vu" denotes a utf-8 view, while large utf-8 is "U". This is particularly troublesome because, as far as I can tell, it is the only string/text format polars supports, even for small strings. It is possible to work around this by converting the polars DataFrame to pandas or to a pyarrow table so that the string columns are encoded in the regular utf-8 "u" format, but that adds significant friction.
---------------------------------------------------------------------------
NotSupportedError                         Traceback (most recent call last)
Cell In[26], line 5
      1 testdf = pl.DataFrame({'testcol': ['item1','item2','item3']})
      3 conn = engine.raw_connection()
----> 5 conn.direct_path_load(
      6     schema_name="testschema",
      7     table_name="testtable1",
      8     column_names=testdf.columns,
      9     data=testdf
     10 )
     11 conn.close()

File ~\python\Lib\site-packages\oracledb\connection.py:1089, in Connection.direct_path_load(self, schema_name, table_name, column_names, data, batch_size)
   1076 """
   1077 Load data into Oracle Database using the Direct Path Load interface.
   1078 It is available only in python-oracledb Thin mode.
   (...)
   1086 in each batch. This parameter can be used to tune performance.
   1087 """
   1088 self._verify_connected()
-> 1089 self._impl.direct_path_load(
   1090     schema_name, table_name, column_names, data, batch_size
   1091 )

File src/oracledb/impl/thin/connection.pyx:563, in oracledb.thin_impl.ThinConnImpl.direct_path_load()

File src/oracledb/impl/base/batch_load_manager.pyx:122, in oracledb.base_impl.BatchLoadManager.create_for_direct_path_load()

File src/oracledb/impl/base/batch_load_manager.pyx:70, in oracledb.base_impl.BatchLoadManager._create()

File src/oracledb/impl/arrow/dataframe.pyx:61, in oracledb.arrow_impl.DataFrameImpl.from_arrow_stream()

File src/oracledb/impl/arrow/schema.pyx:191, in oracledb.arrow_impl.ArrowSchemaImpl.populate_from_schema()

File ~\python\Lib\site-packages\oracledb\errors.py:199, in _raise_err(error_num, context_error_message, cause, **args)
    194 """
    195 Raises a driver specific exception from the specified error number and
    196 supplied arguments.
    197 """
    198 error = _create_err(error_num, context_error_message, cause, **args)
--> 199 raise error.exc_type(error) from cause

NotSupportedError: DPY-3032: conversion from Apache Arrow format "vu" to Oracle Database is not supported
  1. Does your application call init_oracle_client()?
No
  1. Include a runnable Python script that shows the problem.
import polars as pl
import oracledb

# `engine` is an existing SQLAlchemy engine that is not defined in this script.
# Replace the next line with your own oracledb Connection object.
conn = engine.raw_connection()

cursor = conn.cursor()

create_sql = """
CREATE TABLE testschema.testtable1 (
    testcol VARCHAR2(100)
)
"""
cursor.execute(create_sql)

testdf = pl.DataFrame({'testcol': ['item1','item2','item3']})

conn.direct_path_load(
    schema_name="testschema",
    table_name="testtable1",
    column_names=testdf.columns,
    data=testdf
)
conn.close()
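
For reference, the workaround mentioned in item 3 could look roughly like the sketch below. The helper names `describe_arrow_format` and `to_small_utf8` are hypothetical and not part of python-oracledb, polars, or pyarrow. It assumes that `DataFrame.to_arrow()` exports string columns as large utf-8 (its default compatibility level) and that `direct_path_load()` accepts the resulting pyarrow table, neither of which has been verified against every version.

```python
# Arrow C data interface format strings for the string types discussed here.
ARROW_STRING_FORMATS = {
    "u": "utf-8 string (32-bit offsets)",
    "U": "large utf-8 string (64-bit offsets)",
    "vu": "utf-8 view",
}


def describe_arrow_format(code):
    """Return a readable name for an Arrow string format code."""
    return ARROW_STRING_FORMATS.get(code, "not a string format")


def to_small_utf8(df):
    """Hypothetical helper: polars DataFrame -> pyarrow Table with "u" strings.

    Assumes df.to_arrow() yields large_string columns; other column
    types are passed through unchanged.
    """
    import pyarrow as pa  # imported lazily so the lookup above needs no extras

    table = df.to_arrow()
    fields = [
        field.with_type(pa.string())
        if pa.types.is_large_string(field.type)
        else field
        for field in table.schema
    ]
    # The cast fails if a column holds more data than 32-bit offsets can address.
    return table.cast(pa.schema(fields))


# Intended use with the script above (not run here, needs a live connection):
# conn.direct_path_load(
#     schema_name="testschema",
#     table_name="testtable1",
#     column_names=testdf.columns,
#     data=to_small_utf8(testdf),
# )
```

Having to do this conversion by hand for every polars DataFrame is the friction described above.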

Labels: bug (Something isn't working)
