Describe the bug
When the IPC writer encodes a record batch whose schema contains a Dict(Dict(..., ...))-encoded column, the StreamReader cannot decode it. Reading fails with a "Buffer count mismatched with metadata" error.
running 1 test
test tests::dict_of_dict_ipc_error ... FAILED
failures:
---- tests::dict_of_dict_ipc_error stdout ----
Error: IpcError("Buffer count mismatched with metadata")
failures:
tests::dict_of_dict_ipc_error
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
To Reproduce
use std::sync::Arc;

use arrow_array::RecordBatch;
use arrow_array::types::UInt32Type;
use arrow_array::{ArrayRef, DictionaryArray, StringArray, UInt32Array};
use arrow_schema::{ArrowError, DataType, Field, Schema};

fn dict_of_dict() -> ArrayRef {
    let values = Arc::new(StringArray::from(vec!["a", "b", "c"])) as ArrayRef;
    let inner = Arc::new(DictionaryArray::<UInt32Type>::new(
        UInt32Array::from(vec![0u32, 1, 2, 0]),
        values,
    )) as ArrayRef;
    Arc::new(DictionaryArray::<UInt32Type>::new(
        UInt32Array::from(vec![0u32, 1, 2, 3]),
        inner,
    )) as ArrayRef
}

#[test]
fn dict_of_dict_ipc_error() -> std::result::Result<(), ArrowError> {
    use arrow_ipc::reader::StreamReader;
    use arrow_ipc::writer::StreamWriter;

    fn ipc_roundtrip(batch: &RecordBatch) -> std::result::Result<RecordBatch, ArrowError> {
        let mut buf = Vec::new();
        {
            let mut writer = StreamWriter::try_new(&mut buf, &batch.schema())?;
            writer.write(batch)?;
            writer.finish()?;
        }
        StreamReader::try_new(buf.as_slice(), None)?
            .next()
            .expect("one batch")
    }

    let single = DataType::Dictionary(Box::new(DataType::UInt32), Box::new(DataType::Utf8));
    let dod = DataType::Dictionary(Box::new(DataType::UInt32), Box::new(single.clone()));
    let original = dict_of_dict();
    let declared = Arc::new(Schema::new(vec![Field::new("f", dod, true)]));
    let batch =
        RecordBatch::try_new(Arc::clone(&declared), vec![Arc::clone(&original)]).unwrap();

    // Reproduces the bug: dict-of-dict cannot round-trip through Arrow IPC.
    ipc_roundtrip(&batch)?;
    Ok(())
}
Expected behavior
I would expect the column to be encoded and decoded correctly, or at the very least an error stating that this schema is not supported by IPC.
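To make the fallback concrete, here is a minimal sketch of what an up-front "not supported" check could look like. It is not how arrow-rs works today: `SimpleType`, `has_dict_of_dict`, and `check_ipc_supported` are hypothetical names, and `SimpleType` is a deliberately tiny stand-in for `arrow_schema::DataType` so the example compiles without any crates.

```rust
// Hypothetical, simplified stand-in for arrow_schema::DataType (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum SimpleType {
    UInt32,
    Utf8,
    // Dictionary(key type, value type)
    Dictionary(Box<SimpleType>, Box<SimpleType>),
    // List(element type), included to show recursion into nested types
    List(Box<SimpleType>),
}

/// Returns true if any dictionary inside `ty` has a dictionary as its value type.
fn has_dict_of_dict(ty: &SimpleType) -> bool {
    match ty {
        SimpleType::Dictionary(_, value) => {
            matches!(**value, SimpleType::Dictionary(_, _)) || has_dict_of_dict(value)
        }
        SimpleType::List(item) => has_dict_of_dict(item),
        SimpleType::UInt32 | SimpleType::Utf8 => false,
    }
}

/// Hypothetical pre-write check: fail early with a descriptive message
/// instead of producing a stream that the reader later rejects.
fn check_ipc_supported(ty: &SimpleType) -> Result<(), String> {
    if has_dict_of_dict(ty) {
        Err(format!(
            "dictionary-of-dictionary type is not supported by IPC: {:?}",
            ty
        ))
    } else {
        Ok(())
    }
}
```

With this sketch, the `single` type from the reproduction would pass the check, while the `dod` type would be rejected before any bytes are written.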
Additional context
No response