Batch concatenation leads to duplicate dictionary entries and a result that pandas cannot read

### Describe the bug

When we concatenate 2 or more batches where some columns are dictionaries, the dictionaries are concatenated instead of being merged.

The resulting dictionary may end up with duplicates. For example, if both batches contain the string "alpha", the concatenated batch will have two dictionary entries for that string. The result is strictly correct (every index still points to its original value). However, any library that operates on the indices directly (e.g., an aggregation) will obtain a wrong result.

More directly, pandas rejects the batch because it validates that categories are unique:
```
lib/python3.11/site-packages/pandas/core/dtypes/dtypes.py", line 570, in validate_categories
    raise ValueError("Categorical categories must be unique")
```

pandas does have a function (`union_categoricals`) to merge dataframes with different dictionaries, but it is not intended to deduplicate the dictionary of a single dataframe.
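To illustrate why index-based operations go wrong, here is a small std-only sketch (no arrow types). It assumes the concatenated column holds the six values `alpha, beta, gamma, gamma, alpha, beta`, with the second batch's keys offset by 3. The function names `count_by_key` and `count_by_value` are hypothetical helpers, not part of any library.

```rust
use std::collections::HashMap;

/// Count rows per dictionary key, the way an index-based aggregation would.
fn count_by_key(keys: &[i32]) -> HashMap<i32, usize> {
    let mut counts = HashMap::new();
    for &key in keys {
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}

/// Count rows per decoded value, which is the grouping a user expects.
fn count_by_value<'a>(values: &[&'a str], keys: &[i32]) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for &key in keys {
        // Decode the key through the dictionary before grouping.
        *counts.entry(values[key as usize]).or_insert(0) += 1;
    }
    counts
}
```

With values `["alpha", "beta", "gamma", "gamma", "alpha", "beta"]` and keys `[0, 1, 2, 0, 5, 4, 3, 5]`, grouping by key yields six groups, while grouping by decoded value yields three.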

### To Reproduce

```
//! Concatenation tests for dictionary arrays.
use std::sync::Arc;

use arrow::{
    array::{Array, ArrayRef, AsArray, DictionaryArray, Int32Array, RecordBatch, StringArray},
    compute::concat_batches,
    datatypes::{DataType, Field, Int32Type, Schema},
};

/// Build a dictionary array with explicit dictionary value order and key values.
fn dictionary_array(dictionary_values: Vec<&str>, keys: Vec<i32>) -> ArrayRef {
    Arc::new(
        DictionaryArray::<Int32Type>::try_new(
            Int32Array::from(keys),
            Arc::new(StringArray::from(dictionary_values)),
        )
        .expect("dictionary array"),
    )
}

/// Build a one-column record batch containing a dictionary array.
fn dictionary_batch(
    schema: Arc<Schema>,
    dictionary_values: Vec<&str>,
    keys: Vec<i32>,
) -> RecordBatch {
    RecordBatch::try_new(schema, vec![dictionary_array(dictionary_values, keys)])
        .expect("record batch")
}

/// This test will start to fail once arrow merges dictionaries during concatenation.
#[test]
fn concat_then_normalize_deduplicates_dictionary_values_and_remaps_keys() {
    let schema = Arc::new(Schema::new(vec![Field::new(
        "symbol",
        DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8)),
        false,
    )]));

    let batch_0 = dictionary_batch(
        schema.clone(),
        vec!["alpha", "beta", "gamma"],
        vec![0, 1, 2, 0],
    );
    let batch_1 = dictionary_batch(
        schema.clone(),
        vec!["gamma", "alpha", "beta"],
        vec![2, 1, 0, 2],
    );

    let raw_concatenated = concat_batches(&schema, &[batch_0, batch_1]).expect("concat batches");
    let raw_column = raw_concatenated.column(0).as_dictionary::<Int32Type>();
    // Ideally this would be 3, because both batches hold the same three values.
    assert_eq!(raw_column.values().len(), 6);
}

```

### Expected behavior

The dictionary should only contain unique entries.
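As a rough sketch of the expected outcome, the std-only code below deduplicates dictionary values and remaps the keys so that every row still decodes to the same string. It works on plain slices rather than arrow arrays, and `normalize_dictionary` is a hypothetical name. It is not how arrow implements or would implement this.

```rust
use std::collections::HashMap;

/// Deduplicate dictionary values and remap keys so decoded rows are unchanged.
/// Values keep the order in which they first appear in the old dictionary.
fn normalize_dictionary<'a>(values: &[&'a str], keys: &[i32]) -> (Vec<&'a str>, Vec<i32>) {
    let mut unique: Vec<&'a str> = Vec::new();
    let mut first_seen: HashMap<&'a str, i32> = HashMap::new();

    // remap[old_key] = new_key, one entry per slot of the old dictionary.
    let remap: Vec<i32> = values
        .iter()
        .map(|&value| {
            *first_seen.entry(value).or_insert_with(|| {
                unique.push(value);
                (unique.len() - 1) as i32
            })
        })
        .collect();

    // Rewrite every key through the remap table.
    let new_keys: Vec<i32> = keys.iter().map(|&key| remap[key as usize]).collect();
    (unique, new_keys)
}
```

For the concatenated values `["alpha", "beta", "gamma", "gamma", "alpha", "beta"]` and keys `[0, 1, 2, 0, 5, 4, 3, 5]` (assuming the second batch's keys are offset by 3), this returns the dictionary `["alpha", "beta", "gamma"]` and keys `[0, 1, 2, 0, 1, 0, 2, 1]`.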

### Additional context

_No response_

batch concatenation leads to duplicate dictionary entries and not readable by pandas #10160

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

batch concatenation leads to duplicate dictionary entries and not readable by pandas #10160

Description

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions