Bug: Arkouda-backed Series creates NumPy RangeIndex
Summary
When constructing a pandas Series backed by an Arkouda
ExtensionArray, pandas automatically creates a default RangeIndex
backed by NumPy.
This silently materializes the index on the client, breaking scalability
for very large arrays.
Problem
Calling:
pd.Series(arkouda_extension_array)
creates a NumPy-backed RangeIndex when index=None.
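For large Arkouda arrays, a client-side NumPy index of the same length:
- Uses client memory proportional to the array length
- Breaks distributed semantics, because the values stay on the server while the index lives on the client
- May not fit in client memory at all for very large datasets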
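The default behavior can be observed with plain pandas, without an Arkouda server. The sketch below uses a built-in nullable integer array as a stand-in for the Arkouda ExtensionArray (an assumption made only so the snippet runs anywhere). It shows that pandas picks a RangeIndex when no index is passed, and that asking for the index values yields a NumPy array on the client.

```python
import pandas as pd

# Stand-in for an Arkouda-backed ExtensionArray (assumption: any
# ExtensionArray triggers the same default-index behavior).
values = pd.array([10, 20, 30], dtype="Int64")

# No index is passed, so pandas builds the default one itself.
s = pd.Series(values)

print(type(s.index).__name__)             # RangeIndex
print(type(s.index.to_numpy()).__name__)  # ndarray
```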
Expected Behavior
If no index is provided, the default index should be constructed on the
Arkouda server (e.g., using ak.arange(n)), ensuring the entire Series
remains Arkouda-backed.
Fix
Construct the default index using Arkouda and wrap it in an
ArkoudaExtensionArray instead of relying on pandas' default
RangeIndex.
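A minimal sketch of the proposed fix, not the actual patch. The helper name `series_with_backend_index` and its `arange` and `wrap` parameters are hypothetical; they are passed in so the example does not assume a particular Arkouda import path or constructor signature. With Arkouda, `arange` would be `ak.arange` and `wrap` would be whatever builds the Arkouda ExtensionArray from a server-side array. Whether pandas keeps the wrapped array as the index backing without converting it depends on the pandas version and on the ExtensionArray implementation.

```python
import pandas as pd


def series_with_backend_index(values, arange, wrap, index=None):
    """Build a Series whose default index comes from the array backend.

    values : the ExtensionArray holding the data
    arange : callable n -> backend array [0, n), e.g. ak.arange (assumed)
    wrap   : callable turning that backend array into an ExtensionArray
    index  : an explicit index; used unchanged when provided
    """
    if index is None:
        # Build the default index on the backend instead of letting
        # pandas fall back to its own RangeIndex.
        index = pd.Index(wrap(arange(len(values))))
    return pd.Series(values, index=index)


# Intended use with Arkouda (requires a connected server, not run here):
#   import arkouda as ak
#   s = series_with_backend_index(arkouda_extension_array,
#                                 arange=ak.arange,
#                                 wrap=<Arkouda ExtensionArray constructor>)
```

Passing an explicit `index` bypasses the helper's default, so existing callers that already supply an index are unaffected.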