GH-49644: [Python] Support converting list of multi-dimensional arrays to FixedShapeTensor by aboderinsamuel · Pull Request #50203 · apache/arrow

aboderinsamuel · 2026-06-17T09:21:38Z

Rationale for this change

Constructing a fixed-shape-tensor array from a list of individual ndarrays only
worked when each element was 1-D; ≥2-D elements failed with
ArrowInvalid: Can only convert 1-dimensional array values. The only workaround
was stacking the list into a single ndarray and using
FixedShapeTensorArray.from_numpy_ndarray.

What changes are included in this PR?

The C++ list converter PyListConverter::AppendNdarray now accepts
multi-dimensional ndarray elements for fixed-size lists (the storage of a
fixed-shape tensor) by flattening them in C order. The fixed-size-list builder
still validates that the flattened length matches the list width, so wrong sizes
error cleanly. Variable-sized lists remain restricted to 1-D values to avoid
ambiguity. As a side benefit, plain fixed_size_list also accepts
multi-dimensional ndarray elements now.

Are these changes tested?

Yes:

test_tensor_array_from_list_of_ndarrays — construction from 2-D and 3-D
ndarrays, null handling, storage parity with from_numpy_ndarray, and the
size-mismatch error, across int8/int64/float32.
test_fixed_size_list_from_multidim_ndarray — plain fixed_size_list from
multi-dim arrays, plus a check that variable-sized lists still reject 2-D.

Are there any user-facing changes?

Yes — pa.array([multi-dim ndarrays], type=fixed_shape_tensor(...)) (and the
same for fixed_size_list) now works instead of raising. Existing 1-D behavior
and variable-sized-list behavior are unchanged.

Scoped to construction only; the reverse to_numpy shape-preservation also
raised in the issue is intentionally left as a separate follow-up.

GitHub Issue: [Python] Support converting list of multi-dimensional arrays to FixedShapeTensor #49644

github-actions · 2026-06-17T09:22:15Z

⚠️ GitHub issue #49644 has been automatically assigned in GitHub to PR creator.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds support for constructing fixed-shape tensors and fixed-size lists from lists of multi-dimensional NumPy ndarrays by flattening values in C order (GH-49644).

Changes:

Add tests covering tensor arrays built from lists of ndarrays (including nulls and shape mismatch).
Add tests ensuring fixed-size lists accept multi-dimensional ndarray elements (and reject invalid cases).
Update ndarray-to-list conversion to allow flattening for FIXED_SIZE_LIST while keeping variable-sized lists restricted to 1D.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
python/pyarrow/tests/test_extension_type.py	Adds coverage for building `FixedShapeTensorArray` from a list of ndarrays.
python/pyarrow/tests/test_array.py	Adds a regression test for fixed-size list conversion from multi-dimensional ndarrays.
python/pyarrow/src/arrow/python/python_to_arrow.cc	Implements multi-dimensional ndarray flattening for fixed-size lists during conversion.

… arrays to FixedShapeTensor

…cover C-order flatten

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

… lists

AlenkaF · 2026-06-22T09:17:41Z

Thank you @aboderinsamuel!
Can you have a look at the linter error here: https://github.com/apache/arrow/actions/runs/27719407632/job/82001639136?pr=50203#step:5:98

Also, am curious how you decided to use PyArray_Ravel? Are there any other alternatives?

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

aboderinsamuel · 2026-06-22T21:39:26Z

Hi @AlenkaF The macOS-intel failure is an intermittent dependency-install flake. Homebrew lock on aws-sdk-cpp, before the build/tests even run. It's hitting other PRs too (e.g. #50146), and #50195 is actively moving the macOS AWS-SDK dependency, so it looks like known infra rather than my change. Happy to keep re-running until it catches a clean runner. Everything else is green. @rok 🙂

AlenkaF

Repeating a question :)

AlenkaF · 2026-06-23T06:49:06Z

+      if (PyArray_NDIM(ndarray) < 2 || this->list_type_->id() != Type::FIXED_SIZE_LIST) {
+        return Status::Invalid("Can only convert 1-dimensional array values");
+      }
+      flattened.reset(PyArray_Ravel(ndarray, NPY_CORDER));


Why did you decided to use PyArray_Ravel? Are there any other alternatives?

PyArray_Ravel(ndarray, NPY_CORDER) returns a 1-D, C-order view, which lets me reuse the existing 1-D append paths unchanged (the typed fast paths and the PySequence fallback). It's zero-copy when the input is already C-contiguous and only copies for non-contiguous/Fortran-order input, where a copy is unavoidable to get C-order data anyway, which also matches the fixed-shape-tensor's row-major storage.

Alternatives: PyArray_Flatten(NPY_CORDER) gives the same result but always copies; a manual NpyIter/buffer loop could avoid even the view allocation but would mean duplicating the typed-append fast paths. Ravel was the smallest change with the best reuse/performance balance.

PyArray_Ravel over PyArray_Flatten seems ok. If I read the context correctly we're using a builder to append these arrays. When builder finishes it creates a new buffer with a copy of concatenated arrays. So PyArray_Ravel should be ok if the list of arrays is not deleted before appending is complete, which I expect won't happen. @AlenkaF am I reading this right?

Could you have a look at PyArray_CheckFromAny with flag NPY_ARRAY_C_CONTIGUOUS? If I understand correctly it reshapes if necessary or gives access to the data/view directly. Then we can use PyArray_DATA as in other places in the codebase.

I am not sure which would fit better but am interested in what you think.

Also to note here - PyArray_Ravel seems to default to C order. So we should document somewhere that no matter the input arrays order we will return C order. (@AlenkaF please doublecheck me here :) )

Please look into @AlenkaF proposal here, it might be better to have the explicit approach here.

@AlenkaF PyArray_Ravel returns a 1-D C-order view so I could reuse the existing 1-D append paths, but per the discussion I'll switch to the explicit PyArray_CheckFromAny + NPY_ARRAY_C_CONTIGUOUS approach and read PyArray_DATA directly, it's more idiomatic here and lets me handle byte order in the same call.

@rok Will do, switching to the explicit PyArray_CheckFromAny + NPY_ARRAY_C_CONTIGUOUS approach and reading PyArray_DATA directly, matching the existing pattern. That should also fold in the byte-order handling.

Great, I think having more control can be helpful. Thanks for looking into it!
Also, some other flags might be needed, see: https://numpy.org/doc/stable/reference/c-api/array.html#basic-array-flags

github-actions · 2026-06-23T10:37:26Z

⚠️ GitHub issue #49644 has been automatically assigned in GitHub to PR creator.

tadeja · 2026-06-23T11:09:46Z

Hi all. I'd add two points for further review of current PR changes (thanks @aboderinsamuel !)
Example of wrong shape and example of permuted tensor currently produce results but might need to return errors instead:

import numpy as np
import pytest
import pyarrow as pa

np_dtype = np.dtype("int8")
tensor_type = pa.fixed_shape_tensor(pa.from_numpy_dtype(np_dtype), (2, 3))
pa.array([np.arange(6, dtype=np_dtype).reshape(3, 2)], type=tensor_type)
# As (3, 2) has the right number of elements (6), but the wrong shape for a (2, 3) tensor,
# shouldn't we get an error?

Currently returns:

<pyarrow.lib.FixedShapeTensorArray object at 0x12524ec20>
[
  [
    0,
    1,
    2,
    3,
    4,
    5
  ]
]

np_dtype2 = np.dtype("float32")
elements = [
  np.arange(6, dtype=np_dtype2).reshape(2, 3),
  np.arange(6, 12, dtype=np_dtype2).reshape(2, 3),
]
permuted_type = pa.fixed_shape_tensor(pa.from_numpy_dtype(np_dtype2), (2, 3),
                                    permutation=[1, 0])
pa.array(elements, type=permuted_type)
# For permuted tensor types built from multi-dim ndarrays, doesn't this store the wrong layout?

Currently returns:

<pyarrow.lib.FixedShapeTensorArray object at 0x104597ac0>
[
  [
    0,
    1,
    2,
    3,
    4,
    5
  ],
  [
    6,
    7,
    8,
    9,
    10,
    11
  ]
]

rok · 2026-06-23T11:14:55Z

+      value = flattened.obj();
+      ndarray = reinterpret_cast<PyArrayObject*>(value);
    }
    if (PyArray_ISBYTESWAPPED(ndarray)) {


I wonder how PyArray_Ravel behaves in big-endian situations? I imagine it would return the correct result but we can't really test anyway. Output would likely be PyArray_ISBYTESWAPPED(ndarray) == true so this would fail and the computation above would be done in vain.
@aboderinsamuel please move the if (PyArray_ISBYTESWAPPED(ndarray)) block above the if (PyArray_NDIM(ndarray) != 1) block.

rok · 2026-06-23T11:22:49Z

+      // which the builder validates below. 0-dimensional arrays and
+      // variable-sized lists remain restricted to 1-dimensional values.
+      if (PyArray_NDIM(ndarray) < 2 || this->list_type_->id() != Type::FIXED_SIZE_LIST) {
+        return Status::Invalid("Can only convert 1-dimensional array values");


This should probably say something like:

Suggested change

return Status::Invalid("Can only convert 1-dimensional array values");

return Status::Invalid("Can only convert 1-dimensional array values to variable sized lists array");

rok · 2026-06-23T11:25:18Z

+      // GH-49644: a fixed-size list (e.g. the storage of a fixed-shape tensor)
+      // can be built from a multi-dimensional array by flattening it in C
+      // order. The total number of elements must still match the list size,
+      // which the builder validates below. 0-dimensional arrays and
+      // variable-sized lists remain restricted to 1-dimensional values.


This could be less verbose.

okay sir 👍

rok · 2026-06-23T11:31:27Z

+      if (PyArray_NDIM(ndarray) < 2 || this->list_type_->id() != Type::FIXED_SIZE_LIST) {
+        return Status::Invalid("Can only convert 1-dimensional array values");
+      }
+      flattened.reset(PyArray_Ravel(ndarray, NPY_CORDER));


Also to note here - PyArray_Ravel seems to default to C order. So we should document somewhere that no matter the input arrays order we will return C order. (@AlenkaF please doublecheck me here :) )

aboderinsamuel · 2026-06-23T12:50:29Z

Thanks @tadeja @rok @AlenkaF, these are really helpful, i'm sorry i didn't think of this earlier. I reproduced tadeja's cases and both are real: a wrong-shape array (e.g. (3,2) into a (2,3) tensor) is silently accepted, and permuted tensor types store the wrong layout (the to_numpy round-trip doesn't return the input).

Root cause: array.pxi swaps the extension type for its storage_type before conversion and re-wraps with wrap_array after, so the C++ converter only ever sees the flat fixed_size_list, it can't know the tensor's shape or permutation. The flatten is correct for a plain fixed_size_list, but shape-validation and permutation-handling need to live in the Python/Cython layer where the FixedShapeTensorType is still intact.

Plan:

C++: switch from PyArray_Ravel to the explicit PyArray_CheckFromAny + NPY_ARRAY_C_CONTIGUOUS approach, reading PyArray_DATA directly (per @AlenkaF), this should also let me handle the byte-order case in the same call.
Cython: validate each element's shape against the tensor's shape (so (3,2) into (2,3) errors), and reject permuted types with a clear error for now, with full permutation support as a follow-up, unless you'd prefer I handle the transpose here.
Also tighten the comment + error message and document that we always output C order, per @rok.

Does this direction sound right before I work on it?

rok · 2026-06-23T13:05:53Z

Cython: validate each element's shape against the tensor's shape (so (3,2) into (2,3) errors), and reject permuted types with a clear error for now, with full permutation support as a follow-up, unless you'd prefer I handle the transpose here.

Permuted output types would best fit into a follow-up issue indeed.
As for non-C ordered input types - if we have a clean way to handle them (e.g. always convert them to C) and document that in the function docs I think that'd be ok. It doesn't seem likely we'll have requests for non-C orders anytime soon, but we should guard against misinterpretation of non-C ordered data.

Does this direction sound right before I work on it?

Yes! Please proceed.

Copilot AI review requested due to automatic review settings June 17, 2026 09:21

aboderinsamuel requested review from AlenkaF, pitrou, raulcd and rok as code owners June 17, 2026 09:21

github-actions Bot added Component: Python awaiting review Awaiting review labels Jun 17, 2026

Copilot AI reviewed Jun 17, 2026

View reviewed changes

Comment thread python/pyarrow/tests/test_array.py Outdated

Comment thread python/pyarrow/tests/test_array.py

github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 17, 2026

aboderinsamuel added 2 commits June 17, 2026 21:42

apacheGH-49644: [Python] Support converting list of multi-dimensional…

84154db

… arrays to FixedShapeTensor

apacheGH-49644: [Python] Make fixed_size_list test deterministic and …

e695d01

…cover C-order flatten

aboderinsamuel force-pushed the gh-49644-list-multidim-to-fixed-shape-tensor branch from 90c2ac4 to e695d01 Compare June 17, 2026 20:42

Copilot AI review requested due to automatic review settings June 17, 2026 20:42

Copilot started reviewing on behalf of aboderinsamuel June 17, 2026 20:43 View session

Copilot AI reviewed Jun 17, 2026

View reviewed changes

Comment thread python/pyarrow/src/arrow/python/python_to_arrow.cc

apacheGH-49644: [Python] Only flatten ndim>=2 ndarrays for fixed-size…

daed54f

… lists

apacheGH-49644: [Python] Fix clang-format in AppendNdarray

832acee

Copilot AI review requested due to automatic review settings June 22, 2026 17:04

Copilot started reviewing on behalf of aboderinsamuel June 22, 2026 17:04 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

AlenkaF reviewed Jun 23, 2026

View reviewed changes

rok changed the title ~~GH-49644: [Python] Support converting list of multi-dimensional array…~~ GH-49644: [Python] Support converting list of multi-dimensional arrays to FixedShapeTensor Jun 23, 2026

github-actions Bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jun 23, 2026

rok reviewed Jun 23, 2026

View reviewed changes

	return Status::Invalid("Can only convert 1-dimensional array values");
	return Status::Invalid("Can only convert 1-dimensional array values to variable sized lists array");

Conversation

aboderinsamuel commented Jun 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

AlenkaF commented Jun 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

aboderinsamuel commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlenkaF left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 23, 2026

Uh oh!

tadeja commented Jun 23, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aboderinsamuel commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rok commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

aboderinsamuel commented Jun 17, 2026 •

edited by github-actions Bot

Loading

aboderinsamuel commented Jun 22, 2026 •

edited

Loading

aboderinsamuel commented Jun 23, 2026 •

edited

Loading