Pandas 3 + PyArrow 14 breaks DataFrame shuffle: RuntimeError: P2P failed during barrier phase #9183

@crusaderky

CI broke pretty badly, only on Python 3.11 environments (see below), the day pandas 3.0.0 was released.
Many tests are affected.

full report: https://dask.github.io/distributed/test_report.html

Broken tests

  • distributed.protocol.tests.test_highlevelgraph
  • distributed.shuffle.tests.test_graph
  • distributed.shuffle.tests.test_merge
  • distributed.shuffle.tests.test_metrics.test_dataframe
  • distributed.shuffle.tests.test_shuffle

mamba list diff analysis

It looks like the main differences between the Python 3.11 environments and 3.12+ are the numpy and pyarrow versions:

From CI:

  • python 3.11, numpy 1.26, pandas 2.3.3, pyarrow 14: OK
  • python 3.11, numpy 1.26, pandas 3.0.0, pyarrow 14: BROKEN
  • python 3.12, numpy 2.4, pandas 3.0.0, pyarrow 22: OK

Local tests:

  • python 3.11, numpy 1.26, pandas 3.0.0, pyarrow 16.1: OK
  • python 3.11, numpy 1.26, pandas 3.0.0, pyarrow 16.0: OK
  • python 3.11, numpy 1.26, pandas 3.0.0, pyarrow 15: BROKEN

Labels: bug, shuffle, tests
