@ajpotts ajpotts commented Feb 12, 2026

Summary

This PR adds a groupby() method to the ArkoudaSeriesAccessor,
enabling pandas-style grouping directly from an Arkouda-backed
pd.Series:

s = pd.Series([80, 443, 80]).ak.to_ak()
g = s.ak.groupby()
keys, counts = g.size()

This eliminates the need for users to access internal _data attributes
and provides a clean, abstraction-safe API.


Motivation

Previously, users had to write:

ak.GroupBy(s.values._data).size()

Accessing ._data breaks encapsulation and bypasses the ExtensionArray
abstraction layer.

This PR introduces:

s.ak.groupby()

which:

  • Preserves the pandas accessor pattern
  • Avoids NumPy materialization
  • Returns a proper arkouda.pandas.groupbyclass.GroupBy object
  • Maintains zero-copy behavior


Implementation Details

Changes

arkouda/pandas/extension/_series_accessor.py

  • Import GroupBy from arkouda.pandas.groupbyclass
  • Add groupby(self) -> GroupBy method to ArkoudaSeriesAccessor
  • Validate that the Series is Arkouda-backed
  • Safely extract the underlying Arkouda array (_data)
  • Return GroupBy(akcol)

The method raises:

  • TypeError if the Series is not Arkouda-backed
  • TypeError if the underlying _data attribute is missing

Both checks fail fast, so misuse surfaces as an explicit TypeError at the call site.
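
The control flow described above can be sketched roughly as follows. This is not the merged code: StubGroupBy, StubArkoudaArray, StubSeries, and SeriesAccessorSketch are hypothetical stand-ins invented here so the example runs without pandas or an Arkouda server, and the error messages are made up. Only the two TypeError checks and the final GroupBy(akcol) call mirror the description.

```python
from collections import Counter


class StubGroupBy:
    """Hypothetical stand-in for arkouda.pandas.groupbyclass.GroupBy."""

    def __init__(self, keys):
        self._keys = list(keys)

    def size(self):
        # Return (unique keys, counts); this stub sorts the keys.
        counts = Counter(self._keys)
        unique = sorted(counts)
        return unique, [counts[k] for k in unique]


class StubArkoudaArray:
    """Hypothetical stand-in for the Arkouda-backed ExtensionArray."""

    def __init__(self, data):
        self._data = data


class StubSeries:
    """Hypothetical stand-in for pd.Series; exposes only .array."""

    def __init__(self, array):
        self.array = array


class SeriesAccessorSketch:
    """Sketch of the accessor; the real class is ArkoudaSeriesAccessor."""

    def __init__(self, series):
        self._obj = series

    def groupby(self):
        arr = self._obj.array
        # Check 1: the Series must be Arkouda-backed.
        if not isinstance(arr, StubArkoudaArray):
            raise TypeError("Series is not Arkouda-backed")
        # Check 2: the underlying column must be present.
        akcol = getattr(arr, "_data", None)
        if akcol is None:
            raise TypeError("Arkouda-backed array has no _data")
        return StubGroupBy(akcol)


s = StubSeries(StubArkoudaArray([80, 443, 80]))
keys, counts = SeriesAccessorSketch(s).groupby().size()
```

Keeping the _data lookup inside the accessor is what lets callers stay on the public s.ak.groupby() surface.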


Tests

Added tests under:

tests/pandas/extension/series_accessor.py

Coverage includes:

  • ✔ Raises when Series is not Arkouda-backed
  • ✔ Returns ak.GroupBy
  • ✔ g.size() matches pandas value_counts().sort_index()
  • ✔ Raises when _data is unavailable

This validates both happy-path and failure cases.
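
As an illustration of the shape of these checks, here is a self-contained sketch using unittest. It does not reproduce the tests added in this PR: _StubArray and groupby_size are hypothetical stand-ins for the Arkouda-backed array and for s.ak.groupby().size(), and the expected counts are written by hand rather than taken from pandas. It covers the two failure cases and the size check; the return-type check is omitted because there is no real GroupBy here.

```python
import unittest
from collections import Counter


class _StubArray:
    """Hypothetical stand-in for the Arkouda-backed ExtensionArray."""

    def __init__(self, data):
        self._data = data


def groupby_size(array):
    """Hypothetical stand-in for s.ak.groupby().size()."""
    if not isinstance(array, _StubArray):
        raise TypeError("not Arkouda-backed")
    data = getattr(array, "_data", None)
    if data is None:
        raise TypeError("missing _data")
    counts = Counter(data)
    keys = sorted(counts)
    return keys, [counts[k] for k in keys]


class GroupbyAccessorSketchTest(unittest.TestCase):
    def test_raises_when_not_arkouda_backed(self):
        # A plain list plays the role of a NumPy-backed Series.
        with self.assertRaises(TypeError):
            groupby_size([80, 443, 80])

    def test_raises_when_data_unavailable(self):
        arr = _StubArray([80, 443, 80])
        del arr._data  # simulate a missing underlying column
        with self.assertRaises(TypeError):
            groupby_size(arr)

    def test_size_matches_reference_counts(self):
        # Hand-written equivalent of value_counts().sort_index().
        keys, counts = groupby_size(_StubArray([80, 443, 80]))
        self.assertEqual(keys, [80, 443])
        self.assertEqual(counts, [2, 1])
```

Saved to a file, the sketch can be run with python -m unittest.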


API Behavior

Example

>>> s = pd.Series([80, 443, 80]).ak.to_ak()
>>> g = s.ak.groupby()
>>> keys, counts = g.size()

Equivalent to:

ak.GroupBy(s.array._data).size()

But without leaking internal implementation details.


Design Notes

  • No materialization to NumPy occurs.
  • This maintains the zero-copy Arkouda-backed semantics.
  • Aligns with future plans to expand pandas-style groupby support.

Future Extensions (Optional)

This lays the groundwork for:

  • g.sum()
  • g.min() / g.max()
  • Returning pandas Series directly from groupby aggregations
  • Full DataFrame groupby parity

Closes #5406: Series accessor groupby

@ajpotts ajpotts marked this pull request as ready for review February 12, 2026 14:41

@1RyanK 1RyanK left a comment

Looks good!

@ajpotts ajpotts merged commit 294ae6b into Bears-R-Us:main Feb 12, 2026
19 checks passed