@valeriupredoi
Collaborator

@valeriupredoi valeriupredoi commented Dec 16, 2025

A bit of a lengthy one, but in a nutshell:

  • A statistic such as mean is no longer a property but a method: active.mean[...] becomes active.mean()[...]. This lets us pass args and kwargs, so you can now write active.mean(axis=(0, 1))[...]
  • Add plenty of tests for Reductionist's new axis argument, which currently doesn't work as expected (see below)
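
To illustrate the property-to-method change, here is a minimal sketch. `ToyActive` is a hypothetical stand-in written for this example, not the real Active class; it works on a 2-D nested list and only accepts a slice as the index.

```python
class ToyActive:
    """Toy stand-in (NOT the real Active class) wrapping a 2-D nested list."""

    def __init__(self, rows):
        self._rows = rows
        self._axis = None

    def min(self, axis=None):
        # As a method, the statistic can accept arguments such as axis;
        # a property could not.
        self._axis = axis
        return self  # the returned object is still indexable

    def __getitem__(self, index):
        rows = self._rows[index]  # index is expected to be a slice here
        if self._axis is None:
            return min(min(row) for row in rows)      # reduce everything
        if self._axis == 0:
            return [min(col) for col in zip(*rows)]   # one value per column
        if self._axis == 1:
            return [min(row) for row in rows]         # one value per row
        raise ValueError(f"unsupported axis for this toy: {self._axis!r}")


active = ToyActive([[3, 1, 2], [0, 5, 4]])
overall = active.min()[:]            # no axis: single minimum
per_column = active.min(axis=0)[:]   # reduce over rows
per_row = active.min(axis=1)[:]      # reduce over columns
```

The call-then-index order mirrors active.mean(axis=(0, 1))[...]: the method records the arguments, and the indexing step triggers the computation.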

Main test case for Reductionist with axis

https://github.com/NCAS-CMS/PyActiveStorage/blob/axis_api/tests/test_real_s3_with_axes.py

  • Active loads a 4dim dataset
  • Loaded dataset <HDF5 dataset "m01s30i111": shape (120, 85, 324, 432), type "float32">
  • default axis arg (when axis=None): 'axis': (0, 1, 2, 3)

Test Case 1

def test_no_axis_2():
    """
    Fails, but it should pass: the default 'axis': (0, 1, 2, 3)
    is fine!

    activestorage.reductionist.ReductionistError: Reductionist error: HTTP 400: {"error": {"message": "request data is not valid", "caused_by": ["__all__: Validation error: Number of reduction axes must be less than length of shape - to reduce over all axes omit the axis field completely [{}]"]}}
    """
    active = build_active()
    result = active.min(axis=())[:]
    assert result == [[[[164.8125]]]]
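
The 400 message above says the axis field has to be omitted when reducing over all axes. A minimal sketch of how a client could honour that, assuming a hypothetical helper `axis_request_field` (not part of PyActiveStorage) that returns the fragment to merge into the request body. What axis=() should mean is deliberately left undecided here.

```python
def axis_request_field(axis, ndim):
    """Return the dict fragment to merge into a reduction request.

    Hypothetical helper for illustration only. An empty dict means
    "omit the axis field", which the server reads as "reduce over all axes".
    """
    if axis is None:
        return {}
    axes = tuple(sorted(set(axis)))
    if any(a < 0 or a >= ndim for a in axes):
        raise ValueError(f"axis {axis!r} out of range for {ndim} dimensions")
    if len(axes) == ndim:
        # Listing every axis triggers the validation error quoted above,
        # so drop the field instead.
        return {}
    return {"axis": list(axes)}


default_field = axis_request_field(None, 4)
all_axes_field = axis_request_field((0, 1, 2, 3), 4)
partial_field = axis_request_field((0, 1), 4)
```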

Test Case 2

def test_axis_0_1():
    """Fails: activestorage.reductionist.ReductionistError: Reductionist error: HTTP 502: -"""
    active = build_active()
    result = active.min(axis=(0, 1))[:]
    assert result == [[[[164.8125]]]]

Test Case 3

def test_axis_0_1_2():
    """Passes fine."""
    active = build_active()
    result = active.min(axis=(0, 1, 2))[:]
    assert result[0][0][0][0] == 171.05126953125

These failures can be seen here: https://github.com/NCAS-CMS/PyActiveStorage/actions/runs/20272446127/job/58211728980?pr=300

@valeriupredoi added the "enhancement" (New feature or request) label Dec 16, 2025
@valeriupredoi
Collaborator Author

valeriupredoi commented Jan 21, 2026

@maxstack many thanks for looking into this! I think I found the issue: in the current Reductionist, the response is a dict that has a "bytes" key, e.g.:

Reduction result:  {'byte-order': 'little', 'bytes': [112, 4, 46, 67], 'count': [11897280], 'dtype': 'float32', 'shape': []}
Reduction result size:  184

but that value comes in as raw bytes and needs to be decoded at the endpoint by the client. This explains a few things:

  • the unit test failure
  • the 503 and 504 errors we see when attempting to run with axis
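
For reference, the four integers in the scalar result printed above can be decoded on the client with the standard struct module. This is a minimal sketch, assuming the list holds the raw bytes of a single float32, as the 'dtype' and 'byte-order' fields suggest:

```python
import struct

# Payload copied from the "Reduction result" printed above
result = {'byte-order': 'little', 'bytes': [112, 4, 46, 67],
          'count': [11897280], 'dtype': 'float32', 'shape': []}

# "<" selects little-endian and ">" big-endian; "f" is a 4-byte float32
prefix = "<" if result["byte-order"] == "little" else ">"
(value,) = struct.unpack(prefix + "f", bytes(result["bytes"]))
print(value)  # roughly 174.02
```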

You can see the 503 from the failed test:

tests/test_real_s3_with_axes.py:78: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
activestorage/active.py:309: in __getitem__
    return self._get_selection(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
activestorage/active.py:440: in _get_selection
    return self._from_storage(ds, indexer, array._chunks, out_shape, dtype,
activestorage/active.py:530: in _from_storage
    result, count, out_selection = future.result()
                                   ^^^^^^^^^^^^^^^
../../../miniconda3/envs/activestorage/lib/python3.14/concurrent/futures/_base.py:443: in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
../../../miniconda3/envs/activestorage/lib/python3.14/concurrent/futures/_base.py:395: in __get_result
    raise self._exception
../../../miniconda3/envs/activestorage/lib/python3.14/concurrent/futures/thread.py:86: in run
    result = ctx.run(self.task)
             ^^^^^^^^^^^^^^^^^^
../../../miniconda3/envs/activestorage/lib/python3.14/concurrent/futures/thread.py:73: in run
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
activestorage/active.py:691: in _process_chunk
    tmp, count = reductionist.reduce_chunk(
activestorage/reductionist.py:101: in reduce_chunk
    decode_and_raise_error(response)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

response = <Response [503]>

    def decode_and_raise_error(response):
        """Decode an error response and raise ReductionistError."""
        try:
            error = json.dumps(response.json())
            raise ReductionistError(response.status_code, error)
        except requests.exceptions.JSONDecodeError as exc:
>           raise ReductionistError(response.status_code, "-") from exc
E           activestorage.reductionist.ReductionistError: Reductionist error: HTTP 503: -

activestorage/reductionist.py:273: ReductionistError

-> that's a JSON decoder error (which also, incidentally, completely destroyed the memory on my local machine). We should not have Reductionist return raw bytes; we need actual data that cannot risk being corrupted or leaving the client unable to decode and use it. Is this doable? Cheers 🍺
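
In the traceback, the 503 body could not be parsed as JSON, so the error message collapses to "-". The sketch below shows one way to keep part of a non-JSON body in the message. `FakeResponse` and `describe_error` are hypothetical names written for this example; `FakeResponse` only mimics the `status_code`, `text`, and `json()` members of a requests response.

```python
import json


class FakeResponse:
    """Minimal stand-in for an HTTP response (illustration only)."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)


def describe_error(response, limit=80):
    """Return the JSON error if there is one, else a snippet of the raw body."""
    try:
        return json.dumps(response.json())
    except ValueError:  # the stdlib json.JSONDecodeError subclasses ValueError
        body = response.text.strip()
        return body[:limit] if body else "-"


json_error = describe_error(FakeResponse(400, '{"error": "bad"}'))
html_error = describe_error(FakeResponse(503, "<html>Service Unavailable</html>"))
empty_error = describe_error(FakeResponse(503, ""))
```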

@valeriupredoi
Collaborator Author

@maxstack I added a test that runs with a small file; run it like so:

pytest tests/test_real_s3_with_axes.py::test_small_file_axis_0_1

The test uses a small file with data of shape 15x143x144 and looks like this:

def test_small_file_axis_0_1():
    """Fails: activestorage.reductionist.ReductionistError: Reductionist error: HTTP 502: -"""
    active = build_active_small_file()
    result = active.min(axis=(0, 1))[:]
    print("Reductionist final result", result)
    assert result == [[[[164.8125]]]]

the result that Reductionist returns looks like:

Reductionist final result [[[206.4091796875 207.45655822753906 208.03439331054688
   207.1454620361328 205.37808227539062 203.4414520263672
   203.34115600585938 203.58642578125 203.74832153320312
   203.76719665527344 203.86878967285156 203.83860778808594
   203.16082763671875 203.02694702148438 202.32962036132812
   202.00950622558594 201.763671875 200.8488311767578 199.77630615234375
   198.6121826171875 199.1998291015625 199.6461944580078
   198.9845733642578 198.48779296875 198.35638427734375
   198.1031494140625 199.5530242919922 199.96978759765625
   201.20559692382812 200.11032104492188 200.13302612304688
   199.75437927246094 199.53599548339844 198.2672882080078
   199.26861572265625 197.69595336914062 199.5832061767578
   199.6208038330078 200.3925018310547 201.16285705566406 201.7607421875
   201.32652282714844 201.3401641845703 202.55416870117188
   203.03689575195312 202.56088256835938 201.77401733398438
   201.39480590820312 201.0892333984375 201.91769409179688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688 208.03439331054688
   208.03439331054688 208.03439331054688]]]

So it looks like Reductionist is not applying a final statistic. No idea why there are 101 elements in that array, though. At any rate, the full payload sent by Reductionist looks like this:

Reduction result:  {'byte-order': 'little', 'bytes': [192, 104, 78, 67, 225, 116, 79, 67, 206, 8, 80, 67, 61, 37, 79, 67, 202, 96, 77, 67, 3, 113, 75, 67, 86, 87, 75, 67, 32, 150, 75, 67, 146, 191, 75, 67, 103, 196, 75, 67, 105, 222, 75, 67, 175, 214, 75, 67, 44, 41, 75, 67, 230, 6, 75, 67, 98, 84, 74, 67, 111, 2, 74, 67, 128, 195, 73, 67, 77, 217, 72, 67, 188, 198, 71, 67, 184, 156, 70, 67, 40, 51, 71, 67, 109, 165, 71, 67, 13, 252, 70, 67, 224, 124, 70, 67, 60, 91, 70, 67, 104, 26, 70, 67, 147, 141, 71, 67, 68, 248, 71, 67, 162, 52, 73, 67, 62, 28, 72, 67, 14, 34, 72, 67, 31, 193, 71, 67, 55, 137, 71, 67, 109, 68, 70, 67, 196, 68, 71, 67, 42, 178, 69, 67, 77, 149, 71, 67, 237, 158, 71, 67, 123, 100, 72, 67, 177, 41, 73, 67, 192, 194, 73, 67, 151, 83, 73, 67, 21, 87, 73, 67, 222, 141, 74, 67, 114, 9, 75, 67, 150, 143, 74, 67, 38, 198, 73, 67, 18, 101, 73, 67, 216, 22, 73, 67, 238, 234, 73, 67, 127, 123, 75, 67, 164, 232, 76, 67, 121, 135, 76, 67, 90, 193, 76, 67, 123, 66, 76, 67, 184, 207, 76, 67, 91, 133, 76, 67, 102, 9, 77, 67, 77, 207, 76, 67, 255, 254, 76, 67, 99, 74, 77, 67, 19, 220, 77, 67, 19, 220, 76, 67, 90, 229, 77, 67, 12, 5, 77, 67, 244, 110, 78, 67, 206, 160, 76, 67, 92, 164, 77, 67, 81, 44, 77, 67, 167, 73, 78, 67, 173, 184, 78, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 
67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67, 206, 8, 80, 67], 'count': [2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145, 2145], 'dtype': 'float32', 'shape': [144]}

The bytes list is 576 long, so that's a bit weird too. Anyway, at least this test runs 🍺
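
One way to read the 576: a float32 takes 4 bytes, so 576 bytes hold 144 values, which matches 'shape': [144]. The sketch below decodes such a payload and checks that consistency. `decode_payload` and `ITEM_FORMATS` are hypothetical names, and the sample payload is truncated to the first 12 bytes of the real one for brevity.

```python
import struct
from math import prod

# struct format codes for a few dtypes (assumed mapping, extend as needed)
ITEM_FORMATS = {"float32": "f", "float64": "d", "int32": "i", "int64": "q"}


def decode_payload(payload):
    """Decode the 'bytes' list into Python numbers, checking it fits 'shape'."""
    code = ITEM_FORMATS[payload["dtype"]]
    prefix = "<" if payload["byte-order"] == "little" else ">"
    raw = bytes(payload["bytes"])
    n_items = prod(payload["shape"])  # prod([]) is 1, i.e. a scalar
    expected = n_items * struct.calcsize(prefix + code)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return list(struct.unpack(f"{prefix}{n_items}{code}", raw))


# First three float32 values of the payload above (truncated for brevity)
sample = {'byte-order': 'little', 'dtype': 'float32', 'shape': [3],
          'bytes': [192, 104, 78, 67, 225, 116, 79, 67, 206, 8, 80, 67]}
decoded = decode_payload(sample)
n_values_in_576_bytes = 576 // struct.calcsize("<f")
```

The first decoded value agrees with the first entry of the printed result (206.4091796875).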

@valeriupredoi
Collaborator Author

The test above, run locally with bog-standard NumPy:

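import pyfive
import numpy as np

# Open the "tas" variable from the local copy of the test file
ds = pyfive.File("/home/valeriu/CMIP6-test.nc")["tas"]
# Reduce over the first two axes, leaving one minimum per index of the last axis
minarr = np.min(ds[:], axis=(0, 1))
print(len(minarr))  # 144
print(min(minarr))  # 197.69595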
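
To make the shape of that result concrete: reducing over axes (0, 1) keeps the last axis, which is why 144 values come back rather than one. Here is a plain-Python stand-in for np.min(..., axis=(0, 1)) on a tiny made-up array of shape (2, 2, 3); the function name and data are hypothetical.

```python
def min_over_first_two_axes(data):
    """Minimum over axes 0 and 1 of a 3-D nested list; keeps the last axis."""
    n_last = len(data[0][0])
    return [
        min(row[k] for plane in data for row in plane)
        for k in range(n_last)
    ]


# Made-up data with shape (2, 2, 3)
data = [[[5, 9, 2], [7, 1, 8]],
        [[4, 6, 3], [0, 10, 11]]]
reduced = min_over_first_two_axes(data)  # one value per index of the last axis
overall_min = min(reduced)               # reducing again gives the global minimum
```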

@valeriupredoi
Collaborator Author

@maxstack good news: dunno how I counted the elements above, but there are indeed 144, and the results check out against NumPy! I added this test in dc41f7b and the validation stacks up!
