Skip to content

feat: add one-shot resample function for simplified resampling#78

Merged
phillip-wenig-frequenz merged 6 commits into
frequenz-floss:v0.x.xfrom
phillip-wenig-frequenz:resample-simple-function
Apr 21, 2026
Merged

feat: add one-shot resample function for simplified resampling#78
phillip-wenig-frequenz merged 6 commits into
frequenz-floss:v0.x.xfrom
phillip-wenig-frequenz:resample-simple-function

Conversation

@phillip-wenig-frequenz
Copy link
Copy Markdown
Contributor

@phillip-wenig-frequenz phillip-wenig-frequenz commented Feb 26, 2026

Summary

Add a new resample() convenience function available in both Rust and Python APIs, allowing users to resample data in a single call without needing to instantiate and manage a Resampler instance.

Rust API

use frequenz_resampling::{resample, ResamplingFunction, SimpleSample};
let result = resample(&data, TimeDelta::seconds(5), ResamplingFunction::Average, true);

Python API

from frequenz.resampling import resample, ResamplingFunction
result = resample(data, timedelta(seconds=5), ResamplingFunction.Average)

Changes

  • Added resample() function in Rust (src/lib.rs) with SimpleSample type
  • Added resample() Python binding (src/python.rs)
  • Made epoch_align() public to support the new functionality
  • Updated README with "One-Shot Resampling" and "Stateful Resampling" sections
  • Updated type stubs with Sequence input type for flexibility
  • Added comprehensive test coverage in both Rust and Python
  • Updated RELEASE_NOTES.md

depends on #75

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a one-shot resample() convenience API for both Rust and Python to resample a batch of timestamp/value pairs without instantiating a Resampler, and clarifies/fixes interval-boundary semantics so first_timestamp only affects output timestamp labeling.

Changes:

  • Added one-shot resample() functions to Rust (src/lib.rs) and Python bindings (src/python.rs), plus exports/stubs.
  • Standardized resampling interval semantics to [start, end) regardless of first_timestamp, updating docs and tests accordingly.
  • Updated README and release notes to document the new API and the semantics fix.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/lib.rs Adds Rust resample() convenience function and SimpleSample, re-exports epoch_align.
src/python.rs Adds Python resample() binding and exports it from the extension module.
src/resampler.rs Fixes boundary/grouping behavior to always use [start, end); makes epoch_align() public.
src/tests.rs Updates existing Rust tests and adds coverage for boundary/labeling semantics and one-shot resampling.
tests/test_resampler.py Updates Python tests for new semantics and adds one-shot resample() tests.
frequenz/resampling/_rust_backend.pyi Exposes resample() in stubs and widens input type to Sequence.
frequenz/resampling/__init__.py Re-exports resample() from the Python package.
README.md Documents one-shot vs stateful resampling for Rust and Python.
RELEASE_NOTES.md Notes the first_timestamp semantic fix and the new one-shot API.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/resampler.rs
Comment thread src/python.rs Outdated
@phillip-wenig-frequenz
Copy link
Copy Markdown
Contributor Author

Thanks for the review feedback.

Regarding the division-by-zero concern for sub-millisecond/zero intervals: this is a pre-existing issue in the codebase, not introduced by this PR. The epoch_align() function was already called internally by Resampler::new(), so users could already trigger the same panic through the existing public API.

The scope of this PR is adding the resample() convenience function with behavior consistent with the existing Resampler API. Input validation for intervals could be addressed in a follow-up PR if desired.

@phillip-wenig-frequenz phillip-wenig-frequenz marked this pull request as draft February 26, 2026 12:16
@phillip-wenig-frequenz phillip-wenig-frequenz marked this pull request as ready for review February 26, 2026 12:20
Copilot AI review requested due to automatic review settings February 26, 2026 12:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/lib.rs Outdated
Comment thread frequenz/resampling/_rust_backend.pyi Outdated
interval: timedelta,
method: ResamplingFunction,
*,
first_timestamp: bool = True,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider using the same terminology as in pandas, e.g. this would be label which can be "right" or "left" (see also my other comment).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

resampling_function: The resampling function.
max_age_in_intervals: The maximum age of a sample in intervals.
start: The start time of the resampling.
first_timestamp: Whether the resampled timestamp should be the first
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this change. There are two separate features here, the "closedness" definition of the intervals, and the label. I wonder if this parameter was intended to be for the interval definition, namely to distinguish:

  • left-closed: [t, t+1) (what we typically use in reporting API)
  • right-closed: (t, t+1] (what we typically use in the microgrid data pipeline in the SDK).

Resampled data cannot be converted from one interval definition to another.

For the other feature, the label, the data can easily be fixed afterwards by shifting the timestamps by the sampling period.

So to me it makes sense to have the original functionality and not change its meaning. Or if so, we better have two parameters.

See also closed and label parameters here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@phillip-wenig-frequenz phillip-wenig-frequenz force-pushed the resample-simple-function branch 3 times, most recently from 966ba31 to f9d418a Compare April 8, 2026 10:22
Copy link
Copy Markdown

@malteschaaf malteschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only two small comments, but also I can't say much about the rust code.

Comment thread frequenz/resampling/_rust_backend.pyi Outdated
max_age_in_intervals: int,
start: datetime,
first_timestamp: bool = True,
closed: Literal["left", "right"],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor thing, because I introduced a similar change for the SDK resampler and was adviced to use an enum instead of a literal here. Did you give this a thought? I know pandas also uses the Literal but it might be benficial here to be a bit more explicit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I liked the literal stuff more, but I get your point.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it

Comment thread frequenz/resampling/_rust_backend.pyi Outdated
Comment on lines +129 to +133
closed: Which interval edge is closed for sample membership. Use
`"left"` for `[start, end)` intervals or `"right"` for
`(start, end]` intervals.
label: Which interval edge to use for output timestamps. Use `"left"`
for the interval start or `"right"` for the interval end.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small wording suggestion to make this a bit clearer and more consistent with interval terminology to avoid mapping left/right to start/end:

Suggested change
closed: Which interval edge is closed for sample membership. Use
`"left"` for `[start, end)` intervals or `"right"` for
`(start, end]` intervals.
label: Which interval edge to use for output timestamps. Use `"left"`
for the interval start or `"right"` for the interval end.
closed: Which interval edge is closed for sample membership. Use `"left"` for
left-closed, right-open intervals `[start, end)` and `"right"` for
right-closed, left-open intervals `(start, end]`.
label: Which interval edge to use for output timestamps. Use `"left"` for the left bin edge and `"right"` for the right bin edge.

@phillip-wenig-frequenz phillip-wenig-frequenz force-pushed the resample-simple-function branch 2 times, most recently from eb5c08c to 89fffb9 Compare April 8, 2026 11:58
Copy link
Copy Markdown

@malteschaaf malteschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread tests/test_resampler.py Outdated

assert result == [
(start + 5 * step, 20.0),
(start + 10 * step, None),
Copy link
Copy Markdown

@cwasicki cwasicki Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be

        (start, 10.0),
        (start + 5 * step, 20),

Copy link
Copy Markdown

@malteschaaf malteschaaf Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean

        (start, 10.0),
        (start + 5 * step, 20),

because start*step doesn't work, I think.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, fixed. Thank you!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With closed=Closed.Right and label=Label.Right, the buckets are (0s, 5s] and (5s, 10s], labeled at 5s and 10s. That means:

  • the sample at start (0s) is excluded from the first bucket
  • the sample at start + 5 * step (5s) is included in the first bucket

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on whether your reference is the index or the data. If it's the data, the first bucket would be introduced as (-5s, 0s] to cover your data point.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. for pandas:

pd.DataFrame([10,20], index=pd.to_datetime(["1970-01-01T00:00:00Z", "1970-01-01T00:00:05Z"])).resample("5s", closed="right", label="right").sum()
                            0
1970-01-01 00:00:00+00:00  10
1970-01-01 00:00:05+00:00  20

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also see the new bucket when changing the label to left:

pd.DataFrame([10,20], index=pd.to_datetime(["1970-01-01T00:00:00Z", "1970-01-01T00:00:05Z"])).resample("5s", closed="right", label="left").sum()
                            0
1969-12-31 23:59:55+00:00  10
1970-01-01 00:00:00+00:00  20

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it to behave exactly like pandas if you like. What will be more helpful for our use cases?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer the pandas behavior since it focuses on keeping the data and not the index.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, done

Comment thread tests/test_resampler.py Outdated
index=pd.DatetimeIndex([timestamp for timestamp, _ in data]),
dtype="float64",
)
pandas_result = pandas_series.resample("5s", closed="left", label="right").mean()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can iterate over all combinations of closed/label options to validate consistency between both.

Comment thread tests/test_resampler.py

# First value in each interval is None
data: list[tuple[dt.datetime, float | None]] = [
(start + i * step, None if i in (0, 5) else float(i + 1)) for i in range(10)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the interval only contains None?

Copy link
Copy Markdown
Contributor Author

@phillip-wenig-frequenz phillip-wenig-frequenz Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete bucket of only None:

  • Average, Sum, Min, Max, First, Last, Coalesce -> None
  • Count -> 0

Partially None bucket:

  • Average, Sum, Min, Max ignore the Nones
  • First and Last do not ignore them; if the first/last sample is None, result can be None
  • Coalesce returns the first non-None
  • Count counts only non-None

Comment thread tests/test_resampler.py


def test_resample_function_with_none_values() -> None:
"""Test the resample function with None values."""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are NaNs handled the same as Nones?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete bucket of only NaN:

  • all aggregations except Count return NaN
  • Count -> counts them as present

Partially NaN bucket:

  • Average, Sum -> NaN
  • Min, Max -> effectively ignore NaN when real numbers are present
  • First, Last -> return NaN if the chosen edge sample is NaN
  • Coalesce -> returns NaN if it is the first non-None
  • Count -> counts NaN as present

Comment thread tests/test_resampler.py
# Interval [5, 10): values 7,8,9,10 → avg = 8.5
assert result[0] == (start, 3.5)
assert result[1] == (start + 5 * step, 8.5)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a test for inf too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete bucket of only inf:

  • all aggregations except Count return inf
  • Count -> counts them as present

Partially inf bucket:

  • Average, Sum -> inf
  • Min -> finite minimum wins if any smaller finite value exists
  • Max -> inf
  • First, Last -> return inf if the chosen edge sample is inf
  • Coalesce -> returns inf if it is the first non-None
  • Count -> counts inf as present

@phillip-wenig-frequenz
Copy link
Copy Markdown
Contributor Author

@cwasicki pandas treats both None and NaN as missing in numeric series, while our implementation only treats None as missing.

Should we adapt it to work exactly like pandas or do we accept this difference? I think our way is better, since None and NaN are two different concepts. None means, nothing arrived. NaN means, NaN arrived.

@phillip-wenig-frequenz
Copy link
Copy Markdown
Contributor Author

I added some more tests.

@cwasicki
Copy link
Copy Markdown

Should we adapt it to work exactly like pandas or do we accept this difference? I think our way is better, since None and NaN are two different concepts. None means, nothing arrived. NaN means, NaN arrived.

I haven't thought about this enough but your explanation makes sense to me. But I think this should be clearly explained in the docs, maybe even stating the different behavior in pandas.

cwasicki
cwasicki previously approved these changes Apr 10, 2026
Copy link
Copy Markdown

@cwasicki cwasicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@phillip-wenig-frequenz phillip-wenig-frequenz added this pull request to the merge queue Apr 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Apr 10, 2026
@phillip-wenig-frequenz phillip-wenig-frequenz added this pull request to the merge queue Apr 21, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Apr 21, 2026
…_timestamp

The `first_timestamp` parameter was incorrectly affecting both output timestamp
labeling and interval grouping semantics. When `first_timestamp=false`, intervals
used `(start, end]` instead of `[start, end)`, causing samples at exact interval
boundaries to be excluded.

This fix makes interval grouping consistently use `[start, end)` regardless of
`first_timestamp` value. The parameter now only controls the output timestamp
labeling as documented: `true` labels with the interval start, `false` with the
interval end.

Changes include:
- Simplified boundary checking functions to remove first_timestamp logic
- Updated documentation to clarify the parameter's sole purpose
- Fixed test data to start at t=0 to match README example
- Added explicit test for first_timestamp=false behavior
- Updated RELEASE_NOTES with bug fix details

Signed-off-by: Phillip Wenig <phillip.wenig@frequenz.com>
Add tests to strengthen coverage of the first_timestamp fix:

Rust tests (src/tests.rs):
- test_first_timestamp_same_values: Verifies both settings produce
  identical aggregated values, differing only in output timestamps
- test_sample_at_interval_boundary: Confirms sample at t=5 goes to
  interval [5, 10), validating [start, end) semantics
- test_data_starting_mid_interval: Tests correct behavior when data
  doesn't start at interval boundary

Python tests (tests/test_resampler.py):
- Updated all tests to start data at t=0 to match README example
- Fixed expected values to match correct [start, end) interval semantics

Signed-off-by: Phillip Wenig <phillip.wenig@frequenz.com>
Add a new `resample()` convenience function available in both Rust and
Python APIs, allowing users to resample data in a single call without
needing to instantiate and manage a `Resampler` instance.

The Rust implementation includes:
- A new `SimpleSample` type for simple timestamp/value pairs
- The public `resample()` function that internally uses `Resampler`
- Made `epoch_align()` public to support the new functionality

The Python implementation provides the same functionality through PyO3
bindings with matching semantics.

Also updated:
- README with "One-Shot Resampling" and "Stateful Resampling" sections
- Type stubs with function signature using Sequence for flexibility
- Comprehensive test coverage in both Rust and Python
- RELEASE_NOTES.md

Signed-off-by: Phillip Wenig <phillip.wenig@frequenz.com>
… label parameters

Replace the single boolean `first_timestamp` parameter with two explicit
enums in both `Resampler::new()` and `resample()` functions:

- `Closed`: Controls which edge of the interval is closed for sample
  membership. Use `Closed::Left` for `[start, end)` intervals or
  `Closed::Right` for `(start, end]` intervals.
- `Label`: Controls which edge of the interval is used for output
  timestamps. Use `Label::Left` for the interval start or `Label::Right`
  for the interval end.

This clarifies the API by making the distinction between interval
membership semantics (closed) and output timestamp labeling (label)
explicit, rather than bundling them into a single boolean.

BREAKING CHANGE: The `first_timestamp` parameter is removed from both
Rust and Python APIs. Replace calls using `first_timestamp=true` with
`Closed::Left, Label::Left` (Rust) or `closed="left", label="left"`
(Python). Replace `first_timestamp=false` with `Closed::Left, Label::Right`
(Rust) or `closed="left", label="right"` (Python).

Signed-off-by: Phillip Wenig <phillip.wenig@frequenz.com>
Signed-off-by: Phillip Wenig <phillip.wenig@frequenz.com>
@phillip-wenig-frequenz phillip-wenig-frequenz added this pull request to the merge queue Apr 21, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Apr 21, 2026
Signed-off-by: Phillip Wenig <phillip.wenig@frequenz.com>
@phillip-wenig-frequenz phillip-wenig-frequenz added this pull request to the merge queue Apr 21, 2026
Merged via the queue into frequenz-floss:v0.x.x with commit b0f0a01 Apr 21, 2026
7 checks passed
@phillip-wenig-frequenz phillip-wenig-frequenz deleted the resample-simple-function branch April 21, 2026 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants