add support for sintela protobuf format by d-chambers · Pull Request #656 · DASDAE/dascore

d-chambers · 2026-04-08T15:33:46Z

Description

This PR adds read and scan support for Sintela's MTLV-wrapped protobuf .pb recordings. It supports format detection without importing protobuf, promotes selected recorder metadata, and decodes supported timeseries, band, and FFT packet families into DASCore patches with validated coordinates.

No linked issue.

Validation

Local validation on the rebased branch:

pre-commit run --all-files
pytest tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py -q
pytest tests/test_io/test_common_io.py tests/test_io/test_io_core.py -q
pytest tests --cov dascore --cov-report term-missing --cov-fail-under=100

CI status before the latest review-fix commit was green for lint, full test matrix, min-deps matrix, and Codecov; benchmark/docs jobs were skipped by workflow policy.

Checklist

I have (if applicable):

referenced the GitHub issue this PR closes.
documented the new feature with docstrings and/or appropriate doc page.
included tests. See testing guidelines.
added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

New Features
- Added support for Sintela protobuf (.pb) recordings with automatic format detection, metadata promotion, and correct time/distance/band/frequency coordinates for timeseries, band, and FFT data.
Chores
- Made the Sintela protobuf reader available via package extras and registry so test artifacts can be downloaded.
Tests
- Added extensive unit and end-to-end tests covering detection, scan, read, selectors, validation/error cases, remote/in-memory flows, and timeouts.

coderabbitai · 2026-04-08T15:34:02Z

Warning

Rate limit exceeded

@d-chambers has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 2 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e4db64c7-0e9a-43ed-83a3-89d962bd1fbf

📥 Commits

Reviewing files that changed from the base of the PR and between 2425f39 and 3c5b381.

📒 Files selected for processing (3)

dascore/io/sintela_protobuf/utils.py
tests/test_io/_common_io_test_utils.py
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py

📝 Walkthrough

Walkthrough

Added a Sintela Protobuf IO v1 (MTLV envelope + protobuf parsing), registered it as a FiberIO entry point, added test data, and introduced comprehensive tests plus related test-fixture and timeout adjustments.

Changes

Cohort / File(s)	Summary
Data registry `dascore/data_registry.txt`	Added `sintela_protobuf_1.pb` test artifact entry with content hash and remote URL.
Sintela Protobuf package `dascore/io/sintela_protobuf/__init__.py`, `dascore/io/sintela_protobuf/core.py`	New package and core FiberIO `SintelaProtobufV1` implementing `get_format`, `scan`, and `read`, re-exported from package init.
Sintela Protobuf utils `dascore/io/sintela_protobuf/utils.py`	New large utility module: MTLV envelope parsing, runtime protobuf message construction (cached), family-specific validation/parsing for TS/BAND/FFT, dataclasses for metadata/attrs, and entry points `get_supported_family_tag`, `scan_payload`, `read_payload`.
Project config / entry points `pyproject.toml`	Added `protobuf` to extras/tests and registered `SINTELA_PROTOBUF__V1 = "dascore.io.sintela_protobuf.core:SintelaProtobufV1"` under `dascore.fiber_io`.
Common IO test matrix `tests/test_io/test_common_io.py`	Added `SintelaProtobufV1` to `COMMON_IO_READ_TESTS` with fixture `sintela_protobuf_1.pb`.
New Sintela tests `tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py`	Added extensive unit and end-to-end tests covering format detection, scan/read parity, TS/BAND/FFT decoding, metadata promotion, selector behavior, validation/error cases, and optional-dependency handling.
Remote memory tests `tests/test_io/test_remote_memory.py`	Added `memory_fetch_copy` fixture to fetch and copy test artifacts into `memory://` and refactored two tests to use it.
Remote common IO tests `tests/test_io/test_remote_common_io.py`	Introduced `REMOTE_COMMON_IO_READ_TESTS` (excluding Sintela_Protobuf) and updated remote-derived test case lists to use the filtered mapping.
Test utilities `tests/test_io/_common_io_test_utils.py`	Expanded timeout detection to treat certain `pytest.fail.Exception` messages as timeouts and updated `skip_on_timeout` to catch/reclassify `BaseException`.
HTTP remote test timeout `tests/test_io/test_remote_http.py`	Added `@pytest.mark.timeout(0)` to `test_http_range_hdf5_read_succeeds` to override the test timeout.

Possibly related PRs

add remote HTTP regression tests #652 — Modifies the same HTTP remote test (tests/test_io/test_remote_http.py::test_http_range_hdf5_read_succeeds); changes are directly related at the test level.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description does not follow the required template structure. Critical sections are missing or incomplete.	Add a clear 'Description' section explaining the problem/feature, reference any related GitHub issues using '#' notation, and ensure all applicable checklist items are properly addressed with accurate checkmarks reflecting actual work completed.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'add support for sintela protobuf format' directly and clearly summarizes the main objective of the PR: introducing Sintela protobuf format support to DASCore.
Docstring Coverage	✅ Passed	Docstring coverage is 96.12% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch sintela_proto

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 9 minutes and 2 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 44bc09c41d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-08T15:38:54Z

+    from google.protobuf import descriptor_pb2, descriptor_pool, message_factory
+    from google.protobuf import timestamp_pb2


Guard Sintela protobuf tests behind optional dependency

The new Sintela protobuf test helpers import google.protobuf directly, but this dependency is not guaranteed in the project’s test environments (for example, the min-deps workflow installs .[test] and then runs the full tests suite). In environments without protobuf, invoking these helpers raises ModuleNotFoundError and fails the entire test run instead of skipping optional-format coverage, so this change can break CI for unrelated work.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-08T15:38:54Z

+            int(getattr(acquisition, "fiber_id"))
+            if acquisition is not None and getattr(acquisition, "fiber_id", None) is not None


Preserve missing fiber_id instead of coercing to zero

This presence check treats any existing acquisition_stats message as having fiber_id, because unset protobuf scalar fields still read back as 0 rather than None. As a result, files where acquisition_stats is present but fiber_id is unset will be reported with fiber_id=0, silently corrupting metadata in scan/read outputs.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 93-133: The parser currently buffers all records because
_iter_envelope_records returns a list; change _iter_envelope_records to be a
generator (yield EnvelopeRecord) that reads and yields each EnvelopeRecord as it
is parsed (preserve the same header/size/error handling and EnvelopeRecord
construction), update its return/type hint accordingly, and modify
get_supported_family_tag to iterate over the generator (not collect into a list)
so it can return the first supported tag immediately (ensure it still skips
META_TAG and checks TS_TAGS|BAND_TAGS|FFT_TAGS).
- Around line 454-459: The ParseFromString call in _parse_meta currently runs
inside suppress_warnings() but can raise google.protobuf.message.DecodeError;
catch that exception and re-raise it as InvalidFiberFileError with a descriptive
message. Locate _parse_meta(), wrap msg.ParseFromString(payload) in a try/except
that catches DecodeError (from google.protobuf.message) and raises
InvalidFiberFileError (preserving or including the original exception message).
Apply the same pattern to the other ParseFromString call referenced in this
module so both protobuf parses consistently convert DecodeError into
InvalidFiberFileError.

In `@pyproject.toml`:
- Line 147: The test extras are missing the protobuf runtime so imports like
google.protobuf used by the SINTELA_PROTOBUF__V1 entry point will fail; update
pyproject.toml to add the protobuf package to the .[test]
(project.optional-dependencies test) list so tests install it, e.g. add
"protobuf" to the test extras alongside other test deps, ensuring the
SINTELA_PROTOBUF__V1 entry point can import google.protobuf during test runs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9c1c2f0e-2b7f-4f6e-be63-f2bcae0ded19

📥 Commits

Reviewing files that changed from the base of the PR and between da295d3 and 44bc09c.

📒 Files selected for processing (8)

dascore/data_registry.txt
dascore/io/sintela_protobuf/__init__.py
dascore/io/sintela_protobuf/core.py
dascore/io/sintela_protobuf/utils.py
pyproject.toml
tests/test_io/test_common_io.py
tests/test_io/test_remote_memory.py
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py

codecov · 2026-04-09T08:38:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (906e90b) to head (3c5b381).

Additional details and impacted files

@@            Coverage Diff             @@
##               dev      #656    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files          140       143     +3     
  Lines        12890     13420   +530     
==========================================
+ Hits         12890     13420   +530

Flag	Coverage Δ
unittests	`100.00% <100.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/test_io/_common_io_test_utils.py (1)
82-85: Consider documenting the broad BaseException catch.

Catching BaseException is intentionally broad to intercept any timeout-related exception, including those from pytest-timeout. While the logic correctly re-raises non-timeout exceptions, a brief inline comment explaining why BaseException (rather than Exception) is caught would help future maintainers understand this is deliberate.
💡 Suggested comment
         try:
             yield
-        except BaseException as exc:
+        except BaseException as exc:  # Broad catch to handle pytest-timeout's exceptions
             if not _is_timeout_error(exc):
                 raise
             pytest.skip(str(exc))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_io/_common_io_test_utils.py` around lines 82 - 85, Add a brief
inline comment above the "except BaseException as exc:" block in
tests/test_io/_common_io_test_utils.py explaining that BaseException is
intentionally used (not Exception) to catch timeout-related exceptions raised by
pytest-timeout or similar frameworks, while still re-raising non-timeout errors
via the existing _is_timeout_error(exc) check; reference the existing
_is_timeout_error function in the comment so future maintainers understand the
deliberate design and that non-timeouts are re-raised.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 606-609: In _get_time_coord_from_samples, validate the sample_rate
before computing step: check that sample_rate is finite (not NaN/Inf), positive,
and non-zero; if it fails, raise InvalidFiberFileError with a clear message;
otherwise compute step = dc.to_timedelta64(1 / sample_rate) and call
get_coord(start=start, stop=start + step * size, step=step) as before. Ensure
you reference the function _get_time_coord_from_samples and helpers
dc.to_timedelta64 and get_coord in the change.
- Around line 557-560: The current presence check for the optional protobuf
INT32 field uses getattr(acquisition, "fiber_id", None) which treats unset
fields as 0; replace that check with acquisition is not None and
acquisition.HasField("fiber_id") and then cast the value (e.g.,
int(acquisition.fiber_id)) to produce the same result or None otherwise. Update
the expression that sets the fiber_id to use acquisition.HasField("fiber_id")
(matching how identification and metadata_recording_time are checked) and ensure
the final value remains int(...) when present or None when absent.

---

Nitpick comments:
In `@tests/test_io/_common_io_test_utils.py`:
- Around line 82-85: Add a brief inline comment above the "except BaseException
as exc:" block in tests/test_io/_common_io_test_utils.py explaining that
BaseException is intentionally used (not Exception) to catch timeout-related
exceptions raised by pytest-timeout or similar frameworks, while still
re-raising non-timeout errors via the existing _is_timeout_error(exc) check;
reference the existing _is_timeout_error function in the comment so future
maintainers understand the deliberate design and that non-timeouts are
re-raised.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1e73a58b-ec72-402b-8f72-4d55159187fd

📥 Commits

Reviewing files that changed from the base of the PR and between 44bc09c and c6ea016.

📒 Files selected for processing (8)

dascore/io/sintela_protobuf/core.py
dascore/io/sintela_protobuf/utils.py
pyproject.toml
tests/test_io/_common_io_test_utils.py
tests/test_io/test_remote_common_io.py
tests/test_io/test_remote_http.py
tests/test_io/test_remote_memory.py
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py

🚧 Files skipped from review as they are similar to previous changes (4)

pyproject.toml
tests/test_io/test_remote_memory.py
dascore/io/sintela_protobuf/core.py
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

tests/test_io/_common_io_test_utils.py (1)

82-87: Optional: extract duplicated timeout-handling block into a helper.

The two except BaseException blocks are identical; a small helper (e.g., _skip_if_timeout_else_raise(exc)) would reduce drift risk.

Also applies to: 101-104

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_io/_common_io_test_utils.py` around lines 82 - 87, There is
duplicated exception handling logic for timeout errors in two `except
BaseException` blocks around the timeout handling code. Extract this logic into
a helper function named like `_skip_if_timeout_else_raise(exc)` that takes the
caught exception object, checks if it is a timeout error using
`_is_timeout_error`, raises it if not, or calls `pytest.skip` with the exception
string if it is. Replace both original except blocks in the test utils
(including the one at 101-104) to call this new helper, reducing code
duplication and drift risk.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 907-919: The code uses bin_res as the step to build the frequency
axis (frequency = get_coord(...)) without validating it's a finite, positive
value; add the same validation you used for sample_rate by checking bin_res
(from _assert_float_equal / variable bin_res) is a finite number and > 0 before
calling get_coord, and raise InvalidFiberFileError with a clear message if the
check fails so malformed FFT headers fail predictably instead of inside
get_coord.
- Around line 613-621: The _get_distance_coord function must validate its
spacing and step inputs (spacing must be finite and > 0; step must be a positive
integer) before calling get_coord; if validation fails, raise
InvalidFiberFileError with a clear message including the offending spacing/step
values so format-specific errors surface (this affects callers building
timeseries/band/FFT coords). Implement checks at the top of _get_distance_coord
for spacing (use math.isfinite and >0) and for step (ensure int-like and >0),
and raise InvalidFiberFileError with context (e.g., "invalid channel_spacing X"
/ "invalid channel_step Y") instead of letting get_coord raise.
- Around line 866-876: The current code uses band_def[0] to set
attrs.extra['data_type'] and 'data_units', which mislabels packets that contain
multiple band semantics; instead inspect all entries in band_def (e.g., iterate
band_def and convert each entry's type with _BAND_DATA_TYPE_MAP) and derive a
consolidated value: if all bands map to the same data_type/data_units keep that
single value, otherwise set a clear "mixed" or aggregated value (e.g., a
tuple/list of types or "mixed" and empty units) before passing into _base_attrs
so attrs.data_type and attrs.data_units accurately reflect mixed-band packets.

In `@tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py`:
- Around line 766-769: The test constructs protobuf payloads
(_build_meta_payload and _build_ts_payloads) before the monkeypatch that
simulates missing protobuf, causing failures during file construction; change
the test_missing_protobuf_only_affects_scan_and_read to use a checked-in .pb
fixture (or copy a prebuilt ts.pb blob into path) instead of calling
_build_meta_payload/_build_ts_payloads before patching, or alternatively move
the calls to _build_meta_payload/_build_ts_payloads to after the monkeypatch is
applied; ensure the test then asserts get_format() still works while
scan()/read() raise under the monkeypatched missing-protobuf environment and
keep references to _write_records, get_format, scan, and read to locate the
changes.

---

Nitpick comments:
In `@tests/test_io/_common_io_test_utils.py`:
- Around line 82-87: There is duplicated exception handling logic for timeout
errors in two `except BaseException` blocks around the timeout handling code.
Extract this logic into a helper function named like
`_skip_if_timeout_else_raise(exc)` that takes the caught exception object,
checks if it is a timeout error using `_is_timeout_error`, raises it if not, or
calls `pytest.skip` with the exception string if it is. Replace both original
except blocks in the test utils (including the one at 101-104) to call this new
helper, reducing code duplication and drift risk.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a302fc7-fcc2-4e1d-978a-bf9eb5678b00

📥 Commits

Reviewing files that changed from the base of the PR and between c6ea016 and 0fb6d82.

📒 Files selected for processing (3)

dascore/io/sintela_protobuf/utils.py
tests/test_io/_common_io_test_utils.py
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py

coderabbitai Bot added the IO Work for reading/writing different formats label Apr 8, 2026

chatgpt-codex-connector Bot reviewed Apr 8, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

Comment thread dascore/io/sintela_protobuf/utils.py Outdated

Comment thread dascore/io/sintela_protobuf/utils.py Outdated

Comment thread pyproject.toml

coderabbitai Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread dascore/io/sintela_protobuf/utils.py Outdated

Comment thread dascore/io/sintela_protobuf/utils.py

coderabbitai Bot added the patch related to Patch class label Apr 9, 2026

coderabbitai Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread dascore/io/sintela_protobuf/utils.py

Comment thread dascore/io/sintela_protobuf/utils.py Outdated

Comment thread dascore/io/sintela_protobuf/utils.py

Comment thread tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py Outdated

d-chambers added 4 commits May 1, 2026 14:05

add support for sintela protobuf format

14f92f1

try fix failures

d425669

address review

558c433

Tolerate unknown Sintela protobuf records

2425f39

d-chambers force-pushed the sintela_proto branch from 0fb6d82 to 2425f39 Compare May 1, 2026 14:54

Address Sintela protobuf review feedback

3c5b381

		from google.protobuf import descriptor_pb2, descriptor_pool, message_factory
		from google.protobuf import timestamp_pb2

		int(getattr(acquisition, "fiber_id"))
		if acquisition is not None and getattr(acquisition, "fiber_id", None) is not None

Conversation

d-chambers commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Validation

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

d-chambers commented Apr 8, 2026 •

edited

Loading

coderabbitai Bot commented Apr 8, 2026 •

edited

Loading

codecov Bot commented Apr 9, 2026 •

edited

Loading