Skip to content

add support for sintela protobuf format#656

Open
d-chambers wants to merge 5 commits intodevfrom
sintela_proto
Open

add support for sintela protobuf format#656
d-chambers wants to merge 5 commits intodevfrom
sintela_proto

Conversation

@d-chambers
Copy link
Copy Markdown
Contributor

@d-chambers d-chambers commented Apr 8, 2026

Description

This PR adds read and scan support for Sintela's MTLV-wrapped protobuf .pb recordings. It supports format detection without importing protobuf, promotes selected recorder metadata, and decodes supported timeseries, band, and FFT packet families into DASCore patches with validated coordinates.

No linked issue.

Validation

Local validation on the rebased branch:

  • pre-commit run --all-files
  • pytest tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py -q
  • pytest tests/test_io/test_common_io.py tests/test_io/test_io_core.py -q
  • pytest tests --cov dascore --cov-report term-missing --cov-fail-under=100

CI status before the latest review-fix commit was green for lint, full test matrix, min-deps matrix, and Codecov; benchmark/docs jobs were skipped by workflow policy.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings and/or appropriate doc page.
  • included tests. See testing guidelines.
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • Added support for Sintela protobuf (.pb) recordings with automatic format detection, metadata promotion, and correct time/distance/band/frequency coordinates for timeseries, band, and FFT data.
  • Chores

    • Made the Sintela protobuf reader available via package extras and registry so test artifacts can be downloaded.
  • Tests

    • Added extensive unit and end-to-end tests covering detection, scan, read, selectors, validation/error cases, remote/in-memory flows, and timeouts.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 8, 2026

Warning

Rate limit exceeded

@d-chambers has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 2 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e4db64c7-0e9a-43ed-83a3-89d962bd1fbf

📥 Commits

Reviewing files that changed from the base of the PR and between 2425f39 and 3c5b381.

📒 Files selected for processing (3)
  • dascore/io/sintela_protobuf/utils.py
  • tests/test_io/_common_io_test_utils.py
  • tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py
📝 Walkthrough

Walkthrough

Added a Sintela Protobuf IO v1 (MTLV envelope + protobuf parsing), registered it as a FiberIO entry point, added test data, and introduced comprehensive tests plus related test-fixture and timeout adjustments.

Changes

Cohort / File(s) Summary
Data registry
dascore/data_registry.txt
Added sintela_protobuf_1.pb test artifact entry with content hash and remote URL.
Sintela Protobuf package
dascore/io/sintela_protobuf/__init__.py, dascore/io/sintela_protobuf/core.py
New package and core FiberIO SintelaProtobufV1 implementing get_format, scan, and read, re-exported from package init.
Sintela Protobuf utils
dascore/io/sintela_protobuf/utils.py
New large utility module: MTLV envelope parsing, runtime protobuf message construction (cached), family-specific validation/parsing for TS/BAND/FFT, dataclasses for metadata/attrs, and entry points get_supported_family_tag, scan_payload, read_payload.
Project config / entry points
pyproject.toml
Added protobuf to extras/tests and registered SINTELA_PROTOBUF__V1 = "dascore.io.sintela_protobuf.core:SintelaProtobufV1" under dascore.fiber_io.
Common IO test matrix
tests/test_io/test_common_io.py
Added SintelaProtobufV1 to COMMON_IO_READ_TESTS with fixture sintela_protobuf_1.pb.
New Sintela tests
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py
Added extensive unit and end-to-end tests covering format detection, scan/read parity, TS/BAND/FFT decoding, metadata promotion, selector behavior, validation/error cases, and optional-dependency handling.
Remote memory tests
tests/test_io/test_remote_memory.py
Added memory_fetch_copy fixture to fetch and copy test artifacts into memory:// and refactored two tests to use it.
Remote common IO tests
tests/test_io/test_remote_common_io.py
Introduced REMOTE_COMMON_IO_READ_TESTS (excluding Sintela_Protobuf) and updated remote-derived test case lists to use the filtered mapping.
Test utilities
tests/test_io/_common_io_test_utils.py
Expanded timeout detection to treat certain pytest.fail.Exception messages as timeouts and updated skip_on_timeout to catch/reclassify BaseException.
HTTP remote test timeout
tests/test_io/test_remote_http.py
Added @pytest.mark.timeout(0) to test_http_range_hdf5_read_succeeds to override the test timeout.

Possibly related PRs

  • add remote HTTP regression tests #652 — Modifies the same HTTP remote test (tests/test_io/test_remote_http.py::test_http_range_hdf5_read_succeeds); changes are directly related at the test level.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description does not follow the required template structure. Critical sections are missing or incomplete. Add a clear 'Description' section explaining the problem/feature, reference any related GitHub issues using '#' notation, and ensure all applicable checklist items are properly addressed with accurate checkmarks reflecting actual work completed.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'add support for sintela protobuf format' directly and clearly summarizes the main objective of the PR: introducing Sintela protobuf format support to DASCore.
Docstring Coverage ✅ Passed Docstring coverage is 96.12% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sintela_proto

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 9 minutes and 2 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the IO Work for reading/writing different formats label Apr 8, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 44bc09c41d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +68 to +69
from google.protobuf import descriptor_pb2, descriptor_pool, message_factory
from google.protobuf import timestamp_pb2
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard Sintela protobuf tests behind optional dependency

The new Sintela protobuf test helpers import google.protobuf directly, but this dependency is not guaranteed in the project’s test environments (for example, the min-deps workflow installs .[test] and then runs the full tests suite). In environments without protobuf, invoking these helpers raises ModuleNotFoundError and fails the entire test run instead of skipping optional-format coverage, so this change can break CI for unrelated work.

Useful? React with 👍 / 👎.

Comment thread dascore/io/sintela_protobuf/utils.py Outdated
Comment on lines +475 to +476
int(getattr(acquisition, "fiber_id"))
if acquisition is not None and getattr(acquisition, "fiber_id", None) is not None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve missing fiber_id instead of coercing to zero

This presence check treats any existing acquisition_stats message as having fiber_id, because unset protobuf scalar fields still read back as 0 rather than None. As a result, files where acquisition_stats is present but fiber_id is unset will be reported with fiber_id=0, silently corrupting metadata in scan/read outputs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 93-133: The parser currently buffers all records because
_iter_envelope_records returns a list; change _iter_envelope_records to be a
generator (yield EnvelopeRecord) that reads and yields each EnvelopeRecord as it
is parsed (preserve the same header/size/error handling and EnvelopeRecord
construction), update its return/type hint accordingly, and modify
get_supported_family_tag to iterate over the generator (not collect into a list)
so it can return the first supported tag immediately (ensure it still skips
META_TAG and checks TS_TAGS|BAND_TAGS|FFT_TAGS).
- Around line 454-459: The ParseFromString call in _parse_meta currently runs
inside suppress_warnings() but can raise google.protobuf.message.DecodeError;
catch that exception and re-raise it as InvalidFiberFileError with a descriptive
message. Locate _parse_meta(), wrap msg.ParseFromString(payload) in a try/except
that catches DecodeError (from google.protobuf.message) and raises
InvalidFiberFileError (preserving or including the original exception message).
Apply the same pattern to the other ParseFromString call referenced in this
module so both protobuf parses consistently convert DecodeError into
InvalidFiberFileError.

In `@pyproject.toml`:
- Line 147: The test extras are missing the protobuf runtime so imports like
google.protobuf used by the SINTELA_PROTOBUF__V1 entry point will fail; update
pyproject.toml to add the protobuf package to the .[test]
(project.optional-dependencies test) list so tests install it, e.g. add
"protobuf" to the test extras alongside other test deps, ensuring the
SINTELA_PROTOBUF__V1 entry point can import google.protobuf during test runs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9c1c2f0e-2b7f-4f6e-be63-f2bcae0ded19

📥 Commits

Reviewing files that changed from the base of the PR and between da295d3 and 44bc09c.

📒 Files selected for processing (8)
  • dascore/data_registry.txt
  • dascore/io/sintela_protobuf/__init__.py
  • dascore/io/sintela_protobuf/core.py
  • dascore/io/sintela_protobuf/utils.py
  • pyproject.toml
  • tests/test_io/test_common_io.py
  • tests/test_io/test_remote_memory.py
  • tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py

Comment thread dascore/io/sintela_protobuf/utils.py Outdated
Comment thread dascore/io/sintela_protobuf/utils.py Outdated
Comment thread pyproject.toml
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (906e90b) to head (3c5b381).

Additional details and impacted files
@@            Coverage Diff             @@
##               dev      #656    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files          140       143     +3     
  Lines        12890     13420   +530     
==========================================
+ Hits         12890     13420   +530     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_io/_common_io_test_utils.py (1)

82-85: Consider documenting the broad BaseException catch.

Catching BaseException is intentionally broad to intercept any timeout-related exception, including those from pytest-timeout. While the logic correctly re-raises non-timeout exceptions, a brief inline comment explaining why BaseException (rather than Exception) is caught would help future maintainers understand this is deliberate.

💡 Suggested comment
         try:
             yield
-        except BaseException as exc:
+        except BaseException as exc:  # Broad catch to handle pytest-timeout's exceptions
             if not _is_timeout_error(exc):
                 raise
             pytest.skip(str(exc))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_io/_common_io_test_utils.py` around lines 82 - 85, Add a brief
inline comment above the "except BaseException as exc:" block in
tests/test_io/_common_io_test_utils.py explaining that BaseException is
intentionally used (not Exception) to catch timeout-related exceptions raised by
pytest-timeout or similar frameworks, while still re-raising non-timeout errors
via the existing _is_timeout_error(exc) check; reference the existing
_is_timeout_error function in the comment so future maintainers understand the
deliberate design and that non-timeouts are re-raised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 606-609: In _get_time_coord_from_samples, validate the sample_rate
before computing step: check that sample_rate is finite (not NaN/Inf), positive,
and non-zero; if it fails, raise InvalidFiberFileError with a clear message;
otherwise compute step = dc.to_timedelta64(1 / sample_rate) and call
get_coord(start=start, stop=start + step * size, step=step) as before. Ensure
you reference the function _get_time_coord_from_samples and helpers
dc.to_timedelta64 and get_coord in the change.
- Around line 557-560: The current presence check for the optional protobuf
INT32 field uses getattr(acquisition, "fiber_id", None) which treats unset
fields as 0; replace that check with acquisition is not None and
acquisition.HasField("fiber_id") and then cast the value (e.g.,
int(acquisition.fiber_id)) to produce the same result or None otherwise. Update
the expression that sets the fiber_id to use acquisition.HasField("fiber_id")
(matching how identification and metadata_recording_time are checked) and ensure
the final value remains int(...) when present or None when absent.

---

Nitpick comments:
In `@tests/test_io/_common_io_test_utils.py`:
- Around line 82-85: Add a brief inline comment above the "except BaseException
as exc:" block in tests/test_io/_common_io_test_utils.py explaining that
BaseException is intentionally used (not Exception) to catch timeout-related
exceptions raised by pytest-timeout or similar frameworks, while still
re-raising non-timeout errors via the existing _is_timeout_error(exc) check;
reference the existing _is_timeout_error function in the comment so future
maintainers understand the deliberate design and that non-timeouts are
re-raised.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1e73a58b-ec72-402b-8f72-4d55159187fd

📥 Commits

Reviewing files that changed from the base of the PR and between 44bc09c and c6ea016.

📒 Files selected for processing (8)
  • dascore/io/sintela_protobuf/core.py
  • dascore/io/sintela_protobuf/utils.py
  • pyproject.toml
  • tests/test_io/_common_io_test_utils.py
  • tests/test_io/test_remote_common_io.py
  • tests/test_io/test_remote_http.py
  • tests/test_io/test_remote_memory.py
  • tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • pyproject.toml
  • tests/test_io/test_remote_memory.py
  • dascore/io/sintela_protobuf/core.py
  • tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py

Comment thread dascore/io/sintela_protobuf/utils.py Outdated
Comment thread dascore/io/sintela_protobuf/utils.py
@coderabbitai coderabbitai Bot added the patch related to Patch class label Apr 9, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
tests/test_io/_common_io_test_utils.py (1)

82-87: Optional: extract duplicated timeout-handling block into a helper.

The two except BaseException blocks are identical; a small helper (e.g., _skip_if_timeout_else_raise(exc)) would reduce drift risk.

Also applies to: 101-104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_io/_common_io_test_utils.py` around lines 82 - 87, There is
duplicated exception handling logic for timeout errors in two `except
BaseException` blocks around the timeout handling code. Extract this logic into
a helper function named like `_skip_if_timeout_else_raise(exc)` that takes the
caught exception object, checks if it is a timeout error using
`_is_timeout_error`, raises it if not, or calls `pytest.skip` with the exception
string if it is. Replace both original except blocks in the test utils
(including the one at 101-104) to call this new helper, reducing code
duplication and drift risk.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 907-919: The code uses bin_res as the step to build the frequency
axis (frequency = get_coord(...)) without validating it's a finite, positive
value; add the same validation you used for sample_rate by checking bin_res
(from _assert_float_equal / variable bin_res) is a finite number and > 0 before
calling get_coord, and raise InvalidFiberFileError with a clear message if the
check fails so malformed FFT headers fail predictably instead of inside
get_coord.
- Around line 613-621: The _get_distance_coord function must validate its
spacing and step inputs (spacing must be finite and > 0; step must be a positive
integer) before calling get_coord; if validation fails, raise
InvalidFiberFileError with a clear message including the offending spacing/step
values so format-specific errors surface (this affects callers building
timeseries/band/FFT coords). Implement checks at the top of _get_distance_coord
for spacing (use math.isfinite and >0) and for step (ensure int-like and >0),
and raise InvalidFiberFileError with context (e.g., "invalid channel_spacing X"
/ "invalid channel_step Y") instead of letting get_coord raise.
- Around line 866-876: The current code uses band_def[0] to set
attrs.extra['data_type'] and 'data_units', which mislabels packets that contain
multiple band semantics; instead inspect all entries in band_def (e.g., iterate
band_def and convert each entry's type with _BAND_DATA_TYPE_MAP) and derive a
consolidated value: if all bands map to the same data_type/data_units keep that
single value, otherwise set a clear "mixed" or aggregated value (e.g., a
tuple/list of types or "mixed" and empty units) before passing into _base_attrs
so attrs.data_type and attrs.data_units accurately reflect mixed-band packets.

In `@tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py`:
- Around line 766-769: The test constructs protobuf payloads
(_build_meta_payload and _build_ts_payloads) before the monkeypatch that
simulates missing protobuf, causing failures during file construction; change
the test_missing_protobuf_only_affects_scan_and_read to use a checked-in .pb
fixture (or copy a prebuilt ts.pb blob into path) instead of calling
_build_meta_payload/_build_ts_payloads before patching, or alternatively move
the calls to _build_meta_payload/_build_ts_payloads to after the monkeypatch is
applied; ensure the test then asserts get_format() still works while
scan()/read() raise under the monkeypatched missing-protobuf environment and
keep references to _write_records, get_format, scan, and read to locate the
changes.

---

Nitpick comments:
In `@tests/test_io/_common_io_test_utils.py`:
- Around line 82-87: There is duplicated exception handling logic for timeout
errors in two `except BaseException` blocks around the timeout handling code.
Extract this logic into a helper function named like
`_skip_if_timeout_else_raise(exc)` that takes the caught exception object,
checks if it is a timeout error using `_is_timeout_error`, raises it if not, or
calls `pytest.skip` with the exception string if it is. Replace both original
except blocks in the test utils (including the one at 101-104) to call this new
helper, reducing code duplication and drift risk.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a302fc7-fcc2-4e1d-978a-bf9eb5678b00

📥 Commits

Reviewing files that changed from the base of the PR and between c6ea016 and 0fb6d82.

📒 Files selected for processing (3)
  • dascore/io/sintela_protobuf/utils.py
  • tests/test_io/_common_io_test_utils.py
  • tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py

Comment thread dascore/io/sintela_protobuf/utils.py
Comment thread dascore/io/sintela_protobuf/utils.py Outdated
Comment thread dascore/io/sintela_protobuf/utils.py
Comment thread tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

IO Work for reading/writing different formats patch related to Patch class

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant