Add support for Sintela protobuf format #656
Conversation
📝 Walkthrough
Added a Sintela Protobuf IO v1 (MTLV envelope + protobuf parsing), registered it as a FiberIO entry point, added test data, and introduced comprehensive tests plus related test-fixture and timeout adjustments.
🚥 Pre-merge checks: 4 passed, 1 failed (warning).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44bc09c41d
    from google.protobuf import descriptor_pb2, descriptor_pool, message_factory
    from google.protobuf import timestamp_pb2
Guard Sintela protobuf tests behind optional dependency
The new Sintela protobuf test helpers import google.protobuf directly, but this dependency is not guaranteed in the project’s test environments (for example, the min-deps workflow installs .[test] and then runs the full tests suite). In environments without protobuf, invoking these helpers raises ModuleNotFoundError and fails the entire test run instead of skipping optional-format coverage, so this change can break CI for unrelated work.
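The guard Codex asks for is conventionally done with pytest's importorskip. A minimal sketch, assuming the fix is a one-liner at the top of the Sintela test module (the `import_or_skip` wrapper below is illustrative, not part of the PR):

```python
# Sketch: gate an optional dependency so test environments without protobuf
# skip these tests instead of failing collection with ModuleNotFoundError.
import pytest


def import_or_skip(module_name: str):
    """Return the module, or skip the calling test when it is unavailable."""
    # pytest.importorskip raises pytest's Skipped outcome, not ImportError,
    # so the whole module's tests are reported as skipped rather than errored.
    return pytest.importorskip(module_name)
```

In practice the test file would simply call `pytest.importorskip("google.protobuf")` at module scope before importing the helpers.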
    int(getattr(acquisition, "fiber_id"))
    if acquisition is not None and getattr(acquisition, "fiber_id", None) is not None
Preserve missing fiber_id instead of coercing to zero
This presence check treats any existing acquisition_stats message as having fiber_id, because unset protobuf scalar fields still read back as 0 rather than None. As a result, files where acquisition_stats is present but fiber_id is unset will be reported with fiber_id=0, silently corrupting metadata in scan/read outputs.
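The failure mode can be demonstrated without protobuf installed. A minimal sketch, using a stand-in class (`FakeAcquisition` is hypothetical) that mimics protobuf's behavior of returning the zero value for unset scalar fields while exposing presence through `HasField`:

```python
# Sketch: why getattr(msg, "field", None) cannot detect an unset protobuf
# scalar, and how a HasField-based check preserves "missing" as None.
class FakeAcquisition:
    """Stand-in mimicking a protobuf message with an optional int32 fiber_id."""

    def __init__(self, fiber_id=None):
        self._fiber_id = fiber_id

    @property
    def fiber_id(self):
        # Protobuf returns the zero value for unset scalar fields.
        return self._fiber_id if self._fiber_id is not None else 0

    def HasField(self, name):
        return name == "fiber_id" and self._fiber_id is not None


def get_fiber_id(acquisition):
    """Presence-aware extraction: int when set, None when unset."""
    if acquisition is not None and acquisition.HasField("fiber_id"):
        return int(acquisition.fiber_id)
    return None
```

With the buggy check, an unset field reads back as `0`; the `HasField` path keeps it as `None`, matching how `identification` and `metadata_recording_time` are already checked.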
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 93-133: The parser currently buffers all records because
_iter_envelope_records returns a list; change _iter_envelope_records to be a
generator (yield EnvelopeRecord) that reads and yields each EnvelopeRecord as it
is parsed (preserve the same header/size/error handling and EnvelopeRecord
construction), update its return/type hint accordingly, and modify
get_supported_family_tag to iterate over the generator (not collect into a list)
so it can return the first supported tag immediately (ensure it still skips
META_TAG and checks TS_TAGS|BAND_TAGS|FFT_TAGS).
- Around line 454-459: The ParseFromString call in _parse_meta currently runs
inside suppress_warnings() but can raise google.protobuf.message.DecodeError;
catch that exception and re-raise it as InvalidFiberFileError with a descriptive
message. Locate _parse_meta(), wrap msg.ParseFromString(payload) in a try/except
that catches DecodeError (from google.protobuf.message) and raises
InvalidFiberFileError (preserving or including the original exception message).
Apply the same pattern to the other ParseFromString call referenced in this
module so both protobuf parses consistently convert DecodeError into
InvalidFiberFileError.
In `@pyproject.toml`:
- Line 147: The test extras are missing the protobuf runtime so imports like
google.protobuf used by the SINTELA_PROTOBUF__V1 entry point will fail; update
pyproject.toml to add the protobuf package to the .[test]
(project.optional-dependencies test) list so tests install it, e.g. add
"protobuf" to the test extras alongside other test deps, ensuring the
SINTELA_PROTOBUF__V1 entry point can import google.protobuf during test runs.
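The DecodeError-to-InvalidFiberFileError conversion suggested for `_parse_meta` follows a standard wrap-and-chain shape. In this sketch `DecodeError` and `InvalidFiberFileError` are stand-in definitions (the real ones come from `google.protobuf.message` and DASCore's exceptions), and `parse` stands in for `msg.ParseFromString`:

```python
# Sketch: convert a low-level protobuf decode failure into the project's
# format-specific error, preserving the original message via chaining.
class DecodeError(Exception):
    """Stand-in for google.protobuf.message.DecodeError."""


class InvalidFiberFileError(Exception):
    """Stand-in for DASCore's invalid-file error."""


def parse_payload(payload: bytes, parse):
    """Run a protobuf parse, converting decode failures to a format error."""
    try:
        return parse(payload)
    except DecodeError as exc:
        raise InvalidFiberFileError(
            f"could not decode Sintela protobuf payload: {exc}"
        ) from exc
```

Applying the same try/except around both `ParseFromString` call sites keeps the module's error surface consistent: callers see one exception type for all malformed files.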
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9c1c2f0e-2b7f-4f6e-be63-f2bcae0ded19
📒 Files selected for processing (8)
dascore/data_registry.txt
dascore/io/sintela_protobuf/__init__.py
dascore/io/sintela_protobuf/core.py
dascore/io/sintela_protobuf/utils.py
pyproject.toml
tests/test_io/test_common_io.py
tests/test_io/test_remote_memory.py
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##               dev     #656   +/- ##
==========================================
  Coverage   100.00%  100.00%
==========================================
  Files          140      143     +3
  Lines        12890    13420   +530
==========================================
+ Hits         12890    13420   +530

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_io/_common_io_test_utils.py (1)
82-85: Consider documenting the broad BaseException catch.
Catching BaseException is intentionally broad to intercept any timeout-related exception, including those from pytest-timeout. While the logic correctly re-raises non-timeout exceptions, a brief inline comment explaining why BaseException (rather than Exception) is caught would help future maintainers understand this is deliberate.

💡 Suggested comment:

    try:
        yield
    -except BaseException as exc:
    +except BaseException as exc:  # Broad catch to handle pytest-timeout's exceptions
        if not _is_timeout_error(exc):
            raise
        pytest.skip(str(exc))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_io/_common_io_test_utils.py` around lines 82 - 85, Add a brief inline comment above the "except BaseException as exc:" block in tests/test_io/_common_io_test_utils.py explaining that BaseException is intentionally used (not Exception) to catch timeout-related exceptions raised by pytest-timeout or similar frameworks, while still re-raising non-timeout errors via the existing _is_timeout_error(exc) check; reference the existing _is_timeout_error function in the comment so future maintainers understand the deliberate design and that non-timeouts are re-raised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 606-609: In _get_time_coord_from_samples, validate the sample_rate
before computing step: check that sample_rate is finite (not NaN/Inf), positive,
and non-zero; if it fails, raise InvalidFiberFileError with a clear message;
otherwise compute step = dc.to_timedelta64(1 / sample_rate) and call
get_coord(start=start, stop=start + step * size, step=step) as before. Ensure
you reference the function _get_time_coord_from_samples and helpers
dc.to_timedelta64 and get_coord in the change.
- Around line 557-560: The current presence check for the optional protobuf
INT32 field uses getattr(acquisition, "fiber_id", None) which treats unset
fields as 0; replace that check with acquisition is not None and
acquisition.HasField("fiber_id") and then cast the value (e.g.,
int(acquisition.fiber_id)) to produce the same result or None otherwise. Update
the expression that sets the fiber_id to use acquisition.HasField("fiber_id")
(matching how identification and metadata_recording_time are checked) and ensure
the final value remains int(...) when present or None when absent.
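The `sample_rate` guard suggested for `_get_time_coord_from_samples` can be sketched with plain floats. `InvalidFiberFileError` is a stand-in name here, and the step is returned in seconds rather than wrapped with DASCore's `dc.to_timedelta64`/`get_coord`, to keep the sketch dependency-free:

```python
# Sketch: reject NaN/Inf/non-positive sample rates before computing the
# time step, so malformed headers fail with a clear format error.
import math


class InvalidFiberFileError(Exception):
    """Stand-in for DASCore's invalid-file error."""


def time_step_from_sample_rate(sample_rate) -> float:
    """Validate sample_rate, then return the time step in seconds."""
    if (
        not isinstance(sample_rate, (int, float))
        or not math.isfinite(sample_rate)
        or sample_rate <= 0
    ):
        raise InvalidFiberFileError(
            f"invalid sample_rate {sample_rate!r}: must be finite and positive"
        )
    return 1.0 / sample_rate
```

The same finite-and-positive check applies wherever a header value becomes a coordinate step (channel spacing, FFT bin resolution), as the later review comments note.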
---
Nitpick comments:
In `@tests/test_io/_common_io_test_utils.py`:
- Around line 82-85: Add a brief inline comment above the "except BaseException
as exc:" block in tests/test_io/_common_io_test_utils.py explaining that
BaseException is intentionally used (not Exception) to catch timeout-related
exceptions raised by pytest-timeout or similar frameworks, while still
re-raising non-timeout errors via the existing _is_timeout_error(exc) check;
reference the existing _is_timeout_error function in the comment so future
maintainers understand the deliberate design and that non-timeouts are
re-raised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1e73a58b-ec72-402b-8f72-4d55159187fd
📒 Files selected for processing (8)
dascore/io/sintela_protobuf/core.py
dascore/io/sintela_protobuf/utils.py
pyproject.toml
tests/test_io/_common_io_test_utils.py
tests/test_io/test_remote_common_io.py
tests/test_io/test_remote_http.py
tests/test_io/test_remote_memory.py
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py
🚧 Files skipped from review as they are similar to previous changes (4)
- pyproject.toml
- tests/test_io/test_remote_memory.py
- dascore/io/sintela_protobuf/core.py
- tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py
Actionable comments posted: 4
🧹 Nitpick comments (1)
tests/test_io/_common_io_test_utils.py (1)
82-87: Optional: extract duplicated timeout-handling block into a helper.
The two except BaseException blocks are identical; a small helper (e.g., _skip_if_timeout_else_raise(exc)) would reduce drift risk.
Also applies to: 101-104
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_io/_common_io_test_utils.py` around lines 82 - 87, There is duplicated exception handling logic for timeout errors in two `except BaseException` blocks around the timeout handling code. Extract this logic into a helper function named like `_skip_if_timeout_else_raise(exc)` that takes the caught exception object, checks if it is a timeout error using `_is_timeout_error`, raises it if not, or calls `pytest.skip` with the exception string if it is. Replace both original except blocks in the test utils (including the one at 101-104) to call this new helper, reducing code duplication and drift risk.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dascore/io/sintela_protobuf/utils.py`:
- Around line 907-919: The code uses bin_res as the step to build the frequency
axis (frequency = get_coord(...)) without validating it's a finite, positive
value; add the same validation you used for sample_rate by checking bin_res
(from _assert_float_equal / variable bin_res) is a finite number and > 0 before
calling get_coord, and raise InvalidFiberFileError with a clear message if the
check fails so malformed FFT headers fail predictably instead of inside
get_coord.
- Around line 613-621: The _get_distance_coord function must validate its
spacing and step inputs (spacing must be finite and > 0; step must be a positive
integer) before calling get_coord; if validation fails, raise
InvalidFiberFileError with a clear message including the offending spacing/step
values so format-specific errors surface (this affects callers building
timeseries/band/FFT coords). Implement checks at the top of _get_distance_coord
for spacing (use math.isfinite and >0) and for step (ensure int-like and >0),
and raise InvalidFiberFileError with context (e.g., "invalid channel_spacing X"
/ "invalid channel_step Y") instead of letting get_coord raise.
- Around line 866-876: The current code uses band_def[0] to set
attrs.extra['data_type'] and 'data_units', which mislabels packets that contain
multiple band semantics; instead inspect all entries in band_def (e.g., iterate
band_def and convert each entry's type with _BAND_DATA_TYPE_MAP) and derive a
consolidated value: if all bands map to the same data_type/data_units keep that
single value, otherwise set a clear "mixed" or aggregated value (e.g., a
tuple/list of types or "mixed" and empty units) before passing into _base_attrs
so attrs.data_type and attrs.data_units accurately reflect mixed-band packets.
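The band-type consolidation described above reduces to mapping every `band_def` entry through the type table and only keeping a single label when all bands agree. The map's keys and values below are illustrative stand-ins for the module's `_BAND_DATA_TYPE_MAP`, not Sintela's real enum:

```python
# Sketch: derive one (data_type, data_units) pair for a packet, falling back
# to a "mixed" label when its bands carry different semantics.
_BAND_DATA_TYPE_MAP = {
    1: ("strain_rate", "1/s"),
    2: ("velocity", "m/s"),
}


def consolidate_band_types(band_def):
    """Return (data_type, data_units); ('mixed', '') when bands disagree."""
    mapped = {_BAND_DATA_TYPE_MAP[band] for band in band_def}
    if len(mapped) == 1:
        return mapped.pop()
    return ("mixed", "")
```

Using a set makes the agreement check order-independent, so duplicate bands of the same type still resolve to the single shared label.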
In `@tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py`:
- Around line 766-769: The test constructs protobuf payloads
(_build_meta_payload and _build_ts_payloads) before the monkeypatch that
simulates missing protobuf, causing failures during file construction; change
the test_missing_protobuf_only_affects_scan_and_read to use a checked-in .pb
fixture (or copy a prebuilt ts.pb blob into path) instead of calling
_build_meta_payload/_build_ts_payloads before patching, or alternatively move
the calls to _build_meta_payload/_build_ts_payloads to after the monkeypatch is
applied; ensure the test then asserts get_format() still works while
scan()/read() raise under the monkeypatched missing-protobuf environment and
keep references to _write_records, get_format, scan, and read to locate the
changes.
---
Nitpick comments:
In `@tests/test_io/_common_io_test_utils.py`:
- Around line 82-87: There is duplicated exception handling logic for timeout
errors in two `except BaseException` blocks around the timeout handling code.
Extract this logic into a helper function named like
`_skip_if_timeout_else_raise(exc)` that takes the caught exception object,
checks if it is a timeout error using `_is_timeout_error`, raises it if not, or
calls `pytest.skip` with the exception string if it is. Replace both original
except blocks in the test utils (including the one at 101-104) to call this new
helper, reducing code duplication and drift risk.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0a302fc7-fcc2-4e1d-978a-bf9eb5678b00
📒 Files selected for processing (3)
dascore/io/sintela_protobuf/utils.py
tests/test_io/_common_io_test_utils.py
tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py
Description
This PR adds read and scan support for Sintela's MTLV-wrapped protobuf .pb recordings. It supports format detection without importing protobuf, promotes selected recorder metadata, and decodes supported timeseries, band, and FFT packet families into DASCore patches with validated coordinates.

No linked issue.
Validation
Local validation on the rebased branch:
pre-commit run --all-files
pytest tests/test_io/test_sintela_protobuf/test_sintela_protobuf.py -q
pytest tests/test_io/test_common_io.py tests/test_io/test_io_core.py -q
pytest tests --cov dascore --cov-report term-missing --cov-fail-under=100

CI status before the latest review-fix commit was green for lint, full test matrix, min-deps matrix, and Codecov; benchmark/docs jobs were skipped by workflow policy.
Checklist
I have (if applicable):