Fix rasx and bruker raw parser error by jcwang587 · Pull Request #40 · jcwang587/geddes

jcwang587 · 2026-04-23T04:06:23Z

Improved the validation of x values to ensure they are strictly increasing and do not contain NaN values in a single iteration.
This change enhances performance and readability by consolidating the checks into one loop.

Summary by CodeRabbit

Bug Fixes
- Stricter axis validation to reject NaN or non-strictly-increasing sequences.
- Fix to ensure the first axis value is not missed when files include a UTF-8 BOM.
New Features
- Broader, more robust detection of Bruker RAW layout variants with improved selection favoring marker-derived metadata.
Tests
- Tightened tests enforcing exact axis point counts and the first RASX x-value.
Chores
- Package version bumped to 0.4.0 and dependency version updates.

- Improved the validation of x values to ensure they are strictly increasing and do not contain NaN values in a single iteration. - This change enhances performance and readability by consolidating the checks into one loop.

coderabbitai · 2026-04-23T04:06:33Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Single-pass strict-increase validation for Pattern::validate; parse_rasx strips an optional UTF‑8 BOM; parse_bruker_raw expands and deduplicates layout candidates and prefers start/step from a count marker; tests tightened to assert exact axis lengths and first RASX x-value.

Changes

Cohort / File(s)	Summary
Pattern validation `src/lib.rs`	Rewrote `Pattern::validate` to a single-pass iterator that errors on a `NaN` first element and enforces strict `prev < v` for consecutive x values via a unified error path.
Parser: Bruker & RASX `src/parser.rs`	`parse_rasx` trims an optional UTF‑8 BOM from the first trimmed line. `parse_bruker_raw` now enumerates and deduplicates more candidate layouts (interleaved tails + additional interleaved layouts found via count‑marker scanning), and selection prefers start/step derived from an adjacent count marker; `find_bruker_start_step` now returns `(start, step, from_count_marker)` and downstream logic updated.
Tests (Python & Rust) `tests/test_python.py`, `tests/test_rust.rs`	Tightened assertions: RASX reads must produce `pattern.x.len() == 2334` and first x ≈ `10.0`; Bruker DIFFRAC.EVA RAW must produce `pattern.x.len() == 7134`, in addition to existing checks.
Manifest `Cargo.toml`	Bumped package version `0.3.0 -> 0.4.0` and tightened several dependency versions (`zip`, `thiserror`, `serde`, `serde_json`, `pyo3`) while preserving feature flags.

Sequence Diagram(s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Improved Bruker RAW parsing #26: Overlapping changes to Bruker RAW layout discovery and candidate selection in src/parser.rs.
Added support for Bruker RAW binary file parsing #14: Prior Bruker RAW parsing and start/step discovery logic that this PR evolves.
Refactor Pattern creation and validation logic #36: Related changes enforcing strict-increasing behavior in Pattern::validate.

Poem

🐰
I hopped through bytes and checked each line,
One pass to make each x-value fine.
I sniffed for markers, chose the start,
Stepped in order, did my part.
Parsers munch carrots — code feels fine 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title mentions fixing 'rasx and bruker raw parser' issues, but the PR objectives emphasize refactoring validation logic in Pattern to a single pass for x values. The title is partially related but does not capture the main change (validation refactoring).	Revise the title to better reflect the primary change: e.g., 'Refactor Pattern validation to single-pass iteration' or 'Consolidate x-value validation checks into single iteration'.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 81.25% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch dev

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

- Refactored the parsing logic in `parse_bruker_raw` to handle multiple candidate layouts and eliminate duplicates using a HashSet. - Updated the `find_bruker_start_step` function to return a boolean indicating the presence of a count marker. - Added assertions in Python and Rust tests to verify the length of the parsed pattern's x values, ensuring they match expected results.

coderabbitai

🧹 Nitpick comments (2)

tests/test_python.py (1)
110-110: Hard-coded point count ties the test to a specific fixture.

The assertion len(pattern.x) == 7134 ensures deterministic parsing output, which is valuable for catching regressions in the layout selection logic. However, if the fixture file ever changes or if a parsing improvement yields a more accurate count, this test will fail.

Consider adding a comment explaining where this expected value comes from (e.g., verified against known instrument output or reference software) to aid future maintainers.
📝 Suggested documentation
     assert len(pattern.x) == len(pattern.y)
+    # Expected point count verified against Bruker DIFFRAC.EVA export metadata
     assert len(pattern.x) == 7134
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_python.py` at line 110, The test assertion hard-codes the expected
point count (assert len(pattern.x) == 7134) which can break if the fixture or
parser changes; update the test by keeping the deterministic assertion but add
an inline comment immediately above the assert referencing the source of the
expected value (e.g., "7134 = point count from instrument X output / verified
against reference file Y or tool Z on DATE") so future maintainers know why 7134
is expected and how to re-verify it; if appropriate, mention the fixture
filename or verification method alongside the pattern.x reference so it's easy
to trace.
tests/test_rust.rs (1)
178-178: Consistent with Python test - same suggestion applies.

The hard-coded expected count of 7134 matches the Python test, which is good for cross-language consistency. The same suggestion to document the source of this expected value applies here.
📝 Suggested documentation
     assert_eq!(pattern.x.len(), pattern.y.len());
+    // Expected point count verified against Bruker DIFFRAC.EVA export metadata
     assert_eq!(pattern.x.len(), 7134);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_rust.rs` at line 178, The assertion uses a magic number 7134
(assert_eq!(pattern.x.len(), 7134)); make the expectation explicit by replacing
the literal with a named constant or documented variable (e.g.,
EXPECTED_PATTERN_POINT_COUNT) and add a short comment or docstring next to the
test referencing the source of 7134 (how it was computed or which
dataset/version it comes from); update the test assertion to use that constant
and ensure the name and comment live near the test that references
pattern.x.len() so future readers know the provenance of the expected count.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_python.py`:
- Line 110: The test assertion hard-codes the expected point count (assert
len(pattern.x) == 7134) which can break if the fixture or parser changes; update
the test by keeping the deterministic assertion but add an inline comment
immediately above the assert referencing the source of the expected value (e.g.,
"7134 = point count from instrument X output / verified against reference file Y
or tool Z on DATE") so future maintainers know why 7134 is expected and how to
re-verify it; if appropriate, mention the fixture filename or verification
method alongside the pattern.x reference so it's easy to trace.

In `@tests/test_rust.rs`:
- Line 178: The assertion uses a magic number 7134 (assert_eq!(pattern.x.len(),
7134)); make the expectation explicit by replacing the literal with a named
constant or documented variable (e.g., EXPECTED_PATTERN_POINT_COUNT) and add a
short comment or docstring next to the test referencing the source of 7134 (how
it was computed or which dataset/version it comes from); update the test
assertion to use that constant and ensure the name and comment live near the
test that references pattern.x.len() so future readers know the provenance of
the expected count.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 82ce39d1-02d0-47a8-8814-f37f8b28b75f

📥 Commits

Reviewing files that changed from the base of the PR and between 2b6cfc2 and 477caad.

📒 Files selected for processing (3)

src/parser.rs
tests/test_python.py
tests/test_rust.rs

- Updated the RASX parser to strip UTF-8 BOM characters from the beginning of lines to prevent silent skips in parsing. - Added assertions in Python and Rust tests to verify the length of the parsed pattern's x values and check the first x value against an expected result.

- Updated the version of the `geddes` package to 0.4.0 in `Cargo.toml`. - Upgraded dependencies: `zip` to 8.5.1, `thiserror` to 2.0.18, `serde` to 1.0.228, `serde_json` to 1.0.149, and `pyo3` to 0.28.3.

- Included comments in both Python and Rust test cases to specify that the expected point count is verified against Bruker DIFFRAC.EVA export metadata, enhancing test clarity and documentation.

- Bumped the version of `geddes` and `geddes-node` to `0.4.0` in `Cargo.lock`, `Cargo.toml`, `package.json`, and `package-lock.json`. - Updated version checks in `node/index.js` to reflect the new version.

for more information, see https://pre-commit.ci

- Changed the version of `geddes` and `geddes-node` to `0.4.0-alpha.0` in `Cargo.lock`, `Cargo.toml`, `package.json`, and `package-lock.json`. - Updated version checks in `node/index.js` to reflect the new alpha version.

for more information, see https://pre-commit.ci

Refactor validation logic in Pattern to a single pass

2b6cfc2

- Improved the validation of x values to ensure they are strictly increasing and do not contain NaN values in a single iteration. - This change enhances performance and readability by consolidating the checks into one loop.

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

jcwang587 added 3 commits April 23, 2026 01:27

Update dependencies and bump version to 0.4.0

cfb3523

- Updated the version of the `geddes` package to 0.4.0 in `Cargo.toml`. - Upgraded dependencies: `zip` to 8.5.1, `thiserror` to 2.0.18, `serde` to 1.0.228, `serde_json` to 1.0.149, and `pyo3` to 0.28.3.

Add comments to clarify expected point count in Bruker RAW tests

be6ffde

- Included comments in both Python and Rust test cases to specify that the expected point count is verified against Bruker DIFFRAC.EVA export metadata, enhancing test clarity and documentation.

jcwang587 changed the title ~~Refactor validation logic in Pattern to a single pass~~ Fix rasx and bruker raw io error Apr 23, 2026

jcwang587 changed the title ~~Fix rasx and bruker raw io error~~ Fix rasx and bruker raw parser error Apr 23, 2026

jcwang587 and others added 4 commits April 23, 2026 01:36

Update version to 0.4.0 across all relevant files

c73fae4

- Bumped the version of `geddes` and `geddes-node` to `0.4.0` in `Cargo.lock`, `Cargo.toml`, `package.json`, and `package-lock.json`. - Updated version checks in `node/index.js` to reflect the new version.

[pre-commit.ci] auto fixes from pre-commit.com hooks

cfcb38f

for more information, see https://pre-commit.ci

Update version to 0.4.0-alpha.0 across all relevant files

6a2ad2a

- Changed the version of `geddes` and `geddes-node` to `0.4.0-alpha.0` in `Cargo.lock`, `Cargo.toml`, `package.json`, and `package-lock.json`. - Updated version checks in `node/index.js` to reflect the new alpha version.

[pre-commit.ci] auto fixes from pre-commit.com hooks

4394380

for more information, see https://pre-commit.ci

jcwang587 merged commit 319fb44 into main Apr 24, 2026
3 checks passed

jcwang587 deleted the dev branch April 24, 2026 01:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix rasx and bruker raw parser error#40

Fix rasx and bruker raw parser error#40
jcwang587 merged 9 commits into
mainfrom
dev

jcwang587 commented Apr 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 23, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jcwang587 commented Apr 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jcwang587 commented Apr 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 23, 2026 •

edited

Loading