Skip to content

Fix rasx and bruker raw parser error#40

Merged
jcwang587 merged 9 commits into
mainfrom
dev
Apr 24, 2026
Merged

Fix rasx and bruker raw parser error#40
jcwang587 merged 9 commits into
mainfrom
dev

Conversation

@jcwang587
Copy link
Copy Markdown
Owner

@jcwang587 jcwang587 commented Apr 23, 2026

  • Improved the validation of x values to ensure they are strictly increasing and do not contain NaN values in a single iteration.
  • This change enhances performance and readability by consolidating the checks into one loop.

Summary by CodeRabbit

  • Bug Fixes

    • Stricter axis validation to reject NaN or non-strictly-increasing sequences.
    • Fix to ensure the first axis value is not missed when files include a UTF-8 BOM.
  • New Features

    • Broader, more robust detection of Bruker RAW layout variants with improved selection favoring marker-derived metadata.
  • Tests

    • Tightened tests enforcing exact axis point counts and the first RASX x-value.
  • Chores

    • Package version bumped to 0.4.0 and dependency version updates.

- Improved the validation of x values to ensure they are strictly increasing and do not contain NaN values in a single iteration.
- This change enhances performance and readability by consolidating the checks into one loop.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Single-pass strict-increase validation for Pattern::validate; parse_rasx strips an optional UTF‑8 BOM; parse_bruker_raw expands and deduplicates layout candidates and prefers start/step from a count marker; tests tightened to assert exact axis lengths and first RASX x-value.

Changes

Cohort / File(s) Summary
Pattern validation
src/lib.rs
Rewrote Pattern::validate to a single-pass iterator that errors on a NaN first element and enforces strict prev < v for consecutive x values via a unified error path.
Parser: Bruker & RASX
src/parser.rs
parse_rasx trims an optional UTF‑8 BOM from the first trimmed line. parse_bruker_raw now enumerates and deduplicates more candidate layouts (interleaved tails + additional interleaved layouts found via count‑marker scanning), and selection prefers start/step derived from an adjacent count marker; find_bruker_start_step now returns (start, step, from_count_marker) and downstream logic updated.
Tests (Python & Rust)
tests/test_python.py, tests/test_rust.rs
Tightened assertions: RASX reads must produce pattern.x.len() == 2334 and first x ≈ 10.0; Bruker DIFFRAC.EVA RAW must produce pattern.x.len() == 7134, in addition to existing checks.
Manifest
Cargo.toml
Bumped package version 0.3.0 -> 0.4.0 and tightened several dependency versions (zip, thiserror, serde, serde_json, pyo3) while preserving feature flags.

Sequence Diagram(s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I hopped through bytes and checked each line,
One pass to make each x-value fine.
I sniffed for markers, chose the start,
Stepped in order, did my part.
Parsers munch carrots — code feels fine 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title mentions fixing 'rasx and bruker raw parser' issues, but the PR objectives emphasize refactoring validation logic in Pattern to a single pass for x values. The title is partially related but does not capture the main change (validation refactoring). Revise the title to better reflect the primary change: e.g., 'Refactor Pattern validation to single-pass iteration' or 'Consolidate x-value validation checks into single iteration'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 81.25% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

Comment @coderabbitai help to get the list of available commands and usage tips.

- Refactored the parsing logic in `parse_bruker_raw` to handle multiple candidate layouts and eliminate duplicates using a HashSet.
- Updated the `find_bruker_start_step` function to return a boolean indicating the presence of a count marker.
- Added assertions in Python and Rust tests to verify the length of the parsed pattern's x values, ensuring they match expected results.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/test_python.py (1)

110-110: Hard-coded point count ties the test to a specific fixture.

The assertion len(pattern.x) == 7134 ensures deterministic parsing output, which is valuable for catching regressions in the layout selection logic. However, if the fixture file ever changes or if a parsing improvement yields a more accurate count, this test will fail.

Consider adding a comment explaining where this expected value comes from (e.g., verified against known instrument output or reference software) to aid future maintainers.

📝 Suggested documentation
     assert len(pattern.x) == len(pattern.y)
+    # Expected point count verified against Bruker DIFFRAC.EVA export metadata
     assert len(pattern.x) == 7134
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_python.py` at line 110, The test assertion hard-codes the expected
point count (assert len(pattern.x) == 7134) which can break if the fixture or
parser changes; update the test by keeping the deterministic assertion but add
an inline comment immediately above the assert referencing the source of the
expected value (e.g., "7134 = point count from instrument X output / verified
against reference file Y or tool Z on DATE") so future maintainers know why 7134
is expected and how to re-verify it; if appropriate, mention the fixture
filename or verification method alongside the pattern.x reference so it's easy
to trace.
tests/test_rust.rs (1)

178-178: Consistent with Python test - same suggestion applies.

The hard-coded expected count of 7134 matches the Python test, which is good for cross-language consistency. The same suggestion to document the source of this expected value applies here.

📝 Suggested documentation
     assert_eq!(pattern.x.len(), pattern.y.len());
+    // Expected point count verified against Bruker DIFFRAC.EVA export metadata
     assert_eq!(pattern.x.len(), 7134);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_rust.rs` at line 178, The assertion uses a magic number 7134
(assert_eq!(pattern.x.len(), 7134)); make the expectation explicit by replacing
the literal with a named constant or documented variable (e.g.,
EXPECTED_PATTERN_POINT_COUNT) and add a short comment or docstring next to the
test referencing the source of 7134 (how it was computed or which
dataset/version it comes from); update the test assertion to use that constant
and ensure the name and comment live near the test that references
pattern.x.len() so future readers know the provenance of the expected count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_python.py`:
- Line 110: The test assertion hard-codes the expected point count (assert
len(pattern.x) == 7134) which can break if the fixture or parser changes; update
the test by keeping the deterministic assertion but add an inline comment
immediately above the assert referencing the source of the expected value (e.g.,
"7134 = point count from instrument X output / verified against reference file Y
or tool Z on DATE") so future maintainers know why 7134 is expected and how to
re-verify it; if appropriate, mention the fixture filename or verification
method alongside the pattern.x reference so it's easy to trace.

In `@tests/test_rust.rs`:
- Line 178: The assertion uses a magic number 7134 (assert_eq!(pattern.x.len(),
7134)); make the expectation explicit by replacing the literal with a named
constant or documented variable (e.g., EXPECTED_PATTERN_POINT_COUNT) and add a
short comment or docstring next to the test referencing the source of 7134 (how
it was computed or which dataset/version it comes from); update the test
assertion to use that constant and ensure the name and comment live near the
test that references pattern.x.len() so future readers know the provenance of
the expected count.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 82ce39d1-02d0-47a8-8814-f37f8b28b75f

📥 Commits

Reviewing files that changed from the base of the PR and between 2b6cfc2 and 477caad.

📒 Files selected for processing (3)
  • src/parser.rs
  • tests/test_python.py
  • tests/test_rust.rs

- Updated the RASX parser to strip UTF-8 BOM characters from the beginning of lines to prevent silent skips in parsing.
- Added assertions in Python and Rust tests to verify the length of the parsed pattern's x values and check the first x value against an expected result.
- Updated the version of the `geddes` package to 0.4.0 in `Cargo.toml`.
- Upgraded dependencies: `zip` to 8.5.1, `thiserror` to 2.0.18, `serde` to 1.0.228, `serde_json` to 1.0.149, and `pyo3` to 0.28.3.
- Included comments in both Python and Rust test cases to specify that the expected point count is verified against Bruker DIFFRAC.EVA export metadata, enhancing test clarity and documentation.
@jcwang587 jcwang587 changed the title Refactor validation logic in Pattern to a single pass Fix rasx and bruker raw io error Apr 23, 2026
@jcwang587 jcwang587 changed the title Fix rasx and bruker raw io error Fix rasx and bruker raw parser error Apr 23, 2026
jcwang587 and others added 4 commits April 23, 2026 01:36
- Bumped the version of `geddes` and `geddes-node` to `0.4.0` in `Cargo.lock`, `Cargo.toml`, `package.json`, and `package-lock.json`.
- Updated version checks in `node/index.js` to reflect the new version.
- Changed the version of `geddes` and `geddes-node` to `0.4.0-alpha.0` in `Cargo.lock`, `Cargo.toml`, `package.json`, and `package-lock.json`.
- Updated version checks in `node/index.js` to reflect the new alpha version.
@jcwang587 jcwang587 merged commit 319fb44 into main Apr 24, 2026
3 checks passed
@jcwang587 jcwang587 deleted the dev branch April 24, 2026 01:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant