Parse multi cuvette data #9

Open
danolson1 wants to merge 8 commits into dd-hebert:main from danolson1:parse-multi-cuvette-data
@danolson1

I made some edits to the import_kd.py file to enable it to parse data about the cuvette ID (called SAMPLES_CELL_1, 2, 3, etc. in the .KD file). I also included an example .KD file with 3 cuvettes.

I tried to be as conservative as possible to avoid disrupting any downstream components. I added samples_cell as a property of the KDFile object and didn't change the exported spectra dataframe, although in the future it might be useful to include the samples_cell info in that dataframe.

This is a .KD file with data for 3 different cuvettes
Added samples_cell_header to identify SAMPLES_CELL_x text for decoding data from multi-cuvette samples.
Added handle_samples and parse_samples functions
Initial implementation of samples_cell. Before debugging.
I tested it on both multi-cuvette and single-cuvette files and I don't get any errors.

Summary

  Added multi-cuvette support by creating a new samples_cell attribute:

  1. Added samples_cell_header class attribute (lines 58-61):
    - Header: RegName in UTF-16-LE
    - Spacing: 18 bytes from header to first cell name
  2. Added _handle_samples_cell method (lines 162-183):
    - Finds the RegName header once
    - Iterates through sequential 30-byte entries (2-byte prefix + 28-byte cell name)
    - Stops when it encounters a non-SAMPLES_CELL string
    - Returns a pd.Series with cell identifiers
  3. Added _parse_samples_cell method (lines 217-223):
    - Reads a fixed 28-byte UTF-16-LE encoded cell name
    - Returns None on decode errors
  4. Updated parse_kd to return samples_cell and updated the __init__ assignment

  Results:
  - Multi-cuvette file: Returns 357 entries with SAMPLES_CELL_1, SAMPLES_CELL_2, SAMPLES_CELL_3 (119 each)
  - Single-cuvette files: Returns all entries as SAMPLES_CELL_1
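For readers who want to see the byte layout in action, here is a minimal sketch of the scan described in the summary. It is not the code from import_kd.py: the function name `read_samples_cell`, the constants, and the synthetic `demo` buffer are all made up for illustration, and it assumes the 18-byte spacing is measured from the end of the RegName header. It returns a plain list, whereas the PR wraps the result in a pd.Series.

```python
# Hypothetical constants mirroring the layout described in the summary.
HEADER = "RegName".encode("utf-16-le")
SPACING = 18       # assumed: bytes from the end of the header to the first cell name
PREFIX_LEN = 2     # 2-byte prefix in front of each cell name
NAME_LEN = 28      # "SAMPLES_CELL_1" is 14 characters, i.e. 28 bytes in UTF-16-LE
ENTRY_LEN = PREFIX_LEN + NAME_LEN


def read_samples_cell(data: bytes) -> list[str]:
    """Collect consecutive SAMPLES_CELL_x names that follow the header."""
    start = data.find(HEADER)
    if start == -1:
        return []  # header not present in this buffer
    pos = start + len(HEADER) + SPACING
    names = []
    while pos + NAME_LEN <= len(data):
        try:
            name = data[pos:pos + NAME_LEN].decode("utf-16-le")
        except UnicodeDecodeError:
            break  # undecodable bytes: treat as the end of the list
        if not name.startswith("SAMPLES_CELL"):
            break  # first non-cell string ends the scan
        names.append(name)
        pos += ENTRY_LEN  # skip this name plus the next entry's 2-byte prefix
    return names


# Synthetic buffer (not a real .KD file): header, padding, three cell names
# separated by 2-byte prefixes, then an unrelated string that stops the scan.
cells = ["SAMPLES_CELL_1", "SAMPLES_CELL_2", "SAMPLES_CELL_3"]
demo = (
    b"junk"
    + HEADER
    + b"\x00" * SPACING
    + b"\x00\x00".join(c.encode("utf-16-le") for c in cells)
    + b"\x00\x00"
    + "END_OF_CELL_LIST".encode("utf-16-le")
)
print(read_samples_cell(demo))
```

One limitation of a fixed 28-byte read is that a name such as SAMPLES_CELL_10 would not fit; whether such names occur in real files is not something this sketch can answer.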
@dd-hebert dd-hebert self-requested a review December 11, 2025 02:08
@dd-hebert dd-hebert self-assigned this Dec 11, 2025
@dd-hebert dd-hebert added the enhancement New feature or request label Dec 11, 2025
I'm adding a file called "1229 PDC PYRUVATE 100MM-8KD", which I renamed to "multi_cuvette_test_data_corrupted.KD". It's an example of a file corruption where the final data point from the previously saved file gets appended to the start of this file. I think this kind of corruption can be detected and fixed relatively easily.
Setting up tests for fixing a bug with a corrupted .KD file
I fixed the bug by adding validation for time values in the KD file parser. The changes:

  1. Added warnings import (import_kd.py:11) to issue warnings about corrupted files.
  2. Added _validate_and_fix_data() method (import_kd.py:161-246) that:
    - Builds a working DataFrame by transposing the spectra and adding the sample cell column
    - Uses pandas groupby to process each cuvette's data separately
    - Detects non-increasing time values by finding "reset points" where time decreases
    - Marks all preceding timepoints with values >= the reset time as invalid
    - Issues two warnings: one about potential corruption and one about removed timepoints
    - Returns cleaned spectra, spectra_times, and samples_cell with corrupt data removed
  3. Integrated validation into parse_kd() (import_kd.py:132-135) so it runs automatically when parsing any .KD file.
  4. Created tests (tests/test_import_kd.py) to verify:
    - Valid files produce no corruption warnings
    - Corrupted files produce warnings and have bad data removed

  The fix correctly identifies and removes the corrupt timepoint (730.3 seconds) from each of the 3 cuvettes in the corrupted test file.
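The reset-point rule can be sketched roughly as follows. This is an illustration of the idea rather than the `_validate_and_fix_data()` method itself: the helper name `valid_time_mask`, the warning text, and the sample values are assumptions, and the sketch works on a flat series of times instead of the transposed spectra DataFrame.

```python
import warnings

import pandas as pd


def valid_time_mask(times: pd.Series, cells: pd.Series) -> pd.Series:
    """Return a boolean mask (True = keep) over the timepoints.

    Within each cuvette, a "reset point" is a time lower than the one before
    it. Every earlier timepoint in that cuvette whose time is >= the reset
    time is marked invalid.
    """
    keep = pd.Series(True, index=times.index)
    for _, group in times.groupby(cells):
        values = group.to_numpy()
        for i in range(1, len(values)):
            if values[i] < values[i - 1]:  # time went backwards: reset point
                earlier = group.iloc[:i]
                bad = earlier[earlier >= values[i]].index
                keep.loc[bad] = False
    removed = int((~keep).sum())
    if removed:
        # Two warnings, as in the PR description (wording here is invented).
        warnings.warn("Non-increasing time values found; file may be corrupted.")
        warnings.warn(f"Removed {removed} invalid timepoint(s).")
    return keep


# Synthetic, interleaved readings for two cuvettes; each starts with a stale
# 730.3 s point left over from a previous run.
times = pd.Series([730.3, 730.3, 0.0, 0.5, 10.0, 10.5])
cells = pd.Series(["SAMPLES_CELL_1", "SAMPLES_CELL_2"] * 3)
mask = valid_time_mask(times, cells)
print(times[mask].tolist())
```

Grouping by cuvette matters because readings from different cuvettes are interleaved in this example, so the stale point can only be recognized by comparing times within a single cuvette's sequence.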
@danolson1

I found that some of my .KD files were corrupted, so I made some additional edits to this branch to detect and fix that corruption.
