Skip to content

MRG: Use mffpy for EGI MFF event reading#13932

Open
PragnyaKhandelwal wants to merge 7 commits into
mne-tools:mainfrom
PragnyaKhandelwal:egi-mff-events
Open

MRG: Use mffpy for EGI MFF event reading#13932
PragnyaKhandelwal wants to merge 7 commits into
mne-tools:mainfrom
PragnyaKhandelwal:egi-mff-events

Conversation

@PragnyaKhandelwal
Copy link
Copy Markdown
Contributor

Reference issue (if any)

None.

What does this implement/fix?

This replaces the internal EGI MFF event-reading path with mffpy and keeps the existing RawMff event contract intact. It also adds a small compatibility shim in mne/fixes.py for mffpy timestamp parsing so real-world MFF files with nanosecond fractional seconds can still be read correctly.

Additional information

I tested this change against the EGI MFF test module with the current testing dataset, and the full mne/io/egi/tests/test_egi.py suite passes locally.

This branch intentionally keeps the scope narrow. The separate changelog fragment should use the final PR number once the PR is opened on GitHub.

AI disclosure:
I used GitHub Copilot to help draft the PR text and review the scope.

@PragnyaKhandelwal
Copy link
Copy Markdown
Contributor Author

Hi @scott-huberty! Here is the first micro-PR for the event reader refactor, exactly as we discussed yesterday.
Quick status update on what this PR includes:
Refactored the EGI MFF event-reading path to use mffpy while keeping the existing RawMff event contract intact.
Removed legacy unused event helper code to address the Vulture dead-code warnings (commit b10ea86).
Added a temporary compatibility shim in mne/fixes.py for 9-digit fractional-second beginTime values (using defusedxml).
Kept the scope intentionally narrow to event-reading internals only.

Upstream Update:
I raised the issue here BEL-Public/mffpy#138 regarding the timestamp bug. The maintainer got back to me immediately and confirmed their pending PR #133 fixes this case! Once we review this PR and figure out the test failures, I will update the TODO comment in our shim to explicitly track their PR

Failing Checks:
The style checks are green now, but we are hitting some functional test failures in the CI. I am going to look into the logs to track down the mismatch, but if you have a minute to look at the failing CI checks and see if it's something obvious that I'm missing, I'd really appreciate the help!

@PragnyaKhandelwal PragnyaKhandelwal marked this pull request as ready for review May 29, 2026 13:55
Comment thread mne/io/egi/events.py
# mffpy.Reader for locating the Events.xml files inside the MFF.
_soft_import("mffpy", "reading EGI MFF data")
_soft_import("defusedxml", "reading EGI MFF data")
import defusedxml.ElementTree as DET
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you're using difusedxml instead of mffpy.XML.from_file( )?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for taking a look! There are two reasons for this:

MNE-Python has a strict internal policy requiring defusedxml for all XML parsing to protect against XML vulnerability attacks.

Because of the 9-digit fractional timestamp bug we discussed in issue BEL-Public/mffpy#138 calling mffpy.XML directly crashes the CI right now. I'm using defusedxml to route the parsing through a temporary shim in mne/fixes.py until your PR gets merged and released!

Copy link
Copy Markdown
Contributor

@pmolfese pmolfese May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struggle is real as we say. This might be something more for @scott-huberty and @drammock to ponder beyond this GSOC.

The MFF reader in mffpy currently used for reading epochs or evoked files is via Reader() is already using the mffpy XML.from_file() at least for getting header information and events (mne/io/egi/egimff.py). So there's already some exposure to another XML reader in the current codebase.

The pros of keeping the mffpy way of things is just how easy some things are like returning typed objects (.sensors, .epochs, .events), recover=True for malformed XML, and the fact that Reader() is already using it.

for evfile in sorted(glob(op.join(input_fname, "Events_*.xml"))):
	track = XML.from_file(evfile)
	for event in track.events:
		code = event.get("code")
		if code is None:
			continue
...
...

I suppose we could have a conversation with BEL and other mffpy maintainers about switching the current lxml backend to defusedxml.

My thesis is currently: "if the point of this project is to use mffpy to parse MFF files, then you should use mffpy's way of doing things and if those need changing, then we should probably change those upstream". Perhaps the compromise could be using the XML.* functions for this with the goal of changing things upstream in mffpy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - okay faster than expected I have a mostly working defusedxml backend for mffpy. Might be able to crank this out in a week or so.

Copy link
Copy Markdown
Contributor

@scott-huberty scott-huberty May 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drammock / @larsoner : 1) I assume it's ok if one of our dependencies parses XML files using lxml?

and 2) If mffpy devs cut a release after the aforementioned timestamp bug is fixed upstream... Is there any chance that MNE is comfortable with putting a lower pin on mffpy's latest release, so that we do not have to keep our shim around? 😁

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pins are fine for newly introduced dependencies. It's only an issue if we move from no pin to pin, or bump the pin.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's ok if one of our dependencies parses XML files using lxml?

we use defusedxml in our code because it closes some vulnerabilities present in the standard library. We cannot force dependencies to do likewise, but can suggest/encourage it upstream. IIRC, lxml has similar safeguards to defusedxml, but is a much more heavyweight package (does more stuff). I'd need to look into it to be sure though.

If a required dependency used lxml then I'd actually consider switching us to use it too --- why have 2 libs installed if one will suffice? But mffpy will be an optional dep (right?) so that logic doesn't necessarily apply, at least not as strongly, esp if there are salient differences in security or install size or API convenience.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: sorry for the in Felicity of my last message, I just saw that @pmolfese is already looking into what it would take to switch to defusedxml upstream. Thanks! LMK if I can be of help there.

Comment thread mne/io/egi/events.py Outdated
files_list = []
tracks = []
for xml_name in files_list:
if not xml_name.lower().endswith(".xml"):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have you parsing all XML files instead of just event ones. Probably unnecessary file IO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch here! You are completely right—parsing info.xml and the others here is totally unnecessary I/O. I will update this loop to explicitly filter for the event files (e.g., startswith("Events_") or similar) in my next commit. Thank you!

Comment thread mne/io/egi/events.py
if event_start is None:
continue
start_sec = (event_start - start_time).total_seconds()
code_str = ev.get("code", "")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably a couple checks you could add:

  1. codes are supposed to be 4 characters (e.g. "STIM") and if not, then you're probably not reading a real MFF file
  2. You probably also want to read in the labels for use if/when you add annotations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are really great suggestions. Because this is the very first step of my GSoC project, my mentors requested that I keep this initial micro-PR strictly confined to 1:1 functional parity with MNE's legacy reader. The legacy reader didn't enforce the 4-character limit or map the labels, so I want to avoid expanding the scope just yet.
However, adding proper annotations is on my roadmap for Phase 2 here #13926, so I will definitely be referencing those labels when we get there!

@scott-huberty
Copy link
Copy Markdown
Contributor

Thx for taking a look at this @pmolfese !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants