Fix NCU CSV NaN parser crash on empty unit cells #134

Open
jiannanWang wants to merge 1 commit into main from jiannanWang/fix-ncu-csv-nan-parser

@jiannanWang
Contributor

The NCU CSV report contains a header row, including
"ID","Process ID","Process Name",…,"Kernel Name",…,"Block Size","Grid Size",…,"sm__cycles_active.avg",…
followed by a units row:
"","","","","","","","","","",…,"cycle","%","block","block",…

The units row is the one we're parsing. It contains a unit string for every metric column (cycle, %, block, register, …) but empty strings for the identifier columns (ID, Kernel Name, Block Size, Grid Size) — because IDs and names don't have units.

When pandas parses those "" cells, it loads them as ``numpy.float64(nan)``. The later substring check ``tok in x``, evaluated with ``x`` being NaN, then raises ``TypeError: argument of type 'float' is not iterable``.
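The failure can be reproduced without pandas. The following is a minimal sketch, not the project's code: ``unit_tokens``, ``unit_row``, and ``naive_has_unit`` are hypothetical names chosen for illustration, and the real token list in ``load_ncu_metrics`` may differ.

```python
# Hypothetical unit tokens; the real list in load_ncu_metrics may differ.
unit_tokens = ("cycle", "%", "block")

# A units row as it looks after parsing: NaN for the identifier columns,
# unit strings for the metric columns.
unit_row = [float("nan"), float("nan"), "cycle", "%"]


def naive_has_unit(cell):
    """Substring check that assumes every cell is a string."""
    return any(tok in cell for tok in unit_tokens)


# A string cell works as expected.
string_cell_matches = naive_has_unit(unit_row[2])

# A NaN cell is a float, so the `in` operator has nothing to search.
try:
    naive_has_unit(unit_row[0])
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

The ``in`` operator needs a container or iterable on its right-hand side, which a float is not, so the first NaN cell in the row is enough to abort the check.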

This PR fixes the TypeError by making the unit-row detection NaN-safe.

The unit-row detection in ``load_ncu_metrics`` reads the first CSV row,
lowercases each cell, and checks whether any cell contains a
unit-marker substring. ``Series.str.lower()`` propagates ``pd.NA`` /
``NaN`` cells through as float NaN, and the subsequent
``any(tok in x for tok in unit_tokens)`` then raises::

    TypeError: argument of type 'float' is not iterable

This surfaces on real NCU outputs (e.g. profiles from the
``gemma3_swiglu`` benchmark in SOL-ExecBench) and aborts the entire
profiling step, which then propagates up to the worker as
``Profiling failed`` and the round produces 0 successful kernels.

Fix: chain ``.fillna("")`` after ``.str.lower()`` so NaN cells become
empty strings before the substring check. The matching logic itself is
unchanged — empty strings legitimately don't contain any of the unit
tokens.
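As a rough illustration of the fix, the sketch below applies ``.str.lower().fillna("")`` to a hand-built row. It assumes pandas and NumPy are installed; ``unit_tokens``, ``first_row``, and ``is_unit_row`` are hypothetical names, and the actual code in ``load_ncu_metrics`` may be structured differently.

```python
import numpy as np
import pandas as pd

# Hypothetical unit tokens; the real list in load_ncu_metrics may differ.
unit_tokens = ("cycle", "%", "block")

# Stand-in for the first CSV row: NaN for identifier columns,
# unit strings for metric columns.
first_row = pd.Series([np.nan, np.nan, "Cycle", "%", "block"], dtype=object)

# .str.lower() leaves NaN cells as NaN; .fillna("") turns them into
# empty strings so that every cell supports the `in` operator.
cells = first_row.str.lower().fillna("")

# Same matching logic as before, now applied to strings only.
is_unit_row = any(any(tok in x for tok in unit_tokens) for x in cells)
```

An empty string contains none of the tokens, so the identifier columns simply never match and the result depends only on the metric columns.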

Test plan:
- ``pytest tests/`` (existing suite passes)
- No longer reproduces: running the optimizer against
  ``ka-review-gate-runs/gemma3_swiglu`` previously hit the TypeError on
  every worker's first NCU profile; with the patch the round produces
  successful kernels.
meta-cla Bot added the CLA Signed label on May 8, 2026.