Fix regex pattern for performance log parsing#102
Fix regex pattern for performance log parsing#102Saiamd999 wants to merge 1 commit intoSaiamd999/ROCM-21596from
Conversation
There was a problem hiding this comment.
Pull request overview
Updates MADEngine’s performance log parsing to tolerate additional suffix/punctuation after the numeric performance value (e.g., , and /s) so performance extraction works with the updated log format mentioned in ROCM-21596.
Changes:
- Relaxed the
performance:regex to allow optional/s-like suffixes and an optional comma after the numeric value. - Strips a trailing comma from the parsed metric token.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import re | ||
|
|
||
| perf_pattern = r"performance:\s*([\d.]+)\s+(\S+)" | ||
| perf_pattern = r"performance:\s*([\d.]+)[/a-zA-Z]*,?\s+(\S+)" |
There was a problem hiding this comment.
The new performance regex is overly permissive: [/a-zA-Z]* will accept arbitrary alphabetic suffixes on the numeric value (e.g., performance: 12abc samples_per_second will now parse as 12.0), which can silently mis-parse malformed logs. Since the stated goal is to handle a trailing "," and/or "/s", consider restricting this portion to those exact tokens (and, if needed, support both orders like "/s," vs ",/s") instead of allowing any letters.
| perf_pattern = r"performance:\s*([\d.]+)[/a-zA-Z]*,?\s+(\S+)" | |
| perf_pattern = r"performance:\s*([\d.]+)(?:/s,|,/s|/s|,)?\s+(\S+)" |
| perf_pattern = r"performance:\s*([\d.]+)[/a-zA-Z]*,?\s+(\S+)" | ||
| match = re.search(perf_pattern, log_content) | ||
| if not match: | ||
| return None | ||
|
|
||
| value = float(match.group(1)) | ||
| metric = match.group(2) | ||
| metric = match.group(2).rstrip(',') |
There was a problem hiding this comment.
This parsing change is behaviorally significant (new supported formats like trailing "/s" and commas). Please add unit tests for _parse_performance_from_log covering the new formats (e.g., performance: 12.34/s samples_per_second, performance: 12.34, samples_per_second, and metric with a trailing comma) to prevent regressions; there are already unit tests for other helpers in tests/unit/test_deployment.py.
|
Code Review — PR#102: Fix regex pattern for performance log parsing
SummaryThe fix correctly addresses the root cause: the performance log parser in Review NotesChange 1 — Regex relaxation (✅ Correct, minor improvement possible) - perf_pattern = r"performance:\s*([\d.]+)\s+(\S+)"
+ perf_pattern = r"performance:\s*([\d.]+)[/a-zA-Z]*,?\s+(\S+)"The new pattern handles the reported
Minor suggestion: perf_pattern = r"performance:\s*([\d.]+)[^,\s]*,?\s+(\S+)"
Change 2 — Metric cleanup (✅ Correct) - metric = match.group(2)
+ metric = match.group(2).rstrip(',')Correctly strips trailing commas from the metric token. No issues here. No regression risk: existing valid VerdictApprove with optional suggestion. The fix is correct and safe to merge. The Review generated with AI assistance (ROCM-21596 investigation). |
Motivation
we have (',' and ‘/s’) after the perf numbers which breaks the regex pattern matching in the madengine framework.
fixes: https://amd-hub.atlassian.net/browse/ROCM-21596
Technical Details
Test Plan
Test Result
Uploading pyt_davit_singlegpu_pyt_davit.ubuntu.amd.run.live.log…
jax_resnet_jax_flax.ubuntu.amd.run.live.log
Submission Checklist