Fix catastrophic regex backtracking in shell framework#4658
Fix catastrophic regex backtracking in shell framework#4658thrix wants to merge 3 commits intoteemtee:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
The catastrophic regex backtracking is fixed by replacing re.findall with a line-by-line re.search. The added unit tests cover the fix and include a performance regression test. Two suggestions are provided to improve the new test code, focusing on type correctness and assertion specificity.
|
@gemini-code-assist please review again |
There was a problem hiding this comment.
Code Review
This pull request addresses a catastrophic backtracking issue in _extract_failures by using a line-by-line search. The added unit tests cover the changes, including a performance regression test. I have provided two suggestions for minor improvements.
Replace re.findall() with a line-by-line re.search() approach in _extract_failures() to avoid catastrophic backtracking on long lines. The old regex r'.*\b(?:error|fail)\b.*' caused tmt processes to hang for hours when test output contained lines over 1M characters (e.g. base64-encoded in-toto attestation payloads from container builds). Fixes teemtee#4657 Assisted-by Claude Code
- Use Path("dummy.log") instead of MagicMock() for log_path parameter
- Make test assertions exact lists instead of length/membership checks
Assisted-by Claude Code
- Remove TestExtractFailures class, use plain test functions (happz) - Move import tmt.utils to top of file (gemini-code-assist) - Pre-compile regex as module-level FAILURE_PATTERN constant (gemini-code-assist) Assisted-by Claude Code
6884a58 to
4063a35
Compare
|
@gemini-code-assist please review again |
There was a problem hiding this comment.
Code Review
This pull request effectively resolves a critical performance issue related to catastrophic regex backtracking by implementing a more efficient line-by-line search strategy. The accompanying unit tests are thorough and include a valuable performance regression test to prevent this issue from recurring. My feedback includes one suggestion to enhance the maintainability of the new performance test.
| def _make_invocation(log_content: str) -> MagicMock: | ||
| """Create a mock TestInvocation that returns given log content on read.""" | ||
| mock = MagicMock() | ||
| mock.phase.step.plan.execute.read.return_value = log_content | ||
| return mock |
There was a problem hiding this comment.
We use fixture, please make this a proper fixture
| def test_word_boundary() -> None: | ||
| """Words like 'errorless' or 'failover' should not match.""" | ||
| invocation = _make_invocation("errorless operation\nfailover complete\n") | ||
| assert _extract_failures(invocation, Path("dummy.log")) == [] | ||
|
|
||
|
|
||
| def test_file_error() -> None: | ||
| invocation = MagicMock() |
There was a problem hiding this comment.
Inconsistent documentaiton of tests
| def test_no_failures() -> None: | ||
| invocation = _make_invocation("all good\nnothing wrong here\n") | ||
| assert _extract_failures(invocation, Path("dummy.log")) == [] | ||
|
|
||
|
|
||
| def test_error_match() -> None: | ||
| invocation = _make_invocation("line1\nsome error occurred\nline3\n") | ||
| result = _extract_failures(invocation, Path("dummy.log")) | ||
| assert result == ["some error occurred"] | ||
|
|
||
|
|
||
| def test_fail_match() -> None: | ||
| invocation = _make_invocation("test passed\ntest fail here\ndone\n") | ||
| result = _extract_failures(invocation, Path("dummy.log")) | ||
| assert result == ["test fail here"] |
There was a problem hiding this comment.
So much repetition. Please make these parametrized
| def test_long_lines_performance() -> None: | ||
| """ | ||
| Regression test: regex must not cause catastrophic backtracking | ||
| on very long lines. | ||
|
|
||
| The original implementation used re.findall(r'.*\\b(?:error|fail)\\b.*', ...) | ||
| which caused O(n^2) or worse backtracking on long lines without matches, | ||
| hanging tmt processes for hours on 1M+ character lines (e.g. base64-encoded | ||
| in-toto attestation payloads in container build logs). | ||
| """ | ||
| # Build a log with a 1M-character line (similar to base64 attestation data) | ||
| long_line = "A" * 1_000_000 | ||
| log_content = f"start\n{long_line}\nsome error here\nend\n" | ||
|
|
||
| invocation = _make_invocation(log_content) | ||
|
|
||
| start = time.monotonic() | ||
| result = _extract_failures(invocation, Path("dummy.log")) | ||
| elapsed = time.monotonic() - start | ||
|
|
||
| # Must complete in under 5 seconds (the old regex took 5+ seconds | ||
| # on just 10k characters, and would never complete on 1M characters) | ||
| assert elapsed < 5.0, ( | ||
| f"_extract_failures took {elapsed:.1f}s on a log with a 1M-char line; " | ||
| f"likely catastrophic regex backtracking" | ||
| ) | ||
| assert result == ["some error here"] |
There was a problem hiding this comment.
Checking test times in tests is very iffy to me. But in absence of benchmark infrastructure 🤷 . I am also fine with just ignoring this test since it fixed a rather edge-case situation
| start = time.monotonic() | ||
| result = _extract_failures(invocation, Path("dummy.log")) | ||
| elapsed = time.monotonic() - start |
There was a problem hiding this comment.
The choice of monotic here seems arbitrary?
| # Must complete in under 5 seconds (the old regex took 5+ seconds | ||
| # on just 10k characters, and would never complete on 1M characters) | ||
| assert elapsed < 5.0, ( |
There was a problem hiding this comment.
5 seconds is a very arbitrary cut-off. If the old one would have finished these in 5.001 seconds, and this change brings it down to 4.999, this test does not really tell anything. Similarly if we make changes to the code and it gradually increases the test time to the point that it takes 4.999 tests still pass.
If there is log-scale changes then confirm the intended change are at least an order of magnitude difference and account more realistically buffers of what time scale it is expected
| # Build a log with a 1M-character line (similar to base64 attestation data) | ||
| long_line = "A" * 1_000_000 | ||
| log_content = f"start\n{long_line}\nsome error here\nend\n" |
There was a problem hiding this comment.
What if the long line does not have a new-line that separates from the some error part, what if it is comprised of multiple words which would match the \b?
Summary
Fix
_extract_failures()intmt/frameworks/shell.pywhich causes tmt processes to hang for hours when test output contains very long lines.re.findall(r'.*\b(?:error|fail)\b.*', ...)with a line-by-linere.search()approach that avoids catastrophic regex backtracking_extract_failures()including a regression test with 1M-character linesFixes #4657
Context
Testing Farm jobs from the Hummingbird project were getting stuck in "running" state for hours. The test output files contained base64-encoded in-toto attestation payloads — single lines of 1M+ characters — causing the regex engine to backtrack indefinitely.
Detailed write-up: https://vadkerti.net/posts/debugging-stuck-tmt-processes-with-claude-code/
Test plan
hatch -e dev run -- python -m pytest tests/unit/test_shell.py -v)