Skip to content

[Security] MP2 regex vulnerable to catastrophic backtracking (ReDoS) #151

Description

@mimran-khan

Summary

Imagine a toll booth that normally processes cars in 2 seconds each. One day a truck arrives with a license plate in a specific unusual format. The toll booth computer tries to read it and freezes — not for 2 seconds, but for 2 hours. Meanwhile, the entire highway backs up because one license plate crashed the system.

SkillSpector's MP2 "Context Window Stuffing" static pattern contains a regex that is vulnerable to catastrophic backtracking (ReDoS — Regular Expression Denial of Service). When this regex encounters certain input patterns, the regex engine's execution time grows exponentially, causing the scanner to hang indefinitely on a single file. A malicious skill can exploit this to DoS the scanning pipeline.

Why This Matters — Real-World Scenario

Scenario: Automated scanning pipeline in CI

An organization runs SkillSpector in their CI/CD pipeline with a 5-minute timeout per skill scan. A malicious contributor discovers that certain text patterns cause the MP2 regex to take exponential time. They craft a skill that looks normal except for one file containing:

abababababababababababababababababababababab

(a specific repeating pattern that triggers the catastrophic backtracking)

The CI job hits the regex, and the scanner process hangs. Because the regex engine is stuck in a tight CPU loop (not waiting on I/O), the process doesn't respond to graceful shutdown signals quickly. The CI job times out after 5 minutes and is marked as failed.

If the organization processes skills in a queue, this creates a denial-of-service: legitimate skills behind the malicious one wait indefinitely. If the scanner runs with limited concurrency (e.g., 4 parallel scans), 4 carefully-crafted submissions can block the entire pipeline.

Worse, if there's no timeout and the process runs in a serverless function (AWS Lambda, Cloud Run), the function runs until memory/time limits kill it — consuming billable compute time.

Reproduction

import re
import time

# The vulnerable regex from MP2 patterns
pattern = re.compile(r"((\S)(?!\2).{1,19}?)\1{20,}")

# Craft input that triggers catastrophic backtracking
# The regex tries to match a repeating pattern of 2+ unique chars
# When it can't find a full match, it backtracks exponentially
evil_input = "ab" * 30 + "c"  # 61 chars — should be instant but isn't

start = time.time()
result = pattern.search(evil_input)
elapsed = time.time() - start

print(f"Time: {elapsed:.2f}s")  # Expected: >10s (exponential growth with input length)
print(f"Match: {result}")
# Create a skill with the trigger pattern
mkdir -p /tmp/redos-skill
cat > /tmp/redos-skill/SKILL.md << 'SKILLEOF'
---
name: redos-test
description: Tests regex performance
---
# ReDoS Test
SKILLEOF

# Create file with pattern that triggers backtracking
python -c "print('ab' * 50 + 'c')" > /tmp/redos-skill/trigger.txt

# This scan will hang on the MP2 pattern
timeout 30 skillspector scan /tmp/redos-skill/ --no-llm
# Expected: timeout after 30s (scanner hangs on regex)

Root Cause

In src/skillspector/nodes/analyzers/static_patterns_memory_poisoning.py, line 84:

MP2_PATTERNS = [
    # ... other patterns ...
    (r"((\S)(?!\2).{1,19}?)\1{20,}", 0.8),  # "Context Window Stuffing"
]

This regex attempts to detect repeated text patterns (potential context window stuffing). Breaking it down:

  • (\S) — capture a non-whitespace character
  • (?!\2) — followed by a character that is NOT the same (negative lookahead)
  • .{1,19}? — followed by 1-19 characters (lazy quantifier)
  • \1{20,} — the entire group repeated 20+ times

The catastrophic backtracking occurs because:

  1. .{1,19}? is a lazy quantifier inside a group that must repeat 20+ times
  2. When the full match fails, the engine backtracks through all possible lengths of .{1,19}? for each of the 20+ repetitions
  3. The combination of nested quantifiers ({1,19}? inside \1{20,}) creates exponential state space
  4. Input that almost matches but doesn't (e.g., repeating patterns with a slight break) forces maximum backtracking

The time complexity is O(2^n) where n is the input length, rather than the expected O(n).

Impact

  • Denial of Service: A single crafted file can hang the scanner process indefinitely
  • CI/CD pipeline blocking: Scanning queues stall when one scan hangs
  • Resource exhaustion: CPU pegged at 100% during backtracking with no progress
  • No timeout protection: The regex engine does not support timeouts in Python's re module
  • Billable compute waste: Cloud-hosted scans (Lambda, Cloud Run) burn compute budget on stuck processes
  • Trivial to exploit: The trigger pattern is simple to construct and doesn't look suspicious in a code review

Affected Version

SkillSpector v2.2.3

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions