[Bug] Static pattern analyzers fire on markdown documentation and code blocks, not just executable skill logic #135

@mimran-khan

Summary

If you write a cookbook and include the sentence "be careful with knives — they can cut you," a safety scanner should not flag your cookbook as a weapon. But that's exactly what SkillSpector does with documentation.

Consider a skill that has a docs/deployment.md file showing users how to manually check a service:

```bash
curl -k https://staging.example.com/health
```

This is a documentation example: it is not executable code and not something the agent will run, only a reference for humans reading the skill. But SkillSpector's static pattern analyzers flag it as "Tool Misuse: insecure network call" with the same severity as if the skill's actual Python code were running curl -k.

The problem: 10 of 12 static analyzers have zero awareness of whether they're scanning executable code or markdown documentation. Only 2 analyzers (excessive_agency and memory_poisoning) check if a match is inside a code example. The rest fire indiscriminately on any text that matches a regex pattern — including fenced code blocks, usage examples, and reference documentation that will never be executed by an agent.

Relation to Existing Issues

This problem has been reported from the user-symptom side by multiple contributors:

What this issue adds:

  1. Root cause identification: The exact code-level reason — is_code_example() exists in common.py but is called by only 2 of 12 static analyzers. This is the specific integration gap.
  2. Comprehensive fix scope: Not just one analyzer (P2) or one scoring formula, but a unified filtering layer in run_static_patterns() that applies to ALL 12 analyzers before findings are emitted.
  3. File-type-aware strategy: Distinguishes between hard-dropping (non-executable file types like .md, .json, .yaml) vs. confidence-downweighting (executable files where a code-example context might be real but should be lower confidence) — avoiding the security hole where an attacker could suppress findings in .py files by salting code-example indicators.
  4. PR #140, "fix(static-patterns): filter false positives from documentation and code examples" (our fix): implements this at the static_runner.py level with _NON_EXECUTABLE_FILE_TYPES, _DOCUMENTATION_CONFIDENCE_FACTOR, and _CODE_EXAMPLE_CONFIDENCE_FACTOR, giving a single integration point rather than per-analyzer patches.
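The file-type-aware strategy in items 3 and 4 could look roughly like the sketch below. This is not the code from PR #140: the `Finding` dataclass, the `filter_findings` helper, the suffix set, and the 0.5 factor are all assumptions made for illustration. Findings in non-executable file types are dropped outright, while findings in executable files that sit in a code-example context are kept at reduced confidence, so salting a .py file with example markers cannot remove a finding entirely.

```python
from dataclasses import dataclass, replace
from pathlib import PurePosixPath

# Assumed values for illustration; the real sets and factors in the PR are not shown in this issue.
NON_EXECUTABLE_SUFFIXES = frozenset({".md", ".json", ".yaml"})
CODE_EXAMPLE_CONFIDENCE_FACTOR = 0.5


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; SkillSpector's real type may differ."""
    rule_id: str
    path: str
    confidence: float
    in_code_example: bool = False


def filter_findings(findings: list[Finding]) -> list[Finding]:
    """Hard-drop findings in non-executable files; downweight code-example hits elsewhere."""
    kept = []
    for finding in findings:
        suffix = PurePosixPath(finding.path).suffix.lower()
        if suffix in NON_EXECUTABLE_SUFFIXES:
            continue  # documentation or data file: never executed by the agent
        if finding.in_code_example:
            # Executable file: keep the finding, but at lower confidence.
            finding = replace(
                finding,
                confidence=finding.confidence * CODE_EXAMPLE_CONFIDENCE_FACTOR,
            )
        kept.append(finding)
    return kept


findings = [
    Finding("TM1", "docs/usage.md", 0.8),
    Finding("TM1", "tool.py", 0.8, in_code_example=True),
    Finding("TM1", "tool.py", 0.9),
]
result = filter_findings(findings)
```

Here the docs/usage.md finding disappears, the code-example hit in tool.py survives at half confidence, and the plain tool.py finding is untouched.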

Reproduction

Create a skill with documentation that references common shell patterns:

docs-skill/
├── SKILL.md
├── tool.py
└── docs/
    ├── usage.md
    └── deployment.md

SKILL.md:

---
name: docs-skill
description: A deployment automation skill
---
# Deployment Skill
Automates cloud deployments.

tool.py:

def deploy(env: str) -> str:
    """Safe deployment function that calls an internal API."""
    return f"Deployed to {env}"

docs/usage.md:

# Usage Guide

## Example: Checking Service Health

```bash
curl -k https://internal-api.example.com/health
curl --insecure https://staging.example.com/status
```

## Example: Clearing Build Cache

```bash
rm -rf /tmp/build-artifacts
git reset --hard origin/main
```

docs/deployment.md:

# Deployment Procedures

## Rolling Restart

```bash
kubectl exec -it pod-name -- /bin/sh
wget https://artifacts.example.com/release.tar.gz
eval "$(kubectl get secret deploy-key -o json | jq -r '.data.key')"
```

Then scan the skill:

skillspector scan ./docs-skill/ --no-llm --format json
# Multiple TM1, EA2, SQP findings, all from the .md documentation files
# tool.py (the only executable code) has zero findings
# Score: HIGH or CRITICAL due to documentation examples

Root Cause

1. is_code_example() only used in 2 of 12 analyzers

grep -rl "is_code_example" src/skillspector/nodes/analyzers/
# Only: static_patterns_excessive_agency.py, static_patterns_memory_poisoning.py, common.py

The static_patterns_tool_misuse analyzer (TM1) has zero documentation filtering:

# static_patterns_tool_misuse.py (excerpt): no call to is_code_example anywhere
for pattern, confidence in TM1_PATTERNS:
    for match in re.finditer(pattern, content, re.IGNORECASE | re.MULTILINE):
        # Fires on ANY text match, including markdown code blocks
        ...
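
To make the effect concrete, here is a minimal, self-contained sketch of an ungated scan loop. The `INSECURE_CURL` pattern and the `scan` helper are hypothetical stand-ins (the real TM1_PATTERNS entries are not shown in this issue); the point is only that the same regex produces an identical finding for a markdown example and for real Python code.

```python
import re

# Hypothetical stand-in for one TM1 pattern and its confidence.
INSECURE_CURL = (r"curl\s+(?:-k|--insecure)\b", 0.8)


def scan(content: str) -> list[tuple[int, str, float]]:
    """Return (line_number, matched_text, confidence) for every match, with no context check."""
    pattern, confidence = INSECURE_CURL
    findings = []
    for match in re.finditer(pattern, content, re.IGNORECASE | re.MULTILINE):
        # 1-based line number of the match start.
        line_no = content.count("\n", 0, match.start()) + 1
        findings.append((line_no, match.group(0), confidence))
    return findings


markdown_doc = "# Usage\n\n```bash\ncurl -k https://staging.example.com/health\n```\n"
python_code = (
    "import subprocess\n"
    'subprocess.run("curl -k https://staging.example.com/health", shell=True)\n'
)

doc_findings = scan(markdown_doc)
code_findings = scan(python_code)
```

Both calls return one finding with the same confidence, which is the behaviour described above: the scan has no notion of where the text came from.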

2. is_code_example() itself is too narrow

_CODE_EXAMPLE_INDICATORS = (
    "```",
    "example:",
    "for example",
    # ...only ~14 indicators
)

It only checks whether one of these indicator strings (such as a triple backtick) appears within a ±3-line context window around the match. It doesn't:

  • Parse markdown structure (fenced code blocks have start/end boundaries)
  • Distinguish "this skill will execute X" from "this document describes X"
  • Account for documentation files in docs/, procedures/, references/ subdirectories
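
The first and third gaps could be addressed along these lines. This is a sketch under stated assumptions, not SkillSpector code: `fenced_line_numbers`, `is_documentation_path`, and `DOC_DIRS` are hypothetical names, and the fence handling is a simplification of the CommonMark rules (same fence character, closing fence at least as long as the opening one, an unclosed fence runs to the end of the file).

```python
import re
from pathlib import PurePosixPath

# Fence line: up to 3 spaces of indent, then 3+ backticks or tildes.
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")

# Assumed directory names, taken from the list above.
DOC_DIRS = frozenset({"docs", "procedures", "references"})


def fenced_line_numbers(markdown: str) -> set[int]:
    """Return 1-based numbers of lines inside fenced code blocks (fence lines excluded)."""
    inside: set[int] = set()
    open_fence = None  # (fence_char, fence_length) while a block is open
    for number, line in enumerate(markdown.splitlines(), start=1):
        m = _FENCE.match(line)
        if open_fence is None:
            if m:
                open_fence = (m.group(1)[0], len(m.group(1)))
            continue
        closes = (
            m is not None
            and m.group(1)[0] == open_fence[0]
            and len(m.group(1)) >= open_fence[1]
            and not line[m.end():].strip()  # a closing fence carries no info string
        )
        if closes:
            open_fence = None
        else:
            inside.add(number)
    return inside


def is_documentation_path(path: str) -> bool:
    """True if any parent directory of the file is a known documentation directory."""
    return any(part in DOC_DIRS for part in PurePosixPath(path).parts[:-1])


doc = (
    "# Usage\n\n```bash\ncurl -k https://example.com\nrm -rf /tmp/x\n```\n\n"
    "Run curl -k with care.\n"
)
fenced = fenced_line_numbers(doc)
```

Unlike a ±3-line window, this tracks where each block starts and ends, so line 8 of the sample (prose after the block) is not treated as part of a code example. A path check like `is_documentation_path` is probably best used as one signal alongside file type, since an executable script can also live under docs/.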

Impact

  • Skills with deployment/procedure documentation are systematically flagged as HIGH or CRITICAL
  • Developers documenting shell commands in their skill get penalized
  • The scanner cannot distinguish "skill instructs the agent to run curl --insecure" (genuine risk) from "documentation describes a manual procedure that uses curl --insecure" (informational only)
  • Makes the tool unreliable for any real-world skill that includes usage examples

Affected Version

SkillSpector v2.2.3
