
Proposal: canonical dangerous-callable resolver + shared evasion corpus as a fitness gate #181


@zied-jlassi

Problem

The behavioral analyzers detect dangerous primitives by matching a fixed set of name spellings, so each new way to spell the same primitive has needed its own branch: #114/#115 (import alias), #166 (getattr), and the builtins/importlib branch (#180). Enumerating spellings instead of canonicalizing them leaves a fresh blind spot for every spelling that was not enumerated. Examples that are currently undetected: subscript access such as __builtins__["exec"] and os.__dict__["system"], and the importlib.util / runpy / code machinery.
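To make the blind spot concrete, here is a minimal sketch of a spelling-enumerating matcher. It is not the project's analyzer code: `KNOWN_SPELLINGS` and `naive_hits` are hypothetical names invented for illustration, and `ast.unparse` assumes Python 3.9+.

```python
import ast

# Hypothetical stand-in for a per-spelling match list (not the real analyzer's).
KNOWN_SPELLINGS = {"exec", "builtins.exec", "os.system"}


def naive_hits(source):
    """Return the call targets whose source text is in the enumerated list."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            spelled = ast.unparse(node.func)  # text of the callee expression
            if spelled in KNOWN_SPELLINGS:
                hits.append(spelled)
    return hits


# Three spellings of the same two primitives.
SAMPLES = [
    'import os\nos.system("id")',
    'import os\nos.__dict__["system"]("id")',
    '__builtins__["exec"]("x")',
]

for sample in SAMPLES:
    print(repr(sample), "->", naive_hits(sample))
```

Only the first sample is flagged. The two subscript spellings reach the same callables but match nothing in the list, and closing that gap under the enumerate approach would mean adding yet another branch.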

Proposal

Introduce one chokepoint, resolve_to_canonical_sink(node, aliases, type_map), that reduces any spelling to a single canonical sink id. Covered spellings: bare/alias/from/as, builtins.*, importlib.import_module, getattr("lit"), subscript __builtins__["lit"] / <mod>.__dict__["lit"], and the dynamic-import / code-exec siblings. The resolver reuses the existing alias/type machinery and subsumes the per-idiom branches.

Pair it with a shared, parametrized evasion corpus (tests/evasion_corpus/): several equivalent spellings per primitive, plus FP-neighbors that must NOT match. The corpus is wired in as a fitness test, so a future missed spelling fails a shared gate instead of shipping silently.
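The idea can be sketched roughly as follows. This is an illustrative toy, not the prototype: only the name `resolve_to_canonical_sink` and its parameter list come from the proposal. The sink set, the helper names, the shape of `aliases` (local name mapped to a dotted canonical path), and the inline `CORPUS` are all assumptions, and `type_map` is accepted but unused here. It assumes Python 3.9+, where `ast.Subscript.slice` is the key expression itself.

```python
import ast

# Hypothetical canonical sink ids; the real set lives in the analyzers.
SINKS = {"builtins.exec", "builtins.eval", "os.system"}
BUILTIN_SINK_NAMES = {"exec", "eval"}


def _literal(node):
    """Return the value of a string-literal node, else None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


def _dotted(node, aliases):
    """Reduce an expression to a dotted canonical path, or None if unresolvable."""
    if isinstance(node, ast.Name):
        if node.id == "__builtins__":
            return "builtins"
        return aliases.get(node.id, node.id)
    if isinstance(node, ast.Attribute):
        base = _dotted(node.value, aliases)
        return f"{base}.{node.attr}" if base else None
    if isinstance(node, ast.Call) and not node.keywords:
        fn = _dotted(node.func, aliases)
        # importlib.import_module("lit") / __import__("lit") -> module "lit"
        if fn in ("importlib.import_module", "__import__") and node.args:
            return _literal(node.args[0])
        # getattr(obj, "lit") -> obj.lit
        if fn == "getattr" and len(node.args) >= 2:
            base, name = _dotted(node.args[0], aliases), _literal(node.args[1])
            return f"{base}.{name}" if base and name else None
        return None
    if isinstance(node, ast.Subscript):
        base, key = _dotted(node.value, aliases), _literal(node.slice)
        if not (base and key):
            return None
        if base.endswith(".__dict__"):
            base = base[: -len(".__dict__")]  # <mod>.__dict__["lit"] -> <mod>.lit
        elif base != "builtins":
            return None  # ordinary subscript such as handlers["exec"]
        return f"{base}.{key}"
    return None


def resolve_to_canonical_sink(node, aliases, type_map=None):
    # type_map mirrors the proposed signature; this sketch does not use it.
    path = _dotted(node, aliases)
    if path in BUILTIN_SINK_NAMES:
        path = f"builtins.{path}"
    return path if path in SINKS else None


def sinks_in(source, aliases=None):
    """Canonical sink ids for every call in the source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            sink = resolve_to_canonical_sink(node.func, aliases or {}, {})
            if sink:
                found.append(sink)
    return found


# Miniature corpus: (source, aliases, expected sinks).
CORPUS = [
    ('exec("x")', {}, ["builtins.exec"]),
    ('builtins.exec("x")', {}, ["builtins.exec"]),
    ('__builtins__["exec"]("x")', {}, ["builtins.exec"]),
    ('getattr(os, "system")("id")', {}, ["os.system"]),
    ('os.__dict__["system"]("id")', {}, ["os.system"]),
    ('importlib.import_module("os").system("id")', {}, ["os.system"]),
    ('sh("id")', {"sh": "os.system"}, ["os.system"]),
    # FP-neighbors that must NOT match.
    ('handlers["exec"]("x")', {}, []),
    ('self.system("id")', {}, []),
    ('cursor.execute("select 1")', {}, []),
]


def gate_failures():
    """Corpus entries whose detected sinks differ from the expectation."""
    return [src for src, aliases, want in CORPUS if sinks_in(src, aliases) != want]
```

A fitness test would then reduce to asserting that `gate_failures()` is empty. In a real suite the corpus would more likely be loaded from tests/evasion_corpus/ and parametrized per entry, so that a missed spelling is reported by name.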

Scope / non-goals

In scope: statically resolvable spellings. Explicit non-goals (def-use / taint territory, left as known residual gaps): non-literal / kwarg getattr(os, name), variable subscript keys, computed attribute strings, module-name-from-variable, and local indirection (f = os.system; f(x)). Receiver-gated instance methods (spec.loader.exec_module, code.InteractiveInterpreter().runsource) fire only when the receiver resolves to the real machinery.
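Receiver gating could look something like the sketch below. Everything in it is an assumption made for illustration: the `RECEIVER_GATED` table, the sink ids, the helper names, and a `type_map` keyed by the receiver's source text. Alias handling is left out, and `ast.unparse` assumes Python 3.9+.

```python
import ast

# Hypothetical table: (resolved receiver type, method name) -> canonical sink id.
RECEIVER_GATED = {
    ("code.InteractiveInterpreter", "runsource"): "code.InteractiveInterpreter.runsource",
    ("importlib.abc.Loader", "exec_module"): "importlib.abc.Loader.exec_module",
}


def receiver_type(node, type_map):
    """Best-effort static type of a method receiver, or None if unknown."""
    if isinstance(node, ast.Call):
        # Constructor call used inline, e.g. code.InteractiveInterpreter().runsource
        return ast.unparse(node.func)
    # Otherwise rely on whatever the (assumed) type map knows about this expression.
    return type_map.get(ast.unparse(node))


def gated_sink(call, type_map):
    """Sink id for a method call, but only if the receiver type is a known one."""
    func = call.func
    if not isinstance(func, ast.Attribute):
        return None
    return RECEIVER_GATED.get((receiver_type(func.value, type_map), func.attr))


def parse_call(source):
    """Parse a single call expression into its ast.Call node."""
    return ast.parse(source, mode="eval").body


known = {"interp": "code.InteractiveInterpreter"}
print(gated_sink(parse_call('interp.runsource("x")'), known))  # receiver resolved
print(gated_sink(parse_call('interp.runsource("x")'), {}))     # receiver unknown
```

With the receiver resolved, the call maps to a sink id. With an empty type map, the same method name on an unknown object yields `None`, which keeps an unrelated class that happens to define `runsource` or `exec_module` from being flagged.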

Status

I have a working prototype (chokepoint + corpus + fitness test) that reproduces all current detections with no regression, adds the subscript / __dict__ / sibling spellings, and composes with #166. Looking for a maintainer yes/no on the direction before opening the PR.

Prepared with AI assistance; reviewed and validated by the author.
