Problem
The behavioral analyzers detect dangerous primitives by matching a set of name spellings, and each new way to spell the same primitive has needed its own branch: #114/#115 (import alias), #166 (getattr), and the builtins/importlib branch (#180). This enumerate-instead-of-canonicalize pattern leaks a fresh blind spot per missed spelling — e.g. subscript __builtins__["exec"] / os.__dict__["system"], and importlib.util / runpy / code machinery, which are currently undetected.
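To make the blind spot concrete, the short check below shows that several of the spellings named above evaluate to the same function object at runtime, so a matcher keyed on one spelling says nothing about the others. It uses only the standard library. The variable names are illustrative, and `vars(builtins)` stands in for `__builtins__[...]` because `__builtins__` is a module in `__main__` but a dict in other modules (a CPython implementation detail).

```python
import builtins
import importlib
import os

# Each expression evaluates to the very same function object as os.system.
os_system_spellings = [
    os.system,
    getattr(os, "system"),
    os.__dict__["system"],
    importlib.import_module("os").system,
]
all_same_os_system = all(f is os.system for f in os_system_spellings)

# The same holds for the exec builtin.
exec_spellings = [
    exec,
    builtins.exec,
    getattr(builtins, "exec"),
    vars(builtins)["exec"],
]
all_same_exec = all(f is exec for f in exec_spellings)
```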
Proposal
A single chokepoint, resolve_to_canonical_sink(node, aliases, type_map), reduces any statically resolvable spelling to one canonical sink id: bare/alias/from/as names, builtins.*, importlib.import_module, getattr(<mod>, "lit"), subscript __builtins__["lit"] / <mod>.__dict__["lit"], and the dynamic-import / code-exec siblings. It reuses the existing alias/type machinery and subsumes the per-idiom branches. It is paired with a shared, parametrized evasion corpus (tests/evasion_corpus/) holding several equivalent spellings per primitive plus FP-neighbors that must NOT match. The corpus is wired in as a fitness test, so a future missed spelling fails a shared gate instead of shipping silently.
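A minimal sketch of what such a chokepoint could look like, built on the standard `ast` module (Python 3.9+, where `Subscript.slice` is the key expression itself). This is not the prototype's code: the `SINKS` table, the helpers `_literal`, `_dotted` and `sink_of`, and the shape of `aliases` (local name mapped to canonical dotted name) are all assumptions made for illustration, and `type_map` is accepted but unused here.

```python
import ast

# Hypothetical sink table; the real analyzers' sink ids may differ.
SINKS = {"builtins.exec", "builtins.eval", "os.system"}
BARE_BUILTINS = {"exec", "eval"}


def _literal(node):
    """Return the value of a string-literal node, else None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


def _dotted(node, aliases):
    """Reduce a Name/Attribute chain to a dotted path, applying aliases."""
    if isinstance(node, ast.Name):
        return aliases.get(node.id, node.id)
    if isinstance(node, ast.Attribute):
        base = _dotted(node.value, aliases)
        return f"{base}.{node.attr}" if base else None
    return None


def resolve_to_canonical_sink(node, aliases, type_map=None):
    """Map one callee expression to a canonical sink id, or None."""
    path = None
    if isinstance(node, (ast.Name, ast.Attribute)):
        path = _dotted(node, aliases)
    elif isinstance(node, ast.Call) and _dotted(node.func, aliases) == "getattr":
        # getattr(<mod>, "lit"): only a literal attribute name is resolvable.
        if len(node.args) == 2 and not node.keywords:
            base = _dotted(node.args[0], aliases)
            name = _literal(node.args[1])
            if base and name:
                path = f"{base}.{name}"
    elif isinstance(node, ast.Subscript):
        # __builtins__["lit"] and <mod>.__dict__["lit"]
        key = _literal(node.slice)
        base = _dotted(node.value, aliases)
        if key and base == "__builtins__":
            path = f"builtins.{key}"
        elif key and base and base.endswith(".__dict__"):
            path = f"{base.removesuffix('.__dict__')}.{key}"
    if path in BARE_BUILTINS:
        path = f"builtins.{path}"
    return path if path in SINKS else None


def sink_of(source, aliases=None):
    """Parse a single call expression and resolve its callee."""
    call = ast.parse(source, mode="eval").body
    return resolve_to_canonical_sink(call.func, aliases or {})
```

With this sketch, `sink_of("o.system('id')", {"o": "os"})`, `sink_of("getattr(os, 'system')('id')")` and `sink_of("os.__dict__['system']('id')")` all reduce to the same id, while `sink_of("getattr(os, name)('id')")` stays unresolved. The point of the design is that each analyzer compares against sink ids only and never sees the spelling.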
Scope / non-goals
In scope: statically-resolvable spellings. Explicit non-goals (def-use / taint territory, kept residual): non-literal / kwarg getattr(os, name), variable subscript keys, computed attribute strings, module-name-from-variable, and local indirection (f = os.system; f(x)). Receiver-gated instance methods (spec.loader.exec_module, code.InteractiveInterpreter().runsource) fire only when the receiver resolves to the real machinery.
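One way the scope boundary could be recorded in the corpus is to tag each entry as in scope, FP-neighbor, or residual, so that the non-goals are documented as expected non-detections instead of being silently absent. The sketch below is illustrative only: `Case`, `CORPUS`, `naive_detector` and `corpus_failures` are hypothetical names, and the stand-in detector deliberately knows a single spelling to show how the gate reports a missed one.

```python
import ast
from typing import Callable, NamedTuple, Optional


class Case(NamedTuple):
    source: str              # one call expression
    expected: Optional[str]  # canonical sink id, or None if it must not fire
    kind: str                # "in_scope", "fp_neighbor", or "residual"


# Illustrative entries only; the real corpus would live under tests/evasion_corpus/.
CORPUS = [
    Case("os.system('id')", "os.system", "in_scope"),
    Case("os.__dict__['system']('id')", "os.system", "in_scope"),
    Case("os.path.join('a', 'b')", None, "fp_neighbor"),
    Case("getattr(os, name)('id')", None, "residual"),  # non-literal getattr
    Case("os.__dict__[key]('id')", None, "residual"),   # variable subscript key
]


def naive_detector(source: str) -> Optional[str]:
    """Stand-in detector that only knows the plain `os.system` spelling."""
    func = ast.parse(source, mode="eval").body.func
    if (
        isinstance(func, ast.Attribute)
        and func.attr == "system"
        and isinstance(func.value, ast.Name)
        and func.value.id == "os"
    ):
        return "os.system"
    return None


def corpus_failures(detector: Callable[[str], Optional[str]]) -> list:
    """Return every corpus case whose detection result differs from `expected`."""
    return [case for case in CORPUS if detector(case.source) != case.expected]


failures = corpus_failures(naive_detector)
```

Here `failures` contains exactly the `os.__dict__['system']` case, which is the kind of gap the shared gate is meant to surface. A fitness test would assert that the list is empty for every analyzer, and a residual entry would be promoted to in scope by changing its `expected` value once def-use support exists.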
Status
I have a working prototype (chokepoint + corpus + fitness test) that reproduces all current detections with no regression, adds the subscript / __dict__ / sibling spellings, and composes with #166. Looking for a maintainer yes/no on the direction before opening the PR.
Prepared with AI assistance; reviewed and validated by the author.