
[ML] Harden pytorch_inference with TorchScript model graph validation #2999

Merged
edsavage merged 11 commits into elastic:main from edsavage:feature/harden-pytorch-inference-v2
Mar 19, 2026

@edsavage
Contributor

@edsavage edsavage commented Mar 15, 2026

Summary

Re-applies #2936 and #2991 which were reverted in #2995.

  • Adds a static TorchScript graph validation layer (CModelGraphValidator, CSupportedOperations) that rejects models containing operations not observed in supported transformer architectures, reducing the attack surface by ensuring only known-safe operation sets are permitted.
  • Includes aten::mul_ and quantized::linear_dynamic in the allowed operations for dynamically quantized models (e.g. ELSER v2 imported via Eland).
  • Adds Python extraction tooling (dev-tools/extract_model_ops/) to trace reference HuggingFace models and collect their op sets, with support for quantized variants.
  • Adds reference_model_ops.json golden file and C++ drift test to detect allowlist staleness on PyTorch upgrades.
  • Adds adversarial "evil model" integration tests to verify rejection of forbidden operations.
  • Adds CHANGELOG entry.
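
The allow/deny decision described in the bullets above can be pictured with a short Python sketch. It is an illustration only, not the C++ `CModelGraphValidator`: `validate_ops` and `ValidationResult` are hypothetical names, and the two op sets are small excerpts taken from ops mentioned in this PR rather than the real contents of `CSupportedOperations.cc`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Illustrative excerpts only (assumption); the real sets are defined in C++.
FORBIDDEN_OPS = frozenset({"aten::from_file"})
ALLOWED_OPS = frozenset({
    "aten::mul_", "quantized::linear_dynamic",
    "aten::bmm", "aten::sum", "aten::eq",
})


@dataclass
class ValidationResult:
    """Outcome of checking one model's op set (hypothetical type)."""
    forbidden: List[str] = field(default_factory=list)
    unrecognised: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.forbidden and not self.unrecognised


def validate_ops(observed_ops: Iterable[str]) -> ValidationResult:
    """Classify each observed op as forbidden, allowed, or unrecognised."""
    result = ValidationResult()
    for op in sorted(set(observed_ops)):
        if op in FORBIDDEN_OPS:
            result.forbidden.append(op)
        elif op not in ALLOWED_OPS:
            # Anything not explicitly allowed is rejected as unrecognised.
            result.unrecognised.append(op)
    return result
```

In this sketch a model is accepted only when both lists are empty, so an op that is merely unknown is rejected in the same way as one that is explicitly forbidden.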

Test plan

@prodsecmachine

prodsecmachine commented Mar 15, 2026

Snyk checks have passed. No issues have been found so far.

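| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|--------|-------------|----------|------|--------|-----|-----------|
|        | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
|        | Licenses | 0 | 0 | 0 | 0 | 0 issues |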


@edsavage edsavage marked this pull request as draft March 15, 2026 21:19
…ns quotes

Sanitize BUILDKITE_MESSAGE before embedding in generated pipeline YAML
to prevent double quotes and multi-line content from breaking the YAML
structure. Affects both run_qa_tests.yml.sh and run_pytorch_tests.yml.sh.

Made-with: Cursor
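
The sanitisation idea in the commit above (stop double quotes and multi-line content from breaking a YAML double-quoted string) can be sketched as follows. The real change lives in shell scripts; this Python version is only an illustration, `sanitize_for_yaml_double_quoted` is a hypothetical name, and keeping just the first line is an assumed policy that the actual scripts may not share.

```python
def sanitize_for_yaml_double_quoted(message: str) -> str:
    """Return a single-line string safe to place between YAML double quotes."""
    # Keep only the first line so multi-line content cannot break the
    # surrounding YAML structure (assumed policy for this sketch).
    first_line = message.splitlines()[0] if message else ""
    # Escape backslashes first, then double quotes, so the backslashes
    # introduced for the quotes are not escaped a second time.
    return first_line.replace("\\", "\\\\").replace('"', '\\"')


if __name__ == "__main__":
    commit_message = 'Fix "quoted" title\n\nLonger body text'
    print(f'message: "{sanitize_for_yaml_double_quoted(commit_message)}"')
```

The order of the two `replace` calls matters: escaping quotes first would add backslashes that the second pass would then double.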
Add aten::bmm, aten::ceil, aten::floor_divide, aten::gt, aten::le, and
aten::sign to the allowed operations list, fixing graph validation
failures for the .rerank-v1 model used by the default rerank endpoint.

These operations were identified by running the full Elasticsearch
inference and ML integration test suites (974 tests) against a build
with graph validation enabled.

Made-with: Cursor

Add aten::clamp_min, aten::eq, aten::expand_as, aten::linalg_vector_norm,
and aten::sum to the allowed operations list, fixing graph validation
failures for distilbert-base-uncased-finetuned-sst-2-english and
sentence-transformers/all-distilroberta-v1 models.

Made-with: Cursor

…d additional model ops

Add .rerank-v1 (52 ops extracted from ml-models.elastic.co .pt file),
distilbert-sst2, all-distilroberta-v1, and their Eland-deployed variants
to the reference model golden file. All ops extracted with PyTorch 2.7.1.

Add aten::detach (Eland-traced models) and aten::masked_fill_ (.rerank-v1)
to the allowlist.

Made-with: Cursor
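
The role of the golden file can be illustrated with a small drift check: for every reference model, report the ops it uses that the allowlist does not contain. This is a sketch under assumptions. The flat "model name to list of ops" JSON layout and the function name `find_allowlist_drift` are hypothetical, and the real check is a C++ test reading `reference_model_ops.json`.

```python
import json
from typing import Dict, List, Set


def find_allowlist_drift(golden_json: str, allowed_ops: Set[str]) -> Dict[str, List[str]]:
    """Map each model name to the sorted ops it uses that are not allowed."""
    golden = json.loads(golden_json)
    drift: Dict[str, List[str]] = {}
    for model_name, ops in golden.items():
        missing = sorted(set(ops) - allowed_ops)
        if missing:
            drift[model_name] = missing
    return drift


if __name__ == "__main__":
    # Hypothetical golden content; op names are taken from this PR.
    golden = json.dumps({
        "model-a": ["aten::bmm", "aten::sum"],
        "model-b": ["aten::masked_fill_", "aten::detach"],
    })
    allowed = {"aten::bmm", "aten::sum", "aten::detach"}
    print(find_allowlist_drift(golden, allowed))
```

An empty result means every reference model is still covered; a non-empty one points at the ops to review after a PyTorch upgrade.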
Add a parallel Buildkite step that runs Elasticsearch inference
integration tests against the ml-cpp build artifacts. The new step
runs on its own machine alongside the existing ES tests step.

Tests exercised:
- DefaultEndPointsIT (ELSER, E5, rerank default endpoints)
- TextEmbeddingCrudIT (E5 model CRUD via inference API)
- Semantic text YAML REST tests (indexing and querying with default
  ELSER 2 endpoint)

All tests use local prepacked models served by the test framework —
no external services required.

Made-with: Cursor

Extract clone/branch selection, Java configuration, and Gradle
invocation into run_es_tests_common.sh.  Both run_es_tests.sh and
run_es_inference_tests.sh are now thin wrappers that pass their
Gradle commands as arguments to the common script.

Made-with: Cursor

Resolve conflict in dev-tools/run_es_tests.sh: keep the refactored
thin-wrapper structure from this branch and integrate the Gradle build
cache support (from elastic#2907) into run_es_tests_common.sh.

Made-with: Cursor

Copilot AI left a comment


Pull request overview

Reintroduces and extends TorchScript model-graph validation for pytorch_inference to reduce attack surface by enforcing an operation allowlist/denylist, and adds tooling + tests to keep that allowlist current (including quantized model support).

Changes:

  • Add C++ TorchScript graph validation (CModelGraphValidator + CSupportedOperations) and wire it into pytorch_inference startup.
  • Add Python tooling (dev-tools/extract_model_ops/) + CMake runner to extract/validate op allowlists and detect drift via a golden JSON + C++ test.
  • Add adversarial TorchScript fixtures and integration tests; extend Buildkite scripts to run additional Elasticsearch inference integration tests.

Reviewed changes

Copilot reviewed 22 out of 45 changed files in this pull request and generated 8 comments.

File Description
test/CMakeLists.txt Runs allowlist validation script as part of test_all_parallel (optional) and adds a standalone validation target.
docs/CHANGELOG.asciidoc Adds release note entry for graph validation hardening.
dev-tools/run_es_tests_common.sh Factors shared logic for running ES integration tests from a local Ivy repo.
dev-tools/run_es_tests.sh Uses the common runner to execute core ML REST/YAML tests.
dev-tools/run_es_inference_tests.sh Adds separate runner for ES inference integration tests.
dev-tools/generate_malicious_models.py Adds generator for malicious TorchScript fixtures used by validator integration tests.
dev-tools/extract_model_ops/validation_models.json Defines HF model set to validate allowlist (incl. quantized variants).
dev-tools/extract_model_ops/validate_allowlist.py Adds Python-side allowlist validation against traced models + local .pt fixtures.
dev-tools/extract_model_ops/torchscript_utils.py Shared tracing/inlining + config loading helpers (incl. dynamic quantization).
dev-tools/extract_model_ops/requirements.txt Pins Python deps for extraction/validation tooling (torch + transformers stack).
dev-tools/extract_model_ops/reference_models.json Defines reference HF models used to build the allowlist union (incl. quantized).
dev-tools/extract_model_ops/extract_model_ops.py Adds extractor to generate op unions / C++ initializer / golden JSON.
dev-tools/extract_model_ops/es_it_models/tiny_text_expansion.pt Adds ES IT TorchScript fixture for validation.
dev-tools/extract_model_ops/es_it_models/tiny_text_embedding.pt Adds ES IT TorchScript fixture for validation.
dev-tools/extract_model_ops/es_it_models/supersimple_pytorch_model_it.pt Adds ES IT TorchScript fixture for validation.
dev-tools/extract_model_ops/es_it_models/README.md Documents provenance/regeneration of ES IT TorchScript fixtures.
dev-tools/extract_model_ops/README.md Documents extraction/validation workflows and golden drift test.
dev-tools/extract_model_ops/.gitignore Ignores the local Python venv used by the tooling.
cmake/run-validation.cmake Adds portable CMake driver to create venv, install deps, and run validation.
cmake/functions.cmake Wires validation into precommit target (optional).
bin/pytorch_inference/unittest/testfiles/reference_model_ops.json Adds/updates golden per-model op sets for drift detection tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_rop_exploit.pt Adds malicious TorchScript fixture used by integration tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_mixed_file_reader.pt Adds malicious TorchScript fixture used by integration tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_many_unrecognised.pt Adds malicious TorchScript fixture used by integration tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_hidden_in_submodule.pt Adds malicious TorchScript fixture used by integration tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_heap_leak.pt Adds malicious TorchScript fixture used by integration tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_file_reader_in_submodule.pt Adds malicious TorchScript fixture used by integration tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_file_reader.pt Adds malicious TorchScript fixture used by integration tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_conditional.pt Adds malicious TorchScript fixture used by integration tests.
bin/pytorch_inference/unittest/CThreadSettingsTest.cc Switches includes to angle-bracket form.
bin/pytorch_inference/unittest/CResultWriterTest.cc Switches includes to angle-bracket form.
bin/pytorch_inference/unittest/CModelGraphValidatorTest.cc Adds unit + integration tests for validator, fixtures, and allowlist drift.
bin/pytorch_inference/unittest/CMakeLists.txt Adds validator test source and include path for new include style.
bin/pytorch_inference/unittest/CCommandParserTest.cc Switches includes to angle-bracket form.
bin/pytorch_inference/Main.cc Enforces model-graph validation at load time; improves rejection messages.
bin/pytorch_inference/CSupportedOperations.h Declares forbidden/allowed op sets for validation.
bin/pytorch_inference/CSupportedOperations.cc Defines forbidden/allowed TorchScript ops (incl. quantized ops).
bin/pytorch_inference/CModelGraphValidator.h Declares validator API + node-count guard.
bin/pytorch_inference/CModelGraphValidator.cc Implements graph inlining + op collection + allow/deny evaluation.
bin/pytorch_inference/CMakeLists.txt Builds new validator + supported-ops sources into pytorch_inference.
.buildkite/scripts/steps/run_es_inference_tests.sh Adds Buildkite step script for ES inference integration tests.
.buildkite/pipelines/run_qa_tests.yml.sh Sanitizes Buildkite message when triggering downstream QA pipeline.
.buildkite/pipelines/run_pytorch_tests.yml.sh Sanitizes Buildkite message when triggering downstream PyTorch pipeline.
.buildkite/pipelines/run_es_inference_tests_x86_64.yml.sh Adds x86_64 pipeline to run ES inference integration tests.
.buildkite/pipeline.json.py Uploads the new ES inference tests runner pipeline when x86_64 enabled.


- Remove unused collect_graph_ops import and fix help text in
  validate_allowlist.py
- Query venv Python for site-packages path instead of globbing
  (which can yield multiple paths) in run-validation.cmake
- Bump cmake_minimum_required to 3.19.2 to match the repo
- Escape backslashes in SAFE_MESSAGE for YAML double-quoted strings
  in pipeline scripts
- Remove allowlist validation from precommit and test_all_parallel
  targets to keep them fast; validation remains available via the
  standalone validate_pytorch_inference_models target
- Document why eval is necessary in run_es_tests_common.sh and
  properly quote the Ivy repo URL

Made-with: Cursor
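
The "query the venv Python instead of globbing" item above can be sketched like this. The actual change is in `cmake/run-validation.cmake`; this Python version only illustrates the idea of asking an interpreter for its own site-packages directory, and `site_packages_for` is a hypothetical helper name.

```python
import subprocess
import sys


def site_packages_for(python_executable: str) -> str:
    """Ask the given interpreter where its pure-Python packages are installed."""
    completed = subprocess.run(
        [
            python_executable,
            "-c",
            "import sysconfig; print(sysconfig.get_paths()['purelib'])",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    # The interpreter prints exactly one path, unlike a glob that may match
    # several lib/python*/site-packages directories.
    return completed.stdout.strip()


if __name__ == "__main__":
    print(site_packages_for(sys.executable))
```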
Resolve conflict in dev-tools/run_es_tests.sh: incorporate
ES_TEST_SUITE support from elastic#2990 (parallel javaRestTest/yamlRestTest
steps) into our thin-wrapper architecture that delegates to
run_es_tests_common.sh.

Made-with: Cursor
@edsavage edsavage marked this pull request as ready for review March 17, 2026 22:50
@edsavage edsavage requested review from Copilot and valeriy42 March 17, 2026 22:50

Copilot AI left a comment


Pull request overview

Re-introduces TorchScript model graph validation for pytorch_inference to reduce attack surface by rejecting models that contain forbidden or unrecognised TorchScript ops, and adds tooling/tests to keep the allowlist in sync with supported model architectures (including quantized variants).

Changes:

  • Added C++ TorchScript graph validator + op allowlist/denylist, and wired validation into pytorch_inference startup.
  • Added C++ unit/integration tests, golden op-set drift test, and malicious-model fixtures to verify forbidden/unrecognised ops are rejected.
  • Added developer tooling + CMake runner to extract/validate op allowlists against reference HF models and local ES IT .pt models; added Buildkite scripts to run ES inference ITs.
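
To make the op-collection step more concrete, here is a minimal sketch of walking a graph whose nodes can carry nested blocks (for example the two branches of a conditional), with a guard on the number of nodes visited. It deliberately avoids PyTorch: `Node` is a hypothetical stand-in for a TorchScript graph node, `collect_ops` is a hypothetical name, and the limit value is arbitrary.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a TorchScript graph node."""
    kind: str
    blocks: List[List["Node"]] = field(default_factory=list)


MAX_NODES = 1_000_000  # arbitrary guard value chosen for this sketch


def collect_ops(nodes: List[Node], max_nodes: int = MAX_NODES) -> Set[str]:
    """Collect every op kind, descending into nested blocks."""
    ops: Set[str] = set()
    visited = 0
    stack = list(nodes)
    while stack:
        node = stack.pop()
        visited += 1
        if visited > max_nodes:
            raise ValueError("graph exceeds node-count limit")
        ops.add(node.kind)
        # Both branches of a conditional are scanned, whichever would run.
        for block in node.blocks:
            stack.extend(block)
    return ops
```

Scanning every nested block is what lets a check of this kind catch an op that only appears in a rarely taken branch.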

Reviewed changes

Copilot reviewed 21 out of 44 changed files in this pull request and generated 4 comments.

File Description
test/CMakeLists.txt Adds a standalone CMake target to run the Python allowlist validation script.
docs/CHANGELOG.asciidoc Documents the TorchScript graph validation hardening enhancement.
dev-tools/run_es_tests_common.sh Factors out shared ES integration test runner logic (clone/setup/gradle/cache).
dev-tools/run_es_tests.sh Delegates ES ML REST/YAML test execution to the common runner.
dev-tools/run_es_inference_tests.sh Adds a dedicated runner for ES inference integration test suites.
dev-tools/generate_malicious_models.py Adds generator for malicious TorchScript model fixtures used by validator tests.
dev-tools/extract_model_ops/validation_models.json Defines HF models used by the Python allowlist validation run.
dev-tools/extract_model_ops/validate_allowlist.py Implements Python-side allowlist validation against C++ allow/deny sets.
dev-tools/extract_model_ops/torchscript_utils.py Shared utilities for tracing/scripting HF models and collecting inlined ops.
dev-tools/extract_model_ops/requirements.txt Pins Python dependencies for extraction/validation tooling.
dev-tools/extract_model_ops/reference_models.json Defines HF reference models used to derive the allowlist/golden file.
dev-tools/extract_model_ops/extract_model_ops.py Implements op extraction and golden file generation for allowlist maintenance.
dev-tools/extract_model_ops/es_it_models/tiny_text_expansion.pt Adds pre-saved ES IT TorchScript model fixture for validation.
dev-tools/extract_model_ops/es_it_models/tiny_text_embedding.pt Adds pre-saved ES IT TorchScript model fixture for validation.
dev-tools/extract_model_ops/es_it_models/supersimple_pytorch_model_it.pt Adds pre-saved ES IT TorchScript model fixture for validation.
dev-tools/extract_model_ops/es_it_models/README.md Documents provenance/regeneration steps for ES IT .pt fixtures.
dev-tools/extract_model_ops/README.md Documents how to extract/validate ops and regenerate golden drift file.
dev-tools/extract_model_ops/.gitignore Ignores the local venv used by extraction tooling.
cmake/run-validation.cmake Adds portable CMake script to create venv, install deps, and run validation.
bin/pytorch_inference/unittest/testfiles/reference_model_ops.json Adds/updates golden file for allowlist drift detection in C++ tests.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_rop_exploit.pt Adds malicious fixture to ensure validator rejects as_strided-based exploit graphs.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_mixed_file_reader.pt Adds malicious fixture to ensure validator rejects forbidden file access ops.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_many_unrecognised.pt Adds malicious fixture to ensure validator rejects unexpected/unrecognised ops.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_hidden_in_submodule.pt Adds malicious fixture to ensure validator detects ops hidden in submodules.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_heap_leak.pt Adds malicious fixture for heap-leak style graphs to validate rejection.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_file_reader_in_submodule.pt Adds malicious fixture for forbidden ops inside submodules.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_file_reader.pt Adds malicious fixture to ensure validator rejects aten::from_file.
bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_conditional.pt Adds malicious fixture to ensure validator scans conditional branches.
bin/pytorch_inference/unittest/CThreadSettingsTest.cc Switches unittest includes to use configured include dirs (<...>).
bin/pytorch_inference/unittest/CResultWriterTest.cc Switches unittest includes to use configured include dirs (<...>).
bin/pytorch_inference/unittest/CModelGraphValidatorTest.cc Adds comprehensive unit/integration tests for graph validation and drift checks.
bin/pytorch_inference/unittest/CMakeLists.txt Wires new validator tests and adds include path for <...> style includes.
bin/pytorch_inference/unittest/CCommandParserTest.cc Switches unittest includes to use configured include dirs (<...>).
bin/pytorch_inference/Main.cc Calls CModelGraphValidator on model load and fails fast on invalid graphs.
bin/pytorch_inference/CSupportedOperations.h Declares the allowed/forbidden operation sets for graph validation.
bin/pytorch_inference/CSupportedOperations.cc Defines the allowed/forbidden operation sets (incl. quantized ops).
bin/pytorch_inference/CModelGraphValidator.h Declares validator API and collection logic for inlined TorchScript graphs.
bin/pytorch_inference/CModelGraphValidator.cc Implements graph inlining, op collection, and allow/deny validation logic.
bin/pytorch_inference/CMakeLists.txt Adds validator and allowlist sources to the pytorch_inference build.
.buildkite/scripts/steps/run_es_inference_tests.sh Adds Buildkite step script to run ES inference integration tests.
.buildkite/pipelines/run_qa_tests.yml.sh Sanitizes Buildkite message when triggering downstream QA pipeline.
.buildkite/pipelines/run_pytorch_tests.yml.sh Sanitizes Buildkite message when triggering downstream PyTorch test pipeline.
.buildkite/pipelines/run_es_inference_tests_x86_64.yml.sh Adds Buildkite pipeline generator for x86_64 inference ITs.
.buildkite/pipeline.json.py Wires the new inference IT runner pipeline upload into the main pipeline.


Comment on lines +162 to +170
for GRADLE_CMD in "$@" ; do
    # eval is required here because each GRADLE_CMD argument from the caller
    # contains embedded shell quoting (e.g. --tests "class.name {p0=glob/*}")
    # that must be interpreted by the shell. The environment variables
    # (GRADLE_JVM_OPTS, CACHE_ARGS, EXTRA_TEST_OPTS) are set by our own
    # scripts and the CI environment, not by untrusted input.
    eval ./gradlew $GRADLE_JVM_OPTS $CACHE_ARGS \
        "-Dbuild.ml_cpp.repo=$IVY_REPO_URL" \
        $GRADLE_CMD $EXTRA_TEST_OPTS
Comment on lines +55 to +57
find_program(_python_path
    NAMES python3 python3.12 python3.11 python3.10 python3.9 python
    DOC "Python 3 interpreter"
Comment on lines +68 to +75
def load_pt_and_collect_ops(pt_path: str) -> set[str] | None:
    """Load a saved TorchScript .pt file, inline, and return its op set."""
    try:
        module = torch.jit.load(pt_path)
        return collect_inlined_ops(module)
    except Exception as exc:
        print(f" LOAD ERROR: {exc}", file=sys.stderr)
        return None
Comment on lines +48 to +55
def extract_ops_for_model(model_name: str,
                          quantize: bool = False) -> set[str] | None:
    """Trace a HuggingFace model and return its TorchScript op set.

    Returns None if the model could not be loaded or traced.
    """
    label = f"{model_name} (quantized)" if quantize else model_name
    print(f" Loading {label}...", file=sys.stderr)
@edsavage
Contributor Author

@copilot open a new pull request to apply changes based on the comments in this thread

- Replace PEP 604 union types (set[str] | None) with
  Optional[set[str]] for Python 3.9 compatibility
- Drop python3.9 from CMake find_program search list since
  scripts use PEP 585 generics requiring 3.9+
- Eliminate eval in run_es_tests_common.sh by changing the caller
  interface: each Gradle argument is now a separate shell argument
  with '---' as a sentinel separating multiple invocations
- Convert CACHE_ARGS from string to proper bash array

Made-with: Cursor
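
The sentinel-based interface described in the commit above can be pictured with a short sketch: each Gradle argument arrives as its own list element, and `---` separates one invocation from the next, so no `eval` or re-parsing of quoted strings is needed. The real implementation is a bash script; this Python version is an illustration only, `split_invocations` is a hypothetical name, and the example task names are made up.

```python
from typing import List

SENTINEL = "---"


def split_invocations(args: List[str]) -> List[List[str]]:
    """Split a flat argument list into one argument list per invocation."""
    invocations: List[List[str]] = [[]]
    for arg in args:
        if arg == SENTINEL:
            invocations.append([])
        else:
            # Arguments are kept verbatim, including spaces and glob characters.
            invocations[-1].append(arg)
    # Drop empty groups produced by leading, trailing, or repeated sentinels.
    return [inv for inv in invocations if inv]


if __name__ == "__main__":
    argv = [":example:javaRestTest", "--tests", "class.name {p0=glob/*}",
            "---", ":example:yamlRestTest"]
    for invocation in split_invocations(argv):
        print(["./gradlew", *invocation])
```

Because every argument stays a separate element, an argument such as `class.name {p0=glob/*}` reaches the command intact without a second round of shell interpretation.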
@edsavage
Contributor Author

buildkite run_qa_tests

Contributor

@valeriy42 valeriy42 left a comment


LGTM

Contributor


Great that you are fixing this test gap!

@edsavage edsavage merged commit d3df09c into elastic:main Mar 19, 2026
20 checks passed
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 20, 2026
edsavage added a commit that referenced this pull request Mar 20, 2026
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 22, 2026
edsavage added a commit that referenced this pull request Mar 24, 2026
…alidation (#3008)

Reapply "[ML] Harden pytorch_inference with TorchScript model graph validation (#2999)" (#3006)

This reverts commit ceabc9b.

- Adds a static TorchScript graph validation layer (CModelGraphValidator, CSupportedOperations) that rejects models containing operations not observed in supported transformer architectures, reducing the attack surface by ensuring only known-safe operation sets are permitted.
- Includes aten::mul_ and quantized::linear_dynamic in the allowed operations for dynamically quantized models (e.g. ELSER v2 imported via Eland).
- Adds Python extraction tooling (dev-tools/extract_model_ops/) to trace reference HuggingFace models and collect their op sets, with support for quantized variants.
- Adds reference_model_ops.json golden file and C++ drift test to detect allowlist staleness on PyTorch upgrades.
- Adds adversarial "evil model" integration tests to verify rejection of forbidden operations.
- Adds CHANGELOG entry.

- Add aten::norm to graph validator allowlist

The prepacked .multilingual-e5-small model uses aten::norm for
normalization, which was not in the allowlist. This caused the
model to be rejected with "Unrecognised operations: aten::norm".

- Add multilingual-e5-small model ops to reference files

Extracted ops from intfloat/multilingual-e5-small (base and Eland
text_embedding variant) and added both to the reference golden file.

The base model uses standard XLM-RoBERTa ops. The Eland variant adds
pooling/normalization ops (linalg_vector_norm, clamp, etc.). The
prepacked .multilingual-e5-small model bundled with Elasticsearch uses
aten::norm (added to the allowlist in the previous commit).

- Add graph validator test for prepacked e5 model with aten::norm

The prepacked .multilingual-e5-small model uses aten::norm, which was
missing from the allowlist and caused production failures. This test
loads a tiny (24KB) model that mirrors the real prepacked model's graph
structure (including aten::norm) and verifies graph validation passes.

The test model was created by tracing a minimal XLM-RoBERTa-like
architecture with normalization, then patching the TorchScript IR to
use aten::norm (which modern PyTorch decomposes into
aten::linalg_vector_norm, so it can't be generated via tracing).

Made-with: Cursor

4 participants