[ML] Harden pytorch_inference with TorchScript model graph validation #2999
This reverts commit 4f1ec3e.
…ns quotes Sanitize BUILDKITE_MESSAGE before embedding in generated pipeline YAML to prevent double quotes and multi-line content from breaking the YAML structure. Affects both run_qa_tests.yml.sh and run_pytorch_tests.yml.sh. Made-with: Cursor
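The sanitization described in this commit can be sketched as follows. This is a minimal illustration in Python rather than the shell used by the pipeline scripts, and the function name is hypothetical. Whether the real scripts escape, strip, or truncate the offending characters is not stated here, so the choices below (keep the first line, escape backslashes and double quotes) are assumptions.

```python
def sanitize_for_yaml_double_quoted(message: str) -> str:
    """Return a single-line string that is safe between YAML double quotes.

    Hypothetical helper: the real pipeline scripts do this in shell and may
    differ in detail.
    """
    # Keep only the first line so a multi-line commit message cannot spill
    # into the surrounding YAML structure.
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    # Escape backslashes first, then double quotes, so the backslashes added
    # for the quotes are not escaped a second time.
    return first_line.replace("\\", "\\\\").replace('"', '\\"')


if __name__ == "__main__":
    raw = 'Fix "quoted" path C:\\tmp\nsecond line of the commit message'
    print(f'message: "{sanitize_for_yaml_double_quoted(raw)}"')
```

Escaping backslashes before quotes matters: doing it in the other order would double the backslash that was just added in front of each quote.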
Add aten::bmm, aten::ceil, aten::floor_divide, aten::gt, aten::le, and aten::sign to the allowed operations list, fixing graph validation failures for the .rerank-v1 model used by the default rerank endpoint. These operations were identified by running the full Elasticsearch inference and ML integration test suites (974 tests) against a build with graph validation enabled. Made-with: Cursor
Add aten::clamp_min, aten::eq, aten::expand_as, aten::linalg_vector_norm, and aten::sum to the allowed operations list, fixing graph validation failures for distilbert-base-uncased-finetuned-sst-2-english and sentence-transformers/all-distilroberta-v1 models. Made-with: Cursor
…d additional model ops Add .rerank-v1 (52 ops extracted from ml-models.elastic.co .pt file), distilbert-sst2, all-distilroberta-v1, and their Eland-deployed variants to the reference model golden file. All ops extracted with PyTorch 2.7.1. Add aten::detach (Eland-traced models) and aten::masked_fill_ (.rerank-v1) to the allowlist. Made-with: Cursor
Add a parallel Buildkite step that runs Elasticsearch inference integration tests against the ml-cpp build artifacts. The new step runs on its own machine alongside the existing ES tests step.
Tests exercised:
- DefaultEndPointsIT (ELSER, E5, rerank default endpoints)
- TextEmbeddingCrudIT (E5 model CRUD via inference API)
- Semantic text YAML REST tests (indexing and querying with default ELSER 2 endpoint)
All tests use local prepacked models served by the test framework; no external services are required.
Made-with: Cursor
Extract clone/branch selection, Java configuration, and Gradle invocation into run_es_tests_common.sh. Both run_es_tests.sh and run_es_inference_tests.sh are now thin wrappers that pass their Gradle commands as arguments to the common script. Made-with: Cursor
Resolve conflict in dev-tools/run_es_tests.sh: keep the refactored thin-wrapper structure from this branch and integrate the Gradle build cache support (from elastic#2907) into run_es_tests_common.sh. Made-with: Cursor
Pull request overview
Reintroduces and extends TorchScript model-graph validation for pytorch_inference to reduce attack surface by enforcing an operation allowlist/denylist, and adds tooling + tests to keep that allowlist current (including quantized model support).
Changes:
- Add C++ TorchScript graph validation (`CModelGraphValidator` + `CSupportedOperations`) and wire it into `pytorch_inference` startup.
- Add Python tooling (`dev-tools/extract_model_ops/`) + CMake runner to extract/validate op allowlists and detect drift via a golden JSON + C++ test.
- Add adversarial TorchScript fixtures and integration tests; extend Buildkite scripts to run additional Elasticsearch inference integration tests.
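The allow/deny evaluation summarized above can be pictured with a small sketch. It is written in Python for brevity and is not the C++ implementation in `CModelGraphValidator`; the set contents, the `ValidationResult` type, and the `validate_ops` function are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Illustrative, heavily abbreviated op sets. The real lists live in
# CSupportedOperations.cc and are much longer.
FORBIDDEN_OPS = frozenset({"aten::from_file"})
ALLOWED_OPS = frozenset({"aten::bmm", "aten::sum", "aten::eq", "aten::norm"})


@dataclass
class ValidationResult:
    forbidden: List[str] = field(default_factory=list)
    unrecognised: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        # A model passes only if nothing was forbidden or unknown.
        return not self.forbidden and not self.unrecognised


def validate_ops(observed_ops: Iterable[str],
                 allowed=ALLOWED_OPS,
                 forbidden=FORBIDDEN_OPS) -> ValidationResult:
    """Classify each distinct op as forbidden, unrecognised, or allowed."""
    result = ValidationResult()
    for op in sorted(set(observed_ops)):
        if op in forbidden:
            result.forbidden.append(op)
        elif op not in allowed:
            result.unrecognised.append(op)
    return result
```

Keeping forbidden and unrecognised ops in separate lists lets the rejection message distinguish a known-dangerous operation from one that is merely absent from the allowlist.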
Reviewed changes
Copilot reviewed 22 out of 45 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test/CMakeLists.txt | Runs allowlist validation script as part of test_all_parallel (optional) and adds a standalone validation target. |
| docs/CHANGELOG.asciidoc | Adds release note entry for graph validation hardening. |
| dev-tools/run_es_tests_common.sh | Factors shared logic for running ES integration tests from a local Ivy repo. |
| dev-tools/run_es_tests.sh | Uses the common runner to execute core ML REST/YAML tests. |
| dev-tools/run_es_inference_tests.sh | Adds separate runner for ES inference integration tests. |
| dev-tools/generate_malicious_models.py | Adds generator for malicious TorchScript fixtures used by validator integration tests. |
| dev-tools/extract_model_ops/validation_models.json | Defines HF model set to validate allowlist (incl. quantized variants). |
| dev-tools/extract_model_ops/validate_allowlist.py | Adds Python-side allowlist validation against traced models + local .pt fixtures. |
| dev-tools/extract_model_ops/torchscript_utils.py | Shared tracing/inlining + config loading helpers (incl. dynamic quantization). |
| dev-tools/extract_model_ops/requirements.txt | Pins Python deps for extraction/validation tooling (torch + transformers stack). |
| dev-tools/extract_model_ops/reference_models.json | Defines reference HF models used to build the allowlist union (incl. quantized). |
| dev-tools/extract_model_ops/extract_model_ops.py | Adds extractor to generate op unions / C++ initializer / golden JSON. |
| dev-tools/extract_model_ops/es_it_models/tiny_text_expansion.pt | Adds ES IT TorchScript fixture for validation. |
| dev-tools/extract_model_ops/es_it_models/tiny_text_embedding.pt | Adds ES IT TorchScript fixture for validation. |
| dev-tools/extract_model_ops/es_it_models/supersimple_pytorch_model_it.pt | Adds ES IT TorchScript fixture for validation. |
| dev-tools/extract_model_ops/es_it_models/README.md | Documents provenance/regeneration of ES IT TorchScript fixtures. |
| dev-tools/extract_model_ops/README.md | Documents extraction/validation workflows and golden drift test. |
| dev-tools/extract_model_ops/.gitignore | Ignores the local Python venv used by the tooling. |
| cmake/run-validation.cmake | Adds portable CMake driver to create venv, install deps, and run validation. |
| cmake/functions.cmake | Wires validation into precommit target (optional). |
| bin/pytorch_inference/unittest/testfiles/reference_model_ops.json | Adds/updates golden per-model op sets for drift detection tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_rop_exploit.pt | Adds malicious TorchScript fixture used by integration tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_mixed_file_reader.pt | Adds malicious TorchScript fixture used by integration tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_many_unrecognised.pt | Adds malicious TorchScript fixture used by integration tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_hidden_in_submodule.pt | Adds malicious TorchScript fixture used by integration tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_heap_leak.pt | Adds malicious TorchScript fixture used by integration tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_file_reader_in_submodule.pt | Adds malicious TorchScript fixture used by integration tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_file_reader.pt | Adds malicious TorchScript fixture used by integration tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_conditional.pt | Adds malicious TorchScript fixture used by integration tests. |
| bin/pytorch_inference/unittest/CThreadSettingsTest.cc | Switches includes to angle-bracket form. |
| bin/pytorch_inference/unittest/CResultWriterTest.cc | Switches includes to angle-bracket form. |
| bin/pytorch_inference/unittest/CModelGraphValidatorTest.cc | Adds unit + integration tests for validator, fixtures, and allowlist drift. |
| bin/pytorch_inference/unittest/CMakeLists.txt | Adds validator test source and include path for new include style. |
| bin/pytorch_inference/unittest/CCommandParserTest.cc | Switches includes to angle-bracket form. |
| bin/pytorch_inference/Main.cc | Enforces model-graph validation at load time; improves rejection messages. |
| bin/pytorch_inference/CSupportedOperations.h | Declares forbidden/allowed op sets for validation. |
| bin/pytorch_inference/CSupportedOperations.cc | Defines forbidden/allowed TorchScript ops (incl. quantized ops). |
| bin/pytorch_inference/CModelGraphValidator.h | Declares validator API + node-count guard. |
| bin/pytorch_inference/CModelGraphValidator.cc | Implements graph inlining + op collection + allow/deny evaluation. |
| bin/pytorch_inference/CMakeLists.txt | Builds new validator + supported-ops sources into pytorch_inference. |
| .buildkite/scripts/steps/run_es_inference_tests.sh | Adds Buildkite step script for ES inference integration tests. |
| .buildkite/pipelines/run_qa_tests.yml.sh | Sanitizes Buildkite message when triggering downstream QA pipeline. |
| .buildkite/pipelines/run_pytorch_tests.yml.sh | Sanitizes Buildkite message when triggering downstream PyTorch pipeline. |
| .buildkite/pipelines/run_es_inference_tests_x86_64.yml.sh | Adds x86_64 pipeline to run ES inference integration tests. |
| .buildkite/pipeline.json.py | Uploads the new ES inference tests runner pipeline when x86_64 enabled. |
- Remove unused collect_graph_ops import and fix help text in validate_allowlist.py
- Query venv Python for site-packages path instead of globbing (which can yield multiple paths) in run-validation.cmake
- Bump cmake_minimum_required to 3.19.2 to match the repo
- Escape backslashes in SAFE_MESSAGE for YAML double-quoted strings in pipeline scripts
- Remove allowlist validation from precommit and test_all_parallel targets to keep them fast; validation remains available via the standalone validate_pytorch_inference_models target
- Document why eval is necessary in run_es_tests_common.sh and properly quote the Ivy repo URL
Made-with: Cursor
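One way to ask an interpreter for its site-packages directory, rather than globbing for it, is sketched below. The helper name `site_packages_for` is hypothetical and the exact query used in run-validation.cmake is not shown in this thread; `sysconfig.get_paths()` is a standard-library call that returns a single `purelib` path per interpreter.

```python
import subprocess
import sys


def site_packages_for(python_executable: str) -> str:
    """Ask the given interpreter where its pure-Python packages live."""
    completed = subprocess.run(
        [
            python_executable,
            "-c",
            "import sysconfig; print(sysconfig.get_paths()['purelib'])",
        ],
        check=True,           # raise if the interpreter exits non-zero
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    # In the CMake flow this would be the venv's interpreter; the current
    # interpreter is used here only so that the sketch runs anywhere.
    print(site_packages_for(sys.executable))
```

Because the interpreter reports exactly one path, this avoids the ambiguity of a glob that matches several `python3.x` directories.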
Resolve conflict in dev-tools/run_es_tests.sh: incorporate ES_TEST_SUITE support from elastic#2990 (parallel javaRestTest/yamlRestTest steps) into our thin-wrapper architecture that delegates to run_es_tests_common.sh. Made-with: Cursor
Pull request overview
Re-introduces TorchScript model graph validation for pytorch_inference to reduce attack surface by rejecting models that contain forbidden or unrecognised TorchScript ops, and adds tooling/tests to keep the allowlist in sync with supported model architectures (including quantized variants).
Changes:
- Added C++ TorchScript graph validator + op allowlist/denylist, and wired validation into `pytorch_inference` startup.
- Added C++ unit/integration tests, golden op-set drift test, and malicious-model fixtures to verify forbidden/unrecognised ops are rejected.
- Added developer tooling + CMake runner to extract/validate op allowlists against reference HF models and local ES IT `.pt` models; added Buildkite scripts to run ES inference ITs.
Reviewed changes
Copilot reviewed 21 out of 44 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/CMakeLists.txt | Adds a standalone CMake target to run the Python allowlist validation script. |
| docs/CHANGELOG.asciidoc | Documents the TorchScript graph validation hardening enhancement. |
| dev-tools/run_es_tests_common.sh | Factors out shared ES integration test runner logic (clone/setup/gradle/cache). |
| dev-tools/run_es_tests.sh | Delegates ES ML REST/YAML test execution to the common runner. |
| dev-tools/run_es_inference_tests.sh | Adds a dedicated runner for ES inference integration test suites. |
| dev-tools/generate_malicious_models.py | Adds generator for malicious TorchScript model fixtures used by validator tests. |
| dev-tools/extract_model_ops/validation_models.json | Defines HF models used by the Python allowlist validation run. |
| dev-tools/extract_model_ops/validate_allowlist.py | Implements Python-side allowlist validation against C++ allow/deny sets. |
| dev-tools/extract_model_ops/torchscript_utils.py | Shared utilities for tracing/scripting HF models and collecting inlined ops. |
| dev-tools/extract_model_ops/requirements.txt | Pins Python dependencies for extraction/validation tooling. |
| dev-tools/extract_model_ops/reference_models.json | Defines HF reference models used to derive the allowlist/golden file. |
| dev-tools/extract_model_ops/extract_model_ops.py | Implements op extraction and golden file generation for allowlist maintenance. |
| dev-tools/extract_model_ops/es_it_models/tiny_text_expansion.pt | Adds pre-saved ES IT TorchScript model fixture for validation. |
| dev-tools/extract_model_ops/es_it_models/tiny_text_embedding.pt | Adds pre-saved ES IT TorchScript model fixture for validation. |
| dev-tools/extract_model_ops/es_it_models/supersimple_pytorch_model_it.pt | Adds pre-saved ES IT TorchScript model fixture for validation. |
| dev-tools/extract_model_ops/es_it_models/README.md | Documents provenance/regeneration steps for ES IT .pt fixtures. |
| dev-tools/extract_model_ops/README.md | Documents how to extract/validate ops and regenerate golden drift file. |
| dev-tools/extract_model_ops/.gitignore | Ignores the local venv used by extraction tooling. |
| cmake/run-validation.cmake | Adds portable CMake script to create venv, install deps, and run validation. |
| bin/pytorch_inference/unittest/testfiles/reference_model_ops.json | Adds/updates golden file for allowlist drift detection in C++ tests. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_rop_exploit.pt | Adds malicious fixture to ensure validator rejects as_strided-based exploit graphs. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_mixed_file_reader.pt | Adds malicious fixture to ensure validator rejects forbidden file access ops. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_many_unrecognised.pt | Adds malicious fixture to ensure validator rejects unexpected/unrecognised ops. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_hidden_in_submodule.pt | Adds malicious fixture to ensure validator detects ops hidden in submodules. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_heap_leak.pt | Adds malicious fixture for heap-leak style graphs to validate rejection. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_file_reader_in_submodule.pt | Adds malicious fixture for forbidden ops inside submodules. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_file_reader.pt | Adds malicious fixture to ensure validator rejects aten::from_file. |
| bin/pytorch_inference/unittest/testfiles/malicious_models/malicious_conditional.pt | Adds malicious fixture to ensure validator scans conditional branches. |
| bin/pytorch_inference/unittest/CThreadSettingsTest.cc | Switches unittest includes to use configured include dirs (<...>). |
| bin/pytorch_inference/unittest/CResultWriterTest.cc | Switches unittest includes to use configured include dirs (<...>). |
| bin/pytorch_inference/unittest/CModelGraphValidatorTest.cc | Adds comprehensive unit/integration tests for graph validation and drift checks. |
| bin/pytorch_inference/unittest/CMakeLists.txt | Wires new validator tests and adds include path for <...> style includes. |
| bin/pytorch_inference/unittest/CCommandParserTest.cc | Switches unittest includes to use configured include dirs (<...>). |
| bin/pytorch_inference/Main.cc | Calls CModelGraphValidator on model load and fails fast on invalid graphs. |
| bin/pytorch_inference/CSupportedOperations.h | Declares the allowed/forbidden operation sets for graph validation. |
| bin/pytorch_inference/CSupportedOperations.cc | Defines the allowed/forbidden operation sets (incl. quantized ops). |
| bin/pytorch_inference/CModelGraphValidator.h | Declares validator API and collection logic for inlined TorchScript graphs. |
| bin/pytorch_inference/CModelGraphValidator.cc | Implements graph inlining, op collection, and allow/deny validation logic. |
| bin/pytorch_inference/CMakeLists.txt | Adds validator and allowlist sources to the pytorch_inference build. |
| .buildkite/scripts/steps/run_es_inference_tests.sh | Adds Buildkite step script to run ES inference integration tests. |
| .buildkite/pipelines/run_qa_tests.yml.sh | Sanitizes Buildkite message when triggering downstream QA pipeline. |
| .buildkite/pipelines/run_pytorch_tests.yml.sh | Sanitizes Buildkite message when triggering downstream PyTorch test pipeline. |
| .buildkite/pipelines/run_es_inference_tests_x86_64.yml.sh | Adds Buildkite pipeline generator for x86_64 inference ITs. |
| .buildkite/pipeline.json.py | Wires the new inference IT runner pipeline upload into the main pipeline. |
dev-tools/run_es_tests_common.sh
Outdated
for GRADLE_CMD in "$@" ; do
    # eval is required here because each GRADLE_CMD argument from the caller
    # contains embedded shell quoting (e.g. --tests "class.name {p0=glob/*}")
    # that must be interpreted by the shell. The environment variables
    # (GRADLE_JVM_OPTS, CACHE_ARGS, EXTRA_TEST_OPTS) are set by our own
    # scripts and the CI environment, not by untrusted input.
    eval ./gradlew $GRADLE_JVM_OPTS $CACHE_ARGS \
        "-Dbuild.ml_cpp.repo=$IVY_REPO_URL" \
        $GRADLE_CMD $EXTRA_TEST_OPTS
cmake/run-validation.cmake
Outdated
find_program(_python_path
    NAMES python3 python3.12 python3.11 python3.10 python3.9 python
    DOC "Python 3 interpreter"
def load_pt_and_collect_ops(pt_path: str) -> set[str] | None:
    """Load a saved TorchScript .pt file, inline, and return its op set."""
    try:
        module = torch.jit.load(pt_path)
        return collect_inlined_ops(module)
    except Exception as exc:
        print(f" LOAD ERROR: {exc}", file=sys.stderr)
        return None
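The op collection that this helper delegates to has to look inside nested blocks, since the malicious fixtures hide operations in conditional branches and submodules. The sketch below shows that traversal on a toy graph. The `Node` type, `collect_ops`, and the `max_nodes` default are hypothetical stand-ins; this is neither the TorchScript API nor the project's `collect_inlined_ops`.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Toy stand-in for a graph node: an op name plus nested blocks."""
    kind: str
    blocks: List[List["Node"]] = field(default_factory=list)


def collect_ops(nodes: List[Node], max_nodes: int = 1_000_000) -> Set[str]:
    """Collect every op name, descending into nested blocks.

    max_nodes is an assumed guard value that bounds the work done on a
    pathologically large graph.
    """
    ops: Set[str] = set()
    visited = 0
    stack = [nodes]  # explicit stack avoids deep recursion
    while stack:
        for node in stack.pop():
            visited += 1
            if visited > max_nodes:
                raise ValueError(f"graph exceeds {max_nodes} nodes")
            ops.add(node.kind)
            # Both branches of a conditional are queued, so an op cannot
            # hide in the branch that is never taken at runtime.
            stack.extend(node.blocks)
    return ops
```

Scanning statically, rather than following the execution path, is what lets a validator reject an op that sits behind a condition which is false during testing.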
def extract_ops_for_model(model_name: str,
                          quantize: bool = False) -> set[str] | None:
    """Trace a HuggingFace model and return its TorchScript op set.

    Returns None if the model could not be loaded or traced.
    """
    label = f"{model_name} (quantized)" if quantize else model_name
    print(f" Loading {label}...", file=sys.stderr)
@copilot open a new pull request to apply changes based on the comments in this thread
- Replace PEP 604 union types (set[str] | None) with Optional[set[str]] for Python 3.9 compatibility
- Drop python3.9 from CMake find_program search list since scripts use PEP 585 generics requiring 3.9+
- Eliminate eval in run_es_tests_common.sh by changing the caller interface: each Gradle argument is now a separate shell argument with '---' as a sentinel separating multiple invocations
- Convert CACHE_ARGS from string to proper bash array
Made-with: Cursor
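The sentinel-based interface that replaces eval can be pictured as splitting one flat argument list into separate invocations. The sketch below uses Python rather than the bash of run_es_tests_common.sh, and `split_invocations` is a hypothetical name; only the `---` sentinel comes from the commit message.

```python
from typing import List

SENTINEL = "---"


def split_invocations(args: List[str], sentinel: str = SENTINEL) -> List[List[str]]:
    """Split a flat argument list into one argument list per invocation."""
    invocations: List[List[str]] = []
    current: List[str] = []
    for arg in args:
        if arg == sentinel:
            # Close the current invocation; skip empty groups so that
            # leading, trailing, or doubled sentinels are harmless.
            if current:
                invocations.append(current)
            current = []
        else:
            current.append(arg)
    if current:
        invocations.append(current)
    return invocations


if __name__ == "__main__":
    argv = [":a:test", "--tests", "class.name {p0=glob/*}", "---", ":b:test"]
    for invocation in split_invocations(argv):
        print(invocation)
```

Because each argument stays a separate list element, a value such as `class.name {p0=glob/*}` reaches the command intact without any shell re-parsing, which is what made eval necessary before.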
buildkite run_qa_tests
Great that you are fixing this test gap!
…lidation (elastic#2999)" This reverts commit d3df09c.
…alidation (elastic#2999)" (elastic#3006) This reverts commit ceabc9b.
…alidation (#3008)
Reapply "[ML] Harden pytorch_inference with TorchScript model graph validation (#2999)" (#3006)
This reverts commit ceabc9b.
- Adds a static TorchScript graph validation layer (CModelGraphValidator, CSupportedOperations) that rejects models containing operations not observed in supported transformer architectures, reducing the attack surface by ensuring only known-safe operation sets are permitted.
- Includes aten::mul_ and quantized::linear_dynamic in the allowed operations for dynamically quantized models (e.g. ELSER v2 imported via Eland).
- Adds Python extraction tooling (dev-tools/extract_model_ops/) to trace reference HuggingFace models and collect their op sets, with support for quantized variants.
- Adds reference_model_ops.json golden file and C++ drift test to detect allowlist staleness on PyTorch upgrades.
- Adds adversarial "evil model" integration tests to verify rejection of forbidden operations.
- Adds CHANGELOG entry.
- Add aten::norm to graph validator allowlist
  The prepacked .multilingual-e5-small model uses aten::norm for normalization, which was not in the allowlist. This caused the model to be rejected with "Unrecognised operations: aten::norm".
- Add multilingual-e5-small model ops to reference files
  Extracted ops from intfloat/multilingual-e5-small (base and Eland text_embedding variant) and added both to the reference golden file. The base model uses standard XLM-RoBERTa ops. The Eland variant adds pooling/normalization ops (linalg_vector_norm, clamp, etc.). The prepacked .multilingual-e5-small model bundled with Elasticsearch uses aten::norm (added to the allowlist in the previous commit).
- Add graph validator test for prepacked e5 model with aten::norm
  The prepacked .multilingual-e5-small model uses aten::norm, which was missing from the allowlist and caused production failures. This test loads a tiny (24KB) model that mirrors the real prepacked model's graph structure (including aten::norm) and verifies graph validation passes. The test model was created by tracing a minimal XLM-RoBERTa-like architecture with normalization, then patching the TorchScript IR to use aten::norm (which modern PyTorch decomposes into aten::linalg_vector_norm, so it can't be generated via tracing).
Made-with: Cursor
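The golden-file drift check mentioned above amounts to a set difference per reference model. The sketch below assumes a JSON layout that maps each model name to a list of op names; the actual schema of reference_model_ops.json is not shown in this thread, and `find_allowlist_drift` is a hypothetical name. The real check is a C++ test.

```python
import json
from typing import Dict, List, Set


def find_allowlist_drift(golden_json: str, allowed: Set[str]) -> Dict[str, List[str]]:
    """Return, per model, the golden ops that the allowlist does not cover.

    Assumes the golden file maps model name -> list of op names.
    """
    golden = json.loads(golden_json)
    drift: Dict[str, List[str]] = {}
    for model, ops in golden.items():
        missing = sorted(set(ops) - allowed)
        if missing:
            drift[model] = missing
    return drift


if __name__ == "__main__":
    golden = json.dumps({
        "example/base-model": ["aten::add", "aten::linalg_vector_norm"],
        "example/prepacked-model": ["aten::add", "aten::norm"],
    })
    allowlist = {"aten::add", "aten::linalg_vector_norm"}
    print(find_allowlist_drift(golden, allowlist))
```

A non-empty result is the situation described in the commit message: a supported model uses an op, here `aten::norm`, that the allowlist would reject.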
Summary
Re-applies #2936 and #2991 which were reverted in #2995.
- Adds a static TorchScript graph validation layer (`CModelGraphValidator`, `CSupportedOperations`) that rejects models containing operations not observed in supported transformer architectures, reducing the attack surface by ensuring only known-safe operation sets are permitted.
- Includes `aten::mul_` and `quantized::linear_dynamic` in the allowed operations for dynamically quantized models (e.g. ELSER v2 imported via Eland).
- Adds Python extraction tooling (`dev-tools/extract_model_ops/`) to trace reference HuggingFace models and collect their op sets, with support for quantized variants.
- Adds `reference_model_ops.json` golden file and C++ drift test to detect allowlist staleness on PyTorch upgrades.

Test plan
- `test_pytorch_inference` passes
- (`ci:run-qa-tests` label applied) See https://buildkite.com/elastic/appex-qa-stateful-custom-ml-cpp-build-testing/builds/724