XSlim is a Post-Training Quantization (PTQ) tool developed by SpacemiT. It integrates chip-optimized quantization strategies and provides a unified interface for ONNX model quantization via JSON configuration files.
- INT8 / FP16 / Dynamic Quantization – multiple precision levels for different deployment scenarios
- JSON-driven configuration – simple, declarative quantization setup
- Python API & CLI – use as a library or from the command line
- Custom preprocessing – plug in your own preprocessing functions
- Expanded ONNX operator coverage – run Graphwise Analysis and quantization on models that use common arithmetic, activation, comparison, reduction, dropout, and opset-24 Pad patterns
- Automatic YOLO decode fusion – fuse supported YOLO decode subgraphs into a single spacemit_functions.YoloDecode node
- ONNX Function-aware export – preserve embedded FunctionProto definitions and emit required custom-domain imports automatically
- ONNX-based workflow – built on the ONNX ecosystem
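As a rough illustration of the JSON-driven configuration listed above, the sketch below builds a config dict and writes it to a JSON file using only the standard library. The keys and values are the ones that appear in the usage example of this README. The helper names (`to_pixel_scale`, `build_config`, `write_config`) are hypothetical and not part of XSlim. The mean/std values are the standard ImageNet statistics scaled from the 0-1 range to 0-255; that XSlim applies them as `(pixel - mean) / std` is an assumption, not something this README states.

```python
import json
import tempfile
from pathlib import Path

# Standard ImageNet statistics on the 0-1 scale (RGB order).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def to_pixel_scale(values):
    """Scale 0-1 statistics to the 0-255 pixel range (hypothetical helper)."""
    return [round(v * 255, 3) for v in values]


def build_config(onnx_model, working_dir, data_list_path):
    """Build a config dict with the keys shown in this README's usage example."""
    return {
        "model_parameters": {
            "onnx_model": onnx_model,
            "working_dir": working_dir,
        },
        "calibration_parameters": {
            "input_parameters": [{
                "mean_value": to_pixel_scale(IMAGENET_MEAN),
                "std_value": to_pixel_scale(IMAGENET_STD),
                "color_format": "rgb",
                "preprocess_file": "PT_IMAGENET",
                "data_list_path": data_list_path,
            }]
        },
    }


def write_config(config, path):
    """Serialize the config dict to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


config = build_config("model.onnx", "./output", "./calib_img_list.txt")
# A temporary directory keeps this sketch side-effect free; in practice
# the file would usually sit next to the model, e.g. "config.json".
config_path = write_config(config, Path(tempfile.mkdtemp()) / "config.json")
print(config_path.read_text(encoding="utf-8"))
```

The resulting file can then be passed to `xslim.quantize_onnx_model` or to `xslim --config`, as shown in the usage examples of this README.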
python -m pip install xslim

Or install from source:
git clone https://github.com/spacemit-com/xslim.git
cd xslim
python -m pip install .

For local development, use an editable install:

python -m pip install -e .

Build metadata is defined in pyproject.toml, and the import package lives under the standard src/ layout. To build source and wheel distributions locally:
python -m pip install --upgrade build
python -m build

import xslim
# Using a JSON config file
xslim.quantize_onnx_model("config.json")
# Using a dict
config = {
    "model_parameters": {
        "onnx_model": "model.onnx",
        "working_dir": "./output"
    },
    "calibration_parameters": {
        "input_parameters": [{
            "mean_value": [123.675, 116.28, 103.53],
            "std_value": [58.395, 57.12, 57.375],
            "color_format": "rgb",
            "preprocess_file": "PT_IMAGENET",
            "data_list_path": "./calib_img_list.txt"
        }]
    }
}
xslim.quantize_onnx_model(config)
# You can also pass the model path and output path directly
xslim.quantize_onnx_model("config.json", "input.onnx", "output.onnx")

# Installed CLI entry point
xslim --config config.json
# Module entry point also remains available
python -m xslim --config config.json
# Specify input and output model paths
xslim -c config.json -i input.onnx -o output.onnx
# Dynamic quantization (no config file needed)
xslim -i input.onnx -o output.onnx --dynq
# FP16 conversion (no config file needed)
xslim -i input.onnx -o output.onnx --fp16
# Convert the default ai.onnx opset to a target version
xslim -i input.onnx -o output.onnx --opset 20
# ONNX simplification only (no config file needed)
xslim -i input.onnx -o output.onnx

For config-free dynamic quantization and FP16 conversion, you can exclude operators with comma-separated names or types:
xslim -i input.onnx -o output.onnx --dynq --ignore_op_types Softmax,LayerNormalization
xslim -i input.onnx -o output.onnx --fp16 --ignore_op_names /model/head/MatMul

Static INT8 quantization expects a floating-point input model. If the model already contains QuantizeLinear or DequantizeLinear, XSlim stops with a clear error instead of quantizing an already-quantized graph again.
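To catch the already-quantized case before starting a static INT8 run, a pre-check along the following lines may help. This is a minimal sketch, not XSlim's own check: `find_quantization_ops` is a hypothetical helper, the op-type list is illustrative, and only the top-level graph is considered (nodes inside subgraphs or ONNX functions are not inspected).

```python
from typing import Iterable, List

# Operator types that mark a graph as already quantized (named in the text above).
QUANT_OP_TYPES = {"QuantizeLinear", "DequantizeLinear"}


def find_quantization_ops(op_types: Iterable[str]) -> List[str]:
    """Return the quantization op types present in op_types, sorted by name."""
    return sorted(QUANT_OP_TYPES.intersection(op_types))


# With the onnx package installed, the op types of the top-level graph
# can be collected like this:
#   import onnx
#   op_types = [node.op_type for node in onnx.load("model.onnx").graph.node]
op_types = ["Conv", "Relu", "QuantizeLinear", "Conv", "DequantizeLinear"]  # illustrative

found = find_quantization_ops(op_types)
if found:
    print("Model already contains quantization ops:", ", ".join(found))
else:
    print("No quantization ops found; the model looks like a floating-point graph.")
```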
For supported YOLO exports, no extra switch is required: XSlim will try to fuse decode-heavy post-processing into spacemit_functions.YoloDecode during simplification and keep the corresponding ONNX FunctionProto in the exported model.
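The config-free CLI modes shown above can also be driven from a Python script, for example in a batch job. The sketch below assembles the argument list with the standard library; `build_xslim_command` is a hypothetical helper, and it uses only flags that appear in this README (`-i`, `-o`, `--dynq`, `--fp16`, `--ignore_op_types`, `--ignore_op_names`). Actually running the command assumes the `xslim` entry point is installed and on `PATH`.

```python
import shlex


def build_xslim_command(input_model, output_model, mode=None,
                        ignore_op_types=(), ignore_op_names=()):
    """Assemble an xslim command line for the config-free modes.

    mode is None (simplification only), "dynq", or "fp16".
    """
    if mode not in (None, "dynq", "fp16"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["xslim", "-i", input_model, "-o", output_model]
    if mode is not None:
        cmd.append(f"--{mode}")
    # The exclusion flags take comma-separated values.
    if ignore_op_types:
        cmd += ["--ignore_op_types", ",".join(ignore_op_types)]
    if ignore_op_names:
        cmd += ["--ignore_op_names", ",".join(ignore_op_names)]
    return cmd


cmd = build_xslim_command(
    "input.onnx",
    "output.onnx",
    mode="dynq",
    ignore_op_types=["Softmax", "LayerNormalization"],
)
print(shlex.join(cmd))

# To run the command (requires xslim to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with node names such as /model/head/MatMul.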
- Configuration Reference – Full description of all JSON configuration options
- Examples – Step-by-step guides for INT8, FP16, dynamic quantization, custom preprocessing, and more
- Accuracy Tuning Guide – How to diagnose and improve quantization accuracy
See the samples directory for ready-to-run examples covering ResNet-18, MobileNet V3, BERT, and more. YOLO-specific usage notes are documented in the examples and accuracy-tuning guides.
For a full list of published versions, see the Releases page. The summary below is synchronized with that release history; 2.1.1 is the current in-tree development version and has not been published yet.
| Version | Highlights |
|---|---|
| 2.1.1 | Current in-tree development version after the 2.1.0 release |
| 2.1.0 | Latest published release; add automatic spacemit_functions.YoloDecode fusion for supported YOLO exports, preserve custom ONNX FunctionProto definitions during quantization/export, improve opset-24/custom-domain handling coverage, expand ONNX operator execution/socket coverage, support scalar and axes-input reduce kernels, and reject static re-quantization of models that already contain QuantizeLinear / DequantizeLinear |
| 2.0.14 | Add configurable default ai.onnx opset conversion for quantization and conversion workflows |
| 2.0.13 | Upgrade the default ONNX opset to 24, standardize operator domains, and align version metadata with the 2.0.12 release |
| 2.0.12 | Complete README changelog/release metadata, add accuracy-tuning docs and README links, introduce the xslim-accuracy-tuning GitHub skill, add YOLO truncation guidance, and rename input parameters for consistency |
| 2.0.11 | Fix Pad/missing-input handling, add Or/Einsum/Selu support, normalize Conv/ConvTranspose kernel shapes, and raise minimum Python to 3.9 |
| 2.0.10 | Align release metadata, improve CI/test coverage, normalize missing default ONNX opset before dynamic quantization, and refine shape inference handling |
| 2.0.9 | Add documentation, preserve tensor dtype metadata during FP16 conversion, and restore compatibility with onnxslim 0.1.87 |
| 2.0.8 | Improve packaging/CI, add torch executor operator coverage, add PyPI publish workflow, and centralize version metadata |
| 2.0.7 | Fix FP16 conversion bug on complex models |
| 2.0.6 | Fix metadata props deletion; default CLI behavior changed to model simplification (use --dynq for dynamic quantization) |
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the Apache License 2.0.