Description
This project covers:
Measuring baseline ONNX Runtime latency
Applying graph optimizations manually and via tooling
Comparing execution graphs and performance before and after optimization
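
As a rough illustration of the baseline measurement step, here is a minimal sketch of a timing helper. The function names (`measure_latency`, `benchmark_onnx`), the warmup and run counts, the reported statistics, and the `(1, 128)` input shape are all assumptions made for this example, not part of the original brief. The ONNX Runtime import is done lazily so the timing helper can be used on its own.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(run_once: Callable[[], object],
                    warmup: int = 10,
                    runs: int = 100) -> Dict[str, float]:
    """Time a zero-argument callable; return latency stats in milliseconds.

    `runs` must be at least 1. Warmup iterations are executed but not timed.
    """
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    return {
        "runs": runs,
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        # Simple index-based percentile over the sorted samples.
        "p95_ms": samples[min(runs - 1, int(0.95 * runs))],
    }


def benchmark_onnx(model_path: str, input_shape=(1, 128)) -> Dict[str, float]:
    """Benchmark an ONNX model on CPU with random float32 input.

    The input shape is a placeholder; use the shape the model was exported with.
    """
    # Imported lazily so measure_latency works without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    x = np.random.rand(*input_shape).astype(np.float32)
    return measure_latency(lambda: session.run(None, {input_name: x}))
```

The returned dictionary can be written out with `json.dump` to produce a file such as baseline.json. Running the same helper against the optimized model keeps the two measurements comparable.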
Objective
Take an MLP model exported from PyTorch to ONNX and reduce its inference latency using ONNX graph optimizations, while keeping its outputs numerically equivalent to the baseline.
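
One possible way to approach the tooling side of this objective is to let ONNX Runtime apply its built-in graph optimizations and save the result, then compare outputs against the baseline within a tolerance. The sketch below assumes that approach. The function names, the choice of `ORT_ENABLE_EXTENDED`, and the tolerance values are illustrative assumptions, not requirements from the brief.

```python
import numpy as np


def optimize_with_ort(src_path: str, dst_path: str) -> None:
    """Have ONNX Runtime optimize the graph at src_path and save it to dst_path."""
    # Imported lazily so outputs_match can be used without ONNX Runtime.
    import onnxruntime as ort

    options = ort.SessionOptions()
    # Optimization level is an assumption; other levels are available.
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    options.optimized_model_filepath = dst_path
    # Creating the session runs the optimizer and writes the optimized model.
    ort.InferenceSession(src_path, sess_options=options,
                         providers=["CPUExecutionProvider"])


def outputs_match(baseline, optimized, rtol: float = 1e-4, atol: float = 1e-5) -> bool:
    """Return True if both output lists have matching shapes and close values."""
    if len(baseline) != len(optimized):
        return False
    return all(
        np.shape(b) == np.shape(o) and np.allclose(b, o, rtol=rtol, atol=atol)
        for b, o in zip(baseline, optimized)
    )
```

Here `baseline` and `optimized` are expected to be the lists returned by `session.run(None, feeds)` for the two models on the same input. Exact bitwise equality is usually too strict once operators are fused, so a tolerance-based check is used; the thresholds should be tuned to the model.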
Deliverables
/models
  mlp_baseline.onnx
  mlp_optimized.onnx
/benchmarks
  baseline.json
  optimized.json
/scripts
  export.py
  benchmark.py
  optimize.py
  inspect.py
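
For the graph comparison that a script like inspect.py might perform, a simple starting point is to count nodes per operator type in each model and report the difference. This is only a sketch under that assumption; the helper names (`op_histogram`, `diff_histograms`, `load_op_types`) are hypothetical and do not come from the brief.

```python
from collections import Counter
from typing import Dict, Iterable, List


def op_histogram(op_types: Iterable[str]) -> Counter:
    """Count how many nodes of each operator type appear in a graph."""
    return Counter(op_types)


def diff_histograms(before: Counter, after: Counter) -> Dict[str, int]:
    """Per-operator change in node count (after minus before), omitting zeros."""
    ops = sorted(set(before) | set(after))
    return {op: after[op] - before[op] for op in ops if after[op] != before[op]}


def load_op_types(model_path: str) -> List[str]:
    """Read the operator type of every node in an ONNX model file."""
    # Imported lazily so the histogram helpers work without the onnx package.
    import onnx

    model = onnx.load(model_path)
    return [node.op_type for node in model.graph.node]
```

Calling `diff_histograms(op_histogram(load_op_types(a)), op_histogram(load_op_types(b)))` on the baseline and optimized model paths gives a compact summary of which operators were removed or introduced. A node-count diff does not show how nodes are connected, so a graph viewer is still useful for closer inspection.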