Add examples for optimizing an ONNX graph for inference latency #32

@dchou1618

Description

The examples should cover:
  measuring baseline ONNX Runtime latency
  applying graph optimizations manually and via tooling
  comparing execution graphs and performance before and after
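
As a starting point for the latency measurement, here is a minimal sketch of a timing harness. The function names (`time_callable`, `summarize`, `benchmark_onnx`), the warm-up and run counts, the reported statistics, and the choice of the CPU execution provider are all assumptions for illustration; the issue does not specify them.

```python
import json
import statistics
import time


def time_callable(fn, warmup=10, runs=100):
    """Return per-call latencies in milliseconds, excluding warm-up calls."""
    for _ in range(warmup):
        fn()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Reduce raw latencies to the statistics written to the JSON report."""
    ordered = sorted(latencies_ms)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "runs": len(ordered),
        "mean_ms": statistics.fmean(ordered),
        "p50_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
    }


def benchmark_onnx(model_path, input_shape, report_path):
    """Time one ONNX model on CPU and write the summary to report_path."""
    # Imported lazily so the timing helpers above work without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    # Assumption: the model has a single float32 input of the given shape.
    x = np.random.rand(*input_shape).astype(np.float32)
    stats = summarize(time_callable(lambda: session.run(None, {input_name: x})))
    with open(report_path, "w") as f:
        json.dump(stats, f, indent=2)
    return stats
```

Calling `benchmark_onnx` once per model with the same input shape (for example, with the baseline and optimized files listed under Deliverables) would give two reports that can be compared directly. Warm-up calls are excluded because the first few runs typically include one-time allocation costs.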

Objective

Take a PyTorch → ONNX exported MLP model and reduce inference latency using ONNX graph optimizations, while maintaining numerical equivalence.
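
One possible way to approach the tooling half of this objective is ONNX Runtime's own offline optimization: setting `optimized_model_filepath` on `SessionOptions` makes session creation save the optimized graph to disk. The sketch below pairs that with an equivalence check. The helper names, the tolerances, the `ORT_ENABLE_EXTENDED` level, and the single-input, single-output assumption are illustrative choices, not requirements stated in the issue.

```python
def outputs_match(reference, candidate, rtol=1e-4, atol=1e-5):
    """True if every candidate value is within tolerance of its reference value.

    Applies |c - r| <= atol + rtol * |r| element-wise to flat float sequences.
    """
    reference, candidate = list(reference), list(candidate)
    if len(reference) != len(candidate):
        return False
    return all(abs(c - r) <= atol + rtol * abs(r) for r, c in zip(reference, candidate))


def optimize_and_check(baseline_path, optimized_path, input_shape):
    """Save an ORT-optimized copy of the model and compare its output to the baseline."""
    # Imported lazily so outputs_match can be used without these packages.
    import numpy as np
    import onnxruntime as ort

    providers = ["CPUExecutionProvider"]

    # Reference session with all graph optimizations switched off.
    ref_opts = ort.SessionOptions()
    ref_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    ref = ort.InferenceSession(baseline_path, sess_options=ref_opts, providers=providers)

    # Creating this session applies the optimizations and writes optimized_path.
    opt_opts = ort.SessionOptions()
    opt_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    opt_opts.optimized_model_filepath = optimized_path
    opt = ort.InferenceSession(baseline_path, sess_options=opt_opts, providers=providers)

    # Assumption: one float32 input and one output tensor.
    name = ref.get_inputs()[0].name
    x = np.random.rand(*input_shape).astype(np.float32)
    expected = ref.run(None, {name: x})[0]
    actual = opt.run(None, {name: x})[0]
    return outputs_match(expected.ravel().tolist(), actual.ravel().tolist())
```

Fused kernels can reorder floating-point operations, so a tolerance-based comparison is used here rather than exact equality. How tight "numerical equivalence" needs to be is a decision for the issue author.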

Deliverables

/models
  mlp_baseline.onnx
  mlp_optimized.onnx

/benchmarks
  baseline.json
  optimized.json

/scripts
  export.py
  benchmark.py
  optimize.py
  inspect.py
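
For the graph comparison, a rough sketch of what the inspection script might do is to count operator types before and after optimization and report the difference. The function names are hypothetical, and the sketch only looks at top-level graph nodes (it does not descend into subgraphs). One caveat to consider: running a file named `inspect.py` directly puts its directory first on the import path, so it can shadow Python's standard-library `inspect` module for any package that imports it. A name such as `inspect_graph.py` (hypothetical) would avoid that.

```python
from collections import Counter


def diff_histograms(before, after):
    """Return {op_type: after_count - before_count} for op types whose count changed."""
    changed = {}
    for op in sorted(set(before) | set(after)):
        delta = after.get(op, 0) - before.get(op, 0)
        if delta:
            changed[op] = delta
    return changed


def compare_models(baseline_path, optimized_path):
    """Report how the operator mix changed between two ONNX files."""
    # Imported lazily so diff_histograms can be used without the onnx package.
    import onnx

    before = Counter(node.op_type for node in onnx.load(baseline_path).graph.node)
    after = Counter(node.op_type for node in onnx.load(optimized_path).graph.node)
    return diff_histograms(before, after)
```

A negative value means the optimizer removed or fused away nodes of that type; a positive value means new nodes of that type appeared.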
