- Project Overview
- Features
- Installation
- Usage
- Models
- Performance Comparison
- Results
- Repository Structure
- License
- Contact
EfficientDet Lite Object Detection with ONNX & TensorRT is a high-performance project designed to implement EfficientDet Lite models (versions 0 to 4) for object detection. Utilizing ONNX for model inference and TensorRT for optimized engine building, this project enables efficient and rapid deployment of object detection models with support for FP32 and FP16 precision on NVIDIA GPUs.
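Before any backend runs, an input image has to be resized to the network's fixed square resolution and batched. The sketch below shows that step with plain NumPy (nearest-neighbour resize). The size table follows the standard EfficientDet Lite exports; the exact resolutions and input layout (NHWC, uint8 here) of this repo's ONNX files are assumptions, so check them against the actual model inputs.

```python
import numpy as np

# Input resolutions of the standard EfficientDet Lite exports
# (assumed; verify against this repo's ONNX files).
INPUT_SIZES = {
    "efficientdet_lite0": 320,
    "efficientdet_lite1": 384,
    "efficientdet_lite2": 448,
    "efficientdet_lite3": 512,
    "efficientdet_lite4": 640,
}

def preprocess(image: np.ndarray, model_type: str) -> np.ndarray:
    """Nearest-neighbour resize to the model's square input and add a
    batch dimension, producing an NHWC uint8 tensor (the layout the
    TFLite-derived exports usually expect)."""
    size = INPUT_SIZES[model_type]
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size          # source row per output row
    cols = np.arange(size) * w // size          # source col per output col
    resized = image[rows[:, None], cols[None, :]]
    return resized[None].astype(np.uint8)       # (1, size, size, 3)
```

In practice the scripts' `--model_type` flag would select the matching entry in this table before inference.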
- Support for EfficientDet Lite Models: Implemented versions 0, 1, 2, 3, and 4.
- ONNX Inference: Run inference directly using ONNX models.
- TensorRT Engine Building: Optimize models with TensorRT for FP32 and FP16 precision.
- Inference Scripts: Execute inference using both ONNX and TensorRT engines seamlessly.
- Performance Benchmarking: Compare latency and speed across different models and backends.
- (TO BE IMPLEMENTED) INT8 Quantization: INT8 Post-Training Quantization for faster inference.
- ONNX Runtime (tested with version 1.19.2)
- TensorRT (tested with version 10.5.0)
- PyCUDA (tested with version 2024.1.2)
- cuda-python (tested with version 12.2.1 - should be the same as installed CUDA version)
- Clone the Repository

  ```bash
  git clone https://github.com/namas191297/efficientdetlite.git
  cd efficientdetlite
  ```

- Set Up a Virtual Environment

  ```bash
  conda create -n efficientdetlite python=3.9
  conda activate efficientdetlite
  ```

- Install Dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Download EfficientDet Lite Models
  - Follow the instructions in the Models section to obtain the required model files.
- FP32 Precision

  ```bash
  python scripts/build_trt_engine.py --model path/to/model.onnx --precision FP32 --output path/to/engine_fp32.trt
  ```

- FP16 Precision

  ```bash
  python scripts/build_trt_engine.py --model path/to/model.onnx --precision FP16 --output path/to/engine_fp16.trt
  ```

- Using FP32 Engine

  ```bash
  python scripts/infer_trt.py --engine path/to/engine_fp32.trt --image path/to/image.jpg
  ```

- Using FP16 Engine

  ```bash
  python scripts/infer_trt.py --engine path/to/engine_fp16.trt --image path/to/image.jpg
  ```
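The `--score_threshold` and `--top_k` flags used by the inference scripts correspond to a simple post-processing step on the raw detections. Here is a hedged NumPy sketch of that step, assuming an automl-style output tensor of shape `(1, N, 7)` with rows `[image_id, ymin, xmin, ymax, xmax, score, class]`; this repo's exact output format may differ.

```python
import numpy as np

def filter_detections(raw: np.ndarray, score_threshold: float = 0.5, top_k: int = 5):
    """Keep at most `top_k` detections whose score meets `score_threshold`.

    `raw` is assumed (not verified against this repo) to be (1, N, 7):
    [image_id, ymin, xmin, ymax, xmax, score, class] per row.
    Returns (boxes, scores, classes) sorted by descending score.
    """
    dets = raw[0]                                  # drop batch dim -> (N, 7)
    dets = dets[dets[:, 5] >= score_threshold]     # score filter
    dets = dets[np.argsort(-dets[:, 5])][:top_k]   # best `top_k` by score
    boxes = dets[:, 1:5]                           # [ymin, xmin, ymax, xmax]
    return boxes, dets[:, 5], dets[:, 6].astype(int)
```

The same filtering applies whether the raw tensor comes from an ONNX Runtime session or a TensorRT executor, which is why both script families expose identical flags.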
- Building TRT Engine from ONNX models

  ```bash
  # Build .engine TRT Engine for EfficientDetLite4 with FP32 precision.
  python build_engine.py --model_type efficientdet_lite4

  # Build .engine TRT Engine for EfficientDetLite4 with FP16 precision.
  python build_engine.py --model_type efficientdet_lite4 --fp16
  ```

- Single Image

  ```bash
  # Inference with ONNX on a single image
  python onnx_inference_image.py --model_type efficientdet_lite1 --image test.jpg --score_threshold 0.5 --top_k 5

  # Inference with TRT Engine on a single image using FP32 precision.
  python trt_inference_image.py --model_type efficientdet_lite1 --image test.jpg --score_threshold 0.5 --top_k 5

  # Inference with TRT Engine on a single image using FP16 precision.
  python trt_inference_image.py --model_type efficientdet_lite1 --image test.jpg --score_threshold 0.5 --top_k 5 --fp16
  ```

- Webcam

  ```bash
  # Inference with ONNX on your webcam
  python onnx_inference_webcam.py --model_type efficientdet_lite1 --score_threshold 0.5 --top_k 5

  # Inference with TRT Engine on your webcam using FP32 precision.
  python trt_inference_webcam.py --model_type efficientdet_lite1 --score_threshold 0.5 --top_k 5

  # Inference with TRT Engine on your webcam using FP16 precision.
  python trt_inference_webcam.py --model_type efficientdet_lite1 --score_threshold 0.5 --top_k 5 --fp16
  ```

- EfficientDet Lite 0
- EfficientDet Lite 1
- EfficientDet Lite 2
- EfficientDet Lite 3
- EfficientDet Lite 4
- Model Files: All models are included in this repo, but you can still download the pre-trained EfficientDet Lite models from the EfficientDetLite Google Drive Repo.
- Place all `.engine` files under `trt_models/`.
- Place all `.onnx` files under `onnx_models/`.
The following table compares the latency (ms) of each EfficientDet Lite model across different backends when running on an NVIDIA RTX 3060.
| Model | ONNX | TensorRT FP32 | TensorRT FP16 |
|---|---|---|---|
| Lite0 | 27 | 27 | 19 |
| Lite1 | 39 | 33 | 23 |
| Lite2 | 54 | 42 | 27 |
| Lite3 | 78 | 54 | 33 |
| Lite4 | 145 | 82 | 46 |
- GPU: NVIDIA RTX 3060
- CUDA Version: 12.2
- TensorRT Version: 10.5.0
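Latency numbers like these are only meaningful if measured consistently across backends. A backend-agnostic way to collect them (a sketch, not the repo's actual benchmarking code) is to warm up first and then take the median of many timed calls:

```python
import time
import statistics

def measure_latency_ms(run_inference, warmup: int = 10, iters: int = 100) -> float:
    """Median wall-clock latency of a single inference call in milliseconds.

    `run_inference` is any zero-argument callable, e.g. a lambda wrapping
    an ONNX Runtime session.run(...) or a TRT executor call.
    """
    for _ in range(warmup):                      # let caches and GPU clocks settle
        run_inference()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)
```

For example, `measure_latency_ms(lambda: session.run(None, inputs))` would time an ONNX Runtime session, assuming `session` and `inputs` are already set up; the median is used because it is robust to occasional scheduling spikes.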
Inference using EfficientDetLite 4
The project demonstrates significant improvements in inference speed when utilizing TensorRT, especially with FP16 precision. For the larger models, TensorRT FP16 runs roughly 3x faster than ONNX (e.g. 145 ms down to 46 ms for Lite4), enabling real-time object detection applications.
```
root/
├── onnx_models/              # ONNX model files
├── trt_models/               # TensorRT engine files
├── build_engine.py           # Script to build a TRT engine from ONNX models.
├── trt_engine_builder.py     # TRTEngineBuilder class implementation.
├── trt_executor.py           # TRTExecutor class implementation for inference.
├── trt_config.py             # Contains LABELS for classes and helper dictionary to build and run models.
├── onnx_inference_image.py   # Script to run ONNX inference on a single image.
├── onnx_inference_webcam.py  # Script to run ONNX inference on webcam.
├── trt_inference_image.py    # Script to run TRT inference on a single image.
├── trt_inference_webcam.py   # Script to run TRT inference on webcam.
├── requirements.txt          # Python dependencies
├── README.md                 # This file
└── LICENSE                   # License information
```
- build_engine.py: Builds TensorRT engines from ONNX models with specified precision.
- onnx_inference_image.py: Runs inference using ONNX model on a single image.
- onnx_inference_webcam.py: Runs inference using ONNX model on webcam.
- trt_inference_image.py: Runs inference using TRT engines on a single image.
- trt_inference_webcam.py: Runs inference using TRT engines on webcam.
This project is licensed under the Creative Commons Attribution 3.0.
Email: namas.brd@gmail.com
LinkedIn: Namas Bhandari
Feel free to reach out for any questions, suggestions, or collaborations!

