This repository presents my solution for the Model-Centric Track of the Wake Vision Challenge, where I designed an efficient and compact model for human presence detection in images.
My solution is based on a structurally pruned version of MobileNetV2, optimized to minimize Multiply-Accumulate Operations (MACs) and reduce the number of parameters. The pruning methodology follows the approach introduced in our recently accepted paper at the IEEE International Conference on Communications (IEEE ICC) (see figure below).
Our pruning framework was originally developed for PyTorch, while this challenge required a TensorFlow implementation. I therefore first applied the pruning algorithm to MobileNetV2 in PyTorch, and then manually reconstructed the pruned model in TensorFlow to ensure compatibility with the competition’s pipeline.
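Rebuilding the pruned network in TensorFlow amounts to reading the surviving channel count of each layer from the pruned PyTorch model and re-declaring the architecture with those widths. A minimal sketch of that extraction step (the layer names and shapes below are illustrative stand-ins, not the actual pruned model):

```python
# Hypothetical weight shapes as they would appear in the pruned PyTorch
# state_dict (PyTorch Conv2d weights are [out_ch, in_ch, kH, kW]).
pruned_shapes = {
    "features.1.conv.0.weight": (10, 32, 1, 1),  # pointwise expand
    "features.1.conv.3.weight": (10, 1, 3, 3),   # depthwise
    "features.1.conv.6.weight": (8, 10, 1, 1),   # pointwise project
}

def surviving_channels(shapes):
    """Out-channel count per layer: the widths to re-declare in TensorFlow."""
    return {name: shape[0] for name, shape in shapes.items()}

widths = surviving_channels(pruned_shapes)
# Each entry then becomes the `filters` argument of the matching
# tf.keras.layers.Conv2D / DepthwiseConv2D in the rebuilt model.
```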
The algorithm prunes each block of layers to its maximum extent, then measures the corresponding reduction in MACs and parameters. MobileNetV2 blocks consist of inverted residual structures with depthwise and pointwise convolutions. To achieve aggressive pruning, we retain only a single channel per layer within each block.
This process provides an estimated importance score for each block, which determines a unique pruning ratio per block. The final model undergoes non-uniform structured pruning, ensuring that critical layers retain more parameters while others are pruned more aggressively.
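The scoring step described above can be sketched as follows. The numbers and the mapping from MAC savings to a pruning ratio are illustrative stand-ins; the exact formulation is the one in our ICC paper:

```python
# Sketch: minimize each block (one channel per layer), record the total
# MACs of the resulting network, and turn the savings into per-block ratios.

def pruning_ratios(baseline_macs, macs_after_minimizing, max_ratio=0.9):
    """baseline_macs: total MACs of the unpruned model.
    macs_after_minimizing: block name -> total model MACs when that block
    is reduced to a single channel per layer."""
    # MACs freed by fully minimizing each block.
    savings = {b: baseline_macs - m for b, m in macs_after_minimizing.items()}
    top = max(savings.values())
    # Illustrative mapping: blocks whose minimization frees the most MACs
    # receive the largest ratio, scaled so the heaviest gets max_ratio.
    return {b: max_ratio * s / top for b, s in savings.items()}

ratios = pruning_ratios(
    baseline_macs=300_000_000,          # hypothetical totals
    macs_after_minimizing={"block1": 280_000_000,
                           "block2": 220_000_000,
                           "block3": 250_000_000},
)
```

Here `block2` would be pruned most aggressively (ratio 0.9), while `block1`, whose minimization barely changes the total cost, keeps most of its channels.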
MobileNetV2_0.25 was already a strong baseline for this task: its width multiplier of 0.25 uniformly retains 25% of the channels in every layer. However, I suspected that some layers are more critical than others and deserve to retain more than 25% of their channels. To exploit this, I applied non-uniform structured pruning, preserving capacity in the essential layers while pruning the rest more aggressively.
To further reduce MACs, I downsampled the input from the standard (224, 224, 3) resolution to (80, 80, 3), which improves efficiency without compromising accuracy.
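Since convolutional MACs scale with the spatial area of the feature maps, the resolution change alone accounts for a large share of the savings. A back-of-the-envelope estimate (ignoring any fixed-cost layers):

```python
# Shrinking the input from 224x224 to 80x80 leaves roughly (80/224)^2
# of the original convolutional cost.
scale = (80 / 224) ** 2
print(f"remaining conv MACs: {scale:.3f}")  # ~0.128, i.e. about 87% fewer
```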
| Flash [B] | RAM [B] | MACs | Deployability | Test Acc. | Norm. Test Acc. | Score |
|---|---|---|---|---|---|---|
| 55392 | 61968 | 3887331 | 0.8 | 0.75 | 0.94 | 0.78 |
This solution achieved 4th place in the Wake Vision Challenge. More details about the challenge can be found here.
After designing an efficient and high-performing model, I deployed it on the OpenMV H7 microcontroller board (GitHub repo).
To facilitate deployment, I used the Edge Impulse Python SDK, which streamlined the process of converting and flashing the model onto the board.
For visual feedback, I utilized the onboard LED:
- Green indicates human presence detected
- Red indicates no presence detected
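The decision behind the LED colour is a simple threshold on the model's person-presence score. A sketch of that logic in plain Python (the threshold value is illustrative, not tuned on the board; on the OpenMV H7 the returned colour would drive the onboard LED):

```python
GREEN, RED = "green", "red"

def led_color(person_score, threshold=0.5):
    """Map the model's person-presence score in [0, 1] to an LED colour.
    `threshold` is an illustrative value, not the one used on-device."""
    return GREEN if person_score >= threshold else RED

print(led_color(0.91))  # person visible  -> green
print(led_color(0.12))  # no person       -> red
```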
As demonstrated below, the LED turns green when I am visible in the frame:
And it turns red when my body is obscured or when there is no person in the frame:
Welcome to the Model-Centric Track of the Wake Vision Challenge! 🎉
This track challenges you to push the boundaries of tiny computer vision by designing innovative model architectures for the newly released Wake Vision Dataset.
🔗 Learn More: Wake Vision Challenge Details
Participants are invited to:
- Design novel model architectures to achieve high accuracy.
- Optimize for resource efficiency (e.g., memory, inference time).
- Evaluate models on the public test set of the Wake Vision dataset.
You can modify the model architecture freely, but the dataset must remain unchanged. 🛠️
First, install Docker on your machine:
Run the following command inside the directory where you cloned this repository:
```shell
sudo docker run -it --rm -v $PWD:/tmp -w /tmp andregara/wake_vision_challenge:cpu python model_centric_track.py
```

- This trains the ColabNAS model, a state-of-the-art person detection model, on the Wake Vision dataset.
- Modify the `model_centric_track.py` script to propose your own architecture.
💡 Note: The first execution may take several hours as it downloads the full dataset (~365 GB).
- Install the NVIDIA Container Toolkit.
- Verify your GPU drivers.
Run the following command inside the directory where you cloned this repository:
```shell
sudo docker run --gpus all -it --rm -v $PWD:/tmp -w /tmp andregara/wake_vision_challenge:gpu python model_centric_track.py
```

- This trains the ColabNAS model on the Wake Vision dataset.
- Modify the `model_centric_track.py` script to design your own model architecture.
💡 Note: The first execution may take several hours as it downloads the full dataset (~365 GB).
- Focus on Model Innovation: Experiment with architecture design, layer configurations, and optimization techniques.
- Stay Efficient: Resource usage is critical—consider model size, inference time, and memory usage.
- Collaborate: Join the community discussions on Discord to exchange ideas and insights!
Have questions or need help? Reach out on Discord.
🌟 Happy Innovating and Good Luck! 🌟