Sub-Millisecond, Microjoule Edge Inference for Indoor Environment Classification via Layer-Wise Mixed-Precision Quantization
Hamza A. Abushahla, Muhammed Noshin, Dr. Mohamed I. AlHajri, and Dr. Nazar T. Ali
This repository contains code and resources for the paper: Sub-Millisecond, Microjoule Edge Inference for Indoor Environment Classification via Layer-Wise Mixed-Precision Quantization.
Figure 1: System overview of the proposed indoor environment identification framework. The pipeline illustrates the complete workflow, including data preprocessing, model training and quantization, and deployment on the MAX78002.
This work presents the first hardware-validated framework that integrates Quantization-Aware Training (QAT) with layer-wise Mixed-Precision Quantization (MPQ) to enable sub-millisecond, microjoule indoor environment identification on the MAX78002 microcontroller. The main contributions are:
- We evaluate CNN-based indoor environment identification on resource-constrained IoT hardware by deploying QAT-enabled, layer-wise MPQ models across multiple bandwidth settings, showing that MPQ consistently outperforms uniform quantization.
- We provide hardware-grounded, on-device deployment characterization by separating weight loading from inference, evaluating clocking modes, and quantifying their impact on energy–latency trade-offs.
- At 99.22% accuracy, the best MPQ configuration achieves 127.7 µs latency and 27 µJ per inference, while reducing model size by 77% and delivering 10% faster inference and 22% lower inference energy than uniform INT8; it also reduces weight-loading time and energy by 85.2% and 56.1%, respectively.
- Under a relaxed 98% accuracy requirement, compact MPQ configurations achieve 75.5 µs latency and 15.6 µJ per inference, with reductions of 87% in model size, 46.7% in inference time, 55% in inference energy, 91.5% in weight-loading time, and 74.8% in weight-loading energy, enabling flexible accuracy–efficiency trade-offs.
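As a quick sanity check on the reported figures, each energy/latency pair implies a roughly constant average power draw on the MAX78002. The back-of-the-envelope calculation below is written for this README and is not part of the repo's scripts:

```python
# Hypothetical sanity check (not from the paper's scripts): the reported
# energy and latency figures imply a consistent average power draw.
def avg_power_mw(energy_uj: float, latency_us: float) -> float:
    """Average power in mW implied by energy (µJ) over latency (µs)."""
    return energy_uj / latency_us * 1e3  # µJ/µs = W, so x1e3 gives mW

# Best MPQ config at 99.22% accuracy: 27 µJ over 127.7 µs
p_best = avg_power_mw(27.0, 127.7)      # ~211 mW
# Compact config under the relaxed 98% target: 15.6 µJ over 75.5 µs
p_compact = avg_power_mw(15.6, 75.5)    # ~207 mW

print(f"{p_best:.0f} mW, {p_compact:.0f} mW")
```

Both configurations land near 210 mW, consistent with the energy savings coming primarily from shorter runtimes.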
This repository is organized as overlays on top of the official Analog Devices ai8x toolchain:
- `envs/` — recommended conda environments for reproducibility (`max_linux.yml`, `max_mac.yml`)
- `requirements.txt` — fallback pip requirements (Python 3.11)
- `training/` — overlay files for `ai8x-training`:
  - dataset (`data/`)
  - dataloader (`datasets/`)
  - model definitions (`models/`)
  - training policies (LR schedule + QAT/MPQ policies) (`policies/`)
  - scripts and sweep drivers (`scripts/`, `sweeps/`)
- `synthesis/` — overlay files for `ai8x-synthesis`:
  - izer configs / network YAMLs
  - generation scripts for hardware project generation
- `inference/` — example exported projects and ready-to-run MAX78002 deployments:
  - generated izer project folders (e.g., `indoor_env_1d_51_q8824/`)
  - example quantized checkpoints (when applicable)
- `results/` — raw experiment outputs and processed summaries used in the paper:
  - CSV logs for simulation sweeps (QAT / PTQ)
  - aggregated summary tables (e.g., mean/std across seeds)
- `figs/` — figures used in this repo/README
From the repo root:
```bash
conda env create -f envs/max_linux.yml
# or
conda env create -f envs/max_mac.yml
conda activate max
```

Alternatively, create a fresh conda env called `max` (Python 3.11), then install the pip requirements inside it:

```bash
conda create -n max python=3.11 -y
conda activate max
python -m pip install --upgrade pip
pip install -r requirements.txt
```

We recommend a single workspace that contains both toolchains:
```bash
mkdir max_workdir && cd max_workdir
git clone --recursive https://github.com/analogdevicesinc/ai8x-training.git
git clone --recursive https://github.com/analogdevicesinc/ai8x-synthesis.git
```

Expected layout:

```
max_workdir/
├── ai8x-training/
└── ai8x-synthesis/
```
Copy the contents of this repo's `training/` into your local `ai8x-training/` (merge folders; do not nest).
Example mapping (will expand as the repo evolves):

| Target folder (inside `ai8x-training/`) | Copy from (this repo) | Purpose |
|---|---|---|
| `data/` | `training/data/indoor_environment/` | Dataset (`.mat` files) |
| `datasets/` | `training/datasets/` | Dataloader(s) |
| `models/` | `training/models/` | ai8x model(s) |
| `policies/` | `training/policies/` | LR schedule + QAT/MPQ policies |
| `scripts/` / `sweeps/` | `training/scripts/`, `training/sweeps/` | training/eval + sweep drivers |
Similarly, merge this repo's `synthesis/` into your local `ai8x-synthesis/`.
All commands below assume you are inside:
```bash
cd max_workdir/ai8x-training
```

QAT-enabled run:

```bash
python train.py --epochs 10 --batch-size 256 \
  --optimizer Adam --lr 0.001 --weight-decay 0.0005 \
  --use-bias --deterministic \
  --model ai85indoorenvnetv2 --dataset IndoorEnvironment_1D --data data/indoor_environment \
  --compress policies/schedule-indoor-env.yaml \
  --qat-policy policies/qat_policy_indoor_v2.yaml \
  --input-1d-length 101 \
  --device MAX78002 --name indoor_run
```

PTQ-only (no QAT): set `--qat-policy` to `None` (or remove it, depending on your local script conventions).
Outputs (logs directory):

- `checkpoint.pth.tar`, `best.pth.tar`: float checkpoints
- `qat_checkpoint.pth.tar`, `qat_best.pth.tar`: checkpoints after QAT starts

Note: these are not yet "izer-ready" until the quantization/export steps are run by the provided scripts.
MPQ configuration:
Layer-wise bitwidths are controlled in the relevant QAT policy YAML (e.g., qat_policy_indoor_v2.yaml).
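As an illustration, a layer-wise policy could look like the sketch below. The `start_epoch`/`weight_bits`/`overrides` fields follow the ai8x-training QAT policy format as we understand it, and the layer names are assumptions based on the four quantized layers; treat `qat_policy_indoor_v2.yaml` in this repo as the authoritative version:

```yaml
# Illustrative sketch of a layer-wise QAT policy for an 8-8-2-4 configuration.
# Layer names (fc1, fc2) are assumptions taken from the model definition.
start_epoch: 5      # epoch at which QAT begins
weight_bits: 8      # default bitwidth (covers conv1 and conv2 here)
overrides:
  fc1:
    weight_bits: 2
  fc2:
    weight_bits: 4
```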
Script: train_indoor_1D_mixed_sweep.py
What it does:
- Enumerates all 3^4 = 81 MPQ configs ({8, 4, 2}-bit choices for each of conv1/conv2/fc1/fc2)
- Sweeps multiple input lengths (Ξ±) and multiple seeds
- Runs full QAT, then quantizes and evaluates each run
- Writes detailed and aggregated CSV summaries
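The enumeration step above can be sketched with `itertools` (illustrative only; the actual sweep driver lives in `train_indoor_1D_mixed_sweep.py`):

```python
from itertools import product

# Enumerate every layer-wise bitwidth assignment: one of {8, 4, 2} bits
# for each of the four quantizable layers (conv1, conv2, fc1, fc2).
BITS = (8, 4, 2)
LAYERS = ("conv1", "conv2", "fc1", "fc2")

configs = [dict(zip(LAYERS, combo)) for combo in product(BITS, repeat=len(LAYERS))]

print(len(configs))   # 3**4 = 81 configurations
print(configs[0])     # {'conv1': 8, 'conv2': 8, 'fc1': 8, 'fc2': 8}
```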
Run:

```bash
python train_indoor_1D_mixed_sweep.py
```

Default output folder (example):
```
ai8x_seed_runs_out/
├── logs_mixed/
├── checkpoints_mixed/
├── policies/sweep/
├── mixed_precision_sweep_results.csv
└── mixed_precision_sweep_summary.csv
```
Script: train_indoor_1D_mixed_sweep_ptq.py
Run (example):

```bash
python -u train_indoor_1D_mixed_sweep_ptq.py \
  --num-seeds 5 \
  --start-seed 42 \
  --input-lengths 101 \
  --epochs 10 \
  --z-score 2.0 \
  --calib-split train
```

Outputs (example):
```
ai8x_ptq_sweep_out/
├── ptq_sweep_results.csv
├── ptq_sweep_summary.csv
├── logs_ptq/
└── checkpoints_ptq/
```
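The summary CSVs (mean/std across seeds) can be recomputed from the raw results with a small stdlib-only sketch. The column names `config` and `accuracy` are assumptions for illustration; adapt them to the actual header of the results CSV:

```python
import csv
from collections import defaultdict
from statistics import mean, stdev

def summarize(results_csv: str, key: str = "config", metric: str = "accuracy"):
    """Group a raw sweep CSV by configuration and report mean/std across seeds.

    Column names ('config', 'accuracy') are illustrative; check the actual
    header of ptq_sweep_results.csv / mixed_precision_sweep_results.csv.
    """
    groups = defaultdict(list)
    with open(results_csv, newline="") as f:
        for row in csv.DictReader(f):
            groups[row[key]].append(float(row[metric]))
    return {
        cfg: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for cfg, vals in groups.items()
    }
```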
Quantized checkpoints follow:

- Uniform: `_q8.pth.tar`, `_q4.pth.tar`, `_q2.pth.tar`
- Mixed precision: `_qmixed.pth.tar`

Example:

```
indoor_mixed_seed_46__L101__8_8_8_8_*_qat_best_q8.pth.tar
```
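Checkpoint names encode the seed, input length, and per-layer bitwidths, so they can be parsed back out when collating results. The helper below is written for this README (not part of the repo) and assumes the naming scheme shown above:

```python
import re

# Matches names like indoor_mixed_seed_46__L101__8_8_2_4_..._qat_best_qmixed.pth.tar
PATTERN = re.compile(r"seed_(?P<seed>\d+)__L(?P<length>\d+)__(?P<bits>\d+_\d+_\d+_\d+)")

def parse_checkpoint_name(name: str):
    """Extract (seed, input length, per-layer bitwidths) from a checkpoint filename."""
    m = PATTERN.search(name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return int(m["seed"]), int(m["length"]), tuple(int(b) for b in m["bits"].split("_"))

print(parse_checkpoint_name(
    "indoor_mixed_seed_46__L101__8_8_2_4_run_qat_best_qmixed.pth.tar"
))  # (46, 101, (8, 8, 2, 4))
```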
Make sure you are using the same Python environment set up for the ai8x toolchain.
Go to ai8x-synthesis:
```bash
cd ../ai8x-synthesis
```

Edit the generation script (example): `scripts/gen_indoor_1d.sh`

Set inside the script:

- `LENGTH=101`
- `CONFIG="8-8-8-8"` (or an MPQ config such as `8-8-2-4`)
- `CHECKPOINT=<path-to-quantized-checkpoint>`

Make the script executable (only once):

```bash
chmod +x scripts/gen_indoor_1d.sh
```

Run the script from the `ai8x-synthesis` root:

```bash
./scripts/gen_indoor_1d.sh
```

The script internally calls `ai8xize.py` to generate a deployable C/C++ project, e.g.:
```bash
python ai8xize.py \
  --test-dir "$TARGET" \
  --prefix "$PREFIX" \
  --checkpoint-file "$CHECKPOINT" \
  --config-file networks/indoorenvnet-v2-chw-${LENGTH}.yaml \
  --sample-input tests/sample_indoorenvironment_1d_${LENGTH}.npy \
  --overwrite --softmax --compact-data --mexpress --max-speed --energy
```

A C/C++ project folder will be created (example):

```
HW_Evaluation/indoor_env_1d_101_q8888/
```

This includes:

- `main.c` (entry point)
- `cnn.c`, `cnn.h` (generated network)
- Makefile / Eclipse launch files (depending on izer output)
Import the generated project into Eclipse after setting up the Analog Devices MSDK.
For paper-style evaluation, we modify main.c to report:
- inference latency
- energy/power measurements (as used in our evaluation scripts)
This repo includes example exported projects and checkpoints under:
inference/
Example:

- `inference/indoor_env_1d_51_q8824/`
- `inference/indoor_env_1d_91_q8824/`
- `inference/indoor_mixed_seed_46__L51__8_8_2_4_indoor_mixed_seed_46__L51__8_8_2_4_qat_best_qmixed.pth.tar`
- `indoor_mixed_seed_46__L91__8_8_2_4_indoor_mixed_seed_46__L91__8_8_2_4_qat_best_qmixed.pth.tar`
If you use our work in your own research, please cite the paper above.
You can also reach out via email to:
- Hamza Abushahla - b00090279@alumni.aus.edu
- Dr. Mohamed AlHajri - mialhajri@aus.edu
