End-to-end pipeline that measures the real-world width and height (mm) of a notebook from a single phone photo.
Four steps:
- Calibrate the iPhone 12 Pro Max main lens using a checkerboard displayed fullscreen on a laptop screen. Mean per-image reprojection error: 0.085 px.
- Undistort every image using the recovered intrinsics before any measurement is taken.
- Segment the notebook with a Mask R-CNN (ResNet-50 FPN, COCO-pretrained, fine-tuned on 48 self-collected images). Best validation mask IoU: 0.96.
- Measure in mm using a credit/ID card (ISO/IEC 7810 ID-1, 85.60 x 53.98 mm) as a known-size reference. The user clicks the 4 card corners; a homography handles camera tilt.
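
The measurement step can be sketched as follows. This is a minimal, dependency-free illustration and not the repository's code: `solve_linear`, `homography`, and `apply_homography` are hypothetical helper names, the pixel coordinates are made up, and in practice `cv2.getPerspectiveTransform` computes the same 3x3 matrix. The four clicked card corners are mapped to the card's known size in mm, and the resulting homography converts any other pixel on the card plane to mm.

```python
import math

# ISO/IEC 7810 ID-1 card size in millimetres (from the text above).
CARD_W_MM, CARD_H_MM = 85.60, 53.98

def solve_linear(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """3x3 homography (bottom-right entry fixed to 1) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(h, pt):
    """Map one (x, y) point through h, including the perspective divide."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Card corners in mm, and hypothetical clicks in the same order
# (top-left, top-right, bottom-right, bottom-left).
card_mm = [(0.0, 0.0), (CARD_W_MM, 0.0), (CARD_W_MM, CARD_H_MM), (0.0, CARD_H_MM)]
card_px = [(100.0, 200.0), (528.0, 200.0), (528.0, 469.9), (100.0, 469.9)]
h = homography(card_px, card_mm)

# Two hypothetical notebook corners; at this 5 px/mm scale they are about 210 mm apart.
left = apply_homography(h, (600.0, 300.0))
right = apply_homography(h, (1650.0, 300.0))
width_mm = math.dist(left, right)
print(f"{width_mm:.1f} mm")
```

The mapping is only exact for points lying in the same plane as the card, which is why the reference card and the measured surface should be as close to coplanar as possible.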
Why Mask R-CNN and not YOLO/Roboflow: the assessment forbids both. Mask R-CNN is well-supported in torchvision with COCO pretraining that transfers cleanly to a small single-class dataset. More in docs/training-report.md.
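
For reference, the mask IoU quoted for the segmentation step is the intersection over union of the predicted and ground-truth masks. The sketch below assumes binary masks stored as nested lists of 0/1; `mask_iou` is a hypothetical name, and how the training script thresholds predictions or averages over images is not shown in this README.

```python
def mask_iou(pred, target):
    """Intersection over union of two equal-shaped binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0

# Tiny made-up masks: 3 pixels overlap, 5 pixels are covered by either mask.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 0],
          [0, 1, 0]]
iou = mask_iou(pred, target)  # 3 / 5
```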
| Metric | Value |
|---|---|
| Calibration mean reprojection error | 0.085 px |
| Mask R-CNN best val IoU | 0.9606 |
| Measurement (overhead, N=19): width MAE / MPE | 23.6 mm / 13.1% |
| Measurement (overhead, N=19): height MAE / MPE | 14.2 mm / 11.4% |
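
The MAE and MPE columns can be computed along these lines. This is a sketch under one assumption that the README does not settle: MPE is taken here as the mean of the absolute percentage errors, whereas the validation script may use a signed mean. The function names and the sample values are hypothetical.

```python
def mean_abs_error(pred_mm, true_mm):
    """Mean absolute error in mm over paired predictions and ground truth."""
    return sum(abs(p - t) for p, t in zip(pred_mm, true_mm)) / len(true_mm)

def mean_pct_error(pred_mm, true_mm):
    """Mean absolute percentage error (assumed definition of MPE), in percent."""
    return 100.0 * sum(abs(p - t) / t for p, t in zip(pred_mm, true_mm)) / len(true_mm)

# Made-up predictions against a made-up 200 mm ground truth.
pred = [210.0, 190.0]
truth = [200.0, 200.0]
mae = mean_abs_error(pred, truth)
mpe = mean_pct_error(pred, truth)
```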
The measurement error is dominated by a systematic over-estimate caused by notebook thickness (~20 mm) vs the card plane. See docs/measurement-report.md for the diagnosis and the fixes that would actually help.
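
To see why thickness inflates the result, consider an idealized pinhole camera looking straight down from height H above the card plane. A surface raised by t above that plane is magnified by H / (H - t) relative to the plane. The sketch below encodes that ratio; the function names are hypothetical, and the 400 mm camera height is an illustrative assumption, not a value from this project.

```python
def parallax_scale(camera_height_mm, thickness_mm):
    """Magnification of a surface raised thickness_mm above the reference plane."""
    if not 0 <= thickness_mm < camera_height_mm:
        raise ValueError("thickness must be non-negative and below the camera height")
    return camera_height_mm / (camera_height_mm - thickness_mm)

def corrected_length(measured_mm, camera_height_mm, thickness_mm):
    """Undo the magnification for a length measured with the card-plane scale."""
    return measured_mm / parallax_scale(camera_height_mm, thickness_mm)

# Hypothetical overhead shot: camera 400 mm above the table, notebook 20 mm thick.
scale = parallax_scale(400.0, 20.0)
overestimate_pct = (scale - 1.0) * 100.0
print(f"over-estimate: {overestimate_pct:.1f}%")
```

The closer the camera is to the table, the larger the ratio, so the same 20 mm thickness matters more in close-up shots than in distant ones.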
End-to-end result on one image. Notebook mask in green, card box in yellow, predicted dimensions in the top label bar.
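The black curved border comes from undistorting with alpha=1.0, which retains all source pixels. Note that alpha is not an argument of cv2.undistort itself: it is the free-scaling parameter used when computing the new camera matrix (for example with cv2.getOptimalNewCameraMatrix) that is then passed to cv2.undistort. The curved edge is the geometric inverse of the lens's barrel distortion. The calibration before/after images and the training curves are in their respective reports under docs/.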
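
The shape of that border can be reproduced with a simplified radial model. The sketch below works in normalized image coordinates and uses only the radial terms k1 and k2 (OpenCV's full model also has tangential terms). The coefficient value is a made-up barrel-distortion example, not the calibrated one, and `distort` / `undistort_point` are hypothetical names.

```python
def distort(x, y, k1, k2=0.0):
    """Apply radial distortion to an ideal (undistorted) normalized point."""
    r2 = x * x + y * y
    f = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * f, y * f

def undistort_point(xd, yd, k1, k2=0.0, iters=50):
    """Invert the radial model by fixed-point iteration (fine for mild distortion)."""
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        f = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / f, yd / f
    return x, y

K1 = -0.05  # hypothetical barrel distortion (negative k1)

# Two points on the border of the distorted image: an edge midpoint and a corner.
edge = undistort_point(1.0, 0.0, K1)
corner = undistort_point(1.0, 0.75, K1)

# How far each point moves outward when undistorted.
edge_stretch = edge[0] / 1.0
corner_stretch = (corner[0] ** 2 + corner[1] ** 2) ** 0.5 / (1.0 + 0.75 ** 2) ** 0.5
```

Corners are pushed outward more than edge midpoints, so the undistorted image outline bows inward along each side. Keeping every source pixel means the rectangular output frame has to contain that outline, and the gaps are filled with black.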
68 raw photos of the same notebook across varied backgrounds, lighting, and angles.

    python -m venv .venv
    # Windows activation; on macOS/Linux use: source .venv/bin/activate
    .venv\Scripts\activate
    pip install -r requirements.txt

    # end-to-end demo on one image (click the 4 card corners, then press ENTER)
    python inference/demo.py --image dataset/raw/IMG_4326.JPG.jpeg --out docs/figures/demo_output.jpg

See SETUP.md for the full setup, including the Python version note (torch wheels target 3.9-3.12; on 3.13/3.14 you need a separate venv for training).
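
A quick way to check whether the current interpreter falls inside the quoted 3.9-3.12 range before installing training dependencies is sketched below. `torch_wheels_available` is a hypothetical helper, not part of this repository, and the range is simply the one stated above.

```python
import sys

def torch_wheels_available(version=sys.version_info):
    """True if (major, minor) lies within 3.9-3.12, the range quoted in this README."""
    return (3, 9) <= tuple(version[:2]) <= (3, 12)

if not torch_wheels_available():
    print("This interpreter is outside 3.9-3.12; "
          "create a separate venv with a supported Python for training.")
```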

    calibration/   checkerboard generator, calibration script, intrinsics.npz
    dataset/       raw + undistorted images, splits, COCO annotations
    models/        training script, training_log.csv (best.pt produced by training)
    inference/     end-to-end demo
    measurement/   pixel -> mm pipeline, validation script, ground_truth.csv, results.csv
    notebooks/     Colab notebook used for training on a free GPU
    docs/          all reports + figures

- `docs/architecture.md` - end-to-end data flow diagrams (offline setup + online measurement)
- `docs/calibration-report.md` - method, K, distortion, reprojection error
- `docs/dataset-card.md` - what was collected, how it was labelled, splits
- `docs/training-report.md` - model choice, hyperparameters, loss + IoU curves
- `docs/measurement-report.md` - pipeline, accuracy table, parallax limitation
- `docs/api.md` - public function signatures, inputs/outputs, sample usage
- `docs/design-decisions.md` - choices made, alternatives considered, trade-offs
- `SETUP.md` - installation and full run order

Limitations:
- Weights not committed. `best.pt` is 176 MB; `.gitignore` excludes `*.pt`. Re-train in ~3 minutes on a Colab T4 via `notebooks/train_colab.ipynb`.
- One iPhone, one notebook. The model and the calibration are specific to one device and one notebook style. Re-running on a different phone needs a fresh calibration; a different notebook style needs more labelled data.

