A complete PyTorch implementation of PointNet for 3D indoor scene semantic segmentation using the Stanford 3D Indoor Scene Dataset (S3DIS). This project implements the architecture from scratch based on the original research paper by Qi et al.
This implementation focuses on scene semantic segmentation, classifying every point in room-scale 3D point clouds into semantic categories. The model processes entire indoor scenes and assigns semantic labels to each point, enabling detailed understanding of 3D indoor environments.
- STN3d: 3D Spatial Transformer Network for input transformation
- STNkd: k-dimensional Spatial Transformer Network for feature alignment
- PointNetFeatureExtractor: Main feature extraction backbone
- PointNetSegmentation: Complete segmentation model with classification head
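As a reference for how the input transform works, here is a minimal sketch of an STN3d-style module. The layer widths (64-128-1024 shared MLP, 512-256 fully connected) follow the original PointNet paper; the repository's actual `STN3d` in `src/models/transforms.py` may differ in details.

```python
import torch
import torch.nn as nn

class STN3d(nn.Module):
    """Sketch: predicts a 3x3 transform to apply to the input points."""
    def __init__(self):
        super().__init__()
        # Shared per-point MLP (implemented as 1x1 convolutions)
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 9),
        )

    def forward(self, x):                      # x: (B, 3, N)
        feat = self.mlp(x).max(dim=2).values   # (B, 1024) global feature
        mat = self.fc(feat).view(-1, 3, 3)
        # Bias the prediction toward the identity for stable early training
        return mat + torch.eye(3, device=x.device)
```

`STNkd` is the same idea with `k` input channels and a `k x k` output matrix, applied to intermediate point features instead of raw coordinates.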
- Input transformation networks for rotation invariance
- Optional feature transformation for better alignment
- Point-wise classification for semantic segmentation
- Regularization loss for transformation matrices
- Support for 13 semantic classes from S3DIS
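The regularization loss on the feature transformation matrix is the orthogonality penalty from the PointNet paper, `||I - A A^T||_F^2`; a minimal sketch (the repository's own helper may be named differently):

```python
import torch

def feature_transform_regularizer(trans):
    """||I - A A^T||_F^2 averaged over the batch; trans: (B, k, k)."""
    k = trans.size(1)
    eye = torch.eye(k, device=trans.device)
    diff = eye - torch.bmm(trans, trans.transpose(1, 2))
    return diff.pow(2).sum(dim=(1, 2)).mean()
```

This term pushes the predicted `k x k` feature transform toward an orthogonal matrix, which keeps the learned alignment close to a rotation.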
S3DIS (Stanford 3D Indoor Scene Dataset)
- 6 indoor areas with 271 rooms
- 13 semantic classes: ceiling, floor, wall, beam, column, window, door, chair, table, bookcase, sofa, board, clutter
- Point clouds with RGB information
- Instance and semantic annotations
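A hypothetical loader sketch showing the usual per-room handling of S3DIS data: sample a fixed number of points per room and center the coordinates. The class name and the `(M, 7)` XYZRGB+label array layout are assumptions, not necessarily what `src/data/dataset.py` uses.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class S3DISRoomDataset(Dataset):
    """Sketch: each room is an (M, 7) array of XYZ, RGB, and class label."""
    def __init__(self, rooms, num_points=4096):
        self.rooms = rooms              # list of np.ndarray, shape (M, 7)
        self.num_points = num_points

    def __len__(self):
        return len(self.rooms)

    def __getitem__(self, idx):
        room = self.rooms[idx]
        # Randomly sample a fixed number of points from the room
        choice = np.random.choice(len(room), self.num_points, replace=True)
        sample = room[choice]
        xyz = sample[:, :3]
        labels = sample[:, 6].astype(np.int64)
        # Normalize coordinates to zero mean
        xyz = xyz - xyz.mean(axis=0, keepdims=True)
        return torch.from_numpy(xyz.astype(np.float32)), torch.from_numpy(labels)
```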
```
pointnet-s3dis/
├── src/
│   ├── models/
│   │   ├── __init__.py
│   │   ├── pointnet.py            # Core PointNet architecture
│   │   └── transforms.py          # Spatial transformer networks
│   ├── data/
│   │   ├── __init__.py
│   │   ├── dataset.py             # S3DIS dataset loader
│   │   └── preprocessing.py       # Data preprocessing utilities
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── metrics.py             # Evaluation metrics
│   │   ├── visualization.py       # Visualization utilities
│   │   └── training.py            # Training utilities
│   └── train.py                   # Main training script
├── notebooks/
│   └── pointnet_implementation.ipynb
├── configs/
│   └── config.yaml
├── requirements.txt
├── README.md
└── .gitignore
```
```bash
# Clone the repository and install dependencies
git clone https://github.com/yourusername/pointnet-s3dis.git
cd pointnet-s3dis
pip install -r requirements.txt

# Preprocess the dataset
python src/data/preprocessing.py

# Default training
python src/train.py

# Custom parameters
python src/train.py --batch_size 16 --num_points 4096 --epochs 100 --test_area 5

# Evaluation
python src/evaluate.py --model_path checkpoints/best_model.pth --test_area 5

# Visualization
python src/visualize.py --model_path checkpoints/best_model.pth --num_samples 5
```

| Metric | Value | Status |
|---|---|---|
| Final Validation Accuracy | 67.45% | Good |
| Best Mean IoU | 36.42% | Solid |
| Final Mean IoU | 31.41% | Reasonable |
| Training Epochs | 100 | Complete |
| Class | IoU | Performance | Analysis |
|---|---|---|---|
| Floor | 89.03% | Excellent | Best performing - large planar surfaces |
| Ceiling | 83.43% | Excellent | Strong geometric consistency |
| Wall | 54.12% | Good | Solid performance with room for improvement |
| Bookcase | 41.17% | Moderate | Complex furniture structure |
| Table | 35.24% | Moderate | Shape variation challenges |
| Chair | 30.61% | Moderate | High variability and occlusion |
| Door | 26.53% | Moderate | Confusion with walls |
| Window | 23.24% | Moderate | Embedded in walls |
| Clutter | 16.51% | Poor | Highly variable category |
| Board | 5.97% | Very Poor | Small objects, scale issues |
| Column | 2.47% | Very Poor | Thin structures, limited examples |
| Beam | 0.00% | Failed | Extremely sparse in dataset |
| Sofa | 0.00% | Failed | High variation, dataset imbalance |
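Per-class IoU figures like those above are conventionally computed from a confusion matrix over all test points; a minimal sketch (the repository's `src/utils/metrics.py` may implement this differently):

```python
import numpy as np

def per_class_iou(pred, target, num_classes=13):
    """Per-class IoU from flat prediction/label arrays.

    Returns NaN for classes that never appear in either array."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (target, pred), 1)          # rows: ground truth, cols: prediction
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    with np.errstate(invalid="ignore", divide="ignore"):
        return tp / union
```

Mean IoU is then the average over classes (ignoring NaN entries), which is why rare classes such as beam and sofa can drag the mean far below the point-wise accuracy.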
- Input: Point clouds with XYZ coordinates (N × 3)
- Feature Extraction: Shared MLPs with batch normalization
- Spatial Invariance: Transformer networks for geometric robustness
- Permutation Invariance: Global max pooling
- Output: Point-wise classification head
- Loss: Cross-entropy with feature transformation regularization
- Optimizer: Adam with learning rate scheduling
- Split: Area-based (Area 5 for testing)
- Augmentation: Point sampling and normalization
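The training details above can be sketched as a single epoch loop combining cross-entropy with the transform regularizer. This assumes the model returns `(logits, feature_transform)` and a regularization weight of 0.001; the actual `src/utils/training.py` may be structured differently.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, reg_weight=0.001):
    """Sketch of one epoch: cross-entropy plus the transform regularizer.

    Assumes model(points) returns (logits, trans) with logits (B, N, C)
    and trans either None or a (B, k, k) feature transform."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for points, labels in loader:              # points: (B, N, 3), labels: (B, N)
        optimizer.zero_grad()
        logits, trans = model(points.transpose(1, 2))
        # Flatten points across the batch for point-wise cross-entropy
        loss = criterion(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
        if trans is not None:
            # Orthogonality penalty on the feature transform
            eye = torch.eye(trans.size(1), device=trans.device)
            diff = eye - trans @ trans.transpose(1, 2)
            loss = loss + reg_weight * diff.pow(2).sum(dim=(1, 2)).mean()
        loss.backward()
        optimizer.step()
```

An Adam optimizer with a step or cosine learning-rate scheduler, as listed above, would wrap calls to this function across epochs.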
```yaml
Training:
  batch_size: 16
  num_points: 4096
  epochs: 100
  learning_rate: 0.001
  weight_decay: 1e-4

Model:
  num_classes: 13
  feature_transform: true

Data:
  test_area: 5
```

The project includes comprehensive visualization capabilities:
- RGB point cloud visualization
- Semantic segmentation results
- Confusion matrices
- Training curve plots
- Per-class performance analysis
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
- Original PointNet Implementation
- S3DIS Dataset
If you use this implementation in your research, please cite:
```bibtex
@article{qi2017pointnet,
  title={PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation},
  author={Qi, Charles R and Su, Hao and Mo, Kaichun and Guibas, Leonidas J},
  journal={arXiv preprint arXiv:1612.00593},
  year={2017}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Original PointNet authors for the groundbreaking architecture
- Stanford University for the S3DIS dataset
- PyTorch team for the deep learning framework
Star this repo if you find it useful!