This project aims to optimize autonomous driving datasets by considering three key features: complexity, quality, and uncertainty. The goal is to create a smaller yet equally effective subset by removing redundant and low-quality 3D point cloud data frames.
- Complexity Analysis: Evaluates the complexity of each data frame, ensuring that frames containing sufficient instance information are retained.
- Quality Assessment: Identifies and filters out low-quality data frames, improving the overall reliability of the dataset.
- Uncertainty Quantification: Considers uncertainty factors in the data, removing frames with abnormally high sensing uncertainty.
- Dataset Optimization: Significantly reduces dataset size through intelligent selection while maintaining its effectiveness for autonomous driving tasks.
- Transferability: The optimization method is transferable to many datasets (supported: nuScenes, SUSCape, Carla-4Scenes, and CADC), and the optimized dataset can be used for multiple tasks such as 3D and 2D object detection.
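In spirit, the selection combines the three criteria above into a joint filter over frames. The snippet below is a hypothetical illustration only — the scoring functions, field names, and thresholds are placeholders, not the project's actual implementation:

```python
import random

def complexity(frame):
    # placeholder: e.g., number of annotated instances in the frame
    return frame["num_instances"]

def quality(frame):
    # placeholder: e.g., mean point density inside annotated boxes
    return frame["point_density"]

def uncertainty(frame):
    # placeholder: e.g., estimated sensing noise for the frame
    return frame["sensor_noise"]

def select_subset(frames, min_complexity=5, min_quality=0.5, max_uncertainty=0.8):
    """Keep frames that are complex and clean enough, and drop
    frames whose sensing uncertainty is abnormally high."""
    return [
        f for f in frames
        if complexity(f) >= min_complexity
        and quality(f) >= min_quality
        and uncertainty(f) <= max_uncertainty
    ]

# synthetic frames just to exercise the filter
frames = [
    {"id": i, "num_instances": random.randint(0, 20),
     "point_density": random.random(), "sensor_noise": random.random()}
    for i in range(100)
]
subset = select_subset(frames)
print(f"kept {len(subset)}/{len(frames)} frames")
```

The real tool scores each frame per dataset and per task; this sketch only conveys the "filter on all three axes at once" idea.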
Table of Contents
As shown in the table below, our approach significantly reduces the size of datasets while retaining most of the valuable information. The imageset and annotation files of the original and optimized datasets can be downloaded here:
| Dataset | Download | Frame | Car | Truck | Trailer | Bus | Total |
|---|---|---|---|---|---|---|---|
| nuScenes | original | 28130 | 413318 | 72815 | 20701 | 13163 | 519997 |
| nuScenes | optimized | 21518 (76.5%) | 307059 | 56557 | 16798 | 9305 | 389710 (74.9%) |

| Dataset | Download | Frame | Car | Truck | Van | Bus | Total |
|---|---|---|---|---|---|---|---|
| SUSCape | original | 14709 | 153114 | 24828 | 15517 | 11269 | 203728 |
| SUSCape | optimized | 11251 (76.5%) | 153114 | 24828 | 15517 | 11269 | 203728 (84.6%) |

| Dataset | Download | Frame | Car | Pedes. | Cyclist | Van | Total |
|---|---|---|---|---|---|---|---|
| Carla-4Scenes | original | 14782 | 91197 | 40516 | 23282 | 21966 | 176961 |
| Carla-4Scenes | optimized | 12002 (81.2%) | 81193 | 35216 | 20755 | 19852 | 157016 (88.1%) |

| Dataset | Download | Frame | Car | Truck | Pedes. | Total |
|---|---|---|---|---|---|---|
| CADC | original | 5600 | 80425 | 4358 | 29347 | 114130 |
| CADC | optimized | 3996 (71.4%) | 63069 | 3305 | 21801 | 88175 (77.3%) |
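As a quick sanity check on the table, the retention percentages follow directly from the raw counts. A minimal Python check using the nuScenes row above:

```python
# counts copied from the nuScenes rows of the table
orig_frames, opt_frames = 28130, 21518
orig_instances, opt_instances = 519997, 389710

frame_retention = opt_frames / orig_frames            # fraction of frames kept
instance_retention = opt_instances / orig_instances   # fraction of labels kept

print(f"frames kept:    {frame_retention:.1%}")       # 76.5%
print(f"instances kept: {instance_retention:.1%}")    # 74.9%
```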
- Ubuntu 20.04/22.04
- Python 3.8
- CUDA 11.3/11.6/11.7
- PyTorch 1.12/1.13/2.0
To start from data collection, follow all the steps below; if you only use an open-source dataset, skip the two optional steps. First, create the environment.

```shell
conda create -n OptAdDatasets python=3.8
```

Choose one of the two CARLA versions to install; the official version is recommended.
- Download the official CARLA version and follow the official installation tutorial.
- Download the specified CARLA version. The newest version is here (BaiduCloud | GoogleDrive).
- Clone the collecting tool
```shell
mkdir ./collecting_tool && cd ./collecting_tool
git clone https://github.com/Kazawaryu/CARLA_ADA.git
```
- Install requirements
```shell
conda activate OptAdDatasets
pip install -r requirements.txt
```
- Install requirements
```shell
cd ./optimizing_tool
conda activate OptAdDatasets
pip install -r requirements.txt
```
- Clone the repository
```shell
mkdir ./training_tool && cd ./training_tool
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet
```
- Install requirements
```shell
conda activate OptAdDatasets
conda install pytorch torchvision -c pytorch  # check your CUDA version
pip install -r requirements.txt
pip install av2 kornia==0.5.8 open3d spconv
```
- Compile
```shell
python setup.py develop
```
- Clone the repository
```shell
mkdir ./training_tool && cd ./training_tool
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
```
- Install requirements
```shell
pip install -U openmim
mim install mmengine 'mmcv>=2.0.0rc4' 'mmdet>=3.0.0'
pip install -v -e .
```
- Start CARLA
```shell
cd $carla_root_directory$
./CarlaUE4.sh
```
- Start the recorder
```shell
cd ./collecting_tool
python data_recoder.py
# press Ctrl+C in the terminal to stop data collecting
```
- Convert the raw data into a dataset
```shell
python format_helper.py -s $raw_data_directory$
```
- Create imagesets (for SUSCape, Carla-4Scenes, and CADC)
```shell
cd ./optimizing_tool
python optimize_imgsets.py --root_path $dataset_root$ --dataset_name $dataset_name$ --save_path $save_dir$
```
- Create annotations (for nuScenes)
```shell
cd ./optimizing_tool
python optimize_imgsets.py --root_path $dataset_root$ --dataset_name $dataset_name$ --save_path $save_dir$
python optimize_annos.py --root $dataset_root$ --save_path $save_dir$
```

To train the models, first clone the repositories from the following URLs for the different datasets.
| Dataset | nuScenes | SUSCape | Carla-4Scenes | CADC |
|---|---|---|---|---|
| Platform | OpenPCDet | MMDetection3D | OpenPCDet | OpenPCDet |
```shell
cd ./training_tool/OpenPCDet
mv $imagesets_path/*$ $./dataset/$your_dataset_name$/imagesets$
cd tools
# single GPU
python train.py --cfg_file $config_file_path$
# multiple GPUs
bash script/dist_train.sh $number_of_gpus$ --cfg_file $config_file_path$
```
```shell
cd ./training_tool/mmdetection3d
mv $annotations_path/*$ $./dataset/nuscenes/v1.0-trainval/v1.0-trainval$
cd tools
# single GPU
python train.py --config $config_file_path$
# multiple GPUs
bash script/dist_train.sh $number_of_gpus$ --config $config_file_path$
```

Left: the average training gain (nuScenes detection score per frame) of the perception algorithms retrained on the original and the optimized nuScenes, tested on the validation set of the original nuScenes. Right: the Pareto front of the perception algorithms retrained on the original dataset (
$D_{S_0}$) and the optimized datasets at each layer of our method ($D_{S_1}, D_{S_2}, D_{S_3}$), in terms of training time and average training gain per frame. The algorithms retrained on $D_{S_3}$ dominate the others.
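For reference, a Pareto front over (training time, average gain per frame) can be computed with a simple dominance test. This is a generic sketch with made-up numbers, not the paper's evaluation code:

```python
def pareto_front(points):
    """Return the points not dominated by any other point.
    Each point is (training_time, gain); lower time and higher
    gain are better. A point dominates another if it is no worse
    on both axes and strictly better on at least one."""
    front = []
    for t, g in points:
        dominated = any(
            (t2 <= t and g2 >= g) and (t2 < t or g2 > g)
            for t2, g2 in points
        )
        if not dominated:
            front.append((t, g))
    return front

# hypothetical (training_time_hours, gain_per_frame) measurements
runs = [(5.0, 0.40), (7.0, 0.50), (10.0, 0.55), (8.0, 0.45)]
print(pareto_front(runs))  # → [(5.0, 0.4), (7.0, 0.5), (10.0, 0.55)]
```

Here (8.0, 0.45) is dominated by (7.0, 0.50) — it trains slower and gains less — so it is excluded; the remaining runs are genuine trade-offs.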
Visualization of the performance of classic 3D perception models in autonomous driving (VoxelNeXt, TransFusion-L, CenterPoint, and SECOND) retrained on the original dataset $D_{S_0}$ and the optimized dataset $D_{S_3}$, respectively.
```bibtex
@inproceedings{kazawa2024optimizing,
  title={Optimizing Autonomous Driving Datasets: Complexity, Quality, Uncertainty},
  author={Kazawa, Ryu},
  booktitle={To be published},
  year={2024}
}
```
- Readme
- Collecting tools
- Optimization tools
- Training tools

