This project aims to optimize autonomous driving datasets by considering three key features: complexity, quality, and uncertainty. The goal is to create a smaller yet equally effective subset by removing redundant and low-quality 3D point cloud data frames.
- Complexity Analysis: Evaluates the complexity of each data frame, ensuring that frames containing sufficient instance information are retained.
- Quality Assessment: Identifies and filters out low-quality data frames, improving the overall reliability of the dataset.
- Uncertainty Quantification: Considers uncertainty factors in the data, removing frames with abnormally high sensing uncertainty.
- Dataset Optimization: Significantly reduces dataset size through intelligent selection while maintaining its effectiveness for autonomous driving tasks.
- Transferability: The optimization method is transferable to many datasets (supported: nuScenes, SUSCape, Carla-4Scenes, and CADC), and the optimized dataset can be used for multiple tasks such as 3D and 2D object detection.
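In spirit, the selection combines the three criteria above into a joint filter over frames. The snippet below is a hypothetical illustration only — the scoring functions, field names, and thresholds are placeholders, not the project's actual implementation:

```python
import random

def complexity(frame):
    # placeholder: e.g., number of annotated instances in the frame
    return frame["num_instances"]

def quality(frame):
    # placeholder: e.g., mean point density inside annotated boxes
    return frame["point_density"]

def uncertainty(frame):
    # placeholder: e.g., estimated sensing noise for the frame
    return frame["sensor_noise"]

def select_subset(frames, min_complexity=5, min_quality=0.5, max_uncertainty=0.8):
    """Keep frames that are complex and clean enough, and drop
    frames whose sensing uncertainty is abnormally high."""
    return [
        f for f in frames
        if complexity(f) >= min_complexity
        and quality(f) >= min_quality
        and uncertainty(f) <= max_uncertainty
    ]

# synthetic frames just to exercise the filter
frames = [
    {"id": i, "num_instances": random.randint(0, 20),
     "point_density": random.random(), "sensor_noise": random.random()}
    for i in range(100)
]
subset = select_subset(frames)
print(f"kept {len(subset)}/{len(frames)} frames")
```

The real tool scores each frame per dataset and per task; this sketch only conveys the "filter on all three axes at once" idea.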
Table of Contents
As shown in the table below, our approach significantly reduces the size of datasets while retaining most of the valuable information. The imageset and annotation files of the original and optimized datasets can be downloaded here:
| Dataset | Download | Frame | Car | Truck | Trailer | Bus | Total |
|---|---|---|---|---|---|---|---|
| nuScenes | original | 28130 | 413318 | 72815 | 20701 | 13163 | 519997 |
| nuScenes | optimized | 21518 (76.5%) | 307059 | 56557 | 16798 | 9305 | 389710 (74.9%) |

| Dataset | Download | Frame | Car | Truck | Van | Bus | Total |
|---|---|---|---|---|---|---|---|
| SUSCape | original | 14709 | 153114 | 24828 | 15517 | 11269 | 203728 |
| SUSCape | optimized | 11251 (76.5%) | 153114 | 24828 | 15517 | 11269 | 203728 (84.6%) |

| Dataset | Download | Frame | Car | Pedes. | Cyclist | Van | Total |
|---|---|---|---|---|---|---|---|
| Carla-4Scenes | original | 14782 | 91197 | 40516 | 23282 | 21966 | 176961 |
| Carla-4Scenes | optimized | 12002 (81.2%) | 81193 | 35216 | 20755 | 19852 | 157016 (88.1%) |

| Dataset | Download | Frame | Car | Truck | Pedes. | Total |
|---|---|---|---|---|---|---|
| CADC | original | 5600 | 80425 | 4358 | 29347 | 114130 |
| CADC | optimized | 3996 (71.4%) | 63069 | 3305 | 21801 | 88175 (77.3%) |
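As a quick sanity check on the table, the retention percentages follow directly from the raw counts. A minimal Python check using the nuScenes row above:

```python
# counts copied from the nuScenes rows of the table
orig_frames, opt_frames = 28130, 21518
orig_instances, opt_instances = 519997, 389710

frame_retention = opt_frames / orig_frames            # fraction of frames kept
instance_retention = opt_instances / orig_instances   # fraction of labels kept

print(f"frames kept:    {frame_retention:.1%}")       # 76.5%
print(f"instances kept: {instance_retention:.1%}")    # 74.9%
```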
- Ubuntu 20.04/22.04
- Python 3.8
- CUDA 11.3/11.6/11.7
- PyTorch 1.12/1.13/2.0
To start from data collection, follow all the steps below; if you only use an open-source dataset, skip the two optional steps. First, create the environment.

```shell
conda create -n OptAdDatasets python=3.8
```

Choose one of the two CARLA versions to install; the official version is recommended.
- Download the official CARLA version and follow the official installation tutorial.
- Download the specified CARLA version. The newest version is here (BaiduCloud | GoogleDrive).
- Clone the collecting tool
```shell
mkdir ./collecting_tool && cd ./collecting_tool
git clone https://github.com/Kazawaryu/CARLA_ADA.git
```
- Install requirements
```shell
conda activate OptAdDatasets
pip install -r requirements.txt
```
- Install requirements
```shell
cd ./optimizing_tool
conda activate OptAdDatasets
pip install -r requirements.txt
```
- Clone the repository
```shell
mkdir ./training_tool && cd ./training_tool
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet
```
- Install requirements
```shell
conda activate OptAdDatasets
conda install pytorch torchvision -c pytorch  # check your CUDA version
pip install -r requirements.txt
pip install av2 kornia==0.5.8 open3d spconv
```
- Compile
```shell
python setup.py develop
```
- Clone the repository
```shell
mkdir ./training_tool && cd ./training_tool
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
```
- Install requirements
```shell
pip install -U openmim
mim install mmengine 'mmcv>=2.0.0rc4' 'mmdet>=3.0.0'
pip install -v -e .
```
- Start CARLA
```shell
cd $carla_root_directory$
./CarlaUE4.sh
```
- Start the recorder
```shell
cd ./collecting_tool
python data_recoder.py
# press Ctrl+C in the terminal to stop data collecting
```
- Convert the raw data into a dataset
```shell
python format_helper.py -s $raw_data_directory$
```
- Create imagesets (for SUSCape, Carla-4Scenes, and CADC)
```shell
cd ./optimizing_tool
python optimize_imgsets.py --root_path $dataset_root$ --dataset_name $dataset_name$ --save_path $save_dir$
```
- Create annotations (for nuScenes)
```shell
cd ./optimizing_tool
python optimize_imgsets.py --root_path $dataset_root$ --dataset_name $dataset_name$ --save_path $save_dir$
python optimize_annos.py --root $dataset_root$ --save_path $save_dir$
```

To train the models, first clone the repositories from the following URLs for the different datasets.
| Dataset | nuScenes | SUSCape | Carla-4Scenes | CADC |
|---|---|---|---|---|
| Platform | OpenPCDet | MMDetection3D | OpenPCDet | OpenPCDet |
```shell
cd ./training_tool/OpenPCDet
mv $imagesets_path/*$ $./dataset/$your_dataset_name$/imagesets$
cd tools
# single GPU
python train.py --cfg_file $config_file_path$
# multiple GPUs
bash script/dist_train.sh $number_of_gpus$ --cfg_file $config_file_path$
```
```shell
cd ./training_tool/mmdetection3d
mv $annotations_path/*$ $./dataset/nuscenes/v1.0-trainval/v1.0-trainval$
cd tools
# single GPU
python train.py --config $config_file_path$
# multiple GPUs
bash script/dist_train.sh $number_of_gpus$ --config $config_file_path$
```

Left: the average training gain (nuScenes detection score per frame) of the perception algorithms retrained on the original and the optimized nuScenes, tested on the validation set of the original nuScenes. Right: the Pareto front of the perception algorithms retrained on the original dataset (
$D_{S_0}$) and the optimized datasets at each layer of our method ($D_{S_1}, D_{S_2}, D_{S_3}$), in terms of training time and average training gain per frame. The algorithms retrained on $D_{S_3}$ dominate the others.
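For reference, a Pareto front over (training time, average gain per frame) can be computed with a simple dominance test. This is a generic sketch with made-up numbers, not the paper's evaluation code:

```python
def pareto_front(points):
    """Return the points not dominated by any other point.
    Each point is (training_time, gain); lower time and higher
    gain are better. A point dominates another if it is no worse
    on both axes and strictly better on at least one."""
    front = []
    for t, g in points:
        dominated = any(
            (t2 <= t and g2 >= g) and (t2 < t or g2 > g)
            for t2, g2 in points
        )
        if not dominated:
            front.append((t, g))
    return front

# hypothetical (training_time_hours, gain_per_frame) measurements
runs = [(5.0, 0.40), (7.0, 0.50), (10.0, 0.55), (8.0, 0.45)]
print(pareto_front(runs))  # → [(5.0, 0.4), (7.0, 0.5), (10.0, 0.55)]
```

Here (8.0, 0.45) is dominated by (7.0, 0.50) — it trains slower and gains less — so it is excluded; the remaining runs are genuine trade-offs.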
Visualization of the performance of classic 3D perception models in autonomous driving (VoxelNeXt, TransFusion-L, CenterPoint, and SECOND) retrained on the original dataset $D_{S_0}$ and the optimized dataset $D_{S_3}$, respectively.
```bibtex
@inproceedings{kazawa2024optimizing,
  title={Optimizing Autonomous Driving Datasets: Complexity, Quality, Uncertainty},
  author={Kazawa, Ryu},
  booktitle={To be published},
  year={2024}
}
```
- Readme
- Collecting tools
- Optimization tools
- Training tools

