Optimizing Autonomous Driving Datasets: Complexity, Quality, Uncertainty

This project aims to optimize autonomous driving datasets by considering three key features: complexity, quality, and uncertainty. The goal is to create a smaller yet equally effective subset by removing redundant and low-quality 3D point cloud data frames.

Key Features

  • Complexity Analysis: Evaluates the complexity of each data frame, ensuring that frames with enough instance information are retained.
  • Quality Assessment: Identifies and keeps high-quality data frames, improving the overall reliability of the dataset.
  • Uncertainty Quantification: Accounts for uncertainty in the data, removing frames with abnormally high sensing uncertainty.
  • Dataset Optimization: Significantly reduces dataset size through intelligent selection while maintaining effectiveness for autonomous driving tasks.
  • Transferability: The optimization method transfers to many datasets (supported: nuScenes, SUSCape, Carla-4Scenes, and CADC), and an optimized dataset can be used for multiple tasks such as 3D and 2D object detection.

Table of Contents
  1. Optimization Results
  2. Getting Started
  3. Usage
  4. Result Visualization
  5. Contribution
  6. License
  7. Contact
  8. Acknowledgments
  9. Citation
  10. Todo List

Optimization Results

As shown in the table below, our approach significantly reduces the size of datasets while retaining most of the valuable information. The imagesets and annotations files of the original and optimized datasets can be downloaded here:

Dataset                    Frames           Car      Truck    Trailer  Bus      Total
nuScenes (original)        28130            413318   72815    20701    13163    519997
nuScenes (optimized)       21518 (76.5%)    307059   56557    16798    9305     389710 (74.9%)

Dataset                    Frames           Car      Truck    Van      Bus      Total
SUSCape (original)         14709            153114   24828    15517    11269    203728
SUSCape (optimized)        11251 (76.5%)    153114   24828    15517    11269    203728 (84.6%)

Dataset                    Frames           Car      Pedes.   Cyclist  Van      Total
Carla-4Scenes (original)   14782            91197    40516    23282    21966    176961
Carla-4Scenes (optimized)  12002 (81.2%)    81193    35216    20755    19852    157016 (88.1%)

Dataset                    Frames           Car      Truck    Pedes.   -        Total
CADC (original)            5600             80425    4358     29347    -        114130
CADC (optimized)           3996 (71.4%)     63069    3305     21801    -        88175 (77.3%)
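As a quick sanity check, the frame-retention percentages in the tables can be recomputed from the raw counts. A minimal sketch; the frame counts are copied verbatim from the tables above:

```shell
# Recompute the frame-retention percentages reported above.
retention() {
  awk -v opt="$1" -v orig="$2" 'BEGIN { printf "%.1f%%\n", 100 * opt / orig }'
}

retention 21518 28130   # nuScenes       -> 76.5%
retention 11251 14709   # SUSCape        -> 76.5%
retention 12002 14782   # Carla-4Scenes  -> 81.2%
retention 3996  5600    # CADC           -> 71.4%
```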

Getting Started

Prerequisites

  • Ubuntu 20.04/22.04
  • Python 3.8
  • CUDA 11.3/11.6/11.7
  • PyTorch 1.12/1.13/2.0

Installation

To start from data collection, follow all of the steps below. If you only use an open-source dataset, skip the two optional steps. First, create the conda environment:

conda create -n OptAdDatasets python=3.8

CARLA Simulator (optional)

Choose one of the two CARLA versions to install; the official version is recommended.

  1. Download the official CARLA release and follow the official installation tutorial.
  2. Download the specified CARLA version. The newest version is here (BaiduCloud | GoogleDrive).

Active Data Collecting Tool (optional)

  1. Clone the collecting tool
mkdir ./collecting_tool && cd ./collecting_tool
git clone https://github.com/Kazawaryu/CARLA_ADA.git
  2. Install requirements
conda activate OptAdDatasets
pip install -r requirements.txt

Data Optimization Tool

  1. Install requirements
cd ./optimizing_tool
conda activate OptAdDatasets
pip install -r requirements.txt

Training Platform

OpenPCDet
  1. Clone the repository
mkdir ./training_tool && cd ./training_tool
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet
  2. Install requirements
conda activate OptAdDatasets
conda install pytorch torchvision -c pytorch # check your version
pip install -r requirements.txt
pip install av2 kornia==0.5.8 open3d spconv  # pick the spconv build matching your CUDA version, e.g. spconv-cu117
  3. Compile
python setup.py develop
MMDetection3D
  1. Clone the repository
mkdir ./training_tool && cd ./training_tool
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
  2. Install requirements
pip install -U openmim
mim install mmengine 'mmcv>=2.0.0rc4' 'mmdet>=3.0.0'
pip install -v -e .

Usage

Collecting Datasets from CARLA

  1. Start CARLA
cd $carla_root_directory$
./CarlaUE4.sh
  2. Start the recorder
cd ./collecting_tool
python data_recoder.py
# press ^C in the terminal to stop data collection
  3. Convert the raw data into a dataset
python format_helper.py -s $raw_data_directory$

Optimizing Datasets

  • Create imagesets (for SUSCape, Carla-4Scenes, and CADC)
cd ./optimizing_tool
python optimize_imgsets.py --root_path $dataset_root$ --dataset_name $dataset_name$ --save_path $save_dir$
  • Create annotations (for nuScenes)
cd ./optimizing_tool
python optimize_imgsets.py --root_path $dataset_root$ --dataset_name $dataset_name$ --save_path $save_dir$
python optimize_annos.py --root $dataset_root$ --save_path $save_dir$
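For several datasets, the imageset-optimization step can be scripted in one pass. A minimal sketch, assuming the datasets live under a common root and that the per-dataset directory names (suscape, carla4scenes, cadc) match your layout (both assumptions); the commands are only echoed as a dry run:

```shell
# Dry run: print the optimize_imgsets.py invocation for each dataset.
# The directory names below are assumptions; adjust to your layout.
plan_optimization() {
  root=$1; save=$2
  for name in suscape carla4scenes cadc; do
    echo "python optimize_imgsets.py --root_path $root/$name --dataset_name $name --save_path $save/$name"
  done
}

plan_optimization /data ./optimized
```

Dropping the `echo` turns the dry run into the real batch.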

Training Models

To train the models, first clone the corresponding training-platform repository for each dataset, as listed below.

Dataset    nuScenes    SUSCape         Carla-4Scenes   CADC
Platform   OpenPCDet   MMDetection3D   OpenPCDet       OpenPCDet
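When scripting training runs, the dataset-to-platform mapping above can be encoded in a small helper (a sketch; the function name is ours):

```shell
# Return the training platform for a given dataset, per the table above.
platform_for() {
  case "$1" in
    nuScenes)      echo OpenPCDet ;;
    SUSCape)       echo MMDetection3D ;;
    Carla-4Scenes) echo OpenPCDet ;;
    CADC)          echo OpenPCDet ;;
    *)             echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

platform_for SUSCape   # -> MMDetection3D
```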

OpenPCDet

cd ./training_tool/OpenPCDet
mv $imagesets_path$/* ./dataset/$your_dataset_name$/imagesets
cd tools
# single GPU
python train.py --cfg_file $config_file_path$
# multiple GPUs
bash scripts/dist_train.sh $number_of_gpus$ --cfg_file $config_file_path$

MMDetection3D

cd ./training_tool/mmdetection3d
mv $annotations_path$/* ./dataset/nuscenes/v1.0-trainval/v1.0-trainval
cd tools
# single GPU
python train.py $config_file_path$
# multiple GPUs
bash dist_train.sh $config_file_path$ $number_of_gpus$

Result Visualization

Left: the average training gain (nuScenes detection score per frame) of the perception algorithms retrained on the original and the optimized nuScenes, both evaluated on the validation set of the original nuScenes. Right: the Pareto front, in terms of training time and average training gain per frame, of the perception algorithms retrained on the original dataset ($D_{S_0}$) and on the optimized datasets produced at each layer of our method ($D_{S_1}, D_{S_2}, D_{S_3}$). The algorithms retrained on $D_{S_3}$ dominate the others.

Visualization of the performance of classic 3D perception models for autonomous driving (VoxelNeXt, TransFusion-L, CenterPoint, and SECOND), each retrained on the original dataset $D_{S_0}$ and the optimized dataset $D_{S_3}$, respectively.

Contribution

Acknowledgments

Citation

@inproceedings{kazawa2024optimizing,
  title={Optimizing Autonomous Driving Datasets: Complexity, Quality, Uncertainty},
  author={Kazawa, Ryu},
  booktitle={To be published},
  year={2024}
}

Todo List

  • Readme
  • Collecting tools
  • Optimization tools
  • Training tools
