A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
- Conda
- Cookiecutter Python package: This can be installed with pip or conda, depending on how you manage your Python packages.
These three commands quickly and quietly download the latest 64-bit Windows installer, rename it to a shorter file name, silently install, and then delete the installer:
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe -o miniconda.exe
start /wait "" .\miniconda.exe /S
del miniconda.exe

After installing, open the Anaconda Prompt (miniconda3) program to use Miniconda3.
These four commands download the latest 64-bit version of the Linux installer, rename it to a shorter file name, silently install, and then delete the installer:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh

After installing, add the following line to your .bashrc so that conda is on your PATH in every new shell:
export PATH=~/miniconda3/bin:$PATH

Then run:
source ~/.bashrc

Now you can use Miniconda3 on your Linux system.
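
To confirm that the PATH change took effect, a quick check from Python can help. This is a minimal sketch rather than part of the template: the function name `find_tools` is hypothetical, and it relies only on the standard library's `shutil.which`.

```python
import shutil


def find_tools(names):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # "conda" is the command installed by the steps above.
    for name, path in find_tools(["conda"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If `conda` is reported as not found, open a new terminal or re-run `source ~/.bashrc` and try again.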
pip install cookiecutter

or

conda install -c conda-forge cookiecutter

In a folder where you want your project generated:
cookiecutter https://github.com/kesant/projects_template

├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── install.md         <- Detailed instructions to set up this project.
├── data
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── models             <- Trained and serialized models, model predictions, or model summaries.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── environment.yml    <- The requirements file for reproducing the analysis environment.
├── requirements.txt   <- The pip requirements file for reproducing the environment.
│
├── test               <- Unit and integration tests for the project.
│   ├── __init__.py
│   └── test_model.py  <- Example of a test script.
│
├── .here              <- Marker file that stops the search for the project root when
│                         none of the other criteria apply.
│
├── setup.py           <- Makes project pip installable (pip install -e .)
│                         so {{ cookiecutter.project_module_name }} can be imported.
│
└── {{ cookiecutter.project_module_name }} <- Source code for use in this project.
    │
    ├── __init__.py    <- Makes {{ cookiecutter.project_module_name }} a Python module.
    │
    ├── config.py      <- Store useful variables and configuration.
    │
    ├── dataset.py     <- Scripts to download or generate data.
    │
    ├── features.py    <- Code to create features for modeling.
    │
    ├── modeling
    │   ├── __init__.py
    │   ├── predict.py <- Code to run model inference with trained models.
    │   └── train.py   <- Code to train models.
    │
    ├── utils          <- Scripts to help with common tasks.
    │   └── paths.py   <- Helper functions for relative file referencing across the project.
    │
    └── plots.py       <- Code to create visualizations.
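
The `.here` marker and `utils/paths.py` work together: a helper can walk up from the current directory until it finds `.here`, then build paths relative to that folder. The sketch below illustrates the idea under stated assumptions. It is not the template's actual `paths.py`, and the names `find_project_root` and `data_dir` are hypothetical.

```python
from pathlib import Path

MARKER = ".here"  # marker file at the top level of the generated project


def find_project_root(start=None, marker=MARKER):
    """Walk upward from `start` (default: cwd) to the first directory containing `marker`."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker!r} file found at or above {current}")


def data_dir(kind, start=None):
    """Return <project root>/data/<kind> for kind in 'raw', 'interim', 'processed'."""
    if kind not in {"raw", "interim", "processed"}:
        raise ValueError(f"Unknown data folder: {kind!r}")
    return find_project_root(start) / "data" / kind
```

With helpers like these, a notebook inside `notebooks/` could call `data_dir("raw")` and get the same path as a script run from the project root, without hard-coding `../data/raw`.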
This project is heavily influenced by drivendata's Cookiecutter Data Science and andfanilo's Cookiecutter for Kaggle Conda projects.
Other links that helped shape this cookiecutter: