Data Science Playbook

Contact Information

William Ponton: LinkedIn

Overview

This notebook will be a collection of Data Science basics, examples, and best practices for use as a reference guide.

This guide has five sections, one for each basic step of the Data Analysis process. The first step is importing a dataset into your environment so that it can be analyzed. The second step defines best practices for cleaning and organizing: how to handle NULL values, and how to merge and organize messy data. This step deserves attention because, by some estimates, Data Scientists spend up to 80% of their time cleaning and organizing data for analysis and modeling. Once the dataset is normalized and cleaned, the third step details common statistical methods and defines the values needed for visualization and for the final statistics in the Interpretation section. Numerical Analysis is the 'magic' of Data Science, as this step can often expose anomalies and patterns in the data that humans alone might not have been able to find. Its output also powers the fourth step, the Visualizations presented to stakeholders in the final reporting, and is vital for the step that follows. The fifth step, Interpretation and Reporting, covers creating a deliverable to be passed off to other departments. The final result must be understandable by every audience it is intended for, so knowing the goals of the project up front is imperative for keeping the results within the scope of the audience's understanding of the analysis.
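As a rough illustration of the first two steps, the sketch below imports two small tables, handles NULL values, and merges the result with Pandas. The data, the column names, and the `load_and_clean` helper are hypothetical and chosen only for this example. Dropping rows with a missing join key and filling missing amounts with the median is one possible policy, not a rule from this guide.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for CSV files on disk.
ORDERS_CSV = """order_id,customer_id,amount
1,10,25.0
2,11,
3,10,40.0
4,,15.0
"""

CUSTOMERS_CSV = """customer_id,region
10,East
11,West
"""


def load_and_clean(orders_csv: str, customers_csv: str) -> pd.DataFrame:
    """Import two tables, handle NULL values, and merge them."""
    # Import: read_csv accepts a file path or any file-like object.
    orders = pd.read_csv(io.StringIO(orders_csv))
    customers = pd.read_csv(io.StringIO(customers_csv))

    # Clean: drop rows that lack the join key, then fill missing
    # amounts with the median of the remaining amounts.
    orders = orders.dropna(subset=["customer_id"]).assign(
        amount=lambda df: df["amount"].fillna(df["amount"].median()),
        customer_id=lambda df: df["customer_id"].astype(int),
    )

    # Organize: a left merge keeps every remaining order, even if the
    # customer table has no matching row.
    return orders.merge(customers, on="customer_id", how="left")


cleaned = load_and_clean(ORDERS_CSV, CUSTOMERS_CSV)
print(cleaned)
```

Whether to drop or fill a NULL depends on the column: a missing join key usually makes the row unusable, while a missing measurement can often be imputed.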

Data Science Steps

0.0 Importing Data
0.1 Cleaning & Organizing
0.2 Numerical Analysis
0.3 Visualizations
0.4 Interpretation & Reporting
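Step 0.2, Numerical Analysis, can be illustrated with a minimal NumPy sketch that flags anomalies by z-score. The `flag_anomalies` function, the sample readings, and the threshold of 2.0 standard deviations are assumptions made for this example. A real analysis would pick a method and threshold suited to the data.

```python
import numpy as np


def flag_anomalies(values, threshold=2.0):
    """Return a boolean mask marking values whose z-score exceeds `threshold`."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return np.zeros(values.shape, dtype=bool)
    z_scores = (values - values.mean()) / std
    return np.abs(z_scores) > threshold


# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
mask = flag_anomalies(readings)

print("mean:  ", np.mean(readings))
print("median:", np.median(readings))
print("flagged:", np.asarray(readings)[mask])
```

The mean is pulled toward the outlier while the median is not, which is one reason to report both before moving on to Visualizations.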

Data Science with Python

Python has rich Data Science functionality, driven by teams of scientists and engineers working to solve scientific and engineering problems. Python's object-oriented design, simple syntax, and available libraries make it the industry standard for Data Analysis. A 2016 study by O'Reilly shows that Python is now dominant over R throughout the Data Science community, with Python 3.6 favored over the soon-to-be-extinct Python 2.7. I also plan to create a Data Science Playbook for R techniques in the future (I am still learning!).

Python became the fastest-growing programming language of 2019 and remains the industry standard for modeling and analysis in the scientific and engineering industries. The Scientific Python Stack is the set of libraries that makes Python so powerful for data analysis and statistical prediction.

To get everything running in this project, use pip install -r requirements.txt

Project Stack

Language

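Python 3.6 (replaces legacy Python 2.7, which reached end of life in January 2020)
Cython (a compiled superset of Python that generates C extensions, useful for speeding up numeric code alongside NumPy)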

Scientific & Numeric Power

SciPy
NumPy
scikit-learn

Interactive Environment

Anaconda (Python distribution and environment manager)
IPython Notebooks
GitHub (version control)
RMOTR Notebooks

Data Science Libraries

Analysis tools
- NumPy
- Pandas
- Cython
Visualization tools
- Matplotlib
- Seaborn
- Bokeh
