
📚 Hi, and welcome to my GitHub repository! 🙋🏻‍♀️

I am Abhudeep Kaur Arora, an engineer and a data learner!

Thanks for visiting my portfolio here on GitHub!

📚 Projects & Guides 📚

In my portfolio, you can find my projects and guides on data analytics, machine learning, deep learning and data science!

🛠️ My Tools

  • Jira, Confluence, Wiki, Asana, Salesforce, PostgreSQL, Tableau, Excel, Jupyter Notebook, C++, Python (NumPy, Pandas, Statsmodels, Scikit-learn), Google Analytics, Axure, Xmind

💡 My Skills

  • Scrum Framework, Agile Methodology, Software as a Service (SaaS), Software Development Life Cycle (SDLC), Product Strategy, User Research, Statistical Analysis and Inference, Machine Learning, Feature Engineering, Data Visualization, Data Wrangling, A/B testing

🏆 My Credentials

| Credential Name | Link to Credential | Description |
| --- | --- | --- |
| Caltech (California Institute of Technology) Data Science Bootcamp | Credential | Learnt to make data-driven decisions through this acclaimed bootcamp that delivers a high-engagement learning experience, leveraging Caltech’s academic excellence in data science. Topics included: Data Visualization, Deep Learning, Descriptive Statistics, Ensemble Learning, Exploratory Data Analysis, Inferential Statistics, Model Building and Fine Tuning, Supervised and Unsupervised Learning |
| Tableau Desktop Specialist | Credential | Desktop Specialists are able to connect to, prepare, explore and analyze data, and share their insights |
| Udacity Nanodegree Program - Data Analyst, in collaboration with Kaggle | Credential | Use Python, SQL, and statistics to uncover insights, communicate critical findings, and create data-driven solutions |
| Udacity Nanodegree Program - SQL | Credential | Mastered SQL, the core language for Big Data analysis, to enable insight-driven decision-making and business strategy. |
| Udacity A/B testing, in collaboration with Google | Non-certified course (free course) | This course covered the design and analysis of A/B tests, also known as split tests, which are online experiments used to test potential improvements to a website or mobile application. It covered how to choose and characterize metrics to evaluate your experiments, how to design an experiment with enough statistical power, how to analyze the results and draw valid conclusions, and how to ensure that the participants of experiments are adequately protected. |
| PCAP™ – Certified Associate in Python Programming (Exam PCAP-31-0x) | Credential | PCAP™ – Certified Associate in Python Programming certification focuses on the Object-Oriented Programming approach to Python, and shows that the individual is familiar with the more advanced aspects of programming, including the essentials of OOP, the essentials of modules and packages, the exception handling mechanism in OOP, advanced operations on strings, list comprehensions, lambdas, generators, closures, and file processing. |
| PCEP™ – Certified Entry-Level Python Programmer (Exam PCEP-30-0x) | Credential | PCEP™ – Certified Entry-Level Python Programmer certification shows that the individual is familiar with universal computer programming concepts like data types, containers, functions, conditions, loops, as well as Python programming language syntax, semantics, and the runtime environment. |

📝 My Favourite Books

| Title | Link |
| --- | --- |
| Trustworthy online controlled experiments: A practical guide to A/B testing. Kohavi, R., Tang, D. and Xu, Y., 2020, Cambridge University Press. | link |
| Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions, Jim Frost | link |
| Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models, Jim Frost | link |

Portfolio

🗺 Hi there! 🙋🏻‍♀️

Welcome to my portfolio, where I provide a walkthrough of all my notebooks, data science projects and courses.

📚 Table of Contents

Click on a project's title (bold and coloured in blue) to view the project. Thank you! ☺️

📚 Abhu's Guides

This is a list of notebooks I created, containing code to study, revise and ramp up on under-the-hood knowledge gathered while studying various courses, books and projects:

Guide Name Link Description
All about NumPy link NumPy methods including mathematical, logical, shape manipulation, sorting, selecting, basic linear algebra, basic statistical operations, random simulation and much more.
All about Pandas - assess and clean data link Step-by-step guide for data manipulation and analysis using Pandas.
All about Data Visualization link Step-by-step guide using Matplotlib and Seaborn:

Distribution Plots:
  • Distplot: shows the distribution of a univariate set of observations;
  • Pairplot: plots pairwise relationships across an entire dataframe (for the numerical columns) and supports a color hue argument (for categorical columns);
  • Rugplot: draws a dash mark for every point on a univariate distribution; rug plots are the building block of a KDE plot;
  • KDEplot: a Kernel Density Estimation plot, which replaces every single observation with a Gaussian (Normal) distribution centered around that value and sums the results into a smooth curve
Categorical Data Plots:
  • Factor plot: factorplot is the most general form of a categorical plot. It takes a kind parameter to set the plot type, and is now called catplot;
  • Boxplot & Violinplot: boxplots and violinplots are used to show the distribution of categorical data. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range. The dodge parameter: when using hue nesting, setting this to True will separate the strips for different hue levels along the categorical axis; otherwise, the points for each level will be plotted in one swarm;
  • Stripplot: draws a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution;
  • Swarmplot: similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them);
  • Barplot: a general plot that lets you aggregate the categorical data based on some function, by default the mean;
  • Countplot: essentially the same as barplot, except the estimator explicitly counts the number of occurrences, which is why we only pass the x value.
Matrix Plots: Matrix plots allow you to plot data as color-encoded matrices and can also be used to indicate clusters within the data;
  • Heatmap: for a heatmap to work properly, your data should already be in matrix form; the sns.heatmap function basically just colors it in for you;
  • Clustermap: uses hierarchical clustering to produce a clustered version of the heatmap
Grids: Grids are general types of plots that allow you to map plot types to rows and columns of a grid, which helps you create similar plots separated by features;
  • PairGrid: a subplot grid for plotting pairwise relationships in a dataset;
  • FacetGrid: the general way to create grids of plots based on a feature
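The KDE description above (each observation replaced by a Gaussian centered on it) can be illustrated without any plotting library. The following is a minimal sketch in plain Python, not the notebook's code; the function name `gaussian_kde`, the sample values and the bandwidth of 0.5 are assumptions chosen only for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Build a density estimate as the average of Gaussians centered on each sample.

    `bandwidth` is the standard deviation of each Gaussian (a hypothetical
    value picked by hand here; plotting libraries usually choose it for you).
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum the contribution of one Gaussian bump per observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical observations: three values cluster near 2, one sits apart at 4.
samples = [1.0, 2.0, 2.5, 4.0]
kde = gaussian_kde(samples, bandwidth=0.5)

# The estimate is higher where observations cluster than far from any of them.
print(kde(2.25) > kde(10.0))
```

A rug plot of the same data would simply mark the four sample positions on the axis; the KDE curve is what you get after smoothing those marks.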
Statistics and Hypothesis Testing link
  • Continuous Data: Z test; 1-sample t-test; Means for two groups: 2-sample t-test; Paired t; Comparing CIs; Means for at least three groups: One-way ANOVA; Two-way ANOVA; Compare specific groups from ANOVA: Post hoc tests; Tukey’s Method; Dunnett’s Method; Hsu’s MCB; One standard deviation to reference: 1 Sample Variance Test; Standard deviations for two groups: 2 Sample Variance Test; Correlation between two continuous variables: Pearson’s correlation coefficient; Medians: Nonparametric tests, Mann-Whitney Test
  • Binary Data: One proportion to a target: 1 Proportions Test; Proportions for two groups: 2 Proportions Test; Control chart: P control chart
  • Count Data: Do your counts follow the Poisson distribution?: Poisson Goodness-of-Fit Test; One rate to a target: 1 Sample Poisson Rate Test; Rates for two groups: 2 Sample Poisson Rate Test
  • Categorical Data: Association between two categorical variables: Chi-Squared Test of Independence; Do the proportions of values follow a hypothesized distribution?: Chi-Square Goodness-of-Fit test
  • Ordinal and Ranked Data: Medians, Ordinal and Ranked data: Nonparametric tests, Mann-Whitney Test; Correlation between variables: Spearman’s Rank Correlation; Kendall’s Rank Correlation
  • Various: Bootstrapping Methods
  • Stationarity Tests: Augmented Dickey-Fuller; Kwiatkowski-Phillips-Schmidt-Shin; Nonparametric Statistical Hypothesis Tests; Mann-Whitney Test
  • Ordinal and Ranked data: Friedman Test: data should be ordinal (e.g. the Likert scale) or continuous; Wilcoxon Signed-Rank Test: interval or ratio data; Kruskal-Wallis H Test: ordinal, ratio or interval scale dependent variables
  • Theory: Means; Normal Distribution; Useful Commands; Errors; Power; Directional vs Non-Directional; Effect Size
A/B Testing - solid base link
  • Step 1: Calculate sample size, decide practical significance / minimum detectable effect (MDE)
  • Step 2: Choose and characterize metrics for both sanity check and evaluation
  • Step 3: Designing an experiment
  • Step 4: Analyzing Results
  • Simple Sequential A/B Testing
  • Comparing Classic and Sequential A/B Tests
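Step 1 above (sample size from a chosen practical significance) can be sketched with a standard two-proportion formula. This is an illustrative sketch under stated assumptions rather than the notebook's own method: the function name `sample_size_per_group`, the 10% baseline rate, the absolute MDE of 2 percentage points, and the defaults of a two-sided alpha of 0.05 and 80% power are all hypothetical choices.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline, mde, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    p_baseline: conversion rate of the control group.
    mde: minimum detectable effect, as an absolute difference in rates.
    """
    p1 = p_baseline
    p2 = p_baseline + mde
    p_bar = (p1 + p2) / 2  # pooled rate under the null hypothesis

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / mde ** 2)


# Hypothetical experiment: 10% baseline, detect an absolute lift of 2 points.
n = sample_size_per_group(p_baseline=0.10, mde=0.02)
print(f"Users needed per group: {n}")
```

Halving the MDE roughly quadruples the required sample, which is why the practical significance level is worth settling before the experiment starts.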
Regression link Conversion of Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models by Jim Frost from Minitab to Python notebook
Classification link Assumptions, sigmoid function, decision boundary, gradient descent and implementation
Deep Learning link ⚠️ In progress
Naive Bayes link Maths, implementation, deployment and optimization
Support Vector Machines and Classifiers link Maths, implementation, deployment and optimization
SQL link Comprehensive SQL tutorial:
  • Basic SQL: Introduction to SQL; SELECT FROM statement; WHERE statement; ORDER BY statement, LIMIT statement, DISTINCT
  • LOGICAL and COMPARISON Operators
  • Aggregates: Aggregate Functions (COUNT, SUM, MIN/MAX, AVG); GROUP BY clause; HAVING clause
  • Conditional Expressions: CASE WHEN, COALESCE, IFNULL
  • JOINS and UNIONS
  • Subqueries and Common Table Expressions
  • String Manipulations
  • Date-time manipulation: EXTRACT, DATE_ADD(), DATE_SUB(), DATE_DIFF()
  • Window Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG, LEAD, SUM, COUNT, AVG
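As a small illustration of the window functions listed above, the sketch below runs RANK(), DENSE_RANK(), LAG and a running SUM against an in-memory SQLite database from Python. The `sales` table and its rows are hypothetical, and the example assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", 1, 100), ("East", 2, 150), ("East", 3, 150),
        ("West", 1, 200), ("West", 2, 120),
    ],
)

query = """
SELECT region, month, amount,
       -- RANK leaves gaps after ties; DENSE_RANK does not
       RANK()       OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS dense_rnk,
       -- previous month's amount within the same region (NULL for the first month)
       LAG(amount)  OVER (PARTITION BY region ORDER BY month)       AS prev_amount,
       -- running total per region, accumulated in month order
       SUM(amount)  OVER (PARTITION BY region ORDER BY month)       AS running_total
FROM sales
ORDER BY region, month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two tied East rows (150) both get rank 1, so the remaining East row gets RANK 3 but DENSE_RANK 2, which is the difference between the two ranking functions.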

SQL

Level: Intermediate SQL

Functions: Aggregations, Joins, CTEs, Window functions (aggregates, ranking, running total, partitioned averages), CASE WHEN statements, subqueries, nested subqueries, DATETIME functions, data type conversion, text and string manipulation

| Project Name | Description | SQL Functions |
| --- | --- | --- |
| 🆔 Udiddit, A Social News Aggregator | Part I: Investigate the existing schema; Part II: Create the DDL for your new schema; Part III: Migrate the provided data | Advanced SQL |
| 🌴 Report for ForestQuery into Global Deforestation | Prepare and disseminate a report that uses complete sentences to help them understand the global deforestation overview with numbers derived from SQL queries | Advanced SQL |

Python

🐍 Skills: Data cleaning, wrangling, visualisation, analysis

Libraries: pandas, numpy, matplotlib, seaborn

| Project Name | Area | Description | Libraries |
| --- | --- | --- | --- |
| 📺 TMDb Movie Analysis | Data Wrangling & EDA | Analysing more than 10,000 TMDb movies to answer: Is higher popularity associated with higher revenue and budget? Does higher popularity result in higher profitability? Which are the top 10 most profitable movies? Which are the top 10 genres? How has profitability moved year on year? | pandas, matplotlib |
| 📲 311-Request-Analysis | Data Wrangling & EDA | Data analysis of service request (311) calls from New York City, using data wrangling techniques to understand patterns in the data and visualize the major types of complaints | pandas, matplotlib |
| 🍷 Wine Quality | Data Wrangling & EDA | A study of red and white wine samples to understand whether certain types of wine and their properties (alcohol level, sugar content and acidity level) are associated with higher wine quality. | pandas, matplotlib |
| 🚲 Ford GoBike System Data | EDA | Analysis of a data set containing information about individual rides made in a bike-sharing system covering the greater San Francisco Bay area. | pandas, matplotlib, seaborn |
| 🐶 WeRateDogs | EDA, Wrangle | Discovered insights by cleaning, storing, analyzing and visualizing Twitter data. | pandas, matplotlib, seaborn |

Tableau

Project Name Description Key Tools used Tableau Dashboard
🏪 Sample Superstore Dashboard Analysis of sales data to find out highest revenue and profit product categories and top customer segments, visually!
  • Answers Key Business Questions:
    • Which geographies have the most customers?
    • Which customer segments have the most customers, and how has that changed YoY?
    • Which categories have the most customers, and how has that changed YoY?
    • How does average revenue compare to the average number of products ordered by customer segment?
  • Structure data by using groups, bins, and hierarchies, and apply sorting and filtering techniques to reveal additional insights.
  • Use calculations, LOD expressions, and table calculations to create new views and insights
  • Create parameters and user controls, and show data trends and forecasts, as well as data distributions
Link
🛍 Business Metric Dashboard This is the first stop for leaders to learn about the composition of their workforce and key trends across hiring, headcount, demographic diversity, and attrition. Metrics can be viewed by several different dimensions including Department, Job Level, and Country to enable comparisons and identify hotspots.
  • Answers Key Business Questions:
    • How are we progressing in representation of Women, globally, and Underrepresented Minorities (URMs), within the U.S.?
    • What is our current employee attrition, how is it trending, and which departments are experiencing the highest rates of attrition?
    • How much has our organization grown since the prior fiscal year?
    • In which departments are we hiring the most employees?
  • Swapping multiple worksheets using Toggle Functionality
  • Map Layers
  • Sort, filter, and group data
  • Build a range of essential chart types for analysis
Link
