I am Abhudeep Kaur Arora, an engineer and a data learner!
Thanks for visiting my portfolio here on GitHub!
In my portfolio, you can find my projects and guides on data analytics, machine learning, deep learning and data science!
- Jira, Confluence, Wiki, Asana, Salesforce, PostgreSQL, Tableau, Excel, Jupyter Notebook, C++, Python (NumPy, Pandas, Statsmodels, Scikit-learn), Google Analytics, Axure, Xmind
- Scrum Framework, Agile Methodology, Software as a Service (SaaS), Software Development Life Cycle (SDLC), Product Strategy, User Research, Statistical Analysis and Inference, Machine Learning, Feature Engineering, Data Visualization, Data Wrangling, A/B testing
| Credential Name | Link to Credential | Description |
|---|---|---|
| Caltech (California Institute of Technology) Data Science Bootcamp | Credential | Learnt to make data-driven decisions through this acclaimed bootcamp that delivers a high-engagement learning experience, leveraging Caltech’s academic excellence in data science. Topics included: Data Visualization, Deep Learning, Descriptive Statistics, Ensemble Learning, Exploratory Data Analysis, Inferential Statistics, Model Building and Fine Tuning, Supervised and Unsupervised Learning |
| Tableau Desktop Specialist | Credential | Desktop Specialists are able to connect to, prepare, explore and analyze data, and share their insights |
| Udacity Nanodegree Program- Data Analyst, in collaboration with Kaggle | Credential | Use Python, SQL, and statistics to uncover insights, communicate critical findings, and create data-driven solutions |
| Udacity Nanodegree Program- SQL | Credential | Mastered SQL, the core language for Big Data analysis, enabling insight-driven decision-making and strategy for business. |
| Udacity A/B testing, in collaboration with Google | Non-certified course (free course) | This course covered the design and analysis of A/B tests, also known as split tests, which are online experiments used to test potential improvements to a website or mobile application. It covered how to choose and characterize metrics to evaluate experiments, how to design an experiment with enough statistical power, how to analyze the results and draw valid conclusions, and how to ensure that the participants of experiments are adequately protected. |
| PCAP™ – Certified Associate in Python Programming (Exam PCAP-31-0x) | Credential | PCAP™ – Certified Associate in Python Programming certification focuses on the Object-Oriented Programming approach to Python, and shows that the individual is familiar with the more advanced aspects of programming, including the essentials of OOP, the essentials of modules and packages, the exception handling mechanism in OOP, advanced operations on strings, list comprehensions, lambdas, generators, closures, and file processing. |
| PCEP™ – Certified Entry-Level Python Programmer (Exam PCEP-30-0x) | Credential | PCEP™ – Certified Entry-Level Python Programmer certification shows that the individual is familiar with universal computer programming concepts like data types, containers, functions, conditions, loops, as well as Python programming language syntax, semantics, and the runtime environment. |
| Title | Link |
|---|---|
| Trustworthy online controlled experiments: A practical guide to A/B testing. Kohavi, R., Tang, D. and Xu, Y., 2020, Cambridge University Press. | link |
| Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions. Jim Frost. | link |
| Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models. Jim Frost. | link |
🗺 Hi there! 🙋🏻♀️
Welcome to my portfolio, where I provide a walkthrough of all my notebooks, data science projects and courses.
Click on a project's title (bold and coloured in blue) to view the project. Thank you!
This is a list of notebooks I created, containing code to study, revise and ramp up on under-the-hood knowledge gathered from various courses, books and projects:
| Guide Name | Link | Description |
|---|---|---|
| All about NumPy | link | NumPy methods including mathematical, logical, shape manipulation, sorting, selecting, basic linear algebra, basic statistical operations, random simulation and much more. |
| All about Pandas- assess and clean data | link | Step by step guide for data manipulation and analysis using Pandas. |
| All about Data Visualization | link | Step by step guide using Matplotlib and Seaborn: Distribution Plots: |
| Statistics and Hypothesis Testing | link | |
| A/B Testing - solid base | link | |
| Regression | link | Conversion of Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models by Jim Frost from Minitab to Python notebook |
| Classification | link | Assumptions, sigmoid function, decision boundary, gradient descent and implementation |
| Deep Learning | link | |
| Naive Bayes | link | Maths, implementation, deployment and optimization |
| Support Vector Machines and Classifiers | link | Maths, implementation, deployment and optimization |
| SQL | link | Comprehensive SQL tutorial |
Level: Intermediate SQL
Functions: Aggregations, Joins, CTEs, Window functions (aggregates, ranking, running total, partitioned averages), CASE WHEN statements, subqueries, nested subqueries, DATETIME functions, data type conversion, text and string manipulation
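
As a rough illustration of a few of the functions listed above, here is a minimal sketch that runs one query combining a CTE, two window functions (a running total and a partitioned average) and a CASE WHEN expression through Python's built-in `sqlite3` module. The `sales` table, its columns and the sample rows are hypothetical and are not taken from the projects in this portfolio. The sketch also assumes the bundled SQLite is version 3.25 or newer, which window functions require.

```python
import sqlite3

# Hypothetical sample data, invented purely for illustration.
ROWS = [
    ("north", "2024-01-01", 100),
    ("north", "2024-01-02", 150),
    ("south", "2024-01-01", 80),
    ("south", "2024-01-02", 120),
]

QUERY = """
WITH daily AS (                      -- CTE wrapping the base table
    SELECT region, day, amount FROM sales
)
SELECT
    region,
    day,
    amount,
    -- running total per region, ordered by day
    SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total,
    -- average over the whole region partition
    AVG(amount) OVER (PARTITION BY region) AS region_avg,
    -- simple CASE WHEN bucketing
    CASE WHEN amount >= 100 THEN 'high' ELSE 'low' END AS bucket
FROM daily
ORDER BY region, day
"""


def run_example():
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, day TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


results = run_example()
for row in results:
    print(row)
```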
| Project Name | Description | SQL Functions |
|---|---|---|
| 🆔 Udiddit, A Social News Aggregator | Part I: Investigate the existing schema, Part II: Create the DDL for your new schema, Part III: Migrate the provided data | Advanced SQL |
| 🌴 Report for ForestQuery into Global Deforestation | Prepare and disseminate a report, written in complete sentences, that helps the ForestQuery team understand the global deforestation overview with numbers derived from SQL queries | Advanced SQL |
🐍 Skills: Data cleaning, wrangling, visualisation, analysis
Libraries: pandas, numpy, matplotlib, seaborn
| Project Name | Area | Description | Libraries |
|---|---|---|---|
| 📺 TMDb Movie Analysis | Data Wrangling & EDA | Analysing more than 10,000 TMDb movies to answer: Is higher popularity associated with higher revenue and budget? Does higher popularity result in higher profitability? Which are the top 10 most profitable movies? Which are the top 10 genres? How has profitability moved year on year? | pandas, matplotlib |
| 📲 311-Request-Analysis | Data Wrangling & EDA | Analysis of service request (311) calls from New York City, using data wrangling techniques to understand patterns in the data and visualize the major types of complaints | pandas, matplotlib |
| 🍷 Wine Quality | Data Wrangling & EDA | A study on red and white wine samples and understanding whether certain types of wine and their qualities (alcohol level, sugar content and acidity level) are associated with higher wine quality. | pandas, matplotlib |
| 🚲 Ford GoBike System Data | EDA | Analysis of this data set which includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay area. | pandas, matplotlib, seaborn |
| 🐶 WeRateDogs | EDA, Wrangle | Discovered insights by cleaning, storing, analyzing and visualizing Twitter data. | pandas, matplotlib, seaborn |
| Project Name | Description | Key Tools used | Tableau Dashboard |
|---|---|---|---|
| 🏪 Sample Superstore Dashboard | Analysis of sales data to find out highest revenue and profit product categories and top customer segments, visually! | | Link |
| 🛍 Business Metric Dashboard | This is the first stop for leaders to learn about the composition of their workforce and key trends across hiring, headcount, demographic diversity, and attrition. Metrics can be viewed by several different dimensions including Department, Job Level, and Country to enable comparisons and identify hotspots. | | Link |