First Ironhack Project
Learning objective: Create a pipeline and get hands-on experience with web scraping in Python: getting, transforming and exporting data.
The goal of this project is to create a pipeline that analyzes the data from the Forbes 2018 list of the richest people.
Input
- A country, entered from the console through Python's argparse module.
- Example: python main.py -c "Spain"
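The console input can be sketched with argparse roughly as follows. Only the short flag -c appears in the original; the long form --country, the description and the help text are assumptions added for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line options of the pipeline (sketch).

    Only -c is documented; --country is an assumed long alias.
    """
    parser = argparse.ArgumentParser(
        description="Analyze the richest people of a country from the Forbes 2018 list."
    )
    parser.add_argument(
        "-c", "--country",
        required=True,
        help='Country to analyze, e.g. "Spain"',
    )
    # argv=None makes argparse read sys.argv[1:]; passing a list makes testing easy
    return parser.parse_args(argv)
```

Calling parse_args(["-c", "Spain"]) returns a namespace whose country attribute is "Spain". When called with no list, argparse reads the real command line and exits with an error message if -c is missing.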
Output
- A PDF with the following plots:
- GDP per capita of the selected country compared to the world average GDP per capita.
- The gender distribution of the richest people in the selected country.
- The age distribution of the richest people in the selected country.
- The distribution by sector of the richest people in the selected country.
- The distribution of the richest people in the selected country by net worth.
- The Top 5 richest people in the selected country.
Data sources
- Database provided in class with data from the 2018 Forbes List.
- GDP per capita for each country, obtained by web scraping (https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita)
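A minimal sketch of the GDP scraping step, written with only the Python standard library (urllib and html.parser); the project itself may have used a different scraping library. The column positions (country in the first cell, GDP in the second), the footnote-stripping rule and the User-Agent string are assumptions: the Wikipedia page contains several tables and its layout changes over time, so check the indices against the live page.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen

WIKI_URL = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita"


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td>/<th> currently being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def _strip_footnotes(text):
    # Remove Wikipedia-style footnote markers such as "[a]" or "[n 1]"
    return re.sub(r"\[[^\]]*\]", "", text).strip()


def parse_gdp_rows(rows, country_col=0, gdp_col=1):
    """Map country -> GDP per capita; rows whose GDP cell is not numeric are skipped."""
    gdp = {}
    for row in rows:
        if len(row) <= max(country_col, gdp_col):
            continue
        raw = _strip_footnotes(row[gdp_col]).replace(",", "").replace("$", "")
        try:
            value = float(raw)
        except ValueError:
            continue  # header rows and placeholders
        # Keep the first value seen if a country appears in several tables
        gdp.setdefault(_strip_footnotes(row[country_col]), value)
    return gdp


def scrape_gdp(url=WIKI_URL):
    """Download the page and return the country -> GDP per capita mapping."""
    req = Request(url, headers={"User-Agent": "forbes-pipeline-sketch"})
    with urlopen(req) as resp:
        html = resp.read().decode("utf-8")
    parser = TableRowParser()
    parser.feed(html)
    return parse_gdp_rows(parser.rows)
```

Separating the parsing (parse_gdp_rows) from the download (scrape_gdp) keeps the cleaning logic testable on a saved HTML snippet without network access.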
Pipeline
1. Acquisition: Import data from the database and by web scraping.
2. Wrangling: Clean the data. Convert the numeric columns from object to integer or float, and standardize the format of each column (e.g. gender given as "F" or "Female"; age given as "27 years old" or as the birth year "1991").
3. Analysis and reporting: Analyze the data, create the plots and export them to a PDF.
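The wrangling rules described in step 2 could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, ages are computed relative to 2018 (the list year), and any four-digit number is treated as a birth year.

```python
import re

REFERENCE_YEAR = 2018  # Forbes list year; used to turn birth years into ages


def normalize_gender(value):
    """Map spellings such as 'F', 'female' or 'M' to 'Female'/'Male'; otherwise None."""
    if value is None:
        return None
    v = str(value).strip().lower()
    if v in ("f", "female"):
        return "Female"
    if v in ("m", "male"):
        return "Male"
    return None


def normalize_age(value, reference_year=REFERENCE_YEAR):
    """Return an integer age from inputs like '27 years old', 27 or a birth year '1991'."""
    if value is None:
        return None
    match = re.search(r"\d+", str(value))
    if not match:
        return None
    number = int(match.group())
    # Heuristic (assumption): four-digit numbers are birth years, not ages
    if number >= 1000:
        return reference_year - number
    return number


def to_float(value):
    """Drop thousands separators and currency symbols, then convert to float."""
    cleaned = re.sub(r"[^\d.\-]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None  # nothing numeric left, e.g. 'n/a'
```

With these rules, "27 years old" and "1991" both become 27, so the age column ends up with a single integer type.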
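For step 3, one way to put several charts into a single PDF is matplotlib's PdfPages, writing one figure per page. This sketch assumes `people` is a list of already-cleaned dicts with hypothetical keys "name", "gender", "age" and "worth", and that the world-average GDP per capita is computed elsewhere; only four of the six plots are shown, and the sector and net-worth distributions would follow the same pattern.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def build_report(people, country, country_gdp, world_gdp, path):
    """Write a multi-page PDF report, one chart per page (sketch)."""
    with PdfPages(path) as pdf:
        # Page 1: GDP per capita of the country vs the world average
        fig, ax = plt.subplots()
        ax.bar([country, "World average"], [country_gdp, world_gdp])
        ax.set_ylabel("GDP per capita (USD)")
        ax.set_title("GDP per capita")
        pdf.savefig(fig)
        plt.close(fig)

        # Page 2: gender distribution
        genders = Counter(p["gender"] for p in people)
        fig, ax = plt.subplots()
        ax.pie(list(genders.values()), labels=list(genders.keys()), autopct="%1.0f%%")
        ax.set_title(f"Gender of the richest people in {country}")
        pdf.savefig(fig)
        plt.close(fig)

        # Page 3: age distribution (missing ages are skipped)
        fig, ax = plt.subplots()
        ax.hist([p["age"] for p in people if p["age"] is not None], bins=10)
        ax.set_xlabel("Age")
        ax.set_title(f"Age of the richest people in {country}")
        pdf.savefig(fig)
        plt.close(fig)

        # Page 4: top 5 by net worth, largest at the top of the chart
        top5 = sorted(people, key=lambda p: p["worth"], reverse=True)[:5]
        fig, ax = plt.subplots()
        ax.barh([p["name"] for p in reversed(top5)], [p["worth"] for p in reversed(top5)])
        ax.set_xlabel("Net worth")
        ax.set_title(f"Top 5 richest people in {country}")
        pdf.savefig(fig)
        plt.close(fig)
```

Closing each figure after saving it keeps memory use flat when the report grows to more pages.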