133 changes: 133 additions & 0 deletions .gitignore
@@ -0,0 +1,133 @@

# Created by https://www.gitignore.io/api/macos,pycharm+all,jupyternotebooks
# Edit at https://www.gitignore.io/?templates=macos,pycharm+all,jupyternotebooks

### JupyterNotebooks ###
# gitignore template for Jupyter Notebooks
# website: http://jupyter.org/

.ipynb_checkpoints
*/.ipynb_checkpoints/*

# IPython
profile_default/
ipython_config.py

# Remove previous ipynb_checkpoints
# git rm -r .ipynb_checkpoints/

### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

### PyCharm+all ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

# Generated files
.idea/**/contentModel.xml

# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml

# Gradle
.idea/**/gradle.xml
.idea/**/libraries

# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr

# CMake
cmake-build-*/

# Mongo Explorer plugin
.idea/**/mongoSettings.xml

# File-based project format
*.iws

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest Client
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser

### PyCharm+all Patch ###
# Ignores the whole .idea folder and all .iml files
# See https://github.com/joeblau/gitignore.io/issues/186 and https://github.com/joeblau/gitignore.io/issues/360

.idea/

# Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-249601023

*.iml
modules.xml
.idea/misc.xml
*.ipr

# Sonarlint plugin
.idea/sonarlint

# End of https://www.gitignore.io/api/macos,pycharm+all,jupyternotebooks
104 changes: 0 additions & 104 deletions Kick-Off.md

This file was deleted.

79 changes: 79 additions & 0 deletions README.md
@@ -0,0 +1,79 @@
<img src="https://bit.ly/2VnXWr2" alt="Ironhack Logo" width="100"/>

# Predicting Job Salaries
*Aitor Quinza*

*[Data, Barcelona & March 2020]*

## Content
- [Project Description](#project-description)
- [Hypotheses / Questions](#hypotheses--questions)
- [Dataset](#dataset)
- [Cleaning](#cleaning)
- [Analysis](#analysis)
- [Model Training and Evaluation](#model-training-and-evaluation)
- [Future Work](#future-work)
- [Organization](#organization)
- [Links](#links)

## Project Description
This project aims to help job candidates in interviews by suggesting a salary range they can ask for.

## Hypotheses / Questions
* Does the salary depend on the state?
* Does the company's industry affect the salary?
* Can we predict the salary based on job title, geography, and required skills?


## Dataset
* For this project, I scraped the Glassdoor website because some of its US job postings include an estimated salary range. I use that estimate as the basis for my own predictions.
* The scraping script is at `scripts/glassdoor.py`.

## Cleaning
* Parsed numeric data out of salary
* Removed rows without salary
* Made columns for employer-provided salary and hourly wages
* Made columns for whether each of these skills was listed in the job description:
  * Python
  * R
  * Excel
  * AWS
  * Spark
  * SQL
  * Tableau
* Parsed rating out of company text
* Made a new column for company state
* Added a column for whether the job was at the company’s headquarters
* Transformed founded date into age of company
* Added columns for simplified job title and seniority
* Added a column for description length
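
Two of the steps above, parsing the salary range and flagging skills, can be sketched roughly as follows. This is a minimal illustration, not the code in the notebooks: the function names `parse_salary` and `skill_flags` are hypothetical, and the salary string format (`$53K-$91K (Glassdoor est.)`) is an assumption about what the scraped field looks like.

```python
import re

# Skill keywords matched as whole lowercase tokens (R is handled separately below).
SKILLS = ("python", "excel", "aws", "spark", "sql", "tableau")


def parse_salary(text):
    """Return (low, high, average) in thousands, or None if no range is found.

    Assumes a string shaped like '$53K-$91K (Glassdoor est.)'.
    """
    # Drop any parenthetical note, then collect the numbers that remain.
    numbers = [int(n) for n in re.findall(r"\d+", text.split("(")[0])]
    if len(numbers) < 2:
        return None  # rows without a usable salary range would be removed
    low, high = numbers[0], numbers[1]
    return low, high, (low + high) / 2


def skill_flags(description):
    """Return a dict of 0/1 flags, one per skill mentioned in the description."""
    # Tokenize so that 'r' only matches as a standalone word, not inside other words.
    words = set(re.findall(r"[a-z+#]+", description.lower()))
    flags = {skill: int(skill in words) for skill in SKILLS}
    flags["r"] = int("r" in words)
    return flags
```

Token matching like this is rough: it would count "R&D" as a mention of R, for instance, which is one reason skills extraction is listed under future work.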


## Analysis
Visit my [Tableau Graphs](https://public.tableau.com/profile/aitor2544#!/vizhome/DataScienceJobEEUU/Story1)

## Model Training and Evaluation
I used three algorithms:
* Multivariable Linear Regression
* Lasso Regression
* Random Forest
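
One way to compare these three models is with cross-validation in scikit-learn, sketched below. This is an illustration under assumptions rather than the notebook code: the synthetic data, the hyperparameters (`alpha=0.1`, `n_estimators=50`), the choice of mean absolute error as the metric, and the helper name `compare_models` are all placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score


def compare_models(X, y, cv=3):
    """Return the cross-validated mean absolute error for each model."""
    models = {
        "linear": LinearRegression(),
        "lasso": Lasso(alpha=0.1),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        # scikit-learn reports the negated MAE, so flip the sign back.
        neg_mae = cross_val_score(
            model, X, y, scoring="neg_mean_absolute_error", cv=cv
        )
        scores[name] = -neg_mae.mean()
    return scores


# Synthetic stand-in for the cleaned feature matrix and salary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 80 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=60)

scores = compare_models(X, y)
```

A lower score means a smaller average error on held-out folds. On real data, `X` would be the encoded columns produced by the cleaning step.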



## Future Work
* Test SVM algorithm
* Improve skills extraction
* Add more keywords for jobs


## Organization
The repository has three folders:
* Datasets -> CSV files
* Notebooks -> Data cleaning, EDA and model building
* Scripts -> Python scripts for scraping and saving the model

## Links

* [Repository](https://github.com/aitorquinza/Project-Week-8-Final-Project/)
* [Slides](https://docs.google.com/presentation/d/1lD6bA32RghmyEhmh5p3Ni35dZikB0xhQMjr6udIGW9w/edit?usp=sharing)
* [Kanban](https://github.com/aitorquinza/Project-Week-8-Final-Project/projects/1)