tazriahelal/Data_Scientist_Assignment

Data_Scientist_Assignment

Excel and Python Assignment

N.B.: Part 2 uses Python 3.

Install these libraries first (NumPy, SciPy, pandas, scikit-learn):

           pip install numpy
           pip install scipy
           pip install pandas
           pip install scikit-learn

Part 1: I created a pivot table from the sheet1 data and presented the information with the following structure:

a. The values are the sum of Income.

b. The columns are Gender and MaritalStatus.

c. The rows are in the following order: Division; Customer Name; ID.

d. Sheet2 contains some IDs. I added a new column named "Matched" to sheet1. It uses a formula to compare the IDs in sheet1 with the IDs in sheet2 and shows the result as True or False.

Note: I added additional sheets that filter the sheet1 data by Division.
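
For readers who prefer to check Part 1 outside Excel, the same pivot layout and the "Matched" column can be approximated in pandas. This is only a sketch: the sample rows and Division values below are made up for illustration, and the column names are assumed to match the workbook headers (Division, Customer Name, ID, Gender, MaritalStatus, Income).

```python
import pandas as pd

# Hypothetical rows standing in for sheet1; the real workbook data is not reproduced here.
sheet1 = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Customer Name": ["Ana", "Ben", "Cara", "Dev"],
    "Division": ["Dhaka", "Dhaka", "Khulna", "Khulna"],
    "Gender": ["F", "M", "F", "M"],
    "MaritalStatus": ["Single", "Married", "Married", "Single"],
    "Income": [1000, 2000, 1500, 2500],
})

# Hypothetical IDs standing in for sheet2.
sheet2 = pd.DataFrame({"ID": [102, 104, 999]})

# Pivot: sum of Income as values, Gender and MaritalStatus as columns,
# rows ordered Division > Customer Name > ID.
pivot = pd.pivot_table(
    sheet1,
    values="Income",
    index=["Division", "Customer Name", "ID"],
    columns=["Gender", "MaritalStatus"],
    aggfunc="sum",
    fill_value=0,
)

# "Matched": True when a sheet1 ID also appears in sheet2.
sheet1["Matched"] = sheet1["ID"].isin(sheet2["ID"])

print(pivot)
print(sheet1[["ID", "Matched"]])
```

Here `isin` plays the role of the lookup formula in the Excel sheet. The formula actually used in the workbook may differ.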

Part 2:

  1. Copied the sheet1 data into a new Excel sheet.
  2. Converted the Excel file to CSV.
  3. Loaded the CSV file.
  4. Dropped the ID column from the data frame.
  5. Encoded the data so that all columns hold comparable numeric values.
  6. Applied K-means clustering based on their divisions.
  7. Submitted the file with its output retained.
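
The steps above can be sketched roughly as follows. This is a minimal illustration, not the submitted code: the rows are invented, the encoding uses `pd.factorize` (the assignment may have used a different encoder), and choosing one cluster per distinct Division is an assumption about what "based on their divisions" means. In the real workflow the data frame would come from `pd.read_csv` on the exported file.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical rows standing in for the exported sheet1 CSV.
df = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105, 106],
    "Division": ["Dhaka", "Dhaka", "Khulna", "Khulna", "Dhaka", "Khulna"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "MaritalStatus": ["Single", "Married", "Married", "Single", "Married", "Single"],
    "Income": [1000, 2000, 1500, 2500, 1200, 2300],
})

# Step 4: drop the ID column, which carries no information for clustering.
features = df.drop(columns=["ID"])

# Step 5: replace each non-numeric column with integer codes.
encoded = features.copy()
for col in encoded.columns:
    if not pd.api.types.is_numeric_dtype(encoded[col]):
        encoded[col] = pd.factorize(encoded[col])[0]

# Step 6: K-means, assuming one cluster per distinct Division.
n_clusters = df["Division"].nunique()
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
df["Cluster"] = kmeans.fit_predict(encoded)

print(df[["Division", "Income", "Cluster"]])
```

Because Income is on a much larger scale than the encoded columns, it dominates the distance computation in this sketch. Scaling the features first is a common remedy.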