Getting And Cleaning Data

Course project in Coursera Getting and Cleaning Data The purpose of the project is to make a tidy dataset out of the Human Activity Recognition data generated from smartphones. The data is split inyto two sets:

Training Data (70% of the volunteers)
Test data (30% of the volunteers)

The functions in the R code is described below.

You activate the script by executing the command run_analysis().

Download data

This is handled by function downloadUCIHARData Data is downloaded as a zip file from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

The URL is hard coded in the function.

Unzip files

This is handled by function *extractUCIHARFiles

Data is unzipped to folder in working directory. The folder is the foldername in the path for each file in the zip file.

Read datasets into data.frames

The list of files in the zip is checked in order to find path to the required files. Each required file is read into a data.frame by calling the function readUCIHARdatafile

The following files are required

UCI HAR Dataset/activity_labels.txt
UCI HAR Dataset/features.txt
UCI HAR Dataset/test/subject_test.txt
UCI HAR Dataset/test/X_test.txt
UCI HAR Dataset/test/y_test.txt
UCI HAR Dataset/train/subject_train.txt
UCI HAR Dataset/train/X_train.txt
UCI HAR Dataset/train/y_train.txt

Special processing is done for data frames created from y_test.txt and y_train.txt in order to translate activity_id to an activity_labels and add as new column in df_y_test and df.y_train

The data frames is given correct column names. The features.txt contains the labels to be used for the X_test.txt and X_train.txt columns. If you in R Studio study the data.frames that are the result of a read.table("UCI HAR Dataset/test/X_test.txt") / read.table("UCI HAR Dataset/train/X_train.txt") R Studio will only show the first 100 columns even though there in reality is 561 columns in each.

Create combined test data frame consisting of columns of subjects, activity labels and test data

A new data.frame indicating whether it is test or training data is created for each type.

Test data:

DataType
Subject_ID (subject_test)
AvtivityType (Modified y_test)
UCI HAR data (X_train)

Training data:

DataType
Subject_ID (subject_train)
AvtivityType (Modified y_train)
UCI HAR data (X_train)

Produce tidy output file

Combines rows from test and training into a combined training data frame consisting of columns of subjects, activity labels and train data.

Creates a subset of all data with DataType, Subject_ID, ActivityType and all columns having expressions ".mean." and ".std."

Writes tidy data to file "UCIHAR_allData_Subset_Tidy_Table.txt" in folder "data"" under Working Directory.

Column names are written in file "output_colnames.txt" in folder "data"" under Working Directory.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting And Cleaning Data

Download data

Unzip files

Read datasets into data.frames

Create combined test data frame consisting of columns of subjects, activity labels and test data

Produce tidy output file

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Getting And Cleaning Data

Download data

Unzip files

Read datasets into data.frames

Create combined test data frame consisting of columns of subjects, activity labels and test data

Produce tidy output file