Human Activity Recognition Using Smartphones Sensor DataSet

Overview

Smartphones have become integral to daily life, equipped with advanced technology and intelligent assistance for everyday activities. They include embedded sensors such as accelerometers and gyroscopes, enabling applications based on location, movement, and context.

Activity Recognition (AR) refers to monitoring a person's activity using smartphone sensors. These sensors detect motion, allowing the development of systems that recognize environmental changes or specific user actions.

The Human Activity Recognition (HAR) framework collects raw sensor data and applies machine learning and deep learning methods to classify human movement. The UCI HAR Dataset contains data from 30 subjects performing various activities, recorded via smartphones worn at the waist. The goal is to build models that accurately predict activities such as Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Laying.

Sources/Useful Links

Problem Statement

Given a new datapoint, predict the human activity class (e.g., Walking, Sitting, etc.) based on smartphone sensor data.

Solution

The project uses both the expert-engineered and raw datasets from the UCI HAR repository:

Classical Machine Learning: Uses the pre-engineered feature dataset to train models like Logistic Regression, SVM, Decision Trees, etc.
Deep Learning: Uses the raw sensor data to train LSTM models for time-series classification.

Which Type of ML Problem is This?

This is a multiclass time series classification problem. Each datapoint corresponds to one of six activities, and features are derived from sensor signal processing. Predicting human activity requires capturing temporal dependencies and distinguishing subtle differences between similar actions.

What is the Best Performance Metric for This Problem?

Accuracy: The primary metric used to evaluate overall model performance.
Confusion Matrix: Helps understand which classes are confused and the types of errors being made.
Multi-class Log-loss: Useful for evaluating multiclass classification problems.

The confusion matrix is particularly helpful in identifying confusion between similar activities (e.g., Sitting vs. Standing, Walking Upstairs vs. Walking Downstairs).

Business Objectives and Constraints

The goal is to enable health/activity tracking using only smartphone sensors (accelerometer and gyroscope).
The solution could be implemented as a smartphone app for health monitoring.
Constraints:
- High cost of misclassification (e.g., misclassifying activity could impact health feedback).
- No strict latency requirements.

Data Overview

1. How Data Was Recorded

30 subjects performed daily activities while a smartphone (with accelerometer & gyroscope) was mounted at the waist.
Data was recorded in windows of 2.56 seconds with 50% overlap (resulting in 1.28-second steps).
Activities were video-recorded for accurate labeling.

Signals recorded:

3-axial linear acceleration (accelerometer): tAcc-XYZ
3-axial angular velocity (gyroscope): tGyro-XYZ
Other derived signals (jerk, magnitude, frequency domain, etc.)

2. How is the Data Preprocessed?

Raw sensor data was filtered and segmented into 2.56s sliding windows (128 readings per window).
Each window yields a feature vector, computed from time & frequency domains.
Signals are separated (e.g., body vs. gravity acceleration).
Jerk signals and magnitudes are computed.
FFT is applied for frequency domain features.
Feature functions: mean, std, mad, max, min, sma, energy, iqr, entropy, arCoeff, correlation, maxInds, meanFreq, skewness, kurtosis, bandsEnergy, angle, etc.

Sample features:

tBodyAcc-XYZ, tGravityAcc-XYZ, tBodyAccJerk-XYZ, tBodyGyro-XYZ, tBodyGyroJerk-XYZ, tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag, fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccMag, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag

3. Y_Labels(Encoded)

Activity	Label
WALKING	1
WALKING_UPSTAIRS	2
WALKING_DOWNSTAIRS	3
SITTING	4
STANDING	5
LAYING	6

4. Data Directory

The dataset is provided as a single ZIP file (~58 MB): UCI HAR Dataset.zip. The structure includes separate directories for training and test data, with expert features and raw sensor data.

Train and Test Ratio

The 30 subjects' data is split: 70% for training (21 subjects), 30% for testing (9 subjects).
Example: train = pd.read_csv('UCI_HAR_dataset/csv_files/train.csv')

Agenda

1. Analyzing the Data (EDA)

Exploratory Data Analysis (EDA) is performed on the expert-engineered dataset.
Data shapes: Train (7352, 564), Test (2947, 564)
Distribution plots show that the dataset is balanced across subjects and activity classes.
Activities are categorized as Static (Sitting, Standing, Laying) and Dynamic (Walking, Walking Upstairs, Walking Downstairs). Feature analysis (e.g., tBodyAccMagmean) helps distinguish between them.
TSNE and boxplots are used for visualizing feature separability and activity clusters.

2. Machine Learning Models

Models used:

Logistic Regression: Linear classifier; uses L2 regularization and cross-validation for hyperparameter tuning.
Linear SVC: Fits a hyperplane to separate activity classes.
Kernel SVM: Uses kernel functions for nonlinear separation.
Decision Tree: Hierarchical, interpretable model mapping features to activity classes.
Random Forest: Ensemble of decision trees, improves generalization and reduces overfitting.
Gradient Boosted: Ensemble boosting method for improved accuracy (note: some systems may not support GridSearchCV with GradientBoostingClassifier due to resource limits).

3. Deep Learning Models

LSTM (Long Short-Term Memory) Networks are applied to the raw time-series data.
- 1-Layer LSTM
- 2-Layer LSTM with hyperparameter tuning

4. Results & Conclusion

Classical Machine Learning Model Accuracy

Model Name	Features	Hyperparameter Tuning	Accuracy Score
Logistic Regression	Expert features	Yes	95.83%
Linear SVC	Expert features	Yes	96.47%
RBF SVM	Expert features	Yes	96.27%
Decision Tree	Expert features	Yes	86.46%
Random Forest	Expert features	Yes	92.06%

Best classical models: Linear SVC, RBF SVM, Logistic Regression.

Deep Learning LSTM Model Comparison

Model Name	Features	Hyperparameter Tuning	Crossentropy	Accuracy Value
LSTM (1 layer, 32 neurons)	Raw time series	Yes	0.47	0.90
LSTM (2 layers, 48/32)	Raw time series	Yes	0.39	0.90
LSTM (2 layers, 64/48)	Raw time series	Yes	0.27	0.91

LSTM models perform very well on raw data, reaching up to 91% accuracy with two layers and appropriate tuning.

Technical Aspect

This project is divided into four parts:

Exploratory Data Analysis (EDA)
Classical Machine Learning prediction models (expert features)
LSTM-based Deep Learning model (raw time-series data)
Model comparison and conclusion

Installation

The code is written in Python 3.7.

Install required libraries:

pip3 install pandas
pip3 install numpy
pip3 install scikit-learn
pip3 install matplotlib
pip3 install keras

All code and scripts are available in this repository.

Quick Overview of the Dataset

Sensors used: Accelerometer and Gyroscope
Participants: 30 volunteers
Activities: Walking, Walking Upstairs, Walking Downstairs, Standing, Sitting, Laying.
Windowing: 2.56s window with 50% overlap
Signals: Acceleration (body & gravity), angular velocity, Jerk, and their frequency transforms
Features: 561 per datapoint (means, max, mad, sma, arCoeff, energy bands, entropy, etc.)
Label encoding: Each window of readings is a datapoint of 561 features with an activity label.

Note:

For classical ML algorithms, use expert features.
For deep learning (LSTM), use raw sensor time-series data.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Human.ipynb		Human.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Human Activity Recognition Using Smartphones Sensor DataSet

Table of Contents

Overview

Sources/Useful Links

Problem Statement

Solution

Which Type of ML Problem is This?

What is the Best Performance Metric for This Problem?

Business Objectives and Constraints

Data Overview

1. How Data Was Recorded

2. How is the Data Preprocessed?

3. Y_Labels(Encoded)

4. Data Directory

Train and Test Ratio

Agenda

1. Analyzing the Data (EDA)

2. Machine Learning Models

3. Deep Learning Models

4. Results & Conclusion

Classical Machine Learning Model Accuracy

Deep Learning LSTM Model Comparison

Technical Aspect

Installation

Quick Overview of the Dataset

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Human Activity Recognition Using Smartphones Sensor DataSet

Table of Contents

Overview

Sources/Useful Links

Problem Statement

Solution

Which Type of ML Problem is This?

What is the Best Performance Metric for This Problem?

Business Objectives and Constraints

Data Overview

1. How Data Was Recorded

2. How is the Data Preprocessed?

3. Y_Labels(Encoded)

4. Data Directory

Train and Test Ratio

Agenda

1. Analyzing the Data (EDA)

2. Machine Learning Models

3. Deep Learning Models

4. Results & Conclusion

Classical Machine Learning Model Accuracy

Deep Learning LSTM Model Comparison

Technical Aspect

Installation

Quick Overview of the Dataset

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages