A Python toolkit for cleaning, transforming, and analyzing User Experience (UX) survey data. Built for UX researchers and data analysts who need to process multilingual survey responses and perform statistical analysis on Likert-scale data.
- Data Cleaning – Standardize messy survey exports and filter incomplete responses
- Scale Conversion – Convert French Likert scale text to numerical values
- Sample Size Calculator – Determine required participants for statistical validity
- Statistical Analysis – Normality tests, correlation matrices, and distribution metrics
- Qualitative Processing – Clean open-ended responses and translate from French to English
- Theme Extraction – Basic keyword-based categorization of feedback
┌─────────────────────────────────────────────────────────────────────┐
│ DATA PIPELINE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ your_data.csv ──► cleanup.py ──► convert_to_numbers.py │
│ │ │ │
│ ▼ ▼ │
│ cleaned_data.csv numeric_likert.csv │
│ │ │
│ ┌──────────────────────────────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ shapiro_wilk.py skewness_kurtosis.py iqr_median.py│
│ spearman.py │
│ │
├─────────────────────────────────────────────────────────────────────┤
│ OPEN-ENDED RESPONSES │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ open_ended.csv ──► clean_open.py ──► themes.py / feature.py │
│ │ │
│ ▼ │
│ open_ended_translated.csv │
│ │
└─────────────────────────────────────────────────────────────────────┘
| Script | Description |
|---|---|
| cleanup.py | Removes invalid rows and standardizes column names from raw survey exports |
| scale_mappings.py | Defines French-to-number mappings for Likert scales and ordinal choices |
| convert_to_numbers.py | Applies scale mappings to convert text responses to numeric values |
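The mapping step can be illustrated with a minimal sketch. The French labels, the `AGREEMENT_SCALE` dictionary, and the `normalize` and `to_number` helpers below are hypothetical, chosen only for illustration; the actual labels and structure are defined in scale_mappings.py and may differ.

```python
import unicodedata

# Hypothetical 5-point agreement scale; the real labels live in scale_mappings.py.
AGREEMENT_SCALE = {
    "pas du tout d'accord": 1,
    "plutôt pas d'accord": 2,
    "neutre": 3,
    "plutôt d'accord": 4,
    "tout à fait d'accord": 5,
}


def normalize(label):
    """Lower-case, trim, and unify Unicode forms and apostrophes before lookup."""
    text = unicodedata.normalize("NFC", label).strip().lower()
    # Survey exports often contain the typographic apostrophe (U+2019).
    return text.replace("\u2019", "'")


def to_number(label, scale=AGREEMENT_SCALE):
    """Return the numeric score for a label, or None when it is missing or unmapped."""
    if label is None:
        return None
    return scale.get(normalize(label))


responses = ["Tout à fait d'accord", " neutre ", "Plutôt pas d\u2019accord", "", None]
scores = [to_number(r) for r in responses]
print(scores)  # [5, 3, 2, None, None]
```

Returning `None` for unmapped labels keeps unexpected answers visible instead of silently coercing them. With pandas, the same dictionary can be passed to `Series.map` after the column has been normalized.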
| Script | Description | Output |
|---|---|---|
| sample_size_calculator.py | Calculates required sample size for valid tests | (Console Output) |
| shapiro_wilk.py | Tests if data follows a normal distribution | shapiro_wilk.csv |
| spearman.py | Computes rank correlation matrix with heatmap | spearman_correlation.csv, spearman_correlation_heatmap.png |
| skewness_kurtosis.py | Measures distribution shape (asymmetry and tailedness) | skewness_kurtosis_results.csv |
| iqr_median.py | Calculates robust central tendency and spread metrics | median_iqr_results.csv |
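As a rough illustration of what a sample size calculation involves, the sketch below uses Cochran's formula with an optional finite population correction. This is an assumption: the README does not say which formula sample_size_calculator.py uses, and the function name `required_sample_size` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence=0.95, margin_of_error=0.05,
                         proportion=0.5, population=None):
    """Cochran's formula, with an optional finite population correction."""
    # Two-sided z-score for the requested confidence level (about 1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Shrink the estimate when the target population is small.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(required_sample_size())                 # 385
print(required_sample_size(population=1000))  # 278
```

Using `proportion=0.5` is the conservative default, since it maximizes the required sample when the true proportion is unknown.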
| Script | Description | Output |
|---|---|---|
| clean_open.py | Cleans open-ended responses and translates French → English | open_ended_cleaned.csv, open_ended_translated.csv |
| translate.py | Standalone translation utility (re-run without re-cleaning) | open_ended_translated.csv |
| feature.py | Counts keyword mentions (e.g., "Calendar", "Mobile") | features.csv |
| themes.py | Categorizes responses into themes (e.g., "Bug", "RFE") | themes.csv |
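Keyword-based categorization of this kind can be sketched as follows. The theme names "Bug" and "RFE" come from the table above; the keyword lists, the `categorize` function, and the "Uncategorized" fallback are hypothetical and are not taken from themes.py.

```python
import re

# Theme names come from the README; the keywords are illustrative only.
THEME_KEYWORDS = {
    "Bug": ["crash", "error", "broken", "freeze"],
    "RFE": ["would like", "wish", "please add", "feature"],
}

# Pre-compile one whole-word, case-insensitive pattern per theme.
THEME_PATTERNS = {
    theme: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for theme, words in THEME_KEYWORDS.items()
}


def categorize(response):
    """Return every theme whose keywords appear in the response."""
    matches = [theme for theme, pattern in THEME_PATTERNS.items() if pattern.search(response)]
    return matches or ["Uncategorized"]


print(categorize("Export is broken, I wish it supported PDF"))  # ['Bug', 'RFE']
print(categorize("No error at all, works great"))               # ['Bug']
print(categorize("Nothing to report"))                          # ['Uncategorized']
```

The second call shows a false positive: the matcher sees "error" but not the negation around it, which is the kind of result that needs human review.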
# Clean the raw survey export
python cleanup.py
# Convert text responses to numbers
python convert_to_numbers.py

# Calculate required sample size
python sample_size_calculator.py
# Test for normal distribution
python shapiro_wilk.py
# Generate correlation matrix and heatmap
python spearman.py
# Analyze distribution shape
python skewness_kurtosis.py
# Calculate median and IQR
python iqr_median.py

# Clean and translate responses
python clean_open.py
# Extract feature keywords
python feature.py
# Categorize into themes
python themes.py

Warning
Keyword-based analysis has limitations.
The feature.py and themes.py scripts use simple keyword matching. Results should be verified through human analysis for accuracy.
Note
Dataset-specific configuration required.
The cleanup.py script contains a hardcoded French column name from the original survey. You must update this for different datasets.
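One way to make this easier to adapt is to keep the column name in a single constant and fail loudly when it is missing. The sketch below assumes the hardcoded column is used to detect incomplete responses, which the README does not confirm; the column name and the `filter_complete` function are made-up placeholders, not the ones in cleanup.py.

```python
import csv
import io

# Assumption: the hardcoded column is used to spot incomplete responses.
# The column name below is a made-up placeholder, not the one in cleanup.py.
REQUIRED_COLUMN = "Quel est votre rôle ?"


def filter_complete(csv_text, required_column=REQUIRED_COLUMN):
    """Keep only rows where the required column is present and non-empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if required_column not in (reader.fieldnames or []):
        # Fail loudly so a mismatched dataset is noticed immediately.
        raise KeyError(f"column {required_column!r} not found in the export")
    return [row for row in reader if (row[required_column] or "").strip()]


sample = (
    "id,Quel est votre rôle ?\n"
    "1,Designer\n"
    "2,\n"
    "3,Développeur\n"
)
rows = filter_complete(sample)
print([row["id"] for row in rows])  # ['1', '3']
```

Raising an error for a missing column surfaces a dataset mismatch at the start of the pipeline rather than as empty output files later.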
Install dependencies with pip:
pip install pandas numpy scipy matplotlib seaborn deep_translator

| File | Description |
|---|---|
| your_data.csv | Raw survey export (input) |
| cleaned_data.csv | Preprocessed structured data |
| numeric_likert.csv | Responses converted to numbers |
| data_no_outliers.csv | Dataset with outliers removed (used by statistical scripts) |
| open_ended.csv | Raw open-ended responses |
| open_ended_cleaned.csv | Cleaned open-ended responses |
| open_ended_translated.csv | Translated responses (French → English) |
| themes.csv | Categorized open-ended responses |
| features.csv | Feature keyword counts |
This project is open-source and licensed under the MIT License.