Statistical analysis showing a 2.16x improvement in completion rates among users who received a targeted support intervention. Using Zero-Inflated Negative Binomial (ZINB) regression on 2,558 participants, this project quantifies the impact of support programs on user engagement and success metrics.
Key Impact: 44 percentage-point reduction in user disengagement (77% → 33% of users with zero completions)
Density distribution showing dramatic reduction in disengagement with intervention
An online educational platform needed to evaluate the ROI of their peer support program to make data-driven decisions about resource allocation. Key questions:
- Does the intervention program justify its cost?
- Which user segments benefit most from support?
- How can we optimize program targeting?
- Data Integration: Merged intervention records (449 sessions) with user progress data (2,558 users)
- Statistical Modeling: Applied ZINB regression to handle zero-inflated count data
- Validation: Bootstrap analysis and permutation testing for robust results
- Data Wrangling & Integration: Complex data cleaning, validation, and transformation across multiple sources
- Exploratory Data Analysis (EDA): Identified patterns and trends using statistical methods and visualizations
- Statistical Analysis & Modeling: Zero-Inflated Negative Binomial (ZINB) regression, hypothesis testing, bootstrap methods
- Python Stack: pandas, NumPy, SciPy, statsmodels, Matplotlib, seaborn
- Best Practices: Reproducible analysis via environment management and Black code formatting
- Business Communication: Executive presentations, actionable insights
- 2.16x improvement in completion rates (IRR: 2.16, 95% CI: [1.56, 3.01], p < 0.00001)
- Clear positive ROI based on improved user engagement metrics
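
As a quick sanity check on the headline estimate, the sketch below back-calculates the standard error, z statistic, and two-sided p-value from the reported IRR and its 95% confidence interval. It assumes the interval is a Wald interval on the log scale, with IRR = exp(β) for the intervention coefficient in the count part of the model; if the reported interval came from the bootstrap instead, these numbers are only approximate. The function name `wald_summary` is illustrative and not part of the project code.

```python
import math


def wald_summary(irr, ci_low, ci_high, z_crit=1.959964):
    """Back out log-scale SE, z, and two-sided p from an IRR and its 95% CI.

    Assumes a symmetric Wald interval on the log scale:
    log(IRR) +/- z_crit * SE.
    """
    beta = math.log(irr)  # coefficient on the log scale
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Values reported in the results above
beta, se, z, p = wald_summary(2.16, 1.56, 3.01)
print(f"beta={beta:.3f}  se={se:.3f}  z={z:.2f}  p={p:.1e}")
```

Under that assumption, the implied p-value falls below the reported bound of 0.00001, so the point estimate, interval, and p-value are mutually consistent.
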
| Segment (share of tutored or untutored users) | Description | Action |
|---|---|---|
| High Responders (67%) | Tutored students showing clear progress | Continue support |
| Opportunity Group (33%) | Tutored students with no progress | Target intervention |
| Self-Starters (23%) | Untutored students progressing independently | No action needed |
| At-Risk Group (77%) | Untutored students with zero progress | Prioritize outreach |
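
The segmentation in the table can be sketched as a simple rule on two fields. This is a minimal illustration, assuming "progress" means at least one completion and that each percentage is a share of the segment's own tutored or untutored group. The function names and toy records are hypothetical and not taken from the project notebooks.

```python
from collections import Counter

TUTORED_SEGMENTS = ("High Responders", "Opportunity Group")


def assign_segment(tutored, completions):
    """Map one user to a segment from tutoring status and completion count."""
    if tutored:
        return "High Responders" if completions > 0 else "Opportunity Group"
    return "Self-Starters" if completions > 0 else "At-Risk Group"


def segment_shares(users):
    """Return each segment's share within its own tutored/untutored group."""
    counts = Counter(assign_segment(t, c) for t, c in users)
    group_sizes = Counter(bool(t) for t, _ in users)
    return {
        segment: n / group_sizes[segment in TUTORED_SEGMENTS]
        for segment, n in counts.items()
    }


def reduction(before, after):
    """Return (absolute change, relative change) between two proportions."""
    return before - after, (before - after) / before


# Hypothetical toy records: (tutored, completions). Not project data.
toy_users = [(True, 3), (True, 0), (True, 5),
             (False, 0), (False, 0), (False, 2), (False, 0)]
shares = segment_shares(toy_users)

# Zero-progress shares from the table: 77% untutored vs 33% tutored
abs_points, relative = reduction(0.77, 0.33)
print(shares)
print(f"absolute: {abs_points:.2f}, relative: {relative:.1%}")
```

The last two lines separate the two ways of reading the gap between the At-Risk and Opportunity shares: 44 percentage points in absolute terms, or roughly 57% in relative terms.
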
- Target the Opportunity Group with tailored intervention for a quick win
- Maintain current support for High Responders
- Expand program to At-Risk Group via continued outreach
- Executive Presentation (PDF)
- Interactive Executive Presentation with Notes (Google Slides)
- Statistical Model Metrics
- Statistical Tests Summary
- Main Outputs
- Predictive Model: Develop early warning system to identify at-risk users
- A/B Testing: Test targeted intervention for Opportunity Group
- Dashboard: Real-time monitoring of segment movements
- Scale Analysis: Apply framework to other programs/interventions
├── raw_data/ # README only (data excluded for privacy)
├── notebooks/ # Analysis notebooks with full methodology
├── data_wrangling_output/ # README only (data excluded for privacy)
├── data_modeling_output/ # Visualizations, statistical modeling and test results
├── presentations/ # Stakeholder communications
├── README.md # Project overview and results
├── .gitignore # Excludes sensitive data files
└── requirements.txt # Environment specifications
All data files containing participant information, including raw data and processed datasets, have been excluded from this public repository to protect privacy. See the individual folder READMEs for data descriptions.
The notebooks demonstrate the complete analysis process and results without exposing sensitive information. To reproduce this analysis, you would need:
- Intervention tracking data with participant identifiers
- Progress/engagement data from the platform system
- Proper data access permissions and compliance approval
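
For readers assembling their own inputs, the sketch below shows one way the two sources could be joined: session records are counted per participant and left-joined onto the progress records, so users with no sessions are kept and marked as untutored. It uses only the standard library and invented field names (`user_id`, `completions`, `n_sessions`, `tutored`); the actual notebooks use pandas and may structure this step differently.

```python
from collections import Counter


def merge_sessions_with_progress(session_user_ids, progress_by_user):
    """Left-join per-user session counts onto progress records.

    session_user_ids: iterable of user IDs, one entry per session.
    progress_by_user: dict mapping user ID -> completion count.
    Every user in progress_by_user is kept; users with no sessions get 0.
    Returns (merged rows, session IDs with no matching progress record).
    """
    session_counts = Counter(session_user_ids)
    merged = [
        {
            "user_id": user_id,
            "completions": completions,
            "n_sessions": session_counts.get(user_id, 0),
            "tutored": session_counts.get(user_id, 0) > 0,
        }
        for user_id, completions in progress_by_user.items()
    ]
    # Sessions that point at unknown users are worth reviewing before modeling
    unmatched = sorted(set(session_counts) - set(progress_by_user))
    return merged, unmatched


# Hypothetical toy inputs, not project data
sessions = ["u1", "u1", "u3", "u9"]
progress = {"u1": 4, "u2": 0, "u3": 0}
merged, unmatched = merge_sessions_with_progress(sessions, progress)
print(merged)
print("unmatched session IDs:", unmatched)
```

Keeping every progress record, rather than only users who appear in the session data, is what preserves the untutored comparison group.
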
[Masae Kobayashi Wen] - [mkwen2024@gmail.com] - [LinkedIn]
