Ani25beep/statistical-data-analysis

Applied Statistical Analysis of Time-Series Data

This repository contains applied statistical analysis projects using real-world time-series data across multiple domains, including finance and environmental studies.

The focus is on statistical reasoning, interpretability, and time-varying behavior, rather than prediction or black-box modeling.


📌 Objective

To apply core concepts from probability and statistics—such as rolling statistics, volatility, and z-scores—to analyze variability and extreme events in real-world time-series data.


📊 Datasets

The projects in this repository use publicly available datasets, including:

  • Historical daily price data of the NIFTY 50 index
  • Air quality measurements (PM2.5) from Delhi air pollution datasets

Raw data files are not included due to licensing considerations.


🛠 Tools & Technologies

  • Python
  • pandas, numpy
  • matplotlib
  • Jupyter Notebook

📁 Projects in This Repository

🔹 Project 1: Statistical Analysis of NIFTY 50 Returns

Notebook: stock_analysis.ipynb

Focus:

  • Computation of daily percentage returns
  • Mean and standard deviation analysis
  • Return distribution visualization
  • Identification of tail events
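
The steps listed above can be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code: the function names `daily_returns` and `summarize`, the 3-standard-deviation tail cutoff, and the synthetic price series (standing in for the NIFTY 50 closes, which are not included in the repository) are all assumptions.

```python
import numpy as np
import pandas as pd


def daily_returns(close: pd.Series) -> pd.Series:
    """Percentage change between consecutive closing prices, in percent."""
    return close.pct_change().dropna() * 100.0


def summarize(returns: pd.Series, k: float = 3.0) -> dict:
    """Mean, sample standard deviation, and number of returns lying
    more than k standard deviations from the mean (assumed tail rule)."""
    mu = returns.mean()
    sigma = returns.std()  # pandas uses ddof=1 by default
    n_tail = int(((returns - mu).abs() > k * sigma).sum())
    return {"mean": mu, "std": sigma, "n_tail": n_tail}


# Synthetic closing prices (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
close = pd.Series(
    100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 500))),
    index=pd.bdate_range("2020-01-01", periods=500),
    name="close",
)

returns = daily_returns(close)
print(summarize(returns))
```

A histogram of `returns` (for example `returns.hist(bins=50)` with matplotlib installed) would give the distribution view mentioned in the list.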

Key Takeaways:

  • Average daily returns are close to zero relative to daily volatility, which is consistent with (though not proof of) efficient market behavior
  • Return distributions exhibit fat tails
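  • Volatility (the standard deviation of returns) gives a useful first summary of market risk, although fat tails mean it understates the likelihood of extreme moves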
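
One way to check the fat-tail observation numerically is excess kurtosis, which is about 0 for a normal distribution and positive for heavier tails. The sketch below is an assumption about how such a check could look, not the notebook's method. It uses two synthetic samples (normal and Student-t with 3 degrees of freedom) in place of real returns, and `excess_kurtosis` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def excess_kurtosis(returns: pd.Series) -> float:
    """Sample excess kurtosis (Fisher definition: normal is about 0)."""
    return float(returns.kurt())


# Synthetic samples for comparison (not market data).
rng = np.random.default_rng(42)
normal_like = pd.Series(rng.normal(0.0, 1.0, 5000))
heavy_tailed = pd.Series(rng.standard_t(df=3, size=5000))

print("normal sample:      ", excess_kurtosis(normal_like))
print("heavy-tailed sample:", excess_kurtosis(heavy_tailed))
```

Applied to a real return series, a clearly positive value would support the fat-tail takeaway, and a value near zero would not.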

🔹 Project 2: Volatility Modeling & Extreme Event Detection

Notebook: volatility_zscore.ipynb

Focus:

  • Rolling mean and rolling volatility estimation
  • Time-varying risk analysis
  • Z-score based detection of statistically extreme events
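
A minimal sketch of the rolling approach follows, under stated assumptions: a 20-observation window, a threshold of 3, and rolling statistics computed from the preceding window only (via `shift(1)`), so that a spike does not inflate the very volatility it is measured against. The notebook may use different settings, and `rolling_zscore` and `flag_extremes` are hypothetical names.

```python
import pandas as pd


def rolling_zscore(x: pd.Series, window: int = 20) -> pd.DataFrame:
    """Rolling mean, rolling volatility, and z-score of each observation
    relative to the previous `window` observations."""
    prior = x.shift(1).rolling(window)  # excludes the current observation
    roll_mean = prior.mean()
    roll_vol = prior.std()
    z = (x - roll_mean) / roll_vol
    return pd.DataFrame(
        {"value": x, "roll_mean": roll_mean, "roll_vol": roll_vol, "z": z}
    )


def flag_extremes(z: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where |z| exceeds the threshold; warm-up NaNs are never flagged."""
    return z.abs() > threshold


# Toy series: calm alternation with one large spike at position 30.
x = pd.Series([1.0, -1.0] * 15 + [10.0] + [1.0, -1.0] * 5)
out = rolling_zscore(x, window=20)
flags = flag_extremes(out["z"])
print(out[flags])
```

Including the current observation in the window (plain `x.rolling(window)`) is also common, but it caps how large a single-point z-score can get, so the choice is worth stating alongside any results.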

Key Takeaways:

  • Volatility is not constant over time
  • Extreme events cluster during high-risk regimes
  • Constant-variance assumptions are violated in real data

🔹 Project 3: Air Pollution Variability & Extreme Event Detection

Notebook: air_quality_anomaly.ipynb

Focus:

  • Time-series analysis of daily PM2.5 concentrations
  • Rolling variability and volatility analysis
  • Z-score based detection of extreme pollution events

Key Takeaways:

  • Air pollution variability is time-dependent and non-constant
  • Extreme pollution episodes occur in clusters
  • Statistical anomaly detection methods are transferable beyond finance
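
The clustering of extreme episodes can be made concrete by collapsing runs of consecutive flagged days into episodes. The sketch below is illustrative only: `extreme_episodes` is a hypothetical helper, the boolean flags are hand-written rather than derived from PM2.5 data, and "consecutive" means adjacent rows, so gaps in the date index would need separate handling.

```python
import pandas as pd


def extreme_episodes(flags: pd.Series) -> pd.DataFrame:
    """Collapse runs of consecutive True flags into one row per episode."""
    flags = flags.astype(bool)
    # A new run begins wherever the flag differs from the previous row.
    run_id = (flags != flags.shift()).cumsum()
    rows = []
    for _, run in flags[flags].groupby(run_id[flags]):
        rows.append(
            {"start": run.index[0], "end": run.index[-1], "length": len(run)}
        )
    return pd.DataFrame(rows, columns=["start", "end", "length"])


# Hand-written flags for illustration (not real measurements).
idx = pd.date_range("2021-11-01", periods=8, freq="D")
flags = pd.Series([False, True, True, False, False, True, False, True], index=idx)

episodes = extreme_episodes(flags)
print(episodes)
```

Comparing the number of episodes with the number of flagged days gives a rough sense of how strongly extremes bunch together.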

📈 Key Statistical Concepts Used

  • Time-series analysis
  • Rolling mean and rolling standard deviation
  • Volatility and variability modeling
  • Z-scores and anomaly detection
  • Tail risk and extreme event interpretation
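
For reference, these quantities can be written as below. The notation is an assumption introduced here rather than taken from the notebooks: $P_t$ is the closing price, $x_t$ the series being analysed (returns or PM2.5), $w$ the window length, and $k$ the detection threshold. The window is taken over the $w$ observations preceding $t$; including $x_t$ itself is an equally common convention.

```latex
r_t = 100 \cdot \frac{P_t - P_{t-1}}{P_{t-1}}
\qquad
\mu_t = \frac{1}{w} \sum_{i=1}^{w} x_{t-i}
\qquad
\sigma_t = \sqrt{\frac{1}{w-1} \sum_{i=1}^{w} \left( x_{t-i} - \mu_t \right)^2}

z_t = \frac{x_t - \mu_t}{\sigma_t},
\qquad
\text{observation } t \text{ is flagged as extreme when } |z_t| > k
```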

🎓 Learning Outcomes

  • Applied statistical theory to real-world datasets across domains
  • Developed intuition for time-varying risk and variability
  • Gained experience handling messy, real-world data
  • Strengthened ability to interpret and communicate statistical results

🧠 Author

Mayank Kochar
Mathematics background with interest in applied statistics, data analysis, and time-series modeling

About

Exploratory and statistical analysis of a real-world dataset using Python and basic statistics
