Data rarely arrives clean. It crashes in like waves… messy, unpredictable, and full of hidden signals. This project dives into a Shark Attack Dataset and transforms it into meaningful insights using a blend of data cleaning, exploration, and advanced visualization.
Think of it as turning scattered ocean noise into a readable map of patterns.
- 🔍 Investigates real-world shark attack data
- 🧹 Cleans and structures messy information
- 📊 Compares before vs after cleaning visuals
- 🚀 Uses advanced interactive visualizations to uncover hidden patterns
- Dataset: Shark Attack Records
- Customization: Random 200-row subset (ensures uniqueness)
- Key Attributes:
  - 📅 Year
  - 🌍 Country
  - 🏄 Activity
  - 🎂 Age
  - 👤 Sex
  - ⚰️ Fatal (Y/N)
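Drawing the 200-row subset could look roughly like the sketch below. It is an illustration, not the project's actual code: the helper `take_subset`, the fixed seed, and the synthetic stand-in data are all assumptions, and the column names simply mirror the attributes listed above.

```python
import pandas as pd

def take_subset(df: pd.DataFrame, n: int = 200, seed: int = 42) -> pd.DataFrame:
    """Return a reproducible random sample of at most n rows (hypothetical helper)."""
    n = min(n, len(df))  # avoid an error when the frame has fewer than n rows
    return df.sample(n=n, random_state=seed).reset_index(drop=True)

# Synthetic stand-in for the full records; the values are illustrative only.
countries = ["USA", "AUSTRALIA", "SOUTH AFRICA", "BRAZIL", "BAHAMAS"]
activities = ["Surfing", "Swimming", "Diving", "Fishing"]
raw = pd.DataFrame({
    "Year": [2000 + (i % 20) for i in range(500)],
    "Country": [countries[i % 5] for i in range(500)],
    "Activity": [activities[i % 4] for i in range(500)],
    "Age": [15 + (i % 50) for i in range(500)],
    "Sex": ["M" if i % 3 else "F" for i in range(500)],
    "Fatal (Y/N)": ["Y" if i % 7 == 0 else "N" for i in range(500)],
})

subset = take_subset(raw)
print(subset.shape)  # (200, 6)
```

Fixing the seed keeps the subset stable between notebook runs; picking a different seed gives a different, equally valid subset.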
Raw data → Clean data → Insightful patterns
- Removed irrelevant & noisy columns
- Handled missing and inconsistent values
- Converted data types (Age, Year)
- Eliminated duplicates
- Engineered new feature:
  - ⚡ `severity` (based on fatality)
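The cleaning steps above can be sketched with pandas as follows. This is one possible pipeline under stated assumptions, not necessarily what the notebook does: the function name `clean`, the median fill for `Age`, the `"UNKNOWN"` placeholder, the `Case Number` column, and the `severity` labels are all hypothetical choices.

```python
import pandas as pd

KEEP = ["Year", "Country", "Activity", "Age", "Sex", "Fatal (Y/N)"]

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Drop irrelevant columns by keeping only the key attributes.
    out = df[[c for c in KEEP if c in df.columns]].copy()

    # 2. Normalise inconsistent text (stray whitespace, mixed case).
    for col in ["Country", "Activity", "Sex", "Fatal (Y/N)"]:
        out[col] = out[col].str.strip().str.upper()

    # 3. Convert data types; unparseable entries become NaN.
    out["Age"] = pd.to_numeric(out["Age"], errors="coerce")
    out["Year"] = pd.to_numeric(out["Year"], errors="coerce")

    # 4. Handle missing values (assumed strategy: drop rows without a year,
    #    fill Age with the median, label unknown countries).
    out = out.dropna(subset=["Year"])
    out["Year"] = out["Year"].astype(int)
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Country"] = out["Country"].fillna("UNKNOWN")

    # 5. Eliminate duplicates.
    out = out.drop_duplicates().reset_index(drop=True)

    # 6. Engineer the severity feature from the fatality flag.
    out["severity"] = (
        out["Fatal (Y/N)"].map({"Y": "Fatal", "N": "Non-Fatal"}).fillna("Unknown")
    )
    return out

# Small messy example (illustrative values only).
raw = pd.DataFrame({
    "Year": ["2001", "2001", "2010", None, "2015"],
    "Country": [" usa", " usa", "Australia", "USA", None],
    "Activity": ["Surfing", "Surfing", "swimming ", "Diving", "Surfing"],
    "Age": ["23", "23", "teen", "40", None],
    "Sex": ["M", "M", "F", "M", "F"],
    "Fatal (Y/N)": ["N", "N", "y", "N", None],
    "Case Number": [1, 2, 3, 4, 5],
})
cleaned = clean(raw)
print(cleaned)
```

Duplicates are dropped only after normalisation, so rows that differ just by case or whitespace collapse into one.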
(The messy ocean 🌪️)
- Missing values visualization
- Unstructured country distribution
👉 Data looks inconsistent and hard to interpret
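A missing-values chart of the kind described above can be built by counting NaNs per column and plotting the result with Matplotlib. The sketch below uses a tiny made-up frame; the helper name `missing_counts` and the styling are assumptions, not the notebook's actual code.

```python
import pandas as pd
import matplotlib.pyplot as plt

def missing_counts(df: pd.DataFrame) -> pd.Series:
    """Number of missing cells per column, largest first (hypothetical helper)."""
    return df.isna().sum().sort_values(ascending=False)

# Illustrative raw data with gaps.
raw = pd.DataFrame({
    "Year": [2001, 2005, None, 2015, 2018],
    "Country": ["USA", None, "AUSTRALIA", "USA", None],
    "Age": [23, None, None, 40, None],
    "Fatal (Y/N)": ["N", "Y", None, "N", "N"],
})

counts = missing_counts(raw)

fig, ax = plt.subplots(figsize=(6, 3))
counts.plot.bar(ax=ax, color="tab:red")
ax.set_title("Missing values per column (before cleaning)")
ax.set_ylabel("Missing cells")
fig.tight_layout()
```

In a notebook the figure renders inline; from a script, call `plt.show()` or `fig.savefig(...)` afterwards.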
(Clarity emerges ✨)
- 📅 Year-wise attack trends
- ⚰️ Fatal vs Non-Fatal comparison
- 🎂 Age distribution
👉 Patterns become visible and meaningful
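Once the data is clean, the three views listed above reduce to simple aggregations. The following is a minimal sketch on made-up rows; it assumes a cleaned frame with `Year`, `Age`, and `severity` columns, which may not match the notebook's exact names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative cleaned data.
cleaned = pd.DataFrame({
    "Year": [2001, 2001, 2010, 2015, 2015, 2015],
    "Age": [23, 31, 17, 40, 28, 35],
    "severity": ["Non-Fatal", "Fatal", "Non-Fatal", "Non-Fatal", "Non-Fatal", "Fatal"],
})

per_year = cleaned.groupby("Year").size()          # attacks per year
by_severity = cleaned["severity"].value_counts()   # fatal vs non-fatal

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
per_year.plot(ax=axes[0], marker="o", title="Attacks per year")
by_severity.plot.bar(ax=axes[1], title="Fatal vs Non-Fatal")
cleaned["Age"].plot.hist(ax=axes[2], bins=5, title="Age distribution")
fig.tight_layout()
```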
(Where the project truly shines 💎)
- 🌍 Choropleth Map → Global distribution of shark attacks
- 🌳 Sunburst Chart → Country → Activity → Fatality hierarchy
- 🎻 Violin Plot → Age distribution with density insight
- 🧬 3D Scatter Plot → Multi-dimensional data exploration
- 📦 Treemap → Compact hierarchical visualization
| Tool | Purpose |
|---|---|
| Python | Core programming |
| Pandas | Data manipulation |
| Matplotlib | Basic visualization |
| Plotly | Advanced interactive charts |
- 🌍 Certain regions show higher attack concentration
- 🏄 Activities like surfing and swimming are associated with more recorded attacks
- ⚰️ Fatal attacks are relatively rare but critical
- 🎂 Specific age groups appear more vulnerable
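Findings like these come from simple counts and shares over the cleaned data. The sketch below shows how such numbers might be computed; the rows, the age bins, and the `is_fatal` and `age_group` columns are illustrative assumptions, not the project's results. Note that raw counts describe recorded attacks only; without exposure data (how many people surf or swim) they do not measure risk directly.

```python
import pandas as pd

# Illustrative cleaned data.
cleaned = pd.DataFrame({
    "Country": ["USA", "USA", "AUSTRALIA", "USA",
                "SOUTH AFRICA", "AUSTRALIA", "USA", "BRAZIL"],
    "Age": [23, 31, 17, 40, 28, 35, 62, 19],
    "severity": ["Non-Fatal", "Fatal", "Non-Fatal", "Non-Fatal",
                 "Non-Fatal", "Fatal", "Non-Fatal", "Non-Fatal"],
})

# Regional concentration: attacks per country.
by_country = cleaned["Country"].value_counts()

# Share of fatal attacks overall.
cleaned["is_fatal"] = cleaned["severity"].eq("Fatal")
fatal_share = cleaned["is_fatal"].mean()

# Attacks per age group (bin edges are an arbitrary choice for this sketch).
cleaned["age_group"] = pd.cut(
    cleaned["Age"],
    bins=[0, 18, 30, 50, 120],
    labels=["0-18", "19-30", "31-50", "51+"],
)
by_age = cleaned.groupby("age_group", observed=False).size()

print(by_country.head(3))
print(f"Fatal share: {fatal_share:.0%}")
print(by_age)
```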
pip install pandas matplotlib plotly

jupyter notebook
- ✔️ Uses a custom dataset subset (not repeated)
- ✔️ Shows before vs after cleaning comparison
- ✔️ Combines basic + advanced visualization
- ✔️ Includes feature engineering (severity)
- ✔️ Focuses on storytelling through data
Data is not just numbers. It’s a story waiting to be revealed.
This project demonstrates how thoughtful preprocessing and powerful visualizations can turn raw data into something meaningful, insightful, and impactful.
Bharath JR, B.Tech CSE (AI & ML)