
A comprehensive collection of data science and machine learning projects, tutorials, and real-world applications. Projects involving data science concepts.


DataDarling/DATA-SCIENCE-projects


🔬 Data Science Projects Portfolio

A comprehensive collection of data science and machine learning projects, tutorials, and real-world applications. This repository contains 29 Jupyter notebooks covering fundamental concepts, advanced algorithms, model evaluation techniques, and practical data analysis projects.

📋 Table of Contents

🎯 Overview

This repository serves as both a learning resource and a portfolio showcase, demonstrating proficiency in:

  • Machine Learning Algorithms: Classification, clustering, and ensemble methods
  • Deep Learning: Neural network implementations
  • Data Analysis: Exploratory data analysis and visualization
  • Model Evaluation: Cross-validation, hyperparameter tuning, and metrics
  • Real-World Applications: Environmental studies, real estate analysis, healthcare predictions

πŸ“ Repository Structure

The repository contains standalone Jupyter notebooks organized by topic, making it easy to explore specific areas of interest without dependencies on other files.

🚀 Getting Started

Prerequisites

  • Python 3.7+
  • Jupyter Notebook or JupyterLab
  • Required Python libraries (see Installation)

Installation

  1. Clone the repository:
     git clone https://github.com/DataDarling/DATA-SCIENCE-projects.git
     cd DATA-SCIENCE-projects
  2. Create a virtual environment (recommended):
     python -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install required dependencies:
     pip install jupyter numpy pandas matplotlib seaborn scikit-learn scipy

Usage

Launch Jupyter Notebook:

jupyter notebook

Navigate to any notebook and run the cells to see the demonstrations and analyses.

📚 Project Categories

📈 Machine Learning - Regression

| Notebook | Description | Dataset | Key Concepts |
|---|---|---|---|
| linear_regression.ipynb | Simple and multiple linear regression from first principles | Synthetic data (1D & 2D) | Matrix-based regression, coefficient calculation, prediction visualization |
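
As a rough illustration of the "matrix-based regression" idea listed above, the sketch below fits a line by least squares on a design matrix. The synthetic data, the seed, and the true coefficients (slope 3, intercept 2) are assumptions chosen for this example; the notebook itself may set things up differently.

```python
import numpy as np

# Hypothetical 1D synthetic data: y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=100)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X @ beta ~= y (numerically safer than
# explicitly inverting X.T @ X).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Predictions from the fitted coefficients.
y_pred = X @ beta
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Adding more columns to `X` extends the same computation to multiple linear regression.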

🎓 Machine Learning - Classification

Fundamental Algorithms:

| Notebook | Description | Dataset | Key Concepts |
|---|---|---|---|
| logistic_regression.ipynb | Binary and multiclass logistic regression implementations | Synthetic data | Classification, decision boundaries, sigmoid function |
| naive_Bayes.ipynb | Implementation of Multinomial, Bernoulli, and Gaussian Naive Bayes | Synthetic text & binary data | Probabilistic classification, Bayes theorem |
| decision_tree.ipynb | Decision tree classifier with visualization | Iris Dataset | Tree-based learning, feature importance, tree visualization |
| knn_knearest neighbors.ipynb | k-Nearest Neighbors algorithm | Synthetic data | Distance metrics, k-value selection, instance-based learning |
| suport_vector_machine_svm.ipynb | SVM with multiple kernel types (linear, polynomial, RBF) | Iris Dataset | Kernel methods, hyperplane optimization, C and gamma parameters |
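
To give a feel for the kernel comparison described in the SVM row, here is a minimal sketch using scikit-learn's bundled Iris data. The train/test split, the scaling step, and the `C` and `gamma` values are assumptions for illustration, not settings taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit one SVM per kernel; features are standardized because SVMs are
# sensitive to feature scale.
scores = {}
for kernel in ("linear", "poly", "rbf"):
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel=kernel, C=1.0, gamma="scale"),
    )
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)

for kernel, acc in scores.items():
    print(f"{kernel:>6}: test accuracy = {acc:.3f}")
```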

Advanced Classification Applications:

| Notebook | Description | Dataset | Key Achievements |
|---|---|---|---|
| Heart Disease Prediction Model using Logistic Regression.ipynb | Medical diagnosis prediction model | Heart Disease Dataset | Healthcare ML application, logistic regression |
| Breast Cancer Dataset - Model Evaluation and Hyperparameter Tuning.ipynb | Comprehensive model evaluation and optimization | Breast Cancer Dataset | Hyperparameter tuning, model selection, medical ML |
| Multi-Class Classification and Model Tuning.ipynb | Multi-class problem solving with model comparison | Synthetic data | Model comparison, hyperparameter optimization |
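
Hyperparameter tuning of the kind mentioned above might look roughly like the following grid search. This sketch assumes scikit-learn's built-in breast cancer dataset and a logistic regression model; the notebook's actual dataset source, model, and parameter grid may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so it is re-fit on each CV fold,
# which avoids leaking validation statistics into training.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Illustrative grid over the regularization strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```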

πŸ” Machine Learning - Clustering

Unsupervised Learning:

| Notebook | Description | Dataset | Key Concepts |
|---|---|---|---|
| k_means_clustering.ipynb | K-Means clustering fundamentals | Synthetic blobs | Centroid-based clustering, elbow method |
| Iris Dataset - k_means clustering.ipynb | K-Means applied to Iris dataset | Iris Dataset | Cluster visualization, centroid analysis |
| clustering.ipynb | Advanced clustering techniques | Synthetic blobs (300 samples) | GMM, hierarchical clustering (Ward, Complete, Average, Single linkage), dendrograms |
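
The elbow method named in the first row can be sketched as below: fit K-Means for a range of `k` and watch where the inertia (within-cluster sum of squares) stops dropping sharply. The blob parameters (4 centers, 300 samples) and the range of `k` are assumptions for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical synthetic blobs with 4 true centers.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

# Record the inertia for each candidate number of clusters.
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `k` usually makes the bend easier to spot than reading the numbers.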

🎯 Ensemble Methods & Neural Networks

| Notebook | Description | Algorithms | Performance |
|---|---|---|---|
| ensemble_models.ipynb | Boosting methods for classification and regression | AdaBoost, Gradient Boosting | 100% accuracy on Iris, comprehensive regression metrics |
| neural_nets.ipynb | Multi-Layer Perceptron implementation | Neural Network (2-10-1 architecture) | Decision boundaries, ReLU activation, Sigmoid output |
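
A 2-10-1 network with ReLU hidden units and a sigmoid output can be approximated with scikit-learn's `MLPClassifier`, as in the sketch below. Using `MLPClassifier` and the two-moons dataset is an assumption made for illustration; the notebook may build its network differently.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two input features, binary target (hypothetical toy data).
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One hidden layer of 10 ReLU units. For a binary target, MLPClassifier
# uses a single logistic (sigmoid) output unit, giving a 2-10-1 layout.
mlp = MLPClassifier(
    hidden_layer_sizes=(10,),
    activation="relu",
    max_iter=2000,
    random_state=0,
)
mlp.fit(X_train, y_train)

accuracy = mlp.score(X_test, y_test)
print("weight shapes:", [w.shape for w in mlp.coefs_])
print("output activation:", mlp.out_activation_)
print(f"test accuracy: {accuracy:.3f}")
```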

📊 Model Evaluation & Validation

Comprehensive Evaluation Techniques:

| Notebook | Description | Focus | Key Techniques |
|---|---|---|---|
| evaluation_metrics.ipynb | Complete classification metrics guide | Performance measurement | Confusion matrix, Accuracy, Precision, Recall, F1-score, ROC curves, AUC |
| cross_validation.ipynb | Cross-validation strategies | Model validation | KFold, LeaveOneOut, LeavePOut, ShuffleSplit, StratifiedKFold |
| feature_selection.ipynb | Feature importance and selection | Wine dataset analysis | Univariate selection, feature ranking |
| feature_selection_advanced.ipynb | Advanced feature engineering | Synthetic data | Missing value analysis, tree-based feature importance, Random Forest |
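
Several of the cross-validation splitters listed above can be swapped into the same scoring call, as this sketch shows. The choice of Iris, logistic regression, and 5 splits is an assumption for illustration; `LeaveOneOut` and `LeavePOut` are omitted here only because they produce far more folds.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold,
    ShuffleSplit,
    StratifiedKFold,
    cross_val_score,
)

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each splitter defines a different way to carve out validation folds.
splitters = {
    "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "ShuffleSplit": ShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
}

# Mean accuracy across folds for each strategy.
mean_scores = {
    name: cross_val_score(model, X, y, cv=cv).mean()
    for name, cv in splitters.items()
}

for name, score in mean_scores.items():
    print(f"{name:>16}: mean accuracy = {score:.3f}")
```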

🔧 Data Fundamentals

Core Skills & Tools:

| Notebook | Description | Focus |
|---|---|---|
| numpy_demo.ipynb | NumPy fundamentals | Array operations, formatting, mathematical operations |
| pandas_demo.ipynb | Pandas library essentials | DataFrame manipulation, data operations |
| data_visualization.ipynb | Data visualization techniques | Matplotlib, Seaborn plotting |
| data_preparation.ipynb | Data preprocessing workflow | Data cleaning, transformation |
| data_quality_report.ipynb | Data quality assessment | Missing values, quality metrics |
| iris_data_exploration.ipynb | Comprehensive EDA | Statistical summaries, distributions |
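
A data quality assessment of the sort described above often starts with a per-column summary of types, missing values, and cardinality. The sketch below builds one with pandas; the tiny DataFrame and its column names are made up for this example and do not come from the notebooks.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing entries.
df = pd.DataFrame(
    {
        "age": [34, np.nan, 51, 29],
        "city": ["Atlanta", "Macon", None, "Athens"],
        "income": [52000.0, 61000.0, np.nan, np.nan],
    }
)

# One row per column: dtype, missing count, missing percentage,
# and number of distinct non-null values.
report = pd.DataFrame(
    {
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),
    }
)
print(report)
```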

🌍 Real-World Projects

Applied Data Science:

| Project | Description | Domain | Key Insights |
|---|---|---|---|
| weather data analysis.ipynb | Weather pattern analysis | Meteorology | Trend analysis, seasonal patterns |
| New York Homes and Hotel Listings Data Exploration.ipynb | NYC real estate and hospitality analysis | Real Estate | Price trends, market insights, feature analysis |
| 2020-2025 Southern States Fungi Report.ipynb | Multi-year fungi observation study | Ecology | Species distribution, temporal patterns, regional analysis |
| georgia fungi observation report sept-oct 24.ipynb | Georgia fungi seasonal report | Environmental Science | Monthly patterns, species identification |
| V2 Hikes Of Georgia Monthly Fungi Reports.ipynb | Hiking and fungi tracking in Georgia | Outdoor Recreation/Ecology | Geographic analysis, species by location |

🛠️ Technologies Used

  • Programming Language: Python 3.x
  • Data Manipulation: NumPy, Pandas
  • Visualization: Matplotlib, Seaborn
  • Machine Learning: Scikit-learn
  • Deep Learning: Neural network implementations
  • Development Environment: Jupyter Notebook
  • Statistical Analysis: SciPy

✨ Key Highlights

πŸ† Algorithm Coverage

  • Regression: Linear regression with matrix-based solutions
  • Classification: 9 algorithms, including SVM, Naive Bayes, Logistic Regression, Decision Trees, KNN, and ensemble methods
  • Clustering: 3 approaches with various linkage methods and algorithms
  • Neural networks: Multi-Layer Perceptron with a custom architecture
  • Evaluation: 4 notebooks dedicated to model validation

📈 Real-World Impact

  • Healthcare applications: Heart disease and breast cancer prediction models
  • Environmental research: Multi-year fungi observation studies
  • Market analysis: Real estate and hospitality data exploration

🎓 Educational Value

  • Beginner-friendly: Starts with NumPy and Pandas basics
  • Progressive complexity: Moves from simple algorithms to advanced ensemble methods
  • Best practices: Demonstrates proper cross-validation, feature selection, and model evaluation
  • Well-documented: Each notebook contains explanations and visualizations

🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“ License

This project is available for educational and reference purposes. Please check individual datasets for their respective licenses.

📧 Contact

For questions or collaborations, please open an issue in the repository.


Note: Some notebooks may require specific datasets. Check individual notebooks for dataset sources and requirements.

Last Updated: February 2026