
# Naive Bayes Classifiers from Scratch

A from-scratch implementation of Gaussian and Multinomial Naive Bayes classifiers in Python, evaluated on two real-world datasets.


## Project Structure

```
.
├── classifiers.py   # GaussianNaiveBayes and MultinomialNaiveBayes implementations
├── utils.py         # Helper functions: accuracy, mean, variance, Bag-of-Words builder
└── main.ipynb       # Experiments, evaluation, and visualizations
```

## Datasets

| Dataset | Task | Model |
|---|---|---|
| Abalone (UCI) | Age classification (Young / Adult / Old) | Gaussian NB |
| IMDB Movie Reviews | Sentiment analysis (Positive / Negative) | Multinomial NB |

## Results

| Model | Mode | Accuracy |
|---|---|---|
| Gaussian NB | With log probabilities | 56.58% |
| Gaussian NB | Without log probabilities | 56.58% |
| Multinomial NB | With log probabilities | 78.50% |
| Multinomial NB | Without log probabilities | 53.00% |

**Key insight:** Log probabilities matter significantly for Multinomial NB on text data (a 25.5-percentage-point accuracy gap): without the log transformation, multiplying thousands of small per-word probabilities underflows to zero, making the class scores indistinguishable.
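The underflow effect can be demonstrated in a few lines (a standalone sketch, not code from this repo; the 200-word review and 0.01 per-word probability are illustrative numbers):

```python
import math

# Pretend a review has 200 words, each with per-class probability 0.01.
probs = [0.01] * 200

# Direct product: 0.01**200 = 1e-400, far below the smallest
# representable double (~5e-324), so the product underflows to 0.0.
direct = 1.0
for p in probs:
    direct *= p
print(direct)       # 0.0 for every class: no way to pick a winner

# Sum of logs: the same quantity in log space stays finite.
log_score = sum(math.log(p) for p in probs)
print(log_score)    # about -921.03, easily representable
```

Because every class score collapses to 0.0 in the direct-product mode, the classifier effectively guesses, which is consistent with the near-chance 53.00% accuracy above.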


## How to Run

1. Install dependencies:

   ```
   pip install numpy pandas matplotlib scikit-learn
   ```

2. Place the IMDB dataset at `data/IMDB Dataset.csv`.
3. Open and run `main.ipynb`.


## Implementation Notes

- Both classifiers support toggling log-probability mode via `use_log=True/False`.
- Gaussian NB adds a small epsilon (1e-9) to each feature variance to avoid division by zero (variance smoothing, distinct from Laplace smoothing).
- Multinomial NB uses Laplace (additive) smoothing on word counts, so words unseen in a class during training never get zero probability.
- The `buildBOW` utility converts raw text reviews into fixed-length Bag-of-Words vectors.
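The Laplace smoothing step for the Multinomial NB word likelihoods can be sketched as follows (a minimal standalone illustration; the function name and tiny 3-word vocabulary are hypothetical, not taken from `classifiers.py`):

```python
import numpy as np

def smoothed_log_likelihoods(counts, alpha=1.0):
    """Laplace (additive) smoothing on per-class word counts.

    counts: (n_classes, vocab_size) array of word counts per class.
    Adding alpha to every count before normalizing guarantees that
    log P(word | class) is finite even for words never seen in a class.
    """
    counts = np.asarray(counts, dtype=float)
    smoothed = counts + alpha
    return np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

# Example: 2 classes, 3-word vocabulary; word 2 never occurs in class 0.
log_lik = smoothed_log_likelihoods([[3, 1, 0],
                                    [0, 2, 2]])
```

Without the `+ alpha` term, the unseen word would get probability 0 and a log of negative infinity, wiping out the entire class score for any review containing it.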
