This project is an end-to-end Music Genre Classification system that uses machine learning / deep learning techniques to automatically classify audio tracks into their respective music genres based on extracted audio features.
The goal of the project is to demonstrate how signal processing and machine learning can be combined to solve a real-world audio classification problem.
With the massive growth of digital music platforms, organizing and recommending music manually has become impractical.
Music genre classification helps in:
- 🎧 Music recommendation systems
- 📂 Automatic music library organization
- 🎼 Music information retrieval (MIR)
- 🤖 Intelligent audio-based applications
This project focuses on building a clean, interpretable, and effective genre classification pipeline rather than using black-box APIs.
- Raw audio files are loaded from the dataset
- Each audio file is split into manageable segments (if required)
- Audio signals are converted into numerical representations using feature extraction
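
The segmentation step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `split_into_segments` and the 3-second segment length are assumptions, and the silent 30-second clip at 22,050 Hz is a stand-in for a real track.

```python
def split_into_segments(samples, sample_rate, segment_seconds=3.0):
    """Split a 1-D sequence of audio samples into equal-length segments.

    A trailing remainder shorter than one segment is dropped so that
    every segment has the same number of samples.
    """
    segment_length = int(sample_rate * segment_seconds)
    if segment_length <= 0:
        raise ValueError("segment_seconds * sample_rate must be at least 1 sample")
    n_segments = len(samples) // segment_length
    return [
        samples[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments)
    ]


# Stand-in for a loaded track: 30 seconds of silence at 22,050 Hz.
clip = [0.0] * (30 * 22050)
segments = split_into_segments(clip, sample_rate=22050, segment_seconds=3.0)
print(len(segments), len(segments[0]))  # 10 66150
```

Because the function only relies on `len()` and slicing, it works the same way on a Python list or on the NumPy array returned by an audio loader.
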
The following audio features are extracted from each track:
- MFCCs (Mel-Frequency Cepstral Coefficients)
- Chroma Features
- Spectral Centroid
- Spectral Bandwidth
- Zero Crossing Rate
- RMS Energy
These features capture:
- Timbre
- Pitch distribution
- Frequency characteristics
- Signal energy and, indirectly, rhythmic content
📌 Feature extraction converts raw sound waves into a format suitable for ML models.
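
One way to turn the features listed above into a fixed-length vector is to compute each feature per frame with Librosa and then summarize it over time. The sketch below assumes 13 MFCCs, a 22,050 Hz sample rate, and mean/standard-deviation summaries; the function names `summarize_over_time` and `extract_features` are hypothetical and may differ from what `utils/feature_extraction.py` actually does.

```python
import numpy as np


def summarize_over_time(frames):
    """Collapse a (n_features, n_frames) matrix into per-feature mean and std."""
    frames = np.atleast_2d(np.asarray(frames, dtype=float))
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])


def extract_features(path, sample_rate=22050, n_mfcc=13):
    """Return one fixed-length feature vector for the audio file at `path`."""
    import librosa  # imported here so the helper above works without librosa

    y, sr = librosa.load(path, sr=sample_rate)
    frame_features = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc),  # timbre
        librosa.feature.chroma_stft(y=y, sr=sr),          # pitch-class distribution
        librosa.feature.spectral_centroid(y=y, sr=sr),    # spectral "brightness"
        librosa.feature.spectral_bandwidth(y=y, sr=sr),   # spectral spread
        librosa.feature.zero_crossing_rate(y),            # noisiness
        librosa.feature.rms(y=y),                         # energy
    ]
    # Each feature is summarized separately, then everything is concatenated.
    return np.concatenate([summarize_over_time(f) for f in frame_features])
```

With these assumed defaults, each track yields 58 values: a mean and a standard deviation for 13 MFCCs, 12 chroma bins, and the four single-row features.
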
- Extracted features are used as input to the classification model
- The model learns patterns that differentiate music genres
- The trained model predicts the genre of unseen audio samples
- For a new audio file:
  - Features are extracted
  - Passed through the trained model
  - The most probable genre is returned as output
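
The training and prediction steps above might look like the following with scikit-learn. This is a sketch under stated assumptions: the data is synthetic (two well-separated clusters standing in for real extracted features), the Random Forest settings are illustrative, and `predict_genre` is a hypothetical helper rather than a function from this repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real extracted features: two well-separated "genres".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 8)), rng.normal(5.0, 1.0, (100, 8))])
y = np.array(["blues"] * 100 + ["metal"] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The scaler is fitted inside the pipeline, so it only ever sees training data.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)


def predict_genre(model, feature_vector):
    """Return (most probable genre, its probability) for one feature vector."""
    row = np.asarray(feature_vector, dtype=float).reshape(1, -1)
    probabilities = model.predict_proba(row)[0]
    best = int(np.argmax(probabilities))
    return model.classes_[best], float(probabilities[best])


genre, confidence = predict_genre(model, X_test[0])
```

Tree-based models do not strictly need feature scaling, but keeping the scaler in the pipeline makes it easy to swap in an SVM or Logistic Regression, where scaling matters.
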
Depending on implementation, the project uses:
- Classifier: Random Forest / SVM / Logistic Regression
  - Well-suited for structured feature vectors
  - Easy to interpret and fast to train
- Neural Network (Dense / CNN)
  - Learns complex feature interactions
  - Often performs better on large datasets
📌 Why Feature-Based Models?
- Computationally efficient
- Work well with limited datasets
- Easier to debug and explain than raw-audio deep models
- Dataset Used: GTZAN Music Genre Dataset
- Number of Genres: 10
  - Blues
  - Classical
  - Country
  - Disco
  - Hip-Hop
  - Jazz
  - Metal
  - Pop
  - Reggae
  - Rock
- Audio normalization
- Fixed-duration sampling
- Feature scaling (Standardization)
- Python
- NumPy
- Pandas
- Librosa (Audio Processing)
- Scikit-learn
- TensorFlow / Keras (if deep learning is used)
- Matplotlib / Seaborn (Visualization)
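
The preprocessing steps listed earlier (audio normalization, fixed-duration sampling, and feature standardization) could look roughly like this in NumPy. The helper names are hypothetical, and peak normalization is only one possible reading of "audio normalization"; the notebook may use a different scheme.

```python
import numpy as np


def peak_normalize(samples):
    """Scale a signal so its largest absolute sample is 1.0 (silence is unchanged)."""
    samples = np.asarray(samples, dtype=float)
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    return samples / peak if peak > 0 else samples


def fix_duration(samples, sample_rate, seconds):
    """Trim or zero-pad a signal to exactly `seconds` of audio."""
    target = int(sample_rate * seconds)
    samples = np.asarray(samples, dtype=float)[:target]
    return np.pad(samples, (0, target - len(samples)))


def standardize(train_features, other_features):
    """Scale features to zero mean and unit variance using training statistics only."""
    train_features = np.asarray(train_features, dtype=float)
    other_features = np.asarray(other_features, dtype=float)
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std[std == 0] = 1.0  # constant columns are left unscaled to avoid division by zero
    return (train_features - mean) / std, (other_features - mean) / std
```

Computing the mean and standard deviation on the training split alone, and reusing them for validation and test data, avoids leaking information from held-out samples into the model.
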
The model performance is evaluated using:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
These metrics help analyze:
- Genre-wise performance
- Misclassification patterns
- Model reliability
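
To make the metrics above concrete, here is a small pure-Python sketch that derives them from confusion counts. The function names and the six example labels are made up for illustration and are not results from this project.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true genre, predicted genre) pairs."""
    return Counter(zip(y_true, y_pred))


def per_genre_report(y_true, y_pred):
    """Return accuracy plus precision, recall and F1 for every genre."""
    counts = confusion_counts(y_true, y_pred)
    genres = sorted(set(y_true) | set(y_pred))
    report = {}
    for genre in genres:
        tp = counts[(genre, genre)]
        predicted = sum(counts[(t, genre)] for t in genres)  # column total
        actual = sum(counts[(genre, p)] for p in genres)     # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[genre] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(counts[(g, g)] for g in genres) / len(y_true)
    return accuracy, report


# Made-up labels: one rock track is mistaken for metal, one jazz track for rock.
y_true = ["rock", "rock", "metal", "metal", "jazz", "jazz"]
y_pred = ["rock", "metal", "metal", "metal", "jazz", "rock"]
accuracy, report = per_genre_report(y_true, y_pred)
```

In this toy example, metal has perfect recall but a precision of 2/3, because one rock track was absorbed into it. In practice, scikit-learn's `classification_report` and `confusion_matrix` provide the same information.
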
music-genre-classification/
│
├── data/
│   └── genres_original/
│
├── notebooks/
│   └── music_genre_classification.ipynb
│
├── models/
│   └── trained_model.pkl
│
├── utils/
│   └── feature_extraction.py
│
├── requirements.txt
└── README.md
git clone https://github.com/your-username/music-genre-classification.git
cd music-genre-classification
python -m venv venv

Linux / macOS:
source venv/bin/activate

Windows:
venv\Scripts\activate

pip install -r requirements.txt
jupyter notebook

Open the notebook and run cells sequentially to:
- Extract features
- Train the model
- Evaluate results
- 🎙️ Real-time audio genre prediction
- 🧠 CNNs on Mel-Spectrograms
- 📱 Web or mobile-based interface
- 🎶 Multi-label genre classification
- 🔊 Noise-robust training
- Audio data requires signal processing knowledge
- Feature quality directly impacts model performance
- Classical ML models can perform surprisingly well on audio tasks
- Proper evaluation is critical in multi-class classification problems
- GTZAN Dataset
- Librosa documentation
- Scikit-learn community