jonchung1205/Audio-Semantic-Discovery-Engine

Audio Semantic Discovery Engine

A semantic search engine that leverages the Spotify API to discover music through natural language queries. Uses SBERT for embeddings and FAISS for efficient vector similarity search.

Architecture Overview

User Query → SBERT Embedding → FAISS Search → Spotify Metadata → Results
                ↑
         Vector Database
    (10k+ track embeddings)

Core Components

  1. Data Ingestion - Spotify API integration for track metadata, with lyrics fetched separately via Genius
  2. Embedding Generation - SBERT to convert text into 384- or 768-dimensional vectors, depending on the model
  3. Vector Storage - FAISS index for efficient similarity search
  4. Search Engine - Natural language query interface

Prerequisites

  • Python 3.8+
  • Spotify Developer Account (for API credentials)
  • ~2GB disk space (for model and vector index)

Installation

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install spotipy sentence-transformers faiss-cpu numpy pandas python-dotenv requests beautifulsoup4 lyricsgenius

Setup

1. Get Spotify API Credentials

  1. Go to https://developer.spotify.com/dashboard
  2. Create a new app
  3. Copy your Client ID and Client Secret
  4. Create a .env file in the project root:
SPOTIFY_CLIENT_ID=your_client_id_here
SPOTIFY_CLIENT_SECRET=your_client_secret_here
GENIUS_ACCESS_TOKEN=your_genius_token_here  # Optional, for lyrics
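
The repository's config.py is not reproduced in this README, so the following is only a minimal sketch of how these settings might be loaded and validated. The function name load_config and the REQUIRED_KEYS / OPTIONAL_KEYS constants are assumptions made for illustration; only the three variable names come from the .env file above.

```python
import os

REQUIRED_KEYS = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
OPTIONAL_KEYS = ("GENIUS_ACCESS_TOKEN",)  # only needed for lyrics


def load_config(env=None):
    """Return a dict of credentials, raising if a required key is missing.

    Hypothetical helper: `env` can be any mapping, which keeps it easy to test.
    """
    if env is None:
        try:
            # python-dotenv copies values from .env into os.environ
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            pass  # fall back to variables already set in the shell
        env = os.environ

    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    config = {key: env[key] for key in REQUIRED_KEYS}
    for key in OPTIONAL_KEYS:
        config[key] = env.get(key) or None  # None when unset or empty
    return config
```

Failing early with a clear message is usually easier to debug than an authentication error from the Spotify client later on.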

2. Project Structure

audio-semantic-discovery/
├── .env
├── config.py
├── spotify_ingestion.py
├── lyrics_scraper.py
├── embeddings_generator.py
├── faiss_indexer.py
├── search_engine.py
├── main.py
├── data/
│   ├── tracks.json
│   ├── tracks_with_lyrics.json
│   └── faiss.index
└── README.md

Usage

Quick Start

from search_engine import AudioSemanticSearch

# Initialize search engine
search = AudioSemanticSearch()

# Build index (first time only)
search.build_index(num_tracks=10000)

# Search with natural language
results = search.search("upbeat songs about summer and dancing", top_k=10)

# Display results
for i, result in enumerate(results, 1):
    print(f"{i}. {result['name']} by {result['artist']} (Score: {result['score']:.3f})")

Step-by-Step Process

# 1. Ingest track data from Spotify
python spotify_ingestion.py --num_tracks 10000

# 2. Fetch lyrics for tracks
python lyrics_scraper.py

# 3. Generate embeddings
python embeddings_generator.py

# 4. Build FAISS index
python faiss_indexer.py

# 5. Run search interface
python main.py

Key Features

  • Natural Language Queries: Search using descriptions like "melancholic indie songs about lost love"
  • Multi-field Embeddings: Combines track name, artist, genre, and lyrics
  • Fast Retrieval: Sub-second search across 10k+ tracks using FAISS
  • Semantic Understanding: Goes beyond keyword matching to understand meaning

Technical Details

Embedding Strategy

Each track is represented by a concatenated text field:

"{track_name} by {artist}. Genre: {genres}. Lyrics: {lyrics_snippet}"

This is encoded using sentence-transformers/all-MiniLM-L6-v2 (384-dim) or all-mpnet-base-v2 (768-dim).
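
As a rough illustration of the template above, the text field could be assembled like this. The helper name build_track_text, the dict keys genres and lyrics, and the 500-character snippet limit are assumptions for this sketch, not the project's actual code.

```python
def build_track_text(track, max_lyrics_chars=500):
    """Build the text that gets embedded for one track (hypothetical helper)."""
    # Join genres; fall back to a placeholder when none are known
    genres = ", ".join(track.get("genres") or []) or "unknown"

    # Collapse newlines and repeated spaces, then keep only a snippet
    lyrics = " ".join((track.get("lyrics") or "").split())
    snippet = lyrics[:max_lyrics_chars]

    text = f"{track['name']} by {track['artist']}. Genre: {genres}."
    if snippet:
        text += f" Lyrics: {snippet}"
    return text


example = {
    "name": "Song",
    "artist": "Band",
    "genres": ["indie", "pop"],
    "lyrics": "la la\n la",
}
print(build_track_text(example))
# Song by Band. Genre: indie, pop. Lyrics: la la la
```

Keeping the lyrics snippet short matters because sentence-transformer models truncate input beyond their maximum sequence length, so text past that point does not influence the embedding.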

FAISS Index Type

  • IndexFlatL2: Brute-force L2 distance (exact search, good for <100k vectors)
  • IndexIVFFlat: Inverted file index with clustering (faster, slight accuracy trade-off)
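
To make "brute-force" concrete, here is a pure-Python sketch of the computation an exact L2 index performs: compare the query against every stored vector and keep the k closest. This illustrates the idea only and is not FAISS code; flat_l2_search is a hypothetical name. It reports squared distances, which is also what FAISS returns for IndexFlatL2.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Exhaustive search: return the k nearest (distance, index) pairs,
    closest first. Cost grows linearly with the number of vectors."""
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored)


vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
print(flat_l2_search(vectors, [1.0, 1.0], k=2))
# [(1.0, 1), (2.0, 0)]
```

An IVF index avoids the full scan by clustering the vectors up front and searching only the clusters nearest the query, which is where the speed gain and the small accuracy loss both come from.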

Performance Metrics

  • Indexing: ~100 tracks/second
  • Search latency: <50ms for 10k tracks
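  • Memory: ~300MB for 10k tracks (768-dim embeddings). The float32 vectors themselves take only about 31MB (10,000 × 768 × 4 bytes), so most of this figure is presumably the loaded model and track metadata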
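
For sizing a flat index, the raw vector storage can be estimated directly: one float32 (4 bytes) per dimension per track. The helper below is a hypothetical back-of-the-envelope calculation that ignores the embedding model, metadata, and any index overhead.

```python
def embedding_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for float32 embeddings, excluding model and metadata."""
    return num_vectors * dim * bytes_per_value


for dim in (384, 768):
    mb = embedding_bytes(10_000, dim) / 1e6
    print(f"{dim}-dim: {mb:.1f} MB")
# 384-dim: 15.4 MB
# 768-dim: 30.7 MB
```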

Advanced Usage

Custom Embedding Model

from embeddings_generator import EmbeddingGenerator

gen = EmbeddingGenerator(model_name='all-mpnet-base-v2')

Filtering Results

results = search.search(
    query="energetic workout music",
    top_k=20,
    min_score=0.7,
    filters={'genre': 'electronic'}
)
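
How min_score and filters are applied internally is not shown in this README. One plausible approach is to post-filter the nearest-neighbour results, as sketched below. The function filter_results is hypothetical, and the sketch assumes that a higher score means a closer match and that each result dict carries the filtered field.

```python
def _matches(value, wanted):
    """A field matches if it equals the wanted value or contains it."""
    if isinstance(value, (list, tuple, set)):
        return wanted in value
    return value == wanted


def filter_results(results, min_score=None, filters=None):
    """Drop results below min_score or failing any metadata filter."""
    filters = filters or {}
    kept = []
    for result in results:
        if min_score is not None and result["score"] < min_score:
            continue
        if all(_matches(result.get(field), wanted)
               for field, wanted in filters.items()):
            kept.append(result)
    return kept


hits = [
    {"name": "A", "score": 0.90, "genre": "electronic"},
    {"name": "B", "score": 0.60, "genre": "electronic"},
    {"name": "C", "score": 0.80, "genre": "rock"},
]
print([r["name"] for r in filter_results(hits, 0.7, {"genre": "electronic"})])
# ['A']
```

Filtering after retrieval can return fewer than top_k results, so a post-filtering design typically retrieves more candidates than it needs.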

Batch Processing

queries = [
    "sad piano ballads",
    "aggressive rap with heavy bass",
    "relaxing ambient music"
]
results = search.batch_search(queries, top_k=5)

Troubleshooting

Issue: Rate limiting from Spotify API

  • Solution: Add delays between requests or cache responses
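
A common way to add those delays is retrying with exponential backoff. Spotify signals rate limiting with an HTTP 429 response and a Retry-After header. In the sketch below, RateLimited and call_with_backoff are hypothetical names, and translating the client library's error into RateLimited is left to the caller.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when the API answers with HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, from Retry-After if present


def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # Prefer the server's hint; otherwise wait 1s, 2s, 4s, ...
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

Passing sleep as a parameter keeps the helper testable without real waiting.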

Issue: Out of memory during indexing

  • Solution: Process in batches or use a smaller embedding model
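
Batching can be as simple as slicing the track list into fixed-size chunks, so that only one chunk of embeddings is held in memory before it is added to the index. The helper chunked and the batch size of 256 are illustrative assumptions. The commented-out loop only indicates where encoding and indexing would go.

```python
def chunked(items, batch_size):
    """Yield successive slices of `items` with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical usage, with `model` and `index` created elsewhere:
# for batch in chunked(track_texts, 256):
#     vectors = model.encode(batch)   # embed one batch at a time
#     index.add(vectors)              # then append it to the index

print(list(chunked([1, 2, 3, 4, 5], 2)))
# [[1, 2], [3, 4], [5]]
```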

Issue: Poor search results

  • Solution: Try a larger embedding model (for example all-mpnet-base-v2), add more metadata to the embedded text, or adjust the similarity threshold

Future Enhancements

  • Audio feature analysis (tempo, key, energy)
  • Playlist generation from queries
  • User preference learning
  • Real-time streaming integration
  • Multi-modal search (audio + text)

License

MIT License - feel free to use for personal or commercial projects.

Contributing

Pull requests welcome! Please ensure code follows PEP 8 style guidelines.
