DCUSV is a Python pipeline for the unsupervised discovery of vocalization types in rodent ultrasonic vocalizations (USVs). It segments raw audio into spectrogram images using methods from ContourUSV, compresses them with a dense autoencoder, and clusters the latent representations with UMAP + HDBSCAN + agglomerative meta-clustering.
The pipeline was developed for 22 kHz USVs recorded in a PTSD rat model, but the approach generalizes to any USV frequency band.
Raw .wav files (USCMed dataset)
│
▼
data_prep.py — segment audio, extract USV spectrogram patches → 512×512 PNG images
│
▼
autoencoder.py — train a dense autoencoder; save 10-D latent embeddings (.npy)
│
▼
dcusv.py — UMAP → HDBSCAN → agglomerative meta-clustering; UMAP/t-SNE/PCA plots
│
▼
cluster_dist.py — per-animal cluster-proportion heatmaps; symmetric KL divergence
param_optuna.py is an optional Optuna hyperparameter search over the UMAP + HDBSCAN + meta-clustering space.
dcusv/
├── data_prep.py # Step 1 – spectrogram generation & USV extraction
├── autoencoder.py # Step 2 – dense autoencoder training & embedding
├── dcusv.py # Step 3 – UMAP / HDBSCAN / meta-clustering & visualization
├── cluster_dist.py # Step 4 – cluster-distribution analysis across animals/conditions
├── param_optuna.py # (Optional) Optuna hyperparameter search
├── requirements.txt
└── README.md
USCMed/ # Downloaded dataset (see Step 0 below)
    └── PTSD16/
        └── Context/
            └── *.wav
clustering_data/ # Created by data_prep.py — 512×512 spectrogram patches
clustering_results/
├── models/ # Autoencoder weights, .npy embeddings, file-path list
├── cluster_dcusv_*/ # Per-cluster image grids (dcusv.py)
├── cluster_dcusv_vis_*/ # UMAP / t-SNE / PCA scatter plots (dcusv.py)
└── cluster_map/ # Heatmaps and KL-divergence CSV/PNG (cluster_dist.py)
Python 3.9–3.11 is recommended for compatibility with TensorFlow 2.x.
- Clone the repository:

      git clone https://github.com/lina-usc/dcusv.git
      cd dcusv

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate   # Linux/macOS
      venv\Scripts\activate      # Windows

- Install dependencies:

      pip install -r requirements.txt
Download the USCMed dataset from Zenodo:
USCMed contains audio recordings and hand-scored annotations for 27 male rats across a Context trial, collected at the University of South Carolina School of Medicine.
Extract the archive into the dcusv/ directory so that the layout matches:
dcusv/
└── USCMed/
    └── PTSD16/
        └── Context/
            └── *.wav
If you extract to a different location, update root_path in data_prep.py accordingly.
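Before running the pipeline, it can help to confirm that the configured location actually resolves to audio files. The sketch below assumes `root_path` and `experiment_tests_mapping` have the meaning documented in this README; `find_wav_files` is a hypothetical helper written for illustration, not a function taken from data_prep.py.

```python
from pathlib import Path

# Values mirroring the defaults documented for data_prep.py.
root_path = Path("USCMed")
experiment_tests_mapping = {"PTSD16": ["Context"]}


def find_wav_files(root, mapping):
    """Return sorted .wav paths under root/<experiment>/<condition>/ (hypothetical helper)."""
    wav_files = []
    for experiment, conditions in mapping.items():
        for condition in conditions:
            folder = Path(root) / experiment / condition
            # A missing folder contributes no files instead of raising.
            if folder.is_dir():
                wav_files.extend(sorted(folder.glob("*.wav")))
    return wav_files


if __name__ == "__main__":
    files = find_wav_files(root_path, experiment_tests_mapping)
    print(f"Found {len(files)} .wav files under {root_path}")
```

If the count is zero, the archive was probably extracted somewhere other than where `root_path` points.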
    python data_prep.py

This reads the raw .wav files from USCMed/PTSD16/Context/, extracts 22 kHz USV regions, and saves 512×512 grayscale PNG spectrogram patches under:
clustering_data/all_data_512x512//<recording_stem>/
    python autoencoder.py

Outputs written to clustering_results/models/:
- dense_encoder_all.h5 — saved Keras encoder
- dense_encoded_images_all.npy — (N, 10) latent embedding matrix
- file_paths_all.npy — ordered list of source image paths
- dense_autoencoder_loss_all.png — train/val loss curve
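Because the embedding matrix and the file-path list are stored separately, downstream code depends on them staying aligned row by row. A minimal sketch of loading both and checking that alignment is shown here; `load_embeddings` is a hypothetical helper, and the use of `allow_pickle=True` is an assumption about how the path list was saved.

```python
from pathlib import Path

import numpy as np


def load_embeddings(models_dir="clustering_results/models"):
    """Load the latent matrix and the matching image paths (hypothetical helper)."""
    models_dir = Path(models_dir)
    embeddings = np.load(models_dir / "dense_encoded_images_all.npy")
    # allow_pickle=True only matters if the paths were saved as an object array.
    file_paths = np.load(models_dir / "file_paths_all.npy", allow_pickle=True)
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D matrix, got shape {embeddings.shape}")
    if len(file_paths) != embeddings.shape[0]:
        raise ValueError("embeddings and file paths are not aligned row by row")
    return embeddings, file_paths
```

Row `i` of the returned matrix then corresponds to the image at `file_paths[i]`.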
    python dcusv.py

Outputs:
- Per-cluster image grids in clustering_results/cluster_dcusv_<embedding>/
- UMAP, t-SNE, and PCA scatter plots in clustering_results/cluster_dcusv_vis_<embedding>_<silhouette>/
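The meta-clustering step merges the fine-grained HDBSCAN clusters into a smaller number of groups. This README does not spell out how the merge is done, so the following is only one plausible sketch: it assumes the merge operates on cluster centroids and that noise points keep the label -1. `merge_clusters` is a hypothetical name; `AgglomerativeClustering` is the scikit-learn class.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def merge_clusters(embeddings, labels, n_meta_clusters=4):
    """Merge fine-grained clusters by agglomerative clustering of their centroids.

    Assumption: noise points carry the label -1 (HDBSCAN convention) and stay -1.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels[labels >= 0])
    if len(cluster_ids) <= n_meta_clusters:
        return labels.copy()  # nothing to merge
    # One centroid per fine-grained cluster.
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in cluster_ids])
    meta = AgglomerativeClustering(n_clusters=n_meta_clusters).fit_predict(centroids)
    # Map every point to the meta-cluster of its original cluster.
    merged = np.full(labels.shape, -1, dtype=int)
    for cluster_id, meta_id in zip(cluster_ids, meta):
        merged[labels == cluster_id] = meta_id
    return merged
```

Clustering centroids rather than individual points keeps the merge cheap and leaves the density-based noise assignment untouched.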
    python cluster_dist.py

Outputs written to clustering_results/cluster_map/:
- cluster_counts_<cond>.csv / cluster_props_<cond>_colnorm.csv
- cluster_clustermap_<cond>.png — per-condition cluster heatmap
- kl_divergence_per_animal.csv / kl_divergence_per_animal.png
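The `_colnorm` suffix suggests that each animal's column of cluster counts is normalized to sum to 1. A small sketch of that normalization, assuming one `(animal_id, cluster_label)` pair per USV, is given below; `column_normalized_proportions` is a hypothetical helper, and excluding labels outside `0..n_clusters-1` (such as noise) is an assumption.

```python
from collections import Counter


def column_normalized_proportions(assignments, n_clusters):
    """Return {animal: [proportion per cluster]} where each animal's list sums to 1.

    assignments: iterable of (animal_id, cluster_label) pairs, one per USV.
    Labels outside 0..n_clusters-1 (for example noise, -1) are ignored.
    """
    counts = Counter(assignments)
    animals = sorted({animal for animal, _ in counts})
    table = {}
    for animal in animals:
        total = sum(counts[(animal, c)] for c in range(n_clusters))
        table[animal] = [
            counts[(animal, c)] / total if total else 0.0 for c in range(n_clusters)
        ]
    return table


usvs = [("rat01", 0), ("rat01", 0), ("rat01", 2), ("rat02", 1)]
props = column_normalized_proportions(usvs, n_clusters=3)
```

Normalizing per animal makes animals with very different call counts comparable in the heatmap.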
    python param_optuna.py

Runs 500 Optuna trials optimizing silhouette score over UMAP, HDBSCAN, and meta-clustering hyperparameters.
data_prep.py uses the detection methods from ContourUSV, our earlier USV detection pipeline.
Key parameters are set as variables at the top of each script.
| Script | Variable | Default | Description |
|---|---|---|---|
| data_prep.py | root_path | Path("USCMed") | Root directory of the downloaded dataset |
| data_prep.py | experiment_tests_mapping | {'PTSD16': ['Context']} | Experiments and conditions to process |
| data_prep.py | freq_min / freq_max | 0 / 115 kHz | Frequency range for spectrogram |
| autoencoder.py | dims | [N, 2048, 512, 128, 10] | Autoencoder layer sizes |
| autoencoder.py | pretrain_epochs | 300 | Max training epochs (early stopping applies) |
| dcusv.py | embedding | "dense_encoded_images_all" | Embedding file stem to load |
| dcusv.py | UMAP n_neighbors / min_dist | 7 / 0.0 | UMAP hyperparameters |
| dcusv.py | HDBSCAN min_cluster_size / min_samples | 10 / 14 | HDBSCAN hyperparameters |
| cluster_dist.py | agg n_clusters | 4 | Number of meta-clusters |
- Cluster image grids — a panel of up to 10 representative spectrogram patches for each cluster.
- UMAP / t-SNE / PCA scatter plots — 2-D embedding colored by merged cluster label.
- Cluster heatmaps — rows = clusters, columns = animals; color = proportion of animal's USVs in that cluster.
- KL divergence bar chart — per-animal divergence between ACQ and Context cluster distributions.
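The symmetric KL divergence between two cluster-proportion vectors can be sketched as follows. The exact definition used by cluster_dist.py is not stated here, so this assumes the sum of both directions, D(P‖Q) + D(Q‖P), with a small smoothing constant `eps` (also an assumption) to avoid log(0) for empty clusters. Some authors use half of this sum instead.

```python
import math


def symmetric_kl(p, q, eps=1e-10):
    """Symmetric KL divergence D(P||Q) + D(Q||P) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of clusters")
    # Smooth and renormalize so empty clusters do not produce log(0).
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    p_sum, q_sum = sum(p), sum(q)
    p = [x / p_sum for x in p]
    q = [x / q_sum for x in q]
    # sum p*log(p/q) + sum q*log(q/p) simplifies to sum (p - q)*log(p/q).
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))
```

The value is 0 for identical distributions and grows as an animal's cluster usage shifts between the two conditions.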
dcusv.py prints the following metrics (noise points excluded):
| Metric | Interpretation |
|---|---|
| Silhouette score | Higher is better (−1 to 1) |
| Davies-Bouldin score | Lower is better |
| Calinski-Harabasz score | Higher is better |
| HDBSCAN validity index | Higher is better (0 to 1) |
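Three of these metrics are available in scikit-learn, and a sketch of computing them with noise points removed might look like the following. `clustering_metrics` is a hypothetical helper rather than code from dcusv.py, it assumes noise is labeled -1, and it omits the HDBSCAN validity index.

```python
import numpy as np
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def clustering_metrics(embeddings, labels):
    """Score a clustering after dropping noise points (label -1)."""
    embeddings = np.asarray(embeddings)
    labels = np.asarray(labels)
    keep = labels != -1
    X, y = embeddings[keep], labels[keep]
    if len(np.unique(y)) < 2:
        raise ValueError("at least two non-noise clusters are required")
    return {
        "silhouette": silhouette_score(X, y),
        "davies_bouldin": davies_bouldin_score(X, y),
        "calinski_harabasz": calinski_harabasz_score(X, y),
    }
```

Excluding noise tends to make the scores look better than they would on the full dataset, so the fraction of points labeled as noise is worth reporting alongside them.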
See requirements.txt. Core libraries:
- TensorFlow / Keras — autoencoder training
- UMAP-learn — dimensionality reduction
- HDBSCAN — density-based clustering
- scikit-learn — agglomerative clustering, metrics
- Optuna — hyperparameter optimization
- OpenCV — image processing
If you use DCUSV in your research, please cite the following:
DCUSV (this work):
Deep Clustering of Ultrasonic Vocalizations in Rodents. Research Square, 2025. https://www.researchsquare.com/article/rs-9068431/v1
ContourUSV (spectrogram preprocessing):
Anis, S. S., Kellis, D. M., Kaigler, K. F., Wilson, M. A., & O'Reilly, C. (2025). A Reliable and Efficient Detection Pipeline for Rodent Ultrasonic Vocalizations. arXiv:2503.18928. https://arxiv.org/abs/2503.18928
USCMed dataset: