Train auto-encoders for feature extraction from acoustic spectrograms
- This is a codebase for applied research on auto-encoders that extract features from spectrograms
- It allows defining and training simple custom PyTorch auto-encoders for spectrograms
- Auto-encoders perform partial pooling along the time axis (the latent representation is a 2D array: channel by time)
- A dedicated data loader for spectrogram data enables training under a de-noising regime
- Trained models are meant to be used for feature extraction with a companion project
- Extracted features can be ingested by this data annotation app (see its repo)
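To make the architecture idea concrete, here is an illustrative sketch in plain PyTorch of an auto-encoder whose encoder pools the frequency axis while only partially pooling the time axis, yielding a 2D (channel by time) latent. This is not the package's actual model code; all class names, layer sizes, and shapes below are made-up assumptions.

```python
# Illustrative only: NOT the train_saec API. Shapes and names are assumptions.
import torch
import torch.nn as nn

class TinySpectrogramAE(nn.Module):
    def __init__(self, in_ch=3, latent_ch=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1),            # halve freq and time
            nn.ReLU(),
            nn.Conv2d(16, latent_ch, 3, stride=(2, 1), padding=1),   # pool freq only
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 16, 3, stride=(2, 1), padding=1, output_padding=(1, 0)),
            nn.ReLU(),
            nn.ConvTranspose2d(16, in_ch, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

x = torch.rand(4, 3, 128, 256)   # batch of color spectrogram images (N, C, freq, time)
model = TinySpectrogramAE()
recon, z = model(x)
# Averaging out the residual frequency dimension leaves the
# 2D (channel by time) latent per sample described above
feat = z.mean(dim=2)
print(recon.shape, feat.shape)
```

Note that the time axis is only downsampled once (the second encoder stride is 1 along time), so the latent keeps fine temporal resolution while the frequency axis is compressed.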
- Make a fresh venv
- Install the latest package release from its wheel:
- Go to https://github.com/sergezaugg/train_saec/releases
- Navigate to the latest release and copy the full link to the `.whl` file
- In the fresh venv, run:
```
pip install --upgrade <full link>
```
- Example:
```
pip install --upgrade https://github.com/sergezaugg/train_saec/releases/download/vx.x.x/train_saec-x.x.x-py3-none-any.whl
```
- PyTorch dependencies (torch, torchvision) are not included in the package and must be installed separately:
- For fast execution, torch and torchvision should be installed with GPU (CUDA) support.
- Example (for Windows with CUDA 12.6):
```
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
```
- For other CUDA versions or operating systems, check the official PyTorch installation instructions here
- If the machine has no GPU, install the CPU build:
```
pip install torch torchvision
```
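After installing, a quick way to confirm which device PyTorch will use (this is a generic PyTorch check, independent of this package):

```python
# Report the installed torch version and whether CUDA is usable
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(torch.__version__, device)
```

If this prints `cpu` on a machine that has a GPU, the CPU-only build was likely installed; reinstall with the CUDA index URL shown above.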
- Prepare PNG-formatted color images of spectrograms, e.g. with this tool
- `sample_code.py` illustrates a pipeline to create and train auto-encoders
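The de-noising regime mentioned above boils down to training on corrupted inputs against clean targets. Below is a minimal, self-contained sketch of one such training step in plain PyTorch; it is not taken from `sample_code.py`, and the stand-in model, noise level, and shapes are assumptions for illustration.

```python
# Illustrative de-noising training step: NOT the package's actual code.
import torch
import torch.nn as nn

# Stand-in for an auto-encoder (any model mapping image -> image works here)
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 3, 3, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(4, 3, 128, 256)                        # clean spectrograms
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)  # corrupted inputs

opt.zero_grad()
recon = model(noisy)          # model sees the NOISY input...
loss = loss_fn(recon, clean)  # ...but is scored against the CLEAN target
loss.backward()
opt.step()
print(float(loss))
```

The key point is the asymmetry between input and target: reconstructing the clean spectrogram from a corrupted one forces the latent features to capture signal structure rather than noise.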
