Train auto-encoders for feature extraction from acoustic spectrograms
- This is a codebase for applied research on auto-encoders that extract features from spectrograms
- It allows defining and training simple custom PyTorch auto-encoders for spectrograms
- Auto-encoders perform partial pooling along the time axis (the latent representation is a 2D array: channel by time)
- A dedicated data loader for spectrogram data enables training under a de-noising regime
- Trained models are meant to be used for feature extraction with a companion project
- Extracted features can be ingested by this data annotation app (see its repo)
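To make the architecture idea concrete, here is an illustrative sketch in plain PyTorch of an auto-encoder whose encoder pools the frequency axis while only partially pooling the time axis, yielding a 2D (channel by time) latent. This is not the package's actual model code; all class names, layer sizes, and shapes below are made-up assumptions.

```python
# Illustrative only: NOT the train_saec API. Shapes and names are assumptions.
import torch
import torch.nn as nn

class TinySpectrogramAE(nn.Module):
    def __init__(self, in_ch=3, latent_ch=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1),            # halve freq and time
            nn.ReLU(),
            nn.Conv2d(16, latent_ch, 3, stride=(2, 1), padding=1),   # pool freq only
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 16, 3, stride=(2, 1), padding=1, output_padding=(1, 0)),
            nn.ReLU(),
            nn.ConvTranspose2d(16, in_ch, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

x = torch.rand(4, 3, 128, 256)   # batch of color spectrogram images (N, C, freq, time)
model = TinySpectrogramAE()
recon, z = model(x)
# Averaging out the residual frequency dimension leaves the
# 2D (channel by time) latent per sample described above
feat = z.mean(dim=2)
print(recon.shape, feat.shape)
```

Note that the time axis is only downsampled once (the second encoder stride is 1 along time), so the latent keeps fine temporal resolution while the frequency axis is compressed.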
- Make a fresh venv
- Install the latest package release from its wheel:
- Go to https://github.com/sergezaugg/train_saec/releases
- Navigate to the latest release and copy the full link to the `.whl` file
- In the fresh venv, run:
```
pip install --upgrade <full link>
```
- Example:
```
pip install --upgrade https://github.com/sergezaugg/train_saec/releases/download/vx.x.x/train_saec-x.x.x-py3-none-any.whl
```
- PyTorch dependencies (torch, torchvision) are not included in the package and must be installed separately:
- For fast execution, torch and torchvision should be installed with GPU (CUDA) support.
- Example (for Windows with CUDA 12.6):
```
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
```
- For other CUDA versions or operating systems, check the official PyTorch installation instructions here
- If the machine has no GPU, install the CPU build:
```
pip install torch torchvision
```
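After installing, a quick way to confirm which device PyTorch will use (this is a generic PyTorch check, independent of this package):

```python
# Report the installed torch version and whether CUDA is usable
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(torch.__version__, device)
```

If this prints `cpu` on a machine that has a GPU, the CPU-only build was likely installed; reinstall with the CUDA index URL shown above.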
- Prepare PNG-formatted color images of spectrograms, e.g. with this tool
- `sample_code.py` illustrates a pipeline to create and train auto-encoders
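The de-noising regime mentioned above boils down to training on corrupted inputs against clean targets. Below is a minimal, self-contained sketch of one such training step in plain PyTorch; it is not taken from `sample_code.py`, and the stand-in model, noise level, and shapes are assumptions for illustration.

```python
# Illustrative de-noising training step: NOT the package's actual code.
import torch
import torch.nn as nn

# Stand-in for an auto-encoder (any model mapping image -> image works here)
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 3, 3, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(4, 3, 128, 256)                        # clean spectrograms
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)  # corrupted inputs

opt.zero_grad()
recon = model(noisy)          # model sees the NOISY input...
loss = loss_fn(recon, clean)  # ...but is scored against the CLEAN target
loss.backward()
opt.step()
print(float(loss))
```

The key point is the asymmetry between input and target: reconstructing the clean spectrogram from a corrupted one forces the latent features to capture signal structure rather than noise.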
