sergezaugg/train_saec

Train auto-encoders for feature extraction from acoustic spectrograms

Overview

  • This is a codebase for applied research with auto-encoders to extract features from spectrograms
  • It lets you define and train simple custom PyTorch auto-encoders for spectrograms
  • The auto-encoders only partially pool the time axis, so the latent representation is a 2D array (channel by time)
  • It includes a dedicated data loader for spectrogram data, used to train under a de-noising regime
  • Trained models are meant to be used for feature extraction with the companion project
  • Extracted features can be ingested by the companion data annotation app (see its repo)
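The "channel by time" latent shape mentioned above can be illustrated with plain shape arithmetic. The sketch below is not the package's architecture: the layer list, the input size (128 frequency bins by 1152 time frames) and the number of latent channels are hypothetical values chosen for illustration. It only applies the standard convolution output-size formula to show how the frequency axis can be pooled down to 1 while the time axis is reduced but kept.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def latent_shape(n_freq, n_time, layers, latent_channels):
    """Propagate a (freq, time) spectrogram size through a stack of conv layers."""
    for (k_f, k_t), (s_f, s_t), (p_f, p_t) in layers:
        n_freq = conv_out(n_freq, k_f, s_f, p_f)
        n_time = conv_out(n_time, k_t, s_t, p_t)
    if n_freq != 1:
        raise ValueError(f"frequency axis not fully pooled: {n_freq} bins left")
    # Frequency is collapsed, so the latent array is (channel, time)
    return (latent_channels, n_time)


# Hypothetical encoder: each entry is (kernel, stride, padding) as (freq, time) pairs
LAYERS = [
    ((5, 5), (2, 2), (2, 2)),  # halves both axes
    ((5, 5), (2, 2), (2, 2)),  # halves both axes
    ((5, 5), (2, 2), (2, 2)),  # halves both axes
    ((5, 5), (2, 1), (2, 2)),  # halves frequency only, time is kept
    ((8, 1), (1, 1), (0, 0)),  # collapses the remaining 8 frequency bins
]

shape = latent_shape(128, 1152, LAYERS, latent_channels=256)
print(shape)  # (256, 144)
```

In this made-up configuration the time axis shrinks by a factor of 8 (1152 to 144 frames) instead of being pooled away, which is one way to read "partial pooling of the time axis".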

Installation

  • Make a fresh venv
  • Install latest package release from wheel:
    • Go to https://github.com/sergezaugg/train_saec/releases
    • Navigate to the latest release and copy the full link to the .whl file
    • In the fresh venv, run pip install --upgrade <full link>
    • Example: pip install --upgrade https://github.com/sergezaugg/train_saec/releases/download/vx.x.x/train_saec-x.x.x-py3-none-any.whl
  • The PyTorch dependencies (torch, torchvision) are not included in the package and must be installed separately:
    • For fast execution, torch and torchvision should be installed with GPU support.
    • Example: pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126 (specifically for Windows with CUDA 12.6)
    • For another CUDA version or another OS, check the official PyTorch installation instructions
    • If the machine has no GPU, try pip install torch torchvision for CPU usage
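Since torch and torchvision are installed separately, it can help to confirm what the venv actually contains before training. The following is a minimal sketch, not part of the package: the function name torch_status is hypothetical, and it relies only on importlib plus torch.__version__ and torch.cuda.is_available().

```python
import importlib
import importlib.util


def torch_status():
    """Report whether torch/torchvision are importable and whether CUDA is usable."""
    status = {}
    for name in ("torch", "torchvision"):
        # find_spec returns None when the package is not installed in this environment
        status[name] = importlib.util.find_spec(name) is not None
    status["cuda"] = False
    if status["torch"]:
        torch = importlib.import_module("torch")
        status["torch_version"] = torch.__version__
        status["cuda"] = bool(torch.cuda.is_available())
    return status


print(torch_status())
```

If "cuda" is reported as False on a machine with a GPU, the CPU-only build of torch was probably installed and the GPU-specific install command should be rerun.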

Usage

  • Prepare the spectrograms as color images in PNG format, e.g. with this tool
  • sample_code.py illustrates a pipeline to create and train auto-encoders
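Before training, it may be worth checking that the prepared files really are color PNGs. The sketch below is an assumption-laden helper, not part of train_saec: the function names are hypothetical, and treating only RGB and RGBA as "color" is an assumption about what the package expects. It uses only the standard library and reads the PNG signature and IHDR chunk directly.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def png_header(path):
    """Return (width, height, bit_depth, color_type_name) from a PNG's IHDR chunk."""
    with open(path, "rb") as fh:
        head = fh.read(26)
    # 8-byte signature, 4-byte chunk length, 4-byte chunk type, then the IHDR fields
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", head[16:26])
    return width, height, bit_depth, COLOR_TYPES.get(color_type, "unknown")


def list_color_pngs(folder):
    """Collect (name, width, height) for PNG files in `folder` stored as RGB or RGBA."""
    keep = []
    for path in sorted(Path(folder).glob("*.png")):
        width, height, _, color = png_header(path)
        if color in ("RGB", "RGBA"):  # assumption: these count as "color images"
            keep.append((path.name, width, height))
    return keep
```

Calling list_color_pngs on the image folder returns the files that pass the check together with their sizes, which also makes it easy to spot spectrograms with inconsistent dimensions.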

ML details

Example image

About

Python package to train spectrogram auto-encoders (SAEC) under de-noising regime
