Conformer with multi-scale local attention and periodic positional encoding\* for composer classification. See coma-gen for a similar architecture used for music generation.
Model Architecture (see src/transformer.py):

- Embedding: REMI token embedding + scaled sinusoidal positional embedding.
- Encoder: Stack of conformer-like blocks[^1] (FeedForward → Multi-Scale Local Attention → Convolution Module → FeedForward, with LayerNorm and residuals).
  - Attention: Multi-scale local self-attention (windowed, not full sequence). Scales are aggregated via a weighted sum, with a learnable weight for each scale, inspired by the multi-scale attention mechanism in Cui et al.[^2] See the sketch after this list.
  - Convolution Module: pointwise convolution (with expansion factor of 2) → GLU activation → 1D depthwise convolution → BatchNorm → Swish activation.
- Sequence Attention: After encoding, a linear layer computes attention weights over the sequence, producing a weighted sum (the sequence embedding); see the pooling sketch below.
- Classifier: MLP (LayerNorm → Linear → GELU → Dropout → Linear) to output logits for composer classes.

\*periodic positional encoding[^2].
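To make the scale aggregation concrete, here is a minimal PyTorch sketch. It approximates each scale's windowed attention with a banded mask over `nn.MultiheadAttention` (the repo builds on lucidrains' local-attention instead), and the window sizes, head count, and weighting are illustrative assumptions, not the values in src/transformer.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleLocalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, window_sizes=(16, 32, 64)):
        super().__init__()
        self.window_sizes = window_sizes
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in window_sizes
        )
        # one learnable weight per scale, softmax-normalized before summing
        self.scale_logits = nn.Parameter(torch.zeros(len(window_sizes)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        n = x.size(1)
        idx = torch.arange(n, device=x.device)
        outs = []
        for attn, w in zip(self.attns, self.window_sizes):
            # banded mask: True entries are blocked, so each token only
            # attends to neighbors within +/- w positions
            mask = (idx[None, :] - idx[:, None]).abs() > w
            out, _ = attn(x, x, x, attn_mask=mask, need_weights=False)
            outs.append(out)
        weights = F.softmax(self.scale_logits, dim=0)
        return sum(wgt * o for wgt, o in zip(weights, outs))
```

In the conformer block this module sits between the two feed-forward layers, alongside the convolution module.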
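Likewise, a minimal sketch of the sequence-attention pooling and the classifier head described above; the dimensions and dropout rate are placeholders, not the repo's settings:

```python
import torch
import torch.nn as nn

class SequenceAttentionPooling(nn.Module):
    """A linear layer scores each position; softmax weights give a weighted sum."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -> (batch, dim) sequence embedding
        weights = self.score(x).softmax(dim=1)  # (batch, seq_len, 1)
        return (weights * x).sum(dim=1)

# Classifier head: LayerNorm -> Linear -> GELU -> Dropout -> Linear
num_classes = 5  # e.g. TOP_K_COMPOSERS
classifier = nn.Sequential(
    nn.LayerNorm(256),
    nn.Linear(256, 128),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(128, num_classes),
)
```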
Create a conda environment with python 3.11:

```bash
conda create -n coma python=3.11
conda activate coma
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Download the MAESTRO v3.0.0 dataset[^3]:

```bash
wget https://storage.googleapis.com/magentadata/datasets/maestro/v3.0.0/maestro-v3.0.0-midi.zip
unzip 'maestro-v3.0.0-midi.zip'
rm 'maestro-v3.0.0-midi.zip'
mv 'maestro-v3.0.0' 'data/maestro-v3.0.0'
```

Data Split & Preprocessing:
There are various options for data preparation and splitting:

- Tokenizer: Uses the miditok REMI tokenizer, either loaded, used untrained, or trained from scratch on the training set to a target vocab size. A usage sketch follows this list.
- Select Composers: Only the top K composers (by number of compositions or total duration) are selected (`TOP_K_COMPOSERS` in config).
- Train/Test Splits: For each composer, compositions are split so that no composition appears in more than one split (ensures no data leakage).
- Shuffle (recommended): Optionally shuffles before splitting (still maintaining that no composition appears in more than one split). This creates a stratified split based on `TEST_SIZE` in config. If `SHUFFLE=False`, the data split provided with the MAESTRO dataset is used.
- Augmentation: Optionally applies pitch, velocity, and duration augmentations to the training data.
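A hedged sketch of the tokenizer options using miditok's public API (written against miditok 3.x; the exact calls and parameters in this repo's preprocessing may differ):

```python
from pathlib import Path
from miditok import REMI, TokenizerConfig

# Untrained REMI tokenizer with default parameters
tokenizer = REMI(TokenizerConfig())

# Or train the vocabulary (BPE) on the training files to a target size;
# the vocab size here is a placeholder, not this repo's setting
train_files = list(Path("data/maestro-v3.0.0").glob("**/*.midi"))
tokenizer.train(vocab_size=1000, files_paths=train_files)

# Tokenize one MIDI file into a REMI token sequence
tokens = tokenizer(train_files[0])
```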
Adjust training params in config.py.
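For illustration, here are the config names referenced in this README with placeholder values; these values are assumptions, not the repo's defaults:

```python
# config.py (illustrative values only; names taken from this README)
TOP_K_COMPOSERS = 5   # keep only the top-K composers
TEST_SIZE = 0.2       # fraction held out when SHUFFLE=True (80:20 split)
SHUFFLE = True        # stratified shuffle; False uses MAESTRO's own split
LOG_DIR = "logs/"     # where TensorBoard logs and eval plots are written
```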
Begin training the transformer with:

```bash
python3 train.py
```

TensorBoard logs and eval plots will be saved in the specified LOG_DIR directory. View the logs with:

```bash
tensorboard --logdir=<LOG_DIR>
```

Training Details:
- Loss: Cross-entropy loss for multi-class classification.
- Optimizer: AdamW.
- LR Scheduler: MultiStepLR or CosineAnnealing.
- Metrics: Tracks accuracy and F1-score, both at the chunk and the composition level (majority voting or confidence aggregation; see the sketch below).
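For clarity, a sketch of the two composition-level aggregation schemes, assuming `chunk_probs` holds the softmax outputs for all chunks of a single composition (function names are hypothetical, not the repo's):

```python
import numpy as np

def majority_vote(chunk_probs: np.ndarray) -> int:
    """Each chunk votes with its argmax; the most common class wins."""
    votes = chunk_probs.argmax(axis=1)  # per-chunk predicted class
    return int(np.bincount(votes, minlength=chunk_probs.shape[1]).argmax())

def confidence_aggregation(chunk_probs: np.ndarray) -> int:
    """Average the chunk softmax distributions, then take the argmax."""
    return int(chunk_probs.mean(axis=0).argmax())
```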
Preliminary results (top K by number of compositions, 80:20 shuffled split, 20 epochs):
| # composers | Composition F1 (Confidence Agg) | Composition F1 (Majority Vote) | Chunk F1 | # params |
|---|---|---|---|---|
| 3 (config) | 0.98 | 0.98 | 0.84 | 406,948 |
| 5 (config) | 0.97 | 0.97 | 0.86 | 402,921 |
| 10 (config) | 0.90 | 0.89 | 0.69 | 407,822 |
| 13 (config) | 0.87 | 0.82 | 0.68 | 408,689 |
Related Work:

- Deep Composer Classification Using Symbolic Representation (2020) (code)
- Visual-based Musical Data Representation for Composer Classification (2022)
- ComposeInStyle: Music composition with and without Style Transfer (2021)
- Composer Classification with Cross-modal Transfer Learning and Musically-informed Augmentation (2021) (zero-shot)
- Automated Thematic Composer Classification Using Segment Retrieval (2024)
- Concept-Based Explanations For Composer Classification (2022) (code)
The following work achieves perfect accuracy/F1. Looking at their code, there appears to be data leakage between the train and test sets: their dataset (on which they perform a random train/test split) for the 5-composer classification task has at most 482 unique compositions but 809 total recordings, so recordings of the same composition can land in both splits.
NLP-based music processing for composer classification (2023)
This repo is largely adapted from the following:

- local attention: https://github.com/lucidrains/local-attention
- conformer: https://github.com/jreremy/conformer, https://github.com/lucidrains/conformer
- miditok: https://github.com/Natooz/MidiTok
[^1]: Gulati, A., Qin, J., Chiu, C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., & Pang, R. (2020). Conformer: Convolution-augmented Transformer for Speech Recognition. ArXiv, abs/2005.08100.
[^2]: Cui, X. H., Hu, P., & Huang, Z. (2025). Music sequence generation and arrangement based on transformer model. Journal of Computational Methods in Sciences and Engineering. doi:10.1177/14727978251337904.
[^3]: Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C. A., Dieleman, S., Elsen, E., Engel, J., & Eck, D. (2018). Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset. ArXiv, abs/1810.12247.
