SongSlice

Local automatic audio section slicer. Takes an audio file, finds the section boundaries, groups repeated sections under shared labels (A, B, C, …), exports each section as a WAV slice, and writes sections.json plus sections.csv. Runs entirely on your machine — no cloud, no upload.

Install

Requires Python 3.11+ and pip.

From GitHub (recommended for users)

python -m pip install "git+https://github.com/VanKyle00/SongSlice.git"

This installs the songslice CLI directly from this repo. No clone required.

From a clone (recommended for development)

git clone https://github.com/VanKyle00/SongSlice.git
cd SongSlice
python -m pip install -e ".[dev]"

The [dev] extra pulls in test and benchmarking dependencies (pytest, mir_eval). Use plain python -m pip install -e . if you only want the runtime.

Verify the install

songslice --help

CLI

`songslice analyze`

Slice an audio file into labeled sections.

songslice analyze .\song.wav --out .\exports

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--out`, `-o` | `exports` | Output directory for slices + metadata. |
| `--min-section-seconds` | `8.0` | Minimum section duration. Larger values force coarser segmentation. |
| `--max-sections` | `16` | Hard cap on the number of sections. |
| `--label-groups` | `auto` | Number of distinct section labels. Default auto-selects from the affinity eigengap. A forced value is capped at 8 (and at the number of detected sections). |
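
The cap on a forced `--label-groups` value can be read as a simple clamp. The following is a minimal sketch, assuming the two documented limits are applied together as a minimum; the name `effective_label_groups` is hypothetical and not part of the songslice API.

```python
MAX_LABEL_GROUPS = 8  # cap stated in the --label-groups description


def effective_label_groups(requested: int, n_sections: int) -> int:
    """Clamp a forced label-group count to the documented limits.

    Hypothetical helper: the real CLI may apply the limits in a
    different place or order, but the result should be the same.
    """
    return min(requested, MAX_LABEL_GROUPS, n_sections)
```

For example, forcing 12 groups on a track with 11 detected sections would leave 8 groups under this reading, and forcing 5 groups on a track with 3 sections would leave 3.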

A run prints a short summary:

Detected structure: Intro B C D B E D F G H D
Exported 11 slices to .\exports
Overall confidence: 0.63

`songslice serve`

Run the local web app for upload + browser-based workflow.

songslice serve --port 8000

Then open http://127.0.0.1:8000.

`songslice bench`

Score the analyzer against ground-truth annotations (SALAMI / Isophonics formats). Useful when iterating on the analyzer.

songslice bench --manifest .\bench_manifest.toml --report .\bench-report.json

The manifest is TOML:

[[tracks]]
id = "SALAMI_10"
audio = "C:/Music/salami/10.mp3"
annotations = "C:/datasets/salami-data-public/annotations/10/parsed/textfile1_uppercase.txt"
format = "salami"

The report gives, per track, the boundary F-measure at ±0.5 s and ±3 s tolerance plus the pairwise label F-measure, followed by the corpus mean of each.
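
To make the boundary metric concrete, here is a simplified sketch of a boundary F-measure with a tolerance window. It is not the bench scorer's code: the function name `boundary_f_measure` is hypothetical, and it uses a greedy one-to-one matching over sorted boundary times. The `[dev]` extra installs mir_eval, which provides a fuller implementation of this family of metrics.

```python
def boundary_f_measure(reference, estimated, window):
    """F-measure of estimated boundaries against reference boundaries.

    An estimated boundary is a hit when it lies within `window` seconds
    of a reference boundary that has not been matched already. Each
    boundary can be matched at most once.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    hits = i = j = 0
    # Two-pointer sweep over both sorted lists.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= window:
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # this estimate is too early to match any remaining reference
        else:
            i += 1  # this reference is too early to match any remaining estimate
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Calling it twice on the same track, once with `window=0.5` and once with `window=3.0`, shows why both tolerances are reported: the tight window rewards precise boundary placement, while the wide window only asks that a boundary was found in roughly the right place.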

Output

For an input song.wav, the output directory contains:

Ordered WAV slices: 01_Intro_0m00s-0m19s.wav, 02_B_0m19s-0m33s.wav, …
sections.json — source file, duration, overall confidence, detected structure string, and per-section detail (label, start/end, durations, boundary + group confidences, beat-snap flag).
sections.csv — one row per section, same fields.
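
The slice filenames follow a visible pattern: a two-digit index, the label, then start and end times as minutes and seconds. The sketch below formats and parses that pattern for downstream scripts. Both function names are hypothetical, and truncating times to whole seconds is an assumption inferred from the example filenames, not from the songslice source.

```python
import re

# Pattern inferred from the example names, e.g. 02_B_0m19s-0m33s.wav
_SLICE_RE = re.compile(r"(\d+)_(.+)_(\d+)m(\d{2})s-(\d+)m(\d{2})s\.wav")


def _stamp(seconds: float) -> str:
    whole = int(seconds)  # assumption: times are truncated, not rounded
    return f"{whole // 60}m{whole % 60:02d}s"


def slice_filename(index: int, label: str, start: float, end: float) -> str:
    """Build a filename in the style of the exported slices."""
    return f"{index:02d}_{label}_{_stamp(start)}-{_stamp(end)}.wav"


def parse_slice_filename(name: str):
    """Return (index, label, start_seconds, end_seconds) from a slice name."""
    match = _SLICE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a slice filename: {name!r}")
    index, label, m1, s1, m2, s2 = match.groups()
    return int(index), label, int(m1) * 60 + int(s1), int(m2) * 60 + int(s2)
```

For exact timings, sections.json and sections.csv remain the authoritative source; the filename only carries whole seconds.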

Labels are letters A, B, C, … assigned in temporal order, with the special label Intro when the first section is detected to be instrumental (see below).

How it works

Decode + resample. Audio is loaded at 22 050 Hz mono.
Feature extraction. HPSS-separated harmonic + percussive components; chroma CENS (key/chord profile), MFCC (timbre), RMS, spectral centroid, and harmonic-RMS (voice-activity proxy) at hop-rate.
Beat tracking. Frame features synced to detected beats; downstream work happens on beat-segments.
Boundary detection. Foote checkerboard novelty on a combined chroma + MFCC self-similarity matrix; boundary candidates picked from novelty peaks. Candidates are re-ranked by a vocal-change score — peaks where the harmonic component's energy shifts (voice entering/leaving) get boosted.
Beat snapping. Each surviving boundary snaps to the nearest beat when close enough.
Section grouping. Each segment gets two feature blocks:
- Harmonic (chroma mean + std, 24 dims)
- Timbral (MFCC + RMS + centroid means and stds, 28 dims)
Pairwise Pearson correlation on each axis yields a harmonic and a timbral affinity. Their element-wise product — high only when BOTH axes agree — is the affinity for spectral clustering: segments are partitioned by normalized-cut clustering of the affinity's Laplacian, with the number of label groups taken from the eigengap (capped internally at 8). Pass --label-groups N to force a specific count. This follows McFee & Ellis, "Analyzing song structure with spectral clustering" (ISMIR 2014).
Adjacent merge. Consecutive sections with the same label collapse into one.
Intro labeling. If the first section has notably lower MFCC mid-coefficient variability than the next section (voices wiggle the spectral envelope more than sustained instruments) AND the two sections are feature-different, the first section's label becomes Intro.
Export. WAV slices, JSON, CSV.

Tuning

If the default analysis under- or over-segments a particular track:

Too many short sections within what's perceptually one part? Raise --min-section-seconds (try 15 or 20). This forces detected boundaries farther apart.
Two sections wrongly grouped under the same label? Force a finer split: set --label-groups one higher than the detected label count.
Repeated sections wrongly given different labels? Force a coarser grouping: set --label-groups one lower.
Too many sections overall? Lower --max-sections.
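
When trying several settings on one track, it can help to build the command from a script. The helper below is a hypothetical convenience wrapper, not part of songslice; it only assembles the flags documented in the CLI section and leaves running the command to the caller.

```python
def analyze_command(audio, out="exports", min_section_seconds=None,
                    max_sections=None, label_groups=None):
    """Build an argv list for `songslice analyze` with optional tuning flags.

    Flags left as None are omitted so the CLI defaults apply.
    """
    cmd = ["songslice", "analyze", str(audio), "--out", str(out)]
    if min_section_seconds is not None:
        cmd += ["--min-section-seconds", str(min_section_seconds)]
    if max_sections is not None:
        cmd += ["--max-sections", str(max_sections)]
    if label_groups is not None:
        cmd += ["--label-groups", str(label_groups)]
    return cmd


# To run it (requires songslice on PATH):
#   import subprocess
#   subprocess.run(analyze_command("song.wav", min_section_seconds=15), check=True)
```

Writing each attempt to a different `out` directory keeps the results side by side for comparison.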

Limitations

The analyzer uses classical DSP features (chroma, MFCC, RMS, spectral centroid, HPSS-derived voice proxy) plus structural clustering — not a trained section-classification model. Concrete consequences:

Subtle verse-to-chorus transitions can be missed, especially in genres with continuous dynamics (shoegaze, dream-pop, ambient pop). Songs where the chorus enters by gradually layering instruments often produce a weaker novelty peak at the section entry than at sharper mid-section events (drum entries, chord-progression cadences). Tuning --min-section-seconds can help; sometimes no automatic setting matches human perception exactly.
The Intro heuristic uses MFCC mid-coefficient variability as a proxy for vocal vs. instrumental content. It's not real voice-activity detection — a section with a busy lead instrument (saxophone solo, fast guitar) may register as "vocal-like" and skip the Intro rebrand. A section with a steady synth and no vocals will register as instrumental and get the rebrand. This works for the common case but isn't infallible.
Labels are structural, not semantic. A B section is "everything grouped under letter B," not specifically "verse" or "chorus." Two sections sharing a label means only that the analyzer found them musically similar; which functional role that letter plays depends on the song.
No support for stems or multi-track input — analysis is on the mixed audio.
Long sections (multiple minutes) of homogeneous content may produce fewer boundaries than a human listener expects; conversely, arrangement-heavy sections with internal variation may be over-split.

Future work

The grouping step is classical spectral clustering. A learned model could do better, especially at distinguishing functional sections (verse vs chorus) rather than only structural similarity. The natural next step is an optional trained analyzer behind the existing Analyzer interface, following:

Morgan Buisson, Brian McFee, Slim Essid. "Using Pairwise Link Prediction and Graph Attention Networks for Music Structure Analysis." ISMIR 2024. Code: https://github.com/morgan76/LinkSeg

That method is lightweight (<330K parameters, runs locally) and is strongest at structural grouping and section labeling — the same axis this release improves with DSP. Boundary detection would stay with the current novelty-based stage.

Attribution

Section grouping uses spectral clustering after Brian McFee and Daniel P. W. Ellis, "Analyzing song structure with spectral clustering," ISMIR 2014.

Tests

python -m pytest tests/

The suite covers feature extraction edge cases, boundary detection fallbacks, two-axis discrimination (same pitch / different timbre and vice versa), adjacent-duplicate merging, intro labeling, CLI invocation, metadata writing, and the bench scorer.
