diff --git a/README.md b/README.md index 55417f9..9f842e2 100644 --- a/README.md +++ b/README.md @@ -12,11 +12,13 @@ Requires Python 3.10+. pip install -e . ``` -For remuxing, you also need [MKVToolNix](https://mkvtoolnix.download/) (`mkvmerge` on PATH). +Optional external tools: +- [MKVToolNix](https://mkvtoolnix.download/) (`mkvmerge` on PATH) for `bdpl remux` +- [FFmpeg](https://ffmpeg.org/) (`ffmpeg` on PATH) for `bdpl archive` ## Quick Start -Point `bdpl` at a `BDMV/` directory from a disc backup: +Point `bdpl` at a `BDMV/` directory (or a parent directory containing `BDMV/`) from a disc backup: ```bash # Detect episodes and write structured JSON @@ -65,16 +67,16 @@ Episodes Ep 3 27:22 conf=0.80 clips=[00009] ------------------------------------------------------------ -Special Features (9) +Special Features (9 total, 5 visible) ------------------------------------------------------------ - 1. 00003.mpls 01:06 creditless_op - 2. 00004.mpls 01:43 creditless_ed - 3. 00005.mpls 02:00 creditless_ed + 1. 00003.mpls 01:06 creditless_op [visible] + 2. 00004.mpls 01:43 creditless_ed [visible] + 3. 00005.mpls 02:00 creditless_ed [hidden] ... - 6. 00008.mpls ch.0 01:22 creditless_ed - 7. 00008.mpls ch.1 00:16 creditless_ed - 8. 00009.mpls 00:16 extra - 9. 00010.mpls 00:17 extra + 6. 00008.mpls ch.0 01:22 creditless_ed [visible] + 7. 00008.mpls ch.1 00:16 creditless_ed [visible] + 8. 00009.mpls 00:16 extra [hidden] + 9. 00010.mpls 00:17 extra [hidden] ------------------------------------------------------------ Warnings @@ -108,8 +110,9 @@ Output includes: - Full playlist inventory with durations, streams, and chapters - Episode candidates with confidence scores - Episode scene segments (menu-scene boundaries when available) -- Special features (creditless OP/ED, extras, previews) detected from IG menus +- Special features (creditless OP/ED, commentary, extras, previews) with `menu_visible` flag - Playlist classifications (episode, play_all, bumper, creditless_op, etc.) +- Disc title (extracted from `META/DL/bdmt_eng.xml` when available, falls back to other `bdmt_*.xml`) - Warnings for ambiguous or low-confidence results ### `bdpl explain` @@ -136,16 +139,21 @@ Remux episodes to MKV with chapters and named tracks. Requires `mkvmerge` (MKVTo ```bash bdpl remux /path/to/BDMV --out ./Episodes bdpl remux /path/to/BDMV --pattern "My Show (2024) - S01E{ep:02d}.mkv" +bdpl remux /path/to/BDMV --mkvmerge-path /path/to/mkvmerge bdpl remux /path/to/BDMV --dry-run -# Also remux special features (creditless OP/ED, extras, previews) +# Also remux special features (creditless OP/ED, commentary, extras, previews) bdpl remux /path/to/BDMV --specials bdpl remux /path/to/BDMV --specials --specials-pattern "My Show - S00E{idx:02d} - {category}.mkv" + +# Only include specials visible in the disc menu (exclude hidden extras) +bdpl remux /path/to/BDMV --specials --visible-only ``` -Default filenames use Plex/Jellyfin-compatible `SxxExx` format with the disc folder name -(e.g., `UCG_0080_D1 - S01E01.mkv`, `UCG_0080_D1 - S00E01 - creditless_op.mkv`). -Pattern variables: `{name}` (disc folder), `{ep}` (episode #), `{idx}` (special #), `{category}` (special type). +Default filenames use Plex/Jellyfin-compatible `SxxExx` format with the disc title +extracted from `META/DL/bdmt_eng.xml` (falls back to other `bdmt_*.xml`, then the +BDMV parent folder name). +Pattern variables: `{name}` (disc title), `{ep}` (episode #), `{idx}` (special #), `{category}` (special type). ### `bdpl archive` @@ -154,12 +162,17 @@ Extract still images for digital archive playlists (menu/gallery-style content). ```bash bdpl archive /path/to/BDMV --out ./DigitalArchive bdpl archive /path/to/BDMV --format png +bdpl archive /path/to/BDMV --ffmpeg-path /path/to/ffmpeg bdpl archive /path/to/BDMV --dry-run + +# Only include archives visible in the disc menu +bdpl archive /path/to/BDMV --visible-only ``` The command detects playlists classified as `digital_archive` and captures one frame per archive item via `ffmpeg`, naming outputs as -`{playlist}-{index}-{clip_id}.{ext}`. +`{stem}-{index:03d}-{clip_id}.{ext}` (e.g., `00008-001-00005.jpg`). +Requires `ffmpeg` on PATH (or use `--ffmpeg-path`). ## How It Works @@ -167,22 +180,22 @@ bdpl reads the raw BDMV binary structures — no external tools needed for analy 1. **Parse** `PLAYLIST/*.mpls` files to extract play items (clip references with in/out timestamps), chapters, and stream tables 2. **Parse** `CLIPINF/*.clpi` files for stream metadata (codecs, languages) -3. **Parse disc hints** from `index.bdmv` (title→playlist mapping), `MovieObject.bdmv` (navigation commands), and IG menu streams (button→chapter mappings) +3. **Parse disc hints** from `index.bdmv` (title→playlist mapping), `MovieObject.bdmv` (navigation commands), IG menu streams (button→chapter mappings), and disc title from `META/DL/bdmt_eng.xml` (falls back to other `bdmt_*.xml`) 4. **Analyze** the playlist graph: - Compute segment signatures and deduplicate near-identical playlists - - Cluster playlists by duration to find episode-length candidates - Detect "Play All" playlists (concatenations of other playlists) - Label shared segments as OP, ED, BODY, PREVIEW, LEGAL + - Classify playlists by duration and segment structure (episode, play_all, bumper, etc.) 5. **Infer** episode order using multiple strategies (see below) 6. **Boost confidence** when navigation hints and IG menu data confirm episode boundaries -7. **Detect special features** from IG menu `JumpTitle` buttons pointing to non-episode playlists (creditless OP/ED, extras, previews) -8. **Extract scenes** for each episode from IG/title hints and chapter anchors (fallback for metadata-only fixtures) +7. **Detect special features** from IG menu `JumpTitle` buttons pointing to non-episode playlists (creditless OP/ED, commentary, extras, previews) +8. **Extract scenes** for each episode from IG menu chapter marks and chapter anchors 9. **Export** results as JSON, text reports, M3U playlists, or MKV remux (including specials) ### Episode Inference Strategies - **Individual episode playlists**: Each episode has its own MPLS with a unique "body" segment, plus shared OP/ED. Episodes are ordered by body clip ID. -- **Play All decomposition**: Some discs (common in anime) only have a single "Play All" playlist. bdpl decomposes it — each long play item (>10 min) becomes an episode. +- **Play All decomposition**: Some discs (common in anime) only have a single "Play All" playlist. bdpl decomposes it — each play item ≥5 minutes becomes an episode. - **Chapter-based splitting**: When a disc has a single long m2ts with multiple chapters but no separate playlists, bdpl splits into episodes using chapter boundaries and target duration heuristics. When disc hints show a single main title plus a separate digital archive title, @@ -194,9 +207,13 @@ Each detected episode gets a confidence score (0–1) based on how it was identi | Source | Base | Possible boosts | |--------|------|-----------------| -| Individual playlists | 0.9 | +0.1 title hint | -| Play All decomposition | 0.7 | +0.1 title hint | +| Individual playlists | 0.9 | +0.1 title hint, +0.1 IG chapter marks | +| Play All decomposition | 0.7 | +0.1 title hint, +0.1 IG chapter marks | | Chapter splitting | 0.6 | +0.1 title hint, +0.1 IG chapter marks | +| Title-hint collapse (single main + archive) | 0.85 | +0.1 title hint, +0.1 IG chapter marks | +| Variant-dedup collapse | 0.85 | +0.1 title hint, +0.1 IG chapter marks | + +All boosts are capped at 1.0. ## JSON Schema @@ -205,7 +222,7 @@ The `scan` output uses schema version `bdpl.disc.v1`: ```json { "schema_version": "bdpl.disc.v1", - "disc": { "path": "...", "generated_at": "2026-02-08T..." }, + "disc": { "path": "...", "title": "My Anime Vol.1", "generated_at": "2026-02-08T..." }, "playlists": [ { "mpls": "00002.mpls", @@ -214,17 +231,24 @@ The `scan` output uses schema version `bdpl.disc.v1`: { "clip_id": "00007", "m2ts": "00007.m2ts", + "in_time": 0, + "out_time": 70878307, "duration_ms": 1575073.5, "label": "BODY", + "segment_key": ["00007", 0.0, 1575000.0], "streams": [ { "pid": 4113, "codec": "H.264/AVC", "lang": "" }, - { "pid": 4352, "codec": "LPCM", "lang": "jpn" }, - { "pid": 4608, "codec": "PGS", "lang": "jpn" }, - { "pid": 4609, "codec": "PGS", "lang": "eng" } + { "pid": 4352, "codec": "LPCM", "lang": "jpn" } ] } ], - "chapters": [{ "mark_id": 0, "mark_type": 1, "timestamp": 188955000 }] + "chapters": [ + { "mark_id": 0, "mark_type": 1, "play_item_ref": 0, "timestamp": 188955000, "duration_ms": 0.0 } + ], + "streams": [ + { "pid": 4113, "codec": "H.264/AVC", "lang": "" }, + { "pid": 4352, "codec": "LPCM", "lang": "jpn" } + ] } ], "episodes": [ @@ -233,6 +257,16 @@ The `scan` output uses schema version `bdpl.disc.v1`: "playlist": "00002.mpls", "duration_ms": 1575073.5, "confidence": 0.70, + "segments": [ + { + "key": ["00007", 0.0, 1575000.0], + "clip_id": "00007", + "in_ms": 0.0, + "out_ms": 1575073.5, + "duration_ms": 1575073.5, + "label": "BODY" + } + ], "scenes": [ { "key": ["SCENE", "00002.mpls", 1], @@ -245,16 +279,29 @@ The `scan` output uses schema version `bdpl.disc.v1`: ] } ], - "warnings": [{ "code": "PLAY_ALL_ONLY", "message": "..." }], + "special_features": [ + { + "index": 1, + "playlist": "00003.mpls", + "category": "creditless_op", + "duration_ms": 89400.0, + "menu_visible": true + } + ], + "warnings": [{ "code": "PLAY_ALL_ONLY", "message": "...", "context": {} }], "analysis": { "classifications": { "00002.mpls": "play_all" } } } ``` +Special features may also include `chapter_start` (integer chapter index) when the +feature is a chapter-window slice of a multi-feature playlist. + ## Development ```bash -pip install -e ".[dev]" -pytest tests/ -v +pip install -e ".[dev]" # Installs ruff, pytest, etc. +ruff check . && ruff format . # Lint and format +pytest tests/ -v # 452 tests, all bundled (no env var needed) ``` ### Project Structure @@ -262,30 +309,37 @@ pytest tests/ -v ``` bdpl/ bdpl/ - cli.py # Typer CLI - model.py # Dataclasses (Playlist, Episode, etc.) + __init__.py # Package root, version (v0.1.0) + cli.py # Typer CLI (scan, explain, playlist, remux, archive) + model.py # Dataclasses (Playlist, Episode, SpecialFeature, etc.) bdmv/ - reader.py # Big-endian binary reader - mpls.py # MPLS playlist parser - clpi.py # CLPI clip info parser - index_bdmv.py # index.bdmv title mapping parser - movieobject_bdmv.py # MovieObject.bdmv navigation command parser - ig_stream.py # [Experimental] IG menu stream parser + reader.py # Big-endian binary reader + mpls.py # MPLS playlist parser + clpi.py # CLPI clip info parser + index_bdmv.py # index.bdmv title mapping parser + movieobject_bdmv.py # MovieObject.bdmv navigation command parser + ig_stream.py # [Experimental] IG menu stream parser analyze/ - __init__.py # scan_disc() pipeline - signatures.py # Deduplication - clustering.py # Duration clustering - segment_graph.py # Segment reuse & Play All detection - classify.py # Segment & playlist labeling - ordering.py # Episode ordering (individual, Play All, chapter split) - explain.py # Human-readable reports + __init__.py # scan_disc() pipeline + signatures.py # Deduplication + clustering.py # Duration clustering + segment_graph.py # Segment reuse & Play All detection + classify.py # Segment & playlist labeling + ordering.py # Episode ordering (individual, Play All, chapter split) + explain.py # Human-readable reports export/ - json_out.py # JSON output - text_report.py # Text reports - m3u.py # M3U playlists - mkv_chapters.py # MKV remux with chapters (via mkvmerge) + json_out.py # JSON output (bdpl.disc.v1) + text_report.py # Text reports + m3u.py # M3U playlists + mkv_chapters.py # MKV remux with chapters (via mkvmerge) + digital_archive.py # Digital archive image extraction (via ffmpeg) + remux/ # (placeholder) direct remux integration + util/ # (placeholder) hashing/log helpers tests/ - fixtures/ # Bundled BDMV metadata for portable tests + fixtures/ # 28 bundled BDMV metadata fixtures (no copyrighted media) + .github/ + instructions/ # Copilot coding instructions + skills/ # Copilot agent skills pyproject.toml ```