Handle season ranges in folder names, and reject if unparsable#434
Handle season ranges in folder names, and reject if unparsable#434slbyrnes1 wants to merge 1 commit intomaxdorninger:masterfrom
Conversation
|
Hi! thanks for taking the time to contribute :) I think some of the changes were already implemented in another PR that was just merged. It would still be great if you could rebase this onto master so I can merge it. |
cd156f4 to
a3acd91
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Repository UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (7)
🚧 Files skipped from review as they are similar to previous changes (3)
📜 Recent review details🧰 Additional context used🪛 checkmake (0.2.2)Makefile[warning] 15-15: Missing required phony target "all" (minphony) [warning] 15-15: Missing required phony target "clean" (minphony) 🔇 Additional comments (8)
📝 WalkthroughWalkthroughThis pull request enhances season and episode parsing logic to support flexible numbering and multilingual season variants, improves regex pattern matching in TV service imports, adds comprehensive test coverage for TV service functionality, and introduces a test target to the build system. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
media_manager/indexer/schemas.py (1)
57-130:⚠️ Potential issue | 🔴 CriticalCritical: Unresolved Git merge conflict markers must be removed.
The file contains merge conflict markers (
<<<<<<< HEAD,=======,>>>>>>> bf4cd4e) which will cause a syntax error and prevent the code from running. This aligns with the maintainer's request to rebase the branch onto master.Additionally, the two conflicting implementations have different capabilities:
- HEAD version (lines 58-107): Supports
S01E01, rangeS01-S03, packS01, andSeason 1patterns, plus episode parsing- PR branch version (lines 109-129): Supports explicit range
S01-S08, shorthandS01-8, and individualS##matchesThe PR branch version appears to lose support for the
Season Xformat (e.g., "Show Season 3 Complete" would return[]instead of[3]) and does not include theepisodecomputed field.Please resolve the merge conflicts by rebasing onto master and deciding which logic to keep or how to merge them appropriately.
media_manager/tv/service.py (1)
711-714:⚠️ Potential issue | 🟡 MinorInconsistency with
import_episoderegex pattern.Both
import_episodeandimport_episode_filesserve equivalent purposes but use different regex patterns. Theimport_episodepattern supports multi-episode formats (E1-2E3) and hyphen separators, whileimport_episode_filesuses the older pattern without these features. This means they handle different filename formats despite serving the same role.Update
import_episode_filesto match the pattern fromimport_episodefor consistency:Suggested fix
pattern = ( - r".*[. ]S0?" + str(season.number) + r"E0?" + str(episode.number) + r"[. ].*" + r".*[. ]S0?" + str(season.number) + r"E(?:\d+-?E)?0?" + str(episode.number) + r"[. -].*" )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@media_manager/tv/service.py` around lines 711 - 714, The regex used in import_episode_files is out of sync with import_episode: update the pattern and subtitle_pattern definitions in import_episode_files to match import_episode’s more flexible regex (supporting multi-episode ranges like E1-2E3 and hyphen separators). Specifically, replace the old pattern construction with the same logic used in import_episode so the variable pattern (built from season.number and episode.number) and subtitle_pattern (adding language capture + .srt) match import_episode’s regex behavior; ensure import_episode_files uses the same symbol names pattern and subtitle_pattern so both functions handle the same filename formats.
🧹 Nitpick comments (3)
media_manager/indexer/schemas.py (1)
119-121: Minor edge case consideration for shorthand range validation.The condition
if end > startcorrectly prevents invalid ranges likeS05-3from being interpreted as ranges. However, explicit ranges likeS05-S03on line 112 don't have this validation and would producelist(range(5, 4))which is an empty list.Consider adding similar validation for explicit ranges for consistency:
💡 Suggested improvement
range_match = re.search(r"\bS(\d+)\s*-\s*S(\d+)\b", self.title, re.IGNORECASE) if range_match: - return list(range(int(range_match.group(1)), int(range_match.group(2)) + 1)) + start, end = int(range_match.group(1)), int(range_match.group(2)) + if start <= end: + return list(range(start, end + 1))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@media_manager/indexer/schemas.py` around lines 119 - 121, The explicit-range parsing block that parses patterns like "S05-S03" should mirror the shorthand validation: after extracting start and end (the same start, end variables used in the shorthand block), ensure you check that end > start before returning list(range(start, end + 1)); if the check fails, handle it consistently (e.g., return an empty list or raise a ValueError) so explicit ranges do not produce an unintended empty range.tests/test_season_parser.py (1)
20-31: Consider adding test coverage for additional patterns.The test cases cover the main scenarios well, but a few additional cases would improve confidence:
- S##E## format: Titles with episode numbers (e.g.,
"Show S01E05 720p"→[1])- Invalid/reversed range: Edge case like
"Show S05-3 720p"→ should probably return[5](falls through to individual match)- Multiple individual seasons:
"Show S01 S03 S05 720p"→[1, 3, 5]💡 Suggested additional test cases
`@pytest.mark.parametrize`( "title, expected", [ # ... existing cases ... ("Show S01E05 720p", [1]), # with episode number ("Show S05-3 720p", [5]), # invalid range falls back to individual ("Show S01 S03 S05 720p", [1, 3, 5]), # multiple individual seasons ], )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_season_parser.py` around lines 20 - 31, Add three new parameterized test cases to the existing parametrize block in tests/test_season_parser.py for the season parser: include a title with an episode number ("Show S01E05 720p" expecting [1]), an invalid/reversed range ("Show S05-3 720p" expecting [5] to treat as individual season), and a multiple-individual-seasons case ("Show S01 S03 S05 720p" expecting [1,3,5]); update the same parametrize list (the "title, expected" block) so the test function that consumes these parameters validates these additional patterns alongside the existing cases.media_manager/tv/service.py (1)
614-619: Regex may not match shorthand multi-episode format for the second episode.The updated regex
E(?:\d+-?E)?0?{episode}correctly handles formats likeS09E19-E20but won't match the shorthand formatS09E19-20when searching for episode 20.For example, with
S01E01-02.mkv:
- Finding episode 1: ✓ (matches
E01directly)- Finding episode 2: ✗ (
E(?:\d+-?E)?0?2cannot matchE01-02since the optional group requiresEafter digits)If shorthand ranges like
E01-02are common in your media library, consider extending the pattern:💡 Suggested improvement
pattern = ( r".*[. ]S0?" + str(season.number) - + r"E(?:\d+-?E)?0?" + + r"E(?:\d+-?E?)?0?" + str(episode_number) + r"[. -].*" )This change makes the
Ewithin the optional group itself optional, allowing bothE19-E20andE19-20formats.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@media_manager/tv/service.py` around lines 614 - 619, The regex in the pattern variable used when locating episodes (built near season.number and episode_number) doesn't match shorthand ranges like "S01E01-02" because the optional group forces an 'E' after the first episode; update the optional group so the second 'E' is optional (e.g., make the group something like (?:\d+-?E?)? or equivalent) so both "E19-E20" and "E19-20" forms are accepted while keeping the rest of the pattern (the leading "E", optional zero-padding, and episode_number insertion) intact in the same spot where pattern is constructed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@media_manager/tv/service.py`:
- Around line 711-714: The regex used in import_episode_files is out of sync
with import_episode: update the pattern and subtitle_pattern definitions in
import_episode_files to match import_episode’s more flexible regex (supporting
multi-episode ranges like E1-2E3 and hyphen separators). Specifically, replace
the old pattern construction with the same logic used in import_episode so the
variable pattern (built from season.number and episode.number) and
subtitle_pattern (adding language capture + .srt) match import_episode’s regex
behavior; ensure import_episode_files uses the same symbol names pattern and
subtitle_pattern so both functions handle the same filename formats.
---
Nitpick comments:
In `@media_manager/indexer/schemas.py`:
- Around line 119-121: The explicit-range parsing block that parses patterns
like "S05-S03" should mirror the shorthand validation: after extracting start
and end (the same start, end variables used in the shorthand block), ensure you
check that end > start before returning list(range(start, end + 1)); if the
check fails, handle it consistently (e.g., return an empty list or raise a
ValueError) so explicit ranges do not produce an unintended empty range.
In `@media_manager/tv/service.py`:
- Around line 614-619: The regex in the pattern variable used when locating
episodes (built near season.number and episode_number) doesn't match shorthand
ranges like "S01E01-02" because the optional group forces an 'E' after the first
episode; update the optional group so the second 'E' is optional (e.g., make the
group something like (?:\d+-?E?)? or equivalent) so both "E19-E20" and "E19-20"
forms are accepted while keeping the rest of the pattern (the leading "E",
optional zero-padding, and episode_number insertion) intact in the same spot
where pattern is constructed.
In `@tests/test_season_parser.py`:
- Around line 20-31: Add three new parameterized test cases to the existing
parametrize block in tests/test_season_parser.py for the season parser: include
a title with an episode number ("Show S01E05 720p" expecting [1]), an
invalid/reversed range ("Show S05-3 720p" expecting [5] to treat as individual
season), and a multiple-individual-seasons case ("Show S01 S03 S05 720p"
expecting [1,3,5]); update the same parametrize list (the "title, expected"
block) so the test function that consumes these parameters validates these
additional patterns alongside the existing cases.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
media_manager/indexer/schemas.pymedia_manager/tv/service.pytests/__init__.pytests/test_season_parser.py
📜 Review details
🧰 Additional context used
🧬 Code graph analysis (2)
media_manager/tv/service.py (2)
media_manager/indexer/schemas.py (1)
season(56-56)tests/test_season_parser.py (1)
season(16-17)
tests/test_season_parser.py (1)
media_manager/indexer/schemas.py (2)
IndexerQueryResult(13-56)season(56-56)
🔇 Additional comments (1)
tests/test_season_parser.py (1)
5-17: LGTM!The test fixture structure is clean and well-organised. The
DEFAULTSdict provides all required fields, and theseason()helper function makes tests readable and maintainable.
89c3901 to
4e6e71b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
media_manager/tv/service.py (1)
602-610: Good refactoring; consider tightening the episode number matching to avoid false positives.Centralising the pattern is a good improvement, and the
[. -]fix correctly addresses multi-episode filenames likeS09E19-E20 - Title.mkv.However, the
(?:\d+-?E?)?0?{episode}portion can produce false positives when the target episode number is a substring of the actual episode number. For example, when matching episode 1 against filenameShow.S01E11.mkv:
(?:\d+-?E?)?matches1(first digit only)0?matches empty1matches the second1[. -]matches.This would incorrectly identify E11 as E01.
This is a pre-existing issue, so it's not a blocker for this PR, but worth considering a fix since you're centralising the pattern.
🔧 Suggested fix using zero-padded format
`@staticmethod` def _episode_file_pattern(season_number: int, episode_number: int) -> str: return ( - r".*[. ]S0?" - + str(season_number) - + r"E(?:\d+-?E?)?0?" - + str(episode_number) + r".*[. ]S" + + f"{season_number:02d}" + + r"E(?:\d{2,3}-?E?)?" + + f"{episode_number:02d}" + r"[. -].*" )This enforces zero-padded two-digit format (the standard for media files), eliminating substring false matches.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@media_manager/tv/service.py` around lines 602 - 610, The current _episode_file_pattern allows substring matches (e.g., E11 matching target episode 1); change the episode portion to require a two-digit zero-padded episode token so matches are exact. In _episode_file_pattern(season_number, episode_number) format episode_number as two digits (e.g., "01", "11") and update the regex to expect two digits for the episode (use \d{2} where applicable) instead of the current optional "0?" + str(episode_number) and loosened digit group; keep the existing multi-episode/range handling (the (?:\d+-?E?)? group) but constrain it to two-digit numbers (e.g., \d{2}) so E11 cannot be misread as E01 when matching episode 1.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@media_manager/tv/service.py`:
- Around line 602-610: The current _episode_file_pattern allows substring
matches (e.g., E11 matching target episode 1); change the episode portion to
require a two-digit zero-padded episode token so matches are exact. In
_episode_file_pattern(season_number, episode_number) format episode_number as
two digits (e.g., "01", "11") and update the regex to expect two digits for the
episode (use \d{2} where applicable) instead of the current optional "0?" +
str(episode_number) and loosened digit group; keep the existing
multi-episode/range handling (the (?:\d+-?E?)? group) but constrain it to
two-digit numbers (e.g., \d{2}) so E11 cannot be misread as E01 when matching
episode 1.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
media_manager/indexer/schemas.pymedia_manager/tv/service.pytests/__init__.pytests/test_season_parser.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_season_parser.py
📜 Review details
🔇 Additional comments (3)
media_manager/indexer/schemas.py (1)
72-75: LGTM!The refactored season extraction logic correctly handles multiple season pack indicators (e.g.,
S01 S03 S05) by collecting all matches, deduplicating withset(), and returning a sorted list. The placement after the range-matching block ensures explicit ranges likeS01-S08are handled first.media_manager/tv/service.py (2)
624-624: LGTM!Correctly uses the new centralised helper method.
715-715: LGTM!Correctly uses the new centralised helper method, consistent with the usage in
import_episode.
|
Thanks so much! I realized that the |
4e6e71b to
bccdf55
Compare
Season parser now collects all individual season tokens (S01 S03 S05) via re.findall instead of re.search, returning a sorted, deduplicated list. Also adds multi-language season keyword support (saison, series, stagione) to match Sonarr parity. Episode file-matching regex replaces [. ] separator anchors with negative lookbehind/lookahead word-boundary assertions, preventing false positives on adjacent digits. Subtitle pattern is now end-of-string anchored with a flexible separator before the language code. Adds a parametrized test suite covering all parser cases and import method behavior (separator variants, zero-padding, wrong-episode rejection, subtitle extraction), a `make test` target, and ruff per-file ignores for the tests/ directory.
bccdf55 to
39226e6
Compare
|
Wow! You've done a lot of amazing work since I opened my very random PR two weeks ago, hah. At this point, I think the biggest thing this PR provides are just some pytests, but I did also enable the season parsing to understand a couple more cases. I also realized I forgot to put this back in draft :) oops! This PR can be considered "ready" now. |
Hello!
First, thank you so much for writing and maintaining this software. I recently dove into the world of the *Arr suite, and after playing with Sonarr and Radarr a couple weeks, it's been a delight picking up MediaManager as a replacement.
I hope you don't mind I opened a PR for this. Feel free to reject it or let me know how I can improve it. I reviewed the contributing guide and, while this is more than a couple lines, I ended up just "doing the work" first alongside Claude Code as it helped me learn and understand this project.
AI disclaimer: I used Claude Code to help me navigate and learn the project. This code was written by Claude with modifications by me, after planning mode where I gave direction and instructions. If you feel the need to reject the PR for that reason, I understand. (I'm a python developer by trade, for what it is worth). The rest of this PR description was also mostly written by Claude, with my editing.
Summary
This fixes a couple "bugs" that were causing import issues for me:
Season parser fails on range-style torrent titles — The
IndexerQueryResult.seasonregex only handled exactly 1 or 2\bS(\d+)\bmatches. Titles with season ranges likeS01-8 S01-S08produced 3 matches, which fell through to theelsebranch and returned[]. This caused torrents to download without anySeasonFileassociation.Multi-episode files fail to match during import — The
import_episoderegex used[. ]as the separator after the episode number, so multi-episode files likeS09E19-E20 - Double Episode Special.mkvnever matched when importing E19 (the-afterE19isn't.or space).Observed behavior
Torrents downloaded via the UI were never imported by the scheduled job. Logs showed:
For multi-episode files, the import job silently skipped the file because the regex couldn't match:
Changes
media_manager/indexer/schemas.py— ImprovedIndexerQueryResult.seasonparser:S01-S08→[1,2,...,8])S01-8→[1,2,...,8])S##matches, deduplicatedS##pattern still return[]media_manager/tv/service.py— Updatedimport_episoderegex:(?:\d+-?E)?to optionally match the first episode in a multi-episode range (e.g.,19-Ebefore20)[. ]to[. -]to allow hyphentests/test_season_parser.py— Added parametrized tests for the season parser covering single seasons, explicit ranges, shorthand ranges, and titles that should return empty.Summary by CodeRabbit
Release Notes
New Features
Tests
Chores
make testcommand for running the test suite