🧪 QA: Add test for SubtitleProcessor edge case#122
🧪 QA: Add test for SubtitleProcessor edge case#122daggerstuff wants to merge 1 commit intostagingfrom
Conversation
Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Reviewer's GuideAdds unit tests for SubtitleProcessor.format_as_markdown to cover both normal metadata usage and the empty-metadata edge case, improving regression protection around subtitle-to-Markdown formatting behavior. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthroughTwo new test cases were added to the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/utils/test_subtitle_processor.py (1)
39-42: Tighten empty-metadata assertions to catch formatting regressionsCurrent substring checks are valid but permissive. Since
utils/subtitle_processor.pydefines a structured markdown header, asserting a stricter prefix (or exact metadata block) would better catch accidental formatting drift.Proposed test hardening
def test_format_as_markdown_missing_metadata(): text = "Just some text." metadata = {} md = SubtitleProcessor.format_as_markdown(text, metadata) - assert "Unknown Title" in md - assert "Unknown Channel" in md - assert "**Source:** \n" in md - assert "**Date:** \n" in md + assert md.startswith( + "# Unknown Title\n\n" + "**Channel:** Unknown Channel\n" + "**Source:** \n" + "**Date:** \n\n" + ) + assert "## Transcript\n\nJust some text." in md🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/utils/test_subtitle_processor.py` around lines 39 - 42, The test is too permissive—tighten the assertions in tests/utils/test_subtitle_processor.py that currently check the md string for loose substrings (md variable) and instead assert the exact metadata block or a strict prefix that matches the structured markdown header produced by utils/subtitle_processor.py (e.g., assert the metadata block equals or md.startswith the exact header lines for Title, Channel, Source and Date when empty). Update the four assertions to compare the full expected metadata block (or use startswith with the exact multi-line header) so formatting regressions will fail the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/utils/test_subtitle_processor.py`:
- Around line 39-42: The test is too permissive—tighten the assertions in
tests/utils/test_subtitle_processor.py that currently check the md string for
loose substrings (md variable) and instead assert the exact metadata block or a
strict prefix that matches the structured markdown header produced by
utils/subtitle_processor.py (e.g., assert the metadata block equals or
md.startswith the exact header lines for Title, Channel, Source and Date when
empty). Update the four assertions to compare the full expected metadata block
(or use startswith with the exact multi-line header) so formatting regressions
will fail the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7ea3b5f9-f1b3-4d22-a31f-e44b3e6c74f6
📒 Files selected for processing (1)
tests/utils/test_subtitle_processor.py
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The two new tests share a lot of setup and assertion patterns; consider using
pytest.mark.parametrizeor a helper function to reduce duplication and make it easier to add moreformat_as_markdownscenarios later. - The assertion on the exact substring
"**Source:** \n"makes the test tightly coupled to formatting details; it may be more robust to assert on the presence of the label and the following content separately so minor formatting changes don’t cause spurious failures.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The two new tests share a lot of setup and assertion patterns; consider using `pytest.mark.parametrize` or a helper function to reduce duplication and make it easier to add more `format_as_markdown` scenarios later.
- The assertion on the exact substring `"**Source:** \n"` makes the test tightly coupled to formatting details; it may be more robust to assert on the presence of the label and the following content separately so minor formatting changes don’t cause spurious failures.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
Adds pytest coverage for SubtitleProcessor.format_as_markdown, including the “metadata present” path and the edge case where metadata is empty.
Changes:
- Added a test validating markdown output includes provided title/channel/url/date and the transcript text.
- Added a test validating default/fallback fields when metadata is an empty dict.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| result = SubtitleProcessor.clean_vtt(vtt_content) | ||
| assert result == "Hello Hello world" | ||
|
|
||
| def test_format_as_markdown_edge_cases(): |
There was a problem hiding this comment.
This test appears to be the normal/happy-path case (metadata fully populated) rather than an edge case. Renaming it to something like test_format_as_markdown_with_metadata would make the intent clearer and help future readers quickly find true edge-case coverage.
| def test_format_as_markdown_edge_cases(): | |
| def test_format_as_markdown_with_metadata(): |
| assert "**Source:** \n" in md | ||
| assert "**Date:** \n" in md |
There was a problem hiding this comment.
These assertions are fairly brittle because they depend on an exact newline/whitespace sequence. If format_as_markdown changes formatting slightly (extra spaces, \\r\\n, additional text after the label), the test will fail despite correct behavior. Prefer asserting on the presence/structure more flexibly (e.g., checking that the labels exist, or normalizing line endings/whitespace before asserting, or using a regex that allows optional whitespace).
| assert "**Source:** \n" in md | |
| assert "**Date:** \n" in md | |
| assert "**Source:**" in md | |
| assert "**Date:**" in md |
💡 What: Added test cases for
SubtitleProcessor.format_as_markdown🎯 Why: Covers missing edge cases when metadata dictionary is empty and ensures the normal case is also covered correctly.
✅ Verification: Runs successfully via pytest.
PR created automatically by Jules for task 2962032702155572427 started by @daggerstuff
Summary by Sourcery
Add unit tests to cover SubtitleProcessor.format_as_markdown behavior for normal and missing metadata cases.
Tests:
Summary by CodeRabbit