Skip to content

chore: update 3play sync for multi language#3029

Open
umar8hassan wants to merge 1 commit intoumar/10983-ingest-drive-files-for-multi-languagefrom
umar/10984-update-3play-sync-for-multi-language
Open

chore: update 3play sync for multi language#3029
umar8hassan wants to merge 1 commit intoumar/10983-ingest-drive-files-for-multi-languagefrom
umar/10984-update-3play-sync-for-multi-language

Conversation

@umar8hassan
Copy link
Copy Markdown
Contributor

@umar8hassan umar8hassan commented May 4, 2026

What are the relevant tickets?

https://github.com/mitodl/hq/issues/10984

Description (What does it do?)

Fixes the 3Play sync path to write captions and transcripts using the post-10982 array format and to guard correctly against existing content.

How can this be tested?

  1. have working 3play locally, you can add a video resource with youtube_id: E8uZtq_vOYM and it'd pull the related transcript and captions.
  2. Open the video resource in Studio, and confirm video_captions_resource now shows a single English caption resource and video_transcript_resource shows a transcript.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for multi-language video captions and transcripts by transitioning metadata from single URL strings to arrays of objects. Key updates include modifying the site configuration to use multi-select relation widgets, adding filename parsing for language detection, and providing data migrations for existing records. Feedback was provided regarding code duplication in the task logic for building file data, with a suggestion to extract a shared helper function to improve maintainability.

Comment thread videos/tasks.py
Comment on lines +170 to +184
captions_file_data = []
for r in captions_list:
if not (getattr(r, "file", None) and r.file.name):
continue
lang, locale = parse_caption_language_locale(r.filename or "")
entry: dict = {
"file": urljoin(
"/",
r.file.name.replace(website.s3_path, website.url_path),
),
"language": lang,
}
if locale:
entry["locale"] = locale
captions_file_data.append(entry)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for building file data from a list of resources is duplicated for captions, transcripts, and in update_transcripts_for_video. To improve maintainability and reduce code duplication, please extract this into a helper function in videos/utils.py.

@umar8hassan umar8hassan force-pushed the umar/10984-update-3play-sync-for-multi-language branch from d53e07c to 061f4c1 Compare May 4, 2026 13:19
Comment thread videos/utils_test.py
("lecture1_captions_en_us_vtt", ("en", "US")),
("lecture1_captions_fr_en_vtt", ("fr", "EN")),
("lecture1_transcript_en_us_pdf", ("en", "US")),
("lecture1_transcript_fr_gb_pdf", ("fr", "GB")),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to pick a realistic locale for the test, e.g., FR, CH, etc.

@umar8hassan umar8hassan changed the base branch from master to umar/10983-ingest-drive-files-for-multi-language May 6, 2026 05:44
- _attach_transcript_if_missing and _attach_captions_if_missing now guard
  with .get('video_transcript_resource', {}).get('content') — falsy for
  empty-string or empty-list content (site-config relation-widget default).
  Previously the guard checked the old _file field, so new videos whose
  _resource was initialised to {"content": ""} would be silently skipped.
- _create_new_content now returns (s3_path, text_id) tuple instead of just
  the path string.
- On successful 3Play fetch, writes:
    video_transcript_resource: {"content": [text_id], "website": name}
    video_transcript_file: [{"file": path, "language": "en"}]
  (and the captions pair equivalently).
- Tests:
  - test_sync_video_captions_and_transcripts_sets_references: asserts
    _resource.content is a single-item list.
  - test_sync_video_captions_and_transcripts_skips_if_resource_exists: non-empty
    content list prevents re-fetch.
  - test_sync_video_captions_and_transcripts_treats_empty_content_as_unset:
    empty-string content triggers fetch (guard is falsy-based).
@umar8hassan umar8hassan force-pushed the umar/10984-update-3play-sync-for-multi-language branch from 061f4c1 to c8fae1e Compare May 6, 2026 05:46
@umar8hassan umar8hassan marked this pull request as ready for review May 6, 2026 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants