Conversation
* Update reset_sync_states management command
* Improve efficiency and messages to user
* feat: Add S3 content sync pipeline mitodl/hq#8703
* feat: added tracking with referencing_content for course home image, video gallery items, resource list items, and other relation-backed content
* feat: added referenced_by for stories and featured items on home page
* chore: addressed feedback from Copilot and Gemini
* refactor: parameterized tests and removed redundant tests
* refactor: parameterized tests
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Summary of Changes

Hello @odlbot, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request, marking release 0.178.3, delivers substantial improvements to content management and synchronization. It introduces a robust S3 bucket synchronization pipeline for automated content updates and refines existing content sync mechanisms to offer more targeted control. A major focus is also on enhancing content relationship tracking by recognizing new content types and their associated metadata fields, ensuring a more interconnected content ecosystem. Additionally, it includes a crucial security update for a core dependency and a minor infrastructure change for a polyfill resource.

Highlights
Changelog
Activity
```python
filtered_websites = []
if self.filter_list:
    filtered_websites = [
        fetch_website(site_identifier) for site_identifier in self.filter_list
    ]
```
Bug: The list comprehension calling fetch_website does not handle the Website.DoesNotExist exception, causing the command to crash if an invalid website identifier is provided via a filter.
Severity: HIGH
Suggested Fix
Wrap the call to fetch_website(site_identifier) inside a try...except Website.DoesNotExist block. Log a warning or skip the invalid identifier instead of letting the command crash. Alternatively, consider using a queryset-based filter approach which gracefully handles non-existent identifiers.
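A minimal sketch of the suggested fix, assuming the command's existing imports of `fetch_website` and the `Website` model; the warning wording is illustrative:

```python
filtered_websites = []
if self.filter_list:
    for site_identifier in self.filter_list:
        try:
            filtered_websites.append(fetch_website(site_identifier))
        except Website.DoesNotExist:
            # Warn and skip the invalid identifier instead of letting
            # the whole management command crash.
            self.stderr.write(
                f"Skipping unknown website identifier: {site_identifier}"
            )
```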
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: content_sync/management/commands/reset_sync_states.py#L68-L72

Potential issue: The management command `reset_sync_states` processes a list of website identifiers from user-provided command-line arguments (`--filter` or `--filter-json`). The code uses a list comprehension to call `fetch_website` for each identifier. However, `fetch_website` will raise a `Website.DoesNotExist` exception if an identifier is not found, and this exception is not handled. This will cause the entire management command to crash. The same unhandled exception pattern also exists in the `sync_website_to_backend.py` and `sync_backend_to_db.py` commands when they process filters.
Code Review
The pull request introduces an S3 bucket synchronization pipeline and updates the reset_sync_states management command to handle filtered websites directly. It also includes changes to track referencing content for new content types and updates the pypdf dependency for security. The changes enhance the content synchronization capabilities and improve the accuracy of content referencing within the system. Several new tests have been added following the preferred pytest style, and a comment regarding Celery task management has been updated to align with best practices for robust and scalable asynchronous task handling.
```diff
 [[package]]
 name = "pypdf"
-version = "6.6.2"
+version = "6.7.1"
```
```python
@is_publish_pipeline_enabled
def get_s3_bucket_sync_pipeline(api: object | None = None) -> BasePipeline:
    """Get the S3 bucket sync pipeline"""
    return import_string(
        f"content_sync.pipelines.{settings.CONTENT_SYNC_PIPELINE_BACKEND}.S3BucketSyncPipeline"
    )(api=api)
```
The get_s3_bucket_sync_pipeline function is decorated with @is_publish_pipeline_enabled. While this is generally good for controlling pipeline access, ensure that the S3 bucket sync pipeline is indeed considered a 'publish' pipeline and that its functionality aligns with the intent of this decorator. If it's a background sync task that doesn't directly involve publishing user-facing content, a more specific decorator or a different mechanism might be clearer.
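As one illustration of that suggestion, a sync-specific gate could look like the sketch below; the decorator name and the setting it checks are hypothetical and not part of this codebase:

```python
from functools import wraps

from django.conf import settings


def is_sync_pipeline_enabled(func):
    """Hypothetical alternative gate keyed to a sync-specific flag
    rather than the publish-pipeline flag."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        # CONTENT_SYNC_PIPELINE_ENABLED is an assumed setting name.
        if not getattr(settings, "CONTENT_SYNC_PIPELINE_ENABLED", False):
            return None
        return func(*args, **kwargs)

    return wrapper
```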
```diff
+filtered_websites = []
+if self.filter_list:
+    filtered_websites = [
+        fetch_website(site_identifier) for site_identifier in self.filter_list
+    ]
 start = now_in_utc()
-task = sync_unsynced_websites.delay(create_backends=create_backends)
-self.stdout.write(f"Starting task {task}...")
-task.get()
+if filtered_websites:
+    self.stdout.write(
+        f"Syncing {len(filtered_websites)} filtered "
+        "website(s) to the designated backend"
+    )
+    for website in filtered_websites:
+        backend = get_sync_backend(website)
+        if create_backends or backend.backend_exists():
+            self.stdout.write(
+                f"Syncing website '{website.title}' to backend..."
+            )
+            backend.create_website_in_backend()
+            backend.sync_all_content_to_backend()
+            reset_publishing_fields(website.name)
+        else:
+            self.stderr.write(
+                f"Skipping website '{website.title}': "
+                "backend does not exist "
+                "(use --create_backends to create it)"
+            )
+else:
+    self.stdout.write(
+        "Syncing all unsynced websites to the designated backend"
+    )
+    task = sync_unsynced_websites.delay(create_backends=create_backends)
+    self.stdout.write(f"Starting task {task}...")
+    task.get()
```
The logic for syncing filtered websites directly within the handle method duplicates functionality already implemented in the sync_unsynced_websites task. While it provides immediate feedback for filtered sites, it bypasses the Celery task queue for these specific cases, which could lead to inconsistencies in error handling, retries, and task monitoring. It might be better to refactor sync_unsynced_websites to accept a list of website names or IDs, allowing all sync operations to go through Celery for consistency and scalability, as sketched below.
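A rough sketch of that refactor, under stated assumptions: the `website_names` parameter is hypothetical, and the Celery app and helper import paths are guesses at this codebase's layout:

```python
from main.celery import app  # assumed Celery app module

from content_sync.api import get_sync_backend  # assumed import paths
from websites.api import reset_publishing_fields
from websites.models import Website


@app.task
def sync_unsynced_websites(create_backends=False, website_names=None):
    """Sync websites to the backend, optionally restricted by name.

    website_names is a hypothetical parameter: when provided, only the
    matching websites are synced, so filtered runs share the queue's
    error handling, retries, and monitoring with full runs.
    """
    websites = Website.objects.all()
    if website_names:
        websites = websites.filter(name__in=website_names)
    for website in websites:
        backend = get_sync_backend(website)
        if create_backends or backend.backend_exists():
            backend.create_website_in_backend()
            backend.sync_all_content_to_backend()
            reset_publishing_fields(website.name)
```

The management command could then call `sync_unsynced_websites.delay(create_backends=create_backends, website_names=self.filter_list)` for both the filtered and unfiltered paths.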
References

- Use Celery's `chord` and `chain` to manage dependencies between asynchronous tasks; they are a more robust and scalable pattern than executing tasks synchronously.
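For context, a generic illustration of those two primitives; the task names are invented for the example and are not part of this codebase:

```python
from celery import chain, chord

# chain: run tasks sequentially, feeding each result into the next task.
chain(export_site.s("site-1"), upload_to_s3.s(), mark_synced.s()).apply_async()

# chord: run a group of tasks in parallel, then invoke a callback once
# every task in the group has finished.
chord(
    (sync_site.s(name) for name in ["site-1", "site-2", "site-3"]),
    notify_sync_complete.s(),
).apply_async()
```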