Skip to content

Release 0.184.2#3000

Merged
odlbot merged 2 commits intoreleasefrom
release-candidate
Apr 24, 2026
Merged

Release 0.184.2#3000
odlbot merged 2 commits intoreleasefrom
release-candidate

Conversation

@odlbot
Copy link
Copy Markdown
Contributor

@odlbot odlbot commented Apr 21, 2026

pt2302

pt2302 and others added 2 commits April 21, 2026 09:29
* Remove 'Buy at Amazon' links

* Set dirty flags and use bulk delete
Comment on lines +38 to +45
resources_to_delete_ids = []
for content in WebsiteContent.objects.filter(
title__icontains=AMAZON_BUTTON_IMAGE,
type=CONTENT_TYPE_EXTERNAL_RESOURCE,
):
if AMAZON_BUTTON_TITLE_PATTERN.match(content.title):
resources_to_delete_ids.append(content.pk)
website_ids.add(content.website_id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The migration may delete ExternalResource records while leaving behind broken resource_link shortcodes that reference them, as it doesn't account for all possible shortcode text variations.
Severity: MEDIUM

Suggested Fix

Before deleting an ExternalResource, query its referencing_content relationship to find all markdown content that links to it. Ensure all references are removed, not just those matching a specific shortcode pattern. This validates referential integrity and prevents creating broken links.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: websites/migrations/0071_remove_buy_at_amazon_links.py#L38-L45

Potential issue: The migration deletes `ExternalResource` records based on a title match
for an Amazon image markdown. However, it does not verify that all `resource_link`
shortcodes pointing to that resource have been removed from markdown content. If a
resource is referenced using a shortcode with plain text instead of the image markdown
(e.g., `{{% resource_link "uuid" "Buy at Amazon" %}}`), the resource will be deleted,
but the shortcode will remain, resulting in a broken link. The migration fails to use
the `referencing_content` relationship to ensure referential integrity before deletion.

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +30 to +34
for content in WebsiteContent.objects.filter(markdown__icontains=AMAZON_BUTTON_IMAGE):
updated = AMAZON_BUTTON_SHORTCODE_PATTERN.sub("", content.markdown)
updated = AMAZON_BUTTON_IMAGE_PATTERN.sub("", updated)
if updated != content.markdown:
content.markdown = updated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The regex for removing Amazon button shortcodes fails to match variants with fragment arguments, leading to partial removal and creating broken shortcodes.
Severity: LOW

Suggested Fix

Update the AMAZON_BUTTON_SHORTCODE_PATTERN regex to optionally match a third positional argument for a fragment (e.g., "#[^\"]+") before the closing %\}\}. This will ensure the entire shortcode is removed correctly, even when a fragment is present.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: websites/migrations/0071_remove_buy_at_amazon_links.py#L30-L34

Potential issue: The migration uses two regex patterns sequentially. The first,
`AMAZON_BUTTON_SHORTCODE_PATTERN`, is intended to remove the entire Amazon button
shortcode. However, this pattern fails to match `resource_link` shortcodes that contain
an optional third fragment argument (e.g., `"#some-fragment"`). In such cases, the
second pattern, `AMAZON_BUTTON_IMAGE_PATTERN`, proceeds to remove only the image
markdown part of the shortcode's text argument. This leaves behind a syntactically
invalid shortcode with an empty text argument, such as `{{% resource_link "uuid" ""
"#fragment" %}}`, which could corrupt content.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the version to 0.184.2 and introduces a data migration to remove 'Buy at Amazon' links from website content. The migration identifies and removes specific shortcodes and images from markdown fields and deletes the corresponding external resource records. The review feedback provides several optimizations for handling large datasets during the migration, including the use of .iterator(), .values_list(), and batch_size for bulk updates. Additionally, it is recommended to use .hard_delete() to align with the migration's stated intent of permanently removing the resources.


markdown_to_update = []
website_ids = set()
for content in WebsiteContent.objects.filter(markdown__icontains=AMAZON_BUTTON_IMAGE):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using .iterator() here is recommended to avoid loading all matching WebsiteContent instances into memory at once. This is particularly important for global migrations on large tables like WebsiteContent to prevent potential memory exhaustion.

Suggested change
for content in WebsiteContent.objects.filter(markdown__icontains=AMAZON_BUTTON_IMAGE):
for content in WebsiteContent.objects.filter(markdown__icontains=AMAZON_BUTTON_IMAGE).iterator():
References
  1. Using .iterator() on a Django queryset is unnecessary for memory optimization if the number of records being processed is known to be small.

Comment on lines +39 to +45
for content in WebsiteContent.objects.filter(
title__icontains=AMAZON_BUTTON_IMAGE,
type=CONTENT_TYPE_EXTERNAL_RESOURCE,
):
if AMAZON_BUTTON_TITLE_PATTERN.match(content.title):
resources_to_delete_ids.append(content.pk)
website_ids.add(content.website_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This loop can be optimized by using .values_list('pk', 'title', 'website_id'). This avoids instantiating full model objects for every record that matches the icontains filter, significantly reducing memory overhead and improving performance.

Suggested change
for content in WebsiteContent.objects.filter(
title__icontains=AMAZON_BUTTON_IMAGE,
type=CONTENT_TYPE_EXTERNAL_RESOURCE,
):
if AMAZON_BUTTON_TITLE_PATTERN.match(content.title):
resources_to_delete_ids.append(content.pk)
website_ids.add(content.website_id)
resources_to_delete = WebsiteContent.objects.filter(
title__icontains=AMAZON_BUTTON_IMAGE,
type=CONTENT_TYPE_EXTERNAL_RESOURCE,
).values_list("pk", "title", "website_id")
for pk, title, website_id in resources_to_delete:
if AMAZON_BUTTON_TITLE_PATTERN.match(title):
resources_to_delete_ids.append(pk)
website_ids.add(website_id)


with transaction.atomic():
if markdown_to_update:
WebsiteContent.objects.bulk_update(markdown_to_update, ["markdown"])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If markdown_to_update contains a large number of records, bulk_update might hit database limits or consume excessive memory. Specifying a batch_size (e.g., 100 or 1000) is a safer practice for migrations involving large text fields.

Suggested change
WebsiteContent.objects.bulk_update(markdown_to_update, ["markdown"])
WebsiteContent.objects.bulk_update(markdown_to_update, ["markdown"], batch_size=100)

WebsiteContent.objects.bulk_update(markdown_to_update, ["markdown"])

if resources_to_delete_ids:
WebsiteContent.objects.filter(pk__in=resources_to_delete_ids).delete()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The docstring mentions a 'hard-delete', but since WebsiteContent is a SafeDeleteModel, the standard .delete() method on a queryset will perform a soft-delete. If the intention is to permanently remove these records (and their references), consider using .hard_delete() instead.

Suggested change
WebsiteContent.objects.filter(pk__in=resources_to_delete_ids).delete()
WebsiteContent.objects.filter(pk__in=resources_to_delete_ids).hard_delete()

@odlbot odlbot merged commit 482bce3 into release Apr 24, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants