Release 0.180.0#2932
Conversation
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Added a GIT_REF argument to create the hash.txt file. * Add quotes around
* Update test site contentwith PK12 values * Remove dangling primary key reference
* feat: changed course-lists referencing to url_path * chore: fixed circular import error * chore: addressed feedback and reduced complexity to meet pre-commit checks
* Upgrade to Django 5.2.12 * Update settings test * Update Header to match Django 5 CSRF requirement * Always include CSRF cookie
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on upgrading dependencies, enhancing security, adding new features, and improving documentation. The Django upgrade and Granian integration represent significant improvements to the project's foundation, while the documentation update provides valuable guidance for managing course URLs. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request is a release that includes several dependency upgrades, most notably an upgrade to Django 5. It also introduces a new feature for tracking content references and adds documentation for changing a published course site URL.
My review focuses on the new content reference tracking implementation. I've identified a couple of performance issues related to N+1 queries when processing course lists, both in a new management command and in a data migration. These could lead to slow performance, especially for the migration, and I've left comments with suggestions for optimization.
| for content in WebsiteContent.objects.iterator(chunk_size=1000): | ||
| if references := compile_referencing_content(content): | ||
| referenced_objects = WebsiteContent.objects.filter(text_id__in=references) | ||
| if referenced_content_ids := resolve_referenced_content_ids(content): | ||
| referenced_objects = WebsiteContent.objects.filter( | ||
| id__in=referenced_content_ids | ||
| ) | ||
| content.referenced_by.set(referenced_objects) |
There was a problem hiding this comment.
This data migration iterates over all WebsiteContent objects and calls resolve_referenced_content_ids for each one. This function has a performance issue: for course-list content, it executes a separate database query for each course URL to resolve it to a sitemetadata object. This will result in a large number of queries (N+1 problem) and could make the migration very slow or even cause it to time out, especially in a production environment with many course lists.
To improve performance, this migration should process content in batches and batch the database queries, similar to the logic in the backpopulate_referencing_content management command. The logic for resolving course list URLs should be optimized to use a single bulk query for all url_paths in a batch.
| def _merge_course_list_ids( | ||
| self, content_batch, course_list_ids, content_references | ||
| ): | ||
| """Merge per-item course-list url_path resolutions into content_references.""" | ||
| if not course_list_ids: | ||
| return | ||
| course_list_map = {c.id: c for c in content_batch if c.id in course_list_ids} | ||
| for content_id, content in course_list_map.items(): | ||
| extra_ids = resolve_referenced_content_ids(content) | ||
| if extra_ids: | ||
| content_references[content_id] = ( | ||
| content_references.get(content_id, set()) | extra_ids | ||
| ) | ||
|
|
There was a problem hiding this comment.
This function introduces a performance issue by calling resolve_referenced_content_ids in a loop. The underlying implementation of resolve_referenced_content_ids for course lists performs database queries for each course URL, leading to an N+1 query problem.
To optimize this, you can aggregate all course url_paths from the content_batch, perform a single bulk query to fetch the corresponding WebsiteContent IDs for the sitemetadata, and then use this mapping to update the references. This will significantly reduce the number of database queries.
def _merge_course_list_ids(
self, content_batch, course_list_ids, content_references
):
"""Merge per-item course-list url_path resolutions into content_references."""
if not course_list_ids:
return
course_list_map = {c.id: c for c in content_batch if c.id in course_list_ids}
all_url_paths = {
course_entry.get("id").strip().strip("/")
for content in course_list_map.values()
for course_entry in content.metadata.get(
constants.METADATA_FIELD_COURSE_LIST_COURSES, []
)
if isinstance(course_entry, dict)
and isinstance(course_entry.get("id"), str)
}
if not all_url_paths:
return
sitemetadata_map = {
wc.website.url_path: wc.id
for wc in WebsiteContent.objects.filter(
website__url_path__in=all_url_paths,
type=constants.CONTENT_TYPE_METADATA,
).select_related("website")
}
for content_id, content in course_list_map.items():
extra_ids = {
sitemetadata_map[course_entry.get("id").strip().strip("/")]
for course_entry in content.metadata.get(
constants.METADATA_FIELD_COURSE_LIST_COURSES, []
)
if isinstance(course_entry, dict)
and isinstance(course_entry.get("id"), str)
and course_entry.get("id").strip().strip("/") in sitemetadata_map
}
if extra_ids:
content_references[content_id] = (
content_references.get(content_id, set()) | extra_ids
)
pt2302
Umar Hassan
zawan-ila
Tobias Macey
Mike Davidson
renovate[bot]