refactor: make our internally-authored content editing more generic ( First Part)#3332
refactor: make our internally-authored content editing more generic ( First Part)#3332ahtesham-quraish wants to merge 6 commits into
Conversation
OpenAPI Changes20 changes: 0 error, 1 warning, 19 info Unexpected changes? Ensure your branch is up-to-date with |
5ed0add to
919e54a
Compare
6dd69f9 to
50be222
Compare
69b010d to
640580e
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces a new website_content backend app to generalize staff-authored rich-text content behind a content_type field (e.g., news vs article), while keeping /api/v1/articles/ available as a backward-compatible alias and updating the news ETL/plugin flow to only ingest/sync content_type=news.
Changes:
- Adds the
website_contentDjango app (model, serializers, permissions, viewset, CDN purge tasks, pluggy hooks, migrations, and management command) and exposes both/api/v1/website_content/and/api/v1/articles/routes. - Updates
news_eventstasks/plugins/ETL to source fromWebsiteContentand to sync onlynewscontent. - Updates OpenAPI v1 spec and the generated frontend API client/hooks to use the new website content types/endpoints.
Reviewed changes
Copilot reviewed 39 out of 44 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| website_content/views.py | Adds WebsiteContentViewSet (list/create/update/delete + detail by id/slug) and a media upload endpoint. |
| website_content/views_test.py | Adds API tests for create (including /articles alias), detail-by-id/slug, and content_type filtering. |
| website_content/validators.py | Adds HTML sanitization config and clean_html helper for website content. |
| website_content/validators_test.py | Adds tests for clean_html sanitization behavior. |
| website_content/urls.py | Registers /api/v1/website_content/ plus a backward-compatible /api/v1/articles/ router alias and upload route. |
| website_content/tasks.py | Adds Fastly purge tasks (single-URL purge, full purge, and list-page purge). |
| website_content/tasks_test.py | Adds tests for Fastly purge API behavior (success/no-token cases). |
| website_content/serializers.py | Adds serializers for WebsiteContent and image upload plus sanitized HTML field. |
| website_content/serializers_test.py | Adds tests for HTML sanitization field + image upload serializer. |
| website_content/permissions.py | Adds view/edit permission classes and editor-group helper. |
| website_content/models.py | Adds WebsiteContent model with content_type and URL generation; adds image upload model. |
| website_content/models_test.py | Adds model tests for URL generation, slug creation on publish, and default content type. |
| website_content/migrations/0004_alter_websitecontent_created_on.py | Adds DB index to created_on. |
| website_content/migrations/0003_add_editors_group.py | Ensures the legacy editor group exists to preserve permissions. |
| website_content/migrations/0002_migrate_from_articles.py | Data migration to copy rows from articles_* tables into website_content_*. |
| website_content/migrations/0001_initial.py | Initial schema for website content tables (including content_type). |
| website_content/migrations/init.py | Marks migrations package. |
| website_content/management/commands/sync_website_content_to_news.py | Adds a management command to run the news sync pipeline. |
| website_content/management/commands/init.py | Marks commands package. |
| website_content/management/init.py | Marks management package. |
| website_content/hooks.py | Adds pluggy hook spec + plugin manager loader for website content. |
| website_content/factories.py | Adds a WebsiteContentFactory for tests. |
| website_content/constants.py | Defines editor group name and content type constants/choices. |
| website_content/apps.py | Adds AppConfig with pluggy markers and task import on app ready. |
| website_content/api.py | Adds CDN purge-on-save and “published actions” plugin trigger entrypoints. |
| website_content/api_test.py | Adds tests for website-content published-actions hook behavior/logging. |
| website_content/admin.py | Placeholder admin module. |
| website_content/init.py | Marks package. |
| profiles/serializers.py | Switches profile “article editor” check to website content editor group helper. |
| openapi/specs/v1.yaml | Updates v1 OpenAPI to include /website_content and align /articles schemas/params with website content. |
| news_events/tasks.py | Renames tasks to website-content naming, keeps backward-compatible aliases, and updates sync task to use WebsiteContent. |
| news_events/tasks_test.py | Updates tests to use WebsiteContent and new task names. |
| news_events/plugins.py | Updates plugin to hook into website_content_published and to sync only news type content. |
| news_events/plugins_test.py | Updates plugin tests for new hook name and non-news skip behavior. |
| news_events/etl/articles_news.py | Updates ETL to read WebsiteContent and filter to content_type=news. |
| news_events/etl/articles_news_test.py | Updates ETL test patch target to WebsiteContent.objects.filter. |
| main/urls.py | Switches URL inclusion from articles.urls to website_content.urls. |
| main/settings.py | Adds website_content app to INSTALLED_APPS. |
| main/settings_pluggy.py | Renames pluggy setting from MITOL_ARTICLES_PLUGINS to MITOL_WEBSITE_CONTENT_PLUGINS. |
| main/settings_celery.py | Renames beat schedule entry and env var for the articles/news sync to website content naming. |
| frontends/api/src/hooks/articles/queries.ts | Updates article query helpers to call websiteContentApi endpoints. |
| frontends/api/src/hooks/articles/index.ts | Updates mutations/queries to use website content request/response types and endpoints. |
| frontends/api/src/generated/v1/api.ts | Regenerates v1 client: adds WebsiteContentApi, replaces rich-text article schemas with website content schemas, and renames a content-file enum to avoid collisions. |
| frontends/api/src/clients.ts | Exports a new websiteContentApi client instance. |
| def post(self, request): | ||
| serializer = WebsiteContentImageUploadSerializer( | ||
| data=request.data, context={"request": request} | ||
| ) | ||
| if serializer.is_valid(): | ||
| instance = serializer.save() | ||
| return Response( | ||
| {"url": instance.image_file.url}, status=status.HTTP_201_CREATED | ||
| ) |
| log.exception("Content purge request failed, enqueued for retry.") | ||
|
|
||
| fastly_purge_website_content_list.delay() | ||
| else: |
There was a problem hiding this comment.
@ChristopherChudzicki what do you think about this ?
There was a problem hiding this comment.
Copilot's comment makes sense to me. We should only purge news if publishing news
| def purge_content_on_save(content): | ||
| """ | ||
| Purge the content item from the CDN cache when it's saved. | ||
|
|
||
| This will trigger a CDN purge for: | ||
| - The specific content page (if published and has a slug) - attempted immediately | ||
| - The content list page - queued as Celery task | ||
|
|
||
| Args: | ||
| content: The WebsiteContent instance being saved | ||
| """ | ||
| if content.is_published and content.slug: | ||
| log.info( | ||
| "WebsiteContent %s (%s) saved, purging CDN...", | ||
| content.id, | ||
| content.slug, | ||
| ) | ||
|
|
||
| content_url = content.get_url() | ||
| try: | ||
| purge_resp = fastly_purge_relative_url( | ||
| content_url, timeout=PURGE_TIMEOUT_SECONDS | ||
| ) | ||
| if purge_resp.get("status") == "ok": | ||
| log.info("Content purge request processed OK.") | ||
| else: | ||
| fastly_purge_relative_url.delay(content_url) | ||
| log.error("Content purge request failed, enqueued for retry.") | ||
| except Exception: | ||
| fastly_purge_relative_url.delay(content_url) | ||
| log.exception("Content purge request failed, enqueued for retry.") | ||
|
|
||
| fastly_purge_website_content_list.delay() | ||
| else: |
| created_on = serializers.DateTimeField(read_only=True, required=False) | ||
| updated_on = serializers.DateTimeField(read_only=True, required=False) | ||
| publish_date = serializers.DateTimeField(read_only=True, required=False) | ||
| content = serializers.JSONField(default={}) | ||
| slug = serializers.SlugField(max_length=60, required=False, allow_blank=True) | ||
| title = serializers.CharField(max_length=255) |
There was a problem hiding this comment.
@ahtesham-quraish Copilot is correct about this.
| def is_website_content_editor(request): | ||
| """ | ||
| Return True if the request user belongs to the article_editors group or | ||
| is an admin. | ||
|
|
||
| Args: | ||
| request (HTTPRequest): django request object | ||
|
|
||
| Returns: | ||
| bool: True if user is a content editor or admin | ||
| """ | ||
| return ( | ||
| request.user is not None | ||
| and request.user.groups.filter(name=GROUP_STAFF_ARTICLE_EDITORS).first() | ||
| is not None | ||
| ) |
| from main.utils import call_fastly_purge_api | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def mock_fastly_response(): | ||
| """Create a mock successful Fastly response""" | ||
| response = MagicMock(spec=Response) |
| "learning_resources", | ||
| "learning_resources_search", | ||
| "website_content.apps.WebsiteContentConfig", | ||
| "openapi", | ||
| "articles", | ||
| "oauth2_provider", |
| "task": "news_events.tasks.get_website_content_news", | ||
| "schedule": get_int( | ||
| "NEWS_EVENTS_ARTICLES_NEWS_SCHEDULE_SECONDS", 60 * 60 * 1 | ||
| "NEWS_EVENTS_WEBSITE_CONTENT_NEWS_SCHEDULE_SECONDS", 60 * 60 * 1 |
| @@ -36,44 +41,44 @@ paths: | |||
| content: | |||
| application/json: | |||
| schema: | |||
| $ref: '#/components/schemas/PaginatedRichTextArticleList' | |||
| $ref: '#/components/schemas/PaginatedWebsiteContentList' | |||
| description: '' | |||
| post: | |||
| operationId: articles_create | |||
| description: Create a new article | |||
| description: Create a new content item | |||
| summary: Create | |||
| tags: | |||
| - articles | |||
| requestBody: | |||
| content: | |||
| application/json: | |||
| schema: | |||
| $ref: '#/components/schemas/RichTextArticleRequest' | |||
| $ref: '#/components/schemas/WebsiteContentRequest' | |||
| application/x-www-form-urlencoded: | |||
| schema: | |||
| $ref: '#/components/schemas/RichTextArticleRequest' | |||
| $ref: '#/components/schemas/WebsiteContentRequest' | |||
| multipart/form-data: | |||
| schema: | |||
| $ref: '#/components/schemas/RichTextArticleRequest' | |||
| $ref: '#/components/schemas/WebsiteContentRequest' | |||
| required: true | |||
| responses: | |||
| '201': | |||
| content: | |||
| application/json: | |||
| schema: | |||
| $ref: '#/components/schemas/RichTextArticle' | |||
| $ref: '#/components/schemas/WebsiteContent' | |||
| description: '' | |||
47c3d83 to
f601325
Compare
122204c to
6e84bb2
Compare
4be3204 to
ad5e34c
Compare
0395808 to
3f1359b
Compare
for more information, see https://pre-commit.ci
|
|
||
|
|
||
| # Backward-compatible alias so any queued Celery tasks using the old name still work. | ||
| get_articles_news = get_website_content_news |
There was a problem hiding this comment.
The comment says "so any queued Celery tasks using the old name still work," but these aliases don't preserve backward compatibility. They're just Python variable references to the same task object, which is registered under the new name only. The old Celery task names (news_events.tasks.get_articles_news, news_events.tasks.sync_article_to_news) are no longer in the registry. Any messages queued under the old names before deployment would be unroutable.
Change get_articles_news = get_website_content_news
to
@app.task(name="news_events.tasks.get_articles_news")
def get_articles_news():
pipelines.articles_news_etl()
clear_views_cache()
Change sync_article_to_news = sync_website_content_to_news
to
@app.task(
name="news_events.tasks.sync_article_to_news",
bind=True,
autoretry_for=(Exception,),
retry_kwargs={"max_retries": 3, "countdown": 5},
)
def sync_article_to_news(self, article_id: int):
sync_website_content_to_news.apply(args=[article_id], throw=True)
| content_type_param = self.request.query_params.get("content_type") | ||
| if content_type_param: | ||
| qs = qs.filter(content_type=content_type_param) |
There was a problem hiding this comment.
| content_type_param = self.request.query_params.get("content_type") | |
| if content_type_param: | |
| qs = qs.filter(content_type=content_type_param) | |
| from website_content.constants import CONTENT_TYPE_CHOICES | |
| valid = {k for k, _ in CONTENT_TYPE_CHOICES} | |
| if content_type_param and content_type_param not in valid: | |
| raise ValidationError({"content_type": f"Must be one of {sorted(valid)}"}) |
Any string is accepted. ?content_type=typo returns an empty 200 with no indication something is wrong. Given this is a new user-facing filter, callers will silently get zero results rather than a helpful 400.
| description="Numeric ID or slug of the content item", | ||
| ) | ||
| ], | ||
| responses={200: RichTextArticleSerializer, 404: OpenApiResponse()}, |
There was a problem hiding this comment.
The old code had responses={200: RichTextArticleSerializer, 404: OpenApiResponse()}. The new code dropped it entirely. This means the OpenAPI spec and generated TypeScript client no longer document that this endpoint can return 404.
Fix: Add it back:
@extend_schema(
...
responses={200: WebsiteContentSerializer, 404: OpenApiResponse()},
)
daniellefrappier18
left a comment
There was a problem hiding this comment.
I'm not sure I'm the best person to review this code but took a first pass with Claude.
ChristopherChudzicki
left a comment
There was a problem hiding this comment.
This is looking pretty good. I think there are a few renames that got missed, and I left some other comments.
I also think it would be worth updating the PR instructions or noting somewhere that we are NOT renaming the tables, just copying them to a new table. That's fine. the ARticles table on prod only has a dozen or so entries.
|
|
||
|
|
||
| def extract_single_article(article: Article) -> dict: | ||
| def extract_single_article(article: WebsiteContent) -> dict: |
There was a problem hiding this comment.
extract_single_website_content ?
|
|
||
|
|
||
| def sync_single_article_to_news(article: Article): | ||
| def sync_single_article_to_news(article: WebsiteContent): |
There was a problem hiding this comment.
sync_single_website_content_news_to_news ?
| if article.content_type != CONTENT_TYPE_NEWS: | ||
| return |
There was a problem hiding this comment.
Request: The filter here is important; we should add a test that checks it.
| @@ -7,11 +7,18 @@ | |||
|
|
|||
| from main.models import TimestampedModel | |||
| from profiles.utils import article_image_upload_uri | |||
There was a problem hiding this comment.
article_image_upload_uriprobably should be renamed.... I'd also change the upload prefix to website_content.- Probably also should move it to website_content.utils ... doesn't have anything to do with profiles, right?
| CONTENT_TYPE_CHOICES = [ | ||
| (CONTENT_TYPE_NEWS, "News"), | ||
| (CONTENT_TYPE_ARTICLE, "Article"), | ||
| ] |
There was a problem hiding this comment.
elswhere we tend to use ExtendedEnum. I'd suggest:
from named_enum import ExtendedEnum
class WebsiteContentType(ExtendedEnum):
"""
Enum for channel types.
"""
news = "News"
article = "Article"Then you can:
- replace
CONTENT_TYPE_CHOICESwithWebsiteContentType.as_tuple - replace CONTENT_TYPE_NEWS
withWebsiteContentType.news`
| author_name = serializers.CharField(required=False, allow_blank=True, default="") | ||
| user = UserSerializer(read_only=True) | ||
| content_type = serializers.ChoiceField( | ||
| choices=["news", "article"], |
There was a problem hiding this comment.
You should be able to remove choices and it will get picked up automatically from the model.
If it doesn't, we should use the constant here.
| created_on = serializers.DateTimeField(read_only=True, required=False) | ||
| updated_on = serializers.DateTimeField(read_only=True, required=False) | ||
| publish_date = serializers.DateTimeField(read_only=True, required=False) | ||
| content = serializers.JSONField(default={}) | ||
| slug = serializers.SlugField(max_length=60, required=False, allow_blank=True) | ||
| title = serializers.CharField(max_length=255) |
There was a problem hiding this comment.
@ahtesham-quraish Copilot is correct about this.
|
|
||
| @pytest.mark.django_db | ||
| class TestFastlyPurgeWebsiteContentList: | ||
| """Tests for fastly_purge_website_content_list task""" |
There was a problem hiding this comment.
this test class doesn't actually call fastly_purge_website_content_list . What's it testing exactly?
Seems like all it's testing is that main.call_fastly_purge_api makes a reasonable request? Not sure it's adding any value here.
| log.exception("Content purge request failed, enqueued for retry.") | ||
|
|
||
| fastly_purge_website_content_list.delay() | ||
| else: |
There was a problem hiding this comment.
Copilot's comment makes sense to me. We should only purge news if publishing news
|
|
||
|
|
||
| @pytest.mark.django_db | ||
| class TestPurgeArticleOnSave: |
There was a problem hiding this comment.
I don't think these tests got ported to the new app, did they? (Please let me know if I am wrong!)
I think we should port them, but not quite as is. Currently they kind of just assert "x called y".
My suggestion is a new test that asserts on call_fastly_purge_api (or request itself) to depend on less of the plumbing. I think that should work fine since our tests use CELERY_ALWAYS_EAGER=True.
And whenever I test something Doesnt happy, I try to tie it very explicitly to the thing actually happening, so I like to parametrize in that case
@pytest.mark.parametrize(
"slug, is_published, expect_purge",
[
("test-article", True, True),
("test-article", False, False), # I don't think this can happen now, but makes sense to test
# (i.e., i think we ONLY have slugs after publishing and can't unpublish)
("", True, False), # This also doesn't make much sense, but doesn't hurt, right?
("", False, False),
],
)
def test_purge_content_on_save(slug, is_published, expect_purge):
content = WebsiteContentFactory(is_published=False, slug=slug)
purge_content_on_save(article)
# Now Assert fastly purged if expected, otherwise assert it was not purgedFeel free to bring this back another way, but I do think we should bring it back somehow.
What are the relevant tickets?
https://github.com/mitodl/hq/issues/11232
Description (What does it do?)
Goal: Make backend naming
content_typeagnostic and introduce one new content_type. Preserve existing API endpoint at least during cutover.Success looks like:
articlesapp and directory towebsite_contentArticlemodel toWebsiteContentwith newcontent_typefield, restricted to"news"and"article". Should default to"news", at least for now.api/v1/website_content/andapi/v1/articles/endpoints behave the same, returning any content type with filter. These can be literally the same viewset for now, but we should leave articles in place for a smooth migration of the frontend. (Expand then contract API surface.)news_events/etlonly ingestscontent_type=news.Screenshots (if appropriate):
How can this be tested?
For testing you need to first take pull of this branch and then run the containers like web and celery and frontend, then you need to visit the following url
http://open.odl.local:8062/news/new, you first create the article and check if the request is being made tohttp://api.open.odl.local:8065/api/v1/website_content/url, basically, you need to test every corner of the articles CRUD and verify everything looks good.You should keep in mind that when article is created then celery task is run which sync the article to news, so It should run to and you will be able to see the all the articles using the following url
http://open.odl.local:8062/newsAdditional Context