perf: denormalize post_status into event_dates to eliminate posts table JOIN #161
Merged
…le JOIN

Add post_status column to datamachine_event_dates table with composite index (post_status, start_datetime). This allows date-filtered queries to skip the 130K-row posts table entirely.

- EventDatesTable: add post_status column to schema, upsert, and backfill
- meta-storage: sync post_status on transition_post_status hook
- DateFilter: add post_status filter and join_column params to SQL helpers
- Taxonomy_Helper: skip posts JOIN when date filter is active
- UpcomingCountAbilities: skip posts JOIN, use ed.post_status

Benchmarks on 37K events:
- Location term counts: 2.9s → 107ms (27x faster)
- Cross-filter counts: 3.7s → 174ms (21x faster)
Summary
- Add `post_status` column to the `datamachine_event_dates` table with a composite `(post_status, start_datetime)` index
- Filter on `ed.post_status = 'publish'` directly instead of JOINing the posts table
- Add a `transition_post_status` hook to keep the denormalized column in sync

Problem
The events site has 37K events in `event_dates` and 130K rows in the posts table. Every event count/aggregation query JOINed posts just to filter `post_type = 'data_machine_events' AND post_status = 'publish'`. The 128MB InnoDB buffer pool couldn't cache all those pages, so each query took 2-4s because of disk reads.

Changes
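A minimal sketch of the status sync listed among the changes, assuming a `post_id` column on the event dates table and a `wp_`-style table prefix. `sketch_status_sync()` is a hypothetical helper written for illustration; it is not the PR's `update_status()`, whose body is not shown here. The WordPress wiring only runs when `add_action()` exists, so the helper can be exercised on its own.

```php
<?php
/**
 * Build the UPDATE that copies a post's new status into the denormalized
 * column. Hypothetical helper; column and table names are assumptions.
 *
 * @return array|null SQL template plus args, or null when nothing changed.
 */
function sketch_status_sync(int $post_id, string $new_status, string $old_status, string $table): ?array
{
    // transition_post_status also fires when the status stays the same,
    // so skip the write in that case.
    if ($new_status === $old_status) {
        return null;
    }

    return [
        'sql'  => "UPDATE {$table} SET post_status = %s WHERE post_id = %d",
        'args' => [$new_status, $post_id],
    ];
}

// WordPress wiring; skipped when this file runs outside WordPress.
if (function_exists('add_action')) {
    add_action('transition_post_status', function ($new_status, $old_status, $post) {
        global $wpdb;

        // Only events have rows in the event dates table.
        if ($post->post_type !== 'data_machine_events') {
            return;
        }

        $update = sketch_status_sync(
            (int) $post->ID,
            $new_status,
            $old_status,
            $wpdb->prefix . 'datamachine_event_dates' // assumed table name
        );

        if ($update !== null) {
            $wpdb->query($wpdb->prepare($update['sql'], ...$update['args']));
        }
    }, 10, 3);
}
```

Keeping the SQL construction separate from the hook callback makes the sync logic testable without a WordPress bootstrap.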
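| File | Change |
| --- | --- |
| `EventDatesTable.php` | Add `post_status` column to schema, `upsert()`, `update_status()`, `backfill()` |
| `meta-storage.php` | Add `transition_post_status` hook to keep status in sync |
| `DateFilter.php` | Add `$include_status` and `$join_column` params to `upcoming_sql()`, `past_sql()`, `date_range_sql()` |
| `Taxonomy_Helper.php` | Skip posts JOIN when date filter is active; use `tr.object_id` for cross-filter JOINs |
| `UpcomingCountAbilities.php` | Skip posts JOIN; use `ed.post_status` directly |

Benchmarks (37K events, 257 location terms)

| Query | Before | After | Speedup |
| --- | --- | --- | --- |
| Location term counts | 2.9s | 107ms | 27x |
| Cross-filter counts | 3.7s | 174ms | 21x |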
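For orientation, the benchmarked count queries have roughly the following shape before and after the change. This is a simplified sketch, not the plugin's actual SQL: the `post_id` column, the `$prefix` argument, and both function names are assumptions, and the real queries also involve taxonomy tables.

```php
<?php
// Before: the posts table is JOINed only to check post type and status.
// Function name, column names, and prefix handling are hypothetical.
function sketch_count_query_before(string $prefix): string
{
    return "SELECT COUNT(DISTINCT ed.post_id)
            FROM {$prefix}datamachine_event_dates ed
            INNER JOIN {$prefix}posts p ON p.ID = ed.post_id
            WHERE p.post_type = 'data_machine_events'
              AND p.post_status = 'publish'
              AND ed.start_datetime >= %s";
}

// After: the status lives on the event dates row, so the posts table is
// not touched. An equality on post_status followed by a range on
// start_datetime matches the composite (post_status, start_datetime) index.
function sketch_count_query_after(string $prefix): string
{
    return "SELECT COUNT(DISTINCT ed.post_id)
            FROM {$prefix}datamachine_event_dates ed
            WHERE ed.post_status = 'publish'
              AND ed.start_datetime >= %s";
}
```

The `post_type` condition is dropped in the second query on the assumption that only events have rows in the event dates table.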
Migration
Schema change applied live. Backfill (`UPDATE ed SET post_status = p.post_status`) affected 805 rows (non-published events that were defaulting to 'publish').

Backward Compatibility
- `DateFilter::upcoming_sql()` etc. default to `include_status=true, join_column='p.ID'`, so existing callers that still JOIN posts are unaffected
- `EventDatesTable::upsert()` auto-detects `post_status` from the post if not provided, so existing callers don't need changes
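One plausible reading of the `DateFilter` defaults, sketched as a standalone function. `sketch_upcoming_sql()` is a hypothetical stand-in: the real `upcoming_sql()` signature and return shape are not shown in this description, and the `post_id` column and the `join`/`where` array keys are assumptions.

```php
<?php
/**
 * Hypothetical stand-in for DateFilter::upcoming_sql().
 * Returns SQL fragments; %s is left for the caller to bind the current time.
 */
function sketch_upcoming_sql(
    string $table,
    bool $include_status = true,
    string $join_column = 'p.ID'
): array {
    // Join event dates against whichever column carries the post ID:
    // p.ID for callers that still JOIN posts, tr.object_id for taxonomy
    // queries that skip the posts table.
    $join = "INNER JOIN {$table} ed ON ed.post_id = {$join_column}";

    // Upcoming: the event starts at or after the bound timestamp.
    $where = 'ed.start_datetime >= %s';

    // Filter on the denormalized status column unless the caller opts out.
    if ($include_status) {
        $where = "ed.post_status = 'publish' AND " . $where;
    }

    return ['join' => $join, 'where' => $where];
}

// Existing caller: no new arguments, still joins on p.ID.
$legacy = sketch_upcoming_sql('wp_datamachine_event_dates');

// Taxonomy cross-filter: joins on term_relationships instead of posts.
$taxonomy = sketch_upcoming_sql('wp_datamachine_event_dates', true, 'tr.object_id');
```

Because both new parameters are optional and trail the existing ones, call sites written before this change keep compiling and keep joining on `p.ID`.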