Change how the total ordering of Orders is determined#196
Change how the total ordering of Orders is determined#196
Conversation
|
Since this one is a larger and more architectural change, I'd like to have someone's eyes on this. @lukel97 are you around? Basically, the gist of this change is that instead of relying on a "linked list" structure to store orders and order them relative to each other, we add a column that contains a number representing each order's absolute position in the total-order. The benefit is that sorting in a bunch of places becomes easier, and we can constrain the hack around reading In the future, I would suggest that we modify how orders are submitted so that all submissions include an increasing number representing the ordinal being added here: that would remove any hardcoded notion of This patch also unblocks work I'm doing on the side to rewrite a complete API (v5) to provide an OpenAPI specification and be a lot friendlier to AI agents, which I believe can be leveraged to be a game changer in regression detection. However, that is only experimental at the moment and will be part of a RFC since it's a major change. |
There was a problem hiding this comment.
Pull request overview
This PR changes how test suite Order records are totally ordered by introducing an explicit integer ordinal column (instead of a linked-list via PreviousOrder/NextOrder), and updates server queries/templates/tests to use ordinal ordering for DB-side sorting and navigation.
Changes:
- Add DB migration
upgrade_18_to_19to backfillordinaland remove linked-list columns. - Update
Ordermodel comparison/insertion logic and multiple query sites to order byordinal. - Add/adjust server DB tests and UI templates for ordinal-based prev/next and listing.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/server/db/OrdinalAssignment.shtest | New lit test covering ordinal assignment and insertion edge cases. |
| tests/server/db/ImportV4TestSuiteInstance.shtest | Update assertions/queries to validate ordinal ordering instead of linked-list pointers. |
| tests/server/db/CreateV4TestSuiteInstance.py | Ensure test instance creation assigns ordinals and updates related assertions. |
| lnt/server/ui/views.py | Switch order navigation/listing and several data query paths to ordinal-based ordering. |
| lnt/server/ui/templates/v4_order.html | Use previous_order/next_order objects instead of linked-list IDs. |
| lnt/server/ui/templates/v4_all_orders.html | Display ordinal instead of previous/next IDs. |
| lnt/server/ui/api.py | Update graph API endpoint ordering to use Order.ordinal. |
| lnt/server/db/testsuitedb.py | Introduce ordinal column, total ordering, and ordinal-based insertion/closest-run logic. |
| lnt/server/db/regression.py | Order regression queries by Order.ordinal. |
| lnt/server/db/migrations/upgrade_18_to_19.py | Migration to add/backfill ordinal, drop linked-list columns, and add an ordinal index. |
| q = session.query(req.field.column, ts.Order.llvm_project_revision, | ||
| ts.Order.id) \ | ||
| .select_from(ts.Sample) \ | ||
| .join(ts.Run) \ | ||
| .join(ts.Order) \ | ||
| .filter(ts.Run.machine_id == req.machine.id) \ | ||
| .filter(ts.Sample.test == req.test) \ | ||
| .filter(req.field.column.isnot(None)) \ | ||
| .order_by(ts.Order.llvm_project_revision.desc()) | ||
| .order_by(ts.Order.ordinal.desc()) | ||
|
|
There was a problem hiding this comment.
PR description says all query sites in views.py switched to ORDER BY ordinal instead of Python sorting. In v4_matrix, the query is now ordered by Order.ordinal, but the view ultimately collects revisions into a set and later does all_orders.sort(reverse=True) on the revision strings, so the rendered order is still lexicographic/non-ordinal (and order_by here has no effect). Consider keeping Orders (or (revision, ordinal) pairs) and sorting by ordinal for display, to match the PR’s stated behavior.
There was a problem hiding this comment.
This makes sense to me, I don't think the doubly linked structure scales well
My main qualm: Instead of storing an integer for the ordinal can we store some sort of type compatible with the tuple that convert_revision produces, so we can just insert it directly and order directly on it. That way we don't have to update other rows when we insert a new one.
Maybe something like using an array of TEXT, with a GIN index. Is this something claude can explore
| # | ||
| # Even though we already know the right order, this is faster than | ||
| # issueing separate queries. | ||
| runs.sort(key=lambda r: r.order, reverse=(direction == -1)) |
There was a problem hiding this comment.
Is this still needed now that the results are sorted?
There was a problem hiding this comment.
It is, since runs is:
runs = session.query(self.Run).\
filter(self.Run.machine == run.machine).\
filter(self.Run.order_id.in_(ids_to_fetch)).all()
which is not sorted. We could order runs by ordinal in the db query, but that requires joining back to Order, and the comment right above seems to say that it's faster to sort in Python. Also, the number of runs we're sorting is small since it's basically N where N is the number of adjacent runs we're extracting.
lnt/server/db/testsuitedb.py
Outdated
| existing_orders[i].ordinal = i + 1 | ||
| order.ordinal = insert_at |
There was a problem hiding this comment.
Where does this transaction get committed? Do you have any debug output of what SQL it generates when a new order is created.
Two things I can think of we need to check:
- This is indeed one single transaction, not one UPDATE per row
- The SQL generated is something like
UPDATE orders SET ordinal = ordinal + 1, notUPDATE orders SET ordinal = $1 where id = $2where$1and$2are pairs of ids and new ordinals plumbed through
There was a problem hiding this comment.
The transaction gets committed by whoever calls _getOrCreateOrder (via _getOrCreateRun).
You're right about the UPDATE here though, this generates a bunch of updates instead of a single one. Switching to:
# Shift all ordinals at or after the insertion point up by 1.
session.query(self.Order) \
.filter(self.Order.id != order.id) \
.filter(self.Order.ordinal >= insert_at) \
.update({self.Order.ordinal: self.Order.ordinal + 1},
synchronize_session=False)Prior to this patch, orders would be stored in the database as a doubly linked list, each order knowing its previous and its next order. The position of each order in that list was determined by the convert_revision() function being run on the field of the schema marked as an order. This is awkward because this organization made it extremely difficult to e.g. sort orders in "chronological" order. Doing so required querying all the orders and then performing a sort. This is especially painful in an upcoming query API I'd like to add where data can be extracted from the server using complex query fields, including orders. For the implementation to be reasonable, we need an ordinal column representing the position of the order within the order-space. This also removes a few places where llvm_project_revision was hardcoded. Some places still remain, but this is chipping away at things. Key changes: - Migration (upgrade_18_to_19): adds ordinal column, populates it from convert_revision()-sorted orders, drops PreviousOrder/NextOrder columns - Order model: ordinal-based comparison via @total_ordering, field-based comparison only during insertion, flush() instead of commit() for transactional safety, WITH FOR UPDATE for concurrency - All query sites (regression.py, views.py, api.py) use ORDER BY ordinal instead of Python sorted() or string ORDER BY - Templates use ordinal-based prev/next navigation - New OrdinalAssignment.shtest covers insertion edge cases: first order, append, prepend with shift, middle insertion, and duplicate detection Assisted-by: Claude
would be possible to support that, but it's a lot more complicated).
| # | ||
| # Even though we already know the right order, this is faster than | ||
| # issueing separate queries. | ||
| runs.sort(key=lambda r: r.order, reverse=(direction == -1)) |
There was a problem hiding this comment.
It is, since runs is:
runs = session.query(self.Run).\
filter(self.Run.machine == run.machine).\
filter(self.Run.order_id.in_(ids_to_fetch)).all()
which is not sorted. We could order runs by ordinal in the db query, but that requires joining back to Order, and the comment right above seems to say that it's faster to sort in Python. Also, the number of runs we're sorting is small since it's basically N where N is the number of adjacent runs we're extracting.
lnt/server/db/testsuitedb.py
Outdated
| existing_orders[i].ordinal = i + 1 | ||
| order.ordinal = insert_at |
There was a problem hiding this comment.
The transaction gets committed by whoever calls _getOrCreateOrder (via _getOrCreateRun).
You're right about the UPDATE here though, this generates a bunch of updates instead of a single one. Switching to:
# Shift all ordinals at or after the insertion point up by 1.
session.query(self.Order) \
.filter(self.Order.id != order.id) \
.filter(self.Order.ordinal >= insert_at) \
.update({self.Order.ordinal: self.Order.ordinal + 1},
synchronize_session=False)c61e3da to
71f3ee2
Compare
This adds complexity but allows supporting SQLite for local deployments. Assisted-by: Claude
|
@lukel97 Let me know if you have additional thoughts about this patch. This is one of the stepping stones for the upcoming v5 API (and associated UI redesign, and more!) |
Do you have any designs of the UI redesign? Interested to hear more |
Yeah, I have a working site and an new OpenAPI API that can be used by agents to investigate regressions, etc. It's still a work in progress but I'm very excited about the prospects. We should probably chat for a demo. |
It's fine to have unreferenced columns and it greatly simplifies the code.
…ciated This makes it a lot more robust and simple: Orders are created with an ordinal and that's how they are compared everywhere, period. There is no complicated fallback in case the ordinal field is None.
ldionne
left a comment
There was a problem hiding this comment.
@lukel97 I did another pass on this and here's the gist of the changes. I think it's a lot better than what I initially had, thanks for questioning some of the changes I was making:
- In the migration, we now handle Postgres and SQLite the same, except we don't drop the old columns for SQLite. That shouldn't matter, it's functionally equivalent and we don't care about db size for short lived local instances
- Orders are now created with an ordinal, always. That's the only way to compare Order objects. This simplifies a lot of the code and it simplifies the mental model even more.
- On insertion, we do some work to figure out the right ordinal for the newly-inserted order, and we then create the order with the right ordinal from the get go.
I think this is a lot better and clearer than how it started, let me know what you think.
Prior to this patch, orders would be stored in the database as a doubly linked list, each order knowing its previous and its next order. The position of each order in that list was determined by the convert_revision() function being run on the field of the schema marked as an order.
This is awkward because this organization made it extremely difficult to e.g. sort orders in "chronological" order. Doing so required querying all the orders and then performing a sort. This is especially painful in an upcoming query API I'd like to add where data can be extracted from the server using complex query fields, including orders. For the implementation to be reasonable, we need an ordinal column representing the position of the order within the order-space.
This also removes a few places where llvm_project_revision was hardcoded. Some places still remain, but this is chipping away at things.
Key changes:
Assisted-by: Claude