Welcome coln-query 👋#28
Conversation
This is a so far unintegrated copy of my thesis' code.
| [workspace.dependencies] | ||
| anyhow = "1.0.102" | ||
| thiserror = "2.0.18" | ||
| criterion = "0.8.2" |
There was a problem hiding this comment.
perhaps move serde here as well? Given both store and query needs to use serde for JSON IR?
There was a problem hiding this comment.
Yes, this was just the minimum required for my stuff. We could add:
- serde
- logging/tracing
| //! | ||
| //! 1. An interface for passing deltas of row-oriented data. There is | ||
| //! [TableDelta], [StoreDelta], and [DerivedDataDelta]. | ||
| //! 2. An interface for [Transactions](Tx). A transactions can be in exactly one |
There was a problem hiding this comment.
This transaction is a query engine concept right? The storage engine has a transaction concept as well https://github.com/coln-project/Coln/blob/main/packages/coln-store/docs/primitives.md
Just want to check they are separate
There was a problem hiding this comment.
Do you expect this transaction interface to be called by coln-store, like when the store is about to commit some data, then pass the data to coln-query using this transaction interface?
There was a problem hiding this comment.
Yes, exactly. I think we need a Tx like interface because whenever we feed in new base facts (EDPs) and try to commit, constraints could be violated and in such case state needs to be rolled back. In batch query engines, this only affects the storage layer but with incremental view maintenance state has to be undone in the query engine, too.
There was a problem hiding this comment.
This transaction is a query engine concept right? The storage engine has a transaction concept as well https://github.com/coln-project/Coln/blob/main/packages/coln-store/docs/primitives.md Just want to check they are separate
Maybe we can even reuse this interface for the storage engine, too. So far it is not specific to the query engine and anyone implementing the TxStore trait can work with it.
|
|
||
| pub struct Table { | ||
| name: String, | ||
| // TODO: How does the schema interface look like? Maybe it doesn't need to |
There was a problem hiding this comment.
We have a schema definition in the IR https://github.com/coln-project/Coln/blob/main/packages/coln-lang-rs/src/ir/mod.rs#L79
And I agree we should not pass it every time, but just once when we load a theory.
There was a problem hiding this comment.
you can also check out how coln-store represents a table https://github.com/coln-project/Coln/blob/coln-store/ffi/packages/coln-store/src/table.rs#L151
It's pretty similar to what you did, and I think for the interfacing purposes it's probably enough to just pass around the path, i.e. the name of the table.
| /// n-times. | ||
| z_weight: ZWeight, | ||
| /// The row-oriented data. | ||
| row: Vec<ScalarTypedValue>, |
There was a problem hiding this comment.
The row here will not just be scalar values, see https://github.com/coln-project/Coln/blob/main/packages/coln-store/src/table.rs#L90
There will be a id variant, which is like a foreign key. For example, in a vertex table, it might contain ids to the particular graph that this vertex belongs to. This information is probably useful to the query eninge as well.
There was a problem hiding this comment.
Ah I see the Id also contains the commit hash. I remember Martin mentioning that the commit hash may not be passed to the query engine but instead Ids are just pairs of ints (from the perspective of the query engine). Let's discuss this over the next days.
|
|
||
| /// For each query checking a constraint this reports back identified | ||
| /// counterexamples. | ||
| pub struct Violations { |
There was a problem hiding this comment.
Might be useful to have some reasons for the violation as well. Something like
| // query results to higher layers and they figure out the intepretation of | ||
| // the query results? (For the antijoin approach that would be checking if | ||
| // the result is empty (constraint met) or, otherwise, the output serve as | ||
| // counterexamples.) Shall that duty be up to the query engine? |
There was a problem hiding this comment.
I think so. Because a) the compiler won't check this because this is a runtime property; b) coln-store won't do this because this is not a storage responsiblity :)
Unless we want a third component that interprets the results, I don't know how necessary it is to have such a component though? Seems to me ok to encode the violation policy in the query engine.
There was a problem hiding this comment.
Sounds alright to me. I would like to build it as another layer anyways even if it's part of coln-query.
| // 2. In case of chased laws (inside rules): Are they just some inevitable (but | ||
| // somewhat irrelevant) computation step to figure out if all constraints are | ||
| // met, or do they require back propagation up to the end user (i.e. they | ||
| // need to be communicated back from the query engine)? |
There was a problem hiding this comment.
communicate what? To me the chase are the ways for the query engine to compute an initial model that fulfills the theory definition, given some edbs.
There was a problem hiding this comment.
Not quite sure if we talk about the same here but I guess I could rephrase it as: Is an end user ever interested in seeing the initial model given the current state of the EDB?
| // the two exclusive? I.e., are there cases in which constraints are violated | ||
| // but derived data (IDB) caused by the (invalid) transaction is still valid? |
There was a problem hiding this comment.
what do you mean by constraints violated but the derived data valid? When would such a case occur?
| } | ||
| } | ||
| /// Convenience method to add data beyond initialization. | ||
| pub fn insert<I: IntoIterator<Item = (Table, Vec<TableDelta>)>>(&mut self, deltas: I) { |
There was a problem hiding this comment.
By the way, I defined an interface for coln-store to expose row data, it's similar to what you did, so perhaps we want something (table_path, Vec)?
Not sure if you need row id, but if you don't we can remove that. In general it's useful to know the row id because one row might refer to another row by its id.
https://github.com/coln-project/Coln/blob/coln-store/ffi/packages/coln-store/src/store/mod.rs#L96
There was a problem hiding this comment.
Yes, a table_path (or table name) should be enough (just like above, we can communicate the schema once during initialization and thereafter use table paths only).
…r own wrappers, and cli table prints of deltas
This is an unintegrated copy of my thesis' code except for the API proposal for
coln-storeandcoln-queryinapi.rs. Happy to receive a review for that part, @incipit0.