Skip to content

Welcome coln-query 👋#28

Open
lstwn wants to merge 6 commits into
mainfrom
coln-query/init
Open

Welcome coln-query 👋#28
lstwn wants to merge 6 commits into
mainfrom
coln-query/init

Conversation

@lstwn

@lstwn lstwn commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

This is an unintegrated copy of my thesis' code except for the API proposal for coln-store and coln-query in api.rs. Happy to receive a review for that part, @incipit0.

This is a so far unintegrated copy of my thesis' code.
Comment thread Cargo.toml
[workspace.dependencies]
anyhow = "1.0.102"
thiserror = "2.0.18"
criterion = "0.8.2"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps move serde here as well? Given both store and query needs to use serde for JSON IR?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was just the minimum required for my stuff. We could add:

  • serde
  • logging/tracing

Comment thread packages/coln-query/src/api.rs Outdated
//!
//! 1. An interface for passing deltas of row-oriented data. There is
//! [TableDelta], [StoreDelta], and [DerivedDataDelta].
//! 2. An interface for [Transactions](Tx). A transactions can be in exactly one

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transaction is a query engine concept right? The storage engine has a transaction concept as well https://github.com/coln-project/Coln/blob/main/packages/coln-store/docs/primitives.md
Just want to check they are separate

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect this transaction interface to be called by coln-store, like when the store is about to commit some data, then pass the data to coln-query using this transaction interface?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. I think we need a Tx like interface because whenever we feed in new base facts (EDPs) and try to commit, constraints could be violated and in such case state needs to be rolled back. In batch query engines, this only affects the storage layer but with incremental view maintenance state has to be undone in the query engine, too.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transaction is a query engine concept right? The storage engine has a transaction concept as well https://github.com/coln-project/Coln/blob/main/packages/coln-store/docs/primitives.md Just want to check they are separate

Maybe we can even reuse this interface for the storage engine, too. So far it is not specific to the query engine and anyone implementing the TxStore trait can work with it.

Comment thread packages/coln-query/src/api.rs Outdated

pub struct Table {
name: String,
// TODO: How does the schema interface look like? Maybe it doesn't need to

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a schema definition in the IR https://github.com/coln-project/Coln/blob/main/packages/coln-lang-rs/src/ir/mod.rs#L79

And I agree we should not pass it every time, but just once when we load a theory.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also check out how coln-store represents a table https://github.com/coln-project/Coln/blob/coln-store/ffi/packages/coln-store/src/table.rs#L151

It's pretty similar to what you did, and I think for the interfacing purposes it's probably enough to just pass around the path, i.e. the name of the table.

Comment thread packages/coln-query/src/api.rs Outdated
/// n-times.
z_weight: ZWeight,
/// The row-oriented data.
row: Vec<ScalarTypedValue>,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The row here will not just be scalar values, see https://github.com/coln-project/Coln/blob/main/packages/coln-store/src/table.rs#L90

There will be a id variant, which is like a foreign key. For example, in a vertex table, it might contain ids to the particular graph that this vertex belongs to. This information is probably useful to the query eninge as well.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see the Id also contains the commit hash. I remember Martin mentioning that the commit hash may not be passed to the query engine but instead Ids are just pairs of ints (from the perspective of the query engine). Let's discuss this over the next days.

Comment thread packages/coln-query/src/api.rs Outdated

/// For each query checking a constraint this reports back identified
/// counterexamples.
pub struct Violations {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be useful to have some reasons for the violation as well. Something like

https://github.com/coln-project/Coln/blob/coln-store/ffi/packages/coln-store/src/solver/validate.rs#L17

Comment thread packages/coln-query/src/api.rs Outdated
// query results to higher layers and they figure out the intepretation of
// the query results? (For the antijoin approach that would be checking if
// the result is empty (constraint met) or, otherwise, the output serve as
// counterexamples.) Shall that duty be up to the query engine?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Because a) the compiler won't check this because this is a runtime property; b) coln-store won't do this because this is not a storage responsiblity :)

Unless we want a third component that interprets the results, I don't know how necessary it is to have such a component though? Seems to me ok to encode the violation policy in the query engine.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds alright to me. I would like to build it as another layer anyways even if it's part of coln-query.

Comment thread packages/coln-query/src/api.rs Outdated
// 2. In case of chased laws (inside rules): Are they just some inevitable (but
// somewhat irrelevant) computation step to figure out if all constraints are
// met, or do they require back propagation up to the end user (i.e. they
// need to be communicated back from the query engine)?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

communicate what? To me the chase are the ways for the query engine to compute an initial model that fulfills the theory definition, given some edbs.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure if we talk about the same here but I guess I could rephrase it as: Is an end user ever interested in seeing the initial model given the current state of the EDB?

Comment thread packages/coln-query/src/api.rs Outdated
Comment on lines +113 to +114
// the two exclusive? I.e., are there cases in which constraints are violated
// but derived data (IDB) caused by the (invalid) transaction is still valid?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by constraints violated but the derived data valid? When would such a case occur?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not do it ;)

Comment thread packages/coln-query/src/api.rs Outdated
}
}
/// Convenience method to add data beyond initialization.
pub fn insert<I: IntoIterator<Item = (Table, Vec<TableDelta>)>>(&mut self, deltas: I) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I defined an interface for coln-store to expose row data, it's similar to what you did, so perhaps we want something (table_path, Vec)?

Not sure if you need row id, but if you don't we can remove that. In general it's useful to know the row id because one row might refer to another row by its id.

https://github.com/coln-project/Coln/blob/coln-store/ffi/packages/coln-store/src/store/mod.rs#L96

@lstwn lstwn Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a table_path (or table name) should be enough (just like above, we can communicate the schema once during initialization and thereafter use table paths only).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants