Optimize sync streams with multiple conditions on the same field

Take this sync stream query as an example:

```sql
SELECT * FROM tasks WHERE project_id = subscription.parameter('project_id') AND project_id IN auth.parameter('project_ids')
```

This is a fairly typical requirement: Sync streams are created for specific projects, while the JWT contains an array of all projects that the user may access.

The issue is that the internal bucket structure is quite inefficient here: This implicitly creates two bucket parameters for the two clauses, even though they reference the same field. So if you subscribe to `{project_id: 1}` and have a token with `{project_ids: [1,2,3]}`, we try to sync buckets for `tasks[1,1], tasks[1,2], tasks[1,3]`, even though data can only exist for `tasks[1,1]`. In some cases, this can result in exceeding the bucket limit of 1000.

A current workaround is to refactor the clauses into this:

```sql
SELECT * FROM tasks WHERE project_id = subscription.parameter('project_id') AND subscription.parameter('project_id') IN auth.parameter('project_ids')
```

This avoids the two bucket parameters, resolving the issue. However, we should be able to do this automatically.

## Proposal

When the same field is referenced in multiple clauses, merge the bucket parameter handling for them. We can do this in different ways:
1. Only have a single bucket parameter for this case. This is effectively a change in storage format, meaning we need to take care of not breaking instances with existing data (see "replication logic version" in #397).
2. Keep the two parameters, but pre-filter out any combinations that cannot exist.

We'll need to check if it's feasible to apply the same optimization to more cases, for example:

```sql
-- 1. Using functions / operators on the field
SELECT * FROM tasks WHERE lower(project_id) = subscription.parameter('project_id') AND lower(project_id) IN auth.parameter('project_ids')
-- 2. Combining with AND/OR operators (should work after converting to a normalized representation)
SELECT * FROM tasks WHERE always_sync OR (project_id = subscription.parameter('project_id') AND project_id IN auth.parameter('project_ids') AND auth.parameter('is_admin'))
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize sync streams with multiple conditions on the same field #399

Proposal

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Optimize sync streams with multiple conditions on the same field #399

Description

Proposal

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions