You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SELECT*FROM tasks WHERE project_id =subscription.parameter('project_id') AND project_id INauth.parameter('project_ids')
This is a fairly typical requirement: Sync streams are created for specific projects, while the JWT contains an array of all projects that the user may access.
The issue is that the internal bucket structure is quite inefficient here: This implicitly creates two bucket parameters for the two clauses, even though they reference the same field. So if you subscribe to {project_id: 1} and have a token with {project_ids: [1,2,3]}, we try to sync buckets for tasks[1,1], tasks[1,2], tasks[1,3], even though data can only exist for tasks[1,1]. In some cases, this can result in exceeding the bucket limit of 1000.
A current workaround is to refactor the clauses into this:
SELECT*FROM tasks WHERE project_id =subscription.parameter('project_id') ANDsubscription.parameter('project_id') INauth.parameter('project_ids')
This avoids the two bucket parameters, resolving the issue. However, we should be able to do this automatically.
Proposal
When the same field is referenced in multiple clauses, merge the bucket parameter handling for them. We can do this in different ways:
Only have a single bucket parameter for this case. This is effectively a change in storage format, meaning we need to take care of not breaking instances with existing data (see "replication logic version" in [Postgres] Use primary key for REPLICA IDENTITY FULL tables #397).
Keep the two parameters, but pre-filter out any combinations that cannot exist.
We'll need to check if it's feasible to apply the same optimization to more cases, for example:
-- 1. Using functions / operators on the fieldSELECT*FROM tasks WHERElower(project_id) =subscription.parameter('project_id') ANDlower(project_id) INauth.parameter('project_ids')
-- 2. Combining with AND/OR operators (should work after converting to a normalized representation)SELECT*FROM tasks WHERE always_sync OR (project_id =subscription.parameter('project_id') AND project_id INauth.parameter('project_ids') ANDauth.parameter('is_admin'))
Take this sync stream query as an example:
This is a fairly typical requirement: Sync streams are created for specific projects, while the JWT contains an array of all projects that the user may access.
The issue is that the internal bucket structure is quite inefficient here: This implicitly creates two bucket parameters for the two clauses, even though they reference the same field. So if you subscribe to
{project_id: 1}and have a token with{project_ids: [1,2,3]}, we try to sync buckets fortasks[1,1], tasks[1,2], tasks[1,3], even though data can only exist fortasks[1,1]. In some cases, this can result in exceeding the bucket limit of 1000.A current workaround is to refactor the clauses into this:
This avoids the two bucket parameters, resolving the issue. However, we should be able to do this automatically.
Proposal
When the same field is referenced in multiple clauses, merge the bucket parameter handling for them. We can do this in different ways:
We'll need to check if it's feasible to apply the same optimization to more cases, for example: