-
-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Labels
enhancementNew feature or requestNew feature or request
Description
User Story
As a developer running complex queries (joins, subqueries, filtered aggregations) against DecentDB, I want the query planner to choose execution plans based on actual data distribution so that query performance is predictable and efficient regardless of table sizes and data skew.
Problem
The current planner uses fixed heuristics with no selectivity estimation and processes joins in left-to-right declaration order. For simple queries this works fine, but as EF Core and application queries grow in complexity (multi-table joins, correlated subqueries, filtered aggregates), the lack of statistics leads to suboptimal plans — full scans where index seeks would be better, or wrong join ordering that produces large intermediate results.
Acceptance Criteria
- Statistics collection:
ANALYZEcommand computes per-table row counts and per-index cardinality; statistics stored in catalog - Statistics maintenance: Row counts updated incrementally during INSERT/UPDATE/DELETE (exact or approximate)
- Cost model: Each plan operator has a cost function based on estimated input cardinality and I/O
- Selectivity estimation: WHERE predicates produce estimated selectivity from statistics (equality, range, LIKE)
- Join reordering: Planner evaluates multiple join orderings and selects lowest-cost plan (at least for ≤6 tables; heuristic fallback for more)
- Index selection: Cost-based choice between table scan vs. index seek based on estimated selectivity
- EXPLAIN ANALYZE shows estimated vs. actual row counts for plan validation
- Backward compatibility: Existing queries produce correct results (plans may change); no database format breakage for users who never run ANALYZE
- All existing tests pass; differential tests against PostgreSQL continue to match
- Write benchmarks show no regression; read benchmarks show improvement on multi-join queries
Technical Notes
- ADR: 0038 (currently Deferred/Post-1.0 — needs full design ADR before implementation)
- Risk: High — affects every query plan; join reordering could expose latent executor bugs; regression risk requires extensive testing
- Key files:
src/planner/planner.nim(~469 lines today),src/planner/explain.nim,src/exec/exec.nim,src/storage/catalog.nim - Current state: 18 plan operators, zero statistics infrastructure, no cost model, fixed left-to-right join order
- Format impact: Yes — catalog needs to persist row counts and cardinality. Requires ADR + format version consideration.
- Suggested phasing:
- Statistics collection + ANALYZE command
- Cost model + selectivity estimation
- Index selection improvements
- Join reordering (dynamic programming, Selinger-style)
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request