This section documents some of the improvements that we plan to make to Sleeper.
The following improvements are actively being worked on:
- gchq#5781 Easier CDK-focused deployment.
The following are likely to be worked on in the near future:
- gchq#5078 Improvements to bulk import.
- gchq#6132 Create a Spark DataFrame from a Sleeper table.
- gchq#4215 Optional long-running service for queries.
- gchq#4235 Graceful upgrade of a Sleeper instance.
The following improvements will be worked on in future (these are in no particular order):
- gchq#6059 Data types for floating point value fields.
- gchq#6058 Notifications for ingest progress.
- gchq#6117 Instance health checks.
- gchq#4213 Batch up partition splitting commits.
- gchq#1391 Create a library of repeatable, sustained, large-scale performance tests.
- gchq#1393 Bulk export queries, tracking, restore from export.
- gchq#4396 Failure handling / backpressure for state store updates.
- gchq#3693 Improvements to declarative deployment with infrastructure as code.
- gchq#576 Use Arrow types in the table schema.
- gchq#4398 Trigger compaction dispatch in transaction log follower.
- Scaling improvements.
- Usability improvements.
- gchq#1392 Create a predicate language for specifying filters on queries.
- gchq#1390 Review and extend the integrations with Athena and Trino.
- gchq#5675 Visibility of long-term tracker metrics.
- Metrics page: review and extend the metrics produced.
- Purge data from a table, i.e. delete any items matching a predicate.
We also have an article on potential deployment improvements, examining how the current deployment setup relates to the planned improvements linked above.