Merged
22 commits
53e1ea7
6742 Update README.md
patchwork01 Mar 13, 2026
71a01c6
6742 Remove LocalStack from getting started guide
patchwork01 Mar 13, 2026
da481f7
6742 Work on getting started flow
patchwork01 Mar 13, 2026
4341b47
6742 Adjust documentation for environment tool
patchwork01 Mar 13, 2026
21986d6
6742 Adjust getting started introduction
patchwork01 Mar 16, 2026
2a9a9bc
6742 Adjust getting started guide
patchwork01 Mar 17, 2026
29a01c6
6742 Adjust environment setup section
patchwork01 Mar 17, 2026
7f91f72
6742 Adjust explanation of demonstration deployment
patchwork01 Mar 17, 2026
9ceef30
6742 Adjust demonstration deployment docs, note Amazon Linux problem
patchwork01 Mar 17, 2026
565317f
6742 Adjust references to builder image in developer guide
patchwork01 Mar 17, 2026
360f3a1
6742 Adjust start of usage guide
patchwork01 Mar 17, 2026
1904cf8
6742 Adjust environment setup documentation
patchwork01 Mar 17, 2026
987cc63
6742 Adjust start of deployment guide
patchwork01 Mar 17, 2026
2f25c58
6742 Adjust references to Docker tools in developer guide
patchwork01 Mar 17, 2026
991dd90
6742 Adjust use of "environment" in docs
patchwork01 Mar 17, 2026
0ab56b3
6742 Update links to deployment guide
patchwork01 Mar 17, 2026
8a46fce
6742 Adjust tables documentation
patchwork01 Mar 17, 2026
8497937
6742 Explain difference between scripts and CDK deployment
patchwork01 Mar 17, 2026
21002ad
6742 Add missing link
patchwork01 Mar 18, 2026
130151e
6742 Rewrite environment tool introduction
patchwork01 Mar 18, 2026
2d71f7e
6742 Fix typo in getting started
patchwork01 Mar 18, 2026
560d824
6742 Fix Corretto link
patchwork01 Mar 18, 2026
4 changes: 2 additions & 2 deletions README.md
@@ -12,8 +12,8 @@ Sleeper
## Introduction

Sleeper is a serverless, cloud-native, log-structured merge tree based, scalable key-value store. It is designed to
allow the ingest of very large volumes of data at low cost. Queries for rows where the key takes a given value takes
around 1-2 seconds, but many thousands can be run in parallel. Each individual query has a negligible cost.
allow the ingest of very large volumes of data at low cost. A query for rows where the key takes a given value takes
around half a second, but many thousands can be run in parallel. Each individual query has a negligible cost.

Data is stored as rows in tables. A table has a schema with key fields, optional sort fields, and some value fields.
A metadata store tracks Parquet files in S3. These files contain rows in sorted order, by the key field and then by
2 changes: 1 addition & 1 deletion docs/common-problems-and-their-solutions.md
@@ -9,7 +9,7 @@ These instructions will assume you start in the project root directory and Sleep
Presently the implementation is based on Arrow. Previously it used an array list, and will work differently if that is
used. See details including how to configure this in the documentation
on [row batch types](design/ingest-row-batch-types.md). To change the configuration, use the
administration client described in the [deployment guide](deployment-guide.md#sleeper-administration-client).
administration client described in the [usage guide](usage-guide.md#sleeper-administration-client).


### Arrow implementation
41 changes: 11 additions & 30 deletions docs/deployment-guide.md
@@ -3,44 +3,25 @@ Building and deploying Sleeper

This contains instructions on how to deploy Sleeper.

Please follow the [getting started guide](getting-started.md) to install or build Sleeper and its dependencies, and
prepare your AWS account. For the rest of this guide we'll assume you're working in an EC2 instance in an AWS account
that's configured appropriately. You should either have the dependencies listed there installed in your EC2, or run a
builder Docker container in your EC2 with the [Sleeper Docker tools](deployment/docker-tools.md), which comes with those
dependencies pre-installed.

If you just want a local instance for testing, see the documentation
on [deploying to localstack](deployment/deploy-to-localstack.md). This has very limited functionality compared to a
deployed instance.

## Get your environment set up

You will need to get your environment set up correctly so that you can deploy a Sleeper instance to AWS and then
interact with it. See [environment setup](deployment/environment-setup.md) for how to install the Sleeper CLI and create
an environment suitable for deploying Sleeper.

If you set up the [Sleeper CLI deployment environment](deployment/cli-deployment-environment.md), you can connect to it
and build Sleeper like this:

```bash
sleeper environment connect # Get a shell in the EC2 you deployed
sleeper builder # Get a shell in a builder Docker container (hosted in the EC2)
cd sleeper # Change directory to the root of the Git repository
./scripts/build/build.sh
```

If you used the system test deployment described in the getting started guide, you will have already built Sleeper from
the Git repository in a `sleeper builder` container. If you deploy from outside of AWS this will involve lengthy uploads
of build artefacts, which you can avoid with the environment EC2, or your own EC2 instance. If you deploy from your own
EC2, you will need to check out the Git repository inside a `sleeper builder` container yourself.

The `sleeper builder` command gets you a shell inside a Docker container with all the dependencies required to build and
deploy an instance of Sleeper. The container will be deleted after you exit. You will start in a directory mounted into
the container from a folder in the host home directory under `~/.sleeper`. This workspace will persist after the
container exits, and will be reused by future calls to `sleeper builder`. It also inherits the AWS and Docker
configuration from the host.

## Deployment

Sleeper is deployed using the AWS CDK. You can invoke the CDK to do this either using the automated scripts or by using
the CDK directly.
the CDK directly. The scripts do the same thing as the direct CDK deployment, but through Java and with some
configuration specific to either creation or update of an existing instance. The scripts also include configuration of
tables, which direct CDK deployment does not.

Either approach should be done from within an EC2 instance set up as described above, to avoid lengthy uploads of large
jar files and Docker images.
Either approach should be done from within an EC2 instance to avoid lengthy uploads of large jar files and Docker
images.

### Using the CDK directly

6 changes: 3 additions & 3 deletions docs/deployment/deploy-with-cdk.md
@@ -96,9 +96,9 @@ SleeperInstance.createAsNestedStack(stack, "MyInstance",
### Using the CDK CLI

To deploy a Sleeper instance to AWS with the CDK, you need an [instance configuration](instance-configuration.md) and
a [suitable environment](environment-setup.md). The artefacts will need to be uploaded as described in the section
above. You can either use the instance ID as the deployment ID for the artefacts, or you can set the deployment ID in
the CDK context variable `artefactsId`, or the instance property `sleeper.artefacts.deployment`.
a [suitable deployment environment](environment-setup.md) in your AWS account. The artefacts will need to be uploaded as
described in the section above. You can either use the instance ID as the deployment ID for the artefacts, or you can
set the deployment ID in the CDK context variable `artefactsId`, or the instance property `sleeper.artefacts.deployment`.

You can use the same CDK apps used by the automated scripts, or your own CDK configuration. We'll give examples with the
CDK apps used by the automated scripts. The following commands will deploy a Sleeper instance:
4 changes: 2 additions & 2 deletions docs/deployment/docker-images.md
@@ -5,8 +5,8 @@ A deployment of Sleeper includes components that run in Docker containers. This
are used in Sleeper, how to build them, and how to make them available for deployment.

The easiest way to build and deploy these images is with our automated scripts. See
the [deployment guide](../deployment-guide.md) and [deployment with the CDK](./deploy-with-cdk.md) for more information.
The information below may be useful if you prefer to replicate this yourself.
the [deployment guide](../deployment-guide.md) for more information. The information below may be useful if you prefer
to replicate this yourself.

## Docker deployment images

46 changes: 46 additions & 0 deletions docs/deployment/docker-tools.md
@@ -0,0 +1,46 @@
Sleeper Docker tools
====================

We have a CLI available with some tools that can help you work with Sleeper. For now this just contains commands to run
a couple of Docker containers. In the future we may merge the scripts for interacting with Sleeper into this, but for
now it's more of a toolkit than a CLI to interact with Sleeper.

These tools run a Docker container that contains everything needed to build and deploy Sleeper. They can give you a
command line inside a container with the dependencies pre-installed, or run commands in such a container. This
way you can avoid needing to install any dependencies other than Docker on your machine.

## Dependencies

These tools have the following dependencies, please install these first:

* [Bash](https://www.gnu.org/software/bash/): Minimum v3.2. Use `bash --version`.
* [Docker](https://docs.docker.com/get-docker/)
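
As a rough sketch of checking these two dependencies before installing, the snippet below compares the running Bash version against the 3.2 minimum and looks for `docker` on the `PATH`. The `version_at_least` helper is a hypothetical name introduced here for illustration and is not part of the Sleeper CLI.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds if version $1 (major.minor) is at least $2.
version_at_least() {
  local have_major="${1%%.*}" have_rest="${1#*.}"
  local want_major="${2%%.*}" want_rest="${2#*.}"
  local have_minor="${have_rest%%.*}" want_minor="${want_rest%%.*}"
  if (( have_major != want_major )); then
    (( have_major > want_major ))
  else
    (( have_minor >= want_minor ))
  fi
}

# Compare the version of the shell running this script with the minimum.
if version_at_least "${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}" "3.2"; then
  echo "Bash ${BASH_VERSION} meets the 3.2 minimum"
else
  echo "Bash ${BASH_VERSION} is older than 3.2" >&2
fi

# Only checks that a docker command exists, not that the daemon is running.
if command -v docker >/dev/null 2>&1; then
  echo "Docker found: $(docker --version)"
else
  echo "Docker not found on PATH" >&2
fi
```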

## Installation

You can run the following commands to install the latest version from GitHub:

```bash
curl "https://raw.githubusercontent.com/gchq/sleeper/develop/scripts/cli/install.sh" -o ./sleeper-install.sh
chmod +x ./sleeper-install.sh
./sleeper-install.sh
```

Relaunch your terminal and check that the command `sleeper version` gives a version number. Note that this will be the
version of the CLI, rather than the version of Sleeper you will deploy.

### Commands

The CLI consists of a `sleeper` command with sub-commands. You can use `sleeper aws` or `sleeper cdk` to run `aws` or
`cdk` commands without needing to install the AWS or CDK CLI on your machine. If you set AWS environment variables or
configuration on the host machine, that will be propagated to the Docker container when you use `sleeper`.

The `sleeper builder` command gives you a command line in a Docker container with all the necessary tools to build
Sleeper, and a workspace folder persisted in the host at `~/.sleeper/builder`. You can use this to deploy and interact
with Sleeper.

You can upgrade to the latest version of the CLI using `sleeper cli upgrade`. This should be done regularly to keep the
build and deployment tools up to date.

There's a `sleeper environment` command that you can use to prepare your AWS account to deploy Sleeper into it. This is
documented in [Sleeper environment tool](environment-tool.md).
55 changes: 14 additions & 41 deletions docs/deployment/environment-setup.md
@@ -1,36 +1,5 @@
Environment setup
=================

## Install Sleeper CLI

The Sleeper CLI contains tools to deploy into AWS, and to build the system. This runs commands inside a Docker
container. This way you can avoid needing to install any dependencies other than Docker on your machine. In the future
we may also publish pre-built artefacts that will make it unnecessary to build Sleeper yourself. When using the Sleeper
CLI, the build and deployment can be invoked from a single script to minimise setup for testing.

### Dependencies

The Sleeper CLI has the following dependencies, please install these first:

* [Bash](https://www.gnu.org/software/bash/): Minimum v3.2. Use `bash --version`.
* [Docker](https://docs.docker.com/get-docker/)

### Install script

You can run the following commands to install the latest version of the CLI from GitHub:

```bash
curl "https://raw.githubusercontent.com/gchq/sleeper/develop/scripts/cli/install.sh" -o ./sleeper-install.sh
chmod +x ./sleeper-install.sh
./sleeper-install.sh
```

The CLI consists of a `sleeper` command with sub-commands. You can use `sleeper aws` or `sleeper cdk` to run `aws` or
`cdk` commands without needing to install the AWS or CDK CLI on your machine. If you set AWS environment variables or
configuration on the host machine, that will be propagated to the Docker container when you use `sleeper`.

You can upgrade to the latest version of the CLI using `sleeper cli upgrade`. This should be done regularly to keep the
build and deployment tools up to date.
Deployment environment setup
============================

## Preparing AWS

@@ -96,23 +65,27 @@ and you will have to see if you can adjust your existing lambdas to free some up
## Deployment environment

To deploy Sleeper, you'll need a VPC that meets certain requirements. You'll also want an EC2 instance to deploy from,
to avoid lengthy uploads of large jar files and Docker images from outside AWS. You can use the Sleeper CLI to create
both of these, see the documentation for the [Sleeper CLI deployment environment](cli-deployment-environment.md).
to avoid lengthy uploads of large jar files and Docker images from outside AWS. You can use
the [Sleeper environment tool](environment-tool.md) to create both of these.

If you prefer to use your own VPC, you'll need to ensure it meets Sleeper's requirements. It should ideally have
multiple private subnets in different availability zones. Those subnets should have egress, e.g. via a NAT gateway. The
VPC should have gateway endpoints for S3 and DynamoDB. If there is no gateway endpoint for S3, deployment of a Sleeper
instance will fail in the CDK. Note that Sleeper will not use the default security group of the VPC.
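
One way to check an existing VPC for these gateway endpoints is sketched below. It assumes the AWS CLI is available (directly or through `sleeper aws`) and that `VPC_ID` is set to the ID of your VPC. The `has_gateway_endpoint` helper is a hypothetical name used for illustration, not part of Sleeper.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the whitespace-separated service names
# in $1 include one ending in ".$2" (for example "s3" or "dynamodb").
has_gateway_endpoint() {
  local name
  for name in $1; do
    case "$name" in
      *".$2") return 0 ;;
    esac
  done
  return 1
}

# VPC_ID is an assumption: set it to the VPC you intend to deploy into.
if [ -n "${VPC_ID:-}" ] && command -v aws >/dev/null 2>&1; then
  # List the service names of all gateway endpoints attached to the VPC.
  services="$(aws ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=${VPC_ID}" "Name=vpc-endpoint-type,Values=Gateway" \
    --query 'VpcEndpoints[].ServiceName' --output text)"
  for service in s3 dynamodb; do
    if has_gateway_endpoint "$services" "$service"; then
      echo "Gateway endpoint found for $service"
    else
      echo "No gateway endpoint found for $service" >&2
    fi
  done
fi
```

This only reports whether the endpoints exist. It does not check subnets, egress, or route table associations.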

If you prefer to use your own EC2, it should run on an x86_64 architecture, with Bash and Docker, and have enough
If you prefer to use your own EC2, it should run Ubuntu on an x86_64 architecture, with Bash and Docker, and have enough
resources to build code for Maven and Rust. We've tested with 8GB RAM and 2 vCPUs, with `t3.large`. We recommend 4 vCPUs
(`t3.xlarge`), as that takes the build from over 40 minutes with 2 vCPUs, to around 20 minutes for the first build.

The [Sleeper CLI deployment environment](cli-deployment-environment.md) includes options to deploy an EC2 to
an existing VPC, or a VPC on its own.
If your EC2 runs Amazon Linux, some features such as direct queries may not work. We compile our native code against
a recent version of Ubuntu, and Amazon Linux uses an old version of glibc which is not compatible. We recommend using
Ubuntu.
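
A quick way to see what your own EC2 is running is sketched below. It assumes a Linux host with `/etc/os-release` and `getconf` available. The `os_release_id` helper is a hypothetical name used for illustration, and the script makes no claim about which glibc version is new enough.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the ID field of an os-release style file.
os_release_id() {
  local file="${1:-/etc/os-release}"
  # ID may be quoted (ID="amzn") or bare (ID=ubuntu); strip any quotes.
  sed -n 's/^ID=//p' "$file" | tr -d '"'
}

arch="$(uname -m)"
if [ "$arch" != "x86_64" ]; then
  echo "Warning: architecture is ${arch}, expected x86_64" >&2
fi

if [ -r /etc/os-release ]; then
  os_id="$(os_release_id)"
  if [ "$os_id" = "ubuntu" ]; then
    echo "Running Ubuntu"
  else
    echo "Warning: OS is '${os_id}', not Ubuntu; some native features may not work" >&2
  fi
fi

# On glibc systems this prints the C library name and version.
getconf GNU_LIBC_VERSION 2>/dev/null || echo "getconf did not report a glibc version" >&2
```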

The [Sleeper environment tool](environment-tool.md) includes options to deploy an EC2 to an existing VPC, or a VPC on
its own.

Once you've got a suitable VPC, and an EC2 with the Sleeper CLI installed, you can either move on
to the [deployment guide](../deployment-guide.md), or the [getting started guide deployment section](../getting-started.md#deployment)
to use a testing setup.
Once you've got a suitable VPC and EC2, you can either move on to the [deployment guide](../deployment-guide.md), or
the [getting started guide deployment section](../getting-started.md#deployment) to use a demonstration setup with
infrastructure to generate some example data.

You're now ready to build and deploy Sleeper.