This is a brief guide to developing Sleeper.
Before you do any dev work on Sleeper, it is worth reading the "Get your environment setup" section in the deployment guide. Once you've built the system, exactly the same setup will apply here, with a copy that you built yourself.
There are a number of dependencies for building Sleeper, and a few options to set up a development environment with these available.
The Sleeper Git repository includes configuration for a dev container based on the sleeper builder Docker image from the CLI. This includes all the same dependencies. If your IDE supports Dev Containers, it can use this configuration to work against that Docker image.
The dev container configuration will mount any AWS CLI, Maven and SSH configuration from your host machine into the container.
On Mac or Linux this should just work. On Windows it should be launched from inside Windows Subsystem for Linux (WSL). Working in WSL will also let you use the project's Bash scripts from Windows.
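The dev container configuration itself lives in the repository, but as an illustration of the general shape such a configuration takes (the image name and mount paths below are assumptions, not the real values), a devcontainer.json looks something like this:

```jsonc
// Hypothetical sketch only; the repository's .devcontainer configuration is authoritative.
{
  "image": "example-sleeper-builder-image",
  "mounts": [
    // Share the host's AWS CLI, Maven and SSH configuration with the container
    "source=${localEnv:HOME}/.aws,target=/root/.aws,type=bind",
    "source=${localEnv:HOME}/.m2,target=/root/.m2,type=bind",
    "source=${localEnv:HOME}/.ssh,target=/root/.ssh,type=bind"
  ]
}
```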
Here are some example steps to set this up in Windows:
- Install Visual Studio Code in Windows
- Install Ubuntu from the Windows Store
- Clone the Git repository in an Ubuntu terminal
- Run `code` in an Ubuntu terminal, which will install a VS Code server and connect it to Windows
- In the VS Code window that opened, open the Git repository
- Click the prompt to open the dev container, or press Ctrl+Shift+P and run "Dev Containers: Reopen in Container"
- The dev container will build, open, and install VS Code extensions
You can use the Nix package manager to get up-to-date versions of all the dependencies except Docker and Bash. When you have Nix installed, an easy way to get a development environment is to run nix-shell at the root of the Sleeper Git repository. This will start a shell with all the Sleeper dependencies available, without installing them in your system. If you run your IDE from that shell, the dependencies will be available in your IDE. You can run nix-shell again whenever you want to work with Sleeper.
This has problems working with Python code. The Nix package for the AWS CLI adds a number of libraries to the system Python, and pins them to specific versions. It's not possible to override this in a virtual environment, so it's likely there will be conflicts with the AWS library used in the Python code for Sleeper. This may prevent execution of the Sleeper Python code. If you change the Python dependencies in the Nix shell, this may break the AWS CLI.
You can also download shell.nix directly if you'd like to avoid installing Git. You can then git clone
the repository from the Nix shell. Here's an example to get the latest release:
```shell
curl "https://raw.githubusercontent.com/gchq/sleeper/main/shell.nix" -o ./shell.nix
nix-shell ./shell.nix
git clone https://github.com/gchq/sleeper.git
cd sleeper
git checkout --track origin/main
```

If you installed the Sleeper CLI from GitHub as described in the getting started guide, you can
use sleeper builder to get a shell inside a Docker container with the dependencies pre-installed. This is the same
container image that's used for the Dev Containers setup above. It may be useful if you want to work inside Docker
without using Dev Containers.
If you're in an EC2 instance deployed with sleeper environment, the Sleeper CLI was pre-installed and the repository was already checked out when you created the EC2. Otherwise, you'll need to clone the repository in the container. You can use the commands below to do this:
```shell
sleeper builder
git clone https://github.com/gchq/sleeper.git
cd sleeper
```

Everything in the repository will be persisted between executions of sleeper builder.
If you have AWS CLI installed in the host, the same configuration will be used in the builder container. Otherwise, any configuration you set in the container will be persisted in the host home directory. AWS authentication environment variables will be propagated to the container as well.
The host Docker environment will be propagated to the container via the Docker socket.
The files generated for the Sleeper instance will be persisted in the host home directory under ~/.sleeper, so that
if you run the Docker container multiple times you will still have details of the last Sleeper instance you worked with.
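To illustrate why those details survive container restarts (the exact mount flags are an assumption here, not taken from the CLI), a minimal sketch using a temporary directory as a stand-in for ~/.sleeper:

```shell
# Assumption: the CLI effectively bind-mounts the host directory into the
# container, conceptually like: docker run -v "$HOME/.sleeper:/root/.sleeper" ...
state_dir="$(mktemp -d)"   # stand-in for ~/.sleeper on the host

# A first "container run" records details of the instance you worked with
echo "myinstanceid" > "$state_dir/last-instance"

# A later "container run" still sees the file, because it lives on the host
cat "$state_dir/last-instance"
```

Because the directory belongs to the host, nothing written under the mount is lost when the container exits.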
If you add a command on the end, you can run a specific script like this:
```shell
sleeper builder sleeper/scripts/test/deployAll/deployTest.sh myinstanceid myvpc mysubnet
```

You will need the following software:
- AWS CDK
- AWS CLI
- Bash: Minimum v3.2. Use bash --version to check.
- Docker
- Java: Requires version 17, to match the version used by AWS EMR
- Maven
- NodeJS / NPM
- Rust
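Since Bash is the one requirement above with a specific minimum version, here is a small sketch of checking it from a script, using Bash's built-in BASH_VERSINFO array:

```shell
# Check the running Bash against the minimum version Sleeper's scripts need (3.2)
required_major=3
required_minor=2
major="${BASH_VERSINFO[0]}"
minor="${BASH_VERSINFO[1]}"
if [ "$major" -gt "$required_major" ] || { [ "$major" -eq "$required_major" ] && [ "$minor" -ge "$required_minor" ]; }; then
  echo "Bash $BASH_VERSION is new enough"
else
  echo "Bash $BASH_VERSION is too old; need at least $required_major.$required_minor" >&2
fi
```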
Provided script (recommended) - this builds the code and copies the jars into the scripts directory so that the scripts work. Starting from the root of the Git repository:
```shell
./scripts/build/buildForTest.sh
```

You can disable building the Rust code by passing -DskipRust as an argument to that script. This can also be passed in any Maven build. This can speed up the build if you don't need the DataFusion data engine. If a previous build included Rust, skipping Rust will reuse the binaries that build produced.
When running Maven directly, you can pass -Pquick to skip tests and linting.
To build the Sleeper CLI, you can run this script:
```shell
./scripts/cli/buildAll.sh
```

Use ./scripts/cli/runInDocker.sh to run the built CLI. This will act the same as running the sleeper
command after installing the CLI. You can manually install it if you copy that script somewhere, rename it to sleeper,
and put it on the system path. Then sleeper ... commands will work as though you'd installed it normally.
If you have the CLI installed already, the build will replace it with the newly built version. The runInDocker.sh wrapper script itself is not replaced, though: if that script differs from the version you installed before, you can find the installed copy at $HOME/.local/bin/sleeper and manually overwrite it with the contents of ./scripts/cli/runInDocker.sh.
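The manual install described above amounts to copying the script onto the system path under the name sleeper. A sketch of those steps, using a temporary directory as a stand-in for $HOME/.local/bin and a placeholder file in place of the real runInDocker.sh:

```shell
# Stand-ins so this sketch is self-contained; in reality you would copy the
# real ./scripts/cli/runInDocker.sh into a directory that is on your PATH.
install_dir="$(mktemp -d)"
src_script="$(mktemp)"
printf '#!/usr/bin/env bash\necho "sleeper CLI placeholder"\n' > "$src_script"

# The install itself: copy, rename to "sleeper", and make it executable
cp "$src_script" "$install_dir/sleeper"
chmod +x "$install_dir/sleeper"

# With the directory on the PATH, "sleeper ..." commands resolve to the script
PATH="$install_dir:$PATH" sleeper
```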
Tools are available to publish built artefacts to shared repositories, and to install them locally to avoid the need to build Sleeper yourself. We do not currently publish artefacts publicly. See publishing artefacts for how to set this up yourself.
The codebase is structured around the components explained in the design document. The elements of the design largely correspond to Maven modules. We'll look at the module architecture in more detail below.
If you'd like to look at how the modules relate to one another in terms of their dependencies, there is a script documented in development scripts that can display the dependency structure as a graph. There's also a document with information on past and current dependency conflicts.
If you'd like to raise or pick up an open issue, see the contributing guide for more information.
Configuration is available for various development environments.
For VS Code there's a separate setup guide.
For IntelliJ, these settings are available to import:
- Code style scheme at code-style/intellij-style.xml
- Inspection profile at code-style/intellij-inspection-profile.xml
- Copyright profile for license header at code-style/intellij-copyright-profile.xml
- Checkstyle plugin settings in code-style/checkstyle-idea
For Eclipse, these settings are available to import:
- Code style at code-style/eclipse-style.xml
- Import order at code-style/eclipse-import-order.importorder
- License header at code-style/licenseHeader.txt
- Code templates at code-style/eclipse-codetemplates.xml
- Editor templates at code-style/eclipse-templates.xml
Most Maven modules map to features of Sleeper, and we also have "core" modules, "common" modules, and some other modules to do with the build, deployment with the CDK, and system tests.
The "core" modules make up the main application code, independent of infrastructure or external dependencies. These are the module core, and other modules with "core" in the name nested under specific features. These core modules represent the "application" part of a ports and adapters, or hexagonal, architecture. They do not contain external dependencies such as the AWS SDK, Parquet or other client libraries. They do contain dependencies for logging, and some utilities for serialisation/deserialisation.
The core module contains shared code for things like configuring a Sleeper instance, interacting with the state of a Sleeper table, and some common logic to track operations for reporting. Each of these has adapters that connect it to AWS, but the adapter code sits in other, non-core modules that are specific to those features.
The other modules with "core" in the name contain core application code that we felt was more peripheral to the system. For example, the specifics of how we model a compaction job are in the module compaction-core, but the high level tracking of compaction, and the interactions of compaction with the state of a Sleeper table, are both part of the core module. The compaction-core module sits alongside other non-core modules that contain the adapter code linking the application code for compaction to AWS.
The "common" modules are shared utilities for interacting with external dependencies in common ways. They sit under the
directory java/common. This includes tools for working with DynamoDB, testing code using AWS clients against
LocalStack, and some infrastructure code to do with invoking lambdas, running jobs in a task, and starting tasks that
will run jobs. Here a job is some process that needs to run, and a task is a piece of infrastructure that can run jobs,
e.g. an AWS ECS task.
The Maven project includes Checkstyle and SpotBugs. These are run on every pull request. You can run them locally with the Maven checkstyle:check and spotbugs:check goals. Your IDE may have plugins available to alert you to violations.
You can run both plugins together:
```shell
cd java
mvn clean compile checkstyle:check spotbugs:check
```

See the test strategy for how and when to write tests, as well as information on the testing tools used in the project.
See the coding conventions document for practices we try to adhere to when working on Sleeper.
See the deployment guide for notes on how to deploy Sleeper, and the system test guide to deploy instances specifically set up for development.
See the release process guide for instructions on how to publish a release of Sleeper.
See development scripts for scripts that can assist you while working on Sleeper.