This is a brief guide to developing Sleeper.
Before you do any dev work on Sleeper, it is worth reading the "Get your environment setup" section in the deployment guide. Once you've built the system, exactly the same setup will apply here, with a copy that you built yourself.
There are a number of dependencies for building Sleeper, and a few options to set up a development environment with these available.
The Sleeper Git repository includes configuration for a dev container based on the sleeper builder Docker image from the CLI. This includes all the same dependencies. If your IDE supports Dev Containers, it can use this configuration to work against that Docker image.
The dev container configuration will mount any AWS CLI, Maven and SSH configuration from your host machine into the container.
On Mac or Linux this should just work. On Windows it should be launched from inside Windows Subsystem for Linux (WSL). Working in WSL will also let you use the project's Bash scripts from Windows.
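The dev container configuration itself lives in the repository, but as an illustration of the general shape such a configuration takes (the image name and mount paths below are assumptions, not the real values), a devcontainer.json looks something like this:

```jsonc
// Hypothetical sketch only; the repository's .devcontainer configuration is authoritative.
{
  "image": "example-sleeper-builder-image",
  "mounts": [
    // Share the host's AWS CLI, Maven and SSH configuration with the container
    "source=${localEnv:HOME}/.aws,target=/root/.aws,type=bind",
    "source=${localEnv:HOME}/.m2,target=/root/.m2,type=bind",
    "source=${localEnv:HOME}/.ssh,target=/root/.ssh,type=bind"
  ]
}
```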
Here are some example steps to set this up in Windows:
- Install Visual Studio Code in Windows
- Install Ubuntu from the Windows Store
- Clone the Git repository in an Ubuntu terminal
- Run `code` in an Ubuntu terminal, which will install a VS Code server and connect it to Windows
- In the VS Code window that opened, open the Git repository
- Click the prompt to open the dev container, or press Ctrl+Shift+P and run "Dev Containers: Reopen in Container"
- The dev container will build, open, and install VS Code extensions
You can use the Nix package manager to get up-to-date versions of all the dependencies except Docker and Bash. When you have Nix installed, an easy way to get a development environment is to run nix-shell at the root of the Sleeper Git repository. This will start a shell with all the Sleeper dependencies available, without installing them in your system. If you run your IDE from that shell, the dependencies will be available in your IDE. You can run nix-shell again whenever you want to work with Sleeper.
This has problems working with Python code. The Nix package for the AWS CLI adds a number of libraries to the system Python, and pins them to specific versions. It's not possible to override this in a virtual environment, so it's likely there will be conflicts with the AWS library used in the Python code for Sleeper. This may prevent execution of the Sleeper Python code. If you change the Python dependencies in the Nix shell, this may break the AWS CLI.
You can also download shell.nix directly if you'd like to avoid installing Git. You can then git clone
the repository from the Nix shell. Here's an example to get the latest release:
```shell
curl "https://raw.githubusercontent.com/gchq/sleeper/main/shell.nix" -o ./shell.nix
nix-shell ./shell.nix
git clone https://github.com/gchq/sleeper.git
cd sleeper
git checkout --track origin/main
```

If you installed the Sleeper CLI from GitHub as described in the getting started guide, you can
use sleeper builder to get a shell inside a Docker container with the dependencies pre-installed. This is the same
container image that's used for the Dev Containers setup above. It may be useful if you want to work inside Docker
without using Dev Containers.
If you're in an EC2 instance deployed with sleeper environment, the Sleeper CLI was pre-installed and the repository was already checked out when you created the EC2. Otherwise, you'll need to clone the repository in the container. You can use the commands below to do this:
```shell
sleeper builder
git clone https://github.com/gchq/sleeper.git
cd sleeper
```

Everything in the repository will be persisted between executions of sleeper builder.
If you have AWS CLI installed in the host, the same configuration will be used in the builder container. Otherwise, any configuration you set in the container will be persisted in the host home directory. AWS authentication environment variables will be propagated to the container as well.
The host Docker environment will be propagated to the container via the Docker socket.
The files generated for the Sleeper instance will be persisted in the host home directory under ~/.sleeper, so that
if you run the Docker container multiple times you will still have details of the last Sleeper instance you worked with.
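To illustrate why those details survive container restarts (the exact mount flags are an assumption here, not taken from the CLI), a minimal sketch using a temporary directory as a stand-in for ~/.sleeper:

```shell
# Assumption: the CLI effectively bind-mounts the host directory into the
# container, conceptually like: docker run -v "$HOME/.sleeper:/root/.sleeper" ...
state_dir="$(mktemp -d)"   # stand-in for ~/.sleeper on the host

# A first "container run" records details of the instance you worked with
echo "myinstanceid" > "$state_dir/last-instance"

# A later "container run" still sees the file, because it lives on the host
cat "$state_dir/last-instance"
```

Because the directory belongs to the host, nothing written under the mount is lost when the container exits.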
If you add a command on the end, you can run a specific script like this:
```shell
sleeper builder sleeper/scripts/test/deployAll/deployTest.sh myinstanceid myvpc mysubnet
```

You will need the following software:
- AWS CDK
- AWS CLI
- Bash: Minimum v3.2. Use bash --version to check.
- Docker
- Java: Requires version 17, to match the version used by AWS EMR
- Maven
- NodeJS / NPM
- Rust
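Since Bash is the one requirement above with a specific minimum version, here is a small sketch of checking it from a script, using Bash's built-in BASH_VERSINFO array:

```shell
# Check the running Bash against the minimum version Sleeper's scripts need (3.2)
required_major=3
required_minor=2
major="${BASH_VERSINFO[0]}"
minor="${BASH_VERSINFO[1]}"
if [ "$major" -gt "$required_major" ] || { [ "$major" -eq "$required_major" ] && [ "$minor" -ge "$required_minor" ]; }; then
  echo "Bash $BASH_VERSION is new enough"
else
  echo "Bash $BASH_VERSION is too old; need at least $required_major.$required_minor" >&2
fi
```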
Provided script (recommended) - this builds the code and copies the jars into the scripts directory so that the scripts work. Starting from the root of the Git repository:
```shell
./scripts/build/buildForTest.sh
```

You can disable building the Rust code by passing -DskipRust as an argument to that script. This can also be passed in any Maven build. This can speed up the build if you don't need the DataFusion data engine. If a previous build included Rust, skipping Rust will reuse the binaries that build produced.
When running Maven directly, you can pass -Pquick to skip tests and linting.
To build the Sleeper CLI, you can run this script:
```shell
./scripts/cli/buildAll.sh
```

Use ./scripts/cli/runInDocker.sh to run the built CLI. This will act the same as running the sleeper
command after installing the CLI. You can manually install it if you copy that script somewhere, rename it to sleeper,
and put it on the system path. Then sleeper ... commands will work as though you'd installed it normally.
If you have the CLI installed already, the build will replace it with the newly built version. The runInDocker.sh wrapper script itself is not replaced, though: if that script differs from the version you installed before, you can find the installed copy at $HOME/.local/bin/sleeper and manually overwrite it with the contents of ./scripts/cli/runInDocker.sh.
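The manual install described above amounts to copying the script onto the system path under the name sleeper. A sketch of those steps, using a temporary directory as a stand-in for $HOME/.local/bin and a placeholder file in place of the real runInDocker.sh:

```shell
# Stand-ins so this sketch is self-contained; in reality you would copy the
# real ./scripts/cli/runInDocker.sh into a directory that is on your PATH.
install_dir="$(mktemp -d)"
src_script="$(mktemp)"
printf '#!/usr/bin/env bash\necho "sleeper CLI placeholder"\n' > "$src_script"

# The install itself: copy, rename to "sleeper", and make it executable
cp "$src_script" "$install_dir/sleeper"
chmod +x "$install_dir/sleeper"

# With the directory on the PATH, "sleeper ..." commands resolve to the script
PATH="$install_dir:$PATH" sleeper
```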
Tools are available to publish built artefacts to shared repositories, and to install them locally to avoid the need to build Sleeper yourself. We do not currently publish artefacts publicly. See publishing artefacts for how to set this up yourself.
The codebase is structured around the components explained in the design document. The elements of the design largely correspond to Maven modules. We'll look at the module architecture in more detail below.
If you'd like to look at how the modules relate to one another in terms of their dependencies, there is a script documented in development scripts that can display the dependency structure as a graph. There's also a document with information on past and current dependency conflicts.
If you'd like to raise or pick up an open issue, see the contributing guide for more information.
Configuration is available for various development environments.
For VS Code there's a separate setup guide.
For IntelliJ, these settings are available to import:
- Code style scheme at code-style/intellij-style.xml
- Inspection profile at code-style/intellij-inspection-profile.xml
- Copyright profile for license header at code-style/intellij-copyright-profile.xml
- Checkstyle plugin settings in code-style/checkstyle-idea
For Eclipse, these settings are available to import:
- Code style at code-style/eclipse-style.xml
- Import order at code-style/eclipse-import-order.importorder
- License header at code-style/licenseHeader.txt
- Code templates at code-style/eclipse-codetemplates.xml
- Editor templates at code-style/eclipse-templates.xml
Most Maven modules map to features of Sleeper, and we also have "core" modules, "common" modules, and some other modules to do with the build, deployment with the CDK, and system tests.
The "core" modules make up the main application code, independent of infrastructure or external dependencies. These are the module core, and other modules with "core" in the name nested under specific features. These core modules represent the "application" part of a ports and adapters, or hexagonal, architecture. They do not contain external dependencies such as the AWS SDK, Parquet or other client libraries. They do contain dependencies for logging, and some utilities for serialisation/deserialisation.
The core module contains shared code for things like configuring a Sleeper instance, interacting with the state of a Sleeper table, and some common logic to track operations for reporting. Each of these has adapters that connect it to AWS, but the adapter code sits in other, non-core modules that are specific to those features.
The other modules with "core" in the name contain core application code that we felt was more peripheral to the system. For example, the specifics of how we model a compaction job are in the module compaction-core, but the high level tracking of compaction, and the interactions of compaction with the state of a Sleeper table, are both part of the core module. The compaction-core module sits alongside other non-core modules that contain the adapter code linking the application code for compaction to AWS.
The "common" modules are shared utilities for interacting with external dependencies in common ways. They sit under the
directory java/common. This includes tools for working with DynamoDB, testing code using AWS clients against
LocalStack, and some infrastructure code to do with invoking lambdas, running jobs in a task, and starting tasks that
will run jobs. Here a job is some process that needs to run, and a task is a piece of infrastructure that can run jobs,
e.g. an AWS ECS task.
The Maven project includes Checkstyle and SpotBugs. These are run on every pull request. You can run them locally with the Maven checkstyle:check and spotbugs:check goals. Your IDE may have plugins available to alert you to violations.
You can run both plugins together:
```shell
cd java
mvn clean compile checkstyle:check spotbugs:check
```

See the test strategy for how and when to write tests, as well as information on the testing tools used in the project.
See the coding conventions document for practices we try to adhere to when working on Sleeper.
See the deployment guide for notes on how to deploy Sleeper, and the system test guide to deploy instances specifically set up for development.
See the release process guide for instructions on how to publish a release of Sleeper.
See development scripts for scripts that can assist you while working on Sleeper.