You can run a LocalStack container locally and deploy an instance of Sleeper to it. This deployment method has limited functionality and will only work with small volumes of data, but will allow you to perform a queue-based standard ingest, and run reports and scripts against the instance.
These instructions assume you start in the project root directory and that Sleeper has been built (see the developer guide for how to set that up).
Currently these scripts are only supported when run from the Docker host. This also means it's necessary to build
Sleeper on the host machine, rather than using a dev container or a sleeper builder container. It may be possible to
adjust these to run in a Docker container for those use cases. The easiest option may be to use the Nix package manager,
which avoids the need to install all the build dependencies on the host machine. This is documented
in the developer guide.
To launch the LocalStack container, you can run the following command:

```bash
./scripts/deploy/localstack/startContainer.sh
```

This will also output commands you can use to point Sleeper scripts to your LocalStack container.
For Sleeper commands to interact with LocalStack, ensure that the AWS_ENDPOINT_URL environment variable
is set. Commands to do this are provided by the startContainer.sh script.
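For example, assuming LocalStack is listening on its default edge port of 4566 (an assumption; prefer the exact commands printed by startContainer.sh), setting the variable might look like this:

```bash
# Point the AWS SDK clients used by the Sleeper scripts at LocalStack instead
# of real AWS. Port 4566 is LocalStack's default edge port; if your container
# uses a different port, use the value printed by startContainer.sh.
export AWS_ENDPOINT_URL=http://localhost:4566
```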
To deploy an instance of Sleeper to your LocalStack container, you can run the following command:

```bash
./scripts/deploy/localstack/deploy.sh <instance-id>
```

This will create S3 buckets, DynamoDB tables and SQS queues in LocalStack, and upload a Sleeper instance
configuration. A single Sleeper table will be created with the name system-test.
Once the instance is deployed, you can launch the admin client to view the instance and table properties, as well as run partition and file status reports.
```bash
./scripts/utility/adminClient.sh <instance-id>
```

You can generate some random data for your instance by running the following command:

```bash
./scripts/deploy/localstack/generateRandomData.sh <instance-id> <table-name> <optional-number-of-rows>
```

This will place randomly generated Parquet files in the scripts/deploy/localstack/output directory. The number of files
generated depends on the number of rows that you pass into the script. By default, only one file is generated.
You can then use these files to ingest some data by running the following command:
```bash
./scripts/deploy/localstack/ingestFiles.sh <instance-id> <table-name> <file1.parquet> <file2.parquet> <file3.parquet> ...
```

This script will upload the provided files to an ingest source bucket in LocalStack, create ingest jobs, and
send them to the ingest job queue. It will then build the ingest-runner Docker image and launch a container for it,
which will take the ingest jobs off the queue and perform the ingest.
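For reference, an ingest job message like the ones this script puts on the queue can be sketched as follows. The field names follow Sleeper's ingest job JSON format, but the job id and bucket path are made-up examples, so treat the exact shape as an assumption and check the ingest documentation:

```bash
# Write a sketch of an ingest job message. The job id and file path below are
# hypothetical examples, not values the scripts actually generate.
cat > example-ingest-job.json <<'EOF'
{
  "id": "example-job-id",
  "tableName": "system-test",
  "files": [
    "my-ingest-source-bucket/file1.parquet"
  ]
}
EOF
```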
You can then view the ingest jobs and tasks that were run by launching the admin client and running an ingest job or ingest task status report.
You can skip having to run ingestFiles.sh after generating Parquet files by running the
following command:

```bash
./scripts/deploy/localstack/ingestRandomData.sh <instance-id> <table-name> <optional-number-of-rows>
```

Note: if you do not provide a number of rows to the data generation scripts, a default of 100000 is used.
To create compaction jobs for files that you have ingested, you can run the following command:
```bash
./scripts/deploy/localstack/createCompactionJobs.sh default <instance-id> <table-name>
```

This script will run the CreateJobs class (which would normally run periodically in a lambda), and put the created
jobs on the compaction job SQS queue.
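How jobs are batched is driven by table-level compaction properties, which you can edit through the admin client. As a hedged sketch (the property keys and class name are assumptions based on Sleeper's naming conventions, so check the table properties documentation before relying on them), pointing a table at a different strategy might look like this:

```bash
# Sketch of table properties selecting the BasicCompactionStrategy and its
# files batch size. The property keys and class name are assumptions; verify
# them against Sleeper's table properties documentation.
cat > example-table.properties <<'EOF'
sleeper.table.compaction.strategy.class=sleeper.compaction.strategy.impl.BasicCompactionStrategy
sleeper.table.compaction.files.batch.size=5
EOF
```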
Note that by default the SizeRatioCompactionStrategy will be used to determine whether a compaction job will be
created for a collection of files in the same partition. You can either change this strategy to the
BasicCompactionStrategy, which just uses the COMPACTION_FILES_BATCH_SIZE table property to batch files into jobs,
or you can skip the strategy and force creation of compaction jobs by using the all mode when calling the script:
```bash
./scripts/deploy/localstack/createCompactionJobs.sh all <instance-id> <table-name>
```

To run these compaction jobs, you need to launch a compaction task. These would normally be run in ECS tasks, launched
periodically by a lambda based on how many compaction jobs are waiting. The following script will build the Docker image
that the ECS tasks use, and run a Docker container using the built image.
```bash
./scripts/deploy/localstack/runCompactionTask.sh <instance-id>
```

You can view statistics for jobs and tasks by using the compactionJobStatusReport.sh and
compactionTaskStatusReport.sh scripts respectively.
```bash
# To view all jobs
./scripts/utility/compactionJobStatusReport.sh <instance-id> <table-name> standard -a

# To view all tasks
./scripts/utility/compactionTaskStatusReport.sh <instance-id> standard -a
```

To query the data in your Sleeper instance, you can run the following utility script. Note that lambda queries and web
socket queries do not work against a Sleeper instance deployed against LocalStack.
```bash
./scripts/utility/query.sh <instance-id>
```

You can tear down an existing instance by running the following command:
```bash
./scripts/deploy/localstack/tearDown.sh <instance-id>
```

To stop the LocalStack container, you can run the following command:

```bash
./scripts/deploy/localstack/stopContainer.sh
```