The following instance properties are commonly used throughout Sleeper.
| Property Name | Description | Default Value | Run CDK Deploy When Changed |
|---|---|---|---|
| sleeper.id | A string to uniquely identify this deployment. This should be no longer than 20 characters. It should be globally unique as it will be used to name AWS resources such as S3 buckets. | | false |
| sleeper.artefacts.deployment | The ID of the artefacts deployment to use to deploy the Sleeper instance. By default we assume an artefacts deployment with the same ID as the Sleeper instance. This property is used to compute the default values of sleeper.jars.bucket and sleeper.ecr.repository.prefix. | | true |
| sleeper.jars.bucket | The S3 bucket containing the jar files of the Sleeper components. If unset, a default name is computed from sleeper.artefacts.deployment if it is set, or sleeper.id if it is not. | | true |
| sleeper.ecr.repository.prefix | If set, this property will be used as a prefix for the names of ECR repositories. If unset, a default prefix is computed from sleeper.artefacts.deployment if it is set, or sleeper.id if it is not. ECR repository names are generated in the format <prefix>/<image name>. | | true |
| sleeper.userjars | A comma-separated list of the jars containing application specific iterator code. These jars are assumed to be in the bucket given by sleeper.jars.bucket. For example, if that bucket contains two iterator jars called iterator1.jar and iterator2.jar then the property should be 'sleeper.userjars=iterator1.jar,iterator2.jar'. | | false |
| sleeper.tags | A list of tags that will automatically be applied to all the resources in this deployment of Sleeper. The list should be in the form "key1,value1,key2,value2,key3,value3,...". For example if you want to add tags of "user=some-user" and "project-name=sleeper-test", then the list should be "user,some-user,project-name,sleeper-test". Preferably, tags should be specified in a separate file called tags.properties. See https://github.com/gchq/sleeper/blob/develop/docs/deployment/instance-configuration.md for further details. | | true |
| sleeper.stack.tag.name | A name for a tag to identify the stack that deployed a resource. This will be set for all AWS resources, to the ID of the CDK stack that they are deployed under. This can be used to organise the cost explorer for billing. | DeploymentStack | true |
| sleeper.retain.infra.after.destroy | Whether to keep the Sleeper table bucket, Dynamo tables, query results bucket, etc., when the instance is destroyed. | true | true |
| sleeper.retain.logs.after.destroy | Whether to keep the Sleeper log groups when the instance is destroyed. | true | true |
| sleeper.default.table.retain.after.removal | This property is used when applying an instance configuration and a table has been removed. If this is true (default), removing the table from the configuration will just take the table offline. If this is false, it will delete all data associated with the table when the table is removed. Be aware that if a table is renamed in the configuration, the CDK will see it as a delete of the old table name and a create of the new table name. If this is set to false when that happens it will remove the table's data. This property isn't currently in use but will be in gchq#5870. | true | false |
| sleeper.default.table.reuse.existing | This property is used when applying an instance configuration and a table has been added. By default, or if this property is false, when a table is added to an instance configuration it's created in the instance. If it already exists the update will fail. If this property is true, the existing table will be reused and imported as part of the instance configuration. If it doesn't exist the update will fail. | false | false |
| sleeper.optional.stacks | The optional stacks to deploy. Not case sensitive. Valid values: [IngestStack, IngestBatcherStack, EmrServerlessBulkImportStack, EmrBulkImportStack, PersistentEmrBulkImportStack, EksBulkImportStack, EmrStudioStack, BulkExportStack, QueryStack, WebSocketQueryStack, AthenaStack, KeepLambdaWarmStack, CompactionStack, GarbageCollectorStack, PartitionSplittingStack, DashboardStack, TableMetricsStack] | IngestStack,IngestBatcherStack,EmrServerlessBulkImportStack,EmrStudioStack,QueryStack,CompactionStack,GarbageCollectorStack,PartitionSplittingStack,DashboardStack,TableMetricsStack | true |
| sleeper.lambda.deploy.type | The deployment type for AWS Lambda. Not case sensitive. There are two types of Lambda deployments, jar and container. If the size of the jar file is too large, it will always be deployed as a container. Valid values: [jar, container] | jar | true |
| sleeper.endpoint.url | The AWS endpoint URL. This should only be set for a non-standard service endpoint. Usually this is used to set the URL to LocalStack for a locally deployed instance. | | false |
| sleeper.vpc | The id of the VPC to deploy to. This property may be passed as an argument during deployment. If using the Sleeper CDK app, you can set the context variable "vpc". If using your own CDK app, you can set this in SleeperInstanceProps under networking. | | false |
| sleeper.vpc.endpoint.check | Whether to check that the VPC that the instance is deployed to has an S3 endpoint. If there is no S3 endpoint then the NAT costs can be very significant. | true | false |
| sleeper.subnets | A comma-separated list of subnets to deploy to. ECS tasks will be run across multiple subnets. EMR clusters will be deployed in a subnet chosen when the cluster is created. This property may be passed as an argument during deployment. If using the Sleeper CDK app, you can set the context variable "subnets". If using your own CDK app, you can set this in SleeperInstanceProps under networking. | | false |
| sleeper.filesystem | The Hadoop filesystem used to connect to S3. | s3a:// | false |
| sleeper.errors.email | An email address used by the TopicStack to publish SNS notifications of errors. | | true |
| sleeper.log.retention.days | The length of time in days that CloudWatch logs from lambda functions, ECS containers, etc., are retained. See https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-logs-loggroup.html for valid options. Use -1 to indicate infinite retention. | 30 | true |
| sleeper.fs.s3a.max-connections | Used to set the value of fs.s3a.connection.maximum on the Hadoop configuration. This controls the maximum number of http connections to S3. See https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html | 100 | false |
| sleeper.fs.s3a.upload.block.size | Used to set the value of fs.s3a.block.size on the Hadoop configuration. Uploads to S3 happen in blocks, and this sets the size of blocks. If a larger value is used, then more data is buffered before the upload begins. See https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html | 32M | false |
| sleeper.fargate.version | The version of Fargate to use. | 1.4.0 | false |
| sleeper.task.runner.memory.mb | The amount of memory in MB for the lambda that creates ECS tasks to execute compaction and ingest jobs. | 1024 | true |
| sleeper.task.runner.timeout.seconds | The timeout in seconds for the lambda that creates ECS tasks to execute compaction jobs and ingest jobs. This must be >0 and <= 900. | 900 | true |
| sleeper.properties.force.reload | If true, properties will be reloaded every time a long running job is started or a lambda is run. This will mainly be used in test scenarios to ensure properties are up to date. | false | false |
| sleeper.default.lambda.concurrency.reserved | Default value for the reserved concurrency for each lambda in the Sleeper instance that scales according to the number of Sleeper tables. The state store committer lambda is an exception to this, as it has reserved concurrency by default. This is set in the property sleeper.statestore.committer.concurrency.reserved. Other lambdas are present that do not scale by the number of Sleeper tables, and are not set from this property. By default no concurrency is reserved for the lambdas. Each lambda also has its own property that overrides the value found here. See reserved concurrency overview at: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html | | false |
| sleeper.default.lambda.concurrency.max | Default value for the maximum concurrency for each lambda in the Sleeper instance that scales according to the number of Sleeper tables. Other lambdas are present that do not scale by the number of Sleeper tables, and are not set from this property. By default the maximum concurrency is set to 10, which is enough for 10 online tables. If there are more online tables, this number may need to be increased. Each lambda also has its own property that overrides the value found here. See maximum concurrency overview at: https://aws.amazon.com/blogs/compute/introducing-maximum-concurrency-of-aws-lambda-functions-when-using-amazon-sqs-as-an-event-source/ | 10 | false |
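
Several of the properties above carry constraints that are easy to get wrong by hand: the 20 character limit on sleeper.id, the paired "key,value" layout of sleeper.tags, the range of sleeper.task.runner.timeout.seconds, and the two valid values of sleeper.lambda.deploy.type. The following is a minimal Python sketch of how those rules could be checked before deploying. It is not Sleeper's own validation code; the function names `parse_tags` and `check_instance_properties` are hypothetical, and the properties are assumed to have already been loaded into a plain dictionary of strings.

```python
# Illustrative only: these helpers are hypothetical and not part of Sleeper.

def parse_tags(value):
    """Turn "key1,value1,key2,value2" into {"key1": "value1", "key2": "value2"}."""
    if not value:
        return {}
    parts = value.split(",")
    if len(parts) % 2 != 0:
        raise ValueError("sleeper.tags needs an even number of comma-separated items")
    # Items at even positions are keys, items at odd positions are their values.
    return dict(zip(parts[0::2], parts[1::2]))


def check_instance_properties(props):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []

    instance_id = props.get("sleeper.id", "")
    if not instance_id or len(instance_id) > 20:
        problems.append("sleeper.id must be set and no longer than 20 characters")

    # Fall back to the documented default of 900 seconds when unset.
    timeout = int(props.get("sleeper.task.runner.timeout.seconds", "900"))
    if not 0 < timeout <= 900:
        problems.append("sleeper.task.runner.timeout.seconds must be >0 and <= 900")

    # The documented values are not case sensitive, so compare in lower case.
    deploy_type = props.get("sleeper.lambda.deploy.type", "jar").lower()
    if deploy_type not in ("jar", "container"):
        problems.append("sleeper.lambda.deploy.type must be jar or container")

    return problems


if __name__ == "__main__":
    example = {
        "sleeper.id": "my-sleeper-instance",
        "sleeper.tags": "user,some-user,project-name,sleeper-test",
        "sleeper.lambda.deploy.type": "Container",
    }
    print(parse_tags(example["sleeper.tags"]))
    print(check_instance_properties(example))
```

The checks return a list of messages rather than raising on the first failure, so that every problem in a configuration can be reported in one pass.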