Instance Properties - Common - User Defined

The following instance properties are commonly used throughout Sleeper.

Each property below is listed with its name, a description, its default value (if any), and whether CDK deploy must be run when it is changed.

sleeper.id
A string to uniquely identify this deployment. This should be no longer than 20 characters. It should be globally unique, as it will be used to name AWS resources such as S3 buckets.
Default value: (none)
Run CDK deploy when changed: false

sleeper.artefacts.deployment
The ID of the artefacts deployment to use to deploy the Sleeper instance. By default we assume an artefacts deployment with the same ID as the Sleeper instance. This property is used to compute the default values of sleeper.jars.bucket and sleeper.ecr.repository.prefix.
Default value: (none)
Run CDK deploy when changed: true

sleeper.jars.bucket
The S3 bucket containing the jar files of the Sleeper components. If unset, a default name is computed from sleeper.artefacts.deployment if it is set, or sleeper.id if it is not.
Default value: (none)
Run CDK deploy when changed: true

sleeper.ecr.repository.prefix
If set, this property will be used as a prefix for the names of ECR repositories. If unset, a default prefix is computed from sleeper.artefacts.deployment if it is set, or sleeper.id if it is not.
ECR repository names are generated in the format <prefix>/<image name>.
Default value: (none)
Run CDK deploy when changed: true
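
The fallback order described for sleeper.artefacts.deployment, sleeper.jars.bucket and sleeper.ecr.repository.prefix can be sketched in Java as follows. This is not Sleeper's own code. The class and method names are hypothetical. The sketch only resolves which ID the defaults are derived from and checks the 20-character limit on sleeper.id. It does not attempt to reproduce the actual default bucket or prefix names, which are not documented here.

```java
import java.util.Properties;

// Hypothetical sketch of the documented fallback rules only. The default bucket
// name and ECR prefix that Sleeper derives from the resolved ID are not reproduced.
class ArtefactsIdResolver {

    // Documented limit: sleeper.id should be no longer than 20 characters.
    static final int MAX_INSTANCE_ID_LENGTH = 20;

    static void validateInstanceId(String instanceId) {
        if (instanceId == null || instanceId.isEmpty()
                || instanceId.length() > MAX_INSTANCE_ID_LENGTH) {
            throw new IllegalArgumentException(
                    "sleeper.id must be between 1 and " + MAX_INSTANCE_ID_LENGTH + " characters: " + instanceId);
        }
    }

    // Returns sleeper.artefacts.deployment if set, otherwise sleeper.id.
    // Treating a blank value as unset is an assumption of this sketch.
    static String artefactsDeploymentId(Properties props) {
        String artefacts = props.getProperty("sleeper.artefacts.deployment");
        if (artefacts != null && !artefacts.isBlank()) {
            return artefacts;
        }
        return props.getProperty("sleeper.id");
    }

    // ECR repository names follow the documented format <prefix>/<image name>.
    static String ecrRepositoryName(String prefix, String imageName) {
        return prefix + "/" + imageName;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sleeper.id", "my-instance");
        validateInstanceId(props.getProperty("sleeper.id"));
        System.out.println(artefactsDeploymentId(props)); // my-instance

        props.setProperty("sleeper.artefacts.deployment", "shared-artefacts");
        System.out.println(artefactsDeploymentId(props)); // shared-artefacts

        // "my-prefix" and "ingest" are hypothetical values for illustration.
        System.out.println(ecrRepositoryName("my-prefix", "ingest")); // my-prefix/ingest
    }
}
```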

sleeper.userjars
A comma-separated list of the jars containing application-specific iterator code. These jars are assumed to be in the bucket given by sleeper.jars.bucket. For example, if that bucket contains two iterator jars called iterator1.jar and iterator2.jar, then the property should be 'sleeper.userjars=iterator1.jar,iterator2.jar'.
Default value: (none)
Run CDK deploy when changed: false
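
The sketch below shows one way to split the comma-separated sleeper.userjars value and pair each jar with the jars bucket. The class name and the "bucket/jar" location format are assumptions for illustration. Trimming whitespace and dropping empty entries are also assumptions and are not documented behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: splits sleeper.userjars and pairs each jar with the
// bucket from sleeper.jars.bucket. The "bucket/jar" format is illustrative only.
class UserJars {

    // Splits the comma-separated list, trimming whitespace and dropping empty
    // entries (both assumptions of this sketch).
    static List<String> parseUserJars(String value) {
        if (value == null || value.isBlank()) {
            return List.of();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(jar -> !jar.isEmpty())
                .collect(Collectors.toList());
    }

    static List<String> jarLocations(String jarsBucket, String userJars) {
        return parseUserJars(userJars).stream()
                .map(jar -> jarsBucket + "/" + jar)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "my-jars-bucket" is a hypothetical bucket name.
        System.out.println(jarLocations("my-jars-bucket", "iterator1.jar,iterator2.jar"));
        // [my-jars-bucket/iterator1.jar, my-jars-bucket/iterator2.jar]
    }
}
```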

sleeper.tags
A list of tags that will automatically be applied to all the resources in this deployment of Sleeper. The list should be in the form "key1,value1,key2,value2,key3,value3,...".
For example, if you want to add tags of "user=some-user" and "project-name=sleeper-test", then the list should be "user,some-user,project-name,sleeper-test".
It is preferable to specify tags in a separate file called tags.properties.
See https://github.com/gchq/sleeper/blob/develop/docs/deployment/instance-configuration.md for further details.
Default value: (none)
Run CDK deploy when changed: true
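
A small parser can illustrate the key,value pair format used by sleeper.tags. This is a hypothetical sketch and not Sleeper's implementation. Rejecting an odd number of entries is an assumption about how malformed input might be handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the "key1,value1,key2,value2,..." format of sleeper.tags.
class TagsParser {

    static Map<String, String> parseTags(String value) {
        Map<String, String> tags = new LinkedHashMap<>(); // keeps the order tags were listed in
        if (value == null || value.isBlank()) {
            return tags;
        }
        // A limit of -1 keeps trailing empty entries, so "a,b," is reported as malformed.
        String[] parts = value.split(",", -1);
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "Expected key,value pairs but got " + parts.length + " entries: " + value);
        }
        for (int i = 0; i < parts.length; i += 2) {
            tags.put(parts[i], parts[i + 1]);
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(parseTags("user,some-user,project-name,sleeper-test"));
        // {user=some-user, project-name=sleeper-test}
    }
}
```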

sleeper.stack.tag.name
The name of a tag that identifies the stack that deployed a resource. This tag will be set on all AWS resources, with its value set to the ID of the CDK stack that each resource is deployed under. This can be used to organise costs by stack in AWS Cost Explorer for billing.
Default value: DeploymentStack
Run CDK deploy when changed: true

sleeper.retain.infra.after.destroy
Whether to keep the Sleeper table bucket, DynamoDB tables, query results bucket, etc., when the instance is destroyed.
Default value: true
Run CDK deploy when changed: true

sleeper.retain.logs.after.destroy
Whether to keep the Sleeper log groups when the instance is destroyed.
Default value: true
Run CDK deploy when changed: true

sleeper.default.table.retain.after.removal
This property is used when applying an instance configuration and a table has been removed.
If this is true (the default), removing the table from the configuration will only take the table offline.
If this is false, all data associated with the table will be deleted when the table is removed.
Be aware that if a table is renamed in the configuration, the CDK will treat this as deleting the table under its old name and creating one under its new name. If this property is false when that happens, the table's data will be removed.
This property isn't currently in use, but will be in gchq#5870.
Default value: true
Run CDK deploy when changed: false
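
Because this property is not yet in use, the following is only a hypothetical sketch of the behaviour described above. The class, enum and method names are invented for illustration. The sketch treats any table name that is present in the old configuration but missing from the new one as removed. This is why a rename counts as removing the old name.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: sleeper.default.table.retain.after.removal is not yet in use.
class TableRemovalPlan {

    enum Action { TAKE_OFFLINE, DELETE_DATA }

    // Names present before but absent after are treated as removed. A renamed table
    // appears here under its old name, just as the CDK sees a delete plus a create.
    static Set<String> removedTables(Set<String> before, Set<String> after) {
        Set<String> removed = new TreeSet<>(before);
        removed.removeAll(after);
        return removed;
    }

    static Action actionForRemovedTable(boolean retainAfterRemoval) {
        return retainAfterRemoval ? Action.TAKE_OFFLINE : Action.DELETE_DATA;
    }

    public static void main(String[] args) {
        Set<String> before = Set.of("events", "users");
        Set<String> after = Set.of("events", "customers"); // "users" renamed to "customers"
        for (String table : removedTables(before, after)) {
            System.out.println(table + " -> " + actionForRemovedTable(false));
        }
        // users -> DELETE_DATA
    }
}
```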

sleeper.default.table.reuse.existing
This property is used when applying an instance configuration and a table has been added.
By default, or if this property is false, a table added to an instance configuration is created in the instance. If it already exists, the update will fail.
If this property is true, the existing table will be reused and imported as part of the instance configuration. If it doesn't exist, the update will fail.
Default value: false
Run CDK deploy when changed: false
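
A small decision function can summarise the four cases described for sleeper.default.table.reuse.existing. The enum and method names are hypothetical. The outcomes follow the description above.

```java
// Hypothetical decision function for sleeper.default.table.reuse.existing.
class TableAdditionCheck {

    enum Outcome { CREATE, IMPORT_EXISTING, FAIL }

    static Outcome onTableAdded(boolean reuseExisting, boolean tableAlreadyExists) {
        if (reuseExisting) {
            // Reuse mode: the table must already exist, and is imported.
            return tableAlreadyExists ? Outcome.IMPORT_EXISTING : Outcome.FAIL;
        }
        // Default mode: the table is created, and must not already exist.
        return tableAlreadyExists ? Outcome.FAIL : Outcome.CREATE;
    }

    public static void main(String[] args) {
        for (boolean reuse : new boolean[] {false, true}) {
            for (boolean exists : new boolean[] {false, true}) {
                System.out.printf("reuse=%b, exists=%b -> %s%n", reuse, exists, onTableAdded(reuse, exists));
            }
        }
    }
}
```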

sleeper.optional.stacks
The optional stacks to deploy. Not case sensitive.
Valid values: [IngestStack, IngestBatcherStack, EmrServerlessBulkImportStack, EmrBulkImportStack, PersistentEmrBulkImportStack, EksBulkImportStack, EmrStudioStack, BulkExportStack, QueryStack, WebSocketQueryStack, AthenaStack, KeepLambdaWarmStack, CompactionStack, GarbageCollectorStack, PartitionSplittingStack, DashboardStack, TableMetricsStack]
Default value: IngestStack,IngestBatcherStack,EmrServerlessBulkImportStack,EmrStudioStack,QueryStack,CompactionStack,GarbageCollectorStack,PartitionSplittingStack,DashboardStack,TableMetricsStack
Run CDK deploy when changed: true
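
sleeper.optional.stacks is a comma-separated list that is not case sensitive. The sketch below shows one way to normalise and validate it against the valid values listed above. This is an assumed approach for illustration and not Sleeper's own parsing code. Trimming whitespace and skipping empty entries are also assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical normaliser for the case-insensitive sleeper.optional.stacks list.
class OptionalStacks {

    // Valid values as listed in this documentation.
    static final List<String> VALID_STACKS = List.of(
            "IngestStack", "IngestBatcherStack", "EmrServerlessBulkImportStack", "EmrBulkImportStack",
            "PersistentEmrBulkImportStack", "EksBulkImportStack", "EmrStudioStack", "BulkExportStack",
            "QueryStack", "WebSocketQueryStack", "AthenaStack", "KeepLambdaWarmStack", "CompactionStack",
            "GarbageCollectorStack", "PartitionSplittingStack", "DashboardStack", "TableMetricsStack");

    // Maps each lower-cased name back to its canonical spelling.
    static final Map<String, String> BY_LOWER_CASE = new HashMap<>();

    static {
        for (String stack : VALID_STACKS) {
            BY_LOWER_CASE.put(stack.toLowerCase(Locale.ROOT), stack);
        }
    }

    static List<String> parse(String value) {
        List<String> stacks = new ArrayList<>();
        for (String raw : value.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate "a,,b" or an empty value (an assumption)
            }
            String canonical = BY_LOWER_CASE.get(name.toLowerCase(Locale.ROOT));
            if (canonical == null) {
                throw new IllegalArgumentException("Unknown optional stack: " + name);
            }
            stacks.add(canonical);
        }
        return stacks;
    }

    public static void main(String[] args) {
        System.out.println(parse("ingeststack, QUERYSTACK,CompactionStack"));
        // [IngestStack, QueryStack, CompactionStack]
    }
}
```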

sleeper.lambda.deploy.type
The deployment type for AWS Lambda. Not case sensitive.
There are two types of Lambda deployment: jar and container.
If the jar file is too large, it will always be deployed as a container.
Valid values: [jar, container]
Default value: jar
Run CDK deploy when changed: true
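
The rule that an oversized jar is always deployed as a container can be sketched as follows. This documentation does not give the size limit, so maxJarBytes is a hypothetical parameter that the caller must supply. The class and enum names are also hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of choosing the effective Lambda deployment type.
class LambdaDeployType {

    enum DeployType { JAR, CONTAINER }

    // Case-insensitive parse. valueOf throws IllegalArgumentException for any other value.
    static DeployType parse(String value) {
        return DeployType.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // maxJarBytes is a hypothetical limit supplied by the caller. The real
    // threshold is not stated in this documentation.
    static DeployType effectiveType(String configured, long jarBytes, long maxJarBytes) {
        DeployType requested = parse(configured);
        if (requested == DeployType.JAR && jarBytes > maxJarBytes) {
            return DeployType.CONTAINER; // too large for a jar deployment
        }
        return requested;
    }

    public static void main(String[] args) {
        long limit = 100L * 1024 * 1024; // hypothetical 100 MiB limit for illustration
        System.out.println(effectiveType("jar", 20L * 1024 * 1024, limit));  // JAR
        System.out.println(effectiveType("Jar", 150L * 1024 * 1024, limit)); // CONTAINER
    }
}
```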

sleeper.endpoint.url
The AWS endpoint URL. This should only be set for a non-standard service endpoint. Usually this is used to set the URL to LocalStack for a locally deployed instance.
Default value: (none)
Run CDK deploy when changed: false

sleeper.vpc
The ID of the VPC to deploy to. This property may be passed as an argument during deployment. If using the Sleeper CDK app, you can set the context variable "vpc". If using your own CDK app, you can set this in SleeperInstanceProps under networking.
Default value: (none)
Run CDK deploy when changed: false

sleeper.vpc.endpoint.check
Whether to check that the VPC the instance is deployed to has an S3 endpoint. Without an S3 endpoint, NAT costs can be very significant.
Default value: true
Run CDK deploy when changed: false

sleeper.subnets
A comma-separated list of subnets to deploy to. ECS tasks will be run across multiple subnets. EMR clusters will be deployed in a subnet chosen when the cluster is created. This property may be passed as an argument during deployment. If using the Sleeper CDK app, you can set the context variable "subnets". If using your own CDK app, you can set this in SleeperInstanceProps under networking.
Default value: (none)
Run CDK deploy when changed: false

sleeper.filesystem
The Hadoop filesystem used to connect to S3.
Default value: s3a://
Run CDK deploy when changed: false

sleeper.errors.email
An email address used by the TopicStack to publish SNS notifications of errors.
Default value: (none)
Run CDK deploy when changed: true

sleeper.log.retention.days
The length of time in days that CloudWatch logs from lambda functions, ECS containers, etc., are retained.
See https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-logs-loggroup.html for valid options.
Use -1 to indicate infinite retention.
Default value: 30
Run CDK deploy when changed: true
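
The following is a validation sketch for sleeper.log.retention.days. The allowed values are copied from the AWS::Logs::LogGroup RetentionInDays documentation linked above, as it stood at the time of writing. AWS may change this list, so check that page. Treating -1 as infinite retention follows the description above. The class name is hypothetical, and this is not necessarily how Sleeper validates the property.

```java
import java.util.Set;

// Hypothetical validator for sleeper.log.retention.days.
class LogRetention {

    // Values accepted by AWS::Logs::LogGroup RetentionInDays at the time of writing.
    // Check the AWS page linked above, as the list may change.
    static final Set<Integer> ALLOWED_DAYS = Set.of(
            1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
            731, 1096, 1827, 2192, 2557, 2922, 3288, 3653);

    // -1 means infinite retention, as described above.
    static boolean isInfinite(int days) {
        return days == -1;
    }

    static boolean isValid(int days) {
        return isInfinite(days) || ALLOWED_DAYS.contains(days);
    }

    public static void main(String[] args) {
        System.out.println(isValid(30)); // true (the default)
        System.out.println(isValid(-1)); // true
        System.out.println(isValid(31)); // false
    }
}
```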

sleeper.fs.s3a.max-connections
Used to set the value of fs.s3a.connection.maximum on the Hadoop configuration. This controls the maximum number of HTTP connections to S3.
See https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html
Default value: 100
Run CDK deploy when changed: false

sleeper.fs.s3a.upload.block.size
Used to set the value of fs.s3a.block.size on the Hadoop configuration. Uploads to S3 happen in blocks, and this sets the block size. If a larger value is used, more data is buffered before the upload begins.
See https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html
Default value: 32M
Run CDK deploy when changed: false
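
The two S3A properties above are copied onto Hadoop configuration keys. The sketch below shows that mapping with plain java.util.Properties standing in for a Hadoop Configuration, so that it runs without Hadoop on the classpath. It also shows how a size such as 32M can be read as a number of bytes, assuming binary (1024-based) suffixes. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: copies the two documented Sleeper properties onto their
// Hadoop S3A keys. Plain Properties stand in for a Hadoop Configuration here.
class S3aSettings {

    static final Map<String, String> SLEEPER_TO_HADOOP = Map.of(
            "sleeper.fs.s3a.max-connections", "fs.s3a.connection.maximum",
            "sleeper.fs.s3a.upload.block.size", "fs.s3a.block.size");

    static Properties toHadoopSettings(Properties sleeperProperties) {
        Properties hadoop = new Properties();
        SLEEPER_TO_HADOOP.forEach((sleeperKey, hadoopKey) -> {
            String value = sleeperProperties.getProperty(sleeperKey);
            if (value != null) {
                hadoop.setProperty(hadoopKey, value);
            }
        });
        return hadoop;
    }

    // Reads sizes such as "32M" as bytes, assuming binary (1024-based) suffixes.
    static long parseSize(String value) {
        String v = value.trim().toUpperCase(Locale.ROOT);
        char suffix = v.charAt(v.length() - 1);
        long multiplier;
        if (suffix == 'K') {
            multiplier = 1L << 10;
        } else if (suffix == 'M') {
            multiplier = 1L << 20;
        } else if (suffix == 'G') {
            multiplier = 1L << 30;
        } else {
            return Long.parseLong(v); // no suffix: already in bytes
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    public static void main(String[] args) {
        Properties sleeper = new Properties();
        sleeper.setProperty("sleeper.fs.s3a.max-connections", "100");
        sleeper.setProperty("sleeper.fs.s3a.upload.block.size", "32M");
        Properties hadoop = toHadoopSettings(sleeper);
        System.out.println(hadoop.getProperty("fs.s3a.connection.maximum")); // 100
        System.out.println(hadoop.getProperty("fs.s3a.block.size"));         // 32M
        System.out.println(parseSize("32M"));                                 // 33554432
    }
}
```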

sleeper.fargate.version
The version of Fargate to use.
Default value: 1.4.0
Run CDK deploy when changed: false

sleeper.task.runner.memory.mb
The amount of memory in MB for the lambda that creates ECS tasks to execute compaction and ingest jobs.
Default value: 1024
Run CDK deploy when changed: true

sleeper.task.runner.timeout.seconds
The timeout in seconds for the lambda that creates ECS tasks to execute compaction and ingest jobs.
This must be greater than 0 and no more than 900.
Default value: 900
Run CDK deploy when changed: true
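
The following sketches the documented range check for sleeper.task.runner.timeout.seconds. The upper bound of 900 seconds matches the 15-minute maximum timeout for an AWS Lambda function. The class name is hypothetical.

```java
// Hypothetical range check for sleeper.task.runner.timeout.seconds.
class TaskRunnerTimeout {

    // 900 seconds (15 minutes) is also the maximum AWS Lambda function timeout.
    static final int MAX_TIMEOUT_SECONDS = 900;

    static int validate(int seconds) {
        if (seconds <= 0 || seconds > MAX_TIMEOUT_SECONDS) {
            throw new IllegalArgumentException(
                    "sleeper.task.runner.timeout.seconds must be >0 and <= " + MAX_TIMEOUT_SECONDS + ", got " + seconds);
        }
        return seconds;
    }

    public static void main(String[] args) {
        System.out.println(validate(900)); // 900, the default
    }
}
```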

sleeper.properties.force.reload
If true, properties will be reloaded every time a long-running job is started or a lambda is run. This will mainly be used in test scenarios to ensure properties are up to date.
Default value: false
Run CDK deploy when changed: false

sleeper.default.lambda.concurrency.reserved
Default value for the reserved concurrency for each lambda in the Sleeper instance that scales according to the number of Sleeper tables.
The state store committer lambda is an exception to this, as it has reserved concurrency by default. This is set in the property sleeper.statestore.committer.concurrency.reserved. Other lambdas are present that do not scale by the number of Sleeper tables, and are not set from this property.
By default no concurrency is reserved for the lambdas. Each lambda also has its own property that overrides the value found here.
See reserved concurrency overview at: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
Default value: (none)
Run CDK deploy when changed: false

sleeper.default.lambda.concurrency.max
Default value for the maximum concurrency for each lambda in the Sleeper instance that scales according to the number of Sleeper tables.
Other lambdas are present that do not scale by the number of Sleeper tables, and are not set from this property.
By default the maximum concurrency is set to 10, which is enough for 10 online tables. If there are more online tables, this number may need to be increased. Each lambda also has its own property that overrides the value found here.
See maximum concurrency overview at: https://aws.amazon.com/blogs/compute/introducing-maximum-concurrency-of-aws-lambda-functions-when-using-amazon-sqs-as-an-event-source/
Default value: 10
Run CDK deploy when changed: false
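
Both concurrency properties act as instance-wide defaults, and each lambda's own property overrides them. That lookup order can be sketched as follows. This section does not list the individual lambda properties, so the per-lambda property names used here are hypothetical, as is the class name.

```java
import java.util.OptionalInt;
import java.util.Properties;

// Hypothetical sketch of resolving a lambda's concurrency setting:
// its own property first, then the instance-wide default, otherwise unset.
class LambdaConcurrency {

    static OptionalInt resolve(Properties props, String lambdaSpecificKey, String defaultKey) {
        String value = props.getProperty(lambdaSpecificKey);
        if (value == null) {
            value = props.getProperty(defaultKey);
        }
        return value == null ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(value.trim()));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sleeper.default.lambda.concurrency.max", "10");

        // Hypothetical per-lambda property names, for illustration only.
        String maxKey = "sleeper.example.lambda.concurrency.max";
        String reservedKey = "sleeper.example.lambda.concurrency.reserved";

        System.out.println(resolve(props, maxKey, "sleeper.default.lambda.concurrency.max"));           // OptionalInt[10]
        System.out.println(resolve(props, reservedKey, "sleeper.default.lambda.concurrency.reserved")); // OptionalInt.empty

        props.setProperty(maxKey, "20"); // the per-lambda value overrides the default
        System.out.println(resolve(props, maxKey, "sleeper.default.lambda.concurrency.max"));           // OptionalInt[20]
    }
}
```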