Instance Properties - Common - User Defined

The following instance properties are commonly used throughout Sleeper.

| Property Name | Description | Default Value | Run CDK Deploy When Changed |
|---|---|---|---|
| sleeper.id | A string to uniquely identify this deployment. This should be no longer than 20 characters. It should be globally unique, as it will be used to name AWS resources such as S3 buckets. | | false |
| sleeper.artefacts.deployment | The ID of the artefacts deployment to use to deploy the Sleeper instance. By default we assume an artefacts deployment with the same ID as the Sleeper instance. This property is used to compute the default values of `sleeper.jars.bucket` and `sleeper.ecr.repository.prefix`. | | true |
| sleeper.jars.bucket | The S3 bucket containing the jar files of the Sleeper components. If unset, a default name is computed from `sleeper.artefacts.deployment` if it is set, or `sleeper.id` if it is not. | | true |
| sleeper.ecr.repository.prefix | If set, this property will be used as a prefix for the names of ECR repositories. If unset, a default prefix is computed from `sleeper.artefacts.deployment` if it is set, or `sleeper.id` if it is not. ECR repository names are generated in the format `<prefix>/<image name>`. | | true |
| sleeper.userjars | A comma-separated list of the jars containing application-specific iterator code. These jars are assumed to be in the bucket given by `sleeper.jars.bucket`. For example, if that bucket contains two iterator jars called iterator1.jar and iterator2.jar, then the property should be `sleeper.userjars=iterator1.jar,iterator2.jar`. | | false |
| sleeper.tags | A list of tags that will automatically be applied to all the resources in this deployment of Sleeper. The list should be in the form "key1,value1,key2,value2,key3,value3,...". For example, if you want to add tags of "user=some-user" and "project-name=sleeper-test", then the list should be "user,some-user,project-name,sleeper-test". Preferably, tags should be specified in a separate file called tags.properties. See https://github.com/gchq/sleeper/blob/develop/docs/deployment/instance-configuration.md for further details. | | true |
| sleeper.stack.tag.name | The name of a tag that identifies the stack that deployed a resource. This tag is set on all AWS resources, with the ID of the CDK stack they are deployed under as its value. This can be used to organise the cost explorer for billing. | DeploymentStack | true |
| sleeper.retain.infra.after.destroy | Whether to keep the Sleeper table bucket, DynamoDB tables, query results bucket, etc., when the instance is destroyed. | true | true |
| sleeper.retain.logs.after.destroy | Whether to keep the Sleeper log groups when the instance is destroyed. | true | true |
| sleeper.default.table.retain.after.removal | This property is used when applying an instance configuration and a table has been removed. If this is true (default), removing the table from the configuration will just take the table offline. If this is false, all data associated with the table will be deleted when the table is removed. Be aware that if a table is renamed in the configuration, the CDK will see it as a deletion of the old table name and a creation of the new table name. If this property is false when that happens, the table's data will be removed. This property isn't currently in use but will be in gchq#5870. | true | false |
| sleeper.default.table.reuse.existing | This property is used when applying an instance configuration and a table has been added. By default, or if this property is false, a table added to an instance configuration is created in the instance. If it already exists, the update will fail. If this property is true, the existing table will be reused and imported as part of the instance configuration. If it doesn't exist, the update will fail. | false | false |
| sleeper.optional.stacks | The optional stacks to deploy. Not case sensitive. Valid values: [IngestStack, IngestBatcherStack, EmrServerlessBulkImportStack, EmrBulkImportStack, PersistentEmrBulkImportStack, EksBulkImportStack, EmrStudioStack, BulkExportStack, QueryStack, WebSocketQueryStack, AthenaStack, KeepLambdaWarmStack, CompactionStack, GarbageCollectorStack, PartitionSplittingStack, DashboardStack, TableMetricsStack] | IngestStack,IngestBatcherStack,EmrServerlessBulkImportStack,EmrStudioStack,QueryStack,CompactionStack,GarbageCollectorStack,PartitionSplittingStack,DashboardStack,TableMetricsStack | true |
| sleeper.lambda.deploy.type | The deployment type for AWS Lambda. Not case sensitive. There are two types of Lambda deployment, jar and container. If the jar file is too large, it will always be deployed as a container. Valid values: [jar, container] | jar | true |
| sleeper.endpoint.url | The AWS endpoint URL. This should only be set for a non-standard service endpoint. Usually this is used to set the URL to LocalStack for a locally deployed instance. | | false |
| sleeper.vpc | The ID of the VPC to deploy to. This property may be passed as an argument during deployment. If using the Sleeper CDK app, you can set the context variable "vpc". If using your own CDK app, you can set this in SleeperInstanceProps under networking. | | false |
| sleeper.vpc.endpoint.check | Whether to check that the VPC that the instance is deployed to has an S3 endpoint. If there is no S3 endpoint then the NAT costs can be very significant. | true | false |
| sleeper.subnets | A comma-separated list of subnets to deploy to. ECS tasks will be run across multiple subnets. EMR clusters will be deployed in a subnet chosen when the cluster is created. This property may be passed as an argument during deployment. If using the Sleeper CDK app, you can set the context variable "subnets". If using your own CDK app, you can set this in SleeperInstanceProps under networking. | | false |
| sleeper.filesystem | The Hadoop filesystem used to connect to S3. | s3a:// | false |
| sleeper.errors.email | An email address used by the TopicStack to publish SNS notifications of errors. | | true |
| sleeper.log.retention.days | The length of time in days that CloudWatch logs from Lambda functions, ECS containers, etc., are retained. See https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-logs-loggroup.html for valid options. Use -1 to indicate infinite retention. | 30 | true |
| sleeper.fs.s3a.max-connections | Used to set the value of `fs.s3a.connection.maximum` on the Hadoop configuration. This controls the maximum number of HTTP connections to S3. See https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html | 100 | false |
| sleeper.fs.s3a.upload.block.size | Used to set the value of `fs.s3a.block.size` on the Hadoop configuration. Uploads to S3 happen in blocks, and this sets the size of blocks. If a larger value is used, then more data is buffered before the upload begins. See https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html | 32M | false |
| sleeper.fargate.version | The version of Fargate to use. | 1.4.0 | false |
| sleeper.task.runner.memory.mb | The amount of memory in MB for the lambda that creates ECS tasks to execute compaction and ingest jobs. | 1024 | true |
| sleeper.task.runner.timeout.seconds | The timeout in seconds for the lambda that creates ECS tasks to execute compaction and ingest jobs. This must be greater than 0 and no more than 900. | 900 | true |
| sleeper.properties.force.reload | If true, properties will be reloaded every time a long-running job is started or a lambda is run. This is mainly used in test scenarios to ensure properties are up to date. | false | false |
| sleeper.default.lambda.concurrency.reserved | Default value for the reserved concurrency for each lambda in the Sleeper instance that scales according to the number of Sleeper tables. The state store committer lambda is an exception to this, as it has reserved concurrency by default, set in the property `sleeper.statestore.committer.concurrency.reserved`. Other lambdas are present that do not scale by the number of Sleeper tables, and are not set from this property. By default no concurrency is reserved for the lambdas. Each lambda also has its own property that overrides the value found here. See the reserved concurrency overview at: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html | | false |
| sleeper.default.lambda.concurrency.max | Default value for the maximum concurrency for each lambda in the Sleeper instance that scales according to the number of Sleeper tables. Other lambdas are present that do not scale by the number of Sleeper tables, and are not set from this property. By default the maximum concurrency is set to 10, which is enough for 10 online tables. If there are more online tables, this number may need to be increased. Each lambda also has its own property that overrides the value found here. See the maximum concurrency overview at: https://aws.amazon.com/blogs/compute/introducing-maximum-concurrency-of-aws-lambda-functions-when-using-amazon-sqs-as-an-event-source/ | 10 | false |
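
Several of the properties above carry constraints that are easy to check before deploying: the 20 character limit on `sleeper.id`, the key/value pairing of `sleeper.tags`, the two valid values of `sleeper.lambda.deploy.type`, and the range of `sleeper.task.runner.timeout.seconds`. The following is a minimal sketch of such a check, not part of Sleeper itself. The function names `parse_properties`, `parse_tags` and `check_common_properties` are hypothetical, and the parser assumes plain `key=value` lines rather than the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blank lines and '#' comments.

    Simplified on purpose: escapes and ':' separators are not handled.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_tags(value):
    """Turn 'key1,value1,key2,value2' into {'key1': 'value1', 'key2': 'value2'}."""
    items = [item.strip() for item in value.split(",")] if value else []
    if len(items) % 2 != 0:
        raise ValueError("sleeper.tags must contain an even number of items")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(items[0::2], items[1::2]))


def check_common_properties(props):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if len(props.get("sleeper.id", "")) > 20:
        problems.append("sleeper.id should be no longer than 20 characters")
    # Fall back to the documented default of 900 when the property is unset.
    timeout = int(props.get("sleeper.task.runner.timeout.seconds", "900"))
    if not 0 < timeout <= 900:
        problems.append("sleeper.task.runner.timeout.seconds must be >0 and <= 900")
    # The value is documented as not case sensitive, with a default of jar.
    deploy_type = props.get("sleeper.lambda.deploy.type", "jar").lower()
    if deploy_type not in ("jar", "container"):
        problems.append("sleeper.lambda.deploy.type must be jar or container")
    try:
        parse_tags(props.get("sleeper.tags", ""))
    except ValueError as error:
        problems.append(str(error))
    return problems


# Illustrative values only; the instance ID is made up for this example.
example = """
sleeper.id=my-sleeper-instance
sleeper.tags=user,some-user,project-name,sleeper-test
sleeper.lambda.deploy.type=Container
sleeper.task.runner.timeout.seconds=900
"""
props = parse_properties(example)
print(parse_tags(props["sleeper.tags"]))
print(check_common_properties(props))
```

The check returns a list of messages instead of raising on the first failure, so that every problem in a configuration can be reported in one pass.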
