Table Properties - Data Definition

The following table properties relate to the definition of data inside a table.

| Property Name | Description | Default Value |
|---|---|---|
| sleeper.table.name | A unique name identifying this table. | |
| sleeper.table.id | A unique ID identifying this table, generated by Sleeper on table creation. | |
| sleeper.table.online | A boolean flag representing whether this table is online or offline. An offline table will not have any partition splitting or compaction jobs run automatically. Note that taking a table offline will not stop any partitions that are being split or compaction jobs that are running. Additionally, you are still able to ingest data to offline tables and perform queries against them. | true |
| sleeper.table.schema | The schema representing the structure of this table. This should be set in a separate schema.json file, and cannot be edited once the table has been created. See https://github.com/gchq/sleeper/blob/develop/docs/deployment/instance-configuration.md for further details. | |
| sleeper.table.data.engine | Select which data engine to use for the table. Valid values are: [java, datafusion, datafusion_experimental]. The options "datafusion" and "datafusion_experimental" currently have identical behaviour, as the DataFusion data engine no longer has any experimental components. We may remove the "datafusion_experimental" option in a future release, which will cause instances with that set to fail after an upgrade. Please use the "datafusion" option instead. | DATAFUSION |
| sleeper.table.iterator.class.name | Fully qualified class of a custom iterator to apply to this table. Defaults to nothing. This will be applied both during queries and during compaction, and will apply the results to the underlying table data persistently. This forces use of the Java data engine for compaction. This is not recommended, as the Java implementation is much slower and much more expensive. Consider using the aggregation and filtering properties instead. | |
| sleeper.table.iterator.config | A configuration string to be passed to the iterator specified in `sleeper.table.iterator.class.name`. This will be read by the custom iterator object. | |
| sleeper.table.filters | Sets how rows are filtered out and deleted from the table. This is applied every time the data is read, e.g. during compactions or queries. Defaults to retaining all rows. Currently this can only be `ageOff(field,age)`, to age off old data. The first parameter is the name of the timestamp field to check against, which must be of type long, in milliseconds since the epoch. The second parameter is the maximum age in milliseconds, e.g. 1209600000 for 2 weeks. | |
| sleeper.table.aggregations | Sets how to combine rows that have the same values for all row and sort keys. This is applied every time the data is read, e.g. during compactions or queries. Defaults to leaving them as separate rows. This must be in the format `op(field),op(field)`. This must define an operation for every value field, passing the field name as the parameter. All value fields must be of a numeric or map type. The available operations are: `sum` (adds the values together for equal rows); `max` (takes the maximum value out of all equal rows); `min` (takes the minimum value out of all equal rows); `map_sum`, `map_max`, `map_min` (apply the given operation to every sub-field of a map). | |
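
The `age` parameter of `ageOff(field,age)` is a plain millisecond count, which is easy to get wrong by a factor of 1000. The following Python sketch shows one way to derive the value and build the property string. It is illustrative only and not part of Sleeper: the helper names and the field name `timestamp` are hypothetical, and the exact boundary comparison Sleeper applies is not stated here, so `is_retained` assumes a row is dropped only once it is strictly older than the maximum age.

```python
from datetime import timedelta


def age_to_millis(age: timedelta) -> int:
    """Convert a duration to whole milliseconds (hypothetical helper)."""
    # Dividing one timedelta by another with // yields an exact integer.
    return age // timedelta(milliseconds=1)


def age_off_filter(field: str, age: timedelta) -> str:
    """Build a value for sleeper.table.filters in the form ageOff(field,age)."""
    return f"ageOff({field},{age_to_millis(age)})"


def is_retained(timestamp_millis: int, now_millis: int, max_age_millis: int) -> bool:
    """Assumed semantics: keep a row unless it is strictly older than the maximum age."""
    return now_millis - timestamp_millis <= max_age_millis


if __name__ == "__main__":
    # "timestamp" is a hypothetical field name of type long in the table schema.
    print(age_off_filter("timestamp", timedelta(weeks=2)))
```

The value printed for two weeks matches the 1209600000 figure quoted in the property description.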
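
To make the effect of `sleeper.table.aggregations` concrete, the Python sketch below parses a string in the `op(field),op(field)` format and combines rows that share the same key values. It only models the behaviour described in the table and is not Sleeper's implementation. The function names and the field names (`key`, `count`, `last_seen`, `hits`) are hypothetical, and it assumes that a map sub-field present in only one of two rows is carried over unchanged.

```python
def _map_op(op):
    """Lift a two-argument operation so it applies to every sub-field of a map."""
    def apply(a, b):
        merged = dict(a)
        for name, value in b.items():
            # Assumption: a sub-field seen in only one row keeps its value.
            merged[name] = op(merged[name], value) if name in merged else value
        return merged
    return apply


OPERATIONS = {
    "sum": lambda a, b: a + b,
    "max": max,
    "min": min,
    "map_sum": _map_op(lambda a, b: a + b),
    "map_max": _map_op(max),
    "map_min": _map_op(min),
}


def parse_aggregations(config):
    """Parse 'op(field),op(field)' into a dict of field name to operation name."""
    result = {}
    for part in config.split(","):
        op, _, rest = part.strip().partition("(")
        field = rest.rstrip(")").strip()
        if op not in OPERATIONS or not field:
            raise ValueError(f"Invalid aggregation: {part!r}")
        result[field] = op
    return result


def aggregate(rows, key_fields, config):
    """Combine rows with equal values for all key fields, one operation per value field."""
    ops = parse_aggregations(config)
    combined = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in combined:
            combined[key] = dict(row)
            continue
        current = combined[key]
        for field, op in ops.items():
            current[field] = OPERATIONS[op](current[field], row[field])
    return list(combined.values())


if __name__ == "__main__":
    rows = [
        {"key": "a", "count": 1, "last_seen": 10},
        {"key": "a", "count": 2, "last_seen": 7},
        {"key": "b", "count": 5, "last_seen": 3},
    ]
    print(aggregate(rows, ["key"], "sum(count),max(last_seen)"))
```

In this sketch the two rows with key `a` collapse into a single row whose `count` is summed and whose `last_seen` is the larger of the two values, while the row with key `b` passes through unchanged.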