Table Properties - Data Definition

The following table properties relate to the definition of data inside a table.

| Property Name | Description | Default Value |
|---|---|---|
| sleeper.table.name | A unique name identifying this table. |  |
| sleeper.table.id | A unique ID identifying this table, generated by Sleeper when the table is created. |  |
| sleeper.table.online | A boolean flag indicating whether this table is online or offline.<br>An offline table will not have any partition splitting or compaction jobs run automatically.<br>Note that taking a table offline will not stop partition splits or compaction jobs that are already running. You can also still ingest data into an offline table and run queries against it. | true |
| sleeper.table.schema | The schema representing the structure of this table. This should be set in a separate schema.json file, and cannot be edited once the table has been created.<br>See https://github.com/gchq/sleeper/blob/develop/docs/deployment/instance-configuration.md for further details. |  |
| sleeper.table.data.engine | Selects which data engine to use for the table. Valid values are: [java, datafusion, datafusion_experimental]<br>The options "datafusion" and "datafusion_experimental" currently have identical behaviour, as the DataFusion data engine no longer has any experimental components. We may remove the "datafusion_experimental" option in a future release, which will cause instances with that option set to fail after an upgrade. Please use the "datafusion" option instead. | DATAFUSION |
| sleeper.table.iterator.class.name | Fully qualified class name of a custom iterator to apply to this table. No iterator is applied by default. The iterator is applied both during queries and during compactions, so its results are persisted to the underlying table data. Setting this forces compactions to use the Java data engine. This is not recommended, as the Java implementation is much slower and much more expensive. Consider using the aggregation and filtering properties instead. |  |
| sleeper.table.iterator.config | A configuration string to be passed to the iterator specified in sleeper.table.iterator.class.name. This will be read by the custom iterator object. |  |
| sleeper.table.filters | Sets how rows are filtered out and deleted from the table. This is applied every time the data is read, e.g. during compactions or queries. Defaults to retaining all rows.<br>Currently the only supported filter is ageOff(field,age), which ages off old data. The first parameter is the name of the timestamp field to check against, which must be of type long and hold milliseconds since the epoch. The second parameter is the maximum age in milliseconds, e.g. 1209600000 for 2 weeks. |  |
| sleeper.table.aggregations | Sets how to combine rows that have the same values for all row and sort keys. This is applied every time the data is read, e.g. during compactions or queries. Defaults to leaving them as separate rows.<br>This must be in the format op(field),op(field). An operation must be defined for every value field, passing the field name as the parameter. All value fields must be of a numeric or map type. The available operations are as follows:<br>sum: adds the values together for equal rows<br>max: takes the maximum value out of all equal rows<br>min: takes the minimum value out of all equal rows<br>map_sum, map_max, map_min: applies the given operation to every sub-field of a map |  |
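The ageOff filter described for sleeper.table.filters can be illustrated with a small Java sketch. The class AgeOffFilterSketch and its methods are hypothetical and not part of Sleeper's API. The sketch assumes the filter string has exactly the form ageOff(field,age) with a numeric age. It also assumes a row is kept while its age is at most the maximum; Sleeper's exact boundary behaviour is not stated here. The main method also checks that 2 weeks is 1209600000 milliseconds.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the ageOff(field,age) filter semantics; not Sleeper's implementation.
 * Assumption: a row is retained while (now - timestamp) is at most the maximum age.
 */
class AgeOffFilterSketch {

    // Matches the documented form ageOff(field,age), e.g. ageOff(timestamp,1209600000).
    private static final Pattern AGE_OFF =
            Pattern.compile("ageOff\\(\\s*(\\w+)\\s*,\\s*(\\d+)\\s*\\)");

    final String timestampField; // name of a long field holding milliseconds since the epoch
    final long maxAgeMillis;     // maximum age in milliseconds

    AgeOffFilterSketch(String timestampField, long maxAgeMillis) {
        this.timestampField = timestampField;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Parses a filter string such as "ageOff(timestamp,1209600000)". */
    static AgeOffFilterSketch parse(String filter) {
        Matcher m = AGE_OFF.matcher(filter.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported filter: " + filter);
        }
        return new AgeOffFilterSketch(m.group(1), Long.parseLong(m.group(2)));
    }

    /** Returns true if a row with this timestamp should be kept when read at nowMillis. */
    boolean retain(long timestampMillis, long nowMillis) {
        return nowMillis - timestampMillis <= maxAgeMillis;
    }

    public static void main(String[] args) {
        // 2 weeks = 14 * 24 * 60 * 60 * 1000 = 1209600000 ms.
        long twoWeeks = Duration.ofDays(14).toMillis();
        AgeOffFilterSketch filter = parse("ageOff(timestamp," + twoWeeks + ")");
        long now = System.currentTimeMillis();
        System.out.println(filter.retain(now - Duration.ofDays(1).toMillis(), now));  // true
        System.out.println(filter.retain(now - Duration.ofDays(15).toMillis(), now)); // false
    }
}
```

Because the filter is applied whenever data is read, a row that passes today can be dropped by a later query or compaction once it has aged past the limit.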
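The documented format rules for sleeper.table.aggregations can be checked with a sketch like the following. AggregationConfigSketch is a hypothetical validator, not Sleeper's own parser. It checks that the string has the form op(field),op(field), that each operation is one of the listed ones, and that every value field has an operation. It also assumes field names consist of word characters. The checks that a field appears only once and that only value fields are named are assumptions beyond the documented rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical validator for an aggregation string such as "sum(count),max(latest)".
 * Not Sleeper's parser; the duplicate-field and value-field-only checks are assumptions.
 */
class AggregationConfigSketch {

    // The operations listed in the documentation.
    private static final Set<String> OPERATIONS =
            Set.of("sum", "max", "min", "map_sum", "map_max", "map_min");

    // One op(field) term, e.g. "map_sum(counts)".
    private static final Pattern TERM = Pattern.compile("\\s*(\\w+)\\((\\w+)\\)\\s*");

    /** Returns a map of field name to operation, or throws if a rule is broken. */
    static Map<String, String> parse(String config, Set<String> valueFields) {
        Map<String, String> opByField = new LinkedHashMap<>();
        for (String term : config.split(",")) {
            Matcher m = TERM.matcher(term);
            if (!m.matches()) {
                throw new IllegalArgumentException("Expected op(field) but got: " + term);
            }
            String op = m.group(1);
            String field = m.group(2);
            if (!OPERATIONS.contains(op)) {
                throw new IllegalArgumentException("Unknown operation: " + op);
            }
            if (!valueFields.contains(field)) { // assumption: keys cannot be aggregated
                throw new IllegalArgumentException("Not a value field: " + field);
            }
            if (opByField.put(field, op) != null) { // assumption: one operation per field
                throw new IllegalArgumentException("Field aggregated twice: " + field);
            }
        }
        // Documented rule: every value field must have an operation.
        if (!opByField.keySet().equals(valueFields)) {
            throw new IllegalArgumentException("Every value field needs an operation");
        }
        return opByField;
    }
}
```

For example, with value fields count and latest, "sum(count),max(latest)" is accepted, while "sum(count)" alone is rejected because latest has no operation.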
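The sum, max and min aggregation operations, and their map_ variants, can be sketched as follows. This is a hypothetical simplification, not Sleeper's implementation. A single string key stands in for all row and sort keys, and values are longs. For the map variants, entries are assumed to be matched on map key, and an entry present in only one map is kept unchanged. Records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongBinaryOperator;

/**
 * Hypothetical sketch of how aggregation operations combine equal rows; not Sleeper's code.
 * Simplification: one string key stands in for all row and sort key fields.
 */
class AggregationSketch {

    /** sum, max and min; the map_* variants reuse these per map entry. */
    enum Op {
        SUM(Long::sum), MAX(Math::max), MIN(Math::min);

        private final LongBinaryOperator fn;

        Op(LongBinaryOperator fn) {
            this.fn = fn;
        }

        long apply(long a, long b) {
            return fn.applyAsLong(a, b);
        }
    }

    /** Hypothetical row: the combined key plus one numeric value field. */
    record Row(String key, long value) {}

    /** Collapses rows with equal keys into one, combining their values with op. */
    static Map<String, Long> aggregate(List<Row> rows, Op op) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Row row : rows) {
            result.merge(row.key(), row.value(), op::apply);
        }
        return result;
    }

    /** map_sum, map_max, map_min: combine two map values entry by entry, matching on map key. */
    static Map<String, Long> combineMaps(Map<String, Long> a, Map<String, Long> b, Op op) {
        Map<String, Long> merged = new HashMap<>(a);
        b.forEach((k, v) -> merged.merge(k, v, op::apply));
        return merged;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("a", 3), new Row("a", 5), new Row("b", 2));
        System.out.println(aggregate(rows, Op.SUM)); // {a=8, b=2}
        System.out.println(aggregate(rows, Op.MAX)); // {a=5, b=2}
    }
}
```

Sum, max and min are all associative and commutative, so this sketch gives the same result regardless of the order in which equal rows arrive.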