A schema describes the data in a table. A schema consists of several fields. A field has a name and a type, e.g. a field with name 'id' and type 'string'. A schema has three types of field: row keys, sort keys and values.
See the full example for a complete schema.
For example, a simple key-value schema with a string key and a string value would allow records to be retrieved by querying for a key. The schema for this would be:
{
"rowKeyFields": [
{
"name": "key",
"type": "StringType"
}
],
"sortKeyFields": [],
"valueFields": [
{
"name": "value",
"type": "StringType"
}
]
}
Note that if there are no sort key fields or value fields, this must be indicated with an empty list.
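As a rough illustration of the empty-list rule, the sketch below checks that a schema document contains all three field lists. The helper name `missing_field_lists` is hypothetical and not part of Sleeper; it only inspects the JSON shape used in the examples here.

```python
import json

# The three top-level lists used by the schema examples in this document.
FIELD_LISTS = ("rowKeyFields", "sortKeyFields", "valueFields")


def missing_field_lists(schema_json):
    """Return the names of any field lists that are absent or not JSON arrays.

    Hypothetical helper for illustration only; it is not a Sleeper API.
    """
    schema = json.loads(schema_json)
    return [name for name in FIELD_LISTS if not isinstance(schema.get(name), list)]


complete = (
    '{"rowKeyFields": [{"name": "key", "type": "StringType"}],'
    ' "sortKeyFields": [], "valueFields": []}'
)
incomplete = '{"rowKeyFields": [{"name": "key", "type": "StringType"}]}'

print(missing_field_lists(complete))    # []
print(missing_field_lists(incomplete))  # ['sortKeyFields', 'valueFields']
```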
If we wanted to sort the records for a particular key by a timestamp, we could add a sort key field of type long:
{
"rowKeyFields": [
{
"name": "key",
"type": "StringType"
}
],
"sortKeyFields": [
{
"name": "timestamp",
"type": "LongType"
}
],
"valueFields": [
{
"name": "value",
"type": "StringType"
}
]
}
This would cause records for a particular key to be stored (and retrieved) in increasing order of timestamp.
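The resulting order can be pictured with a small sketch. The records and the `sort_by_schema` helper are made up for illustration; this shows the ordering described above, not how Sleeper stores data internally.

```python
# Hypothetical records matching the schema above:
# key (string), timestamp (long), value (string).
records = [
    {"key": "b", "timestamp": 30, "value": "x"},
    {"key": "a", "timestamp": 20, "value": "y"},
    {"key": "a", "timestamp": 10, "value": "z"},
    {"key": "b", "timestamp": 5, "value": "w"},
]


def sort_by_schema(records, row_keys, sort_keys):
    """Order records by the row key fields first, then by the sort key fields."""
    fields = list(row_keys) + list(sort_keys)
    return sorted(records, key=lambda r: tuple(r[f] for f in fields))


ordered = sort_by_schema(records, ["key"], ["timestamp"])

# Records for a single key come back in increasing timestamp order.
for_key_a = [r["timestamp"] for r in ordered if r["key"] == "a"]
print(for_key_a)  # [10, 20]
```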
The following types are permitted as row keys and sort keys: IntType, LongType, StringType, ByteArrayType. All
of these types can be used for values. Additionally, value fields may be of type ListType or MapType. Here is an
example schema where there are several value fields:
{
"rowKeyFields": [
{
"name": "key",
"type": "StringType"
}
],
"sortKeyFields": [
{
"name": "timestamp",
"type": "LongType"
}
],
"valueFields": [
{
"name": "value1",
"type": "StringType"
},
{
"name": "value2",
"type": "ByteArrayType"
},
{
"name": "value3",
"type": {
"ListType": {
"elementType": "IntType"
}
}
},
{
"name": "value4",
"type": {
"MapType": {
"keyType": "IntType",
"valueType": "StringType"
}
}
}
]
}
The field with name value3 is a list with integer elements. The field with name value4 is a map with integer keys and string values.
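The type rules above can be expressed as a small check over a parsed schema. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes that list elements, map keys and map values are limited to the four primitive types, which the text above does not state explicitly.

```python
PRIMITIVE_TYPES = {"IntType", "LongType", "StringType", "ByteArrayType"}


def is_primitive(type_spec):
    """True if the type is one of the four primitive type names."""
    return isinstance(type_spec, str) and type_spec in PRIMITIVE_TYPES


def is_valid_type(type_spec, allow_collections):
    """Check a "type" entry as written in the JSON examples above."""
    if isinstance(type_spec, str):
        return type_spec in PRIMITIVE_TYPES
    if not (allow_collections and isinstance(type_spec, dict) and len(type_spec) == 1):
        return False
    kind, params = next(iter(type_spec.items()))
    if not isinstance(params, dict):
        return False
    # Assumption: element, key and value types must themselves be primitive.
    if kind == "ListType":
        return is_primitive(params.get("elementType"))
    if kind == "MapType":
        return is_primitive(params.get("keyType")) and is_primitive(params.get("valueType"))
    return False


def invalid_fields(schema):
    """Return the names of fields whose type is not permitted in their position."""
    rules = (("rowKeyFields", False), ("sortKeyFields", False), ("valueFields", True))
    return [
        field["name"]
        for list_name, allow_collections in rules
        for field in schema.get(list_name, [])
        if not is_valid_type(field["type"], allow_collections)
    ]


list_of_ints = {"ListType": {"elementType": "IntType"}}
good = {
    "rowKeyFields": [{"name": "key", "type": "StringType"}],
    "sortKeyFields": [{"name": "timestamp", "type": "LongType"}],
    "valueFields": [{"name": "value3", "type": list_of_ints}],
}
bad = {
    "rowKeyFields": [{"name": "key", "type": list_of_ints}],
    "sortKeyFields": [],
    "valueFields": [],
}

print(invalid_fields(good))  # []
print(invalid_fields(bad))   # ['key']
```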
There may be multiple row key fields. In the following example two string fields are used as row keys:
{
"rowKeyFields": [
{
"name": "key1",
"type": "StringType"
},
{
"name": "key2",
"type": "StringType"
}
],
"sortKeyFields": [
{
"name": "timestamp",
"type": "LongType"
}
],
"valueFields": [
{
"name": "value",
"type": "StringType"
}
]
}
Sleeper will store the records sorted by key1 and then key2. Thus retrieving all records where key1
and key2 have specified values will be quick. A range scan to retrieve all records where key1 has a certain
value and key2 can take any value will also be quick. But a query for all records where key2 has a specified
value but key1 can take any value will not be quick.
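To see why the sort order leads to these three cases, consider a toy model in which rows are kept in a list sorted by key1 and then key2. The data and function names are hypothetical, and this does not reflect Sleeper's actual storage or query code. Lookups that fix key1 can use binary search, while a lookup on key2 alone has to examine every row.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (key1, key2) pairs, kept sorted by key1 and then key2.
rows = sorted([("a", "x"), ("a", "y"), ("b", "x"), ("b", "z"), ("c", "y")])
# The key1 column on its own, in the same order as rows.
key1_column = [key1 for key1, _ in rows]


def lookup(key1, key2):
    """Exact match on both keys, located by binary search."""
    lo = bisect_left(rows, (key1, key2))
    hi = bisect_right(rows, (key1, key2))
    return rows[lo:hi]


def scan_key1(key1):
    """All rows with the given key1: one contiguous range, found by binary search."""
    lo = bisect_left(key1_column, key1)
    hi = bisect_right(key1_column, key1)
    return rows[lo:hi]


def scan_key2(key2):
    """All rows with the given key2: matches are scattered, so every row is checked."""
    return [row for row in rows if row[1] == key2]


print(lookup("a", "y"))  # [('a', 'y')]
print(scan_key1("b"))    # [('b', 'x'), ('b', 'z')]
print(scan_key2("x"))    # [('a', 'x'), ('b', 'x')]
```

The rows matching a key2 value are not adjacent in the sorted order, which is why that query cannot be narrowed to a single range.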