Draft: Secondary index specification #16961
Co-authored-by: Huaxin Gao <huaxin.gao11@gmail.com>
|
|
||
| Indexes are optional. Engines may choose to create, maintain, consume, or ignore them. | ||
|
|
||
| ## Goals |
Not sure we need the goals section here?
I think the spec normally has a Goals section (see udf-spec and view-spec). The overlap comes from the "This specification defines:" list in Background. I think Background should cover the motivation and what an index is, and Goals should hold that list. So I suggest keeping Goals and removing the list from Background.
| The index type communicates the capabilities of an index to query engines and helps determine whether an index is | ||
| applicable to a particular query. | ||
|
|
||
| ### Index Transform Function |
I think these sections (Transform, Instance, Snapshot) should follow the Overview. Currently they use a lot of terms that are not yet defined at that point.
Agree to move these definitions to after the Overview section.
|
|
||
| ## Overview | ||
|
|
||
| Indexes are stored as independent catalog objects. |
Suggest we just have a short description of the whole thing right here.
| Indexes are stored as independent catalog objects. | |
| Indexes are stored as a collection of files with some Iceberg-table-like semantics. At a high level they consist of a tracking file (similar to a root manifest file), which contains listings for a defined set of leaf files (similar to data files). Leaf files store an ordered set of rows, each containing at least a key, the path of an Iceberg table data file, and the position within that file of the row where that key is stored. The organization of leaf files is defined by an Indexing Transform, which varies based on the type of index. This structure is recorded in an Index metadata.json file, which contains a set of snapshots, each of which points to a single tracking file mapping to the complete state of an Iceberg table at a given Iceberg table snapshot. |
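To make the layering in the suggested description easier to picture, here is a rough Python sketch of the objects it names. It is only an illustration, not proposed spec text: the class names, `tracking_file`, and `is_ordered` are hypothetical, and only a few of the fields discussed in this PR are shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class LeafRow:
    key: Tuple        # values of the indexed key columns
    file_path: str    # data file of the indexed table that holds the row
    position: int     # row position within that data file


@dataclass(frozen=True)
class TrackingEntry:
    location: str     # location of the referenced leaf file
    file_format: str  # e.g. "parquet"
    record_count: int


@dataclass(frozen=True)
class IndexSnapshot:
    source_table_snapshot_id: int  # table snapshot this index state covers
    tracking_file: str             # hypothetical field name for the tracking file path


@dataclass
class IndexMetadata:
    uuid: str
    table_uuid: str
    location: str
    snapshots: List[IndexSnapshot] = field(default_factory=list)


def is_ordered(rows: List[LeafRow]) -> bool:
    """Check that leaf rows are stored in non-decreasing key order."""
    return all(a.key <= b.key for a, b in zip(rows, rows[1:]))
```

Each `IndexSnapshot` points to exactly one tracking file, and the tracking file lists `TrackingEntry` rows, one per leaf file.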
| | required | uuid | string | Stable UUID assigned at creation | | ||
| | required | table-uuid | string | UUID of the indexed table | | ||
| | required | location | string | Index root location | | ||
| | required | type | string | Logical index type | |
Are we going to have this only be chosen from a set of index types we define? Feels like we should if these are going to be interoperable. This also makes me think a bit about the "reserved" terms above. I think basically everything should be reserved unless we define it here imho.
Agree it should be a closed set for interoperability. One step back though: I think we only scoped the key-lookup index; we didn't actually agree on a SCALAR/VECTOR/TERM type yet.
| | required | table-uuid | string | UUID of the indexed table | | ||
| | required | location | string | Index root location | | ||
| | required | type | string | Logical index type | | ||
| | required | transform-function | string | Physical organization transform | |
This probably needs to be well defined? An expression or something we explicitly make here?
| | required | transform-function | string | Physical organization transform | | ||
| | required | key-column-ids | list<int> | Indexed columns | | ||
| | optional | included-column-ids | list<int> | Included columns | | ||
| | required | file-format | string | Leaf file format | |
Why do we need to define the leaf file format? Shouldn't this be done per row in the tracking file?
Agree this should be defined in the tracking file.
| | optional | included-column-ids | list<int> | Included columns | | ||
| | required | file-format | string | Leaf file format | | ||
| | optional | properties | map<string,string> | Index properties applicable for every snapshot | | ||
| | required | snapshots | list<index-snapshot> | Known index snapshots | |
Agree "Known" should be removed.
|
|
||
| | Type | | ||
| |--------| | ||
| | SCALAR | |
SCALAR is listed but never defined. I suggest adding a description column.
| The transform function determines the physical organization of the indexed data and therefore influences which query | ||
| patterns can efficiently leverage the index. | ||
|
|
||
| The following index types are reserved for future specifications: |
| The following index types are reserved for future specifications: | |
| The following transform functions are defined in this specification: |
| |-----------| | ||
| | IDENTITY | | ||
| | HASH | | ||
| | HILBERT | |
I think we agreed the organization transform is an Iceberg-style transform with a sort order, so I think we should use the Iceberg transform names: use bucket instead of hash.
I think for now the key-lookup index only needs identity and bucket, so we should move hilbert to the reserved table below.
Also add a sentence somewhere to say that tuple transforms like (bucket(key, 256), key) (bucket first, then sort) are also supported.
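For the tuple transform, a minimal sketch of what "bucket first, then sort" could mean in practice. Note the assumption: Iceberg's bucket transform is specified with a 32-bit Murmur3 hash, while this sketch substitutes CRC32 from the standard library purely to stay dependency-free, so the bucket numbers will not match a real Iceberg implementation. The helper names `tuple_transform` and `organize` are hypothetical.

```python
import zlib


def bucket(key: str, num_buckets: int) -> int:
    # Stand-in hash (CRC32), NOT the Murmur3 hash the Iceberg bucket transform uses.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def tuple_transform(key: str, num_buckets: int = 256):
    # (bucket(key, N), key): the bucket is the leading sort field, the key the second.
    return (bucket(key, num_buckets), key)


def organize(keys, num_buckets: int = 256):
    # Rows are grouped by bucket and sorted by key within each bucket.
    return sorted(tuple_transform(k, num_buckets) for k in keys)
```

With this ordering a reader can route a lookup to one bucket and then search within it by key.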
| - The transform function | ||
| - The indexed columns | ||
| - The included columns | ||
| - Index properties |
Shall we mark "The included columns" and "Index properties" as optional?
| ```text | ||
| Index Metadata | ||
| | | ||
| +-- Index Snapshot |
+-- Index Snapshot (one or more)?
| | optional | included-column-ids | list<int> | Included columns | | ||
| | required | file-format | string | Leaf file format | | ||
| | optional | properties | map<string,string> | Index properties applicable for every snapshot | | ||
| | required | snapshots | list<index-snapshot> | Known index snapshots | |
The metadata has a snapshots list but nothing says which one is current. Should we add a current-snapshot-id, or define current as the snapshot whose source-table-snapshot-id matches the table's current snapshot?
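To make the two options concrete, a rough sketch of how a reader might resolve the current index snapshot under each. This assumes each snapshot carries a `snapshot-id` (a hypothetical field name) alongside `source-table-snapshot-id`; neither option is decided in this PR.

```python
def current_index_snapshot(snapshots, table_current_snapshot_id, current_snapshot_id=None):
    """Return the current index snapshot, or None if there is no match.

    snapshots: list of dicts with 'snapshot-id' and 'source-table-snapshot-id'.
    """
    if current_snapshot_id is not None:
        # Option 1: the metadata names the current snapshot explicitly.
        for snap in snapshots:
            if snap["snapshot-id"] == current_snapshot_id:
                return snap
        return None
    # Option 2: current is whichever snapshot matches the table's current snapshot.
    for snap in snapshots:
        if snap["source-table-snapshot-id"] == table_current_snapshot_id:
            return snap
    return None
```

Under option 2 the function returns `None` whenever the index lags behind the table, which is one behavior the spec would need to define.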
| | 103 | record_count | long | required | Number of records contained in the referenced file or aggregated under the referenced tracking file. | | ||
| | 104 | file_size_in_bytes | long | required | Total file size in bytes. | | ||
| | 146 | content_stats | struct | optional | Statistics used for planning and pruning, including transform-key statistics and optional column statistics. | | ||
| | 131 | key_metadata | binary | optional | Implementation-specific key metadata, used for leaf file encryption. | |
key_metadata -> key-metadata?
|
|
||
| The tracking file may be stored using any supported metadata file format. | ||
|
|
||
| ### Tracking File Entry |
| | 101 | file_format | string | required | File format name, such as parquet, avro, or orc. | | ||
| | 103 | record_count | long | required | Number of records contained in the referenced file or aggregated under the referenced tracking file. | | ||
| | 104 | file_size_in_bytes | long | required | Total file size in bytes. | | ||
| | 146 | content_stats | struct | optional | Statistics used for planning and pruning, including transform-key statistics and optional column statistics. | |
Does content_stats contain the transform bounds (transform_min / transform_max)? If so, I think we should make them explicit, required fields. They're needed for routing and non-overlapping ranges, but content_stats is marked optional here, so the bounds could be missing.
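A small sketch of why the bounds matter, assuming each tracking file entry exposes `transform_min` and `transform_max` as explicit fields (the names come from this comment, not from the current draft). The helpers `route` and `ranges_overlap` are hypothetical.

```python
def route(entries, transform_value):
    """Return locations of leaf files whose bounds may contain transform_value."""
    return [
        e["location"]
        for e in entries
        if e["transform_min"] <= transform_value <= e["transform_max"]
    ]


def ranges_overlap(entries):
    """True if any two leaf files cover overlapping transform ranges."""
    ordered = sorted(entries, key=lambda e: e["transform_min"])
    return any(a["transform_max"] >= b["transform_min"] for a, b in zip(ordered, ordered[1:]))
```

If the bounds were missing on any entry, `route` could not exclude that leaf file and `ranges_overlap` could not be checked at all, which is the case for making them required.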
| |----------|--------------------|---------|--------------|--------------------------------------------------------------------------------------------------------------| | ||
| | 100 | location | string | required | Location of the referenced file. | | ||
| | 101 | file_format | string | required | File format name, such as parquet, avro, or orc. | | ||
| | 103 | record_count | long | required | Number of records contained in the referenced file or aggregated under the referenced tracking file. | |
Remove "or aggregated under the referenced tracking file"?
|
|
||
| The schema of a leaf file is determined by the index definition and contains: | ||
| - All key columns defined by the index | ||
| - All included columns defined by the index |
Since this is optional, maybe word it as "Any included columns defined by the index" to make clear it can be empty?
| The schema of a leaf file is determined by the index definition and contains: | ||
| - All key columns defined by the index | ||
| - All included columns defined by the index | ||
| - The transform value produced by the transform function |
For an identity transform on the key, the transform value equals the key column. Do we still want to store the transform value in that case?
|
|
||
| The following index types are reserved for future specifications: | ||
|
|
||
| | Transform | |
The Leaf Files Transform functions section also has this table and the reserved table below it. Should we remove the tables here, or remove them from the Leaf Files section, so the list lives in only one place?
| |-----------|-----------------|--------|------------------------------------------------------------------------| | ||
| | TBD | transform_value | long | The result of applying the index transform function to the key columns | | ||
| | TBD | file_path | string | The path of the source data file the entry references | | ||
| | TBD | position | long | The row position of the entry within the source data file | |
file_path and position are basically Iceberg's reserved _file (2147483646) and _pos (2147483645). Should we reuse those reserved IDs and give transform_value another reserved ID?
|
|
||
| | Field Id | Column | Type | Description | | ||
| |-----------|-----------------|--------|------------------------------------------------------------------------| | ||
| | TBD | transform_value | long | The result of applying the index transform function to the key columns | |
The type is not always long; maybe change it to "determined by the transform function"?
| Transform Function: | ||
|
|
||
| ```text | ||
| HASH(primary_key) |
Change to bucket(primary_key, N)?
| | file_path | | ||
| | position | | ||
|
|
||
| The leaf files are organized by hash key, while the tracking file stores summary information and pruning statistics. |
"The leaf files are organized by hash key" -> "The leaf files are organized by transform value"?
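As an illustration of why "organized by transform value" is the more general wording: if the rows of a leaf file are sorted by transform value, a lookup reduces to a binary search regardless of which transform produced the value. A minimal sketch, assuming leaf rows are `(transform_value, file_path, position)` tuples already loaded in memory; the function name `lookup` is hypothetical.

```python
import bisect


def lookup(leaf_rows, transform_value):
    """Return (file_path, position) pairs for rows matching transform_value.

    leaf_rows must be sorted by transform_value.
    """
    values = [row[0] for row in leaf_rows]
    lo = bisect.bisect_left(values, transform_value)
    hi = bisect.bisect_right(values, transform_value)
    return [(path, pos) for _, path, pos in leaf_rows[lo:hi]]
```

For a bucket transform several keys can share one transform value, so a caller would still need to compare the key columns of the returned rows.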
|
Since @pvary is out, I'll make the simple/mechanical changes now to keep this PR moving forward and leave the design decisions for him to review when he's back.
Co-authored-by: pvary <peter.vary.apache@gmail.com>