Draft: Secondary index specification #16961

pvary wants to merge 3 commits into apache:main from pvary:index_spec
pvary (Contributor) commented Jun 25, 2026

No description provided.

github-actions bot added the Specification label (Issues that may introduce spec changes.) on Jun 25, 2026
Co-authored-by: Huaxin Gao <huaxin.gao11@gmail.com>
Comment thread format/index.md

Indexes are optional. Engines may choose to create, maintain, consume, or ignore them.

## Goals

Member

Not sure we need the goals section here?

Contributor

I think the spec normally has a Goals section (see udf-spec and view-spec). The overlap comes from the "This specification defines:" list in Background. I think Background should cover the motivation and what an index is, and Goals should hold that list. So I suggest keeping Goals and removing the list from Background.

Comment thread format/index.md
The index type communicates the capabilities of an index to query engines and helps determine whether an index is
applicable to a particular query.

### Index Transform Function

Member

I think these sections (Transform, Instance, Snapshot) should follow the overview. Currently they contain a lot of undefined terms.

Contributor

Agree to move these definitions to after the Overview section.

Comment thread format/index.md Outdated

## Overview

Indexes are stored as independent catalog objects.

Member

Suggestion that we just have a short description of the whole thing right here

Suggested change
Indexes are stored as independent catalog objects.
Indexes are stored as a collection of files with some Iceberg-table-like semantics. At a high level, they consist of a tracking file (similar to a root manifest file) that lists a defined set of leaf files (similar to data files). Leaf files store an ordered set of rows, each containing at least a key, the path of an Iceberg table data file, and the position within that file of the row where that key is stored. The organization of leaf files is defined by an Indexing Transform, which varies based on the type of index. This structure is recorded in an index metadata.json file, which contains a set of snapshots, each of which points to a single tracking file mapping to the complete state of an Iceberg table at a given Iceberg table snapshot.

Contributor

Agree this is cleaner.
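
For readers following this thread, the layering described in the suggested overview (metadata file, then snapshots, then one tracking file per snapshot, then leaf files holding rows of key, data file path, and position) can be sketched as plain data classes. This is only an illustration of the wording above, not the spec's schema: every class name, field name, and path below is hypothetical, and the string key type is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LeafRow:
    key: str        # indexed key value (string type is an assumption)
    file_path: str  # path of the Iceberg table data file
    position: int   # row position inside that data file


@dataclass
class LeafFile:
    location: str
    rows: List[LeafRow] = field(default_factory=list)  # kept ordered by key


@dataclass
class TrackingFile:
    location: str
    leaf_files: List[LeafFile] = field(default_factory=list)


@dataclass
class IndexSnapshot:
    source_table_snapshot_id: int  # table snapshot this index state maps to
    tracking_file: TrackingFile    # a single tracking file per index snapshot


@dataclass
class IndexMetadata:
    snapshots: List[IndexSnapshot] = field(default_factory=list)


def lookup(meta: IndexMetadata, table_snapshot_id: int,
           key: str) -> Optional[Tuple[str, int]]:
    """Return (file_path, position) for key at a table snapshot, or None."""
    for snap in meta.snapshots:
        if snap.source_table_snapshot_id != table_snapshot_id:
            continue
        for leaf in snap.tracking_file.leaf_files:
            for row in leaf.rows:
                if row.key == key:
                    return row.file_path, row.position
    return None


# Hypothetical example: one snapshot, one tracking file, one leaf file.
leaf = LeafFile("index/leaf-0.parquet",
                [LeafRow("user-7", "data/file-a.parquet", 3)])
meta = IndexMetadata([IndexSnapshot(42, TrackingFile("index/tracking-0.avro", [leaf]))])
hit = lookup(meta, 42, "user-7")
```

The linear scan is deliberately naive; a real reader would presumably prune leaf files through the tracking file statistics rather than visit every row.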

Comment thread format/index.md
| required | uuid | string | Stable UUID assigned at creation |
| required | table-uuid | string | UUID of the indexed table |
| required | location | string | Index root location |
| required | type | string | Logical index type |

Member

Are we going to have this only be chosen from a set of index types we define? Feels like we should if these are going to be interoperable. This also makes me think a bit about the "reserved" terms above. I think basically everything should be reserved unless we define it here imho.

Contributor

Agree it should be a closed set for interoperability. One step back though: I think we only scoped the key-lookup index, we didn't actually agree on a SCALAR/VECTOR/TERM type yet.

Comment thread format/index.md
| required | table-uuid | string | UUID of the indexed table |
| required | location | string | Index root location |
| required | type | string | Logical index type |
| required | transform-function | string | Physical organization transform |

Member

This probably needs to be well defined? An expression or something we explicitly make here?

Comment thread format/index.md Outdated
| required | transform-function | string | Physical organization transform |
| required | key-column-ids | list<int> | Indexed columns |
| optional | included-column-ids | list<int> | Included columns |
| required | file-format | string | Leaf file format |

Member

Why do we need to define the leaf file format? Shouldn't this be done per row in the tracking file?

Contributor

Agree this should be defined in the tracking file.

Comment thread format/index.md Outdated
| optional | included-column-ids | list<int> | Included columns |
| required | file-format | string | Leaf file format |
| optional | properties | map<string,string> | Index properties applicable for every snapshot |
| required | snapshots | list<index-snapshot> | Known index snapshots |

Member

Why "known" ?

Contributor

Agree "known" should be removed.

Comment thread format/index.md

| Type |
|--------|
| SCALAR |

Contributor

SCALAR is listed but never defined. I suggest adding a description column.

Comment thread format/index.md Outdated
The transform function determines the physical organization of the indexed data and therefore influences which query
patterns can efficiently leverage the index.

The following index types are reserved for future specifications:

Contributor

Suggested change
The following index types are reserved for future specifications:
The following transform functions are defined in this specification:

Comment thread format/index.md
|-----------|
| IDENTITY |
| HASH |
| HILBERT |

Contributor

I think we agreed the organization transform is an Iceberg-style transform with a sort order, so I think we should use the Iceberg transform names: use bucket instead of hash.

I think for now the key-lookup index only needs identity and bucket, so we should move hilbert to the reserved table below.

Contributor

Also add a sentence somewhere to say that tuple transforms like (bucket(key, 256), key) (bucket first, then sort) are also supported.
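
To make the "bucket first, then sort" ordering concrete, here is a minimal sketch of how rows could be ordered under a tuple transform like (bucket(key, 256), key). It is an illustration under stated assumptions, not the proposed implementation: `bucket` and `order_for_index` are hypothetical helper names, and `zlib.crc32` is a stand-in hash used only to keep the example self-contained. Iceberg's bucket transform is defined on a 32-bit Murmur3 hash, so the bucket numbers produced here will not match Iceberg's.

```python
import zlib
from typing import List, Tuple


def bucket(key: str, num_buckets: int) -> int:
    # Stand-in hash for illustration only: Iceberg's bucket transform uses
    # a 32-bit Murmur3 hash, which is NOT what crc32 computes.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def order_for_index(keys: List[str], num_buckets: int) -> List[Tuple[int, str]]:
    # Tuple transform (bucket(key, N), key): rows are grouped by bucket,
    # and sorted by the key itself inside each bucket.
    return sorted((bucket(k, num_buckets), k) for k in keys)


ordered = order_for_index(["delta", "alpha", "charlie", "bravo"], 256)
```

Sorting on the (bucket, key) pair means a point lookup can first narrow to one bucket and then search the sorted keys inside it.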

Comment thread format/index.md
- The transform function
- The indexed columns
- The included columns
- Index properties

Contributor

Shall we mark "The included columns" and "Index properties" as optional?

Comment thread format/index.md Outdated
```text
Index Metadata
|
+-- Index Snapshot

Contributor

+-- Index Snapshot (one or more)?

Comment thread format/index.md Outdated
| optional | included-column-ids | list<int> | Included columns |
| required | file-format | string | Leaf file format |
| optional | properties | map<string,string> | Index properties applicable for every snapshot |
| required | snapshots | list<index-snapshot> | Known index snapshots |

Contributor

The metadata has a snapshots list but nothing says which one is current. Should we add a current-snapshot-id, or define current as the snapshot whose source-table-snapshot-id matches the table's current snapshot?

Comment thread format/index.md
| 103 | record_count | long | required | Number of records contained in the referenced file or aggregated under the referenced tracking file. |
| 104 | file_size_in_bytes | long | required | Total file size in bytes. |
| 146 | content_stats | struct | optional | Statistics used for planning and pruning, including transform-key statistics and optional column statistics. |
| 131 | key_metadata | binary | optional | Implementation-specific key metadata, used for leaf file encryption. |

Contributor

key_metadata -> key-metadata?

Comment thread format/index.md

The tracking file may be stored using any supported metadata file format.

### Tracking File Entry

Contributor

remove this

Comment thread format/index.md
| 101 | file_format | string | required | File format name, such as parquet, avro, or orc. |
| 103 | record_count | long | required | Number of records contained in the referenced file or aggregated under the referenced tracking file. |
| 104 | file_size_in_bytes | long | required | Total file size in bytes. |
| 146 | content_stats | struct | optional | Statistics used for planning and pruning, including transform-key statistics and optional column statistics. |

Contributor

Does content_stats contain the transform bounds (transform_min / transform_max)? If so, I think we should make them explicit, required fields. They're needed for routing and non-overlapping ranges, but content_stats is marked optional here, so the bounds could be missing.

Comment thread format/index.md Outdated
|----------|--------------------|---------|--------------|--------------------------------------------------------------------------------------------------------------|
| 100 | location | string | required | Location of the referenced file. |
| 101 | file_format | string | required | File format name, such as parquet, avro, or orc. |
| 103 | record_count | long | required | Number of records contained in the referenced file or aggregated under the referenced tracking file. |

Contributor

Remove "or aggregated under the referenced tracking file"?

Comment thread format/index.md

The schema of a leaf file is determined by the index definition and contains:
- All key columns defined by the index
- All included columns defined by the index

Contributor

Since this is optional, maybe word it as "Any included columns defined by the index" to make clear it can be empty?

Comment thread format/index.md
The schema of a leaf file is determined by the index definition and contains:
- All key columns defined by the index
- All included columns defined by the index
- The transform value produced by the transform function

Contributor

For an identity transform on the key, the transform value equals the key column. Do we still want to save the transform value?

Comment thread format/index.md

The following index types are reserved for future specifications:

| Transform |

Contributor

The Leaf Files Transform functions section also has this table and the reserved table below it. Should we remove the tables here, or remove them from the Leaf Files section, so the list lives in only one place?

Comment thread format/index.md
|-----------|-----------------|--------|------------------------------------------------------------------------|
| TBD | transform_value | long | The result of applying the index transform function to the key columns |
| TBD | file_path | string | The path of the source data file the entry references |
| TBD | position | long | The row position of the entry within the source data file |

Contributor

file_path and position are basically Iceberg's reserved _file (2147483646) and _pos (2147483645). Should we reuse those reserved IDs and give transform_value another reserved ID?

Comment thread format/index.md

| Field Id | Column | Type | Description |
|-----------|-----------------|--------|------------------------------------------------------------------------|
| TBD | transform_value | long | The result of applying the index transform function to the key columns |

Contributor

The type is not always long. Maybe change it to "determined by the transform function"?

Comment thread format/index.md
Transform Function:

```text
HASH(primary_key)

Contributor

Change to bucket(primary_key, N)?

Comment thread format/index.md
| file_path |
| position |

The leaf files are organized by hash key, while the tracking file stores summary information and pruning statistics.

Contributor

The leaf files are organized by hash key -> The leaf files are organized by transform value?
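
As a rough illustration of why the tracking file's pruning statistics matter for leaf files organized by transform value, the sketch below selects the leaf files whose transform range can contain a given value. It assumes each tracking entry carries explicit `transform_min` / `transform_max` bounds, as raised in the `content_stats` thread above; those bounds are not settled in the draft, and `TrackingEntry`, `prune`, and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackingEntry:
    location: str        # location of the referenced leaf file
    transform_min: int   # assumed lower bound of transform values in the file
    transform_max: int   # assumed upper bound of transform values in the file


def prune(entries: List[TrackingEntry], transform_value: int) -> List[str]:
    # Keep only leaf files whose [min, max] range can contain the value.
    return [e.location for e in entries
            if e.transform_min <= transform_value <= e.transform_max]


# Hypothetical tracking entries with non-overlapping transform ranges.
entries = [
    TrackingEntry("leaf-0.parquet", 0, 63),
    TrackingEntry("leaf-1.parquet", 64, 127),
    TrackingEntry("leaf-2.parquet", 128, 255),
]
candidates = prune(entries, 70)  # only leaf-1.parquet can hold value 70
```

If the bounds were optional and missing for an entry, a reader could not exclude that leaf file and would have to open it, which is the concern behind making them required.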

huaxingao (Contributor) commented:

Since @pvary is out, I'll make the simple/mechanical changes now to keep this PR moving forward and leave the design decisions for him to review when he's back.

Co-authored-by: pvary <peter.vary.apache@gmail.com>

3 participants