From 437557bb8d567a2d221fdeadbf448576c8a5f99b Mon Sep 17 00:00:00 2001
From: fh-ms
Date: Fri, 20 Feb 2026 11:53:39 +0100
Subject: [PATCH 1/2] Expand documentation on bitmap index types, adding comparison and usage guidance for regular vs binary indexers.

---
 .../gigamap/pages/indexing/bitmap/index.adoc  |  2 +-
 .../gigamap/pages/indexing/bitmap/types.adoc  | 50 +++++++++++++++++++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/docs/modules/gigamap/pages/indexing/bitmap/index.adoc b/docs/modules/gigamap/pages/indexing/bitmap/index.adoc
index 8813fae6..e16eeaa8 100644
--- a/docs/modules/gigamap/pages/indexing/bitmap/index.adoc
+++ b/docs/modules/gigamap/pages/indexing/bitmap/index.adoc
@@ -60,7 +60,7 @@ Here's a list of all predefined indexers used for low cardinality:
 |===

-If you need indexers for high cardinality, like unique values, the binary indexers are the way to go. They are optimized for a huge number of entries.
+If you need indexers for high cardinality, like unique values, the binary indexers are the way to go. They are optimized for a huge number of entries. Note that binary indexers only support equality queries (`is`, `in`), not predicates or range queries. See xref:indexing/bitmap/types.adoc#_choosing_between_regular_and_binary_indexers[Choosing Between Regular and Binary Indexers] for detailed guidance.

 [options="header",cols="1,2"]
 |===
diff --git a/docs/modules/gigamap/pages/indexing/bitmap/types.adoc b/docs/modules/gigamap/pages/indexing/bitmap/types.adoc
index cd037050..da1d23e4 100644
--- a/docs/modules/gigamap/pages/indexing/bitmap/types.adoc
+++ b/docs/modules/gigamap/pages/indexing/bitmap/types.adoc
@@ -74,4 +74,54 @@ Each long-bit corresponds to an entry, meaning there will be 64 entries max, no

 Currently it only supports equality queries. Range queries are not supported yet.

+== Choosing Between Regular and Binary Indexers
+
+The bitmap index offers two families of indexers: **regular indexers** (e.g.
`IndexerString`, `IndexerLong`) and **binary indexers** (e.g. `BinaryIndexerString`, `BinaryIndexerLong`). The right choice depends on your data characteristics and query needs.
+
+=== Comparison
+
+[options="header",cols="2,3,3"]
+|===
+|Aspect |Regular Indexer |Binary Indexer
+
+|Cardinality
+|Low to medium (few distinct values, many entities per value)
+|High (many distinct values, few entities per value)
+
+|Query types
+|Equality (`is`, `in`), predicates (`is(predicate)`), and - depending on the index type - range queries
+|Equality only (`is`, `in`)
+
+|Key type
+|Any object with `equals`/`hashCode`
+|Must convert to `long`
+
+|Memory
+|On-heap hash table
+|Off-heap bit-position array
+
+|Null values
+|Native support
+|Requires sentinel value handling
+
+|===
+
+=== Use Regular Indexer When
+
+* Indexed values have **low cardinality** — few distinct values shared by many entities (enums, categories, status fields)
+* You need **predicate-based queries**, e.g. `firstName.is(name -> name.startsWith("J"))`
+* You need **range queries** via `IndexerComparing`
+* The key type does not naturally map to `long`
+
+**Typical use cases:** enumerations, boolean flags, date/time fields, categories, multi-value fields
+
+=== Use Binary Indexer When
+
+* Indexed values have **high cardinality** — many distinct values (unique IDs, foreign keys)
+* Only **equality queries** are needed (`is`, `in`)
+* The key can be efficiently converted to `long` (numeric IDs, TSID, UUID)
+* **Memory efficiency** matters for large datasets
+
+**Typical use cases:** primary/foreign key indices, UUIDs, TSIDs, unique codes or reference numbers
+

From e8942daf85caf22e07bf9f53bd9768b4b34dacec Mon Sep 17 00:00:00 2001
From: fh-ms
Date: Fri, 20 Feb 2026 11:56:27 +0100
Subject: [PATCH 2/2] Enhance documentation for multi-value indexing and queries.
---
 .../pages/indexing/bitmap/defining.adoc       |  6 ++-
 .../gigamap/pages/indexing/bitmap/index.adoc  |  2 +-
 .../gigamap/pages/queries/defining.adoc       | 45 +++++++++++++++++++
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/docs/modules/gigamap/pages/indexing/bitmap/defining.adoc b/docs/modules/gigamap/pages/indexing/bitmap/defining.adoc
index d740a6c3..00f41a60 100644
--- a/docs/modules/gigamap/pages/indexing/bitmap/defining.adoc
+++ b/docs/modules/gigamap/pages/indexing/bitmap/defining.adoc
@@ -103,7 +103,9 @@ public final static Indexer maritalStatus = new Indexer.A
 };
 ----

-Collections can also be indexed. In fact, everything that implements `Iterable` is supported.
+Collections can also be indexed using `IndexerMultiValue`. Everything that implements `Iterable` is supported, making it suitable for lists, sets, or any other collection type.
+
+This is useful for modeling many-to-many relationships, tags, categories, or any field where an entity can have multiple associated values. Each value in the collection is indexed individually, so an entity with three interests will appear in the index under all three keys.

 [source, java]
 ----
@@ -123,6 +125,8 @@ public final static IndexerMultiValue interests = new IndexerM
 };
 ----

+In addition to the standard query methods (`is`, `in`, `not`, `notIn`), multi-value indexers provide the `all` method, which finds entities whose collection contains *all* of the specified keys. See xref:../../queries/defining.adoc#_multi_value_queries[Multi-Value Queries] for details.
+
 With custom logic, you can define any indexer you can imagine, not just returning values from the entity. For instance, this one creates a generation index.
 [source, java]
diff --git a/docs/modules/gigamap/pages/indexing/bitmap/index.adoc b/docs/modules/gigamap/pages/indexing/bitmap/index.adoc
index e16eeaa8..853dd676 100644
--- a/docs/modules/gigamap/pages/indexing/bitmap/index.adoc
+++ b/docs/modules/gigamap/pages/indexing/bitmap/index.adoc
@@ -56,7 +56,7 @@ Here's a list of all predefined indexers used for low cardinality:
 |java.time.YearMonth

 |IndexerMultiValue
-|java.lang.Iterable
+|java.lang.Iterable (lists, sets, or any collection)

 |===

diff --git a/docs/modules/gigamap/pages/queries/defining.adoc b/docs/modules/gigamap/pages/queries/defining.adoc
index eae2c5f0..24711c9d 100644
--- a/docs/modules/gigamap/pages/queries/defining.adoc
+++ b/docs/modules/gigamap/pages/queries/defining.adoc
@@ -32,3 +32,48 @@ This enables us to use any custom logic inside query definitions if the predefin
 ----
 gigaMap.query(firstName.is(name -> name.length() > 3));
 ----
+
+Negation is also supported with `not` and `notIn`.
+
+[source, java]
+----
+// all persons NOT called 'John'
+gigaMap.query(firstName.not("John"));
+
+// all persons called neither 'John' nor 'Jim'
+gigaMap.query(firstName.notIn("John", "Jim"));
+----
+
+== Multi-Value Queries
+
+When using an `IndexerMultiValue`, each entity can have multiple indexed values (e.g. a list of tags or interests).
The standard query methods work on the individual values in the collection:
+
+* `is(key)` — matches entities whose collection *contains* that key
+* `in(k1, k2, ...)` — matches entities whose collection contains *any* of the given keys (OR)
+* `not(key)` — matches entities whose collection does *not* contain that key
+* `notIn(k1, k2, ...)` — matches entities whose collection contains *none* of the given keys
+
+[source, java]
+----
+// persons interested in SPORTS
+gigaMap.query(interests.is(Interest.SPORTS));
+
+// persons interested in SPORTS or LITERATURE (or both)
+gigaMap.query(interests.in(Interest.SPORTS, Interest.LITERATURE));
+----
+
+Additionally, `IndexerMultiValue` provides the `all` method, which matches entities whose collection contains *all* of the specified keys (AND logic).
+
+[source, java]
+----
+// persons interested in both SPORTS and LITERATURE
+gigaMap.query(interests.all(Interest.SPORTS, Interest.LITERATURE));
+----
+
+Predicates also work with multi-value indexers. The predicate is applied to each key in the index.
+
+[source, java]
+----
+// persons with any interest matching a custom condition
+gigaMap.query(interests.is(interest -> interest.name().startsWith("S")));
+----
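
The `is` / `in` / `all` / `notIn` semantics documented in the Multi-Value Queries section can be illustrated with a toy inverted index built only on the JDK. This is a minimal sketch for reasoning about the documented behavior, not GigaMap's implementation: the class `MultiValueIndexSketch`, its methods, and the use of `String` keys and `java.util.BitSet` are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index: NOT the GigaMap API, only a model of the documented semantics.
public class MultiValueIndexSketch {
    // key -> bitmap of entity ids whose collection contains that key
    private final Map<String, BitSet> bitmaps = new HashMap<>();
    private int size = 0;

    /** Adds an entity; every key of its collection is indexed individually. */
    public int add(Iterable<String> keys) {
        int id = size++;
        for (String key : keys) {
            bitmaps.computeIfAbsent(key, k -> new BitSet()).set(id);
        }
        return id;
    }

    // Returns a defensive copy so callers can combine bitmaps freely.
    private BitSet bitmap(String key) {
        BitSet b = bitmaps.get(key);
        return b == null ? new BitSet() : (BitSet) b.clone();
    }

    /** Entities whose collection contains the key. */
    public BitSet is(String key) {
        return bitmap(key);
    }

    /** Entities whose collection contains ANY of the keys (OR). */
    public BitSet in(String... keys) {
        BitSet result = new BitSet();
        for (String key : keys) {
            result.or(bitmap(key));
        }
        return result;
    }

    /** Entities whose collection contains ALL of the keys (AND). */
    public BitSet all(String... keys) {
        BitSet result = new BitSet();
        result.set(0, size); // start from every entity, then narrow down
        for (String key : keys) {
            result.and(bitmap(key));
        }
        return result;
    }

    /** Entities whose collection contains NONE of the keys. */
    public BitSet notIn(String... keys) {
        BitSet result = new BitSet();
        result.set(0, size);
        result.andNot(in(keys));
        return result;
    }

    public static void main(String[] args) {
        MultiValueIndexSketch interests = new MultiValueIndexSketch();
        interests.add(List.of("SPORTS", "LITERATURE")); // id 0
        interests.add(List.of("SPORTS"));               // id 1
        interests.add(List.of("MUSIC"));                // id 2

        System.out.println(interests.is("SPORTS"));                // {0, 1}
        System.out.println(interests.in("LITERATURE", "MUSIC"));   // {0, 2}
        System.out.println(interests.all("SPORTS", "LITERATURE")); // {0}
        System.out.println(interests.notIn("SPORTS"));             // {2}
    }
}
```

In this model an entity with several values sets its bit in several bitmaps, which is why `in` reduces to a bitwise OR and `all` to a bitwise AND over the per-key bitmaps.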