6 changes: 5 additions & 1 deletion docs/modules/gigamap/pages/indexing/bitmap/defining.adoc
@@ -103,7 +103,9 @@ public final static Indexer<Person, MaritalStatus> maritalStatus = new Indexer.A
};
----

-Collections can also be indexed. In fact, everything that implements `Iterable` is supported.
+Collections can also be indexed using `IndexerMultiValue`. Everything that implements `Iterable` is supported, making it suitable for lists, sets, or any other collection type.

This is useful for modeling many-to-many relationships, tags, categories, or any field where an entity can have multiple associated values. Each value in the collection is indexed individually, so an entity with three interests will appear in the index under all three keys.

[source, java]
----
@@ -123,6 +125,8 @@ public final static IndexerMultiValue<Person, Interest> interests = new IndexerM
};
----
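
The effect of indexing each collection value individually can be sketched in plain Java. This is only an illustration of the idea, not GigaMap's implementation: the class `MultiValueIndexSketch`, the method `buildIndex`, and the sample persons are hypothetical, and persons are represented by their names to keep the example short.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a hand-rolled inverted index, NOT the GigaMap API.
class MultiValueIndexSketch {
    enum Interest { SPORTS, LITERATURE, MUSIC }

    // Registers each person once under every value of its interests collection.
    static Map<Interest, Set<String>> buildIndex(Map<String, List<Interest>> interestsByPerson) {
        Map<Interest, Set<String>> index = new EnumMap<>(Interest.class);
        for (Map.Entry<String, List<Interest>> person : interestsByPerson.entrySet()) {
            for (Interest interest : person.getValue()) {
                index.computeIfAbsent(interest, k -> new TreeSet<>()).add(person.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: Ann has three interests, Bob has one.
        Map<String, List<Interest>> persons = new LinkedHashMap<>();
        persons.put("Ann", List.of(Interest.SPORTS, Interest.LITERATURE, Interest.MUSIC));
        persons.put("Bob", List.of(Interest.SPORTS));

        // Ann appears under all three keys.
        System.out.println(buildIndex(persons));
        // {SPORTS=[Ann, Bob], LITERATURE=[Ann], MUSIC=[Ann]}
    }
}
```

An entity with an empty collection would not appear under any key in this sketch; how GigaMap treats that case is not covered here.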

In addition to the standard query methods (`is`, `in`, `not`, `notIn`), multi-value indexers provide the `all` method, which finds entities whose collection contains *all* of the specified keys. See xref:queries/defining.adoc#_multi_value_queries[Multi-Value Queries] for details.

With custom logic, you can define any indexer you need, not only ones that return a value stored in the entity. For instance, this one creates a generation index.

[source, java]
4 changes: 2 additions & 2 deletions docs/modules/gigamap/pages/indexing/bitmap/index.adoc
@@ -56,11 +56,11 @@ Here's a list of all predefined indexers used for low cardinality:
|java.time.YearMonth

|IndexerMultiValue
-|java.lang.Iterable
+|java.lang.Iterable (lists, sets, or any collection)

|===

-If you need indexers for high cardinality, like unique values, the binary indexers are the way to go. They are optimized for a huge number of entries.
+If you need indexers for high cardinality, like unique values, the binary indexers are the way to go. They are optimized for a huge number of entries. Note that binary indexers only support equality queries (`is`, `in`), not predicates or range queries. See xref:indexing/bitmap/types.adoc#_choosing_between_regular_and_binary_indexers[Choosing Between Regular and Binary Indexers] for detailed guidance.

[options="header",cols="1,2"]
|===
50 changes: 50 additions & 0 deletions docs/modules/gigamap/pages/indexing/bitmap/types.adoc
@@ -74,4 +74,54 @@ Each long-bit corresponds to an entry, meaning there will be 64 entries max, no
Currently it only supports equality queries.
Range queries are not supported yet.

== Choosing Between Regular and Binary Indexers

The bitmap index offers two families of indexers: **regular indexers** (e.g. `IndexerString`, `IndexerLong`) and **binary indexers** (e.g. `BinaryIndexerString`, `BinaryIndexerLong`). The right choice depends on your data characteristics and query needs.

=== Comparison

[options="header",cols="2,3,3"]
|===
|Aspect |Regular Indexer |Binary Indexer

|Cardinality
|Low to medium (few distinct values, many entities per value)
|High (many distinct values, few entities per value)

|Query types
|Equality (`is`, `in`), predicates (`is(predicate)`), and, depending on the indexer type, range queries
|Equality only (`is`, `in`)

|Key type
|Any object with `equals`/`hashCode`
|Must convert to `long`

|Memory
|On-heap hash table
|Off-heap bit-position array

|Null values
|Native support
|Requires sentinel value handling

|===

=== Use Regular Indexer When

* Indexed values have **low cardinality** — few distinct values shared by many entities (enums, categories, status fields)
* You need **predicate-based queries**, e.g. `firstName.is(name -> name.startsWith("J"))`
* You need **range queries** via `IndexerComparing`
* The key type does not naturally map to `long`

**Typical use cases:** enumerations, boolean flags, date/time fields, categories, multi-value fields

=== Use Binary Indexer When

* Indexed values have **high cardinality** — many distinct values (unique IDs, foreign keys)
* Only **equality queries** are needed (`is`, `in`)
* The key can be efficiently converted to `long` (numeric IDs, TSID, UUID)
* **Memory efficiency** matters for large datasets

**Typical use cases:** primary/foreign key indices, UUIDs, TSIDs, unique codes or reference numbers


45 changes: 45 additions & 0 deletions docs/modules/gigamap/pages/queries/defining.adoc
@@ -32,3 +32,48 @@ This enables us to use any custom logic inside query definitions if the predefin
----
gigaMap.query(firstName.is(name -> name.length() > 3));
----

Negation is also supported with `not` and `notIn`.

[source, java]
----
// all persons NOT called 'John'
gigaMap.query(firstName.not("John"));

// all persons called neither 'John' nor 'Jim'
gigaMap.query(firstName.notIn("John", "Jim"));
----

== Multi-Value Queries

When using an `IndexerMultiValue`, each entity can have multiple indexed values (e.g. a list of tags or interests). The standard query methods work on the individual values in the collection:

* `is(key)` — matches entities whose collection *contains* that key
* `in(k1, k2, ...)` — matches entities whose collection contains *any* of the given keys (OR)
* `not(key)` — matches entities whose collection does *not* contain that key
* `notIn(k1, k2, ...)` — matches entities whose collection contains *none* of the given keys

[source, java]
----
// persons interested in SPORTS
gigaMap.query(interests.is(Interest.SPORTS));

// persons interested in SPORTS or LITERATURE (or both)
gigaMap.query(interests.in(Interest.SPORTS, Interest.LITERATURE));
----

Additionally, `IndexerMultiValue` provides the `all` method, which matches entities whose collection contains *all* of the specified keys (AND logic).

[source, java]
----
// persons interested in both SPORTS and LITERATURE
gigaMap.query(interests.all(Interest.SPORTS, Interest.LITERATURE));
----
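
The five query methods described above can be compared side by side as plain set checks. This is a minimal sketch of their semantics as this page describes them, not GigaMap code: the class `MultiValueQuerySketch`, the helper `select`, and the sample data are hypothetical, and the checks scan every entity instead of using an index.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Illustration only: models the query semantics with plain collections, NOT the GigaMap API.
class MultiValueQuerySketch {
    enum Interest { SPORTS, LITERATURE, MUSIC }

    // Hypothetical sample data: person name -> interests.
    static final Map<String, Set<Interest>> PERSONS = new LinkedHashMap<>();
    static {
        PERSONS.put("Ann", Set.of(Interest.SPORTS, Interest.LITERATURE));
        PERSONS.put("Bob", Set.of(Interest.SPORTS));
        PERSONS.put("Cem", Set.of(Interest.MUSIC));
        PERSONS.put("Dee", Set.of(Interest.LITERATURE));
    }

    // Returns the names of all persons whose interests satisfy the condition.
    static Set<String> select(Predicate<Set<Interest>> condition) {
        Set<String> result = new TreeSet<>();
        PERSONS.forEach((name, interests) -> {
            if (condition.test(interests)) {
                result.add(name);
            }
        });
        return result;
    }

    // Collection contains the key.
    static Set<String> is(Interest key) { return select(s -> s.contains(key)); }
    // Collection contains at least one of the keys (OR).
    static Set<String> in(Interest... keys) { return select(s -> !Collections.disjoint(s, List.of(keys))); }
    // Collection contains every key (AND).
    static Set<String> all(Interest... keys) { return select(s -> s.containsAll(List.of(keys))); }
    // Collection does not contain the key.
    static Set<String> not(Interest key) { return select(s -> !s.contains(key)); }
    // Collection contains none of the keys.
    static Set<String> notIn(Interest... keys) { return select(s -> Collections.disjoint(s, List.of(keys))); }

    public static void main(String[] args) {
        System.out.println(is(Interest.SPORTS));                       // [Ann, Bob]
        System.out.println(in(Interest.SPORTS, Interest.LITERATURE));  // [Ann, Bob, Dee]
        System.out.println(all(Interest.SPORTS, Interest.LITERATURE)); // [Ann]
        System.out.println(not(Interest.SPORTS));                      // [Cem, Dee]
        System.out.println(notIn(Interest.SPORTS, Interest.MUSIC));    // [Dee]
    }
}
```

The difference between `in` and `all` shows up for Bob and Dee: each has only one of the two requested interests, so both match `in` but not `all`.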

Predicates also work with multi-value indexers. The predicate is applied to each key in the index.

[source, java]
----
// persons with any interest matching a custom condition
gigaMap.query(interests.is(interest -> interest.name().startsWith("S")));
----
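
One way to picture "applied to each key in the index" is the following plain-Java sketch. It assumes that a predicate query tests every distinct key once and collects the entities stored under each matching key. The class `PredicateOverKeysSketch`, its methods, and the `SCIENCE` constant are hypothetical and not part of GigaMap.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Illustration only: a hand-rolled key -> entities map, NOT the GigaMap API.
class PredicateOverKeysSketch {
    enum Interest { SPORTS, SCIENCE, MUSIC }

    // Hypothetical index contents: interest -> names of persons having it.
    static Map<Interest, Set<String>> sampleIndex() {
        Map<Interest, Set<String>> index = new EnumMap<>(Interest.class);
        index.put(Interest.SPORTS, new TreeSet<>(List.of("Ann", "Bob")));
        index.put(Interest.SCIENCE, new TreeSet<>(List.of("Cem")));
        index.put(Interest.MUSIC, new TreeSet<>(List.of("Ann", "Dee")));
        return index;
    }

    // Tests the predicate once per key and unions the entries of all matching keys.
    static Set<String> query(Map<Interest, Set<String>> index, Predicate<Interest> keyPredicate) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<Interest, Set<String>> entry : index.entrySet()) {
            if (keyPredicate.test(entry.getKey())) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // SPORTS and SCIENCE start with "S"; MUSIC does not.
        System.out.println(query(sampleIndex(), interest -> interest.name().startsWith("S")));
        // [Ann, Bob, Cem]
    }
}
```

In this sketch the predicate runs once per distinct key rather than once per entity, so the number of evaluations depends on how many different keys exist.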