Merged
16 commits
53 changes: 53 additions & 0 deletions docs/modules/gigamap/pages/indexing/jvector/configuration.adoc
@@ -141,6 +141,10 @@ For datasets that exceed available memory, enable on-disk storage to use memory-
|`pqSubspaces`
|`0`
|Number of PQ subspaces (0 = auto: dimension/4).

|`parallelOnDiskWrite`
|`false`
|Use parallel direct buffers and multiple worker threads for on-disk index writing. Speeds up persistence for large indices but uses more resources. Only applies when `onDisk=true`.
|===

=== Example
@@ -157,6 +161,55 @@ VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .build();
----

== Eventual Indexing

Enable eventual indexing to defer expensive HNSW graph mutations to a background thread. This reduces mutation latency, but search becomes eventually consistent: results may briefly lag behind the most recent mutations.

[options="header",cols="1,1,3"]
|===
|Parameter |Default |Description

|`eventualIndexing`
|`false`
|Defer HNSW graph mutations (add, update, remove) to a background thread. The vector store is updated synchronously, but graph construction happens asynchronously. Search results may not immediately reflect the most recent mutations.
|===

When enabled:

* The vector store is always updated synchronously (no data loss).
* HNSW graph mutations are queued and applied by a single background worker thread.
* The queue is automatically drained before `optimize()`, `persistToDisk()`, and `close()`.

=== Example

[source, java]
----
VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    .eventualIndexing(true)
    .build();
----

== Parallel On-Disk Writes

When on-disk storage is enabled, persistence can optionally use parallel direct buffers and multiple worker threads (one per available processor) to write the index concurrently. This can significantly speed up persistence for large indices.

This is disabled by default, as sequential single-threaded writing is preferred in resource-constrained environments or for smaller indices.

=== Example

[source, java]
----
VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    .onDisk(true)
    .indexDirectory(Path.of("/data/vectors"))
    .parallelOnDiskWrite(true)
    .build();
----

== Background Persistence

Enable automatic asynchronous persistence to avoid blocking operations during writes.
41 changes: 41 additions & 0 deletions gigamap/jvector/README.md
@@ -10,6 +10,8 @@ A Java library that integrates [JVector](https://github.com/datastax/jvector) (h
- **PQ Compression**: Product Quantization for reduced memory footprint
- **Background Persistence**: Automatic asynchronous persistence at configurable intervals
- **Background Optimization**: Periodic graph cleanup for improved query performance
- **Eventual Indexing**: Deferred graph mutations via background thread for reduced write latency
- **Parallel On-Disk Writes**: Multi-threaded index persistence for large on-disk indices
- **Lazy Entity Access**: Search results provide direct access to entities without additional lookups
- **Stream API**: Java Stream support for search results
- **GigaMap Integration**: Seamlessly integrates with GigaMap's index system
@@ -163,6 +165,13 @@ List<Document> topDocs = result.stream()
| `indexDirectory` | `null` | Directory for index files (required if `onDisk=true`) |
| `enablePqCompression` | `false` | Enable Product Quantization compression |
| `pqSubspaces` | `0` | Number of PQ subspaces (0 = auto: dimension/4) |
| `parallelOnDiskWrite` | `false` | Use parallel direct buffers and multiple worker threads for on-disk index writing. Speeds up persistence for large indices but uses more resources. Only applies when `onDisk=true` |

### Eventual Indexing

| Parameter | Default | Description |
|-----------|---------|-------------|
| `eventualIndexing` | `false` | Defer HNSW graph mutations to a background thread. The vector store is updated synchronously, but graph construction happens asynchronously. Reduces mutation latency; search results may briefly lag behind the most recent mutations |

### Background Persistence

@@ -223,6 +232,38 @@ VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .build();
```

### Eventual Indexing

For high-throughput systems where mutation latency matters more than immediate search consistency:

```java
VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    // Eventual indexing (graph mutations deferred to background thread)
    .eventualIndexing(true)
    .build();
```

When enabled, the vector store is always updated synchronously (no data loss), but expensive HNSW graph mutations are queued and applied by a background worker thread. Search results may not immediately reflect the most recent mutations. The queue is automatically drained before `optimize()`, `persistToDisk()`, and `close()`.
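
The behavior described above can be pictured with a small, self-contained sketch. It is not the library's implementation: `EventualIndexSketch`, `isStored`, `isSearchable`, and `drain` are hypothetical names, a map stands in for the vector store, and a set of ids stands in for the HNSW graph. It only illustrates the pattern of a synchronous store update combined with graph mutations queued to a single background worker.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of the eventual-indexing pattern; not JVector or GigaMap code.
final class EventualIndexSketch implements AutoCloseable {

    // Stand-in for the vector store: updated synchronously on every mutation.
    private final Map<Long, float[]> vectorStore = new ConcurrentHashMap<>();

    // Stand-in for the HNSW graph: the ids that have been linked so far.
    private final Set<Long> graphNodes = ConcurrentHashMap.newKeySet();

    // One worker thread applies queued graph mutations in submission order.
    private final ExecutorService graphWorker = Executors.newSingleThreadExecutor();

    void add(long id, float[] vector) {
        vectorStore.put(id, vector.clone());           // synchronous: data is never lost
        graphWorker.execute(() -> graphNodes.add(id)); // deferred: applied later by the worker
    }

    void remove(long id) {
        vectorStore.remove(id);
        graphWorker.execute(() -> graphNodes.remove(id));
    }

    boolean isStored(long id) {
        return vectorStore.containsKey(id);
    }

    // May lag behind isStored until the queued mutation has been applied.
    boolean isSearchable(long id) {
        return graphNodes.contains(id);
    }

    // Blocks until every mutation queued before this call has been applied.
    void drain() {
        try {
            graphWorker.submit(() -> { }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Mirrors the "drain before close" rule: pending mutations are applied first.
    @Override
    public void close() {
        drain();
        graphWorker.shutdown();
    }
}
```

In this sketch a caller that needs read-your-writes behavior would call `drain()` before searching. Whether the real index exposes a comparable call besides `optimize()`, `persistToDisk()`, and `close()` is not stated here.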

### Parallel On-Disk Writes

For large on-disk indices where persistence speed is critical:

```java
VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    .onDisk(true)
    .indexDirectory(Path.of("/data/vectors"))
    // Parallel on-disk writing (multiple worker threads)
    .parallelOnDiskWrite(true)
    .build();
```

When enabled, the on-disk graph writer uses parallel direct buffers and multiple worker threads (one per available processor) to write the index concurrently. This is disabled by default as sequential writing is preferred in resource-constrained environments or for smaller indices.
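
As a rough illustration of the general technique (not the actual on-disk graph writer), the sketch below splits a byte array into chunks and writes each chunk from its own direct buffer at a fixed file offset, using one worker thread per available processor. `ParallelWriteSketch`, `write`, `roundTrip`, and the chunking scheme are assumptions made for this example; only standard `java.nio` and `java.util.concurrent` calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of parallel positional writes; not the library's writer.
final class ParallelWriteSketch {

    static void write(Path file, byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // One worker per available processor, as described in the text above.
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            List<Future<?>> pending = new ArrayList<>();
            for (int offset = 0; offset < data.length; offset += chunkSize) {
                final int start = offset;
                final int length = Math.min(chunkSize, data.length - start);
                pending.add(pool.submit(() -> {
                    // Each task fills its own direct buffer and writes it at its own offset.
                    ByteBuffer buffer = ByteBuffer.allocateDirect(length);
                    buffer.put(data, start, length);
                    buffer.flip();
                    long position = start;
                    while (buffer.hasRemaining()) {
                        position += channel.write(buffer, position);
                    }
                    return null;
                }));
            }
            // Wait for all chunks before the channel is closed.
            for (Future<?> task : pending) {
                task.get();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Convenience for trying the sketch: write to a temp file and read it back.
    static byte[] roundTrip(byte[] data, int chunkSize) {
        try {
            Path file = Files.createTempFile("parallel-write-sketch", ".bin");
            try {
                write(file, data, chunkSize);
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch also hints at why the option is off by default: every in-flight chunk holds its own direct buffer and the pool occupies all processors, which costs more than a sequential writer on small indices or constrained machines.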

### Manual Optimization and Persistence

```java
8 changes: 7 additions & 1 deletion gigamap/jvector/pom.xml
@@ -19,7 +19,7 @@
<url>https://projects.eclipse.org/projects/technology.store</url>

<properties>
-    <jvector.version>4.0.0-rc.7</jvector.version>
+    <jvector.version>4.0.0-rc.8</jvector.version>
</properties>

<dependencies>
@@ -44,6 +44,12 @@
        <artifactId>junit-jupiter-engine</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.awaitility</groupId>
        <artifactId>awaitility</artifactId>
        <version>4.2.2</version>
        <scope>test</scope>
    </dependency>
</dependencies>

<build>

This file was deleted.
