Summary
Building a Schema object requires 5+ levels of nested builders (Schema → ValueTypes → FloatListConfig → VectorIndexType → VectorIndexConfig → HnswConfig). This deep nesting is the real friction in schema construction — not the builder pattern itself.
Current
Schema schema = Schema.builder()
    .key(Schema.EMBEDDING_KEY, ValueTypes.builder()
        .floatList(FloatListConfig.builder()
            .vectorIndex(VectorIndexType.builder()
                .config(VectorIndexConfig.builder()
                    .hnsw(HnswConfig.builder()
                        .m(16)
                        .constructionEf(200)
                        .build())
                    .build())
                .build())
            .build())
        .build())
    .build();
Proposed
Add mid-level factory methods that flatten common configurations:
// Convenience: single method for the common HNSW case
Schema schema = Schema.builder()
    .embedding(DistanceFunction.COSINE)
    .build();

// With HNSW tuning
Schema schema = Schema.builder()
    .embedding(DistanceFunction.COSINE, hnsw -> hnsw.m(16).constructionEf(200))
    .build();

// With CMEK
Schema schema = Schema.builder()
    .embedding(DistanceFunction.COSINE)
    .cmek(Cmek.gcpKms("projects/.../cryptoKeys/my-key"))
    .build();
Factory methods to add
On ValueTypes:
static ValueTypes floatWithHnsw(DistanceFunction distance) — default HNSW params
static ValueTypes floatWithHnsw(DistanceFunction distance, int m, int constructionEf) — tuned HNSW
On Schema.Builder:
Builder embedding(DistanceFunction distance) — shorthand for the most common schema pattern
Builder embedding(DistanceFunction distance, Consumer<HnswConfig.Builder> hnsw) — with HNSW tuning
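One way the two embedding(...) overloads could delegate to the existing builders is sketched below. This is a rough illustration, not the library's implementation: every class here is a simplified stand-in declared inside a hypothetical SchemaSketch wrapper, the intermediate ValueTypes / FloatListConfig / VectorIndexType levels are collapsed into a single VectorIndexConfig for brevity, and the assumption that the distance function lives on that config is mine. Unset HNSW parameters are modelled as null so the sketch does not guess at default values.

```java
import java.util.Objects;
import java.util.function.Consumer;

/** Sketch only: simplified stand-ins, not the real tech.amikos.chromadb.v2 classes. */
public final class SchemaSketch {

    /** Stand-in enum; the real DistanceFunction may expose different constants. */
    enum DistanceFunction { COSINE, L2, IP }

    /** Stand-in for HnswConfig; a null field means "not set by the caller". */
    static final class HnswConfig {
        final Integer m;
        final Integer constructionEf;

        private HnswConfig(Builder b) {
            this.m = b.m;
            this.constructionEf = b.constructionEf;
        }

        static Builder builder() { return new Builder(); }

        static final class Builder {
            private Integer m;
            private Integer constructionEf;

            Builder m(int m) { this.m = m; return this; }
            Builder constructionEf(int ef) { this.constructionEf = ef; return this; }
            HnswConfig build() { return new HnswConfig(this); }
        }
    }

    /** Stand-in for the nested chain below Schema, collapsed to one level (assumption). */
    static final class VectorIndexConfig {
        final DistanceFunction space;
        final HnswConfig hnsw;

        VectorIndexConfig(DistanceFunction space, HnswConfig hnsw) {
            this.space = Objects.requireNonNull(space, "space");
            this.hnsw = Objects.requireNonNull(hnsw, "hnsw");
        }
    }

    static final class Schema {
        final VectorIndexConfig embedding;

        private Schema(Builder b) { this.embedding = b.embedding; }

        static Builder builder() { return new Builder(); }

        static final class Builder {
            private VectorIndexConfig embedding;

            /** Existing-style entry point: the caller assembles the nested config. */
            Builder embedding(VectorIndexConfig config) {
                this.embedding = config;
                return this;
            }

            /** Proposed shorthand: distance function only, HNSW left untuned. */
            Builder embedding(DistanceFunction distance) {
                return embedding(distance, hnsw -> { });
            }

            /** Proposed shorthand: the callback tunes a fresh HNSW builder. */
            Builder embedding(DistanceFunction distance, Consumer<HnswConfig.Builder> hnsw) {
                HnswConfig.Builder hnswBuilder = HnswConfig.builder();
                hnsw.accept(hnswBuilder);
                return embedding(new VectorIndexConfig(distance, hnswBuilder.build()));
            }

            Schema build() { return new Schema(this); }
        }
    }

    public static void main(String[] args) {
        Schema schema = Schema.builder()
                .embedding(DistanceFunction.COSINE, hnsw -> hnsw.m(16).constructionEf(200))
                .build();
        // prints: COSINE m=16 constructionEf=200
        System.out.println(schema.embedding.space + " m=" + schema.embedding.hnsw.m
                + " constructionEf=" + schema.embedding.hnsw.constructionEf);
    }
}
```

Because both shorthands funnel into the existing-style embedding(VectorIndexConfig) method, the nested path stays the single source of truth and the new methods remain purely additive.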
Design notes
- Additive only — deep nested builder API stays for full customization
- Target the 80% case — most schemas just need distance function + optional HNSW tuning
- Don't abstract away CMEK; it's already a single method call (Cmek.gcpKms(...))
- Schema is cloud-only, so this is primarily a DX improvement for Chroma Cloud users
References
Schema: src/main/java/tech/amikos/chromadb/v2/Schema.java
ValueTypes, FloatListConfig, VectorIndexType, VectorIndexConfig, HnswConfig — nested builder chain