Add convenience factory methods for Schema value types to reduce nesting #144

@oss-amikos

Summary

Building a Schema object requires 5+ levels of nested builders (Schema → ValueTypes → FloatListConfig → VectorIndexType → VectorIndexConfig → HnswConfig). This deep nesting, not the builder pattern itself, is the real friction in schema construction.

Current

Schema schema = Schema.builder()
    .key(Schema.EMBEDDING_KEY, ValueTypes.builder()
        .floatList(FloatListConfig.builder()
            .vectorIndex(VectorIndexType.builder()
                .config(VectorIndexConfig.builder()
                    .hnsw(HnswConfig.builder()
                        .m(16)
                        .constructionEf(200)
                        .build())
                    .build())
                .build())
            .build())
        .build())
    .build();

Proposed

Add mid-level factory methods that flatten common configurations:

// Convenience: single method for the common HNSW case
Schema schema = Schema.builder()
    .embedding(DistanceFunction.COSINE)
    .build();

// With HNSW tuning
Schema schema = Schema.builder()
    .embedding(DistanceFunction.COSINE, hnsw -> hnsw.m(16).constructionEf(200))
    .build();

// With CMEK
Schema schema = Schema.builder()
    .embedding(DistanceFunction.COSINE)
    .cmek(Cmek.gcpKms("projects/.../cryptoKeys/my-key"))
    .build();

Factory methods to add

On ValueTypes:

  • static ValueTypes floatWithHnsw(DistanceFunction distance) — default HNSW params
  • static ValueTypes floatWithHnsw(DistanceFunction distance, int m, int constructionEf) — tuned HNSW
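
As a rough illustration of how the two ValueTypes factories could wrap the existing chain, here is a minimal sketch. Every class in it is a hypothetical, simplified stand-in written for this example (plain constructors replace the real builders), the placement of the distance function on VectorIndexConfig is an assumption, and the default values are placeholders rather than the library's actual defaults.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins for the nested chain named in this issue.
// They are NOT the real client classes; plain constructors replace the builders.
final class ValueTypesSketch {
    enum DistanceFunction { COSINE, L2, IP } // illustrative values only

    static final class HnswConfig {
        final int m;
        final int constructionEf;
        HnswConfig(int m, int constructionEf) {
            this.m = m;
            this.constructionEf = constructionEf;
        }
    }

    static final class VectorIndexConfig {
        final DistanceFunction space; // assumption: the distance function is stored here
        final HnswConfig hnsw;
        VectorIndexConfig(DistanceFunction space, HnswConfig hnsw) {
            this.space = space;
            this.hnsw = hnsw;
        }
    }

    static final class VectorIndexType {
        final VectorIndexConfig config;
        VectorIndexType(VectorIndexConfig config) { this.config = config; }
    }

    static final class FloatListConfig {
        final VectorIndexType vectorIndex;
        FloatListConfig(VectorIndexType vectorIndex) { this.vectorIndex = vectorIndex; }
    }

    static final class ValueTypes {
        // Placeholder defaults for the sketch; the real defaults may differ.
        static final int DEFAULT_M = 16;
        static final int DEFAULT_CONSTRUCTION_EF = 100;

        final FloatListConfig floatList;
        ValueTypes(FloatListConfig floatList) { this.floatList = floatList; }

        // Default-parameter overload delegates to the tuned overload.
        static ValueTypes floatWithHnsw(DistanceFunction distance) {
            return floatWithHnsw(distance, DEFAULT_M, DEFAULT_CONSTRUCTION_EF);
        }

        // Assembles the whole nested chain in one place so callers do not have to.
        static ValueTypes floatWithHnsw(DistanceFunction distance, int m, int constructionEf) {
            Objects.requireNonNull(distance, "distance");
            HnswConfig hnsw = new HnswConfig(m, constructionEf);
            VectorIndexConfig config = new VectorIndexConfig(distance, hnsw);
            return new ValueTypes(new FloatListConfig(new VectorIndexType(config)));
        }
    }
}
```

Because the factory only composes the existing types, the result should be indistinguishable from one built by hand through the nested chain, which is what keeps the change additive.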

On Schema.Builder:

  • Builder embedding(DistanceFunction distance) — shorthand for the most common schema pattern
  • Builder embedding(DistanceFunction distance, Consumer<HnswConfig.Builder> hnsw) — with HNSW tuning

Design notes

  • Additive only — deep nested builder API stays for full customization
  • Target the 80% case — most schemas just need distance function + optional HNSW tuning
  • Don't abstract away CMEK — it's already a single method call (Cmek.gcpKms(...))
  • Schema is cloud-only, so this is primarily a DX improvement for Chroma Cloud users

References

  • Schema: src/main/java/tech/amikos/chromadb/v2/Schema.java
  • ValueTypes, FloatListConfig, VectorIndexType, VectorIndexConfig, HnswConfig — nested builder chain

Labels: enhancement
