feat: SimString - automatic embedding-backed similarity search #11
Merged
matthewmcneely merged 5 commits into main on Feb 28, 2026
3 issues found across 5 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="mutate.go">
<violation number="1" location="mutate.go:126">
P2: The embedding transaction is committed without a subsequent `Discard`, which the Dgraph client docs recommend to clean up resources (and it’s safe after commit). Add a discard after commit and on commit error to avoid leaking txn resources.</violation>
</file>
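Here is a minimal sketch of the commit-then-discard pattern the reviewer asks for. It assumes the transaction exposes `Commit(ctx) error` and `Discard(ctx) error`, as dgo's `*Txn` does. The `txn` interface, `commitEmbeddings`, and `fakeTxn` are hypothetical names used only for illustration. They are not the code in `mutate.go`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// txn mirrors the two dgo *Txn methods this pattern relies on
// (assumed signatures: Commit(ctx) error, Discard(ctx) error).
type txn interface {
	Commit(ctx context.Context) error
	Discard(ctx context.Context) error
}

// commitEmbeddings commits t and always discards it afterwards. Discard
// is harmless cleanup after a successful commit and releases resources
// after a failed one.
func commitEmbeddings(ctx context.Context, t txn) error {
	// Registered first so it runs on every return path; its error is
	// ignored because the Commit error, if any, is the one worth reporting.
	defer func() { _ = t.Discard(ctx) }()
	if err := t.Commit(ctx); err != nil {
		return fmt.Errorf("commit embedding txn: %w", err)
	}
	return nil
}

// fakeTxn records calls so the pattern can be exercised without a server.
type fakeTxn struct {
	commitErr error
	discarded bool
}

func (f *fakeTxn) Commit(context.Context) error  { return f.commitErr }
func (f *fakeTxn) Discard(context.Context) error { f.discarded = true; return nil }

func main() {
	good := &fakeTxn{}
	err := commitEmbeddings(context.Background(), good)
	fmt.Println(err, good.discarded) // <nil> true

	failing := &fakeTxn{commitErr: errors.New("aborted")}
	err = commitEmbeddings(context.Background(), failing)
	fmt.Println(err, failing.discarded) // commit embedding txn: aborted true
}
```

Deferring `Discard` immediately after obtaining the transaction covers both the success path and the commit-error path with a single line.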
<file name="client.go">
<violation number="1" location="client.go:125">
P2: The client cache key does not include the new EmbeddingProvider option. Creating a client with a different embedding provider will reuse an existing cached client and ignore the new provider, which can generate embeddings with the wrong model or unexpectedly skip embedding generation.</violation>
</file>
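The sketch below shows one way the cache key could include the embedding provider. It assumes the cache is a map keyed by connection URI. The method set of `EmbeddingProvider` beyond `Dims()`, and the `Client`, `clientKey`, and `clientCache` types, are all hypothetical. The sketch also assumes providers are pointer types, so the interface value is comparable as a map key.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// EmbeddingProvider is a hypothetical stand-in; only Dims() is named in
// the PR summary, Embed is assumed.
type EmbeddingProvider interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	Dims() int
}

// Client is a placeholder for the cached client type.
type Client struct {
	uri      string
	provider EmbeddingProvider
}

// clientKey identifies a cached client. Including the provider means two
// clients for the same URI but different providers are cached separately.
type clientKey struct {
	uri      string
	provider EmbeddingProvider // assumed comparable (pointer types)
}

type clientCache struct {
	mu      sync.Mutex
	clients map[clientKey]*Client
}

// get returns the cached client for (uri, provider), creating it on first use.
func (c *clientCache) get(uri string, p EmbeddingProvider) *Client {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.clients == nil {
		c.clients = make(map[clientKey]*Client)
	}
	k := clientKey{uri: uri, provider: p}
	if cl, ok := c.clients[k]; ok {
		return cl
	}
	cl := &Client{uri: uri, provider: p}
	c.clients[k] = cl
	return cl
}

// fakeProvider is a trivial provider used only for demonstration.
type fakeProvider struct{ dims int }

func (f *fakeProvider) Embed(_ context.Context, _ string) ([]float32, error) {
	return make([]float32, f.dims), nil
}
func (f *fakeProvider) Dims() int { return f.dims }

func main() {
	var cache clientCache
	a, b := &fakeProvider{dims: 384}, &fakeProvider{dims: 768}
	c1 := cache.get("dgraph://localhost:9080", a)
	c2 := cache.get("dgraph://localhost:9080", b)
	c3 := cache.get("dgraph://localhost:9080", a)
	fmt.Println(c1 == c2, c1 == c3) // false true
}
```

If providers might be non-comparable values, a stable provider identifier string in the key would be a safer choice than the interface value itself.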
<file name="embedding.go">
<violation number="1" location="embedding.go:483">
P1: Bug: `defer cleanup()` releases the pooled `*dgo.Dgraph` connection back to the pool when `SimilarToText` returns, but the returned `*dg.QueryBlock` still holds a reference to a transaction on that connection. When the caller later calls `.Scan()`, the underlying connection may already be in use by another operation, causing data races or query failures.
The cleanup function and the `*dgo.Dgraph` client should not be deferred here — they need to remain alive until after the caller finishes with the QueryBlock. Consider either: (1) returning the cleanup function alongside the QueryBlock so the caller can manage the lifecycle, or (2) executing the query inside this function and returning the results directly.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
… executing the query fully internally before releasing the connection
…aced immediately after the transaction is created
2 issues found across 1 file (changes from recent commits).
<file name=".github/workflows/ci-go-unit-tests.yaml">
<violation number="1" location=".github/workflows/ci-go-unit-tests.yaml:45">
P2: Pin the Dgraph Docker image to a specific version to keep CI runs reproducible and avoid unexpected breakages when `latest` changes.</violation>
<violation number="2" location=".github/workflows/ci-go-unit-tests.yaml:47">
P2: Fail the job when Dgraph does not become ready after the retry loop so test runs don’t proceed against an unavailable service.</violation>
</file>
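The fix itself belongs in the workflow's shell step, but the readiness logic can be sketched in Go. The program polls a health URL a bounded number of times and exits non-zero if it never answers 200 OK, so the job fails instead of running tests against an unavailable service. The `waitReady` helper is hypothetical. The `http://localhost:8080/health` endpoint (Dgraph Alpha's default HTTP port) is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"time"
)

// waitReady polls url until it answers 200 OK or attempts run out, and
// returns an error in the latter case so the caller can fail the job.
func waitReady(ctx context.Context, url string, attempts int, every time.Duration) error {
	client := &http.Client{Timeout: 2 * time.Second}
	for i := 0; i < attempts; i++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		if resp, err := client.Do(req); err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(every):
		}
	}
	return fmt.Errorf("%s not ready after %d attempts", url, attempts)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: waitready <health-url>")
		return
	}
	if err := waitReady(context.Background(), os.Args[1], 30, 2*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // non-zero exit fails the CI step
	}
	fmt.Println("ready")
}
```

In a workflow this would run as, for example, `go run waitready.go http://localhost:8080/health` before the test step. The equivalent shell fix is an `exit 1` after the retry loop.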
Description
Introduces SimString, a string type that transparently manages vector embeddings and HNSW-indexed shadow predicates, eliminating the need for users to manually maintain VectorFloat32 fields or call embedding APIs.
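The sketch below shows what transparent embedding management could look like on the write path. It walks a struct with reflection, embeds every `SimString` field tagged `dgraph:"embedding"`, and keys each vector by its `<field>__vec` shadow predicate. The `SimString` type here, the `Embedder` interface, and the use of the Go field name as the predicate name are assumptions for illustration, not the PR's implementation.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
	"strings"
)

// SimString is a hypothetical stand-in for the PR's SimString type.
type SimString string

// Embedder is the assumed subset of the embedding provider used here.
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

// shadowVectors embeds each exported SimString field tagged
// `dgraph:"embedding"` and keys the vector by "<FieldName>__vec".
func shadowVectors(ctx context.Context, e Embedder, v any) (map[string][]float32, error) {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Pointer || rv.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("want pointer to struct, got %T", v)
	}
	rv = rv.Elem()
	rt := rv.Type()
	simType := reflect.TypeOf(SimString(""))
	out := make(map[string][]float32)
	for i := 0; i < rt.NumField(); i++ {
		f := rt.Field(i)
		// Only the first comma-separated tag token is checked (assumed format).
		name, _, _ := strings.Cut(f.Tag.Get("dgraph"), ",")
		if !f.IsExported() || f.Type != simType || name != "embedding" {
			continue
		}
		text := rv.Field(i).String()
		if text == "" {
			continue // nothing to embed
		}
		vec, err := e.Embed(ctx, text)
		if err != nil {
			return nil, fmt.Errorf("embed %s: %w", f.Name, err)
		}
		out[f.Name+"__vec"] = vec
	}
	return out, nil
}

// lenEmbedder is a toy embedder: a 1-dimensional vector holding len(text).
type lenEmbedder struct{}

func (lenEmbedder) Embed(_ context.Context, text string) ([]float32, error) {
	return []float32{float32(len(text))}, nil
}

type Article struct {
	Title SimString `dgraph:"embedding"`
	Slug  string
}

func main() {
	vecs, err := shadowVectors(context.Background(), lenEmbedder{}, &Article{Title: "hnsw", Slug: "x"})
	if err != nil {
		panic(err)
	}
	fmt.Println(vecs) // map[Title__vec:[4]]
}
```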
Summary by cubic
Adds SimString for automatic text embeddings with HNSW-backed similarity search and a simpler SimilarToText that embeds, queries, and fills the model. CI now starts a Dgraph container on Linux and runs unit tests against it.
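`SimilarToText` is described as embedding the text, running the query, and populating the model. The sketch below covers the first step and the query construction. It builds a DQL `similar_to` query from the embedded text, passing the vector as a string literal in Dgraph's `similar_to(predicate, topK, "[...]")` form. The `Embed` method, the `similarQuery` helper, and the `Title__vec` predicate are hypothetical. The predicate and field names are assumed to be trusted, since they are interpolated unescaped.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"strings"
)

// EmbeddingProvider is a hypothetical stand-in; Embed is assumed.
type EmbeddingProvider interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	Dims() int
}

// similarQuery embeds text and returns a DQL query for the topK nodes whose
// vector predicate pred is most similar, selecting uid plus fields.
func similarQuery(ctx context.Context, p EmbeddingProvider, pred, text string, topK int, fields ...string) (string, error) {
	vec, err := p.Embed(ctx, text)
	if err != nil {
		return "", fmt.Errorf("embed query text: %w", err)
	}
	if len(vec) != p.Dims() {
		return "", fmt.Errorf("provider returned %d dims, expected %d", len(vec), p.Dims())
	}
	parts := make([]string, len(vec))
	for i, x := range vec {
		parts[i] = strconv.FormatFloat(float64(x), 'g', -1, 32)
	}
	literal := "[" + strings.Join(parts, ",") + "]"
	return fmt.Sprintf("{\n  q(func: similar_to(%s, %d, %q)) {\n    uid\n    %s\n  }\n}",
		pred, topK, literal, strings.Join(fields, "\n    ")), nil
}

// fixedProvider returns a canned vector, for demonstration only.
type fixedProvider struct {
	vec  []float32
	dims int
}

func (f fixedProvider) Embed(context.Context, string) ([]float32, error) { return f.vec, nil }
func (f fixedProvider) Dims() int                                        { return f.dims }

func main() {
	q, err := similarQuery(context.Background(),
		fixedProvider{vec: []float32{0.5, 1}, dims: 2}, "Title__vec", "graph databases", 3, "Title")
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

The JSON response from such a query would then be decoded into the caller's model, which matches the behavior the summary describes.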
New Features
- … `dgraph:"embedding"` auto-create a `<field>__vec` `float32vector` (HNSW) predicate via `UpdateSchema`.
- `WithEmbeddingProvider(...)` (the client cache key includes the provider).
- … `threshold`.
- `SimilarTo(...)` for precomputed vectors; `SimilarToText(...)` now embeds the text, runs the query, and populates the model.

Migration
- … `SimString` with `dgraph:"embedding"`; optionally set `metric`, `exponent`, and `threshold`.
- Provide an `EmbeddingProvider` and run `UpdateSchema(...)`; ensure the provider's `Dims()` matches.
- … `SimilarToText` call sites: it now returns only an error and populates the passed model.

Written for commit f7125e3.
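The sketch below shows the schema line `UpdateSchema` might emit for one embedding field. It follows Dgraph's documented `float32vector @index(hnsw(...))` form, with the optional `metric` and `exponent` settings listed under Migration. The `vectorSchema` helper is hypothetical. Defaulting the metric to `euclidean` when unset is an assumption about the PR's behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// validMetrics lists the HNSW distance metrics accepted by Dgraph's
// float32vector index.
var validMetrics = map[string]bool{"euclidean": true, "cosine": true, "dotproduct": true}

// vectorSchema renders the schema line for field's shadow predicate
// <field>__vec. An empty metric falls back to "euclidean" (assumed default);
// exponent <= 0 omits the option so Dgraph's own default applies.
func vectorSchema(field, metric string, exponent int) (string, error) {
	if metric == "" {
		metric = "euclidean"
	}
	if !validMetrics[metric] {
		return "", fmt.Errorf("unsupported HNSW metric %q", metric)
	}
	opts := fmt.Sprintf("metric:%q", metric)
	if exponent > 0 {
		opts = fmt.Sprintf("exponent:%q, %s", strconv.Itoa(exponent), opts)
	}
	return fmt.Sprintf("%s__vec: float32vector @index(hnsw(%s)) .", field, opts), nil
}

func main() {
	s, err := vectorSchema("Title", "cosine", 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // Title__vec: float32vector @index(hnsw(exponent:"5", metric:"cosine")) .
}
```

The vector length stored under this predicate must equal the provider's `Dims()`, which is why the migration notes ask for the two to match.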