Pluggable embedding providers (ollama / openai / voyage) + storage-path unification by dvcdsys · Pull Request #57 · dvcdsys/code-index

dvcdsys · 2026-05-28T21:05:52Z

What

Adds a pluggable embedding-provider system so the server can embed via the
local ollama llama-server sidecar, any OpenAI-compatible endpoint
(vLLM/TEI/LocalAI/OpenAI), or Voyage AI — switchable at runtime from the
dashboard. Also unifies on-disk storage path derivation behind that change.

Major pieces:

Provider abstraction — provider.Provider interface + kind→Factory
registry; ollama/openai/voyage implementations (ollama code moved into
provider/ollama/).
Voyage provider — int8 dequant, operator-configurable RPM/TPM
token-bucket throttle, token-aware batch splitting, adaptive bisect on the
120K "batch too large" 400, byte-window sliding split + averaging for
oversize single inputs.
Admin surface — GET/PUT /admin/embedding-providers[/active],
POST /…/{kind}/test, provider-aware /status; dashboard provider UI.
Storage unification — one permanent, model-INDEPENDENT system DB; the
chroma vector store is namespaced by the ACTIVE provider identity
(provider.StorageSlug(Provider.ID())) with live reopen on switch, so
vectors of different dimensions never collide. One-time boot migrations
adopt legacy per-model DB/chroma layouts without a reindex.

Why

The original design derived both the SQLite path and the chroma dir from the
embedding model name (a Python-era artefact). That was only safe while the
model was a fixed env var; runtime provider switching turned it into a
critical bug — switching to a higher-dim provider wrote vectors into the
previous provider's collection, breaking search with "vectors must have the
same length" — plus a footgun where a model change spawned a parallel system
DB with empty accounts.

How

Two-plane storage: model-independent system DB (cfg.SQLitePath) +
identity-namespaced chroma; vectorstore.Holder allows atomic swap of the
active store on provider switch with no restart.
API keys are never persisted/logged/returned — only the env-var NAME is
stored; the key is read live from the environment at request time.

Pre-PR review fixes (last 5 commits)

voyage: validate returned vector dimension; guard the averaging path
against inconsistent window widths (panic → clean error); accept int8 as
base64; trim base-URL trailing slash.
openai/db: trim base-URL trailing slash; migration-13 now tests for a
known kind prefix instead of any : so colon-bearing legacy ollama model
names get prefixed.
httpapi: fix a nil-actor panic in SwitchEmbeddingProvider under
CIX_AUTH_DISABLED=true; add per-endpoint 403 gating tests for all admin
routes.
embeddings: close three concurrency hazards — a data race on the
s.queue pointer swap during Restart, missing mutual exclusion between
SwitchProvider and Restart, and an unsynchronized waiterDone read in the
ollama supervisor's Stop.

Update — post-review fixes (commits `f5d6e22`, `668ad29`)

A full review of the branch was run after the original description above; two follow-up commits address the findings.

Correctness / resilience (`f5d6e22`)

Provider switch is now atomic. The work queue stays blocked across both the provider swap and the vector-store reopen, and a failed reopen rolls the provider back — closing the window where a concurrent embed could write new-dimension vectors into the previous provider's collection.
Boot no longer hard-fails on a remote provider. If the persisted openai/voyage provider's connect-test (or the ollama sidecar) fails at startup, the server degrades instead of refusing to boot, so the dashboard / auth / all-project APIs stay reachable to recover. Remote providers self-heal once the upstream/key returns.
Validate-then-persist in PUT /admin/embedding-providers/active: the live swap is applied (and validated) before the DB row is written, so a transient switch failure can't leave the DB pointing at a provider that bricks the next boot.
Chroma migration fails closed at boot instead of silently opening an empty namespace.
Sidecar status fix: healthy openai/voyage providers (StateRemote) now report ready: true (previously always false).
Restart blocks the freshly-built queue during a concurrency-cap change so in-flight embeds get a clean 503 instead of ErrSupervisor.
ollama token-window guard against a pathologically small ctx_size (no more infinite split loop).
Shared HTTP-provider plumbing (base-URL normalize, API-key resolve, Ready/Status) extracted to one place.
Release gate fixed: make test-gate (the embedding-parity gate) compiles again — a deleted embedRaw helper was breaking it.

Nested per-provider vector namespacing (`668ad29`)

Replaces the flat chroma_<slug> scheme with a nested layout, one path segment per identity field:

<ChromaPersistDir>/<kind>/<model-slug>[/<variant>]
chroma/ollama/nomic_embed_text   ·   chroma/voyage/voyage_code_3/2048_int8

The provider kind is its own directory level (derived from the kind, not glued onto the model slug), which structurally removes the legacy-migration ambiguity where a model name normalizing to a kind-looking slug (e.g. openai-community/x) was silently orphaned. Adds Provider.StorageComponents(), Config.ChromaDirFor(), and an unambiguous flat→nested migration that preserves existing ollama vectors without a reindex.

Testing

go vet ./... ✅ · go test ./... ✅ (39 pkgs) · go test -race ./... ✅ (39 pkgs, 0 data races) · embed_gate compiles.
Added regression tests: switch rollback + happy-path; per-provider StorageComponents (incl. the foo vs ollama-foo anti-collision); flat→nested migration (incl. the kind-looking-name case).

Known deferred (non-blocking)

An empty pre-unification per-model DB leftover triggers a cosmetic boot WARN from AdoptLegacyModelDB; no data impact. Can be silenced in a follow-up (skip-warn-when-empty).

Type of change

Bug fix
New feature
Refactor

Checklist

go build ./..., go vet ./..., go test ./... all pass
go test -race ./internal/embeddings/... ./internal/vectorstore/... passes
No secrets or API keys committed (provider configs store only env-var names)

🤖 Generated with Claude Code

Add provider.Provider interface and a Factory/registry under internal/embeddings/provider. Move the existing llama-server sidecar into provider/ollama as the first implementation. Add provider/openai (generic /v1/embeddings) and provider/voyage (api.voyageai.com with input_type, output_dimension, and int8 dequantization). Rewrite embeddings.Service to delegate to the active Provider while keeping the queue/backpressure machinery centralized. EmbeddingModel() now returns Provider.ID() — a fingerprint of the form "{kind}:{model}[:{dim}][:{dtype}]" — so the existing per-project drift check in repojobs transparently treats a provider switch as a model change and forces a full reindex on the next clone job. Behaviour unchanged for default ollama deployments; the openai and voyage providers are registered but not yet selectable through any admin surface. Tests are green.

Add migration 12 to extend runtime_settings with two columns: embedding_provider (kind selector) and embedding_provider_config (per-provider JSON blob). Idempotent ALTER TABLE; API keys are not persisted — providers read them live from env vars named in the blob. Add internal/embeddingscfg, a thin persistence layer over the two new columns. Wire it into main.go's boot path: read the persisted selection, seed the row from env-derived ollama defaults on fresh installs, then hand the resolved Provider to embeddings.Service via the new NewWithProvider constructor. Add embeddings.Service.SwitchProvider — build new + Start + drain queue + atomic swap + background Stop of the old. Used by the new admin endpoints: GET /api/v1/admin/embedding-providers list kinds + schemas + env-key readiness GET /api/v1/admin/embedding-providers/active current kind + persisted config PUT /api/v1/admin/embedding-providers/active atomic switch POST /api/v1/admin/embedding-providers/{kind}/test pre-save sanity check Extend /api/v1/status with embedding_provider and embedding_provider_manages_process so the dashboard footer can render the provider name as the label and a permanent green dot for HTTP-only providers that have no managed process to die. Behaviour unchanged for existing ollama installs.

Provider-level: stub httptest.Server-based tests for the openai and voyage clients. Cover request shape, response ordering, HTTP-error surfacing, missing-API-key behavior, ID() fingerprinting, Voyage input_type query/document selection, and the int8 dequantize path (127 → ~1.0, -127 → ~-1.0, etc). Admin handler: gating tests for the new endpoints — admin sees all three kinds in the list with non-empty schemas, viewer gets 403, unknown kind is rejected, /test with a missing API key returns 400.

Add EmbeddingProviderSection with a kind dropdown and per-provider forms. Three hardcoded React components (OpenAIProviderForm, VoyageProviderForm, ollama tuning kept in the existing sections) render below the dropdown. Voyage form covers model picker, output dimension (256/512/1024/2048), output dtype (float | int8), and truncation toggle. Each form surfaces a red banner when the referenced API-key env var is not set on the server and disables Save accordingly — keys themselves are never stored in the dashboard or DB. New hooks: useEmbeddingProviders, useActiveProvider, useTestProvider, useSwitchProvider. The save flow is test → switch → toast → invalidate runtime-model + sidecar-status caches so the footer and project cards update immediately. Footer: label is now the active provider kind ("ollama"/"openai"/ "voyage") instead of hardcoded "llama". Dot is green/red for ollama (real liveness signal from /health), permanently green for HTTP-only providers (no managed process to die — failures surface at search time with diagnostics). ServerPage: ollama-specific sections (EmbeddingModelSection, RuntimeParamsSection, SidecarSection) render only when the active provider is ollama. Otherwise the provider form above is the only edit surface. openapi.yaml: extend StatusResponse with embedding_provider + embedding_provider_manages_process; add the four new /admin/embedding-providers endpoints and their schemas.

Add commented examples for CIX_OPENAI_API_KEY and CIX_VOYAGE_API_KEY to both compose files. Default behavior is unchanged — first boot seeds the ollama provider from the existing CIX_EMBEDDING_MODEL + CIX_LLAMA_* vars. Switching to a remote provider is a dashboard action; the env var only needs to be exported when the operator intends to use OpenAI-compatible or Voyage backends.

The previous gate covered EmbeddingModelSection, RuntimeParamsSection, and SidecarSection but left AdvancedSection (batch size / concurrency) visible — and also exposed the ollama-only Save & Restart header button regardless of provider. Switch to /status as the single source of truth for the live active kind (already polled by the footer; no extra request) and gate AdvancedSection plus the Save & Restart button on it. Update the header copy to explain what's editable per provider type.

The Service-level embedding queue depth (max_embedding_concurrency) applies to every provider — it caps how many parallel /v1/embeddings POSTs the server runs, which OpenAI and Voyage both accept natively (subject to their account-level rate limits). The previous UI hid it behind the ollama-only block. Backend Service.Restart is now provider-aware. Submitting a runtime config change with a remote provider active only rebuilds the queue; the (HTTP-only) provider stays in place. The previous code unconditionally called buildOllamaFromConfig, which would have silently re-spawned llama-server on top of a live voyage/openai provider — that bug is fixed here. Dashboard rearrangement: - AdvancedSection renamed to Throughput in the UI, always visible. Receives isOllama: when false the llama-batch field is hidden; concurrency is shown regardless. - Save & Restart button on ServerPage stays visible for all providers; label switches to "Save" when the active provider has no managed sidecar to restart. - Header copy updated to explain the new layout.

The indexer already sends all chunks of one file as a batched POST (input: [chunk1, chunk2, ...]); concurrency caps how many such batched POSTs run in parallel. The previous copy only mentioned "parallel" without explaining that batching is the inner loop — caused confusion when picking values. Also call out the unsplit Voyage limits (voyage-code-3 = 128 inputs per request) so operators know to keep files under that threshold until the per-provider auto-split lands.

Voyage caps inputs per /v1/embeddings POST at 128 for voyage-code-* models (1000 for voyage-3*). OpenAI proper caps at 2048. The previous implementation forwarded the caller's slice as-is, so a single file with 200 chunks would 422 against voyage-code-3. Both HTTP providers now slice oversized inputs into sequential sub-batches inside EmbedDocuments under the same Service queue slot (no extra concurrency consumed) and concatenate results in input order. The split is transparent to callers — the indexer keeps sending one batched POST per file and providers handle the rest. Tests assert two POSTs for a 200-item Voyage call (128 + 72) and a 3000-item OpenAI call (2048 + 952). Dashboard copy updated: throughput card no longer warns about unsplit limits since the safety net is in place.

Add an informational banner to the Voyage form pointing operators to the dashboard billing page and explaining the free-tier 3 RPM / 10K TPM ceiling — small enough for a smoke test but the indexer will saturate it on any real repo. Suggest concurrency=1 as the workaround for free-tier users. Add a one-paragraph note to the OpenAI form about the current no-retry policy: 429s are surfaced as-is; users should lower concurrency or upgrade their account tier. Self-hosted servers typically don't need this. Decision (per discussion): no client-side retry / token-bucket implementation. Industry tools like LangChain do retry, but on hard tier limits (3 RPM) retry buys very little — files just fail into the next clone job which retries idempotently. Keeping the provider HTTP layer simple was preferred.

Surface the per-provider cost expectation before the admin picks anything. The callout sits inside EmbeddingProviderSection and is visible regardless of which provider is currently active or selected in the dropdown: - Ollama: free, local. - OpenAI-compatible: paid on api.openai.com, free on self-hosted. - Voyage: paid plan strongly recommended (free tier 3 RPM = unusable). Dropdown option labels also annotated with (free) / (paid) so the expectation is visible even when the callout is scrolled past.

The EmbeddingProviderSection dropdown allowed picking "ollama" while a remote provider was active, but the Save button was conditionally hidden (draftKind !== 'ollama') so the switch never actually fired. Even if it had, the backend SwitchEmbeddingProvider rejected empty or {} configs — and the ollama-specific fields (GGUF model, ctx, GPU layers, sidecar paths) aren't part of this card's form to begin with, so the dashboard had nothing valid to send. Fix on both sides: Backend admin_embeddings.go: when kind == "ollama" and the submitted config is empty / {} / null, synthesize the full config from the live runtime-cfg snapshot applied on top of the env defaults via BuildOllamaConfigFromEnv. This is the same path the bootstrap seed in main.go uses, so the resulting ollama provider config is identical to what a fresh install would have. Dashboard EmbeddingProviderSection.tsx: always send {} for ollama-switches and always show the Save button (label switches to "Save & switch to {kind}" when the kind has changed). Skip the /test pre-check for ollama since {} would fail factory validation. The button copy under the ollama hint now explains that switching back will restart the sidecar with the current runtime config + trigger a full reindex per project.

…legacy rows Two coupled bugs were causing every project to show a "stale model" badge forever even after reindex: 1. The indexer wrote a BARE model name to projects.indexed_with_model at FinishIndexing because Service.embeddingModel was set from cfg.EmbeddingModel ONCE at boot. The drift detector and dashboard on the read side were comparing against the live Provider.ID() which is now PREFIXED ("ollama:<model>") after the pluggable- provider refactor — so writes never matched reads, and reindex couldn't clear the badge. 2. Pre-existing rows (projects indexed under the old code path) hold the bare model name. Without backfill they'd stay marked stale even after the indexer fix, because their own row never matches the prefixed live ID until the next reindex. Fixes: Indexer: - Add SetEmbeddingModelLookup(func() string) — binds the indexer to a live function (typically embeddings.Service.EmbeddingModel). - FinishIndexing now writes Service.EmbeddingModel() which prefers the live lookup, so a runtime provider switch (PUT /admin/ embedding-providers/active) takes effect on the next reindex without a process restart. - Static SetEmbeddingModel(string) kept for tests that don't wire a live Service. main.go: wire idx.SetEmbeddingModelLookup(embedSvc.EmbeddingModel). Migration 13 (indexed_with_model_provider_prefix): one-time UPDATE that prepends "ollama:" to every projects.indexed_with_model value that lacks a ":" prefix. Safe because pre-refactor there was no non-ollama backend. Idempotent (rows containing ":" are skipped). Tested: TestMigrate_IndexedWithModelProviderPrefix covers the three expected cases — bare gets prefixed, already-prefixed left alone, NULL left alone.

Voyage's /v1/embeddings caps each request at 120K tokens, separate from the 128-input cap voyage-code-* models enforce. A single large file (e.g. bench/results/reference_embeddings.json at 187K tokens) sat under the input-count limit but blew past the token limit, producing a 400 from the server with the message: "The max allowed tokens per submitted batch is 120000. Your batch has 187609 tokens after truncation." The provider now plans batches against BOTH limits. planBatches walks the input list and closes the current batch whenever either limit would be exceeded: - inputs in batch >= 128 (voyage-code-* model cap), OR - estimated tokens + next text > 100_000 (our target, 17% under Voyage's 120K hard cap to leave headroom for estimation error). Token estimation: bytes ÷ 3, conservative for code (typically 3-4 chars/token) and even safer for UTF-8 multi-byte input (Cyrillic comments etc count as more bytes per character). Always over-counts so we never under-batch into a 400. Why not an SDK: investigated the community Go SDK at github.com/austinfhunter/voyageai — it's a thin HTTP wrapper that does NOT handle token batching, int8 dequant, or retry strategy. Switching would lose our existing int8 support and add a third- party dependency for no real benefit. Observability: structured slog.Info at the split decision logs the batch counts and limits; slog.Debug logs each sub-batch POST with input count + estimated token cost. Operators can grep "voyage: splitting batch" to see batch activity and "voyage: sub-batch POST" to follow individual requests. Tests cover: - planBatches splits a 300K-byte text + smalls into multiple batches respecting the 100K token budget. - 200 small texts still split at 128 + 72 (legacy count behavior). - End-to-end EmbedDocuments produces multiple POSTs when token budget is the binding constraint. Dashboard Throughput card copy updated to mention both limits.

Voyage AI does NOT expose rate limits via API — confirmed by checking https://docs.voyageai.com/docs/rate-limits and the /reference/rate-limits page. All limit data lives in their public docs and web dashboard. So instead of a fake "cache + refresh" mechanism, we ship a hardcoded snapshot from the docs: internal/embeddings/provider/voyage/limits.go KnownModelLimits — per-model Tier 1 RPM / TPM + per-request input and token caps for every Voyage model the dashboard offers. KnownLimitsSource — attribution string with the docs URL and snapshot date ("2026-05-27") so an operator can tell when to consult upstream for an update. GET /api/v1/admin/embedding-providers gains a documented_limits field on the voyage entry. Schema documented in openapi.yaml as DocumentedEmbeddingLimits / DocumentedModelLimit. Dashboard VoyageProviderForm renders a compact card under the model dropdown: - Active model row showing RPM / TPM / inputs-per-req / tokens-per-req - Tier multiplier note (×2 / ×3) since Voyage scales by lifetime spend and we can't detect the operator's tier - <details> expander with all models in the snapshot, current model highlighted - Footnote with the source URL + snapshot date - Button linking to the Voyage dashboard for live tier confirmation No "Refresh" button: it would re-fetch the same hardcoded constant, giving a false impression of "sync my plan from here". A reload after upgrading cix-server gets the new snapshot — the source footnote tells the operator when to do that.

…ottle Replace the hardcoded "documented limits" snapshot card with editable input fields for RPM, TPM, max-inputs-per-request, and max-tokens- per-request. Voyage doesn't expose limits via API (confirmed against docs.voyageai.com/docs/rate-limits) — the operator copies their tier's numbers from the Voyage dashboard, the indexer throttles itself accordingly. Backend (provider/voyage): Config gains four new fields, all optional: RateLimitRPM — caps requests/minute via token-bucket RateLimitTPM — caps tokens/minute via token-bucket MaxInputsPerRequest — overrides the static 128 default MaxTokensPerRequest — overrides the static 100K default Two rate.Limiter instances are wired into Provider when the corresponding RPM/TPM > 0; embed() waits on both before each POST (ctx-aware — server shutdown drains cleanly). nil limiters short- circuit when the operator leaves the fields empty, matching prior behaviour (only react to upstream 429/400). planBatches takes maxInputs+maxTokens args instead of consts, so the per-request caps come from cfg. Backend (admin_embeddings, openapi.yaml, limits.go): Drop the documented_limits payload + voyage.KnownModelLimits table — the dashboard no longer renders a "cached snapshot", values come straight from the operator instead. Less code, fewer things to keep in sync. Frontend: VoyageProviderForm: replace the read-only "Documented limits" card with a 2x2 grid of number inputs (RPM, TPM, max-inputs, max-tokens). Hint paragraph above the grid still mentions the public-docs baseline so the operator has a starting point, plus a link to the Voyage dashboard. Empty = no client-side enforcement (per-field). EmbeddingProviderSection: drop the documentedLimits prop + useMemo; populate the four new VoyageConfig fields from the persisted blob when loading the active provider. Tests: - TestRateLimitRPMThrottlesRequests: 120 RPM with two sequential EmbedQuery calls — second waits ~500ms on the limiter. - TestRateLimitTPMThrottlesTokens: 600K TPM with two ~60K-token POSTs — second waits ~2s for the bucket to refill. - planBatches tests updated to pass explicit caps.

Two gaps in the prior token-aware planBatches were leaking real 400s to the indexer: 1. Our estimator (bytes/3) is conservative for prose but can under-count for dense code where 1 token ≈ 2 bytes. estimated 100K could become actual 150K → POST → 400. 2. planBatches puts a single input larger than the cap in its own batch (no choice — splitting the text would corrupt the chunk). That POST would 400 unconditionally. Both surfaced as the user-reported error: "The max allowed tokens per submitted batch is 120000. Your batch has 187609 tokens after truncation." Fix: a new embedWithAdaptiveSplit wraps each embed() call. On a 400 whose message matches voyageBatchTooLargeRegex, it bisects the batch in half and retries both halves — recursively, so a real tokenizer miscalculation that's ~2x our estimate self-corrects in a single extra round-trip. EmbedQuery and EmbedDocuments both go through this path now. Single-chunk edge: when the offending batch is already 1 input, there is nothing to bisect. The provider returns a clear error pointing the operator at the chunker upstream: "voyage: a single chunk produced 187609 tokens (cap 120000). Reduce the indexer's max chunk size or switch to a model with a higher per-request cap" This is the right place to fail — silently dropping a chunk would hurt search quality without telling anyone; truncating it would change vector semantics; retrying forever would hang the indexer. parseBatchTooLarge extracts (cap, actual) from Voyage's stable error message so logs surface both numbers, letting an operator diagnose persistent estimator drift (e.g. "real tokens always 1.5x our estimate → tighten MaxTokensPerRequest in the form"). Tests: - EmbedDocuments with 4 inputs and a stub that rejects any batch >1 input ends up running 4 singleton POSTs after bisection. - Single oversize input returns the clean error. - parseBatchTooLarge covers happy path + non-match cases (429, different 400 reasons).

The "Truncate over-length inputs server-side" switch maps to Voyage's truncation API parameter. It only affects what happens when a SINGLE chunk exceeds the model's per-input context window (e.g. 32K tokens for voyage-code-3) — totally separate from the 120K-tokens-per-batch cap our adaptive bisect already handles. Before: bare label, no hint, easy to confuse with the batch cap. Now: a paragraph under the switch explains the trade-off (silent truncation vs explicit 400) and explicitly notes the separation from the batch limit.

Voyage's truncation:true silently drops content past the model's per-input context window (32K tokens for voyage-code-3). For long chunks that means the tail never makes it into the embedding — search quality suffers without any signal at index time. The ollama provider already has a sliding-window pipeline for the same problem (TokenizeAndEmbed: tokenize → split at CLS+SEP boundaries → embed all windows → average vectors). Mirror that for voyage, byte-aligned instead of token-aligned since Voyage has no public /tokenize endpoint. Added: - Config.MaxInputBytes (default 30_000) — caps per-input byte length. Sized so worst-case 1 byte/token dense code still fits voyage-code-3's 32K window with margin. - splitOversizeInput(text, maxBytes) — slices at rune-aligned boundaries so we never cut a UTF-8 multi-byte sequence in half. - embedAndAverage(ctx, texts, inputType) — three-phase pipeline: Phase 1: expand each oversize input into N windows; remember spans (which window indices map to which original). Phase 2: planBatches + adaptive-bisect POST chain on the expanded slice (existing logic, unchanged). Phase 3: average sub-window vectors back to one per original input. Single-window inputs pass through unchanged (fast path, no allocations). - EmbedQuery and EmbedDocuments both go through embedAndAverage now. Tests: - splitOversizeInput: under-cap singleton, ASCII over-cap 3-way split with byte-perfect rejoin, UTF-8 Cyrillic split that keeps every window valid UTF-8. - EmbedDocuments_AveragesOversizeInputWindows: stub returns [1.0, 2.0, 3.0] for the 3 windows of a 250-byte input under MaxInputBytes=100; provider must return averaged [2.0]. - Existing TPM throttling test now sets MaxInputBytes=1M to keep its single 180K-byte input from being auto-split (that test exercises the rate-limit bucket, not the window-split).

Per Voyage's own docs (https://docs.voyageai.com/docs/tokenization): "one word roughly corresponds to 1.2 - 1.5 tokens on average; for rough estimates divide character count by 5" Our previous bytesPerToken=3 over-counted for everything except the densest code, causing planBatches to split into more sub-batches than necessary. Switching to bytesPerToken=5 matches Voyage's published rule of thumb and reduces spurious splits on ordinary content. The trade-off: dense code can run hotter (1 token ≈ 2-3 bytes empirically), so our estimate may UNDER-count for very tight code and let planBatches pack inputs tighter than the real cap permits. That's fine — embedWithAdaptiveSplit detects the resulting 400 "max tokens per batch" response and bisects the batch in half. We trade a few extra round-trips on the worst-case dense files for materially fewer splits on the common case. Better long-term path is to load Voyage's real HF tokenizer (publicly available at huggingface.co/voyageai/voyage-code-3 as a standard tokenizer.json), but that requires a CGO Rust dep (daulet/tokenizers) — deferred to a follow-up that justifies the Docker / distroless work. Tests: - TestPlanBatches_SplitsByTokenBudget: bumped big text from 500K → 600K bytes so it exceeds the 100K-token cap at the new divisor. - TestEmbedDocuments_SplitsByTokenBudget: bumped from 300K → 500K and disabled per-input window split (MaxInputBytes=1M) to keep the test focused on the batch-token cap. - TestRateLimitTPMThrottlesTokens: bumped 180K → 300K so the per-call cost still hits ~60K tokens at the new divisor.

The /5 heuristic comes from Voyage's English-prose docs, but cix is primarily a code-indexer and dense source code is much denser per token. Production logs against voyage-code-3 show ~1.4 bytes/token on tight Go files — a 64-input batch estimating 51K tokens actually charged 187K, triggering Voyage's 120K hard-cap 400 and forcing the bisect safety net to fire on every batch even with a correctly configured rate-limits section. bytesPerToken=2 catches code reality with margin and over-estimates prose by ~2.5x (more round-trips, never a 400). The embedWithAdaptiveSplit bisect stays as a residual safety net. Tests adjusted for the new divisor (240K/200K/120K byte text sizes where the original called for 600K/500K/300K under /5). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… vector namespace The storage layout derived both the SQLite path and the chroma dir from the embedding model name (a Python-era artefact ported 1:1). Safe only while the model was a fixed env var; runtime provider switching turned it into a critical bug — switching to a higher-dim provider (e.g. voyage 2048) wrote vectors into the previous provider's collection, breaking search with "vectors must have the same length" — and a footgun where a model change spawned a parallel system DB with empty accounts. Two-plane design: - System DB is ONE permanent, model-INDEPENDENT file at cfg.SQLitePath (accounts + catalog + parsed code). The parsed-code tables are model-independent; vector search never reads them. - The chroma vector store is namespaced by the ACTIVE provider identity via provider.StorageSlug(Provider.ID()), so vectors of different dimensions never share a collection. Switching back reuses the prior namespace without a reindex. Changes: - provider.StorageSlug: deterministic, idempotent filesystem slug from a Provider.ID() fingerprint. - config: drop DynamicSQLitePath/DynamicChromaPersistDir; add ChromaDirForSlug; keep ModelSafeName + LegacyDynamicSQLitePath solely for the one-time adoption migration. - internal/storage (new): boot-time, idempotent file migrations — AdoptLegacyModelDB (checkpoint-then-rename the legacy per-model DB onto the canonical path; fossil moved aside to *.pre-unify-*) and PrefixLegacyChromaDirs (rename pre-unification ollama dirs to the unified scheme; canonicalises the legacy suffix through StorageSlug so model names with '.'/':' still match the path the server opens — no silent orphaning). - db.HasTables: helper to distinguish a real unified DB from a fossil. - vectorstore.Holder: concurrency-safe swappable *Store wrapper + Interface satisfied by both *Store and *Holder; indexer/httpapi/repojobs hold the Interface. - embeddings.Service.SwitchProvider now reopens the vector store under the new identity slug and atomically swaps it into the Holder (live, no restart); reopen failure keeps the old store and surfaces a loud error. - main.go/server.go: orchestrate boot migrations, Holder wiring, and identity-namespaced storage display. One-shot migration shims (ModelSafeName, LegacyDynamicSQLitePath, AdoptLegacyModelDB, PrefixLegacyChromaDirs and their call sites) are marked "LEGACY-MIGRATION (remove next release)" for deletion once all deployments have booted on the unified layout. Tests: StorageSlug (incl. special-char canonicalisation), dbmigrate (adopt/fossil-aside/WAL-drain/idempotency), chromamigrate (strict normalize/no-clobber/idempotency), Holder (-race swap-under-read), live reopen (swap/failure-keeps-old/unwired-noop). go build/vet/test ./... green; -race on vectorstore green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ze base URL Three robustness fixes surfaced by pre-PR review: - H2: validate each returned vector's length against the configured output_dimension (when >0) and reject a mismatch loudly. A model that silently ignores output_dimension (no Matryoshka support / typo'd model) would otherwise write wrong-width vectors deep into the store with no attribution. Also guard the oversize-input averaging path against windows of differing width, which would panic with index-out-of-range; it now returns a clean error. - M4: accept int8 embeddings returned as a base64-packed byte string (some OpenAI-compatible proxies) in addition to the default JSON int array, so a proxy swap doesn't fail the whole batch. - L4: trim a trailing slash from base_url in New() so url building never produces a double slash that stricter proxies can 404 on. Tests: wrong-dimension rejection, inconsistent-window-dim error, int8 base64 dequant. Adjusted an existing input_type test whose 2-dim mock response no longer matched its 1024 configured dimension. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…gacy models - L4: trim a trailing slash from the openai provider's base_url in New() so url building never emits a double slash that stricter OpenAI-compatible servers (vLLM/TEI behind a proxy) 404 on. Mirrors the same fix in the voyage provider. - L5: migration 13 (indexed_with_model provider-prefix backfill) tested for the mere presence of a ':' to decide a row was already prefixed. A legacy Ollama-style model name like "nomic-embed-text:latest" contains a colon but is NOT prefixed, so it was wrongly skipped and left flagged "stale model" forever. Now tests for a known kind prefix (ollama:/openai:/voyage:) instead. NOTE: editing an unreleased migration in place — deployments that already applied v13 with the old logic won't reprocess, but the default model carries no colon so no shipped DB is affected. Tests: openai trailing-slash path normalization; migration-13 colon- bearing legacy row now gets the ollama: prefix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…03 gating tests - M1: mustBeAdmin returns a nil authContext when CIX_AUTH_DISABLED=true (a legitimate deployment mode), and SwitchEmbeddingProvider then dereferenced user.User.ID for the audit field → handler-goroutine nil-panic. Resolve the actor through an `if user != nil` guard, matching the ac != nil pattern used elsewhere (tunnels.go). - L2: add per-endpoint 403 gating tests for the three admin embedding-provider routes that lacked them (GET/PUT active, POST test), satisfying the project rule that a non-admin caller must be rejected on every new endpoint. Auth was already enforced (all four handlers gate on mustBeAdmin first); these tests pin it against regressions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…cycle, supervisor Three concurrency hazards surfaced by pre-PR review: - H1: Restart reassigned the s.queue pointer with no lock while Embed* callers read it (acquire + defer-release) — an unsynchronized pointer read/write (data race) reachable via the dashboard "Save & Restart" during an active reindex. The queue pointer is now guarded by s.mu: acquireProvider snapshots the queue and returns it so Release targets the SAME instance the slot was taken from, Restart writes the new queue under the lock, and Status reads via a snapshot. - M2: SwitchProvider and Restart had no mutual exclusion, so two admin actions could interleave their s.current / s.queue mutations or both tear down a provider. A new lifecycleMu serializes the two (Embed* paths don't take it). - M3: supervisor.Stop read s.waiterDone bare while a crash-driven spawn() reassigns it under s.mu — data race, far more reachable now that SwitchProvider can Stop a provider at runtime. Stop now snapshots cmd + waiterDone together under the read lock. Test: TestRestart_ConcurrentWithEmbeds_NoRace exercises the queue-swap race under -race (6 embedders vs a restarter alternating the cap to force pointer swaps). go test -race ./internal/embeddings/... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gofmt the two ollama-package files whose field/map alignment drifted when they were extracted/moved into provider/ollama on this branch (gguf.go GGUFInputs struct, prefix.go QueryPrefixes map). Pure formatting, no behavior change. Pre-existing gofmt drift in files this branch only incidentally touches (e.g. vectorstore/store.go) is left alone to keep the diff focused. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The dashboard 404'd ("project not found: hash=…") on local projects whose host_path and stored path_hash legitimately diverge — e.g. a project keyed as sha1("local:{machine}:{path}") while host_path stays the bare filesystem path. The list/detail responses set PathHash by RE-deriving sha1(host_path), which for those rows yields a hash that GetByHash (which resolves against the stored path_hash column) never matches → 404. host_path can't simply be rewritten to the namespaced key form: it is the vector-store collection key (md5(host_path)) and the chunks/symbols key, so changing it would orphan the existing index and force a reindex. Fix: carry the stored path_hash column through projects.Project and return it verbatim in both project responses (server.go ProjectSummary + Project), instead of recomputing from host_path. The stored column is the single source of truth the lookup already uses, so link hash == lookup hash for local, external, and freshly-created projects alike. No data migration, no reindex. Pre-existing issue (per-machine namespacing + an incomplete manual re-key), independent of the embedding-provider work; bundled here per request. Test: a row whose stored path_hash differs from sha1(host_path) — Get/List surface the stored hash and GetByHash round-trips it back to the project. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the hand-rolled Sscanf-switch + printf-table for ~/.cix/config.yaml with a tag-driven schema (`key`/`desc`/`default`/`validate`/`env`/`sensitive` struct tags) that drives Load, Save, `config show`, `config set`, and a new full-screen editor — one source of truth, no more drift across three files. Loader: koanf/v2 (file+yaml + confmap defaults from tags + rawbytes legacy- key normalization). Validator: go-playground/validator/v10, runs on every mutation. Setter: reflective SetByPath with type-aware parsing and rollback on validation failure. Show/keys: reflect over the schema walker — sensitive leaves never bind their value to a named variable (CodeQL-safe). Env overrides: CIX_SERVER / CIX_API_URL / CIX_API_KEY override server selection / URL / key with flag > env > file precedence. Local-only — never persisted. Designed for CI runners passing secrets via env. TUI: new lazygit-style `cix config edit` and `cix config init` built on bubbletea + bubbles + lipgloss — two-panel layout, vim-friendly nav (h/j/k/l), inline textinput edits, server CRUD (a/d/m/t), help overlay. Replaces the earlier huh-based form (which felt like a one-shot wizard, not an editor). On-disk format: legacy `api:` block still migrates to `servers:`. Legacy lowercase viper keys (debouncems, excludepatterns, cachettl, autowatch, batchsize) still normalize. The dead `server.port` / `server.cache_ttl` block is dropped on next save — old files load without error. New: `cix config keys` lists every settable key with default / env / description from the schema. `doc/CLI_CONFIG.md` documents the full key table. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add a "Servers" subsection to both copies of cix SKILL.md (skills/cix/ and plugins/cix/skills/cix/) and a "Targeting multiple cix servers" block to plugins/cix/README.md. Explains the named-server / default_server model, the --server flag, and the legacy api.url/api.key aliases, so agents using the plugin know to use the default and only switch via --server when the user names one. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…oot resilience, gate Addresses the full code review of feat/embedding-providers. Blockers - parity_test.go called the deleted Service.embedRaw, so `make test-gate` (the release embedding-parity gate) failed to compile. Restore via a build-tagged helper that reaches the live provider's EmbedRaw. - SwitchProvider swapped the provider and the vector store as two non-atomic ops with the queue resumed in between, so a concurrent embed could write new-dimension vectors into the old provider's collection; on reopen failure it left a live new-provider / old-store pairing with no rollback. Now the queue stays blocked across both swaps (resumed via defer on every path) and a reopen failure rolls the provider swap back and stops the half-started provider — fail closed, never corrupt. - Boot no longer hard-fails the whole server when the persisted provider's Start() (remote connect-test / sidecar spawn) fails. A bad build falls back to env ollama; a failed Start attaches the provider in a degraded, self-healing state. The HTTP server (dashboard/auth/all projects) stays up so an operator can recover instead of editing SQLite by hand. Medium - SwitchEmbeddingProvider now validates-then-persists: SwitchProvider runs first and only a successful swap is saved, so a transient switch failure can't leave the DB pointing at a provider that bricks the next boot. - PrefixLegacyChromaDirs failure at boot is now fatal (was warn+continue), which silently orphaned the index behind a fresh empty namespace. Added vectorstore.WarnIfNamespaceOrphaned as a loud diagnostic when the active namespace is empty but sibling namespaces exist. - GetSidecarStatus reported ready=false for healthy openai/voyage providers (StateRemote != "running"); treat StateRunning and StateRemote as ready. - Restart now blocks the freshly-built queue during a concurrency-cap change so embeds get a clean 503 during the sidecar respawn instead of ErrSupervisor from a transiently-nil provider. Minor / cleanup - ollama TokenizeAndEmbed clamps windowSize to >= 1, guarding a CtxSize<=2 misconfig that spun the split loop forever while holding a queue slot. - Extract shared HTTP-provider plumbing (base-URL normalize, API-key resolve, Ready/Status) into provider/httpremote.go; openai and voyage now delegate instead of copy-pasting. Tests: switch rollback + happy-path, namespace-orphan diagnostic. Full suite, `go vet` (incl. embed_gate), and `-race` on embeddings + httpapi all green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ixes #8 class) The flat single-slug scheme `chroma_<StorageSlug(id)>` flattened provider+model into one string with `_` as the field separator — but `_` is also what model-name normalization emits. That in-band separator made the legacy-migration heuristic (`hasKnownPrefix`) ambiguous: a legacy ollama model whose name normalized to `ollama_…`/`openai_…`/`voyage_…` (e.g. `openai-community/x`) was misread as already-unified, skipped, and its vectors silently orphaned behind a fresh empty namespace. Worse, `ollama:foo` and a model literally named `ollama-foo` both produced the exact same string — unfixable by any name parsing. Make the separator structural instead: namespace the vector store as a nested directory tree, one path segment per identity field: <ChromaPersistDir>/<kind>/<model-slug>[/<variant>] chroma/ollama/nomic_embed_text chroma/voyage/voyage_code_3/2048_int8 chroma/openai/text_embedding_3_large/256 The provider kind is now its own directory level, derived from the kind (not glued onto the model slug), so it can never collide with a model name. `ollama:foo` → ollama/foo, `ollama:ollama-foo` → ollama/ollama_foo — distinct. The #8 case resolves correctly: openai-community/x → ollama/openai_community_x. Changes - provider.Provider gains StorageComponents() []string — each provider builds its path from its OWN structured fields (never by re-parsing the flattened ID), so the in-band-separator problem can't recur. - config: ChromaDirForSlug(slug) → ChromaDirFor(components) (filepath.Join under the container). - Service: StorageSlug() → StoragePath() []string; reopen + AttachVectorStore thread components. - storage: PrefixLegacyChromaDirs (+ hasKnownPrefix prefix-guessing) → MigrateFlatChromaToNested. Now unambiguous: every flat `chroma_*` sibling is a legacy ollama dir (the old build was ollama-only) and moves to chroma/ollama/<StorageSlug(suffix)>/. StorageSlug(ModelSafeName(m)) == StorageSlug(m), so it lands where the server resolves the identity — even for kind-looking model names. Existing main/prod ollama vectors migrate without a reindex. main.go relocates a legacy Python store at the container path first, then migrates (fail-closed). - Drop vectorstore.WarnIfNamespaceOrphaned — the orphan ambiguity it warned about is gone, so the diagnostic is no longer needed. - Drop the throwaway intermediate flat-unified scheme entirely (dev-branch only; no back-compat). Tests: per-provider StorageComponents incl. the ollama foo vs ollama-foo anti-collision; migration moves flat→nested incl. the kind-looking-name (#8) regression, special-char normalization, idempotency, no-clobber, python-backup skip; switch/reopen tests assert nested paths. Full suite, `go vet` (incl. embed_gate), and `-race` all green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… tests Pre-merge review fixes for the pluggable-provider work: - httpapi: reject kind=ollama on POST /admin/embedding-providers/{kind}/test. Start()ing a throw-away ollama provider would spawn a second llama-server child on the live sidecar's socket. The dashboard never tests ollama (it is configured via the runtime-config form), so fail fast with 400 instead of risking a competing sidecar from an ad-hoc admin call. Adds a gating test. - embeddingscfg: add the previously-missing test suite for the provider persistence layer boot reads to pick the active provider — Get on a fresh DB, Save insert + update round-trips, empty-config, and the empty-kind / invalid-JSON rejections. go vet ./... · go build ./... · go test ./internal/{embeddingscfg,httpapi}/... all pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dvcdsys · 2026-06-02T14:53:07Z

Code review summary

Reviewed focusing on the high-risk areas: provider abstraction, switch atomicity, concurrency, storage migrations, and endpoint authorization. The CLI config/TUI (cli/) and the React dashboard were only skimmed.

Verified locally on feat/embedding-providers: go build ./... ✅ · go vet ./internal/{embeddings,storage,httpapi,vectorstore}/... ✅ · targeted go test (storage / vectorstore / providers / httpapi / embeddingscfg) ✅ · go test -race ./internal/embeddings -run 'Switch|Restart|Provider' ✅ (0 races) · no stale Dynamic* path-helper refs in live code.

Strengths

Auth gating meets the project rule — all 4 admin endpoints gate on mustBeAdmin, with per-endpoint 403 tests for a viewer.
Concurrency is careful and race-tested — lifecycleMu serializes switch/restart; the queue stays blocked across both the provider swap and the vector-store reopen, closing the wrong-dimension-write window.
Switch is atomic with rollback; a failed reopen rolls s.current back and stops the half-started provider (fail-closed).
Boot resilience — a failing remote connect-test degrades instead of bricking boot.
Secrets — only the env-var name is persisted; the key is read live.
Migrations are fail-closed, idempotent, with the kind-as-its-own-path-segment anti-collision design and regression tests.

Fixed pre-merge (commit `2a9e848`)

/admin/embedding-providers/{kind}/test now rejects kind=ollama (400). Start()ing a throw-away ollama provider would spawn a second llama-server child on the live sidecar's socket. The dashboard configures ollama via the runtime-config form, not /test. + gating test.
Added the missing embeddingscfg test suite — the persistence layer boot reads to choose the active provider was previously untested. Covers Get/Save round-trips (insert + update), empty config, and empty-kind / invalid-JSON rejection.

Non-blocking follow-ups (optional)

embeddingscfg.Save is UPDATE-then-INSERT, not an atomic upsert. Two concurrent writers could both miss the UPDATE and race the INSERT (one fails on PK conflict). Non-corrupting and rare (admin actions); INSERT … ON CONFLICT(id) DO UPDATE would be race-free. (Confirmed runtimecfg writes only its own columns, so no embedding-column clobber.)
Degraded-boot inconsistency: if the persisted provider config is malformed, boot falls back to embeddings.New, which starts ollama and can still hard-fail boot (e.g. remote-only box with no GGUF) — slightly at odds with the "never brick boot" goal.
voyage.embedAndAverage: totalSplits is misnamed — it counts windows from oversize inputs (incremented by len(windows)), logged as split_windows. Functionally a >0 gate; cosmetic.
Averaged window vectors aren't re-normalized (voyage + ollama). Fine for cosine search (chromem normalizes); worth confirming no dot-product path relies on unit norm. Pre-existing pattern.
Holder.Swap discards the old *Store without Close — relies on chromem-go holding no file handles/goroutines (true today). A defensive Close-if-io.Closer would future-proof.
Scope: this PR also bundles the CLI config schema + TUI (d9742e8, ~20 files) and the projects path_hash fix (97fd9c4), which are unrelated to embedding providers and make review/bisect harder. Consider splitting such work next time.

Overall: solid, no blockers. 👍

CodeQL go/incorrect-integer-conversion (high) flagged parseDefaultValue: strconv.ParseInt(raw, 10, 64) yields an int64 that int(v) narrows, silently truncating on 32-bit builds. Parse with bitSize 0 (platform int width) instead, so the value is bounded to int range before conversion and an out-of-range tag errors (ok=false) rather than wrapping. Sibling parser in set.go is unaffected — it uses reflect.Value.SetInt with an explicit OverflowInt guard and no narrowing conversion. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dvcdsys and others added 29 commits May 26, 2026 13:28

github-advanced-security AI found potential problems May 31, 2026

View reviewed changes

Comment thread cli/internal/config/loader_koanf.go Fixed

dvcdsys and others added 4 commits June 2, 2026 10:58

dvcdsys merged commit efc6e56 into main Jun 2, 2026
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pluggable embedding providers (ollama / openai / voyage) + storage-path unification#57

Pluggable embedding providers (ollama / openai / voyage) + storage-path unification#57
dvcdsys merged 34 commits into
mainfrom
feat/embedding-providers

dvcdsys commented May 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

dvcdsys commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dvcdsys commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Pre-PR review fixes (last 5 commits)

Update — post-review fixes (commits f5d6e22, 668ad29)

Correctness / resilience (f5d6e22)

Nested per-provider vector namespacing (668ad29)

Testing

Known deferred (non-blocking)

Type of change

Checklist

Uh oh!

Uh oh!

dvcdsys commented Jun 2, 2026

Code review summary

Strengths

Fixed pre-merge (commit 2a9e848)

Non-blocking follow-ups (optional)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dvcdsys commented May 28, 2026 •

edited

Loading

Update — post-review fixes (commits `f5d6e22`, `668ad29`)

Correctness / resilience (`f5d6e22`)

Nested per-provider vector namespacing (`668ad29`)

Fixed pre-merge (commit `2a9e848`)