Pluggable embedding providers (ollama / openai / voyage) + storage-path unification#57
Merged
Conversation
Add provider.Provider interface and a Factory/registry under
internal/embeddings/provider. Move the existing llama-server sidecar
into provider/ollama as the first implementation. Add provider/openai
(generic /v1/embeddings) and provider/voyage (api.voyageai.com with
input_type, output_dimension, and int8 dequantization). Rewrite
embeddings.Service to delegate to the active Provider while keeping
the queue/backpressure machinery centralized.
EmbeddingModel() now returns Provider.ID() — a fingerprint of the form
"{kind}:{model}[:{dim}][:{dtype}]" — so the existing per-project
drift check in repojobs transparently treats a provider switch as a
model change and forces a full reindex on the next clone job.
Behaviour unchanged for default ollama deployments; the openai and
voyage providers are registered but not yet selectable through any
admin surface. Tests are green.
Add migration 12 to extend runtime_settings with two columns:
embedding_provider (kind selector) and embedding_provider_config
(per-provider JSON blob). Idempotent ALTER TABLE; API keys are not
persisted — providers read them live from env vars named in the
blob.
Add internal/embeddingscfg, a thin persistence layer over the two
new columns. Wire it into main.go's boot path: read the persisted
selection, seed the row from env-derived ollama defaults on fresh
installs, then hand the resolved Provider to embeddings.Service via
the new NewWithProvider constructor.
Add embeddings.Service.SwitchProvider — build new + Start + drain
queue + atomic swap + background Stop of the old. Used by the new
admin endpoints:
GET /api/v1/admin/embedding-providers list kinds + schemas + env-key readiness
GET /api/v1/admin/embedding-providers/active current kind + persisted config
PUT /api/v1/admin/embedding-providers/active atomic switch
POST /api/v1/admin/embedding-providers/{kind}/test pre-save sanity check
Extend /api/v1/status with embedding_provider and
embedding_provider_manages_process so the dashboard footer can
render the provider name as the label and a permanent green dot
for HTTP-only providers that have no managed process to die.
Behaviour unchanged for existing ollama installs.
Provider-level: stub httptest.Server-based tests for the openai and voyage clients. Cover request shape, response ordering, HTTP-error surfacing, missing-API-key behavior, ID() fingerprinting, Voyage input_type query/document selection, and the int8 dequantize path (127 → ~1.0, -127 → ~-1.0, etc). Admin handler: gating tests for the new endpoints — admin sees all three kinds in the list with non-empty schemas, viewer gets 403, unknown kind is rejected, /test with a missing API key returns 400.
Add EmbeddingProviderSection with a kind dropdown and per-provider
forms. Three hardcoded React components (OpenAIProviderForm,
VoyageProviderForm, ollama tuning kept in the existing sections)
render below the dropdown. Voyage form covers model picker, output
dimension (256/512/1024/2048), output dtype (float | int8), and
truncation toggle. Each form surfaces a red banner when the
referenced API-key env var is not set on the server and disables
Save accordingly — keys themselves are never stored in the
dashboard or DB.
New hooks: useEmbeddingProviders, useActiveProvider, useTestProvider,
useSwitchProvider. The save flow is test → switch → toast → invalidate
runtime-model + sidecar-status caches so the footer and project cards
update immediately.
Footer: label is now the active provider kind ("ollama"/"openai"/
"voyage") instead of hardcoded "llama". Dot is green/red for ollama
(real liveness signal from /health), permanently green for HTTP-only
providers (no managed process to die — failures surface at search
time with diagnostics).
ServerPage: ollama-specific sections (EmbeddingModelSection,
RuntimeParamsSection, SidecarSection) render only when the active
provider is ollama. Otherwise the provider form above is the only
edit surface.
openapi.yaml: extend StatusResponse with embedding_provider +
embedding_provider_manages_process; add the four new
/admin/embedding-providers endpoints and their schemas.
Add commented examples for CIX_OPENAI_API_KEY and CIX_VOYAGE_API_KEY to both compose files. Default behavior is unchanged — first boot seeds the ollama provider from the existing CIX_EMBEDDING_MODEL + CIX_LLAMA_* vars. Switching to a remote provider is a dashboard action; the env var only needs to be exported when the operator intends to use OpenAI-compatible or Voyage backends.
The previous gate covered EmbeddingModelSection, RuntimeParamsSection, and SidecarSection but left AdvancedSection (batch size / concurrency) visible — and also exposed the ollama-only Save & Restart header button regardless of provider. Switch to /status as the single source of truth for the live active kind (already polled by the footer; no extra request) and gate AdvancedSection plus the Save & Restart button on it. Update the header copy to explain what's editable per provider type.
The Service-level embedding queue depth (max_embedding_concurrency)
applies to every provider — it caps how many parallel
/v1/embeddings POSTs the server runs, which OpenAI and Voyage both
accept natively (subject to their account-level rate limits). The
previous UI hid it behind the ollama-only block.
Backend Service.Restart is now provider-aware. Submitting a runtime
config change with a remote provider active only rebuilds the
queue; the (HTTP-only) provider stays in place. The previous code
unconditionally called buildOllamaFromConfig, which would have
silently re-spawned llama-server on top of a live voyage/openai
provider — that bug is fixed here.
Dashboard rearrangement:
- AdvancedSection renamed to Throughput in the UI, always visible.
Receives isOllama: when false the llama-batch field is hidden;
concurrency is shown regardless.
- Save & Restart button on ServerPage stays visible for all
providers; label switches to "Save" when the active provider
has no managed sidecar to restart.
- Header copy updated to explain the new layout.
The indexer already sends all chunks of one file as a batched POST (input: [chunk1, chunk2, ...]); concurrency caps how many such batched POSTs run in parallel. The previous copy only mentioned "parallel" without explaining that batching is the inner loop — caused confusion when picking values. Also call out the unsplit Voyage limits (voyage-code-3 = 128 inputs per request) so operators know to keep files under that threshold until the per-provider auto-split lands.
Voyage caps inputs per /v1/embeddings POST at 128 for voyage-code-* models (1000 for voyage-3*). OpenAI proper caps at 2048. The previous implementation forwarded the caller's slice as-is, so a single file with 200 chunks would 422 against voyage-code-3. Both HTTP providers now slice oversized inputs into sequential sub-batches inside EmbedDocuments under the same Service queue slot (no extra concurrency consumed) and concatenate results in input order. The split is transparent to callers — the indexer keeps sending one batched POST per file and providers handle the rest. Tests assert two POSTs for a 200-item Voyage call (128 + 72) and a 3000-item OpenAI call (2048 + 952). Dashboard copy updated: throughput card no longer warns about unsplit limits since the safety net is in place.
Add an informational banner to the Voyage form pointing operators to the dashboard billing page and explaining the free-tier 3 RPM / 10K TPM ceiling — small enough for a smoke test but the indexer will saturate it on any real repo. Suggest concurrency=1 as the workaround for free-tier users. Add a one-paragraph note to the OpenAI form about the current no-retry policy: 429s are surfaced as-is; users should lower concurrency or upgrade their account tier. Self-hosted servers typically don't need this. Decision (per discussion): no client-side retry / token-bucket implementation. Industry tools like LangChain do retry, but on hard tier limits (3 RPM) retry buys very little — files just fail into the next clone job which retries idempotently. Keeping the provider HTTP layer simple was preferred.
Surface the per-provider cost expectation before the admin picks anything. The callout sits inside EmbeddingProviderSection and is visible regardless of which provider is currently active or selected in the dropdown: - Ollama: free, local. - OpenAI-compatible: paid on api.openai.com, free on self-hosted. - Voyage: paid plan strongly recommended (free tier 3 RPM = unusable). Dropdown option labels also annotated with (free) / (paid) so the expectation is visible even when the callout is scrolled past.
The EmbeddingProviderSection dropdown allowed picking "ollama" while
a remote provider was active, but the Save button was conditionally
hidden (draftKind !== 'ollama') so the switch never actually fired.
Even if it had, the backend SwitchEmbeddingProvider rejected empty
or {} configs — and the ollama-specific fields (GGUF model, ctx,
GPU layers, sidecar paths) aren't part of this card's form to begin
with, so the dashboard had nothing valid to send.
Fix on both sides:
Backend admin_embeddings.go: when kind == "ollama" and the
submitted config is empty / {} / null, synthesize the full config
from the live runtime-cfg snapshot applied on top of the env
defaults via BuildOllamaConfigFromEnv. This is the same path the
bootstrap seed in main.go uses, so the resulting ollama provider
config is identical to what a fresh install would have.
Dashboard EmbeddingProviderSection.tsx: always send {} for
ollama-switches and always show the Save button (label switches
to "Save & switch to {kind}" when the kind has changed). Skip
the /test pre-check for ollama since {} would fail factory
validation. The button copy under the ollama hint now explains
that switching back will restart the sidecar with the current
runtime config + trigger a full reindex per project.
…legacy rows
Two coupled bugs were causing every project to show a "stale model"
badge forever even after reindex:
1. The indexer wrote a BARE model name to projects.indexed_with_model
at FinishIndexing because Service.embeddingModel was set from
cfg.EmbeddingModel ONCE at boot. The drift detector and dashboard
on the read side were comparing against the live Provider.ID()
which is now PREFIXED ("ollama:<model>") after the pluggable-
provider refactor — so writes never matched reads, and reindex
couldn't clear the badge.
2. Pre-existing rows (projects indexed under the old code path) hold
the bare model name. Without backfill they'd stay marked stale
even after the indexer fix, because their own row never matches
the prefixed live ID until the next reindex.
Fixes:
Indexer:
- Add SetEmbeddingModelLookup(func() string) — binds the indexer
to a live function (typically embeddings.Service.EmbeddingModel).
- FinishIndexing now writes Service.EmbeddingModel() which prefers
the live lookup, so a runtime provider switch (PUT /admin/
embedding-providers/active) takes effect on the next reindex
without a process restart.
- Static SetEmbeddingModel(string) kept for tests that don't wire
a live Service.
main.go: wire idx.SetEmbeddingModelLookup(embedSvc.EmbeddingModel).
Migration 13 (indexed_with_model_provider_prefix): one-time UPDATE
that prepends "ollama:" to every projects.indexed_with_model value
that lacks a ":" prefix. Safe because pre-refactor there was no
non-ollama backend. Idempotent (rows containing ":" are skipped).
Tested: TestMigrate_IndexedWithModelProviderPrefix covers the three
expected cases — bare gets prefixed, already-prefixed left alone,
NULL left alone.
Voyage's /v1/embeddings caps each request at 120K tokens, separate
from the 128-input cap voyage-code-* models enforce. A single large
file (e.g. bench/results/reference_embeddings.json at 187K tokens)
sat under the input-count limit but blew past the token limit,
producing a 400 from the server with the message:
"The max allowed tokens per submitted batch is 120000. Your batch
has 187609 tokens after truncation."
The provider now plans batches against BOTH limits. planBatches
walks the input list and closes the current batch whenever either
limit would be exceeded:
- inputs in batch >= 128 (voyage-code-* model cap), OR
- estimated tokens + next text > 100_000 (our target, 17% under
Voyage's 120K hard cap to leave headroom for estimation error).
Token estimation: bytes ÷ 3, conservative for code (typically 3-4
chars/token) and even safer for UTF-8 multi-byte input (Cyrillic
comments etc count as more bytes per character). Always over-counts
so we never under-batch into a 400.
Why not an SDK: investigated the community Go SDK at
github.com/austinfhunter/voyageai — it's a thin HTTP wrapper that
does NOT handle token batching, int8 dequant, or retry strategy.
Switching would lose our existing int8 support and add a third-
party dependency for no real benefit.
Observability: structured slog.Info at the split decision logs the
batch counts and limits; slog.Debug logs each sub-batch POST with
input count + estimated token cost. Operators can grep
"voyage: splitting batch" to see batch activity and
"voyage: sub-batch POST" to follow individual requests.
Tests cover:
- planBatches splits a 300K-byte text + smalls into multiple
batches respecting the 100K token budget.
- 200 small texts still split at 128 + 72 (legacy count behavior).
- End-to-end EmbedDocuments produces multiple POSTs when token
budget is the binding constraint.
Dashboard Throughput card copy updated to mention both limits.
Voyage AI does NOT expose rate limits via API — confirmed by checking https://docs.voyageai.com/docs/rate-limits and the /reference/rate-limits page. All limit data lives in their public docs and web dashboard. So instead of a fake "cache + refresh" mechanism, we ship a hardcoded snapshot from the docs: internal/embeddings/provider/voyage/limits.go KnownModelLimits — per-model Tier 1 RPM / TPM + per-request input and token caps for every Voyage model the dashboard offers. KnownLimitsSource — attribution string with the docs URL and snapshot date ("2026-05-27") so an operator can tell when to consult upstream for an update. GET /api/v1/admin/embedding-providers gains a documented_limits field on the voyage entry. Schema documented in openapi.yaml as DocumentedEmbeddingLimits / DocumentedModelLimit. Dashboard VoyageProviderForm renders a compact card under the model dropdown: - Active model row showing RPM / TPM / inputs-per-req / tokens-per-req - Tier multiplier note (×2 / ×3) since Voyage scales by lifetime spend and we can't detect the operator's tier - <details> expander with all models in the snapshot, current model highlighted - Footnote with the source URL + snapshot date - Button linking to the Voyage dashboard for live tier confirmation No "Refresh" button: it would re-fetch the same hardcoded constant, giving a false impression of "sync my plan from here". A reload after upgrading cix-server gets the new snapshot — the source footnote tells the operator when to do that.
…ottle
Replace the hardcoded "documented limits" snapshot card with editable
input fields for RPM, TPM, max-inputs-per-request, and max-tokens-
per-request. Voyage doesn't expose limits via API (confirmed against
docs.voyageai.com/docs/rate-limits) — the operator copies their
tier's numbers from the Voyage dashboard, the indexer throttles
itself accordingly.
Backend (provider/voyage):
Config gains four new fields, all optional:
RateLimitRPM — caps requests/minute via token-bucket
RateLimitTPM — caps tokens/minute via token-bucket
MaxInputsPerRequest — overrides the static 128 default
MaxTokensPerRequest — overrides the static 100K default
Two rate.Limiter instances are wired into Provider when the
corresponding RPM/TPM > 0; embed() waits on both before each POST
(ctx-aware — server shutdown drains cleanly). nil limiters short-
circuit when the operator leaves the fields empty, matching prior
behaviour (only react to upstream 429/400).
planBatches takes maxInputs+maxTokens args instead of consts, so
the per-request caps come from cfg.
Backend (admin_embeddings, openapi.yaml, limits.go):
Drop the documented_limits payload + voyage.KnownModelLimits
table — the dashboard no longer renders a "cached snapshot",
values come straight from the operator instead. Less code, fewer
things to keep in sync.
Frontend:
VoyageProviderForm: replace the read-only "Documented limits"
card with a 2x2 grid of number inputs (RPM, TPM, max-inputs,
max-tokens). Hint paragraph above the grid still mentions the
public-docs baseline so the operator has a starting point, plus
a link to the Voyage dashboard. Empty = no client-side
enforcement (per-field).
EmbeddingProviderSection: drop the documentedLimits prop +
useMemo; populate the four new VoyageConfig fields from the
persisted blob when loading the active provider.
Tests:
- TestRateLimitRPMThrottlesRequests: 120 RPM with two sequential
EmbedQuery calls — second waits ~500ms on the limiter.
- TestRateLimitTPMThrottlesTokens: 600K TPM with two ~60K-token
POSTs — second waits ~2s for the bucket to refill.
- planBatches tests updated to pass explicit caps.
Two gaps in the prior token-aware planBatches were leaking real
400s to the indexer:
1. Our estimator (bytes/3) is conservative for prose but can
under-count for dense code where 1 token ≈ 2 bytes. estimated
100K could become actual 150K → POST → 400.
2. planBatches puts a single input larger than the cap in its own
batch (no choice — splitting the text would corrupt the chunk).
That POST would 400 unconditionally.
Both surfaced as the user-reported error:
"The max allowed tokens per submitted batch is 120000.
Your batch has 187609 tokens after truncation."
Fix: a new embedWithAdaptiveSplit wraps each embed() call. On a
400 whose message matches voyageBatchTooLargeRegex, it bisects the
batch in half and retries both halves — recursively, so a real
tokenizer miscalculation that's ~2x our estimate self-corrects in
a single extra round-trip. EmbedQuery and EmbedDocuments both go
through this path now.
Single-chunk edge: when the offending batch is already 1 input,
there is nothing to bisect. The provider returns a clear error
pointing the operator at the chunker upstream:
"voyage: a single chunk produced 187609 tokens (cap 120000).
Reduce the indexer's max chunk size or switch to a model with
a higher per-request cap"
This is the right place to fail — silently dropping a chunk would
hurt search quality without telling anyone; truncating it would
change vector semantics; retrying forever would hang the indexer.
parseBatchTooLarge extracts (cap, actual) from Voyage's stable
error message so logs surface both numbers, letting an operator
diagnose persistent estimator drift (e.g. "real tokens always
1.5x our estimate → tighten MaxTokensPerRequest in the form").
Tests:
- EmbedDocuments with 4 inputs and a stub that rejects any
batch >1 input ends up running 4 singleton POSTs after
bisection.
- Single oversize input returns the clean error.
- parseBatchTooLarge covers happy path + non-match cases (429,
different 400 reasons).
The "Truncate over-length inputs server-side" switch maps to Voyage's truncation API parameter. It only affects what happens when a SINGLE chunk exceeds the model's per-input context window (e.g. 32K tokens for voyage-code-3) — totally separate from the 120K-tokens-per-batch cap our adaptive bisect already handles. Before: bare label, no hint, easy to confuse with the batch cap. Now: a paragraph under the switch explains the trade-off (silent truncation vs explicit 400) and explicitly notes the separation from the batch limit.
Voyage's truncation:true silently drops content past the model's
per-input context window (32K tokens for voyage-code-3). For long
chunks that means the tail never makes it into the embedding —
search quality suffers without any signal at index time.
The ollama provider already has a sliding-window pipeline for the
same problem (TokenizeAndEmbed: tokenize → split at CLS+SEP
boundaries → embed all windows → average vectors). Mirror that for
voyage, byte-aligned instead of token-aligned since Voyage has no
public /tokenize endpoint.
Added:
- Config.MaxInputBytes (default 30_000) — caps per-input byte
length. Sized so worst-case 1 byte/token dense code still
fits voyage-code-3's 32K window with margin.
- splitOversizeInput(text, maxBytes) — slices at rune-aligned
boundaries so we never cut a UTF-8 multi-byte sequence in
half.
- embedAndAverage(ctx, texts, inputType) — three-phase pipeline:
Phase 1: expand each oversize input into N windows; remember
spans (which window indices map to which original).
Phase 2: planBatches + adaptive-bisect POST chain on the
expanded slice (existing logic, unchanged).
Phase 3: average sub-window vectors back to one per original
input. Single-window inputs pass through unchanged
(fast path, no allocations).
- EmbedQuery and EmbedDocuments both go through embedAndAverage
now.
Tests:
- splitOversizeInput: under-cap singleton, ASCII over-cap 3-way
split with byte-perfect rejoin, UTF-8 Cyrillic split that
keeps every window valid UTF-8.
- EmbedDocuments_AveragesOversizeInputWindows: stub returns
[1.0, 2.0, 3.0] for the 3 windows of a 250-byte input under
MaxInputBytes=100; provider must return averaged [2.0].
- Existing TPM throttling test now sets MaxInputBytes=1M to
keep its single 180K-byte input from being auto-split (that
test exercises the rate-limit bucket, not the window-split).
Per Voyage's own docs (https://docs.voyageai.com/docs/tokenization): "one word roughly corresponds to 1.2 - 1.5 tokens on average; for rough estimates divide character count by 5" Our previous bytesPerToken=3 over-counted for everything except the densest code, causing planBatches to split into more sub-batches than necessary. Switching to bytesPerToken=5 matches Voyage's published rule of thumb and reduces spurious splits on ordinary content. The trade-off: dense code can run hotter (1 token ≈ 2-3 bytes empirically), so our estimate may UNDER-count for very tight code and let planBatches pack inputs tighter than the real cap permits. That's fine — embedWithAdaptiveSplit detects the resulting 400 "max tokens per batch" response and bisects the batch in half. We trade a few extra round-trips on the worst-case dense files for materially fewer splits on the common case. Better long-term path is to load Voyage's real HF tokenizer (publicly available at huggingface.co/voyageai/voyage-code-3 as a standard tokenizer.json), but that requires a CGO Rust dep (daulet/tokenizers) — deferred to a follow-up that justifies the Docker / distroless work. Tests: - TestPlanBatches_SplitsByTokenBudget: bumped big text from 500K → 600K bytes so it exceeds the 100K-token cap at the new divisor. - TestEmbedDocuments_SplitsByTokenBudget: bumped from 300K → 500K and disabled per-input window split (MaxInputBytes=1M) to keep the test focused on the batch-token cap. - TestRateLimitTPMThrottlesTokens: bumped 180K → 300K so the per-call cost still hits ~60K tokens at the new divisor.
The /5 heuristic comes from Voyage's English-prose docs, but cix is primarily a code-indexer and dense source code is much denser per token. Production logs against voyage-code-3 show ~1.4 bytes/token on tight Go files — a 64-input batch estimating 51K tokens actually charged 187K, triggering Voyage's 120K hard-cap 400 and forcing the bisect safety net to fire on every batch even with a correctly configured rate-limits section. bytesPerToken=2 catches code reality with margin and over-estimates prose by ~2.5x (more round-trips, never a 400). The embedWithAdaptiveSplit bisect stays as a residual safety net. Tests adjusted for the new divisor (240K/200K/120K byte text sizes where the original called for 600K/500K/300K under /5). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… vector namespace The storage layout derived both the SQLite path and the chroma dir from the embedding model name (a Python-era artefact ported 1:1). Safe only while the model was a fixed env var; runtime provider switching turned it into a critical bug — switching to a higher-dim provider (e.g. voyage 2048) wrote vectors into the previous provider's collection, breaking search with "vectors must have the same length" — and a footgun where a model change spawned a parallel system DB with empty accounts. Two-plane design: - System DB is ONE permanent, model-INDEPENDENT file at cfg.SQLitePath (accounts + catalog + parsed code). The parsed-code tables are model-independent; vector search never reads them. - The chroma vector store is namespaced by the ACTIVE provider identity via provider.StorageSlug(Provider.ID()), so vectors of different dimensions never share a collection. Switching back reuses the prior namespace without a reindex. Changes: - provider.StorageSlug: deterministic, idempotent filesystem slug from a Provider.ID() fingerprint. - config: drop DynamicSQLitePath/DynamicChromaPersistDir; add ChromaDirForSlug; keep ModelSafeName + LegacyDynamicSQLitePath solely for the one-time adoption migration. - internal/storage (new): boot-time, idempotent file migrations — AdoptLegacyModelDB (checkpoint-then-rename the legacy per-model DB onto the canonical path; fossil moved aside to *.pre-unify-*) and PrefixLegacyChromaDirs (rename pre-unification ollama dirs to the unified scheme; canonicalises the legacy suffix through StorageSlug so model names with '.'/':' still match the path the server opens — no silent orphaning). - db.HasTables: helper to distinguish a real unified DB from a fossil. - vectorstore.Holder: concurrency-safe swappable *Store wrapper + Interface satisfied by both *Store and *Holder; indexer/httpapi/repojobs hold the Interface. - embeddings.Service.SwitchProvider now reopens the vector store under the new identity slug and atomically swaps it into the Holder (live, no restart); reopen failure keeps the old store and surfaces a loud error. - main.go/server.go: orchestrate boot migrations, Holder wiring, and identity-namespaced storage display. One-shot migration shims (ModelSafeName, LegacyDynamicSQLitePath, AdoptLegacyModelDB, PrefixLegacyChromaDirs and their call sites) are marked "LEGACY-MIGRATION (remove next release)" for deletion once all deployments have booted on the unified layout. Tests: StorageSlug (incl. special-char canonicalisation), dbmigrate (adopt/fossil-aside/WAL-drain/idempotency), chromamigrate (strict normalize/no-clobber/idempotency), Holder (-race swap-under-read), live reopen (swap/failure-keeps-old/unwired-noop). go build/vet/test ./... green; -race on vectorstore green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ze base URL Three robustness fixes surfaced by pre-PR review: - H2: validate each returned vector's length against the configured output_dimension (when >0) and reject a mismatch loudly. A model that silently ignores output_dimension (no Matryoshka support / typo'd model) would otherwise write wrong-width vectors deep into the store with no attribution. Also guard the oversize-input averaging path against windows of differing width, which would panic with index-out-of-range; it now returns a clean error. - M4: accept int8 embeddings returned as a base64-packed byte string (some OpenAI-compatible proxies) in addition to the default JSON int array, so a proxy swap doesn't fail the whole batch. - L4: trim a trailing slash from base_url in New() so url building never produces a double slash that stricter proxies can 404 on. Tests: wrong-dimension rejection, inconsistent-window-dim error, int8 base64 dequant. Adjusted an existing input_type test whose 2-dim mock response no longer matched its 1024 configured dimension. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gacy models - L4: trim a trailing slash from the openai provider's base_url in New() so url building never emits a double slash that stricter OpenAI-compatible servers (vLLM/TEI behind a proxy) 404 on. Mirrors the same fix in the voyage provider. - L5: migration 13 (indexed_with_model provider-prefix backfill) tested for the mere presence of a ':' to decide a row was already prefixed. A legacy Ollama-style model name like "nomic-embed-text:latest" contains a colon but is NOT prefixed, so it was wrongly skipped and left flagged "stale model" forever. Now tests for a known kind prefix (ollama:/openai:/voyage:) instead. NOTE: editing an unreleased migration in place — deployments that already applied v13 with the old logic won't reprocess, but the default model carries no colon so no shipped DB is affected. Tests: openai trailing-slash path normalization; migration-13 colon- bearing legacy row now gets the ollama: prefix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…03 gating tests - M1: mustBeAdmin returns a nil authContext when CIX_AUTH_DISABLED=true (a legitimate deployment mode), and SwitchEmbeddingProvider then dereferenced user.User.ID for the audit field → handler-goroutine nil-panic. Resolve the actor through an `if user != nil` guard, matching the ac != nil pattern used elsewhere (tunnels.go). - L2: add per-endpoint 403 gating tests for the three admin embedding-provider routes that lacked them (GET/PUT active, POST test), satisfying the project rule that a non-admin caller must be rejected on every new endpoint. Auth was already enforced (all four handlers gate on mustBeAdmin first); these tests pin it against regressions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cycle, supervisor Three concurrency hazards surfaced by pre-PR review: - H1: Restart reassigned the s.queue pointer with no lock while Embed* callers read it (acquire + defer-release) — an unsynchronized pointer read/write (data race) reachable via the dashboard "Save & Restart" during an active reindex. The queue pointer is now guarded by s.mu: acquireProvider snapshots the queue and returns it so Release targets the SAME instance the slot was taken from, Restart writes the new queue under the lock, and Status reads via a snapshot. - M2: SwitchProvider and Restart had no mutual exclusion, so two admin actions could interleave their s.current / s.queue mutations or both tear down a provider. A new lifecycleMu serializes the two (Embed* paths don't take it). - M3: supervisor.Stop read s.waiterDone bare while a crash-driven spawn() reassigns it under s.mu — data race, far more reachable now that SwitchProvider can Stop a provider at runtime. Stop now snapshots cmd + waiterDone together under the read lock. Test: TestRestart_ConcurrentWithEmbeds_NoRace exercises the queue-swap race under -race (6 embedders vs a restarter alternating the cap to force pointer swaps). go test -race ./internal/embeddings/... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
gofmt the two ollama-package files whose field/map alignment drifted when they were extracted/moved into provider/ollama on this branch (gguf.go GGUFInputs struct, prefix.go QueryPrefixes map). Pure formatting, no behavior change. Pre-existing gofmt drift in files this branch only incidentally touches (e.g. vectorstore/store.go) is left alone to keep the diff focused. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dashboard 404'd ("project not found: hash=…") on local projects whose
host_path and stored path_hash legitimately diverge — e.g. a project keyed
as sha1("local:{machine}:{path}") while host_path stays the bare filesystem
path. The list/detail responses set PathHash by RE-deriving sha1(host_path),
which for those rows yields a hash that GetByHash (which resolves against
the stored path_hash column) never matches → 404.
host_path can't simply be rewritten to the namespaced key form: it is the
vector-store collection key (md5(host_path)) and the chunks/symbols key, so
changing it would orphan the existing index and force a reindex.
Fix: carry the stored path_hash column through projects.Project and return
it verbatim in both project responses (server.go ProjectSummary + Project),
instead of recomputing from host_path. The stored column is the single
source of truth the lookup already uses, so link hash == lookup hash for
local, external, and freshly-created projects alike. No data migration, no
reindex.
Pre-existing issue (per-machine namespacing + an incomplete manual re-key),
independent of the embedding-provider work; bundled here per request.
Test: a row whose stored path_hash differs from sha1(host_path) — Get/List
surface the stored hash and GetByHash round-trips it back to the project.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hand-rolled Sscanf-switch + printf-table for ~/.cix/config.yaml with a tag-driven schema (`key`/`desc`/`default`/`validate`/`env`/`sensitive` struct tags) that drives Load, Save, `config show`, `config set`, and a new full-screen editor — one source of truth, no more drift across three files. Loader: koanf/v2 (file+yaml + confmap defaults from tags + rawbytes legacy- key normalization). Validator: go-playground/validator/v10, runs on every mutation. Setter: reflective SetByPath with type-aware parsing and rollback on validation failure. Show/keys: reflect over the schema walker — sensitive leaves never bind their value to a named variable (CodeQL-safe). Env overrides: CIX_SERVER / CIX_API_URL / CIX_API_KEY override server selection / URL / key with flag > env > file precedence. Local-only — never persisted. Designed for CI runners passing secrets via env. TUI: new lazygit-style `cix config edit` and `cix config init` built on bubbletea + bubbles + lipgloss — two-panel layout, vim-friendly nav (h/j/k/l), inline textinput edits, server CRUD (a/d/m/t), help overlay. Replaces the earlier huh-based form (which felt like a one-shot wizard, not an editor). On-disk format: legacy `api:` block still migrates to `servers:`. Legacy lowercase viper keys (debouncems, excludepatterns, cachettl, autowatch, batchsize) still normalize. The dead `server.port` / `server.cache_ttl` block is dropped on next save — old files load without error. New: `cix config keys` lists every settable key with default / env / description from the schema. `doc/CLI_CONFIG.md` documents the full key table. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add a "Servers" subsection to both copies of cix SKILL.md (skills/cix/ and plugins/cix/skills/cix/) and a "Targeting multiple cix servers" block to plugins/cix/README.md. Explains the named-server / default_server model, the --server flag, and the legacy api.url/api.key aliases, so agents using the plugin know to use the default and only switch via --server when the user names one. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…oot resilience, gate Addresses the full code review of feat/embedding-providers. Blockers - parity_test.go called the deleted Service.embedRaw, so `make test-gate` (the release embedding-parity gate) failed to compile. Restore via a build-tagged helper that reaches the live provider's EmbedRaw. - SwitchProvider swapped the provider and the vector store as two non-atomic ops with the queue resumed in between, so a concurrent embed could write new-dimension vectors into the old provider's collection; on reopen failure it left a live new-provider / old-store pairing with no rollback. Now the queue stays blocked across both swaps (resumed via defer on every path) and a reopen failure rolls the provider swap back and stops the half-started provider — fail closed, never corrupt. - Boot no longer hard-fails the whole server when the persisted provider's Start() (remote connect-test / sidecar spawn) fails. A bad build falls back to env ollama; a failed Start attaches the provider in a degraded, self-healing state. The HTTP server (dashboard/auth/all projects) stays up so an operator can recover instead of editing SQLite by hand. Medium - SwitchEmbeddingProvider now validates-then-persists: SwitchProvider runs first and only a successful swap is saved, so a transient switch failure can't leave the DB pointing at a provider that bricks the next boot. - PrefixLegacyChromaDirs failure at boot is now fatal (was warn+continue), which silently orphaned the index behind a fresh empty namespace. Added vectorstore.WarnIfNamespaceOrphaned as a loud diagnostic when the active namespace is empty but sibling namespaces exist. - GetSidecarStatus reported ready=false for healthy openai/voyage providers (StateRemote != "running"); treat StateRunning and StateRemote as ready. - Restart now blocks the freshly-built queue during a concurrency-cap change so embeds get a clean 503 during the sidecar respawn instead of ErrSupervisor from a transiently-nil provider. Minor / cleanup - ollama TokenizeAndEmbed clamps windowSize to >= 1, guarding a CtxSize<=2 misconfig that spun the split loop forever while holding a queue slot. - Extract shared HTTP-provider plumbing (base-URL normalize, API-key resolve, Ready/Status) into provider/httpremote.go; openai and voyage now delegate instead of copy-pasting. Tests: switch rollback + happy-path, namespace-orphan diagnostic. Full suite, `go vet` (incl. embed_gate), and `-race` on embeddings + httpapi all green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ixes #8 class) The flat single-slug scheme `chroma_<StorageSlug(id)>` flattened provider+model into one string with `_` as the field separator — but `_` is also what model-name normalization emits. That in-band separator made the legacy-migration heuristic (`hasKnownPrefix`) ambiguous: a legacy ollama model whose name normalized to `ollama_…`/`openai_…`/`voyage_…` (e.g. `openai-community/x`) was misread as already-unified, skipped, and its vectors silently orphaned behind a fresh empty namespace. Worse, `ollama:foo` and a model literally named `ollama-foo` both produced the exact same string — unfixable by any name parsing. Make the separator structural instead: namespace the vector store as a nested directory tree, one path segment per identity field: <ChromaPersistDir>/<kind>/<model-slug>[/<variant>] chroma/ollama/nomic_embed_text chroma/voyage/voyage_code_3/2048_int8 chroma/openai/text_embedding_3_large/256 The provider kind is now its own directory level, derived from the kind (not glued onto the model slug), so it can never collide with a model name. `ollama:foo` → ollama/foo, `ollama:ollama-foo` → ollama/ollama_foo — distinct. The #8 case resolves correctly: openai-community/x → ollama/openai_community_x. Changes - provider.Provider gains StorageComponents() []string — each provider builds its path from its OWN structured fields (never by re-parsing the flattened ID), so the in-band-separator problem can't recur. - config: ChromaDirForSlug(slug) → ChromaDirFor(components) (filepath.Join under the container). - Service: StorageSlug() → StoragePath() []string; reopen + AttachVectorStore thread components. - storage: PrefixLegacyChromaDirs (+ hasKnownPrefix prefix-guessing) → MigrateFlatChromaToNested. Now unambiguous: every flat `chroma_*` sibling is a legacy ollama dir (the old build was ollama-only) and moves to chroma/ollama/<StorageSlug(suffix)>/. StorageSlug(ModelSafeName(m)) == StorageSlug(m), so it lands where the server resolves the identity — even for kind-looking model names. Existing main/prod ollama vectors migrate without a reindex. main.go relocates a legacy Python store at the container path first, then migrates (fail-closed). - Drop vectorstore.WarnIfNamespaceOrphaned — the orphan ambiguity it warned about is gone, so the diagnostic is no longer needed. - Drop the throwaway intermediate flat-unified scheme entirely (dev-branch only; no back-compat). Tests: per-provider StorageComponents incl. the ollama foo vs ollama-foo anti-collision; migration moves flat→nested incl. the kind-looking-name (#8) regression, special-char normalization, idempotency, no-clobber, python-backup skip; switch/reopen tests assert nested paths. Full suite, `go vet` (incl. embed_gate), and `-race` all green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… tests
Pre-merge review fixes for the pluggable-provider work:
- httpapi: reject kind=ollama on POST /admin/embedding-providers/{kind}/test.
Start()ing a throw-away ollama provider would spawn a second llama-server
child on the live sidecar's socket. The dashboard never tests ollama (it is
configured via the runtime-config form), so fail fast with 400 instead of
risking a competing sidecar from an ad-hoc admin call. Adds a gating test.
- embeddingscfg: add the previously-missing test suite for the provider
persistence layer boot reads to pick the active provider — Get on a fresh
DB, Save insert + update round-trips, empty-config, and the empty-kind /
invalid-JSON rejections.
go vet ./... · go build ./... · go test ./internal/{embeddingscfg,httpapi}/... all pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Owner
Author
Code review summaryReviewed focusing on the high-risk areas: provider abstraction, switch atomicity, concurrency, storage migrations, and endpoint authorization. The CLI config/TUI ( Verified locally on Strengths
Fixed pre-merge (commit 2a9e848)
Non-blocking follow-ups (optional)
Overall: solid, no blockers. 👍 |
CodeQL go/incorrect-integer-conversion (high) flagged parseDefaultValue: strconv.ParseInt(raw, 10, 64) yields an int64 that int(v) narrows, silently truncating on 32-bit builds. Parse with bitSize 0 (platform int width) instead, so the value is bounded to int range before conversion and an out-of-range tag errors (ok=false) rather than wrapping. Sibling parser in set.go is unaffected — it uses reflect.Value.SetInt with an explicit OverflowInt guard and no narrowing conversion. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds a pluggable embedding-provider system so the server can embed via the
local ollama llama-server sidecar, any OpenAI-compatible endpoint
(vLLM/TEI/LocalAI/OpenAI), or Voyage AI — switchable at runtime from the
dashboard. Also unifies on-disk storage path derivation behind that change.
Major pieces:
provider.Providerinterface + kind→Factoryregistry; ollama/openai/voyage implementations (ollama code moved into
provider/ollama/).token-bucket throttle, token-aware batch splitting, adaptive bisect on the
120K "batch too large" 400, byte-window sliding split + averaging for
oversize single inputs.
GET/PUT /admin/embedding-providers[/active],POST /…/{kind}/test, provider-aware/status; dashboard provider UI.chroma vector store is namespaced by the ACTIVE provider identity
(
provider.StorageSlug(Provider.ID())) with live reopen on switch, sovectors of different dimensions never collide. One-time boot migrations
adopt legacy per-model DB/chroma layouts without a reindex.
Why
The original design derived both the SQLite path and the chroma dir from the
embedding model name (a Python-era artefact). That was only safe while the
model was a fixed env var; runtime provider switching turned it into a
critical bug — switching to a higher-dim provider wrote vectors into the
previous provider's collection, breaking search with "vectors must have the
same length" — plus a footgun where a model change spawned a parallel system
DB with empty accounts.
How
cfg.SQLitePath) +identity-namespaced chroma;
vectorstore.Holderallows atomic swap of theactive store on provider switch with no restart.
stored; the key is read live from the environment at request time.
Pre-PR review fixes (last 5 commits)
against inconsistent window widths (panic → clean error); accept int8 as
base64; trim base-URL trailing slash.
known kind prefix instead of any
:so colon-bearing legacy ollama modelnames get prefixed.
SwitchEmbeddingProviderunderCIX_AUTH_DISABLED=true; add per-endpoint 403 gating tests for all adminroutes.
s.queuepointer swap during Restart, missing mutual exclusion betweenSwitchProvider and Restart, and an unsynchronized
waiterDoneread in theollama supervisor's Stop.
Update — post-review fixes (commits f5d6e22, 668ad29)
A full review of the branch was run after the original description above; two follow-up commits address the findings.
Correctness / resilience (f5d6e22)
PUT /admin/embedding-providers/active: the live swap is applied (and validated) before the DB row is written, so a transient switch failure can't leave the DB pointing at a provider that bricks the next boot.StateRemote) now reportready: true(previously alwaysfalse).ErrSupervisor.ctx_size(no more infinite split loop).Ready/Status) extracted to one place.make test-gate(the embedding-parity gate) compiles again — a deletedembedRawhelper was breaking it.Nested per-provider vector namespacing (668ad29)
Replaces the flat
chroma_<slug>scheme with a nested layout, one path segment per identity field:The provider kind is its own directory level (derived from the kind, not glued onto the model slug), which structurally removes the legacy-migration ambiguity where a model name normalizing to a kind-looking slug (e.g.
openai-community/x) was silently orphaned. AddsProvider.StorageComponents(),Config.ChromaDirFor(), and an unambiguous flat→nested migration that preserves existing ollama vectors without a reindex.Testing
go vet ./...✅ ·go test ./...✅ (39 pkgs) ·go test -race ./...✅ (39 pkgs, 0 data races) ·embed_gatecompiles.StorageComponents(incl. thefoovsollama-fooanti-collision); flat→nested migration (incl. the kind-looking-name case).Known deferred (non-blocking)
WARNfromAdoptLegacyModelDB; no data impact. Can be silenced in a follow-up (skip-warn-when-empty).Type of change
Checklist
go build ./...,go vet ./...,go test ./...all passgo test -race ./internal/embeddings/... ./internal/vectorstore/...passes🤖 Generated with Claude Code