
feat: make sentence-transformers optional; interactive ccc init (#132)

Merged
georgeh0 merged 1 commit into main from g/optional-st-dep on Apr 14, 2026

Conversation

@georgeh0
Member

Summary

Closes #117 (optional sentence-transformers) and #70 (interactive ccc init).

  • Packaging — `sentence-transformers` moves behind the new `[embeddings-local]` and `[default]` extras (delegated to `cocoindex[sentence-transformers]` so the version pin lives in one place). A bare `pip install cocoindex-code` becomes LiteLLM-only.
  • Interactive ccc init — when global settings don't exist, prompts for provider (sentence-transformers / litellm) and model via a questionary TUI. New --litellm-model MODEL flag skips prompts and is the non-TTY escape hatch for LiteLLM. The interactive flow is gated on "global settings missing" — subsequent ccc init calls in other projects keep the old behavior.
  • Default model changes from sentence-transformers/all-MiniLM-L6-v2 to Snowflake/snowflake-arctic-embed-xs (lighter, better for code retrieval).
  • Generated global_settings.yml now includes a ccc doctor reminder and commented-out env-var examples (OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY, VOYAGE_API_KEY).
  • Model test runs in the daemon — init sends a DoctorRequest via the existing _client.doctor path; the daemon loads the model once and stays running, so the user's next ccc index starts warm. Output is wrapped in a rich spinner; HF warnings and "Loading weights" progress stream to daemon.log instead of cluttering the init output.
  • Docker — image now installs cocoindex-code[default] and pre-caches the Snowflake model. The COCOINDEX_CODE_EMBEDDING_MODEL env var is no longer documented for Docker; users mount a pre-written global_settings.yml or pass --litellm-model.
  • Shared helper — extract check_embedding + EmbeddingCheckResult into shared.py. Daemon's _check_model delegates to it; error messages in ccc doctor output now include the exception type name (strictly more informative).
  • Tests — switch to the lighter paraphrase-MiniLM-L3-v2 model via a new make_test_user_settings() helper in conftest.py, keeping CI cache costs unchanged. New E2E tests cover the init flows (TTY defaults, --litellm-model, non-TTY slim install error, failure-is-non-fatal, flag-rejected-when-settings-exist).

Test plan

  • CI (ruff, mypy, pytest — 130 tests including new init-flow coverage).
  • Manual smoke test of `ccc init` with a bogus `--litellm-model`: FAIL tag + edit-settings hint, and project init still completes. ✓

- Move `sentence-transformers` behind `[embeddings-local]` and `[default]`
  extras (via `cocoindex[sentence-transformers]`), so `pip install
  cocoindex-code` is LiteLLM-only. Closes #117.
- `ccc init` is now interactive when global settings don't exist: pick
  provider (sentence-transformers / litellm) and model via a
  questionary TUI. New `--litellm-model MODEL` flag skips prompts and
  is the non-TTY escape hatch for LiteLLM. Closes #70.
- Change the default sentence-transformers model from
  `all-MiniLM-L6-v2` to `Snowflake/snowflake-arctic-embed-xs`
  (lighter, better quality for code).
- Generated `global_settings.yml` now includes a `ccc doctor` reminder
  and commented-out env-var examples (OPENAI_API_KEY, GEMINI_API_KEY,
  ANTHROPIC_API_KEY, VOYAGE_API_KEY).
- Model test during init runs in the daemon via the existing
  `DoctorRequest` path; the daemon loads the model once and stays
  running, so the user's next `ccc index` starts warm.
- Docker image now installs `cocoindex-code[default]` and pre-caches
  the new default model. The `COCOINDEX_CODE_EMBEDDING_MODEL` env var
  is no longer documented for Docker; users mount a
  `global_settings.yml` or pass `--litellm-model`.
- Extract `check_embedding` + `EmbeddingCheckResult` into `shared.py`;
  refactor daemon `_check_model` to delegate. Error messages in doctor
  output now include the exception type name (strictly more
  informative).
- Tests switch to a lighter `paraphrase-MiniLM-L3-v2` model via a new
  `make_test_user_settings()` helper in `conftest.py`, leaving CI
  cache costs unchanged.
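The `check_embedding` / `EmbeddingCheckResult` names come from this PR, but their signatures are not shown, so the following is an assumed sketch of the shared helper, including the "exception type name in the error message" behavior mentioned above:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EmbeddingCheckResult:
    ok: bool
    detail: str


def check_embedding(embed: Callable[[str], list[float]]) -> EmbeddingCheckResult:
    """Run one test embedding and report success or a typed failure."""
    try:
        vec = embed("hello world")
    except Exception as exc:
        # Include the exception type name: "ValueError: ..." is strictly
        # more informative than the bare message.
        return EmbeddingCheckResult(False, f"{type(exc).__name__}: {exc}")
    if not vec:
        return EmbeddingCheckResult(False, "model returned an empty vector")
    return EmbeddingCheckResult(True, f"dim={len(vec)}")
```

Both `ccc doctor` and the daemon's `_check_model` can then share this one code path, which is the deduplication the PR describes.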
georgeh0 merged commit 5b6e3f5 into main on Apr 14, 2026
4 of 5 checks passed
georgeh0 deleted the g/optional-st-dep branch on April 14, 2026 at 05:25
georgeh0 added a commit that referenced this pull request Apr 14, 2026
Follow-up to #132: the skill's management.md still said `pipx install
cocoindex-code` and described `ccc init` as a purely non-interactive
command. Update to reflect:

- Two install styles — `[default]` (batteries included) vs bare
  (slim, LiteLLM-only).
- First-run `ccc init` is interactive; prompts for provider/model and
  runs a test embed via the daemon.
- `--litellm-model MODEL` flag for non-TTY / scripted use.
- Pointer to `ccc doctor` if the init model test fails.