Skip to content

feat(models): Phase 2 — ollama backend#651

Merged
heskew merged 6 commits into
mainfrom
feat/models-ollama
May 21, 2026
Merged

feat(models): Phase 2 — ollama backend#651
heskew merged 6 commits into
mainfrom
feat/models-ollama

Conversation

@heskew
Copy link
Copy Markdown
Member

@heskew heskew commented May 20, 2026

Summary

Phase 2 of #510 — the first real ModelBackend implementation, talking
to Ollama's HTTP API. Validates the Phase 1 (#628 / PR #638) interface
against a non-trivial real provider without external CI dependencies.

Closes #629
Tracking: #510

Note

Stacked on feat/models-foundation (PR #638). Merge target moves to
main once Phase 1 lands.

What ships

  • components/ollama/index.tsOllamaBackend implements ModelBackend
    (embed, generate, generateStream) over native fetch, plus
    registerOllamaBackend(...) for the boot bridge to call.
  • resources/models/bootstrap.tsbootstrapModels(rootConfig) reads
    models.{embedding,generative}.<name> entries and dispatches through
    a { ollama: registerOllamaBackend } factory map. Unknown backends
    log at error level and skip.
  • validation/configValidator.ts — Joi schema for the top-level
    models: block. .unknown(false) on the entry schema so typos block
    boot instead of silently dropping into the registry as no-ops.
  • components/componentLoader.tsbootstrapModels(config) runs at
    root-config load time, before per-component iteration, so
    scope.models.embed(...) works from handleApplication(scope).
  • security/jsLoader.ts — adds models to getHarperExports so user
    code can call harper.models.embed(...) from anywhere a component
    runs.
  • static/defaultConfig.yaml — intentionally unchanged. Backends
    are opt-in via block presence (matches replication: and the mcp:
    block from PR [MCP] foundation: component scaffold + config schema + boot gating #649).

File-layout pivot from the issue body

Issue #629 proposed resources/models/backends/ollama.ts (in-core
directory). This PR ships the backend under components/ollama/ to
match the convention Kyle established in #649 (components/mcp/) —
core imports a register function from the component path. This sets
the same precedent for #630#633 (openai, gateway, anthropic, bedrock).

API decisions

  • tools: false, adapters: false. Ollama tool-call support
    exists on some models but is uneven across the catalog; v1
    portability stays honest by leaving the surface off.
  • /api/embed (not legacy /api/embeddings). Ollama deprecated
    the legacy endpoint; the new one accepts array input.
  • /api/generate for string prompts, /api/chat for messages.
    system is prepended as the first message (Ollama chat has no
    top-level system field).
  • AbortSignal propagation. opts.signal passes through to fetch;
    when requestTimeoutMs is configured, the two are combined via
    AbortSignal.any so the upstream HTTP closes on either caller abort
    or timeout.
  • inputType: 'document' | 'query' is honored for embedding models
    that use prefixes (nomic-embed-text v1.5+ takes search_document:
    / search_query: ). Non-prefix models are untouched.
  • Token counts dropped if not finite, non-negative integers. Ollama
    is admin-trusted but the model it's running is community-pulled;
    a bad eval-count poisoning SUM(prompt_tokens) over hdb_model_calls
    is worth guarding against.

harper.models exposure (Phase 1 deferred decision A)

Picks harper.models via getHarperExports, the canonical Harper
export shape (harper.Resource, harper.tables, etc.). Same path
the openai backend (#630), the /v1/* gateway (#631), and the
@embed directive (#632) will plug into.

A module-singleton Models instance is used in jsLoader.ts rather
than reaching into Phase 1's per-Scope instance. The Models facade
has no per-Scope state (registry is module-scope, analytics writer is
a process-singleton), so the two are equivalent in behavior. The
singleton sidesteps wiring Scope references through
getHarperExports (which only sees ApplicationScope) without
touching Phase 1's existing per-Scope new Models() in
components/Scope.ts while #638 is still in review.

YAML→registry boot bridge (Phase 1 deferred decision B)

models: is a top-level config block. bootstrapModels(rootConfig)
runs in componentLoader.ts once getConfigObj() returns the merged
root config, before per-component iteration. Dispatch via a hardcoded
factory map ({ ollama: registerOllamaBackend }) — explicit, easy to
read, idempotent. Joi defaults under models.* don't propagate to
env.get (same finding Kyle hit on mcp.* in #649), so the bootstrap
reads getConfigObj() directly rather than the flat env map.

The factory map will grow with #630 (openai), #633 (anthropic /
bedrock). A directive registry / self-registering shape is plausible
later; the explicit map is fine for v1.

Pre-PR verification cadence

Per the verifier cadence agreed for this series:

  • deep-review single-pass (three parallel agents — api,
    concurrency, config). Surfaced 11 findings; 7 applied as
    blockers in this PR, 1 deferred as Tier 5 (AbortSignal.any
    accumulation under sustained load — speculative, defer until
    profiling shows growth), 1 deferred forward-looking for [Models] Phase 3 — openai backend (+ toolMode: 'return') #630
    (secrets-in-plaintext design).
  • Multi-framed external-cross-model pass attempted but the CLI hung
    in the local environment; deferring to the GitHub workflow's
    cross-model review pass.

Files

Path Change
components/ollama/index.ts new — OllamaBackend + registerOllamaBackend
resources/models/bootstrap.ts new — bootstrapModels(rootConfig)
validation/configValidator.ts + models: Joi schema (with .unknown(false) typo guard)
components/componentLoader.ts + bootstrapModels(config) call at root-config load
security/jsLoader.ts + models in getHarperExports
unitTests/components/ollama/index.test.js new — 35 unit tests with mocked fetch
unitTests/resources/models/bootstrap.test.js new — 8 unit tests for factory dispatch
unitTests/validation/configValidator.test.js + 10 tests for the models: block
integrationTests/server/ollama-backend.test.ts new — gated on local Ollama reachability

Acceptance criteria

  • OllamaBackend implements ModelBackend per Phase 1's interface
  • scope.models.embed() configured with backend: ollama produces
    a vector (validated via integration test against local Ollama)
  • scope.models.generate() produces a completion
  • scope.models.generateStream() yields GenerateChunk per the
    Phase 1 contract
  • Per-call accounting records backend: 'ollama', model, token
    counts, latency, success (via the Phase 1 writer — flows through
    unchanged)
  • AbortSignal from BackendOpts cancels in-flight requests
  • Capability negotiation reports tools: false, adapters: false
  • inputType: 'document' | 'query' differentiates vectors on
    nomic-style models
  • Integration tests pass against a running local Ollama (gated;
    auto-skip when unreachable — CI provisioning still TBD)
  • CI green (build, lint, unit tests on Node 20/22/24 — pending CI run)

Test plan

  • npm run build clean
  • npm run lint:required clean
  • 72 new + extended unit tests pass locally (Node 24)
  • Phase 1 model tests still pass (60 — no regression)
  • CI green across full matrix
  • Manual smoke against a local Ollama with nomic-embed-text +
    llama3.2 pulled — OLLAMA_HOST / OLLAMA_EMBED_MODEL /
    OLLAMA_GENERATE_MODEL envs honored

Out of scope

  • Tool calls (tools: false per the issue body)
  • LoRA adapter selection (adapters: false — FAB-503 territory via vLLM)
  • Streaming cancellation beyond AbortSignal propagation (no explicit
    cancel endpoint on Ollama)
  • Custom HTTP transport (mTLS, proxies) — default fetch for v1
  • CI provisioning of an Ollama container — to be decided in PR review

🤖 Generated with Claude Code

@heskew heskew requested a review from a team as a code owner May 20, 2026 21:47
Comment thread integrationTests/server/ollama-backend.test.ts Outdated
Comment thread unitTests/components/ollama/index.test.js Outdated
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 20, 2026

Reviewed; no blockers found.

Comment thread integrationTests/server/ollama-backend.test.ts Outdated
heskew added a commit that referenced this pull request May 20, 2026
Two findings from claude-bot's inline PR review on #651:

- Integration `'AbortSignal cancels an in-flight stream'`: replace
  `ok(rejected || true)` (a tautology — asserts nothing) with a real
  termination check. Race the iterator drain against a 5 s deadline so
  the actual failure mode being guarded (hung stream after abort) fails
  the test instead of timing the suite out.
- Unit `'throws OllamaBackendError when a stream line exceeds the byte
  cap'`: `1 << 20 + 1` evaluated to `1 << 21` (2 MiB) due to operator
  precedence — `+` binds tighter than `<<`. Prettier's autofix
  parenthesized the wrong side (`1 << (20 + 1)`, same value).
  Re-parenthesize to `(1 << 20) + 1` (1 MiB + 1 byte) — exactly one
  byte past the cap, the comment now matches the allocation, and the
  test memory footprint halves.

Tracking: #629, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@kriszyp kriszyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Base automatically changed from feat/models-foundation to main May 21, 2026 00:27
heskew and others added 6 commits May 20, 2026 17:29
First real `ModelBackend` against the Ollama HTTP API: `embed` via
`/api/embed`, `generate` via `/api/generate` or `/api/chat`, and
`generateStream` via NDJSON over chunked HTTP.

Backend lives under `components/<name>/` matching the pattern from the
MCP foundation (PR #649) — core imports `registerOllamaBackend` and calls
it during boot; the file is not a `handleApplication(scope)` self-loader.

Capabilities advertise `tools: false` and `adapters: false`. Ollama tool
support exists on some models but is uneven across the catalog; the v1
portability guarantee keeps it off here.

Validates the Phase 1 (#628 / PR #638) `ModelBackend` interface against a
non-trivial real provider without external dependencies in CI.

Tracking: #629, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `models:` block to `harperdb-config.yaml` (presence-gated, matches
the `replication:` and `mcp:` conventions), validates it via Joi with
`.unknown(false)` so field typos block boot instead of silently skipping,
and dispatches per-entry registration through a factory map in
`resources/models/bootstrap.ts`.

Boot site: `components/componentLoader.ts` calls `bootstrapModels(config)`
once the root config is loaded and before per-component iteration, so
`scope.models.embed(...)` works from `handleApplication(scope)` as well
as from Resource methods.

The factory map (`{ ollama: registerOllamaBackend }`) is hardcoded for
v1. Unknown backends are logged at error level (not warn) and skipped —
silently registering nothing on an opt-in feature is a footgun. Schema
validation catches field-name typos before the factory runs.

`requestTimeoutMs: 0` is rejected by the schema (`min(1)`): omit the
field for "no timeout" so the meaning is unambiguous at the YAML layer.

Tracking: #629, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `models` to `getHarperExports` so user code can call
`harper.models.embed(...)` / `.generate(...)` / `.generateStream(...)`
from anywhere a component runs — `handleApplication(scope)` init code,
Resource methods, internal jobs.

Uses a module-singleton `new Models()` rather than reaching into the
per-`Scope` instance Phase 1 wires in `components/Scope.ts`. The `Models`
facade has no per-Scope state — the backend registry is module-scope and
the analytics writer is a process-singleton — so the two are equivalent
in behavior. The singleton sidesteps wiring `Scope` references through
`getHarperExports` (which only sees `ApplicationScope`) without touching
Phase 1's existing wiring while #638 is still in review.

Tracking: #629, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unit tests (mocked `fetch`):
- `OllamaBackend` capability shape, host normalization, wire format for
  `/api/embed`, `/api/generate`, `/api/chat`, NDJSON streaming including
  split-line and oversize-line handling.
- AbortSignal propagation: caller-only, composed via `AbortSignal.any`
  with a per-call timeout, and abort-while-pending.
- Robust response handling: vector-count mismatch, non-finite vector
  values, non-finite / non-integer / negative token counts, non-string
  `content` fields, non-JSON response bodies — all surface as
  `OllamaBackendError`.
- NDJSON error messages are static so upstream-derived content cannot
  leak through the thrown error.

`bootstrap.ts` factory dispatch: ollama embedding + generative
registration under arbitrary logical names, unknown-backend skip, bad
entry shapes skipped without throwing.

`configValidator` `models:` block coverage: missing `backend`, bad
`requestTimeoutMs` (non-numeric, negative, `0`), typo'd field names
(`.unknown(false)` rejection), multi-logical-name acceptance.

Integration test (`integrationTests/server/ollama-backend.test.ts`):
exercises `OllamaBackend` end-to-end against a real local Ollama,
gated on reachability + presence of `OLLAMA_EMBED_MODEL` and
`OLLAMA_GENERATE_MODEL` in the local `/api/tags`. Skips silently when
unmet so CI without an Ollama provisioned passes. Validates that the
mocked wire format used in unit tests matches what Ollama actually
produces.

Tracking: #629, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Run prettier on the four ollama-related files (CI Format Check missed
  these locally before push).
- Defer the integration test's `OllamaBackend` import to a dynamic
  `await import(...)` inside `before()`. Statically importing from
  `components/ollama/` triggers a pre-existing CJS require cycle
  (`utility/common_utils.ts` ↔ `utility/logging/harper_logger.ts`) when
  loaded by `node --test`, which is fatal on Node 22+
  (`ERR_REQUIRE_CYCLE_MODULE`). Other integration tests don't hit it
  because they only import from `@harperfast/integration-testing` and
  spawn Harper as a subprocess.

Tracking: #629, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from claude-bot's inline PR review on #651:

- Integration `'AbortSignal cancels an in-flight stream'`: replace
  `ok(rejected || true)` (a tautology — asserts nothing) with a real
  termination check. Race the iterator drain against a 5 s deadline so
  the actual failure mode being guarded (hung stream after abort) fails
  the test instead of timing the suite out.
- Unit `'throws OllamaBackendError when a stream line exceeds the byte
  cap'`: `1 << 20 + 1` evaluated to `1 << 21` (2 MiB) due to operator
  precedence — `+` binds tighter than `<<`. Prettier's autofix
  parenthesized the wrong side (`1 << (20 + 1)`, same value).
  Re-parenthesize to `(1 << 20) + 1` (1 MiB + 1 byte) — exactly one
  byte past the cap, the comment now matches the allocation, and the
  test memory footprint halves.

Tracking: #629, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@heskew heskew force-pushed the feat/models-ollama branch from fa8ae69 to ddb8c19 Compare May 21, 2026 00:30
Copy link
Copy Markdown
Member Author

@heskew heskew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review of PR #651: feat(models): Phase 2 — ollama backend

This PR is an outstanding implementation of the ModelBackend interface. It is exceptionally clean, robust, and sets a perfect, highly-modular precedent for the upcoming model backend integrations (OpenAI, Anthropic, Bedrock, etc.).

Below is a detailed breakdown of the review findings, highlighting several outstanding defensive coding practices and verified implementation details.


Key Architectural Strengths

  1. Modular Component-Based File Layout:

    • Pivoting from the initial plan (resources/models/backends/ollama.ts) to a component-based layout (components/ollama/index.ts) is a brilliant architectural decision. It aligns perfectly with the pattern established in the MCP component (components/mcp/) and prevents the core codebase from bloating with provider-specific transport code.
  2. Lazy-Loaded Module Singleton Facade:

    • The way harper.models is exposed via getHarperExports in security/jsLoader.ts using a lazily-instantiated singleton Models facade is extremely elegant. This sidesteps complex scope-ref-wiring issues without introducing any global state concerns, as the backend registry and analytics writer are already process-singletons.
  3. Boot-Time Precedence:

    • Running bootstrapModels(config) in components/componentLoader.ts at the root-level config load ensures that models are registered before individual plugin modules are processed. This makes scope.models safely available from any app initialization hook (handleApplication(scope)).

Exceptional Defensive Coding Practices

This PR stands out for its high level of defensive engineering:

  • Poison-Resistant Aggregates (assignFiniteTokenCount):
    • Guarding the token count insertion by checking Number.isFinite, value >= 0, and Number.isInteger is a fantastic precaution. Since community-pulled models run on Ollama can return erratic evaluation values, this prevents poisoning of downstream database metrics (e.g. SUM(prompt_tokens)) that could crash analytics pipelines.
  • JSON Parse Error Sanitization:
    • Wrapping native JSON parse failures inside a standardized OllamaBackendError (rather than letting the raw SyntaxError bubble up) is an excellent security measure. Raw syntax error messages often echo the offending payload bytes, which could leak upstream or private user prompt content into server logs.
  • Stream Memory-Growth Protection (MAX_NDJSON_LINE_BYTES):
    • Enforcing a 1 MiB line-byte cap inside the NDJSON streaming parser (readNdjson) protects the server from runaway memory growth or heap exhaustion in case of a pathological or malicious stream output from a model.
  • Ambiguity-Free Timeout Boundaries:
    • Validating requestTimeoutMs: number.min(1) in the Joi schema (and rejecting 0) is highly pragmatic. Since 0 would otherwise validate but get treated loosely as "no timeout" in downstream signal composition, this prevents misleading caller intent.

Verification of Core Logic

  • NDJSON Parsing and Flushing:
    • Walked through the stream decoder loop inside readNdjson. The chunk buffering, index slicing, and the final flush call (buf += decoder.decode()) are mathematically complete and handle trailing chunks correctly without leaving unparsed bytes on the wire.
  • Nomic-Embed Prefix Application:
    • The conditional embedding prefix logic for Nomic-style models is very clean. It respects model-specific requirements while safely leaving non-prefix models untouched.
  • Capability Integrity:
    • Setting tools: false and adapters: false is a highly pragmatic and honest choice for V1 portability, given the uneven tool-use capabilities across Ollama's local model catalog.

Conclusion

The PR has been fully reviewed and is pristine and production-ready. It has superb test coverage (72 new/extended unit and integration tests) and serves as an exceptional blueprint for the rest of the Phase 2 backends.


🤖 Posted by Antigravity on Nathan's behalf

@heskew heskew merged commit e671827 into main May 21, 2026
37 checks passed
@heskew heskew deleted the feat/models-ollama branch May 21, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Models] Phase 2 — ollama backend

2 participants