feat(models): Phase 2 — ollama backend by heskew · Pull Request #651 · HarperFast/harper

heskew · 2026-05-20T21:47:27Z

Summary

Phase 2 of #510 — the first real ModelBackend implementation, talking
to Ollama's HTTP API. Validates the Phase 1 (#628 / PR #638) interface
against a non-trivial real provider without external CI dependencies.

Closes #629
Tracking: #510

Note

Stacked on feat/models-foundation (PR #638). Merge target moves to
main once Phase 1 lands.

What ships

components/ollama/index.ts — OllamaBackend implements ModelBackend
(embed, generate, generateStream) over native fetch, plus
registerOllamaBackend(...) for the boot bridge to call.
resources/models/bootstrap.ts — bootstrapModels(rootConfig) reads
models.{embedding,generative}.<name> entries and dispatches through
a { ollama: registerOllamaBackend } factory map. Unknown backends
log at error level and skip.
validation/configValidator.ts — Joi schema for the top-level
models: block. .unknown(false) on the entry schema so typos block
boot instead of silently dropping into the registry as no-ops.
components/componentLoader.ts — bootstrapModels(config) runs at
root-config load time, before per-component iteration, so
scope.models.embed(...) works from handleApplication(scope).
security/jsLoader.ts — adds models to getHarperExports so user
code can call harper.models.embed(...) from anywhere a component
runs.
static/defaultConfig.yaml — intentionally unchanged. Backends
are opt-in via block presence (matches replication: and the mcp:
block from PR [MCP] foundation: component scaffold + config schema + boot gating #649).

File-layout pivot from the issue body

Issue #629 proposed resources/models/backends/ollama.ts (in-core
directory). This PR ships the backend under components/ollama/ to
match the convention Kyle established in #649 (components/mcp/) —
core imports a register function from the component path. This sets
the same precedent for #630–#633 (openai, gateway, anthropic, bedrock).

API decisions

tools: false, adapters: false. Ollama tool-call support
exists on some models but is uneven across the catalog; v1
portability stays honest by leaving the surface off.
/api/embed (not legacy /api/embeddings). Ollama deprecated
the legacy endpoint; the new one accepts array input.
/api/generate for string prompts, /api/chat for messages.
system is prepended as the first message (Ollama chat has no
top-level system field).
AbortSignal propagation. opts.signal passes through to fetch;
when requestTimeoutMs is configured, the two are combined via
AbortSignal.any so the upstream HTTP closes on either caller abort
or timeout.
inputType: 'document' | 'query' is honored for embedding models
that use prefixes (nomic-embed-text v1.5+ takes search_document:
/ search_query: ). Non-prefix models are untouched.
Token counts dropped if not finite, non-negative integers. Ollama
is admin-trusted but the model it's running is community-pulled;
a bad eval-count poisoning SUM(prompt_tokens) over hdb_model_calls
is worth guarding against.

harper.models exposure (Phase 1 deferred decision A)

Picks harper.models via getHarperExports, the canonical Harper
export shape (harper.Resource, harper.tables, etc.). Same path
the openai backend (#630), the /v1/* gateway (#631), and the
@embed directive (#632) will plug into.

A module-singleton Models instance is used in jsLoader.ts rather
than reaching into Phase 1's per-Scope instance. The Models facade
has no per-Scope state (registry is module-scope, analytics writer is
a process-singleton), so the two are equivalent in behavior. The
singleton sidesteps wiring Scope references through
getHarperExports (which only sees ApplicationScope) without
touching Phase 1's existing per-Scope new Models() in
components/Scope.ts while #638 is still in review.

YAML→registry boot bridge (Phase 1 deferred decision B)

models: is a top-level config block. bootstrapModels(rootConfig)
runs in componentLoader.ts once getConfigObj() returns the merged
root config, before per-component iteration. Dispatch via a hardcoded
factory map ({ ollama: registerOllamaBackend }) — explicit, easy to
read, idempotent. Joi defaults under models.* don't propagate to
env.get (same finding Kyle hit on mcp.* in #649), so the bootstrap
reads getConfigObj() directly rather than the flat env map.

The factory map will grow with #630 (openai), #633 (anthropic /
bedrock). A directive registry / self-registering shape is plausible
later; the explicit map is fine for v1.

Pre-PR verification cadence

Per the verifier cadence agreed for this series:

deep-review single-pass (three parallel agents — api,
concurrency, config). Surfaced 11 findings; 7 applied as
blockers in this PR, 1 deferred as Tier 5 (AbortSignal.any
accumulation under sustained load — speculative, defer until
profiling shows growth), 1 deferred forward-looking for [Models] Phase 3 — openai backend (+ toolMode: 'return') #630
(secrets-in-plaintext design).
Multi-framed external-cross-model pass attempted but the CLI hung
in the local environment; deferring to the GitHub workflow's
cross-model review pass.

Files

Path	Change
`components/ollama/index.ts`	new — `OllamaBackend` + `registerOllamaBackend`
`resources/models/bootstrap.ts`	new — `bootstrapModels(rootConfig)`
`validation/configValidator.ts`	+ `models:` Joi schema (with `.unknown(false)` typo guard)
`components/componentLoader.ts`	+ `bootstrapModels(config)` call at root-config load
`security/jsLoader.ts`	+ `models` in `getHarperExports`
`unitTests/components/ollama/index.test.js`	new — 35 unit tests with mocked `fetch`
`unitTests/resources/models/bootstrap.test.js`	new — 8 unit tests for factory dispatch
`unitTests/validation/configValidator.test.js`	+ 10 tests for the `models:` block
`integrationTests/server/ollama-backend.test.ts`	new — gated on local Ollama reachability

Acceptance criteria

Test plan

npm run build clean
npm run lint:required clean
72 new + extended unit tests pass locally (Node 24)
Phase 1 model tests still pass (60 — no regression)
CI green across full matrix
Manual smoke against a local Ollama with nomic-embed-text +
llama3.2 pulled — OLLAMA_HOST / OLLAMA_EMBED_MODEL /
OLLAMA_GENERATE_MODEL envs honored

Out of scope

Tool calls (tools: false per the issue body)
LoRA adapter selection (adapters: false — FAB-503 territory via vLLM)
Streaming cancellation beyond AbortSignal propagation (no explicit
cancel endpoint on Ollama)
Custom HTTP transport (mTLS, proxies) — default fetch for v1
CI provisioning of an Ollama container — to be decided in PR review

🤖 Generated with Claude Code

claude · 2026-05-20T21:52:04Z

Reviewed; no blockers found.

Two findings from claude-bot's inline PR review on #651: - Integration `'AbortSignal cancels an in-flight stream'`: replace `ok(rejected || true)` (a tautology — asserts nothing) with a real termination check. Race the iterator drain against a 5 s deadline so the actual failure mode being guarded (hung stream after abort) fails the test instead of timing the suite out. - Unit `'throws OllamaBackendError when a stream line exceeds the byte cap'`: `1 << 20 + 1` evaluated to `1 << 21` (2 MiB) due to operator precedence — `+` binds tighter than `<<`. Prettier's autofix parenthesized the wrong side (`1 << (20 + 1)`, same value). Re-parenthesize to `(1 << 20) + 1` (1 MiB + 1 byte) — exactly one byte past the cap, the comment now matches the allocation, and the test memory footprint halves. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kriszyp

Looks great!

First real `ModelBackend` against the Ollama HTTP API: `embed` via `/api/embed`, `generate` via `/api/generate` or `/api/chat`, and `generateStream` via NDJSON over chunked HTTP. Backend lives under `components/<name>/` matching the pattern from the MCP foundation (PR #649) — core imports `registerOllamaBackend` and calls it during boot; the file is not a `handleApplication(scope)` self-loader. Capabilities advertise `tools: false` and `adapters: false`. Ollama tool support exists on some models but is uneven across the catalog; the v1 portability guarantee keeps it off here. Validates the Phase 1 (#628 / PR #638) `ModelBackend` interface against a non-trivial real provider without external dependencies in CI. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a `models:` block to `harperdb-config.yaml` (presence-gated, matches the `replication:` and `mcp:` conventions), validates it via Joi with `.unknown(false)` so field typos block boot instead of silently skipping, and dispatches per-entry registration through a factory map in `resources/models/bootstrap.ts`. Boot site: `components/componentLoader.ts` calls `bootstrapModels(config)` once the root config is loaded and before per-component iteration, so `scope.models.embed(...)` works from `handleApplication(scope)` as well as from Resource methods. The factory map (`{ ollama: registerOllamaBackend }`) is hardcoded for v1. Unknown backends are logged at error level (not warn) and skipped — silently registering nothing on an opt-in feature is a footgun. Schema validation catches field-name typos before the factory runs. `requestTimeoutMs: 0` is rejected by the schema (`min(1)`): omit the field for "no timeout" so the meaning is unambiguous at the YAML layer. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds `models` to `getHarperExports` so user code can call `harper.models.embed(...)` / `.generate(...)` / `.generateStream(...)` from anywhere a component runs — `handleApplication(scope)` init code, Resource methods, internal jobs. Uses a module-singleton `new Models()` rather than reaching into the per-`Scope` instance Phase 1 wires in `components/Scope.ts`. The `Models` facade has no per-Scope state — the backend registry is module-scope and the analytics writer is a process-singleton — so the two are equivalent in behavior. The singleton sidesteps wiring `Scope` references through `getHarperExports` (which only sees `ApplicationScope`) without touching Phase 1's existing wiring while #638 is still in review. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Unit tests (mocked `fetch`): - `OllamaBackend` capability shape, host normalization, wire format for `/api/embed`, `/api/generate`, `/api/chat`, NDJSON streaming including split-line and oversize-line handling. - AbortSignal propagation: caller-only, composed via `AbortSignal.any` with a per-call timeout, and abort-while-pending. - Robust response handling: vector-count mismatch, non-finite vector values, non-finite / non-integer / negative token counts, non-string `content` fields, non-JSON response bodies — all surface as `OllamaBackendError`. - NDJSON error messages are static so upstream-derived content cannot leak through the thrown error. `bootstrap.ts` factory dispatch: ollama embedding + generative registration under arbitrary logical names, unknown-backend skip, bad entry shapes skipped without throwing. `configValidator` `models:` block coverage: missing `backend`, bad `requestTimeoutMs` (non-numeric, negative, `0`), typo'd field names (`.unknown(false)` rejection), multi-logical-name acceptance. Integration test (`integrationTests/server/ollama-backend.test.ts`): exercises `OllamaBackend` end-to-end against a real local Ollama, gated on reachability + presence of `OLLAMA_EMBED_MODEL` and `OLLAMA_GENERATE_MODEL` in the local `/api/tags`. Skips silently when unmet so CI without an Ollama provisioned passes. Validates that the mocked wire format used in unit tests matches what Ollama actually produces. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Run prettier on the four ollama-related files (CI Format Check missed these locally before push). - Defer the integration test's `OllamaBackend` import to a dynamic `await import(...)` inside `before()`. Statically importing from `components/ollama/` triggers a pre-existing CJS require cycle (`utility/common_utils.ts` ↔ `utility/logging/harper_logger.ts`) when loaded by `node --test`, which is fatal on Node 22+ (`ERR_REQUIRE_CYCLE_MODULE`). Other integration tests don't hit it because they only import from `@harperfast/integration-testing` and spawn Harper as a subprocess. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two findings from claude-bot's inline PR review on #651: - Integration `'AbortSignal cancels an in-flight stream'`: replace `ok(rejected || true)` (a tautology — asserts nothing) with a real termination check. Race the iterator drain against a 5 s deadline so the actual failure mode being guarded (hung stream after abort) fails the test instead of timing the suite out. - Unit `'throws OllamaBackendError when a stream line exceeds the byte cap'`: `1 << 20 + 1` evaluated to `1 << 21` (2 MiB) due to operator precedence — `+` binds tighter than `<<`. Prettier's autofix parenthesized the wrong side (`1 << (20 + 1)`, same value). Re-parenthesize to `(1 << 20) + 1` (1 MiB + 1 byte) — exactly one byte past the cap, the comment now matches the allocation, and the test memory footprint halves. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

heskew

Deep Review of PR #651: `feat(models): Phase 2 — ollama backend`

This PR is an outstanding implementation of the ModelBackend interface. It is exceptionally clean, robust, and sets a perfect, highly-modular precedent for the upcoming model backend integrations (OpenAI, Anthropic, Bedrock, etc.).

Below is a detailed breakdown of the review findings, highlighting several outstanding defensive coding practices and verified implementation details.

Key Architectural Strengths

Modular Component-Based File Layout:
- Pivoting from the initial plan (resources/models/backends/ollama.ts) to a component-based layout (components/ollama/index.ts) is a brilliant architectural decision. It aligns perfectly with the pattern established in the MCP component (components/mcp/) and prevents the core codebase from bloating with provider-specific transport code.
Lazy-Loaded Module Singleton Facade:
- The way harper.models is exposed via getHarperExports in security/jsLoader.ts using a lazily-instantiated singleton Models facade is extremely elegant. This sidesteps complex scope-ref-wiring issues without introducing any global state concerns, as the backend registry and analytics writer are already process-singletons.
Boot-Time Precedence:
- Running bootstrapModels(config) in components/componentLoader.ts at the root-level config load ensures that models are registered before individual plugin modules are processed. This makes scope.models safely available from any app initialization hook (handleApplication(scope)).

Exceptional Defensive Coding Practices

This PR stands out for its high level of defensive engineering:

Poison-Resistant Aggregates (assignFiniteTokenCount):
- Guarding the token count insertion by checking Number.isFinite, value >= 0, and Number.isInteger is a fantastic precaution. Since community-pulled models run on Ollama can return erratic evaluation values, this prevents poisoning of downstream database metrics (e.g. SUM(prompt_tokens)) that could crash analytics pipelines.
JSON Parse Error Sanitization:
- Wrapping native JSON parse failures inside a standardized OllamaBackendError (rather than letting the raw SyntaxError bubble up) is an excellent security measure. Raw syntax error messages often echo the offending payload bytes, which could leak upstream or private user prompt content into server logs.
Stream Memory-Growth Protection (MAX_NDJSON_LINE_BYTES):
- Enforcing a 1 MiB line-byte cap inside the NDJSON streaming parser (readNdjson) protects the server from runaway memory growth or heap exhaustion in case of a pathological or malicious stream output from a model.
Ambiguity-Free Timeout Boundaries:
- Validating requestTimeoutMs: number.min(1) in the Joi schema (and rejecting 0) is highly pragmatic. Since 0 would otherwise validate but get treated loosely as "no timeout" in downstream signal composition, this prevents misleading caller intent.

Verification of Core Logic

NDJSON Parsing and Flushing:
- Walked through the stream decoder loop inside readNdjson. The chunk buffering, index slicing, and the final flush call (buf += decoder.decode()) are mathematically complete and handle trailing chunks correctly without leaving unparsed bytes on the wire.
Nomic-Embed Prefix Application:
- The conditional embedding prefix logic for Nomic-style models is very clean. It respects model-specific requirements while safely leaving non-prefix models untouched.
Capability Integrity:
- Setting tools: false and adapters: false is a highly pragmatic and honest choice for V1 portability, given the uneven tool-use capabilities across Ollama's local model catalog.

Conclusion

The PR has been fully reviewed and is pristine and production-ready. It has superb test coverage (72 new/extended unit and integration tests) and serves as an exceptional blueprint for the rest of the Phase 2 backends.

🤖 Posted by Antigravity on Nathan's behalf

heskew requested a review from a team as a code owner May 20, 2026 21:47

claude Bot reviewed May 20, 2026

View reviewed changes

Comment thread integrationTests/server/ollama-backend.test.ts Outdated

claude Bot reviewed May 20, 2026

View reviewed changes

Comment thread unitTests/components/ollama/index.test.js Outdated

claude Bot reviewed May 20, 2026

View reviewed changes

Comment thread integrationTests/server/ollama-backend.test.ts Outdated

kriszyp approved these changes May 20, 2026

View reviewed changes

Base automatically changed from feat/models-foundation to main May 21, 2026 00:27

heskew and others added 6 commits May 20, 2026 17:29

heskew force-pushed the feat/models-ollama branch from fa8ae69 to ddb8c19 Compare May 21, 2026 00:30

heskew commented May 21, 2026

View reviewed changes

heskew merged commit e671827 into main May 21, 2026
37 checks passed

heskew deleted the feat/models-ollama branch May 21, 2026 16:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(models): Phase 2 — ollama backend#651

feat(models): Phase 2 — ollama backend#651
heskew merged 6 commits into
mainfrom
feat/models-ollama

heskew commented May 20, 2026

Uh oh!

Uh oh!

Uh oh!

claude Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

kriszyp left a comment

Uh oh!

heskew left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

heskew commented May 20, 2026

Summary

What ships

File-layout pivot from the issue body

API decisions

harper.models exposure (Phase 1 deferred decision A)

YAML→registry boot bridge (Phase 1 deferred decision B)

Pre-PR verification cadence

Files

Acceptance criteria

Test plan

Out of scope

Uh oh!

Uh oh!

Uh oh!

claude Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

kriszyp left a comment

Choose a reason for hiding this comment

Uh oh!

heskew left a comment

Choose a reason for hiding this comment

Deep Review of PR #651: feat(models): Phase 2 — ollama backend

Key Architectural Strengths

Exceptional Defensive Coding Practices

Verification of Core Logic

Conclusion

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented May 20, 2026 •

edited

Loading

Deep Review of PR #651: `feat(models): Phase 2 — ollama backend`