added llama-server inference support by pvasilek · Pull Request #4 · mraza007/echovault

pvasilek · 2026-02-23T10:33:40Z

This adds support for embedding models served by llama-server, for example one started with:
"llama-server.exe" --model %MODEL_PATH% --alias "text-embedder" --embeddings --jinja --port 11435

mraza007 · 2026-02-24T03:52:01Z

On src/memory/embeddings/llama.py (return lines in embed() and search()):
resp.json()[0]["embedding"][0] returns a single float, not the full embedding vector.
This class advertises list[float], so this should likely return the full embedding array (same for both methods).
On src/memory/embeddings/llama_nomic.py (return lines in embed() and search()):
Same issue here: resp.json()[0]["embedding"][0] returns one dimension only.
Please return the full vector, otherwise downstream vector search/dimension logic will break.
On src/memory/embeddings/base.py (new abstract search()):
Making search() abstract is a breaking interface change for existing embedding providers/test doubles that only implement embed().
This currently breaks fixture instantiation (FakeEmbeddingProvider) in tests.
Suggestion: add a default search() implementation that delegates to embed() for backward compatibility.
On tests/test_search.py (assertions around embedding calls):
Tests still assert embed() is called, but implementation now calls search().
Please update these assertions (or keep compatibility via default search=embed) so tests reflect the new contract.
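
A backward-compatible default along the lines suggested above could look roughly like the sketch below. The method signatures are assumptions rather than copies of src/memory/embeddings/base.py, and the FakeEmbeddingProvider shown here is a hypothetical stand-in, not the fixture from the test suite.

```python
from abc import ABC, abstractmethod


class EmbeddingProvider(ABC):
    """Sketch of a base class; signatures are assumed, not taken from the repo."""

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Return the embedding vector for a document."""

    def search(self, text: str) -> list[float]:
        """Embed a query. Delegates to embed() unless a provider overrides it."""
        return self.embed(text)


class FakeEmbeddingProvider(EmbeddingProvider):
    """Hypothetical test double that only implements embed()."""

    def embed(self, text: str) -> list[float]:
        return [float(len(text)), 0.0, 1.0]


# Instantiation succeeds because search() is concrete, not abstract.
provider = FakeEmbeddingProvider()
query_vector = provider.search("jwt auth")
```

With this shape, providers that need a distinct query path can still override search(), while older providers and test doubles keep working unchanged.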

pvasilek · 2026-02-24T07:41:17Z

It is not a single float, because llama-server returns an array of arrays. Check the log below.

srv  log_server_r: done request: POST /embeddings 127.0.0.1 200
srv  log_server_r: request:  {"model":"text-embedder","content":"Switched to JWT auth Replaced session cookies with JWT Needed stateless auth for API All endpoints now require Bearer token auth jwt"}
srv  log_server_r: response: [{"index":0,"embedding":[[-0.0443197600543499,...,-0.05202531814575195]]}]
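
The response shape in the log can be unpacked as in the following sketch. The helper name extract_embedding is hypothetical, and the flat shape ("embedding": [...]) is an assumption included only to show a tolerant parser. The sample payload keeps just the first and last values from the log and leaves the elided middle out.

```python
import json


def extract_embedding(payload: list[dict]) -> list[float]:
    """Return the full vector from an /embeddings response body.

    Handles the nested shape seen in the log ("embedding": [[...]]) and,
    as an assumption, a flat shape ("embedding": [...]).
    """
    embedding = payload[0]["embedding"]
    if embedding and isinstance(embedding[0], list):
        embedding = embedding[0]  # unwrap the outer array
    return [float(x) for x in embedding]


# Shortened stand-ins for the logged response (middle values omitted).
nested = json.loads(
    '[{"index": 0, "embedding": [[-0.0443197600543499, -0.05202531814575195]]}]'
)
flat = json.loads(
    '[{"index": 0, "embedding": [-0.0443197600543499, -0.05202531814575195]}]'
)

nested_vector = extract_embedding(nested)
flat_vector = extract_embedding(flat)
```

Both calls yield the same list of floats, so the provider returns a list[float] regardless of which shape the server sends.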

As far as I can see, 'search = embed' already exists in the old classes. Does it not work as expected?

mraza007 · 2026-02-24T10:25:07Z

Thanks, this helps. You’re right that your llama-server response shape is nested (embedding: [[...]]), so indexing [0] is expected there.

Please add tests for the new search() contract before merge.

I’m still seeing two regressions on current PR head:

EmbeddingProvider.search() is abstract now, which breaks existing providers/test doubles that only implement embed() (fixture error in tests/conftest.py).
tests/test_search.py still expects embed() calls, but implementation now calls search().

Can you update/add tests to cover this explicitly?
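
A test for the new contract might be sketched as below. The run_search function is a hypothetical stand-in for the real search code path, which is not shown in this thread. The mock-based assertions are one possible approach, not the project's existing test style.

```python
from unittest.mock import MagicMock


def run_search(provider, query: str) -> list[float]:
    """Hypothetical stand-in for the code under test: queries go through search()."""
    return provider.search(query)


def test_query_uses_search_not_embed():
    provider = MagicMock()
    provider.search.return_value = [0.1, 0.2, 0.3]

    result = run_search(provider, "jwt auth")

    # The query path should call search() exactly once and never embed().
    provider.search.assert_called_once_with("jwt auth")
    provider.embed.assert_not_called()
    assert result == [0.1, 0.2, 0.3]


test_query_uses_search_not_embed()
```

Under pytest the final call is unnecessary, since the test function is collected by name.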

pvasilek · 2026-02-24T12:22:50Z

I am not a Python programmer and do not have pytest here, so for me that is quite a complex task, sorry.

mraza007 · 2026-02-24T22:57:21Z

Sounds good
I'll take care of the tests

pvasilek added 2 commits February 23, 2026 12:28

added llama-server inference support

cbebf7e

added llama-server nomic-ai models inference support

1ac8778

pvasilek wants to merge 2 commits into mraza007:main from pvasilek:main
