Skip to content

added llama-server inference support#4

Open
pvasilek wants to merge 2 commits intomraza007:mainfrom
pvasilek:main
Open

added llama-server inference support#4
pvasilek wants to merge 2 commits intomraza007:mainfrom
pvasilek:main

Conversation

@pvasilek
Copy link
Copy Markdown

There is support for llama-server inferenced embedding models, like
"llama-server.exe" --model %MODEL_PATH% --alias "text-embedder" --embeddings --jinja --port 11435

@mraza007
Copy link
Copy Markdown
Owner

  1. On src/memory/embeddings/llama.py (return lines in embed() and search()):
    resp.json()[0]["embedding"][0] returns a single float, not the full embedding vector.
    This class advertises list[float], so this should likely return the full embedding array (same for both methods).
  2. On src/memory/embeddings/llama_nomic.py (return lines in embed() and search()):
    Same issue here: resp.json()[0]["embedding"][0] returns one dimension only.
    Please return the full vector, otherwise downstream vector search/dimension logic will break.
  3. On src/memory/embeddings/base.py (new abstract search()):
    Making search() abstract is a breaking interface change for existing embedding providers/test doubles that only implement embed().
    This currently breaks fixture instantiation (FakeEmbeddingProvider) in tests.
    Suggestion: add a default search() implementation that delegates to embed() for backward compatibility.
  4. On tests/test_search.py (assertions around embedding calls):
    Tests still assert embed() is called, but implementation now calls search().
    Please update these assertions (or keep compatibility via default search=embed) so tests reflect the new contract.

@pvasilek
Copy link
Copy Markdown
Author

It is not a single float as llama-server return array of arrays. Check the log below.

srv  log_server_r: done request: POST /embeddings 127.0.0.1 200
srv  log_server_r: request:  {"model":"text-embedder","content":"Switched to JWT auth Replaced session cookies with JWT Needed stateless auth for API All endpoints now require Bearer token auth jwt"}
srv  log_server_r: response: [{"index":0,"embedding":[[-0.0443197600543499,...,-0.05202531814575195]]}]

As I see 'search = embed' exists in the old classes. It does not work as expected?

@mraza007
Copy link
Copy Markdown
Owner

Thanks, this helps. You’re right that your llama-server response shape is nested (embedding: [[...]]), so indexing [0] can be expected there


Please add tests for the new search() contract before merge.

I’m still seeing two regressions on current PR head:

  1. EmbeddingProvider.search() is abstract now, which breaks existing providers/test doubles that only implement embed() (fixture error in tests/conftest.py).
  2. tests/test_search.py still expects embed() calls, but implementation now calls search().

Can you update/add tests to cover this explicitly?

@pvasilek
Copy link
Copy Markdown
Author

I am not a python programmer and do not have pytest here. So, for me that is quite complex task, sorry.

@mraza007
Copy link
Copy Markdown
Owner

Sounds good
I'll take care of the tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants