🦙 Queue tha Llama

Queue tha Llama is a web-based chat application that integrates Large Language Model (LLM) capabilities with Bull Queue, Redis, and Chroma. It queues concurrent chat sessions, keeps client-server connections alive with heartbeat signals, uses retrieval-augmented generation (RAG) for chat memory, and cleans up inactive clients and their queued jobs.
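The heartbeat and inactive-client handling described above can be pictured with a small sketch. This is not the code in this repo: the class name `HeartbeatTracker`, its methods, and the timeout value are hypothetical, chosen only to illustrate the idea.

```javascript
// Hypothetical sketch: remembers the last heartbeat per client and reports stale ones.
class HeartbeatTracker {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // clientId -> timestamp (ms) of the last heartbeat
  }

  // Record a heartbeat; `now` is injectable so the logic can be tested without timers.
  beat(clientId, now = Date.now()) {
    this.lastSeen.set(clientId, now);
  }

  // Remove and return every client whose last heartbeat is older than the timeout.
  pruneInactive(now = Date.now()) {
    const removed = [];
    for (const [clientId, seenAt] of this.lastSeen) {
      if (now - seenAt > this.timeoutMs) {
        this.lastSeen.delete(clientId);
        removed.push(clientId);
      }
    }
    return removed;
  }
}
```

A server built this way would call `pruneInactive()` on an interval and then cancel any queued jobs that belong to the returned client IDs.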

Prerequisites

Install npm Dependencies

Run from the cloned repo directory:
```
npm ci
```

Setup Redis Docker Container

Pull the Docker image:

docker pull redis

Run Redis in Docker (replace "yourpassword" with your own password):

docker run -p 6379:6379 --name llm-redis -d redis redis-server --requirepass "yourpassword"
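For reference, here is a minimal sketch of how the container settings above could be turned into connection options on the Node.js side. The environment variable names `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` are assumptions made for illustration; check the repo's .env file for the names the app actually reads.

```javascript
// Hypothetical helper: builds Redis connection options from environment variables.
// The variable names are illustrative assumptions, not taken from this repo.
function buildRedisOptions(env) {
  return {
    host: env.REDIS_HOST || "localhost",
    port: Number(env.REDIS_PORT || 6379), // matches -p 6379:6379 above
    password: env.REDIS_PASSWORD,         // matches --requirepass above
  };
}

// Bull accepts options of this shape under the `redis` key, for example:
//   const Queue = require("bull");
//   const chatQueue = new Queue("chat", { redis: buildRedisOptions(process.env) });
// The queue name "chat" is also just a placeholder.
```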

Setup Chroma Vector Store Container

Pull the latest Docker image:
```
docker pull chromadb/chroma
```
Create storage for your Chroma Docker instance:
- create a directory somewhere on the server
  (example: c:\chromadb-storage\)
Start a Docker container using the storage directory from the previous step (replace the host path before the colon with your own directory):
```
docker run -d --name llm-chroma -p 8001:8000 -v C:\Git\llama\chromadb-storage:/chroma/chroma chromadb/chroma
```
Download Embedding Models

Skip this step if the .env variable ALLOW_REMOTE_MODELS is true (the default); the app then downloads the model automatically as needed.
- all-MiniLM-L6-v2
  - download all files from https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main and save them to the models/Xenova/all-MiniLM-L6-v2 folder
- paraphrase-multilingual-MiniLM-L12-v2 (optional)
  - download all files from https://huggingface.co/Xenova/paraphrase-multilingual-MiniLM-L12-v2/tree/main and save them to the models/Xenova/paraphrase-multilingual-MiniLM-L12-v2 folder
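When working offline, it can help to confirm that the model folders are where the app expects them. The sketch below assumes the `models/<org>/<name>` layout described above; the helper names `localModelDir` and `missingModels` are hypothetical and not part of this repo.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Map a Hugging Face model ID such as "Xenova/all-MiniLM-L6-v2"
// to its expected local folder under the models root.
function localModelDir(modelId, root = "models") {
  return path.join(root, ...modelId.split("/"));
}

// Return the model IDs whose local folder does not exist yet.
function missingModels(modelIds, root = "models") {
  return modelIds.filter((id) => !fs.existsSync(localModelDir(id, root)));
}

// Example: list which of the two models still need to be downloaded.
console.log(
  missingModels([
    "Xenova/all-MiniLM-L6-v2",
    "Xenova/paraphrase-multilingual-MiniLM-L12-v2",
  ])
);
```

This only checks that the folders exist; it does not verify that every model file inside them was downloaded.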
Download and Run an LLM via Llama.cpp

Skip this if using a cloud API (.env variable LLM_SERVER_API="cloud")
- Download the latest version of llama.cpp from https://github.com/ggerganov/llama.cpp or run the downloader PowerShell Script here:
  ./tools/download-latest-llama.ps1
- Download a model (GGUF format) and save it to your computer (note the path, as it is required when running your LLM server)
  ⇢ Recommended 7B models to try:
- Run the Llama.cpp server with continuous batching and parallel requests via command line or llmserver PowerShell Script:
  ./tools/llmserver.ps1
```
.\server.exe -m .\models\7b\mistral-7b-instruct-v0.2.Q4_K_M.gguf -c 2048 -cb -np 2
```
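The flags in the command above are `-m` (model path), `-c` (context size), `-cb` (continuous batching), and `-np` (number of parallel requests). The sketch below assembles the same arguments and estimates the context available to each parallel slot. It assumes that the server splits the total context evenly across slots, which is how llama.cpp's server has commonly behaved but may differ between versions. The function name `describeServerArgs` is hypothetical.

```javascript
// Hypothetical helper: builds the llama.cpp server arguments shown above
// and estimates the per-slot context under an even-split assumption.
function describeServerArgs({ modelPath, contextSize, parallel }) {
  const args = [
    "-m", modelPath,            // path to the GGUF model file
    "-c", String(contextSize),  // total context size in tokens
    "-cb",                      // continuous batching
    "-np", String(parallel),    // number of parallel requests (slots)
  ];
  // Assumption: total context is divided evenly among the parallel slots.
  const perSlotContext = Math.floor(contextSize / parallel);
  return { args, perSlotContext };
}

const { args, perSlotContext } = describeServerArgs({
  modelPath: ".\\models\\7b\\mistral-7b-instruct-v0.2.Q4_K_M.gguf",
  contextSize: 2048,
  parallel: 2,
});
console.log(args.join(" "));
console.log(`approximate context per slot: ${perSlotContext}`);
```

If the even-split assumption holds for your llama.cpp version, raising `-np` without raising `-c` shortens the conversation each client can hold.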
(OPTIONAL) Setup AWS Bedrock .env variables

If using AWS Bedrock for LLM inference, set the following .env variables:
- LLM_BEDROCK=true
- LLM_BEDROCK_REGION="us-west-2" # bedrock region
- LLM_BEDROCK_ACCESS_KEY_ID="AKIAxxxxxxxxxxxxxx" # bedrock access key
- LLM_BEDROCK_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # bedrock secret key
Additionally, the LLM_MODEL .env variable must be set to one of the supported models listed here:
https://github.com/jparkerweb/bedrock-wrapper?tab=readme-ov-file#supported-models
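A quick check like the following can catch a missing Bedrock setting before the server starts. It uses only the variable names listed above; the function name `missingBedrockVars` is hypothetical, and the check does not verify that LLM_MODEL is actually one of the supported models.

```javascript
// Hypothetical preflight check for the Bedrock-related .env variables listed above.
const BEDROCK_VARS = [
  "LLM_BEDROCK_REGION",
  "LLM_BEDROCK_ACCESS_KEY_ID",
  "LLM_BEDROCK_SECRET_ACCESS_KEY",
  "LLM_MODEL",
];

// Returns the names of required variables that are unset or blank.
// Returns an empty list when Bedrock is not enabled.
function missingBedrockVars(env) {
  if (env.LLM_BEDROCK !== "true") return [];
  return BEDROCK_VARS.filter((name) => !env[name] || env[name].trim() === "");
}

const missing = missingBedrockVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing Bedrock settings: ${missing.join(", ")}`);
}
```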
(OPTIONAL) Download and Run an Automatic Speech Recognition model via Whisper.cpp

Used for audio transcriptions (experimental).
- whisper.cpp

Run

Ensure the Redis and Chroma Docker Containers are started
Ensure the Llama.cpp server is running a loaded LLM
Verify all environment variables are set correctly in the .env file:
- UI to serve (message review concept / chatbot / audio transcriber)
- Redis and Chroma URLs/ports
- max concurrent requests (this is the parallel requests option set when you started your LLM server; default is 2)
- embedding model (default is english only: all-MiniLM-L6-v2)
- whisper server URL/port (optional; only needed if you want to transcribe audio)
Start the Express web server:
```
node server.js
```
Visit the web server page (the link is displayed when server.js starts)
default site ⇢ http://localhost:3001/
Optional Dashboards
- Redis Queue Dashboard
  default site ⇢ http://localhost:3001/admin/queues/
- Chroma Collections Dashboard
  default site ⇢ http://localhost:3001/list-collections/

Chromadb Admin

Optional Chromadb Admin UI for viewing your collections and running test searches with your embeddings:
⇢ chromadb-admin
⇢ GitHub repo
