jparkerweb/queue-tha-llama

🦙 Queue tha Llama

Queue tha Llama is a web-based chat application that combines a Large Language Model (LLM) with Bull Queue, Redis, and Chroma. It queues concurrent chat sessions, keeps client and server in sync with heartbeat signals, uses retrieval-augmented generation (RAG) for chat memory, and cleans up the jobs of inactive clients so that the chat stays responsive.
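
To illustrate the heartbeat idea described above, here is a minimal sketch of how inactive clients could be detected. It is not taken from the project's source: the class name `ClientTracker`, its methods, and the 30-second default timeout are all assumptions made for illustration.

```javascript
// Hypothetical sketch: names and the default timeout are assumptions,
// not the application's actual implementation.
class ClientTracker {
  constructor(timeoutMs = 30_000) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // clientId -> time (ms) of the last heartbeat
  }

  // Record a heartbeat received from a client.
  heartbeat(clientId, now = Date.now()) {
    this.lastSeen.set(clientId, now);
  }

  // Forget every client whose last heartbeat is older than the timeout
  // and return their ids so the caller can remove their queued jobs.
  sweep(now = Date.now()) {
    const expired = [];
    for (const [clientId, seenAt] of this.lastSeen) {
      if (now - seenAt > this.timeoutMs) {
        expired.push(clientId);
        this.lastSeen.delete(clientId);
      }
    }
    return expired;
  }
}

// Usage with explicit timestamps so the result is deterministic.
const demo = new ClientTracker(1000);
demo.heartbeat("client-a", 0);
demo.heartbeat("client-b", 900);
console.log(demo.sweep(1500)); // [ 'client-a' ]
```

In a real server, `sweep` would presumably run on a timer, and the returned ids would be used to cancel or discard the matching queue jobs.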


Prerequisites

  • Install npm Dependencies

    Run from the cloned repo directory:

    npm ci
    
  • Set Up Redis Docker Container

    Pull the Docker image:

    docker pull redis
    

    Run Redis in Docker (replace "yourpassword" with your own password):

    docker run -p 6379:6379 --name llm-redis -d redis redis-server --requirepass "yourpassword" 
    
  • Set Up Chroma Vector Store Container

    Pull the latest Docker image:

    docker pull chromadb/chroma
    

    Create storage for your Chroma Docker instance:

    • Create a directory somewhere on the server
      (example: c:\chromadb-storage\)

    Start a Docker container that mounts the storage directory from the previous step (replace the host path below with your own):

    docker run -d --name llm-chroma -p 8001:8000 -v C:\Git\llama\chromadb-storage:/chroma/chroma chromadb/chroma
    
  • Download Embedding Models

    Skip this step if the .env variable ALLOW_REMOTE_MODELS is true (the default); the app then downloads the model automatically as needed.

  • Download and Run an LLM via Llama.cpp

    Skip this if using a cloud API (.env variable LLM_SERVER_API="cloud")

  • (OPTIONAL) Set Up AWS Bedrock .env Variables

    If using AWS Bedrock for LLM inference, set the following .env variables:

    • LLM_BEDROCK=true
    • LLM_BEDROCK_REGION="us-west-2" # bedrock region
    • LLM_BEDROCK_ACCESS_KEY_ID="AKIAxxxxxxxxxxxxxx" # bedrock access key
    • LLM_BEDROCK_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # bedrock secret key

    Additionally, the LLM_MODEL .env variable must be set to one of the supported models listed here:
    https://github.com/jparkerweb/bedrock-wrapper?tab=readme-ov-file#supported-models

  • (OPTIONAL) Download and Run an Automatic Speech Recognition model via Whisper.cpp

    used for audio transcriptions (experimental)
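
For the optional AWS Bedrock setup above, a small startup check can catch a missing variable before the first request fails. The sketch below uses the variable names from the list above. The function name `missingBedrockVars` is hypothetical, and comparing `LLM_BEDROCK` to the string "true" is an assumption about how the app reads that flag.

```javascript
// Variable names come from the Bedrock list above; the function itself
// is a hypothetical helper, not part of the project.
const REQUIRED_BEDROCK_VARS = [
  "LLM_BEDROCK_REGION",
  "LLM_BEDROCK_ACCESS_KEY_ID",
  "LLM_BEDROCK_SECRET_ACCESS_KEY",
  "LLM_MODEL",
];

// Return the names of required variables that are unset or blank.
// If Bedrock is not enabled, nothing is required.
function missingBedrockVars(env = process.env) {
  if (env.LLM_BEDROCK !== "true") return []; // assumption: flag is the string "true"
  return REQUIRED_BEDROCK_VARS.filter(
    (name) => (env[name] ?? "").trim() === ""
  );
}

// Usage: warn at startup if Bedrock is enabled but incompletely configured.
const missing = missingBedrockVars(process.env);
if (missing.length > 0) {
  console.error(`Missing Bedrock settings: ${missing.join(", ")}`);
}
```

The check only confirms that values are present. It does not verify that LLM_MODEL is one of the supported models or that the credentials are valid.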


Run

  • Ensure the Redis and Chroma Docker Containers are started

  • Ensure the Llama.cpp server is running a loaded LLM

  • Verify all environment variables are set correctly in the .env file:

    • UI to serve (message review concept / chatbot / audio transcriber)
    • Redis and Chroma URLs/ports
    • max concurrent requests (this is the parallel requests option you set when starting your LLM server; default is 2)
    • embedding model (default is English only: all-MiniLM-L6-v2)
    • Whisper server URL/port (optional; only needed if you want to transcribe audio)
  • Start the Express Web server via

    node server.js
    
  • Visit web server page (link displayed on server.js startup)
    default site ⇢ http://localhost:3001/

  • Optional Dashboards
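
The "max concurrent requests" setting in the checklist above caps how many chat requests reach the LLM server at once, and the rest wait in the queue. The application uses Bull Queue for this. The sketch below does not use Bull and only illustrates the idea in plain JavaScript. The class name `RequestGate` and its methods are hypothetical.

```javascript
// Hypothetical illustration of a concurrency cap; not the project's code.
class RequestGate {
  constructor(maxConcurrent = 2) {
    this.maxConcurrent = maxConcurrent; // should match the LLM server's parallel slots
    this.active = 0;                    // requests currently being processed
    this.waiting = [];                  // start callbacks waiting for a free slot
  }

  // Start the request now if a slot is free, otherwise queue it.
  submit(start) {
    if (this.active < this.maxConcurrent) {
      this.active++;
      start();
    } else {
      this.waiting.push(start);
    }
  }

  // Call when a request finishes: hand the slot to the next waiting
  // request, or free the slot if nothing is waiting.
  done() {
    const next = this.waiting.shift();
    if (next) {
      next();
    } else if (this.active > 0) {
      this.active--;
    }
  }
}

// Usage: with two slots, the third request starts only after one finishes.
const gate = new RequestGate(2);
const started = [];
for (const id of ["a", "b", "c"]) gate.submit(() => started.push(id));
console.log(started); // [ 'a', 'b' ]
gate.done();
console.log(started); // [ 'a', 'b', 'c' ]
```

If the cap is set higher than the number of parallel slots the LLM server was started with, the extra requests would presumably queue inside the LLM server instead of in the application.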


Chromadb Admin

Optional Chromadb Admin UI for viewing your collections and running test searches with your embeddings:
⇢ chromadb-admin
⇢ GitHub repo
