Small Local RAG API (Node.js + SQLite + node-llama-cpp)

A lightweight, local-only Retrieval-Augmented Generation (RAG) API.

Stack

Express - HTTP API layer
better-sqlite3 + sqlite-vector - local storage and vector similarity search
node-llama-cpp - local embedding + generation models (no cloud dependency)

The app reads .txt files from documents/, splits them into chunks, generates embeddings, stores them in SQLite, retrieves the most similar chunks for a query, and uses a local LLM to produce the final answer.
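As a rough illustration of the chunking step, the sketch below splits a string on blank lines with /\n\n+/. The function name splitIntoChunks is hypothetical, and the trimming and empty-chunk filtering are assumptions added for the example; index.js may handle these differently.

```javascript
// Hypothetical helper (not taken from index.js): split raw text into
// paragraph-style chunks on one or more blank lines.
function splitIntoChunks(text) {
  return text
    .split(/\n\n+/) // a run of two or more newlines ends a chunk
    .map((chunk) => chunk.trim()) // assumption: strip surrounding whitespace
    .filter((chunk) => chunk.length > 0); // assumption: drop empty leftovers
}

const sample =
  "First paragraph.\n\nSecond paragraph,\nstill second.\n\n\n\nThird.";
const chunks = splitIntoChunks(sample);

console.log(chunks.length); // 3
console.log(chunks[1]); // "Second paragraph,\nstill second."
```

Note that this pattern only matches \n line endings. A file saved with Windows line endings (\r\n) would come back as a single chunk unless the text is normalized first.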

What this project does

Runs a local HTTP API on http://localhost:3000
Reads files from ./documents
Splits file content into chunks by blank lines (\n\n+)
Creates embeddings with model Qwen3-Embedding-0.6B
Stores embeddings in rag.db and searches them by cosine distance
Retrieves the top 8 chunks for a query (the count can be changed)
Generates the final answer with model Qwen3-0.6B
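To make the retrieval step concrete, here is a plain JavaScript sketch of cosine distance and top-k selection. In the project this work is done inside SQLite by sqlite-vector, so the functions cosineDistance and topK and the toy 3-value vectors below are illustrative assumptions, not the extension's API.

```javascript
// Cosine distance = 1 - cosine similarity.
// 0 means the vectors point in the same direction, 1 means they are orthogonal.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: rank stored rows by distance to the query vector
// and keep the k closest (8 mirrors the default mentioned above).
function topK(queryVector, rows, k = 8) {
  return rows
    .map((row) => ({
      ...row,
      distance: cosineDistance(queryVector, row.embedding),
    }))
    .sort((x, y) => x.distance - y.distance) // smallest distance first
    .slice(0, k);
}

// Toy data: real embeddings have 1024 values, these have 3.
const rows = [
  { text: "about taxes", embedding: [0, 0, 1] },
  { text: "about dogs", embedding: [0.9, 0.1, 0] },
  { text: "about cats", embedding: [1, 0, 0] },
];

const best = topK([1, 0, 0], rows, 2);
console.log(best.map((row) => row.text)); // [ 'about cats', 'about dogs' ]
```

The sketch does not guard against zero-length vectors, which would produce NaN distances.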

Local-only scope

This is intentionally a small local project (not production-ready):

no auth
no multi-user isolation
no background jobs / queueing
no cloud storage

Requirements

Node.js (v24.14.1)

Setup

Install dependencies:

npm install

The postinstall script automatically downloads the two GGUF models into ./models.

Make sure this folder exists and contains your text files:

documents/

Start the API:

npm start

Or with watch mode:

npm run start:watch

API

Base URL: http://localhost:3000

1) List available local documents

curl -X GET http://localhost:3000/documents

Example response:

{
  "totalFiles": 2,
  "files": [{ "name": "cv.txt" }, { "name": "projects.txt" }]
}

2) Embed one document into the vector DB

The file must exist inside ./documents.

curl -X POST http://localhost:3000/documents/embed \
  -H "Content-Type: application/json" \
  -d '{"fileName":"example.txt"}'

Example response:

{
  "message": "Embeddings created successfully",
  "chunksStored": 12
}

3) Search and generate an answer

curl -X POST http://localhost:3000/documents/search \
  -H "Content-Type: application/json" \
  -d '{"query":"What backend experience do you have?"}'

Example response:

{
  "generatedAnswer": "..."
}

4) Remove one embedded document from DB

This removes rows from SQLite, not the physical file in documents/.

curl -X DELETE http://localhost:3000/documents/cv.txt

Example response:

{
  "message": "File deleted successfully"
}
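The same four calls can also be made from Node.js. The sketch below wraps them in a small client built on the global fetch available in recent Node versions. createRagClient and its method names are hypothetical, and the error handling assumes the server signals failures with a non-2xx status code, which this README does not confirm.

```javascript
// Hypothetical client for the four endpoints documented above.
// fetchImpl is injectable so the client can be exercised without a server.
function createRagClient(baseUrl = "http://localhost:3000", fetchImpl = fetch) {
  async function request(method, path, body) {
    const response = await fetchImpl(`${baseUrl}${path}`, {
      method,
      headers: body ? { "Content-Type": "application/json" } : undefined,
      body: body ? JSON.stringify(body) : undefined,
    });
    // Assumption: errors are reported through the HTTP status code.
    if (!response.ok) {
      throw new Error(`${method} ${path} failed with status ${response.status}`);
    }
    return response.json();
  }

  return {
    listDocuments: () => request("GET", "/documents"),
    embedDocument: (fileName) =>
      request("POST", "/documents/embed", { fileName }),
    search: (query) => request("POST", "/documents/search", { query }),
    removeDocument: (fileName) =>
      request("DELETE", `/documents/${encodeURIComponent(fileName)}`),
  };
}

// Usage (requires the API to be running on localhost:3000):
//   const client = createRagClient();
//   client.embedDocument("example.txt")
//     .then(() => client.search("What backend experience do you have?"))
//     .then(({ generatedAnswer }) => console.log(generatedAnswer));
```

Because embedding an already embedded file returns an error, re-indexing a changed file with this client would mean calling removeDocument first and embedDocument afterwards.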

Useful scripts

npm start - run API
npm run start:watch - run API with Node watch mode
npm run models:pull - download models manually
npm run models:check - open a node-llama-cpp chat to check that the models work

Project files

index.js - API server + embedding/search flow
rag.db - local SQLite database (created at runtime)
models/ - downloaded GGUF models
documents/ - your local source files for indexing

Notes

The first request that needs a model can take longer because the model has to be loaded.
The .txt file is first read as plain text (parsed into one full string), then split into chunks by blank lines using /\n\n+/ (paragraph-style boundaries).
1024 is the number of values in each embedding vector (its dimensions), not the text length. f32 means each of those 1024 values is stored as a 32-bit float (float32) in SQLite via vector_as_f32(...).
Embedding is currently one-file-at-a-time via API.
If a file is already embedded, /documents/embed returns an error until you delete it from the DB first.
I'm using macOS on Apple Silicon, which is why @sqliteai/sqlite-vector-darwin-arm64 is loaded; it is installed automatically. For other platforms, check the sqlite-vector docs.
The prompt in index.js should be modified to fit your use case; currently it is a simple instruction plus the retrieved chunks. You can also add system instructions or few-shot examples as needed.
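For orientation, a prompt of the "instruction plus retrieved chunks" kind could be assembled roughly like this. buildPrompt, the instruction wording, and the numbered-chunk layout are all assumptions made for the example; the actual prompt in index.js may look different.

```javascript
// Hypothetical prompt builder (not copied from index.js):
// a short instruction, the retrieved chunks as numbered context, then the question.
function buildPrompt(question, chunks) {
  const context = chunks
    .map((chunk, index) => `[${index + 1}] ${chunk}`)
    .join("\n\n");

  return [
    "Answer the question using only the context below.",
    "If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const examplePrompt = buildPrompt("What backend experience do you have?", [
  "Built REST APIs with Express.",
  "Maintained a SQLite-backed service.",
]);
console.log(examplePrompt);
```

System instructions or few-shot examples would fit as extra entries at the top of the array, before the context.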
