RAG Service is a template service for retrieval-augmented generation (RAG), based on the LangChain examples. See: Build a Retrieval Augmented Generation (RAG) App
This service can be used with curl, but a separate project provides the user interface: L1Blom/ragui
- Clone L1Blom/rag to your project directory
- Move config.py_example to config.py and add your API Keys
- Choose an ID for your instance, like MyDocs
- Copy constants/constants.ini to constants_MyDocs.ini
- Change the contents of this file to reflect your situation
- Create a MyDocs/ directory in data/, and inside MyDocs/ create the directories vectorstore/ and html/
- Add the files for your RAG context, like text and PDF files, to your MyDocs/ directory
- Create a Python virtual environment for your project (optional)
- Modify the example services/rag.service_template file to point to the right place of your project directory
- Copy the service file to /etc/systemd/system/rag_MyDocs.service
- Enable and start the service
sudo systemctl enable rag_MyDocs
sudo systemctl start rag_MyDocs
sudo systemctl status rag_MyDocs
All calls support POST and GET. For <ID>, use your chosen ID, like MyDocs.
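Because every call accepts both POST and GET, the same parameters can travel either as a form-encoded body or as a query string. The following is a minimal sketch of that idea using only the Python standard library. The helper name `build_call` and the address `http://localhost:8888` are assumptions for illustration and are not part of RAG Service; the request is only built here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_call(base_url, instance_id, action="", params=None, method="POST"):
    """Build (but do not send) a request for /prompt/<ID>/<action>.

    Hypothetical helper, not part of RAG Service.
    """
    path = f"/prompt/{quote(instance_id)}"
    if action:
        path += f"/{quote(action)}"
    url = base_url.rstrip("/") + path
    encoded = urlencode(params or {})
    if method == "GET":
        # GET: parameters go into the query string
        if encoded:
            url += "?" + encoded
        return Request(url, method="GET")
    # POST: parameters go into a form-encoded body, like curl --data-urlencode
    return Request(url, data=encoded.encode("utf-8"), method="POST")


# Assumed address; replace with your own server and port.
request = build_call("http://localhost:8888", "MyDocs", "model", {"model": "gpt-4o"})
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen(request)` and read the response body.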
-
/prompt/<ID>/
Parameter: prompt (string)
Your prompt to be sent
-
/prompt/<ID>/full
Parameter: prompt (string)
Your prompt to be sent
Returns all document fragments used for this prompt
-
/prompt/<ID>/search
Parameter: prompt (string)
Your prompt to be sent
Similarity search in the local documents; returns fragments and scores
-
/prompt/<ID>/model
Parameter: model (string)
Your model to be used, like "gpt-4o"
Checks for valid models with the OpenAI client.models.list() call. Can result in an HTTP 500 error (non-fatal)
-
/prompt/<ID>/temp
Parameter: temp (string, will be cast to float)
Temperature setting, between 0.0 and 2.0
Settings above 1.0 can cause significant hallucinations and degrade performance too.
A timeout can result in an HTTP 408 error (non-fatal)
-
/prompt/<ID>/reload
Parameters: none
Rebuilds the vector store from local files.
Current behavior:
- Clears existing vector data first.
- Reloads all configured document sources.
- Re-initializes the RAG chain after rebuild.
For X posts, reload uses local snapshots (data/<tweet_id>/post.json + post.txt) and does not need a refetch.
-
/prompt/<ID>/clear
Parameters: none
Clears the cache (the in-memory history)
-
/prompt/<ID>/cache
Parameters: none
Prints the cache contents to the response object
-
/prompt/<ID>/modelnames
Parameters: none
Prints the names of the possible models used in the selected APIs
-
/prompt/<ID>/params
Parameters: section (string), param (string)
Prints the settings from the .ini file
-
/prompt/<ID>/image
Parameters: prompt (string), image (URL to image)
Uploads the image to OpenAI and uses the prompt to get the desired content, like: 'What is the mood of the persons?'
Note: this only works if the model is set to 'gpt-4o'. Other models result in an HTTP 500 error (non-fatal)
-
/prompt/<ID>/upload
Parameters: file (string) (maximum size 16 MB)
Uploads the file to the directory DATA_DIR, but only if its extension is listed in DATA_GLOB_*. If not, this results in an HTTP 500 error (non-fatal)
-
/prompt/<ID>/uploadx
Parameters: url (string) - X (Twitter) post URL
Fetches an X post via the X API v2, stores a local snapshot, and vectorizes text for RAG context.
Supported URL formats:
Requires X API Bearer Token to be set in environment variable:
- X_API_KEY or TWITTER_BEARER_TOKEN
Get your API key from: https://developer.x.com/
Local storage layout per post:
- data/<tweet_id>/post.json
- data/<tweet_id>/post.txt
- data/<tweet_id>/images/
- data/<tweet_id>/videos/
- data/<tweet_id>/audio/
Indexing scope:
- Text content is indexed.
- Video and audio are downloaded for later use and are not indexed/transformed yet.
-
/prompt/<ID>/uploadx/batch
Parameters: file (JSON array or text file, one URL per line)
Batch version of uploadx.
Behavior:
- Validates and normalizes URLs.
- Processes each valid URL sequentially.
- Stores URLs in x.json and writes local per-post snapshots.
- Returns a summary of successful and failed URLs.
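The batch behavior described above (accept a JSON array or one URL per line, process each URL in sequence, report successes and failures) can be sketched as follows. This is an illustration under stated assumptions, not the service's actual implementation: the function names `parse_batch_urls` and `run_batch`, the summary keys, and the choice to drop blank lines and duplicates are all hypothetical. Validation against the supported X URL formats is deliberately left out.

```python
import json


def parse_batch_urls(text):
    """Return URL strings from a JSON array or from one-URL-per-line text."""
    stripped = text.strip()
    if stripped.startswith("["):
        # JSON array form: ["https://...", "https://..."]
        items = json.loads(stripped)
        if not all(isinstance(item, str) for item in items):
            raise ValueError("JSON batch must be an array of strings")
    else:
        # Plain text form: one URL per line
        items = stripped.splitlines()
    urls = []
    for item in items:
        url = item.strip()
        # Assumption: blank entries and duplicates are skipped
        if url and url not in urls:
            urls.append(url)
    return urls


def run_batch(urls, process):
    """Call process(url) for each URL in order; collect successes and failures."""
    summary = {"succeeded": [], "failed": []}
    for url in urls:
        try:
            process(url)
            summary["succeeded"].append(url)
        except Exception as exc:
            summary["failed"].append({"url": url, "error": str(exc)})
    return summary
```

Here `process` stands in for whatever fetches and stores a single post, so one failing URL does not stop the rest of the batch.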
# change the model
curl -X POST --data-urlencode "model=gpt-4o" http://<your server>:<your port>/prompt/<ID>/model
Model set to: gpt-4o
# prompt to your data
curl -X POST --data-urlencode "prompt=your question?" http://<your server>:<your port>/prompt/<ID>
Your answer based on the context files provided in data/<ID>
Important constants are:
# simple string like "myDocs"
ID = _unittest
# Directory that will be scanned for files to be added to the context
DATA_DIR=data/_unittest
# All the file extensions you want to be part of the context, see LangChain documentation
# Currently text and PDF are supported by RAG Service
DATA_GLOB_TXT = *.txt
DATA_GLOB_PDF = *.pdf
# Persistence directory for vectorstore
PERSISTENCE = data/_unittest/vectorstore
# Where the HTML files reside, also needed for the unit tests
HTML = data/_unittest/html
To run the unit tests, start the program in the project directory with the ID '_unittest'. This starts a local RAG service on port 8888 (see constants__unittest.py for all defaults). Once it is running, the unit tests can be executed. Currently the tests run smoothly when USE_LLM is set to OPENAI. Other settings, like GROQ, might fail because of too many calls per minute, depending on the licences you have. If so, try running the unit tests one by one. See below for all possible API calls and parameters:
<your virtual environment>/bin/python ragservice.py _unittest
INFO:root:Working directory is /home/leen/projects/rag
INFO:httpx:HTTP Request: GET https://api.openai.com/v1/models "HTTP/1.1 200 OK"
INFO:root:path -> /prompt/_unittest prompt
INFO:root:path -> /prompt/_unittest/full prompt
INFO:root:path -> /prompt/_unittest/search prompt,similar
INFO:root:path -> /prompt/_unittest/documents id
INFO:root:path -> /prompt/_unittest/params section,param
INFO:root:path -> /prompt/_unittest/globals
INFO:root:path -> /prompt/_unittest/modelnames
INFO:root:path -> /prompt/_unittest/embeddingnames
INFO:root:path -> /prompt/_unittest/model model
INFO:root:path -> /prompt/_unittest/embeddings embedding
INFO:root:path -> /prompt/_unittest/chunk chunk_size,chunk_overlap
INFO:root:path -> /prompt/_unittest/temp temp
INFO:root:path -> /prompt/_unittest/reload
INFO:root:path -> /prompt/_unittest/clear
INFO:root:path -> /prompt/_unittest/cache
INFO:root:path -> /prompt/_unittest/file file
INFO:root:path -> /prompt/_unittest/context file,action
INFO:root:path -> /prompt/_unittest/image image,prompt
INFO:root:path -> /prompt/_unittest/upload
INFO:root:path -> /prompt/_unittest/uploadx url
INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
INFO:root:Loaded 8 chunks from persistent vectorstore
INFO:root:Chain initialized: gpt-4o
* Serving Flask app 'ragservice'
* Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:8888
* Running on http://192.168.2.200:8888
INFO:werkzeug:Press CTRL+C to quit
Now you are able to run the unit tests:
<your virtual environment>/bin/python ragservice_unittest.py -v
Testing OPENAI
test_cache (__main__.RagServiceMethods.test_cache)
Test to print the contents of the cache ... User:content='who wrote rag service?' User:content='who wrote rag service?'
AI:content='RAG Service was developed by Leen Blom.' AI:content='RAG Service was developed by L1Blom.'
ok
test_clear (__main__.RagServiceMethods.test_clear)
Test to clear the cache ... ok
test_image (__main__.RagServiceMethods.test_image)
Test image ... ok
test_model (__main__.RagServiceMethods.test_model)
Test model setting, correct or incorrect model according to LLM ... ok
test_prompt (__main__.RagServiceMethods.test_prompt)
Test prompt ... ok
test_reload (__main__.RagServiceMethods.test_reload)
Test reload of the data ... ok
test_temperature (__main__.RagServiceMethods.test_temperature)
Test to set temparature too high, low, within boundaries 0.0 and 2.0 ... ok
----------------------------------------------------------------------
Ran 7 tests in 17.713s
OK
- None at the moment
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
The image used in the unittest is licensed CC BY-NC-ND 4.0 and was found at Trusted Reviews