Author: Krish Nangia
A .NET 8 console application implementing a RAG pipeline for document-based question answering.
It ingests plain-text documents, generates embeddings for semantic search, and queries an LLM to produce answers grounded strictly in the provided documents.
This application is designed to demonstrate a simple RAG workflow:
- Document Ingestion – Read and chunk text documents.
- Vector Embedding – Generate embeddings for each chunk using Gemini.
- Vector Store – Store chunk embeddings with metadata for retrieval.
- Query Handling – Accept user questions via the console.
- Similarity Search – Retrieve the most relevant chunks using cosine similarity.
- LLM Answer Generation – Send retrieved context + user question to LLM to produce a grounded answer.
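Wired together, the flow might look like the following. This is a hypothetical sketch (the repository's `Program.cs` is not shown in this README): `VectorStoreIO`, `IngestionScan`, `Ingestor`, `ConsoleLoop`, and `Retrieval` are illustrative helpers sketched step by step below, and the `slice`, `embed`, and `complete` delegates are placeholders standing in for `SemanticSlicer`, `GeminiEmbeddingGenerator`, and `LLMService`.

```csharp
// Hypothetical top-level Program.cs wiring; every helper named here is an
// illustrative sketch (see the step-by-step snippets below), not the
// repository's actual API.
Func<string, IReadOnlyList<string>> slice = text => new[] { text };              // stand-in for SemanticSlicer
Func<string, Task<float[]>> embed = _ => Task.FromResult(new float[768]);        // stand-in for GeminiEmbeddingGenerator
Func<string, Task<string>> complete = p => Task.FromResult("(grounded answer)"); // stand-in for LLMService

var store = VectorStoreIO.LoadOrCreate(GlobalSettings.VECTOR_STORE_LOCATION);

foreach (var file in IngestionScan.FindNewFiles(GlobalSettings.DOCUMENTS_FOLDER_PATH, store))
    await Ingestor.IngestFileAsync(file, slice, embed, store);
VectorStoreIO.Save(GlobalSettings.VECTOR_STORE_LOCATION, store);

await ConsoleLoop.RunAsync(q => Retrieval.AnswerAsync(q, store, embed, complete));
```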
- Loads `FileVectorStore` from the configured `VECTOR_STORE_LOCATION`.
- If the folder does not exist, initializes a new store.
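The README does not document `FileVectorStore` itself, so the following is only a minimal sketch of the load-or-initialize behavior, assuming the store persists chunks as a single JSON file inside the configured folder. The `ChunkRecord` shape mirrors the metadata listed later (`SourceFile`, `ChunkIndex`).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// One stored chunk: text, embedding, and the metadata fields described
// below (SourceFile, ChunkIndex). This record is an assumption for illustration.
public record ChunkRecord(string SourceFile, int ChunkIndex, string Text, float[] Embedding);

public static class VectorStoreIO
{
    // Hypothetical persistence: a single JSON file inside the store folder.
    // The real FileVectorStore layout is not documented in this README.
    public static List<ChunkRecord> LoadOrCreate(string storeFolder)
    {
        Directory.CreateDirectory(storeFolder); // initializes a new store if missing
        var path = Path.Combine(storeFolder, "chunks.json");
        if (!File.Exists(path))
            return new List<ChunkRecord>();
        return JsonSerializer.Deserialize<List<ChunkRecord>>(File.ReadAllText(path))
               ?? new List<ChunkRecord>();
    }

    public static void Save(string storeFolder, List<ChunkRecord> chunks)
        => File.WriteAllText(Path.Combine(storeFolder, "chunks.json"),
                             JsonSerializer.Serialize(chunks));
}
```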
- Scans `DOCUMENTS_FOLDER_PATH` for `.txt` files.
- Checks whether files were previously ingested by comparing against existing vector store chunks.
- Already ingested files are skipped to avoid duplication.
- Currently, ingesting all documents in the `Stories` directory from scratch takes around 6 minutes.
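A sketch of the skip-if-ingested check, reusing the hypothetical `ChunkRecord` from above; only the filename-based comparison itself comes from this README.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class IngestionScan
{
    // Returns the .txt files under docsFolder whose filenames do not already
    // appear in the store's SourceFile metadata, so previously ingested
    // files are skipped (ChunkRecord is the hypothetical record above).
    public static List<string> FindNewFiles(string docsFolder, IEnumerable<ChunkRecord> existing)
    {
        var ingested = existing.Select(c => c.SourceFile)
                               .ToHashSet(StringComparer.OrdinalIgnoreCase);

        return Directory.EnumerateFiles(docsFolder, "*.txt")
                        .Where(p => !ingested.Contains(Path.GetFileName(p)))
                        .ToList();
    }
}
```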
- Reads each file’s content.
- Uses `SemanticSlicer` to split the document into smaller chunks (e.g., 50–100 words per chunk).
- For each chunk, calls `GeminiEmbeddingGenerator` to produce a vector embedding.
- Stores metadata for each chunk: `SourceFile` (the original filename) and `ChunkIndex` (the position of the chunk within the file).
- Adds each chunk and its embedding to `FileVectorStore`.
- Implements a delay (`delayMs`) between API calls to avoid rate limiting.
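The per-file ingestion loop might look like the sketch below. The `slice` and `embed` delegates stand in for `SemanticSlicer` and `GeminiEmbeddingGenerator`, whose actual signatures this README does not show; the `delayMs` throttle is the one mentioned above, with an assumed default.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class Ingestor
{
    // Chunks one file, embeds each chunk, and appends the results to the
    // store. 'slice' and 'embed' are placeholders for SemanticSlicer and
    // GeminiEmbeddingGenerator, whose real APIs are not shown in the README.
    public static async Task IngestFileAsync(
        string path,
        Func<string, IReadOnlyList<string>> slice,
        Func<string, Task<float[]>> embed,
        List<ChunkRecord> store,
        int delayMs = 500) // throttle between API calls to avoid rate limiting
    {
        var text = await File.ReadAllTextAsync(path);
        var pieces = slice(text);                 // e.g., 50–100 words per chunk
        var name = Path.GetFileName(path);        // becomes SourceFile metadata

        for (int i = 0; i < pieces.Count; i++)
        {
            var vector = await embed(pieces[i]);  // one embedding call per chunk
            store.Add(new ChunkRecord(name, i, pieces[i], vector));
            await Task.Delay(delayMs);            // respect provider rate limits
        }
    }
}
```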
- Waits for console input from the user.
- Accepts natural-language questions.
- Type `exit` to quit.
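A minimal sketch of this console loop with safe exit handling, assuming the retrieval-and-answer pipeline is exposed as a single `answer` callback (sketched after the next list):

```csharp
using System;
using System.Threading.Tasks;

public static class ConsoleLoop
{
    // Reads questions until the user types 'exit'; 'answer' is a placeholder
    // for the retrieval + LLM pipeline sketched below.
    public static async Task RunAsync(Func<string, Task<string>> answer)
    {
        while (true)
        {
            Console.Write("Ask a question (or type 'exit'): ");
            var input = Console.ReadLine();
            if (input is null || input.Trim() == "exit")
                break;                            // safe exit
            if (string.IsNullOrWhiteSpace(input))
                continue;                         // ignore empty input

            Console.WriteLine(await answer(input));
        }
    }
}
```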
- Computes cosine similarity between the query embedding and the stored chunk embeddings.
- Retrieves the top N most relevant chunks (default: 5).
- Combines the retrieved chunks into a single context string.
- Optionally includes metadata such as `SourceFile` and `ChunkIndex` for debugging.
- Sends the context and question as a prompt to the LLM (`LLMService`).
- Receives an answer grounded strictly in the provided context.
- Prints the answer to the console.
- Loops back to accept more queries until the user exits.
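The cosine-similarity formula is the standard dot(a, b) / (|a|·|b|); everything else below, including the prompt wording and the `embed`/`complete` delegates standing in for `GeminiEmbeddingGenerator` and `LLMService`, is an illustrative sketch rather than the repository's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public static class Retrieval
{
    // Standard cosine similarity: dot(a, b) / (|a| * |b|).
    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-12);
    }

    // Ranks stored chunks against the question embedding, concatenates the
    // top N into a context string (tagged with SourceFile/ChunkIndex for
    // debugging), and asks the LLM for a grounded answer.
    public static async Task<string> AnswerAsync(
        string question,
        IReadOnlyList<ChunkRecord> store,
        Func<string, Task<float[]>> embed,     // stand-in for GeminiEmbeddingGenerator
        Func<string, Task<string>> complete,   // stand-in for LLMService
        int topN = 5)
    {
        var q = await embed(question);
        var top = store.OrderByDescending(c => Cosine(q, c.Embedding)).Take(topN);

        var context = new StringBuilder();
        foreach (var c in top)
            context.AppendLine($"[{c.SourceFile}#{c.ChunkIndex}] {c.Text}");

        var prompt = $"Answer strictly from the context below.\n\n" +
                     $"Context:\n{context}\nQuestion: {question}";
        return await complete(prompt);
    }
}
```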
- Incremental document ingestion.
- Chunk-based vector embeddings for precise semantic retrieval.
- Top-N similarity search for relevant context.
- LLM-based answer generation grounded in retrieved chunks.
- Simple console interface with safe exit handling.
- Ingestion detection is based on filenames alone: if a file is modified but its name is unchanged, the system will not re-ingest it.
- The vector store preserves all previous embeddings; old chunks remain even if files are updated.
- Works only with plain text documents; PDFs or Word documents require preprocessing into `.txt`.
- Delays (`delayMs`) are used to avoid hitting rate limits.
- Metadata such as `SourceFile` and `ChunkIndex` is optional but useful for debugging and context reference.
```bash
git clone https://github.com/knangia04/DOTNET-RAG-Document-QA.git
cd DOTNET-RAG-Document-QA
dotnet restore
dotnet build
```

- Place your documents in the `Documents` folder and set `GlobalSettings.DOCUMENTS_FOLDER_PATH` in `GlobalSettings.cs`.
- Set `GlobalSettings.VECTOR_STORE_LOCATION` for storing chunk embeddings.
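For reference, `GlobalSettings.cs` might look like this minimal sketch; only the two setting names come from this README, and the paths are placeholder examples.

```csharp
// Hypothetical shape of GlobalSettings.cs; only the two setting names come
// from the README, and these paths are placeholder examples.
public static class GlobalSettings
{
    public const string DOCUMENTS_FOLDER_PATH = @"C:\rag\Documents";
    public const string VECTOR_STORE_LOCATION = @"C:\rag\VectorStore";
}
```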
Run the application:
```bash
dotnet run
```

- The program will ingest new documents and generate embeddings.
- After ingestion, you can enter questions interactively.
- Type `exit` to quit the program safely.