Clinical RAG — Retrieval-Augmented Generation for Medical Texts

This repository implements a Retrieval-Augmented Generation (RAG) pipeline tailored for the clinical / medical domain.
It combines a retriever (for fetching relevant passages from a medical knowledge base) with a large language model (LLM) to generate grounded, context-aware answers to clinical queries.

Overview

Retriever: indexes clinical knowledge (e.g., PubMed abstracts, patient notes, or structured medical texts).
Augmented Generation: retrieved chunks are passed into the LLM to enhance factual accuracy and reduce hallucinations.
Goal: demonstrate how RAG can be applied in the healthcare space to support clinical decision-making, question answering, and knowledge exploration.
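
The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the bag-of-words `embed` function stands in for a real embedding model, the brute-force cosine ranking stands in for a vector index, and `retrieve`, `build_prompt`, the prompt wording, and the example passages are all hypothetical. The LLM call itself is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved):
    """Place the retrieved chunks in front of the question (hypothetical prompt format)."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Illustrative passages only; not taken from the dataset.
chunks = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "Chest pain with ST elevation suggests myocardial infarction.",
    "Amoxicillin is commonly used for bacterial infections.",
]
query = "What is a first-line treatment for type 2 diabetes?"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
```

The resulting `prompt` string is what would be sent to the LLM, so the model answers from the retrieved passage rather than from memory alone.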

This is described in detail in the blog post:
Clinical Retrieval-Augmented Generation

Short Breakdown

I used the MIMIC-IV-Ext Direct dataset and split the data into chunks. I chose the chunk size and overlap based on the article "Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex". Given the medical orientation of the data, I decided to use BioBERT to generate domain-specific embeddings. To store and retrieve these vectors I used FAISS, a library for efficient similarity search and clustering of dense vectors. To demonstrate the RAG app, I coupled it with Google Gemini. This repository contains the notebook for indexing and preprocessing the medical data, as well as the demo web app.
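
Chunking with overlap, as mentioned above, can be sketched as follows. This is a minimal sketch under stated assumptions: it splits on whitespace-separated words rather than model tokens, `chunk_text` is a hypothetical helper, and the `chunk_size` and `overlap` defaults are placeholders, not the settings used in the notebook.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into word-based chunks where consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Small values so the overlap is easy to see.
example = chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1)
# example == ["a b c d", "d e f g", "g h i j"]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.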

Folders and files

faiss_vectorstore_models_text-embedding-004
mimic-iv-ext-direct-1.0.0
.gitignore
LICENSE
Medical_Q_A_RAG.ipynb
README.md
main.py
requirements.txt
