A Streamlit-based fact-checking application that uses RAG (Retrieval-Augmented Generation) to verify claims against a curated database of Snopes fact-check articles. The system retrieves relevant context using FAISS vector search and generates structured verdicts with probability scores via GPT-4o.
- Data Ingestion: Fact-check articles are loaded from a curated JSON dataset sourced from Snopes (technology category).
- Chunking & Embedding: Articles are split using `RecursiveCharacterTextSplitter` and embedded with OpenAI embeddings.
- Vector Store: Chunks are indexed in a FAISS vector store for fast similarity search.
- Retrieval: When a user submits a claim, the top-3 most relevant chunks are retrieved.
- LLM Verdict: GPT-4o evaluates the claim against the retrieved context and returns a structured output with probabilities for `True`, `False`, and `Unproven`, along with a detailed rationale.
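
The chunking and retrieval steps above can be illustrated with a small, dependency-free sketch. It is not the code in `vector_stores.py`: a fixed-size character window stands in for `RecursiveCharacterTextSplitter`, a bag-of-words vector stands in for OpenAI embeddings, and a brute-force cosine ranking stands in for the FAISS index. The function names (`split_text`, `embed`, `retrieve`) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into overlapping character windows (simplified splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (assumption, not OpenAI embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(claim: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the claim, best first."""
    query = embed(claim)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    sample_chunks = [
        "5G towers do not spread viruses",
        "The moon landing was filmed on location",
        "Phone chargers and battery myths",
        "Radio waves from 5G cannot carry COVID-19",
    ]
    for chunk in retrieve("5G towers spread COVID-19", sample_chunks):
        print(chunk)
```

In the real pipeline the ranking is done by the vector store over dense embeddings, so results will differ from this token-overlap approximation; the sketch only shows the shape of the "embed, rank, keep top 3" flow.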
| Component | Technology |
|---|---|
| Frontend | Streamlit |
| LLM | GPT-4o (via LangChain) |
| Embeddings | OpenAI Embeddings |
| Vector Store | FAISS |
| Output Parsing | Pydantic + LangChain |
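
As a rough illustration of what the output-parsing layer has to guarantee, the sketch below validates a JSON verdict using only the standard library. It is a stand-in for the Pydantic + LangChain parser, not the project's actual schema: the `Verdict` class, its field names, and the rule that the three probabilities sum to 1 are all assumptions made for the example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical verdict schema; the real field names may differ."""
    true: float
    false: float
    unproven: float
    rationale: str

    def __post_init__(self) -> None:
        probs = (self.true, self.false, self.unproven)
        if any(not 0.0 <= p <= 1.0 for p in probs):
            raise ValueError("each probability must lie in [0, 1]")
        # Assumption: the three outcomes are treated as a distribution.
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("probabilities must sum to 1")


def parse_verdict(raw: str) -> Verdict:
    """Parse a JSON string from the model into a validated Verdict."""
    data = json.loads(raw)
    return Verdict(
        true=float(data["true"]),
        false=float(data["false"]),
        unproven=float(data["unproven"]),
        rationale=str(data["rationale"]),
    )


if __name__ == "__main__":
    reply = '{"true": 0.05, "false": 0.9, "unproven": 0.05, "rationale": "No evidence."}'
    print(parse_verdict(reply))
```

Rejecting malformed replies at this boundary keeps the UI code free of defensive checks; a Pydantic model would express the same constraints declaratively.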
- Python 3.10+
- OpenAI API key
# Clone the repository
git clone https://github.com/<your-username>/truth-seeker-fact-checker.git
cd truth-seeker-fact-checker
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
# Edit .env and add your OpenAI API key
streamlit run app.py
.
├── app.py # Streamlit application (main entry point)
├── vector_stores.py # Data loading, chunking, FAISS indexing, and retrieval
├── read.py # Utility script for data exploration / preprocessing
├── technology_fact_checks.json # Curated fact-check dataset (Snopes — technology)
├── requirements.txt # Python dependencies
├── .env.example # Environment variable template
└── BUDT758O_Final Presentation.pdf # Project presentation
- Launch the app with `streamlit run app.py`
- Enter any claim in the text area (e.g., "5G towers spread COVID-19")
- Click Check Claim
- View the verdict (True / False / Unproven), confidence scores, probability distribution chart, rationale, and source links
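
One plausible way to turn the three probabilities into the displayed verdict and confidence scores is to pick the highest-probability label. The README does not state how `app.py` makes this choice, so the argmax rule and the helper names below (`top_verdict`, `format_scores`) are assumptions for illustration only.

```python
def top_verdict(probabilities: dict[str, float]) -> tuple[str, float]:
    """Return the label with the highest probability and that probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


def format_scores(probabilities: dict[str, float]) -> list[str]:
    """Render each label as a percentage line, highest probability first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [f"{label}: {prob:.0%}" for label, prob in ranked]


if __name__ == "__main__":
    scores = {"True": 0.05, "False": 0.9, "Unproven": 0.05}
    label, confidence = top_verdict(scores)
    print(f"Verdict: {label} ({confidence:.0%})")
    for line in format_scores(scores):
        print(line)
```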
BUDT758O — Designing Generative AI Systems
M.S. Information Systems, University of Maryland
This project was developed for academic purposes. The fact-check data is sourced from Snopes.
