EDGAR-Searcher

A fully local RAG (Retrieval-Augmented Generation) application for exploring public company filings from the SEC EDGAR system. Pick a ticker, pull down the filings, embed them into a local vector database, and chat with an LLM that answers questions grounded in those filings — all on your own machine.

Overview

EDGAR-Searcher pulls 10-K and 10-Q filings from the SEC's EDGAR API, chunks and embeds the text into a Chroma vector database, and exposes a Streamlit UI for querying the filings through a local LLM served by Ollama. Nothing leaves your machine — no cloud APIs, no hosted models.
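
As a rough illustration of the retrieval step, the sketch below queries the SEC's public submissions endpoint on data.sec.gov and keeps only 10-K and 10-Q rows within a date range. It is not the code in backend/edgar_client.py. The function names, the USER_AGENT placeholder, and the use of a numeric CIK rather than a ticker are assumptions made for the example.

```python
import json
import urllib.request
from datetime import date

# Placeholder contact string (assumption): replace with your own details.
USER_AGENT = "EDGAR-Searcher example (your-email@example.com)"
WANTED_FORMS = {"10-K", "10-Q"}


def fetch_recent_filings(cik: int) -> dict:
    """Download the 'recent' filings table for one company from data.sec.gov."""
    url = f"https://data.sec.gov/submissions/CIK{cik:010d}.json"
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["filings"]["recent"]


def filter_filings(recent: dict, start: date, end: date) -> list[dict]:
    """Keep only 10-K and 10-Q rows whose filing date falls in [start, end]."""
    # The 'recent' table is column-oriented: parallel lists, one entry per filing.
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [
        {"form": form, "filed": filed, "accession": accession}
        for form, filed, accession in rows
        if form in WANTED_FORMS and start <= date.fromisoformat(filed) <= end
    ]
```

A call such as filter_filings(fetch_recent_filings(320193), date(2023, 1, 1), date(2023, 12, 31)) would return the matching rows for that CIK. Resolving a ticker to a CIK is left out of the sketch.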

Features

SEC filing retrieval — Fetches 10-K and 10-Q filings for any public company by ticker and date range via the EDGAR API (backend/edgar_client.py).
Document chunking — Splits filing HTML into meaningful chunks with item-level metadata (backend/document_chunker.py).
Local embeddings — Uses Ollama's mxbai-embed-large model to embed chunks into a Chroma vector store (backend/embedding_client.py).
Hybrid retrieval with reranking — Retrieves candidates from Chroma, then reranks them with a cross-encoder (the default) or BM25 keyword scoring for better context selection (backend/reranker.py).
Local LLM chat — Streams answers from a locally-running Ollama model with source citations back to the original filing (backend/llm_client.py).
Streamlit frontend — A simple multi-page UI for loading filings, inspecting retrieved chunks, and chatting with the filings.
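
To make the item-level chunking idea concrete, here is a minimal sketch that splits already-extracted filing text at "Item N." headings and tags each chunk with its item number. It is only an illustration, not the logic in backend/document_chunker.py. The regular expression, the chunk_by_item name, and the max_chars limit are all assumptions.

```python
import re

# Matches headings such as "Item 1." or "ITEM 1A:" at the start of a line.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[a-c]?)\s*[.:]", re.IGNORECASE | re.MULTILINE
)


def chunk_by_item(text: str, max_chars: int = 1000) -> list[dict]:
    """Split text at Item headings, then into pieces of at most max_chars."""
    matches = list(ITEM_HEADING.finditer(text))
    chunks = []
    for index, match in enumerate(matches):
        # A section runs from its heading to the next heading (or end of text).
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        section = text[match.start():end].strip()
        item = match.group(1).upper()
        for offset in range(0, len(section), max_chars):
            chunks.append({"item": item, "text": section[offset:offset + max_chars]})
    # Text that appears before the first heading is ignored in this sketch.
    return chunks
```

Real filings repeat item headings in the table of contents, so a production chunker would need to deduplicate or skip those entries. This sketch does not.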

Tech Stack

Python (3.9+)
Streamlit — frontend
Chroma — vector database
Ollama — local embedding and LLM runtime
requests / BeautifulSoup / lxml — EDGAR fetching and HTML parsing

Running Locally

Prerequisites

Python 3.9+
Ollama installed and running locally — https://ollama.com
Chroma running locally as a server (see "Start a local Chroma server" under Quick Start)

Quick Start

Clone and install dependencies

```
git clone <your-repo-url>
cd EDGAR-Searcher
pip install -r requirements.txt
```

Pull the required Ollama models
```
ollama pull mxbai-embed-large
ollama pull gemma3:270m
```
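These are only the defaults, chosen because they are small enough to run on modest local hardware. Any other model you have pulled with Ollama can be used instead.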
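
Before starting the app, it can help to confirm that both models are actually installed. The sketch below asks Ollama's /api/tags endpoint which models are present and reports any that are missing. It assumes Ollama is listening on its default port, 11434. The helper names are made up for illustration and are not part of this repo.

```python
import json
import urllib.request

REQUIRED_MODELS = ["mxbai-embed-large", "gemma3:270m"]


def normalize(name: str) -> str:
    """Ollama treats a model name without a tag as ':latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_response: dict, required: list[str]) -> list[str]:
    """Return the required models that do not appear in an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_response.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def check_ollama(base_url: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama server and list any required models not yet pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return missing_models(json.load(response), REQUIRED_MODELS)
```

If check_ollama() returns an empty list, both models are available. Otherwise, run ollama pull for each name it returns.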

Start a local Chroma server

```
pip install chromadb
chroma run --host localhost --port 8000
```

Run the Streamlit app
```
streamlit run frontend/app.py
```
The app will be available at http://localhost:8501.

Run with Docker Compose

A docker-compose.yml is provided that spins up the Streamlit frontend, Ollama, and Chroma together:

```
docker compose up --build
```

Then open http://localhost:8501. On first run, you still need to docker exec into the Ollama container and run ollama pull for the embedding and chat models.
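
If the page does not load, a quick way to see which service is not up yet is to test whether each port accepts TCP connections. The sketch below is an illustration only. Ports 8501 and 8000 are the ones used in this README and 11434 is Ollama's default. Whether docker-compose.yml publishes all three to the host is an assumption.

```python
import socket

# Assumed host ports: Streamlit and Chroma as used in this README, Ollama's default.
SERVICES = {"streamlit": 8501, "chroma": 8000, "ollama": 11434}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(host: str = "localhost", services: dict = SERVICES) -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: is_listening(host, port) for name, port in services.items()}
```

Calling service_status() returns a dictionary of booleans keyed by service name. An open port only shows that the container is accepting connections, not that the required models have been pulled.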

Notes

The app only fetches 10-K and 10-Q filings. Other form types are filtered out.
The default chat model is gemma3:270m, a very small model chosen so the app runs on modest hardware. Swap to gemma3:4b or another pulled model from the Chat page dropdown for higher-quality answers at the cost of more RAM and slower responses.
The SEC EDGAR API requires a User-Agent header. The current one lives in backend/edgar_client.py — update it to your own contact info before heavy use.
Reranker selection is controlled by the RERANKER_MODE env var (crossencoder | bm25 | off, default crossencoder) and overridable per session from the Chat page. The cross-encoder downloads mixedbread-ai/mxbai-rerank-xsmall-v1 (~70M params) on first use.
The Chroma collection used for embeddings is sec_filings_embeddings_v2. If you previously ran an earlier version with sec_filings_embeddings, that older collection can be removed with EmbeddingClient().delete_collection("sec_filings_embeddings") from a Python shell once you've confirmed the new one works.
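
The reranker setting described in the notes above could be resolved and applied roughly as follows. This is a sketch under stated assumptions, not the contents of backend/reranker.py. The names resolve_mode and bm25_rerank are hypothetical, the scoring is a plain Okapi BM25 with common default parameters (k1 = 1.5, b = 0.75), and the cross-encoder path is left out.

```python
import math
import os
import re
from collections import Counter

VALID_MODES = ("crossencoder", "bm25", "off")


def resolve_mode(override=None) -> str:
    """A per-session override wins; otherwise read RERANKER_MODE (default crossencoder)."""
    mode = (override or os.environ.get("RERANKER_MODE", "crossencoder")).strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"RERANKER_MODE must be one of {VALID_MODES}, got {mode!r}")
    return mode


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rerank(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Order chunks by Okapi BM25 score against the query, best first."""
    docs = [tokenize(chunk) for chunk in chunks]
    if not docs:
        return []
    avg_len = sum(len(doc) for doc in docs) / len(docs) or 1.0
    # Document frequency: in how many chunks each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    query_terms = set(tokenize(query))

    def score(doc: list[str]) -> float:
        counts = Counter(doc)
        total = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            total += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        return total

    # sorted() is stable, so chunks with equal scores keep their retrieval order.
    ranked = sorted(zip(docs, chunks), key=lambda pair: score(pair[0]), reverse=True)
    return [chunk for _, chunk in ranked]
```

With this shape, a caller would check resolve_mode() once per query, pass the Chroma candidates through bm25_rerank when the mode is bm25, and leave them in retrieval order when it is off.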
