Skip to content

jackc602/EDGAR-Searcher

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EDGAR-Searcher

A fully local RAG (Retrieval-Augmented Generation) application for exploring public company filings from the SEC EDGAR system. Pick a ticker, pull down the filings, embed them into a local vector database, and chat with an LLM that answers questions grounded in those filings — all on your own machine.

Overview

EDGAR-Searcher pulls 10-K and 10-Q filings from the SEC's EDGAR API, chunks and embeds the text into a Chroma vector database, and exposes a Streamlit UI for querying the filings through a local LLM served by Ollama. Nothing leaves your machine — no cloud APIs, no hosted models.

Features

  • SEC filing retrieval — Fetches 10-K and 10-Q filings for any public company by ticker and date range via the EDGAR API (backend/edgar_client.py).
  • Document chunking — Splits filing HTML into meaningful chunks with item-level metadata (backend/document_chunker.py).
  • Local embeddings — Uses Ollama's mxbai-embed-large model to embed chunks into a Chroma vector store (backend/embedding_client.py).
  • Hybrid retrieval with reranking — Retrieves candidates from Chroma then reranks them with BM25 keyword scoring for better context selection (backend/reranker.py).
  • Local LLM chat — Streams answers from a locally-running Ollama model with source citations back to the original filing (backend/llm_client.py).
  • Streamlit frontend — A simple multi-page UI for loading filings, inspecting retrieved chunks, and chatting with the filings.

Tech Stack

  • Python (3.9+)
  • Streamlit — frontend
  • Chroma — vector database
  • Ollama — local embedding and LLM runtime
  • requests / BeautifulSoup / lxml — EDGAR fetching and HTML parsing

Running Locally

Prerequisites

  1. Python 3.9+
  2. Ollama installed and running locally — https://ollama.com
  3. Chroma running locally as a server (see step 3 below)

Quick Start

  1. Clone and install dependencies

    git clone <your-repo-url>
    cd EDGAR-Searcher
    pip install -r requirements.txt
  2. Pull the required Ollama models

    ollama pull mxbai-embed-large
    ollama pull gemma3:270m

    These are just default models that are local friendly.

  3. Start a local Chroma server

    pip install chromadb
    chroma run --host localhost --port 8000
  4. Run the Streamlit app

    streamlit run frontend/app.py

    The app will be available at http://localhost:8501.

Run with Docker Compose

A docker-compose.yml is provided that spins up the Streamlit frontend, Ollama, and Chroma together:

docker compose up --build

Then open http://localhost:8501. Note: you'll still need to docker exec into the Ollama container to ollama pull the embedding and chat models on first run.

Notes

  • The app only fetches 10-K and 10-Q filings. Other form types are filtered out.
  • The default chat model is gemma3:270m, a very small model chosen so the app runs on modest hardware. Swap to gemma3:4b or another pulled model from the Chat page dropdown for higher-quality answers at the cost of more RAM and slower responses.
  • The SEC EDGAR API requires a User-Agent header. The current one lives in backend/edgar_client.py — update it to your own contact info before heavy use.
  • Reranker selection is controlled by the RERANKER_MODE env var (crossencoder | bm25 | off, default crossencoder) and overridable per session from the Chat page. The cross-encoder downloads mixedbread-ai/mxbai-rerank-xsmall-v1 (~70M params) on first use.
  • The Chroma collection used for embeddings is sec_filings_embeddings_v2. If you previously ran an earlier version with sec_filings_embeddings, that older collection can be removed with EmbeddingClient().delete_collection("sec_filings_embeddings") from a Python shell once you've confirmed the new one works.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors