
MAG7 SEC Filings Analyzer

AI-Powered Financial Intelligence Platform

An agentic AI-optimized full-stack RAG platform that turns SEC filings into fast, cited, analyst-grade intelligence with deterministic routing, single-call synthesis, and aggressive latency optimization.



Infrastructure (Terraform + AWS)

Provisioned with Terraform and deployed on AWS with cost guardrails:

  • EC2 (t3.micro) runs the FastAPI backend container
  • ECR stores backend Docker images
  • S3 Static Website hosts the React frontend build
  • SSM Parameter Store stores runtime secrets/config
  • AWS Budgets + CloudWatch Billing Alarm + SNS send cost alerts
```mermaid
flowchart TB
        U[Users / Browser]
        S3[S3 Static Website\nReact Build]
        EC2[EC2 t3.micro\nFastAPI Docker Container]
        ECR[ECR\nBackend Image Repository]
        SSM[SSM Parameter Store\nSecrets + Runtime Config]
        EXT[OpenAI / Anthropic / Ollama]
        PC[Pinecone Vector DB]
        TF[Terraform]
        BUD[AWS Budgets]
        CW[CloudWatch Billing Alarm]
        SNS[SNS Email Alerts]

        U --> S3
        S3 -->|REST API calls| EC2
        EC2 --> EXT
        EC2 --> PC
        EC2 --> SSM
        EC2 -->|docker pull| ECR

        TF --> EC2
        TF --> ECR
        TF --> S3
        TF --> SSM
        TF --> BUD
        TF --> CW
        TF --> SNS

        BUD --> SNS
        CW --> SNS
```

Diagram source: docs/infra-architecture.mmd


Demo

Single Company Q&A — Ask any question about a MAG7 stock's SEC filings and receive a cited, LLM-generated answer with source references.

Multi-Company Compare — Compare financial metrics, risks, and strategies across multiple companies side-by-side.

🤖 Agentic AI Optimization Focus

This project is deliberately engineered to showcase optimized agentic AI systems design:

  • Deterministic Router Agent minimizes unnecessary LLM calls and reduces cost/latency.
  • Fast RAG Agent (single-call synthesis) compresses retrieval + reasoning + reporting into one high-efficiency pass.
  • Retrieval + Answer Caching delivers ultra-fast repeated queries and benchmark-level responsiveness.
  • Request Deduplication prevents duplicate concurrent work under load and improves throughput.
  • Provider-Agnostic LLM Layer enables rapid model switching (OpenAI / Anthropic / Ollama) without architecture changes.

In short: this app is not just “using AI” — it is optimizing agentic AI execution paths for real-world performance.
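To make the deterministic-routing idea concrete, here is a minimal sketch of how a router can classify a query with plain string matching before any model is invoked. The function name, keyword list, and ticker map are illustrative, not the repo's actual API.

```python
import re

# Hypothetical deterministic router: intent and tickers are resolved with
# string matching alone, so no LLM tokens are spent before retrieval.
COMPARE_WORDS = re.compile(r"\b(compare|versus|vs\.?)\b", re.IGNORECASE)
COMPANY_TICKERS = {
    "apple": "AAPL", "microsoft": "MSFT", "google": "GOOGL",
    "alphabet": "GOOGL", "amazon": "AMZN", "meta": "META",
    "nvidia": "NVDA", "tesla": "TSLA",
}

def route_query(question: str) -> str:
    """Classify a query as 'compare' or 'single' without an LLM call."""
    q = question.lower()
    tickers = {t for name, t in COMPANY_TICKERS.items() if name in q}
    if COMPARE_WORDS.search(question) or len(tickers) > 1:
        return "compare"
    return "single"
```

Because classification is pure Python, routing costs microseconds and is fully reproducible, which is the property the Router Agent exploits.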


Key Technologies

🔥 Core Stack (Strong Highlights)

  • FastAPI + Async Python — high-throughput APIs, clean architecture, and excellent developer velocity.
  • LangChain Multi-Agent RAG — optimization-first routing, retrieval, and synthesis that demonstrates true agentic orchestration.
  • Pinecone Vector Database — low-latency semantic search over large SEC filing corpora.
  • React 18 + Vite — fast UI feedback and modern frontend productivity.
  • Terraform on AWS — repeatable, production-style infrastructure with real cost guardrails.

⚡ Why These Technologies Shine

  • FastAPI: automatic docs, strong typing, and async performance that scales elegantly.
  • LangChain: flexible orchestration primitives for multi-step reasoning and retrieval workflows.
  • Pinecone: purpose-built vector infrastructure optimized for low-latency relevance.
  • React + Vite: excellent DX, fast HMR, and smooth interactive UX for data-heavy applications.
  • Terraform: infrastructure as code that is predictable, reviewable, and easy to evolve.
  • AWS (EC2/ECR/S3/SSM): practical cloud primitives that balance control, speed, and cost.
| Layer | Tech Stack |
| --- | --- |
| LLM Providers | OpenAI GPT-4o-mini · Anthropic Claude 3.5 Haiku · Ollama (local) |
| RAG Pipeline | LangChain 0.3 · Custom multi-agent architecture · Deterministic routing |
| Vector Database | Pinecone (serverless) · Sentence-Transformers embeddings |
| Backend | FastAPI · Pydantic v2 · Async Python · Uvicorn |
| Frontend | React 18 · Vite · Custom hooks · CSS modules |
| Data Source | SEC EDGAR API · 10-K & 10-Q filings |
| DevOps / Infra | Docker · Terraform · AWS EC2/ECR/S3/SSM · AWS Budgets · CloudWatch |

Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                       React Frontend                        │
│   TickerSelector → ChatWindow → ComparePanel → SECPreview   │
└────────────────────────────┬────────────────────────────────┘
                             │ REST API
┌────────────────────────────▼────────────────────────────────┐
│                       FastAPI Backend                       │
│                                                             │
│  ┌─────────────────┐   ┌──────────────┐   ┌───────────────┐ │
│  │ Router Agent    │──▶│ Fast RAG     │──▶│ LLM Provider  │ │
│  │ (deterministic) │   │ Agent        │   │ (OpenAI /     │ │
│  │                 │   │ (single call)│   │  Anthropic /  │ │
│  │                 │   │              │   │  Ollama)      │ │
│  └─────────────────┘   └──────┬───────┘   └───────────────┘ │
│                               │                             │
│                    ┌──────────▼───────────┐                 │
│                    │  Pinecone Vector DB  │                 │
│                    │ (semantic retrieval) │                 │
│                    └──────────────────────┘                 │
└─────────────────────────────────────────────────────────────┘
```

RAG Agent Flow (Mermaid)

```mermaid
flowchart LR
        Q[User Question]
        R[Router Agent\nDeterministic Intent Routing]
        RET[Retriever\nPinecone Semantic Search]
        CONTEXT[Top-k Filing Chunks\n+ Metadata]
        FAST[Fast RAG Agent\nSingle-call synthesis]
        LLM[OpenAI / Anthropic / Ollama]
        A[Final Answer\nwith Citations]

        Q --> R
        R --> RET
        RET --> CONTEXT
        CONTEXT --> FAST
        FAST --> LLM
        LLM --> FAST
        FAST --> A
```

Diagram source: docs/rag-agent-flow.mmd


Features

Intelligent Q&A with Citations

Ask natural language questions about any MAG7 company's SEC filings. The system retrieves relevant filing excerpts, synthesizes an answer, and returns source citations — all in a single optimized LLM call.
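The "single optimized LLM call" works by folding the retrieved excerpts and the citation instructions into one prompt. A hedged sketch of that prompt assembly, with an illustrative `build_prompt` helper (the repo's real prompt lives in `fast_rag.py`):

```python
# Fuse retrieved chunks and citation rules into one prompt, so retrieval +
# reasoning + reporting costs a single LLM call. Names are illustrative.
def build_prompt(question: str, chunks: list[dict]) -> str:
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']}): {c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer the question using only the filing excerpts below.\n"
        "Cite each claim with its bracketed source number.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```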

Multi-Agent RAG Pipeline

  • Router Agent — Deterministic classification (no LLM call) routes queries with optimization-first control.
  • Fast RAG Agent — Retriever + analyst + reporter fused into a single LLM call (~3x fewer calls than naive chains).
  • LLM Cache — Reusable LLM instances with provider-aware pooling to reduce warmup overhead.
  • Request Deduplication Layer — Identical in-flight requests share execution for better concurrency behavior.

Multi-Provider LLM Support

Switch between OpenAI GPT-4o-mini, Anthropic Claude 3.5 Haiku, or Ollama (fully local, offline) with a single click in the UI. No code changes required.
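Provider switching stays a configuration change rather than an architecture change when model selection goes through one small factory. A sketch under assumed names (`LLMConfig`, `make_config`, and the default model strings are illustrative stand-ins for the app's LangChain chat-model setup):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMConfig:
    provider: str   # "openai" | "anthropic" | "ollama"
    model: str

# Default model names are assumptions, not pinned to the repo's config.
DEFAULTS = {
    "openai": "gpt-4o-mini",
    "anthropic": "claude-3-5-haiku-latest",
    "ollama": "llama3",
}

def make_config(provider: str, model: Optional[str] = None) -> LLMConfig:
    """Resolve a provider name (plus optional model override) to a config."""
    if provider not in DEFAULTS:
        raise ValueError(f"unknown provider: {provider}")
    return LLMConfig(provider=provider, model=model or DEFAULTS[provider])
```

The rest of the pipeline only ever sees an `LLMConfig`, so the UI's one-click switch maps to choosing a different `provider` string.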

Multi-Company Comparison

Compare financial metrics, risk factors, or business strategies across multiple MAG7 stocks side-by-side. Powered by concurrent API calls for fast results.
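The concurrent fan-out behind comparisons can be sketched with `asyncio.gather`: one question runs against several tickers in parallel and the answers are merged. `analyze_one` is a hypothetical stand-in for the per-company RAG call.

```python
import asyncio

async def analyze_one(ticker: str, question: str) -> str:
    """Placeholder for the real per-company RAG pipeline call."""
    await asyncio.sleep(0.01)  # simulate retrieval + synthesis latency
    return f"{ticker}: answer to {question!r}"

async def compare(tickers: list[str], question: str) -> dict[str, str]:
    # All per-company calls run concurrently; total latency is roughly the
    # slowest single call rather than the sum of all calls.
    answers = await asyncio.gather(*(analyze_one(t, question) for t in tickers))
    return dict(zip(tickers, answers))
```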

Performance Optimizations

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Repeated query | 9.69 s | 20 ms | 485x faster |
| Compare 2 stocks (cached) | 12.21 s | 16 ms | 610x faster |
| Frontend re-renders | Excessive | Memoized | React.memo + useCallback |
| Health check polling | Every 30 s | Every 2 min | 4x reduction |
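The warm-path speedups for repeated queries come from answer caching. A minimal sketch of such a cache, keyed by ticker plus the normalized question so trivially different phrasings hit the same entry; the class name and the 10-minute TTL are assumptions:

```python
import time
from typing import Optional

class AnswerCache:
    """TTL cache for final answers: a repeat question skips retrieval
    and synthesis entirely, turning seconds into milliseconds."""

    def __init__(self, ttl_seconds: float = 600.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def _key(self, ticker: str, question: str) -> str:
        # Lowercase and collapse whitespace so equivalent queries collide.
        return f"{ticker}:{' '.join(question.lower().split())}"

    def get(self, ticker: str, question: str) -> Optional[str]:
        entry = self._store.get(self._key(ticker, question))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, ticker: str, question: str, answer: str) -> None:
        self._store[self._key(ticker, question)] = (time.monotonic(), answer)
```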

Advanced RAG Controls

Toggle reranking, query rewriting, retrieval caching, section boosting, and hybrid search from the UI control panel — empowering users to experiment with different retrieval strategies.

Real-Time SEC Data Ingestion

Fetch the latest 10-K and 10-Q filings directly from the SEC EDGAR API, chunk and embed them, and store in Pinecone — all from inside the app.
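The chunking step in that pipeline can be sketched as a sliding window with overlap, so passages that straddle a boundary remain retrievable. The window sizes below are illustrative, not the repo's actual settings.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split filing text into overlapping windows prior to embedding."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars shared
    return chunks
```

Each chunk is then embedded and upserted to Pinecone with its filing metadata (ticker, form type, section) attached.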


Project Structure

├── backend/
│   ├── app/
│   │   ├── agents/              # Multi-agent RAG system
│   │   │   ├── router_agent.py  #   Deterministic query classifier
│   │   │   ├── fast_rag.py      #   Single-call RAG pipeline
│   │   │   ├── llm_cache.py     #   Provider-aware LLM caching
│   │   │   ├── retriever_agent.py
│   │   │   ├── analyst_agent.py
│   │   │   └── reporter_agent.py
│   │   ├── services/            # SEC EDGAR API, text processing
│   │   ├── utils/               # HTTP client, request deduplication
│   │   ├── main.py              # FastAPI app with lifespan management
│   │   ├── models.py            # Pydantic v2 request/response schemas
│   │   ├── config.py            # Environment-based settings
│   │   └── pinecone_client.py   # Vector DB client
│   ├── tests/                   # pytest suite
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── components/          # React 18 components (memoized)
│   │   │   ├── ChatWindow.jsx   #   Message display + auto-scroll
│   │   │   ├── ChatInput.jsx    #   User input with model selector
│   │   │   ├── ComparePanel.jsx #   Multi-stock comparison
│   │   │   ├── ControlPanel.jsx #   RAG parameter controls
│   │   │   ├── TickerSelector.jsx
│   │   │   └── SECPreviewModal.jsx
│   │   ├── services/api.js      # API client with timeout/retry
│   │   └── App.jsx
│   ├── vitest.config.js         # Frontend test config
│   ├── package.json
│   └── Dockerfile
├── docker-compose.yml           # One-command full stack launch
├── start-all.sh                 # Dev startup script
└── README.md

Quick Start

Prerequisites

  • Python 3.9+ — Backend runtime
  • Node.js 18+ — Frontend tooling
  • API Keys — Pinecone + at least one LLM provider (OpenAI, Anthropic, or Ollama)

1. Configure

```bash
cd backend
cp .env.example .env
# Edit .env with your API keys
```

2. Install

```bash
# Backend
cd backend
python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Frontend
cd ../frontend
npm install
```

3. Launch

```bash
# Option A — One command
bash start-all.sh

# Option B — Docker
docker-compose up -d
```
| Service | URL |
| --- | --- |
| Frontend | http://localhost:5173 |
| Backend API | http://localhost:8000 |
| API Docs (Swagger) | http://localhost:8000/docs |

Testing

```bash
# Backend
cd backend
pytest tests/ -v
pytest tests/ --cov=app          # with coverage

# Frontend
cd frontend
npm test                          # run all tests
npm test -- --coverage            # with coverage
```

Sample Questions

Financial Performance

  • "What was Apple's total revenue and operating income in 2023?"
  • "How did NVIDIA's data center revenue grow compared to last year?"
  • "What are Tesla's gross margins and how have they changed?"

Risk & Strategy

  • "What are the key risk factors for Microsoft?"
  • "What is Google's AI strategy according to their latest filings?"
  • "What cybersecurity risks does Amazon face?"

Company Comparisons

  • "Compare NVIDIA and AMD's GPU market performance and revenue"
  • "How do Apple and Microsoft's R&D investments compare?"
  • "Compare Amazon and Google's cloud infrastructure spending"

Technical Highlights

  • Agentic path optimization — Explicitly engineered execution paths that minimize token, latency, and call overhead.
  • Single-call RAG synthesis — Retrieval + reasoning + reporting in one pass for materially faster responses.
  • Deterministic routing control — Zero-cost query routing before model invocation.
  • Retrieval + answer caching — Sub-second repeat behavior and dramatic latency collapse on warm paths.
  • Request deduplication under concurrency — Identical parallel requests are collapsed into one pipeline run.
  • Provider-agnostic model orchestration — OpenAI ↔ Anthropic ↔ Ollama switching without architectural rewrites.
  • Async-first throughput design — End-to-end async processing from API edge to model call.
  • Preloaded embedding runtime — Startup-time model readiness avoids first-query cold penalties.
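The preloaded-runtime idea follows FastAPI's lifespan pattern: load the embedding model once at startup so the first query pays no cold-start cost. In this sketch, `load_embedder` is a hypothetical loader and the `app` object is a stand-in for a FastAPI instance (which would be created as `FastAPI(lifespan=lifespan)`).

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace

def load_embedder():
    return "embedder"  # placeholder for e.g. a Sentence-Transformers model

@asynccontextmanager
async def lifespan(app):
    app.state.embedder = load_embedder()   # warm up before serving traffic
    yield                                   # app serves requests here
    app.state.embedder = None               # release on shutdown

async def demo():
    # Simulate the startup/shutdown cycle FastAPI would drive.
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        return app.state.embedder
```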

License

MIT
