An agentic AI-optimized full-stack RAG platform that turns SEC filings into fast, cited, analyst-grade intelligence with deterministic routing, single-call synthesis, and aggressive latency optimization.
Provisioned with Terraform and deployed on AWS with cost guardrails:
- EC2 (t3.micro) runs the FastAPI backend container
- ECR stores backend Docker images
- S3 Static Website hosts the React frontend build
- SSM Parameter Store stores runtime secrets/config
- AWS Budgets + CloudWatch Billing Alarm + SNS send cost alerts
```mermaid
flowchart TB
    U[Users / Browser]
    S3[S3 Static Website\nReact Build]
    EC2[EC2 t3.micro\nFastAPI Docker Container]
    ECR[ECR\nBackend Image Repository]
    SSM[SSM Parameter Store\nSecrets + Runtime Config]
    EXT[OpenAI / Anthropic / Ollama]
    PC[Pinecone Vector DB]
    TF[Terraform]
    BUD[AWS Budgets]
    CW[CloudWatch Billing Alarm]
    SNS[SNS Email Alerts]

    U --> S3
    S3 -->|REST API calls| EC2
    EC2 --> EXT
    EC2 --> PC
    EC2 --> SSM
    EC2 -->|docker pull| ECR
    TF --> EC2
    TF --> ECR
    TF --> S3
    TF --> SSM
    TF --> BUD
    TF --> CW
    TF --> SNS
    BUD --> SNS
    CW --> SNS
```
Diagram source: docs/infra-architecture.mmd
| Single Company Q&A | Multi-Company Compare |
|---|---|
| Ask any question about a MAG7 stock's SEC filings and receive a cited, LLM-generated answer with source references. | Compare financial metrics, risks, and strategies across multiple companies side-by-side. |
This project is deliberately engineered to showcase optimized agentic AI systems design:
- Deterministic Router Agent minimizes unnecessary LLM calls and reduces cost/latency.
- Fast RAG Agent (single-call synthesis) compresses retrieval + reasoning + reporting into one high-efficiency pass.
- Retrieval + Answer Caching delivers ultra-fast repeated queries and benchmark-level responsiveness.
- Request Deduplication prevents duplicate concurrent work under load and improves throughput.
- Provider-Agnostic LLM Layer enables rapid model switching (OpenAI / Anthropic / Ollama) without architecture changes.
In short: this app is not just “using AI” — it is optimizing agentic AI execution paths for real-world performance.
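As a rough illustration, deterministic routing can be as simple as string rules evaluated before any model is touched. This is a minimal sketch with hypothetical rules and names, not the project's actual `router_agent.py`:

```python
import re

# Illustrative deterministic (LLM-free) intent routing. The hint
# pattern and ticker set are assumptions for this sketch.
COMPARE_HINTS = re.compile(r"\b(compare|versus|vs\.?|side[- ]by[- ]side)\b", re.I)
TICKERS = {"AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA"}

def route(question: str) -> str:
    """Classify a query with string rules only -- zero LLM calls."""
    mentioned = {t for t in TICKERS if t.lower() in question.lower()}
    if COMPARE_HINTS.search(question) or len(mentioned) > 1:
        return "compare"      # fan out one RAG pass per ticker
    return "single_qa"        # one retrieval + one synthesis call
```

Because routing is pure string matching, it costs microseconds and zero tokens, and its behavior is fully testable.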
- FastAPI + Async Python — blazing-fast APIs, clean architecture, and excellent developer velocity.
- LangChain Multi-Agent RAG — optimization-first routing + retrieval + synthesis that demonstrates true agentic orchestration.
- Pinecone Vector Database — lightning-fast semantic search over large SEC filing corpora.
- React 18 + Vite — ultra-snappy UI feedback and modern frontend productivity.
- Terraform on AWS — repeatable, production-style infrastructure with real cost guardrails.
- FastAPI: automatic docs, strong typing, and async performance that scales elegantly.
- LangChain: flexible orchestration primitives for multi-step reasoning and retrieval workflows.
- Pinecone: purpose-built vector infrastructure optimized for low-latency relevance.
- React + Vite: excellent DX, fast HMR, and smooth interactive UX for data-heavy applications.
- Terraform: infrastructure as code that is predictable, reviewable, and easy to evolve.
- AWS (EC2/ECR/S3/SSM): practical cloud primitives that balance control, speed, and cost.
| Layer | Tech Stack |
|---|---|
| LLM Providers | OpenAI GPT-4o-mini · Anthropic Claude 3.5 Haiku · Ollama (local) |
| RAG Pipeline | LangChain 0.3 · Custom multi-agent architecture · Deterministic routing |
| Vector Database | Pinecone (serverless) · Sentence-Transformers embeddings |
| Backend | FastAPI · Pydantic v2 · Async Python · Uvicorn |
| Frontend | React 18 · Vite · Custom hooks · CSS modules |
| Data Source | SEC EDGAR API · 10-K & 10-Q filings |
| DevOps / Infra | Docker · Terraform · AWS EC2/ECR/S3/SSM · AWS Budgets · CloudWatch |
```
┌─────────────────────────────────────────────────────────────┐
│                       React Frontend                        │
│   TickerSelector → ChatWindow → ComparePanel → SECPreview   │
└────────────────────────┬────────────────────────────────────┘
                         │ REST API
┌────────────────────────▼────────────────────────────────────┐
│                      FastAPI Backend                        │
│                                                             │
│  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐  │
│  │ Router Agent  │──▶│ Fast RAG Agent│──▶│ LLM Provider  │  │
│  │ (deterministic│   │ (single call) │   │ (OpenAI /     │  │
│  │  routing)     │   │               │   │  Anthropic /  │  │
│  └───────────────┘   └───────┬───────┘   │  Ollama)      │  │
│                              │           └───────────────┘  │
│                   ┌──────────▼──────────┐                   │
│                   │ Pinecone Vector DB  │                   │
│                   │ (semantic retrieval)│                   │
│                   └─────────────────────┘                   │
└─────────────────────────────────────────────────────────────┘
```
```mermaid
flowchart LR
    Q[User Question]
    R[Router Agent\nDeterministic Intent Routing]
    RET[Retriever\nPinecone Semantic Search]
    CONTEXT[Top-k Filing Chunks\n+ Metadata]
    FAST[Fast RAG Agent\nSingle-call synthesis]
    LLM[OpenAI / Anthropic / Ollama]
    A[Final Answer\nwith Citations]

    Q --> R
    R --> RET
    RET --> CONTEXT
    CONTEXT --> FAST
    FAST --> LLM
    LLM --> FAST
    FAST --> A
```
Diagram source: docs/rag-agent-flow.mmd
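The single-call idea in the flow above can be sketched as one fused prompt: retrieved context, analysis instructions, and report format go into a single model invocation. The prompt wording and the `llm` callable are illustrative assumptions, not the project's `fast_rag.py`:

```python
from typing import Callable

# One prompt carries retrieval context, analyst instructions, and the
# citation format, so the model is called exactly once per question.
PROMPT = """You are a financial analyst. Using ONLY the excerpts below,
answer the question and cite sources as [doc_id].

Excerpts:
{context}

Question: {question}
Answer with citations:"""

def answer(question: str, chunks: list[dict], llm: Callable[[str], str]) -> str:
    # Each retrieved chunk is tagged with its id so the model can cite it.
    context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return llm(PROMPT.format(context=context, question=question))
```

Fusing the steps trades some modularity for roughly one third the LLM calls of a retrieve → analyze → report chain.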
Ask natural language questions about any MAG7 company's SEC filings. The system retrieves relevant filing excerpts, synthesizes an answer, and returns source citations — all in a single optimized LLM call.
- Router Agent — Deterministic classification (no LLM call) routes queries with optimization-first control.
- Fast RAG Agent — Retriever + analyst + reporter fused into a single LLM call (~3x fewer calls than naive chains).
- LLM Cache — Reusable LLM instances with provider-aware pooling to reduce warmup overhead.
- Request Deduplication Layer — Identical in-flight requests share execution for better concurrency behavior.
Switch between OpenAI GPT-4o-mini, Anthropic Claude 3.5 Haiku, or Ollama (fully local, offline) with a single click in the UI. No code changes required.
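One way such switching stays a pure config change is to hide every backend behind a single `prompt -> completion` callable, pooled per `(provider, model)` pair. This sketch uses stub constructors and invented names, not the app's actual LLM layer:

```python
from functools import lru_cache
from typing import Callable

LLMFn = Callable[[str], str]          # the only interface callers see
_FACTORIES: dict[str, Callable[[str], LLMFn]] = {}

def register_provider(name: str, factory: Callable[[str], LLMFn]) -> None:
    _FACTORIES[name] = factory

@lru_cache(maxsize=8)
def get_llm(provider: str, model: str) -> LLMFn:
    """Provider-aware pooling: one reusable instance per (provider, model)."""
    try:
        return _FACTORIES[provider](model)
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None

# Stub standing in for a local Ollama client; the real app would wire
# in actual OpenAI / Anthropic / Ollama clients here.
register_provider("ollama", lambda model: (lambda prompt: f"[{model}] {prompt}"))
```

The `lru_cache` doubles as the instance pool, so switching models in the UI reuses warm clients instead of reconstructing them per request.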
Compare financial metrics, risk factors, or business strategies across multiple MAG7 stocks side-by-side. Powered by concurrent API calls for fast results.
| Metric | Before | After | Improvement / Technique |
|---|---|---|---|
| Repeated query | 9.69s | 20ms | 485x faster |
| Compare 2 stocks (cached) | 12.21s | 16ms | 763x faster |
| Frontend re-renders | Excessive | Memoized | React.memo + useCallback |
| Health check polling | Every 30s | Every 2min | 4x reduction |
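The repeated-query numbers above come from serving warm paths straight from cache. A minimal sketch of a TTL answer cache keyed on normalized question + ticker + model (keying scheme and TTL are illustrative, not the app's tuned values):

```python
import hashlib
import time

class AnswerCache:
    """TTL cache so repeated queries skip retrieval and the LLM entirely."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(question: str, ticker: str, model: str) -> str:
        # Normalize so trivially different phrasings of the same query hit.
        raw = f"{ticker}|{model}|{question.strip().lower()}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, k: str):
        hit = self._store.get(k)
        if hit is None:
            return None
        ts, answer = hit
        if time.monotonic() - ts > self.ttl:
            del self._store[k]       # expired: evict and miss
            return None
        return answer

    def put(self, k: str, answer: str) -> None:
        self._store[k] = (time.monotonic(), answer)
```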
Toggle reranking, query rewriting, retrieval caching, section boosting, and hybrid search from the UI control panel — empowering users to experiment with different retrieval strategies.
Fetch the latest 10-K and 10-Q filings directly from the SEC EDGAR API, chunk and embed them, and store in Pinecone — all from inside the app.
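The chunking step can be sketched as overlapping word windows that carry ticker/form metadata for citation and filtering. Window and overlap sizes here are illustrative defaults, not the app's tuned values:

```python
def chunk_filing(text: str, ticker: str, form: str,
                 size: int = 300, overlap: int = 50) -> list[dict]:
    """Split filing text into overlapping word-window chunks with metadata."""
    words = text.split()
    chunks, start, i = [], 0, 0
    while start < len(words):
        window = words[start:start + size]
        chunks.append({
            "id": f"{ticker}-{form}-{i}",          # stable id used in citations
            "text": " ".join(window),
            "metadata": {"ticker": ticker, "form": form, "chunk": i},
        })
        if start + size >= len(words):
            break
        start += size - overlap                    # overlap preserves context
        i += 1
    return chunks
```

Each chunk's text is then embedded and upserted to Pinecone with its metadata, so retrieval can filter by ticker and form type.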
```
├── backend/
│   ├── app/
│   │   ├── agents/              # Multi-agent RAG system
│   │   │   ├── router_agent.py  # Deterministic query classifier
│   │   │   ├── fast_rag.py      # Single-call RAG pipeline
│   │   │   ├── llm_cache.py     # Provider-aware LLM caching
│   │   │   ├── retriever_agent.py
│   │   │   ├── analyst_agent.py
│   │   │   └── reporter_agent.py
│   │   ├── services/            # SEC EDGAR API, text processing
│   │   ├── utils/               # HTTP client, request deduplication
│   │   ├── main.py              # FastAPI app with lifespan management
│   │   ├── models.py            # Pydantic v2 request/response schemas
│   │   ├── config.py            # Environment-based settings
│   │   └── pinecone_client.py   # Vector DB client
│   ├── tests/                   # pytest suite
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── components/          # React 18 components (memoized)
│   │   │   ├── ChatWindow.jsx       # Message display + auto-scroll
│   │   │   ├── ChatInput.jsx        # User input with model selector
│   │   │   ├── ComparePanel.jsx     # Multi-stock comparison
│   │   │   ├── ControlPanel.jsx     # RAG parameter controls
│   │   │   ├── TickerSelector.jsx
│   │   │   └── SECPreviewModal.jsx
│   │   ├── services/api.js      # API client with timeout/retry
│   │   └── App.jsx
│   ├── vitest.config.js         # Frontend test config
│   ├── package.json
│   └── Dockerfile
├── docker-compose.yml           # One-command full stack launch
├── start-all.sh                 # Dev startup script
└── README.md
```
- Python 3.9+ — Backend runtime
- Node.js 18+ — Frontend tooling
- API Keys — Pinecone + at least one LLM provider (OpenAI, Anthropic, or Ollama)
```bash
cd backend
cp .env.example .env
# Edit .env with your API keys
```

```bash
# Backend
cd backend
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Frontend
cd ../frontend
npm install
```

```bash
# Option A — One command
bash start-all.sh

# Option B — Docker
docker-compose up -d
```

| Service | URL |
|---|---|
| Frontend | http://localhost:5173 |
| Backend API | http://localhost:8000 |
| API Docs (Swagger) | http://localhost:8000/docs |
```bash
# Backend
cd backend
pytest tests/ -v
pytest tests/ --cov=app      # with coverage

# Frontend
cd frontend
npm test                     # run all tests
npm test -- --coverage       # with coverage
```

Financial Performance
- "What was Apple's total revenue and operating income in 2023?"
- "How did NVIDIA's data center revenue grow compared to last year?"
- "What are Tesla's gross margins and how have they changed?"
Risk & Strategy
- "What are the key risk factors for Microsoft?"
- "What is Google's AI strategy according to their latest filings?"
- "What cybersecurity risks does Amazon face?"
Company Comparisons
- "Compare NVIDIA and AMD's GPU market performance and revenue"
- "How do Apple and Microsoft's R&D investments compare?"
- "Compare Amazon and Google's cloud infrastructure spending"
- Agentic path optimization — Explicitly engineered execution paths that minimize token, latency, and call overhead.
- Single-call RAG synthesis — Retrieval + reasoning + reporting in one pass for materially faster responses.
- Deterministic routing control — Zero-cost query routing before model invocation.
- Retrieval + answer caching — Sub-second repeat behavior and dramatic latency collapse on warm paths.
- Request deduplication under concurrency — Identical parallel requests are collapsed into one pipeline run.
- Provider-agnostic model orchestration — OpenAI ↔ Anthropic ↔ Ollama switching without architectural rewrites.
- Async-first throughput design — End-to-end async processing from API edge to model call.
- Preloaded embedding runtime — Startup-time model readiness avoids first-query cold penalties.
MIT