Home — your identity + live training-dataset stats (repos · code files · posts · chunks), with the on-device toggle and browser history.
An answer — rendered Markdown + metadata catalog + interactive AD card + a live Mermaid diagram.
Important
algsoch is a personal AI chatbot trained on your digital footprint — your GitHub repositories (the actual code + READMEs), your blog posts, and your profile. Unlike a generic chatbot, it genuinely knows your work, takes actions (emails you, looks up your latest repo), updates itself the moment you push new code, and shows you exactly how it reached every answer.
In plain terms: point it at your GitHub (and blog), and you get a private assistant that can answer "what's my latest project?", "explain my Blindfold repo", or "email me, I want to talk", grounded in your real data rather than made-up facts. By default it answers with Groq (fast cloud) or Ollama (local), and it can also run fully on-device in your browser (RunAnywhere, keyless) when you want privacy or offline use.
| | |
|---|---|
| 🎯 What | A personal, agentic AI trained on your repos, code, blogs & profile |
| ⚙️ How | RAG (retrieval) + function-calling agent + auto-reindex on every push |
| 🧠 Brains | Groq (fast) · Ollama (local) · RunAnywhere (on-device, optional) |
| 🔒 Yours | History stays in your browser · export a fine-tune dataset anytime |
Runs with zero credentials. Every integration (Groq, Coral, GitHub, RunAnywhere, SMTP, ChromaDB) degrades gracefully — develop and demo the whole product before wiring any keys.
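
One way to picture the graceful degradation described above is a startup check that sorts integrations into enabled and degraded by which keys are present. This is only a sketch: the `REQUIRED_KEYS` mapping and the `detect_capabilities` helper are hypothetical, and the real backend may decide this differently.

```python
import os
from typing import Mapping

# Hypothetical mapping of integration -> env vars that unlock it.
# The key names come from this README; the grouping is an assumption.
REQUIRED_KEYS = {
    "groq": ("GROQ_API_KEY",),
    "github": ("GITHUB_TOKEN",),
    "smtp": ("HOSTINGER_SMTP_PASS", "GMAIL_APP_PASSWORD"),
}


def detect_capabilities(env: Mapping[str, str] = os.environ) -> dict:
    """Sort integrations into enabled/degraded based on which keys are set."""
    enabled, degraded = [], []
    for name, keys in REQUIRED_KEYS.items():
        # Assumption: one non-empty key is enough (e.g. either SMTP account).
        if any(env.get(key) for key in keys):
            enabled.append(name)
        else:
            degraded.append(name)
    return {"enabled": enabled, "degraded": degraded}
```

With an empty environment every integration lands in `degraded`, which is the "zero credentials" case: the app still starts and can report what is missing.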
flowchart TB
subgraph Browser["🖥️ Browser — React + Vite"]
UI["Chat UI · suggestions · wake button"]
FAST["Client rules / intent engine<br/>greeting · projects · contact · hiring"]
CRAG["In-browser RAG<br/>hashEmbed + hybrid search over index.json"]
FB["👍 / 👎 / 💡 feedback"]
RA["on-device LLM (RunAnywhere)<br/>— coming soon"]
HIST[("IndexedDB history")]
end
subgraph API["⚙️ FastAPI backend"]
HOME["/ landing dashboard · /docs"]
AGENT["Agent loop · function calling"]
RAG["RAG · hybrid search (dense + keyword)"]
TOOLS["Tools: github · email · profile · export"]
GATE["rate limit · admin-gated reindex"]
FBE["/feedback → feedback branch"]
end
subgraph Lake["📦 Data-lake + portable index"]
VS[("Vector store")]
PRE["prebuilt_index.json<br/>(committed · hash 384-d)"]
end
subgraph Providers["☁️ Providers"]
GROQ["Groq — fast cloud LLM"]
OLLAMA["Ollama — local LLM + nomic embeddings"]
CORAL["Coral SQL / GitHub API"]
SMTP["SMTP: Hostinger + Gmail"]
GHA["GitHub Action · reindex on CI"]
end
UI --> FAST
FAST -->|simple intent| UI
FAST -->|substantive| AGENT
UI -. backend asleep .-> CRAG
CRAG --> PRE
AGENT --> RAG --> VS
AGENT --> GROQ
AGENT --> OLLAMA
RAG --> OLLAMA
AGENT --> TOOLS --> CORAL
TOOLS --> SMTP
CORAL --> VS
FB --> FBE
PRE -->|seeds| VS
PRE -->|synced to| CRAG
GHA --> PRE
UI -. local route .-> RA
UI --> HIST
sequenceDiagram
autonumber
participant U as You
participant F as React (App.tsx)
participant C as Client rules + in-browser RAG
participant B as FastAPI agent
participant L as Groq / Ollama
U->>F: ask a question
alt common intent (greeting · project · contact · hiring · latest repo)
F->>C: match intent / retrieve over committed index
C-->>F: instant curated answer + sources
else substantive question (backend reachable)
F->>B: POST /chat (adaptive timeout: 60s local · 9s remote)
B->>B: hybrid retrieve + tool calls (query_github / send_email)
B->>L: messages + tool schemas
L-->>B: final answer (Markdown + Mermaid)
B-->>F: answer + metadata + 🔎 transparency + AD card
else backend asleep / unreachable
F->>C: in-browser RAG (hashEmbed + hybrid → extractive / NVIDIA proxy)
C-->>F: answer from the committed index
end
F-->>U: rendered answer + 👍/👎 feedback + promo
flowchart LR
GH["GitHub repos<br/>paginated · ETag-cached<br/>README + code files + tree"] --> EX
MED["Medium RSS"] --> EX
PROF["Owner profile"] --> EX
LI["LinkedIn export<br/>professional CSVs only"] --> EX
RES["Resume PDF"] --> EX
DROP["pdfs/ · notes/<br/>(drop files here)"] --> EX
EX["Extract → redact secrets<br/>→ data-lake .md per record"] --> CH["Chunk ~500 tokens"]
CH --> EM["Embed<br/>local: nomic 768-d · portable: hash 384-d"]
EM --> VDB[("Vector store")]
VDB --> PRE["prebuilt_index.json (committed)"]
PRE --> FE["frontend/public/index.json<br/>→ in-browser RAG"]
GHA["GitHub Action · Reindex"] -.rebuilds.-> PRE
VDB --> EXP["dataset → JSONL<br/>(client-side or /export)"]
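
The "redact secrets" and "chunk ~500 tokens" steps in the pipeline above could look roughly like the sketch below. The regex patterns, the `redact` and `chunk` helpers, and the use of whitespace-separated words as a stand-in for tokens are all assumptions for illustration, not the project's actual code.

```python
import re

# Illustrative patterns only; a real redactor would cover many more formats.
SECRET_PATTERNS = [
    # Strings that look like GitHub "ghp_" personal access tokens.
    (re.compile(r"ghp_[A-Za-z0-9]{20,}"), "[REDACTED]"),
    # NAME=value pairs whose name ends in a secret-looking word; keep the name.
    (
        re.compile(
            r"\b([\w-]*(?:api[_-]?key|password|secret|token))\s*[=:]\s*\S+",
            re.IGNORECASE,
        ),
        r"\1=[REDACTED]",
    ),
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern before it reaches the data-lake."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def chunk(text: str, max_tokens: int = 500) -> list[str]:
    """Group whitespace-separated words into chunks of at most max_tokens words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Redacting before chunking means a secret can never be split across two chunks and slip past the patterns.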
| Layer | Tech |
|---|---|
| Frontend | React 18, Vite, react-markdown + Mermaid, localforage (IndexedDB) |
| Backend | FastAPI, httpx, pydantic-settings |
| LLM (answering) | Hybrid — client rules for common intents · Groq llama-3.3-70b → Ollama for substantive questions · in-browser fallback when the backend is asleep |
| Embeddings | Ollama nomic-embed-text (768-d) → hash (384-d, portable for the committed index + in-browser RAG) |
| Retrieval | Hybrid = dense cosine + keyword/title overlap — runs both server-side and in-browser (exact JS port) |
| Refresh | GitHub Action rebuilds the index on CI → commits → frontend redeploys |
| On-device | @runanywhere/web (WASM, keyless) — wired, currently coming soon |
| GitHub data | Coral (withcoral/coral) SQL + GitHub REST fallback (paginated, ETag-cached) |
| Email | Hostinger SMTP (primary) → Gmail (backup) |
| Deploy | Render Blueprint (render.yaml) with COOP/COEP headers |
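
To make the Embeddings and Retrieval rows concrete, here is a minimal sketch of a 384-d hash embedding combined with keyword overlap. It assumes SHA-256 bucket hashing and a blend weight `alpha`; the names `hash_embed` and `hybrid_score` are hypothetical, and the project's own `hashEmbed` may hash and weight differently, so this would not reproduce the vectors in the committed index.

```python
import hashlib
import math
import re

DIM = 384  # size of the portable hash embedding named in the table


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Feature hashing: each token increments one bucket, then L2-normalise."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def hybrid_score(query: str, doc: str, alpha: float = 0.7) -> float:
    """Blend dense cosine similarity with keyword overlap (alpha is assumed)."""
    q, d = hash_embed(query), hash_embed(doc)
    dense = sum(a * b for a, b in zip(q, d))  # cosine: both vectors are unit length
    q_terms, d_terms = set(tokenize(query)), set(tokenize(doc))
    keyword = len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0
    return alpha * dense + (1 - alpha) * keyword
```

Because the embedding needs no model download, the same function can be ported to JavaScript and run in the browser, which is what makes the committed index portable.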
1. Backend
cd backend
python3.12 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
cp .env.example .env # optional — add GROQ_API_KEY, GITHUB_TOKEN, …
.venv/bin/python -m uvicorn app.main:app --reload --port 8000
Populate the knowledge base (deep-indexes repos + code files):
curl -X POST http://localhost:8000/reindex
2. Frontend
cd frontend
npm install
npm run dev # http://localhost:5173 (proxies /api → :8000)
3. (Optional) Ollama: local LLM + embeddings
ollama serve
ollama pull qwen2.5 # OLLAMA_MODEL=auto picks an installed chat model
ollama pull nomic-embed-text # embeddings (used automatically when present)
4. (Optional) On-device RunAnywhere: keyless
Drop a GGUF into frontend/public/models/ and set frontend/.env.local:
VITE_RUNANYWHERE_ENABLE=1
VITE_RUNANYWHERE_MODEL_URL=/models/qwen2.5-0.5b-instruct-q4_0.gguf
See frontend/public/models/README.md. Runs the LLM in your browser — no API key, offline-capable.
Tests: cd backend && .venv/bin/python -m pytest -q
📊 Status: STATUS.md · 🗺️ Roadmap: ROADMAP.md · ☁️ Cost-optimized Azure deploy: AZURE.md
| Method | Path | Purpose |
|---|---|---|
| GET | `/` | landing dashboard: status, endpoints, docs links, reindex, chat tester |
| GET | `/health` | status, indexed chunks, capabilities, degraded list, data-lake manifest |
| POST | `/chat` | `{message, conversation_id?, history?}` → answer + metadata + transparency + promo (rate-limited) |
| GET | `/stats` | training-dataset stats (per-account repos, code files, posts) + owner identity |
| POST | `/send-email` | email the owner via SMTP (rate-limited; IP + user-agent added server-side) |
| POST | `/feedback` | 👍/👎 on an answer → committed to the feedback branch |
| GET | `/agent/tools` | discover algsoch's callable tools (for external AI agents) |
| POST | `/agent/call` | invoke a public tool (search_knowledge · get_repo_stats · get_profile · …) |
| GET | `/feedback/recent` | recent feedback (dashboard widget) |
| POST | `/reindex` | rebuild this backend's local index (admin-gated, background) |
| POST | `/reindex/publish` | trigger the GitHub Action that rebuilds + commits the index (admin-gated) |
| POST | `/webhook/github` | GitHub webhook → re-index the changed repo |
| GET | `/export/dataset?format=jsonl\|raw` | download the fine-tune dataset |
| GET | `/profile` | promo-card data |
| GET | `/docs` · `/redoc` | interactive API docs |
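
As a rough illustration of calling `POST /chat`, the sketch below builds the request with the standard library only. The field names `message`, `conversation_id` and `history` come from the endpoint list; the helper name `build_chat_request` and the shape of each `history` item are assumptions.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str,
    message: str,
    conversation_id: Optional[str] = None,
    history: Optional[list] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the /chat endpoint."""
    payload = {"message": message}
    # Optional fields are only included when the caller provides them.
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    if history is not None:
        payload["history"] = history
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a local backend, `urllib.request.urlopen(build_chat_request("http://localhost:8000", "what's my latest project?"), timeout=60)` would send it; the endpoint is rate-limited, so callers should expect occasional rejections.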
Answering is hybrid: common intents (greeting, projects, contact, resume, hiring) are answered instantly by client rules; substantive questions go to the backend LLM (Groq → Ollama); if the backend is asleep the frontend answers from the committed index in-browser. So the bot always responds.
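
The three-way routing described above can be sketched as a small decision function. The real intent engine lives in the React frontend and is not shown here, so the keyword lists and the names `Route`, `match_intent` and `route` are hypothetical; Python is used only to keep the examples in one language.

```python
import re
from enum import Enum
from typing import Optional


class Route(Enum):
    CLIENT_RULES = "client_rules"  # instant curated answer in the browser
    BACKEND_LLM = "backend_llm"    # FastAPI agent (Groq, then Ollama)
    BROWSER_RAG = "browser_rag"    # committed index, used when backend is asleep


# Hypothetical keywords per intent; the actual rules are likely richer.
INTENT_KEYWORDS = {
    "greeting": {"hello", "hi", "hey"},
    "projects": {"projects"},
    "contact": {"contact", "email"},
    "resume": {"resume", "cv"},
    "hiring": {"hire", "hiring"},
}


def match_intent(message: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the message, if any."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None


def route(message: str, backend_reachable: bool) -> Route:
    """Pick who answers: client rules first, then backend, then in-browser RAG."""
    if match_intent(message) is not None:
        return Route.CLIENT_RULES
    return Route.BACKEND_LLM if backend_reachable else Route.BROWSER_RAG
```

Every branch returns a route, which is the property the paragraph above relies on: there is no input for which nobody answers.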
The deployed bot answers from the committed backend/prebuilt_index.json (copied into the static site). To refresh it after adding repos/content, don't rely on a cloud reindex (the free-tier backend is RAM-limited and its disk is ephemeral). Instead:
- Best (GitHub Action): Actions tab → Reindex knowledge → Run workflow (or the dashboard's 🚀 Rebuild & publish button). It reindexes on GitHub's runners, commits the index, and the frontend redeploys. Add a `GH_PAT` repo secret (public-repo read) for a higher API rate limit.
- Local: rebuild with `EMBED_BACKEND=hash` and push (see AGENTS.md).
flowchart LR
Repo["GitHub repo"] --> BP["Render Blueprint<br/>render.yaml"]
BP --> API["algsoch-api<br/>FastAPI + persistent disk"]
BP --> WEB["algsoch-web<br/>static React (COOP/COEP)"]
GH["GitHub webhook"] --> API
API -. installs .-> CORAL["Coral binary"]
Option A — Blueprint (one click, recommended)
- Push to GitHub → Render New → Blueprint → select `render.yaml`.
- Set secrets on algsoch-api: `GROQ_API_KEY`, `GITHUB_TOKEN`, `WEBHOOK_SECRET`, `HOSTINGER_SMTP_PASS`, `GMAIL_APP_PASSWORD`.
- Add a GitHub webhook → `https://algsoch-api.onrender.com/webhook/github` (JSON, secret = `WEBHOOK_SECRET`, events: Repositories + Pushes).
- Open `https://algsoch-web.onrender.com`.
Option B — Manual (no Blueprint, two services by hand)
Create the two services yourself in the Render dashboard (no render.yaml needed):
1. Backend — algsoch-api (Web Service)
- New → Web Service → connect this repo → Root Directory: `backend`
- Runtime: Python · Build: `./build.sh` · Start: `uvicorn app.main:app --host 0.0.0.0 --port $PORT`
- Add a Disk (for the vector store): mount at `backend/data`, ~1 GB
- Environment: set `ENVIRONMENT=production`, `GROQ_API_KEY`, `GITHUB_TOKEN`, `WEBHOOK_SECRET`, `GITHUB_USERS=fiscalmindset,algsoch`, `HOSTINGER_SMTP_PASS`, `GMAIL_APP_PASSWORD`, and `CORS_ORIGINS=https://<your-web-name>.onrender.com`
2. Frontend — algsoch-web (Static Site)
- New → Static Site → same repo → Root Directory: `frontend`
- Build: `npm install && npm run build` · Publish Directory: `dist`
- Environment: `VITE_API_BASE=https://<your-api-name>.onrender.com`
- Rewrite rule: `/*` → `/index.html` (SPA)
- Custom Headers (for on-device RunAnywhere WASM): `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`
3. Add the GitHub webhook → https://<your-api-name>.onrender.com/webhook/github (JSON, secret = WEBHOOK_SECRET, events: Repositories + Pushes), then open the web URL.
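
When a webhook has a secret, GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the `X-Hub-Signature-256` header, prefixed with `sha256=`. How this backend checks it is not shown in this README; the sketch below is one standard way to do it, and the helper name `verify_github_signature` is hypothetical.

```python
import hashlib
import hmac


def verify_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison on bytes avoids leaking match position via timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )
```

The check has to run on the exact bytes received, before any JSON parsing, because re-serialising the payload can change whitespace and invalidate the signature.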
backend/build.sh installs Coral automatically; the GitHub REST fallback keeps everything working if it can't.
| Group | Keys |
|---|---|
| LLM | GROQ_API_KEY, GROQ_MODEL, OLLAMA_MODEL=auto, OLLAMA_EMBED_MODEL |
| GitHub | GITHUB_TOKEN, GITHUB_USERS=fiscalmindset,algsoch, WEBHOOK_SECRET |
| Training depth | INDEX_CODE_FILES, MAX_FILES_PER_REPO, MAX_FILE_BYTES, INCLUDE_FORKS |
| Email | HOSTINGER_SMTP_PASS, GMAIL_APP_PASSWORD, NOTIFY_EMAIL* |
| Account | Repos | Stars | PRs merged | Contributions |
|---|---|---|---|---|
| @FiscalMindset | 20 | 6 | 22 | — |
| @algsoch | 107+ | 24+ | 28 | 350+ |
🏆 Pull Shark (22+ PRs) · YOLO · Quickdraw (<5 min merge) · Coral Hackathon Track 2 — Top 50
Built by Vicky Kumar · FiscalMindset / algsoch
LinkedIn · Medium · YouTube · Portfolio · Email
MIT License · ⭐ star the repo if it helped · see CONTRIBUTING
Built with React · FastAPI · Groq · RunAnywhere · Coral · Render