algsoch

GitHub · FiscalMindset · algsoch · LinkedIn · Medium · Portfolio · Website · 📄 Resume

🖼️ Preview

_Home: your identity + live training-dataset stats (repos · code files · posts · chunks), with the on-device toggle and browser history._

_An answer: rendered Markdown + metadata catalog + interactive AD card + a live Mermaid diagram._

🤖 What is algsoch?

Important

algsoch is a personal AI chatbot trained on your digital footprint — your GitHub repositories (the actual code + READMEs), your blog posts, and your profile. Unlike a generic chatbot, it genuinely knows your work, takes actions (emails you, looks up your latest repo), updates itself the moment you push new code, and shows you exactly how it reached every answer.

In plain terms: point it at your GitHub (and blog) and you get a private assistant that can answer "what's my latest project?", "explain my Blindfold repo", or "email me, I want to talk", grounded in your real data rather than made-up facts. By default it answers with Groq (fast cloud) or Ollama (local); it can also run fully on-device in your browser (RunAnywhere, keyless) for privacy or offline use, though that path is still marked coming soon.

🎯 What	A personal, agentic AI trained on your repos, code, blogs & profile
⚙️ How	RAG (retrieval) + function-calling agent + auto-reindex on every push
🧠 Brains	Groq (fast) · Ollama (local) · RunAnywhere (on-device, optional)
🔒 Yours	History stays in your browser · export a fine-tune dataset anytime

✨ What it does

- 🧠 Agentic: a real function-calling loop (Groq / Ollama) that decides which tools to use
- 🔧 Tools: search knowledge, query GitHub, send you email, get profile, export dataset
- 🔁 Auto-updates: a GitHub webhook re-indexes a repo the moment you push
- 📚 Deep training: indexes actual code files inside every repo, not just READMEs
- 🔎 Explainable: every answer shows which dataset, which rule, and how it was produced
- 🎨 Interactive: answers render Markdown + live Mermaid diagrams + inline HTML
- 🕒 Private history: conversations live only in your browser (IndexedDB)
- 📦 Exportable: download a fine-tune-ready JSONL of everything it learned

Runs with zero credentials. Every integration (Groq, Coral, GitHub, RunAnywhere, SMTP, ChromaDB) degrades gracefully — develop and demo the whole product before wiring any keys.
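A minimal sketch of the graceful-degradation idea, assuming each integration is wrapped as a callable that raises or returns nothing when its key or server is missing. The function and provider names are hypothetical, not taken from algsoch's code; only the Groq → Ollama → committed-index ordering comes from this README.

```python
from typing import Callable, Optional

# A provider takes the question and returns an answer, or returns None / raises
# when it is unavailable (missing API key, local server not running, timeout).
Provider = Callable[[str], Optional[str]]

FALLBACK_MESSAGE = "No answer source is configured yet."  # hypothetical wording


def answer_with_fallback(question: str, providers: list[tuple[str, Provider]]) -> tuple[str, str]:
    """Return (provider_name, answer) from the first provider that produces an answer."""
    for name, provider in providers:
        try:
            answer = provider(question)
        except Exception:
            # Degrade instead of failing the request; a real service would record
            # `name` as degraded (compare the "degraded list" reported by /health).
            continue
        if answer:
            return name, answer
    return "none", FALLBACK_MESSAGE


# Hypothetical stand-ins for the real integrations.
def groq(q: str) -> Optional[str]:
    raise RuntimeError("GROQ_API_KEY not set")


def ollama(q: str) -> Optional[str]:
    return None  # e.g. `ollama serve` is not running


def index_lookup(q: str) -> Optional[str]:
    return "From the committed index: ..."


result = answer_with_fallback(
    "what's my latest project?",
    [("groq", groq), ("ollama", ollama), ("index", index_lookup)],
)
# result == ("index", "From the committed index: ...")
```

Ordering the list is the whole policy: adding a key moves an earlier provider back into play without touching the calling code.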

🏗️ Architecture

```mermaid
flowchart TB
    subgraph Browser["🖥️ Browser — React + Vite"]
        UI["Chat UI · suggestions · wake button"]
        FAST["Client rules / intent engine<br/>greeting · projects · contact · hiring"]
        CRAG["In-browser RAG<br/>hashEmbed + hybrid search over index.json"]
        FB["👍 / 👎 / 💡 feedback"]
        RA["on-device LLM (RunAnywhere)<br/>— coming soon"]
        HIST[("IndexedDB history")]
    end

    subgraph API["⚙️ FastAPI backend"]
        HOME["/ landing dashboard · /docs"]
        AGENT["Agent loop · function calling"]
        RAG["RAG · hybrid search (dense + keyword)"]
        TOOLS["Tools: github · email · profile · export"]
        GATE["rate limit · admin-gated reindex"]
        FBE["/feedback → feedback branch"]
    end

    subgraph Lake["📦 Data-lake + portable index"]
        VS[("Vector store")]
        PRE["prebuilt_index.json<br/>(committed · hash 384-d)"]
    end

    subgraph Providers["☁️ Providers"]
        GROQ["Groq — fast cloud LLM"]
        OLLAMA["Ollama — local LLM + nomic embeddings"]
        CORAL["Coral SQL / GitHub API"]
        SMTP["SMTP: Hostinger + Gmail"]
        GHA["GitHub Action · reindex on CI"]
    end

    UI --> FAST
    FAST -->|simple intent| UI
    FAST -->|substantive| AGENT
    UI -. backend asleep .-> CRAG
    CRAG --> PRE
    AGENT --> RAG --> VS
    AGENT --> GROQ
    AGENT --> OLLAMA
    RAG --> OLLAMA
    AGENT --> TOOLS --> CORAL
    TOOLS --> SMTP
    CORAL --> VS
    FB --> FBE
    PRE -->|seeds| VS
    PRE -->|synced to| CRAG
    GHA --> PRE
    UI -. local route .-> RA
    UI --> HIST
```

🔄 How one answer is produced

```mermaid
sequenceDiagram
    autonumber
    participant U as You
    participant F as React (App.tsx)
    participant C as Client rules + in-browser RAG
    participant B as FastAPI agent
    participant L as Groq / Ollama

    U->>F: ask a question
    alt common intent (greeting · project · contact · hiring · latest repo)
        F->>C: match intent / retrieve over committed index
        C-->>F: instant curated answer + sources
    else substantive question (backend reachable)
        F->>B: POST /chat (adaptive timeout: 60s local · 9s remote)
        B->>B: hybrid retrieve + tool calls (query_github / send_email)
        B->>L: messages + tool schemas
        L-->>B: final answer (Markdown + Mermaid)
        B-->>F: answer + metadata + 🔎 transparency + AD card
    else backend asleep / unreachable
        F->>C: in-browser RAG (hashEmbed + hybrid → extractive / NVIDIA proxy)
        C-->>F: answer from the committed index
    end
    F-->>U: rendered answer + 👍/👎 feedback + promo
```
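The adaptive timeout and the backend-asleep branch can be sketched as follows. The real logic lives in `App.tsx` (TypeScript); this Python version only mirrors the decision, assuming "local" means a backend on localhost. `call_backend` and `answer_locally` are hypothetical stand-ins for the `POST /chat` call and the in-browser RAG.

```python
from typing import Callable
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def chat_timeout(api_base: str) -> float:
    """60 s when the backend is local (local LLMs are slow), 9 s when it is remote."""
    host = urlparse(api_base).hostname or ""
    return 60.0 if host in LOCAL_HOSTS else 9.0


def ask(
    question: str,
    api_base: str,
    call_backend: Callable[[str, float], str],  # hypothetical: wraps POST /chat
    answer_locally: Callable[[str], str],       # hypothetical: in-browser RAG over index.json
) -> str:
    """Ask the backend; if it times out or is unreachable, answer from the committed index."""
    try:
        return call_backend(question, chat_timeout(api_base))
    except OSError:  # TimeoutError and ConnectionError are both OSError subclasses
        return answer_locally(question)
```

The short remote timeout is the design choice that matters: a sleeping free-tier backend costs the user a few seconds at most before the committed index answers instead.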

🧬 Training / data pipeline

```mermaid
flowchart LR
    GH["GitHub repos<br/>paginated · ETag-cached<br/>README + code files + tree"] --> EX
    MED["Medium RSS"] --> EX
    PROF["Owner profile"] --> EX
    LI["LinkedIn export<br/>professional CSVs only"] --> EX
    RES["Resume PDF"] --> EX
    DROP["pdfs/ · notes/<br/>(drop files here)"] --> EX
    EX["Extract → redact secrets<br/>→ data-lake .md per record"] --> CH["Chunk ~500 tokens"]
    CH --> EM["Embed<br/>local: nomic 768-d · portable: hash 384-d"]
    EM --> VDB[("Vector store")]
    VDB --> PRE["prebuilt_index.json (committed)"]
    PRE --> FE["frontend/public/index.json<br/>→ in-browser RAG"]
    GHA["GitHub Action · Reindex"] -.rebuilds.-> PRE
    VDB --> EXP["dataset → JSONL<br/>(client-side or /export)"]
```
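The redact and chunk stages can be sketched like this. The regex patterns are illustrative examples, not algsoch's actual rules; whitespace-separated words stand in for model tokens; and the chunks do not overlap because the pipeline above does not say whether they do.

```python
import re

# Illustrative patterns only; the real redaction rules are not listed in this README.
SECRET_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),  # GitHub token prefixes (ghp_, gho_, ...)
    re.compile(r"AKIA[0-9A-Z]{16}"),            # AWS access key IDs
    re.compile(r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*\S+"),  # key = value pairs
]


def redact(text: str) -> str:
    """Replace anything that looks like a credential before it reaches the index."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def chunk(text: str, max_tokens: int = 500) -> list[str]:
    """Split into consecutive chunks of at most `max_tokens` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def prepare(text: str) -> list[str]:
    """Redact first, so no chunk (and no embedding) ever contains a secret."""
    return chunk(redact(text))
```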

🧰 Tech stack

| Layer | Tech |
|---|---|
| Frontend | React 18, Vite, react-markdown + Mermaid, localforage (IndexedDB) |
| Backend | FastAPI, httpx, pydantic-settings |
| LLM (answering) | Hybrid — client rules for common intents · Groq `llama-3.3-70b` → Ollama for substantive questions · in-browser fallback when the backend is asleep |
| Embeddings | Ollama `nomic-embed-text` (768-d) → hash (384-d, portable for the committed index + in-browser RAG) |
| Retrieval | Hybrid = dense cosine + keyword/title overlap — runs both server-side and in-browser (exact JS port) |
| Refresh | GitHub Action rebuilds the index on CI → commits → frontend redeploys |
| On-device | `@runanywhere/web` (WASM, keyless) — wired, currently coming soon |
| GitHub data | Coral (`withcoral/coral`) SQL + GitHub REST fallback (paginated, ETag-cached) |
| Email | Hostinger SMTP (primary) → Gmail (backup) |
| Deploy | Render Blueprint (`render.yaml`) with COOP/COEP headers |
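The README does not spell out how `hashEmbed` hashes tokens or how the dense and keyword scores are weighted, so the following is one plausible version under stated assumptions: feature hashing into 384 buckets with BLAKE2b, L2 normalization, and a weighted sum with a hypothetical `alpha=0.7`. The title-overlap term is left out for brevity.

```python
import hashlib
import math
import re

DIM = 384  # dimension of the portable index; the hashing scheme itself is an assumption


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Feature hashing: each token increments one bucket, then the vector is L2-normalized."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        # A keyed-free cryptographic hash is stable across processes and languages,
        # unlike Python's salted built-in hash().
        digest = hashlib.blake2b(tok.encode("utf-8"), digest_size=4).digest()
        vec[int.from_bytes(digest, "little") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # both inputs are already unit-length


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of distinct query tokens that also appear in the text."""
    q = set(tokenize(query))
    return len(q & set(tokenize(text))) / len(q) if q else 0.0


def hybrid_score(query: str, query_vec: list[float], doc: str, doc_vec: list[float],
                 alpha: float = 0.7) -> float:  # alpha is a hypothetical weight
    return alpha * cosine(query_vec, doc_vec) + (1 - alpha) * keyword_overlap(query, doc)


def search(query: str, docs: list[str], k: int = 3) -> list[str]:
    qv = hash_embed(query)
    scored = [(hybrid_score(query, qv, d, hash_embed(d)), d) for d in docs]
    return [d for _, d in sorted(scored, key=lambda s: s[0], reverse=True)[:k]]
```

Because a hash embedding needs no model, the same function yields identical vectors in CI, on the backend, and in a browser port, which is what makes a committed index portable.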

🚀 Local development

1. Backend

```bash
cd backend
python3.12 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
cp .env.example .env          # optional — add GROQ_API_KEY, GITHUB_TOKEN, …
.venv/bin/python -m uvicorn app.main:app --reload --port 8000
```

Populate the knowledge base (deep-indexes repos + code files):

```bash
curl -X POST http://localhost:8000/reindex
```

2. Frontend

```bash
cd frontend
npm install
npm run dev          # http://localhost:5173  (proxies /api → :8000)
```

3. (Optional) Ollama — local LLM + embeddings

```bash
ollama serve
ollama pull qwen2.5            # OLLAMA_MODEL=auto picks an installed chat model
ollama pull nomic-embed-text   # embeddings (used automatically when present)
```

4. (Optional) On-device RunAnywhere — keyless

Drop a GGUF into frontend/public/models/ and set frontend/.env.local:

```
VITE_RUNANYWHERE_ENABLE=1
VITE_RUNANYWHERE_MODEL_URL=/models/qwen2.5-0.5b-instruct-q4_0.gguf
```

See frontend/public/models/README.md. Runs the LLM in your browser — no API key, offline-capable.

Tests: `cd backend && .venv/bin/python -m pytest -q`

📊 Status: STATUS.md · 🗺️ Roadmap: ROADMAP.md · ☁️ Cost-optimized Azure deploy: AZURE.md

🔌 API endpoints

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/` | landing dashboard — status, endpoints, docs links, reindex, chat tester |
| `GET` | `/health` | status, indexed chunks, capabilities, degraded list, data-lake manifest |
| `POST` | `/chat` | `{message, conversation_id?, history?}` → answer + metadata + transparency + promo (rate-limited) |
| `GET` | `/stats` | training-dataset stats (per-account repos, code files, posts) + owner identity |
| `POST` | `/send-email` | email the owner via SMTP (rate-limited; IP + user-agent added server-side) |
| `POST` | `/feedback` | 👍/👎 on an answer → committed to the `feedback` branch |
| `GET` | `/agent/tools` | discover algsoch's callable tools (for external AI agents) |
| `POST` | `/agent/call` | invoke a public tool (`search_knowledge` · `get_repo_stats` · `get_profile` · …) |
| `GET` | `/feedback/recent` | recent feedback (dashboard widget) |
| `POST` | `/reindex` | rebuild this backend's local index (admin-gated, background) |
| `POST` | `/reindex/publish` | trigger the GitHub Action that rebuilds + commits the index (admin-gated) |
| `POST` | `/webhook/github` | GitHub webhook → re-index the changed repo |
| `GET` | `/export/dataset?format=jsonl\|raw` | download the fine-tune dataset |
| `GET` | `/profile` | promo-card data |
| `GET` | `/docs` · `/redoc` | interactive API docs |
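A minimal stdlib client for `POST /chat`, assuming the local backend from the development steps on port 8000. Only the request fields (`message`, optional `conversation_id`, optional `history`) come from the table above; the shape of `history` entries and the response keys are not documented here, so the sketch returns the parsed JSON as-is.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://localhost:8000"  # local dev backend; use the deployed API URL otherwise


def build_chat_request(message: str, conversation_id: Optional[str] = None,
                       history: Optional[list[dict]] = None,
                       api_base: str = API_BASE) -> urllib.request.Request:
    """Build a POST /chat request; optional fields are omitted when not given."""
    body: dict = {"message": message}
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    if history is not None:
        body["history"] = history  # entry shape (e.g. role/content) is an assumption
    return urllib.request.Request(
        f"{api_base}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(message: str, **kwargs) -> dict:
    """Send the request and return the decoded JSON response (keys not assumed)."""
    with urllib.request.urlopen(build_chat_request(message, **kwargs), timeout=60) as resp:
        return json.load(resp)
```

With the backend running, `chat("what's my latest project?")` returns whatever answer, metadata, and transparency fields the server sends; `/chat` is rate-limited, so scripted loops should pause between calls.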

Answering is hybrid: common intents (greeting, projects, contact, resume, hiring) are answered instantly by client rules; substantive questions go to the backend LLM (Groq → Ollama); if the backend is asleep the frontend answers from the committed index in-browser. So the bot always responds.
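The three-way routing described above can be sketched as a keyword router. The patterns below are hypothetical; the real client rules live in the React app and are not shown in this README. Note that the intent check runs before any network call, which is why common questions never wait on the backend.

```python
import re

# Hypothetical keyword rules standing in for the client intent engine (TypeScript in the app).
INTENT_RULES = {
    "greeting": re.compile(r"^\s*(hi|hello|hey)\b", re.IGNORECASE),
    "projects": re.compile(r"\b(my projects|latest (repo|project))\b", re.IGNORECASE),
    "contact": re.compile(r"\b(contact|email me|reach you)\b", re.IGNORECASE),
    "resume": re.compile(r"\b(resume|cv)\b", re.IGNORECASE),
    "hiring": re.compile(r"\b(hire|hiring)\b", re.IGNORECASE),
}


def route(message: str, backend_awake: bool) -> str:
    """Return where a message is answered: a client rule, the backend LLM, or in-browser RAG."""
    for intent, pattern in INTENT_RULES.items():
        if pattern.search(message):
            return f"client:{intent}"  # instant curated answer
    return "backend" if backend_awake else "in-browser-rag"
```

A question such as "explain my Blindfold repo" matches no rule, so it goes to the backend LLM when it is awake and to the committed index otherwise.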

🔄 Refreshing the knowledge index

The deployed bot answers from the committed backend/prebuilt_index.json (copied into the static site). To refresh it after adding repos/content, don't rely on a cloud reindex (the free-tier backend is RAM-limited and its disk is ephemeral). Instead:

- Best (GitHub Action): Actions tab → Reindex knowledge → Run workflow (or the dashboard's 🚀 Rebuild & publish button). It reindexes on GitHub's runners, commits the index, and the frontend redeploys. Add a `GH_PAT` repo secret (public-repo read) for a higher API rate limit.
- Local: rebuild with `EMBED_BACKEND=hash` and push (see AGENTS.md).
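The 🚀 Rebuild & publish button and `POST /reindex/publish` start the Reindex workflow. One standard way to start a workflow from code is GitHub's "create a workflow dispatch event" REST endpoint, sketched below; whether algsoch uses exactly this call is an assumption, and `OWNER`, `REPO`, `reindex.yml`, and the `main` branch are placeholders. The workflow must declare `on: workflow_dispatch`, and the token needs permission to run Actions on that repo (a read-only `GH_PAT` is not enough).

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, workflow_file: str, token: str,
                           ref: str = "main") -> urllib.request.Request:
    """Build POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    return urllib.request.Request(
        url,
        data=json.dumps({"ref": ref}).encode("utf-8"),  # branch to run the workflow on
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
        method="POST",
    )


# Placeholders: substitute the real owner/repo, workflow file name, and token.
req = build_dispatch_request("OWNER", "REPO", "reindex.yml", token="<token>")
# urllib.request.urlopen(req) returns a 2xx status once the run is queued.
```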

☁️ Deploy to Render

```mermaid
flowchart LR
    Repo["GitHub repo"] --> BP["Render Blueprint<br/>render.yaml"]
    BP --> API["algsoch-api<br/>FastAPI + persistent disk"]
    BP --> WEB["algsoch-web<br/>static React (COOP/COEP)"]
    GH["GitHub webhook"] --> API
    API -. installs .-> CORAL["Coral binary"]
```

Option A — Blueprint (one click, recommended)

1. Push to GitHub → Render New → Blueprint → select `render.yaml`.
2. Set secrets on `algsoch-api`: `GROQ_API_KEY`, `GITHUB_TOKEN`, `WEBHOOK_SECRET`, `HOSTINGER_SMTP_PASS`, `GMAIL_APP_PASSWORD`.
3. Add a GitHub webhook → https://algsoch-api.onrender.com/webhook/github (JSON, secret = `WEBHOOK_SECRET`, events: Repositories + Pushes).
4. Open https://algsoch-web.onrender.com.
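GitHub signs every webhook delivery with the shared secret: the `X-Hub-Signature-256` header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body. A handler should reject deliveries whose signature does not match before re-indexing anything. The helper names below are hypothetical; the backend's actual handler is not shown in this README.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)  # constant-time comparison


def changed_repo(payload: dict) -> Optional[str]:
    """Push and repository events both name the affected repo under "repository"."""
    return (payload.get("repository") or {}).get("full_name")
```

Sign the bytes exactly as received: re-serializing parsed JSON can change whitespace or key order and break the match.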

Option B — Manual (no Blueprint, two services by hand)

Create the two services yourself in the Render dashboard (no render.yaml needed):

1. Backend — algsoch-api (Web Service)

- New → Web Service → connect this repo → Root Directory: `backend`
- Runtime: Python · Build: `./build.sh` · Start: `uvicorn app.main:app --host 0.0.0.0 --port $PORT`
- Add a Disk (for the vector store): mount at `backend/data`, ~1 GB
- Environment: set `ENVIRONMENT=production`, `GROQ_API_KEY`, `GITHUB_TOKEN`, `WEBHOOK_SECRET`, `GITHUB_USERS=fiscalmindset,algsoch`, `HOSTINGER_SMTP_PASS`, `GMAIL_APP_PASSWORD`, and `CORS_ORIGINS=https://<your-web-name>.onrender.com`

2. Frontend — algsoch-web (Static Site)

- New → Static Site → same repo → Root Directory: `frontend`
- Build: `npm install && npm run build` · Publish Directory: `dist`
- Environment: `VITE_API_BASE=https://<your-api-name>.onrender.com`
- Rewrite rule: `/*` → `/index.html` (SPA)
- Custom Headers (for on-device RunAnywhere WASM): `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`

3. Add the GitHub webhook → `https://<your-api-name>.onrender.com/webhook/github` (JSON, secret = `WEBHOOK_SECRET`, events: Repositories + Pushes), then open the web URL.

backend/build.sh installs Coral automatically; the GitHub REST fallback keeps everything working if it can't.
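The COOP/COEP pair set on the static site is what makes a page cross-origin isolated in browsers, a prerequisite for features such as `SharedArrayBuffer`. To check a built `dist/` locally with the same headers, a small stdlib server is enough; this preview helper is a hypothetical convenience, not part of the repo, and it does not implement the SPA rewrite, so deep links will 404.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds the COOP/COEP pair to every response."""

    def end_headers(self) -> None:
        # Together these two headers make the page cross-origin isolated.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()


def serve(directory: str = "dist", port: int = 4173) -> None:
    """Serve a built frontend on localhost until interrupted."""
    handler = partial(IsolatedHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Call `serve("frontend/dist")` after `npm run build`, then confirm in the browser console that `crossOriginIsolated` is `true`.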

🔑 Environment (all optional — graceful degradation)

| Group | Keys |
|---|---|
| LLM | `GROQ_API_KEY`, `GROQ_MODEL`, `OLLAMA_MODEL=auto`, `OLLAMA_EMBED_MODEL` |
| GitHub | `GITHUB_TOKEN`, `GITHUB_USERS=fiscalmindset,algsoch`, `WEBHOOK_SECRET` |
| Training depth | `INDEX_CODE_FILES`, `MAX_FILES_PER_REPO`, `MAX_FILE_BYTES`, `INCLUDE_FORKS` |
| Email | `HOSTINGER_SMTP_PASS`, `GMAIL_APP_PASSWORD`, `NOTIFY_EMAIL*` |
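The backend reads these with pydantic-settings; the stdlib sketch below shows the same idea without that dependency, for a few of the keys. `OLLAMA_MODEL=auto` and the comma-separated `GITHUB_USERS` format come from the table; the defaults for `INDEX_CODE_FILES` and `MAX_FILES_PER_REPO` are placeholders, and the `degraded` property only loosely mirrors the degraded list that `/health` reports.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    groq_api_key: Optional[str]
    ollama_model: str
    github_users: list[str]
    index_code_files: bool
    max_files_per_repo: int

    @property
    def degraded(self) -> list[str]:
        """Integrations that will fall back because their key is missing."""
        return [] if self.groq_api_key else ["groq"]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Every key is optional: a missing value yields a default, never an error."""
    return Settings(
        groq_api_key=env.get("GROQ_API_KEY") or None,
        ollama_model=env.get("OLLAMA_MODEL", "auto"),
        github_users=[u.strip() for u in env.get("GITHUB_USERS", "").split(",") if u.strip()],
        index_code_files=env.get("INDEX_CODE_FILES", "true").lower() in TRUTHY,  # placeholder default
        max_files_per_repo=int(env.get("MAX_FILES_PER_REPO", "50")),  # placeholder default
    )
```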

📊 Creator — Vicky Kumar

| Account | Repos | Stars | PRs merged | Contributions |
|---|---|---|---|---|
| @FiscalMindset | 20 | 6 | 22 | — |
| @algsoch | 107+ | 24+ | 28 | 350+ |

🏆 Pull Shark (22+ PRs) · YOLO · Quickdraw (<5 min merge) · Coral Hackathon Track 2 — Top 50

Built by Vicky Kumar · FiscalMindset / algsoch

LinkedIn · Medium · YouTube · Portfolio · Email

_MIT License · ⭐ star the repo if it helped · see CONTRIBUTING_

_Built with React · FastAPI · Groq · RunAnywhere · Coral · Render_
