FiscalMindset/vickykumar

Vicky Kumar

🖼️ Preview

Home — your identity + live training-dataset stats (repos · code files · posts · chunks), with the on-device toggle and browser history.



An answer — rendered Markdown + metadata catalog + interactive AD card + a live Mermaid diagram.


🤖 What is algsoch?

Important

algsoch is a personal AI chatbot trained on your digital footprint — your GitHub repositories (the actual code + READMEs), your blog posts, and your profile. Unlike a generic chatbot, it genuinely knows your work, takes actions (emails you, looks up your latest repo), updates itself the moment you push new code, and shows you exactly how it reached every answer.

In plain terms: point it at your GitHub (and blog), and you get a private assistant that can answer "what's my latest project?", "explain my Blindfold repo", or "email me, I want to talk", grounded in your real data rather than made-up facts. By default it answers with Groq (fast cloud model) or Ollama (local model); it can also run fully on-device in your browser through RunAnywhere (keyless) when you want privacy or offline use, although that path is currently marked coming soon.

  • 🎯 What: A personal, agentic AI trained on your repos, code, blogs & profile
  • ⚙️ How: RAG (retrieval) + function-calling agent + auto-reindex on every push
  • 🧠 Brains: Groq (fast) · Ollama (local) · RunAnywhere (on-device, optional)
  • 🔒 Yours: History stays in your browser · export a fine-tune dataset anytime

✨ What it does

  • 🧠 Agentic — a real function-calling loop (Groq / Ollama) that decides which tools to use
  • 🔧 Tools — search knowledge, query GitHub, send you email, get profile, export dataset
  • 🔁 Auto-updates — a GitHub webhook re-indexes a repo the moment you push
  • 📚 Deep training — indexes actual code files inside every repo, not just READMEs
  • 🔎 Explainable — every answer shows which dataset, which rule, and how it was produced
  • 🎨 Interactive — answers render Markdown + live Mermaid diagrams + inline HTML
  • 🕒 Private history — conversations live only in your browser (IndexedDB)
  • 📦 Exportable — download a fine-tune-ready JSONL of everything it learned

Runs with zero credentials. Every integration (Groq, Coral, GitHub, RunAnywhere, SMTP, ChromaDB) degrades gracefully — develop and demo the whole product before wiring any keys.


🏗️ Architecture

flowchart TB
    subgraph Browser["🖥️ Browser — React + Vite"]
        UI["Chat UI · suggestions · wake button"]
        FAST["Client rules / intent engine<br/>greeting · projects · contact · hiring"]
        CRAG["In-browser RAG<br/>hashEmbed + hybrid search over index.json"]
        FB["👍 / 👎 / 💡 feedback"]
        RA["on-device LLM (RunAnywhere)<br/>— coming soon"]
        HIST[("IndexedDB history")]
    end

    subgraph API["⚙️ FastAPI backend"]
        HOME["/ landing dashboard · /docs"]
        AGENT["Agent loop · function calling"]
        RAG["RAG · hybrid search (dense + keyword)"]
        TOOLS["Tools: github · email · profile · export"]
        GATE["rate limit · admin-gated reindex"]
        FBE["/feedback → feedback branch"]
    end

    subgraph Lake["📦 Data-lake + portable index"]
        VS[("Vector store")]
        PRE["prebuilt_index.json<br/>(committed · hash 384-d)"]
    end

    subgraph Providers["☁️ Providers"]
        GROQ["Groq — fast cloud LLM"]
        OLLAMA["Ollama — local LLM + nomic embeddings"]
        CORAL["Coral SQL / GitHub API"]
        SMTP["SMTP: Hostinger + Gmail"]
        GHA["GitHub Action · reindex on CI"]
    end

    UI --> FAST
    FAST -->|simple intent| UI
    FAST -->|substantive| AGENT
    UI -. backend asleep .-> CRAG
    CRAG --> PRE
    AGENT --> RAG --> VS
    AGENT --> GROQ
    AGENT --> OLLAMA
    RAG --> OLLAMA
    AGENT --> TOOLS --> CORAL
    TOOLS --> SMTP
    CORAL --> VS
    FB --> FBE
    PRE -->|seeds| VS
    PRE -->|synced to| CRAG
    GHA --> PRE
    UI -. local route .-> RA
    UI --> HIST

🔄 How one answer is produced

sequenceDiagram
    autonumber
    participant U as You
    participant F as React (App.tsx)
    participant C as Client rules + in-browser RAG
    participant B as FastAPI agent
    participant L as Groq / Ollama

    U->>F: ask a question
    alt common intent (greeting · project · contact · hiring · latest repo)
        F->>C: match intent / retrieve over committed index
        C-->>F: instant curated answer + sources
    else substantive question (backend reachable)
        F->>B: POST /chat (adaptive timeout: 60s local · 9s remote)
        B->>B: hybrid retrieve + tool calls (query_github / send_email)
        B->>L: messages + tool schemas
        L-->>B: final answer (Markdown + Mermaid)
        B-->>F: answer + metadata + 🔎 transparency + AD card
    else backend asleep / unreachable
        F->>C: in-browser RAG (hashEmbed + hybrid → extractive / NVIDIA proxy)
        C-->>F: answer from the committed index
    end
    F-->>U: rendered answer + 👍/👎 feedback + promo
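The three branches of the sequence above can be sketched as a small routing function. This is an illustrative Python sketch, not the project's frontend code (which lives in React): the keyword lists, `LOCAL_HOSTS`, and the function names are assumptions, and only the 60s/9s timeouts and the three routes come from the diagram.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical keyword lists; the real client rules live in the frontend.
COMMON_INTENTS = {
    "greeting": ("hi", "hello", "hey"),
    "contact": ("email", "contact"),
    "hiring": ("hire", "hiring"),
}

# Assumption: a backend on one of these hosts counts as "local".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def chat_timeout_seconds(api_base: str) -> int:
    """Return 60 for a local backend and 9 for a remote one."""
    host = urlparse(api_base).hostname or ""
    return 60 if host in LOCAL_HOSTS else 9


def match_intent(question: str) -> Optional[str]:
    """Return the first intent whose keyword appears as a whole word."""
    words = set(question.lower().replace("?", " ").replace(",", " ").split())
    for intent, keywords in COMMON_INTENTS.items():
        if words & set(keywords):
            return intent
    return None


def choose_route(question: str, backend_reachable: bool) -> str:
    """Pick client rules, the backend agent, or the in-browser fallback."""
    if match_intent(question) is not None:
        return "client_rules"
    return "backend_agent" if backend_reachable else "in_browser_rag"
```

With these assumptions, `choose_route("explain my Blindfold repo", backend_reachable=False)` falls through to the in-browser path, which is why the bot can still answer while the backend is asleep.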

🧬 Training / data pipeline

flowchart LR
    GH["GitHub repos<br/>paginated · ETag-cached<br/>README + code files + tree"] --> EX
    MED["Medium RSS"] --> EX
    PROF["Owner profile"] --> EX
    LI["LinkedIn export<br/>professional CSVs only"] --> EX
    RES["Resume PDF"] --> EX
    DROP["pdfs/ · notes/<br/>(drop files here)"] --> EX
    EX["Extract → redact secrets<br/>→ data-lake .md per record"] --> CH["Chunk ~500 tokens"]
    CH --> EM["Embed<br/>local: nomic 768-d · portable: hash 384-d"]
    EM --> VDB[("Vector store")]
    VDB --> PRE["prebuilt_index.json (committed)"]
    PRE --> FE["frontend/public/index.json<br/>→ in-browser RAG"]
    GHA["GitHub Action · Reindex"] -.rebuilds.-> PRE
    VDB --> EXP["dataset → JSONL<br/>(client-side or /export)"]
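The chunk and embed stages of the pipeline can be illustrated with a minimal sketch. The README does not spell out the project's actual `hashEmbed` algorithm, so the version below is a generic feature-hashing stand-in: whitespace-separated words approximate tokens, and `chunk_tokens` and `hash_embed` are hypothetical names. Only the roughly 500-token chunk size and the 384 dimensions come from the diagram.

```python
import hashlib
import math


def chunk_tokens(text, max_tokens=500):
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def hash_embed(text, dim=384):
    """Map text to a unit-length vector by hashing each word into a bucket."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this word
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # signed to soften collisions
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty text (or a fully cancelled vector) stays all zeros.
    return [v / norm for v in vec] if norm else vec
```

A hash embedding of this kind needs no model download, which fits the "portable" role the diagram gives it: the same function can be ported to JavaScript and produce identical vectors in the browser.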

🧰 Tech stack

| Layer | Tech |
| --- | --- |
| Frontend | React 18, Vite, react-markdown + Mermaid, localforage (IndexedDB) |
| Backend | FastAPI, httpx, pydantic-settings |
| LLM (answering) | Hybrid: client rules for common intents · Groq llama-3.3-70b → Ollama for substantive questions · in-browser fallback when the backend is asleep |
| Embeddings | Ollama nomic-embed-text (768-d) → hash (384-d, portable for the committed index + in-browser RAG) |
| Retrieval | Hybrid = dense cosine + keyword/title overlap; runs both server-side and in-browser (exact JS port) |
| Refresh | GitHub Action rebuilds the index on CI → commits → frontend redeploys |
| On-device | @runanywhere/web (WASM, keyless); wired, currently coming soon |
| GitHub data | Coral (withcoral/coral) SQL + GitHub REST fallback (paginated, ETag-cached) |
| Email | Hostinger SMTP (primary) → Gmail (backup) |
| Deploy | Render Blueprint (render.yaml) with COOP/COEP headers |
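
The Retrieval row describes hybrid search as dense cosine plus keyword/title overlap. A minimal sketch of that idea follows; the weights `alpha` and `title_boost`, the chunk dictionary shape, and all function names are assumptions for illustration, not the project's actual scoring.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(query, text):
    """Fraction of distinct query words that also appear in text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


def hybrid_score(query, query_vec, chunk, alpha=0.7, title_boost=0.1):
    """Blend dense similarity with keyword overlap on body text and title."""
    dense = cosine(query_vec, chunk["vector"])
    sparse = keyword_overlap(query, chunk["text"])
    title = keyword_overlap(query, chunk["title"])
    return alpha * dense + (1 - alpha) * sparse + title_boost * title


def search(query, query_vec, chunks, top_k=3):
    """Return the top_k chunks ranked by hybrid score, best first."""
    ranked = sorted(
        chunks,
        key=lambda chunk: hybrid_score(query, query_vec, chunk),
        reverse=True,
    )
    return ranked[:top_k]
```

The keyword term helps with exact names such as a repository title, where a hashed or small embedding alone may rank a loosely related chunk higher.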

🚀 Local development

1. Backend
cd backend
python3.12 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
cp .env.example .env          # optional — add GROQ_API_KEY, GITHUB_TOKEN, …
.venv/bin/python -m uvicorn app.main:app --reload --port 8000

Populate the knowledge base (deep-indexes repos + code files):

curl -X POST http://localhost:8000/reindex
2. Frontend
cd frontend
npm install
npm run dev          # http://localhost:5173  (proxies /api → :8000)
3. (Optional) Ollama — local LLM + embeddings
ollama serve
ollama pull qwen2.5            # OLLAMA_MODEL=auto picks an installed chat model
ollama pull nomic-embed-text   # embeddings (used automatically when present)
4. (Optional) On-device RunAnywhere — keyless

Drop a GGUF into frontend/public/models/ and set frontend/.env.local:

VITE_RUNANYWHERE_ENABLE=1
VITE_RUNANYWHERE_MODEL_URL=/models/qwen2.5-0.5b-instruct-q4_0.gguf

See frontend/public/models/README.md. Runs the LLM in your browser — no API key, offline-capable.

Tests: cd backend && .venv/bin/python -m pytest -q

📊 Status: STATUS.md · 🗺️ Roadmap: ROADMAP.md · ☁️ Cost-optimized Azure deploy: AZURE.md


🔌 API endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| GET | / | landing dashboard: status, endpoints, docs links, reindex, chat tester |
| GET | /health | status, indexed chunks, capabilities, degraded list, data-lake manifest |
| POST | /chat | {message, conversation_id?, history?} → answer + metadata + transparency + promo (rate-limited) |
| GET | /stats | training-dataset stats (per-account repos, code files, posts) + owner identity |
| POST | /send-email | email the owner via SMTP (rate-limited; IP + user-agent added server-side) |
| POST | /feedback | 👍/👎 on an answer → committed to the feedback branch |
| GET | /agent/tools | discover algsoch's callable tools (for external AI agents) |
| POST | /agent/call | invoke a public tool (search_knowledge · get_repo_stats · get_profile · …) |
| GET | /feedback/recent | recent feedback (dashboard widget) |
| POST | /reindex | rebuild this backend's local index (admin-gated, background) |
| POST | /reindex/publish | trigger the GitHub Action that rebuilds + commits the index (admin-gated) |
| POST | /webhook/github | GitHub webhook → re-index the changed repo |
| GET | /export/dataset?format=jsonl\|raw | download the fine-tune dataset |
| GET | /profile | promo-card data |
| GET | /docs · /redoc | interactive API docs |
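
As a hedged example of calling the chat endpoint from a script, the sketch below builds a `POST /chat` request with the standard library. The field names `message`, `conversation_id`, and `history` come from the table above; the shape of `history` entries and of the response JSON is not documented here, so the helper names and the way the reply is returned are assumptions.

```python
import json
import urllib.request


def build_chat_request(api_base, message, conversation_id=None, history=None):
    """Build (but do not send) a POST /chat request with a JSON body."""
    payload = {"message": message}
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    if history:
        payload["history"] = history  # entry shape is an assumption
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=9.0):
    """Send the request and return the parsed JSON reply as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a local backend this would be used as `send(build_chat_request("http://localhost:8000", "what's my latest project?"), timeout=60)`; the endpoint is rate-limited, so scripts should expect occasional rejections.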

Answering is hybrid: common intents (greeting, projects, contact, resume, hiring) are answered instantly by client rules; substantive questions go to the backend LLM (Groq → Ollama); if the backend is asleep the frontend answers from the committed index in-browser. So the bot always responds.

🔄 Refreshing the knowledge index

The deployed bot answers from the committed backend/prebuilt_index.json (copied into the static site). To refresh it after adding repos/content, don't rely on a cloud reindex (the free-tier backend is RAM-limited and its disk is ephemeral). Instead:

  • Best (GitHub Action): Actions tab → Reindex knowledge → Run workflow (or the dashboard's 🚀 Rebuild & publish button). It reindexes on GitHub's runners, commits the index, and the frontend redeploys. Add a GH_PAT repo secret (public-repo read) for a higher API rate limit.
  • Local: rebuild with EMBED_BACKEND=hash and push (see AGENTS.md).

☁️ Deploy to Render

flowchart LR
    Repo["GitHub repo"] --> BP["Render Blueprint<br/>render.yaml"]
    BP --> API["algsoch-api<br/>FastAPI + persistent disk"]
    BP --> WEB["algsoch-web<br/>static React (COOP/COEP)"]
    GH["GitHub webhook"] --> API
    API -. installs .-> CORAL["Coral binary"]
Option A — Blueprint (one click, recommended)
  1. Push to GitHub → Render New → Blueprint → select render.yaml.
  2. Set secrets on algsoch-api: GROQ_API_KEY, GITHUB_TOKEN, WEBHOOK_SECRET, HOSTINGER_SMTP_PASS, GMAIL_APP_PASSWORD.
  3. Add a GitHub webhook → https://algsoch-api.onrender.com/webhook/github (JSON, secret = WEBHOOK_SECRET, events: Repositories + Pushes).
  4. Open https://algsoch-web.onrender.com.
Option B — Manual (no Blueprint, two services by hand)

Create the two services yourself in the Render dashboard (no render.yaml needed):

1. Backend — algsoch-api (Web Service)

  • New → Web Service → connect this repo → Root Directory: backend
  • Runtime: Python · Build: ./build.sh · Start: uvicorn app.main:app --host 0.0.0.0 --port $PORT
  • Add a Disk (for the vector store): mount at backend/data, ~1 GB
  • Environment: set ENVIRONMENT=production, GROQ_API_KEY, GITHUB_TOKEN, WEBHOOK_SECRET, GITHUB_USERS=fiscalmindset,algsoch, HOSTINGER_SMTP_PASS, GMAIL_APP_PASSWORD, and CORS_ORIGINS=https://<your-web-name>.onrender.com

2. Frontend — algsoch-web (Static Site)

  • New → Static Site → same repo → Root Directory: frontend
  • Build: npm install && npm run build · Publish Directory: dist
  • Environment: VITE_API_BASE=https://<your-api-name>.onrender.com
  • Rewrite rule: /* → /index.html (SPA)
  • Custom Headers (for on-device RunAnywhere WASM): Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp

3. Add the GitHub webhook → https://<your-api-name>.onrender.com/webhook/github (JSON, secret = WEBHOOK_SECRET, events: Repositories + Pushes), then open the web URL.

backend/build.sh installs Coral automatically; the GitHub REST fallback keeps everything working if it can't.
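
Both deploy options configure the GitHub webhook with secret = WEBHOOK_SECRET. GitHub signs each delivery with an HMAC-SHA256 of the raw request body and sends it in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. How this backend checks that header is not shown in this README, so the following is a generic sketch with hypothetical function names.

```python
import hashlib
import hmac


def sign(secret, body):
    """Compute the signature GitHub would send for this raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_signature(secret, body, signature_header):
    """Check the X-Hub-Signature-256 header value in constant time."""
    if not signature_header:
        return False
    expected = sign(secret, body).encode("utf-8")
    return hmac.compare_digest(expected, signature_header.encode("utf-8"))
```

The check must run on the raw bytes of the request body, before any JSON parsing, because re-serialized JSON will generally not match the bytes GitHub signed.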


🔑 Environment (all optional — graceful degradation)

| Group | Keys |
| --- | --- |
| LLM | GROQ_API_KEY, GROQ_MODEL, OLLAMA_MODEL=auto, OLLAMA_EMBED_MODEL |
| GitHub | GITHUB_TOKEN, GITHUB_USERS=fiscalmindset,algsoch, WEBHOOK_SECRET |
| Training depth | INDEX_CODE_FILES, MAX_FILES_PER_REPO, MAX_FILE_BYTES, INCLUDE_FORKS |
| Email | HOSTINGER_SMTP_PASS, GMAIL_APP_PASSWORD, NOTIFY_EMAIL* |
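
Graceful degradation of this kind is often implemented by checking which keys are present at startup and reporting the rest as degraded, similar in spirit to the capabilities and degraded lists that /health returns. The sketch below is an assumption about the approach, not the backend's code: the capability names and the `detect_capabilities` function are hypothetical, and only the environment variable names come from the table above.

```python
import os


def detect_capabilities(env=None):
    """Split integrations into enabled and degraded based on which keys are set."""
    env = os.environ if env is None else env

    def has(key):
        # A key counts as set only if it holds a non-blank value.
        return bool(env.get(key, "").strip())

    checks = {
        "groq": has("GROQ_API_KEY"),
        "github_token": has("GITHUB_TOKEN"),
        "webhook": has("WEBHOOK_SECRET"),
        # Email works if either the primary or the backup SMTP password is set.
        "email": has("HOSTINGER_SMTP_PASS") or has("GMAIL_APP_PASSWORD"),
    }
    return {
        "enabled": sorted(name for name, ok in checks.items() if ok),
        "degraded": sorted(name for name, ok in checks.items() if not ok),
    }
```

With no variables set, every capability lands in `degraded` and nothing raises, which matches the stated goal of running the whole product before wiring any keys.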

📊 Creator — Vicky Kumar

Follow FiscalMindset Follow algsoch LinkedIn

Account Repos Stars PRs merged Contributions
@FiscalMindset 20 6 22
@algsoch 107+ 24+ 28 350+

🏆 Pull Shark (22+ PRs) · YOLO · Quickdraw (<5 min merge) · Coral Hackathon Track 2 — Top 50


Built by Vicky Kumar · FiscalMindset / algsoch

LinkedIn · Medium · YouTube · Portfolio · Email

MIT License · ⭐ star the repo if it helped · see CONTRIBUTING

Built with React · FastAPI · Groq · RunAnywhere · Coral · Render
