- 🔬 KT_GPT (Flagship LLM Research): Architecting a 992.68M-parameter sparse Mixture-of-Experts (MoE) model with 141.12M active parameters per token. Features low-rank Multi-head Latent Attention (MLA), SwiGLU experts, and online bias-based load balancing. Deployed on Modal and aligned with a custom 94K-sample SFT dataset for grounded RAG answers and no-info refusal.
- 💻 CodeShiftAI: VS Code + FastAPI + Agno agent for real-time, context-aware coding, with WebSocket streaming, vector-based context retrieval, and incremental parsing.
- 🎥 Professor Peter: AI video educator featuring FFmpeg-based lip-sync and ElevenLabs TTS narration (hackathon-winning project).
- 🎬 ForgeTube: Script-to-video automation pipeline using Stable Diffusion for scene generation, Kokoro TTS for narration, and FFmpeg for automated MP4 rendering.
PyTorch · Transformers · MLA (Multi-head Latent Attention) · Mixture-of-Experts (MoE) · SwiGLU · LoRA Fine-Tuning · GRPO Reinforcement Learning · ChromaDB · Tokenizers
FastAPI · Agno Agentic Framework · WebSockets · Python · AsyncIO · REST APIs · PostgreSQL · SQLite
FFmpeg · Stable Diffusion (Prompt Generation) · ElevenLabs TTS · Kokoro TTS · Subprocess sandboxing · Media stitching
Modal Serverless · Docker · Weights & Biases (W&B) · UV Package Manager · Git/GitHub · VS Code · PyCharm
- 🎓 LLM Research Breakthrough: Implemented and verified DeepSeek-style MLA (an 8.8x KV cache reduction) and online bias-based load balancing in our 992.68M-parameter KT_GPT model.
- 💰 Infrastructure Grant: Secured a $500 Modal research grant + monthly credits to run scalable pretraining and reinforcement learning (GRPO).
- 🥇 Hackathon Champion: Took 1st place with Professor Peter (FastAPI-driven FFmpeg lip-sync + ElevenLabs TTS).
- 🎙️ Community Leadership: Conducted hands-on ML workshops on the Agno agent framework and hosted interactive tech events.
🚀 KT_GPT — Sparse Mixture-of-Experts LLM (PyTorch • MLA • SwiGLU • Modal)
- Compute & Capacity Decoupled: 992.68M total parameters with only 141.12M active parameters per token via 37 Routed Experts + 1 Shared Expert.
- Latent KV Cache Compression: Cuts the KV cache from 1,408 to 160 values per token per layer (an 8.8x reduction), which lowers the memory needed for long-context inference on edge GPUs.
- Robust Grounding Alignment: Fine-tuned on a custom 94K-sample dataset to refuse when the retrieved context lacks an answer ("I don't have that information") and to handle math by parsing external tool results.
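
The figures above can be checked with a few lines of arithmetic, and the bias-based load balancing idea can be sketched in plain Python. This is a minimal illustration and not the KT_GPT training code: the function names `route_top_k` and `update_bias`, the top-k value, and the bias step size are all assumptions made for the example. The sketch follows the general auxiliary-loss-free recipe, where a per-expert bias shifts which experts are selected and is nudged after each batch based on observed load.

```python
# Figures quoted in the bullets above.
TOTAL_PARAMS_M = 992.68    # total parameters, in millions
ACTIVE_PARAMS_M = 141.12   # active parameters per token, in millions
KV_VALUES_BEFORE = 1408    # cached values per token per layer without MLA
KV_VALUES_AFTER = 160      # cached values per token per layer with MLA

kv_reduction = KV_VALUES_BEFORE / KV_VALUES_AFTER
active_fraction = ACTIVE_PARAMS_M / TOTAL_PARAMS_M


def route_top_k(scores, bias, k):
    """Pick k experts by biased score (hypothetical helper).

    The bias only influences which experts are selected; in this style of
    routing the gate weights would still come from the unbiased scores.
    """
    ranked = sorted(
        range(len(scores)),
        key=lambda expert: scores[expert] + bias[expert],
        reverse=True,
    )
    return ranked[:k]


def update_bias(bias, load, step):
    """Nudge each expert's bias toward balanced load (hypothetical helper).

    Experts that received more tokens than average are pushed down by `step`,
    experts that received fewer are pushed up, and balanced ones are unchanged.
    """
    mean_load = sum(load) / len(load)
    updated = []
    for b, n in zip(bias, load):
        if n > mean_load:
            updated.append(b - step)
        elif n < mean_load:
            updated.append(b + step)
        else:
            updated.append(b)
    return updated


if __name__ == "__main__":
    print(f"KV cache reduction: {kv_reduction:.1f}x")     # 8.8x
    print(f"Active parameter share: {active_fraction:.1%}")  # 14.2%
    # Toy example with 3 experts: expert 0 is overloaded, so its bias drops.
    print(update_bias([0.0, 0.0, 0.0], load=[10, 0, 2], step=0.1))
```

Because the bias is adjusted online from token counts, this approach needs no auxiliary balancing loss term; the step size controls how quickly routing reacts to imbalance.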
💻 CodeShiftAI — Realtime AI coding assistant (VS Code • FastAPI • Agno)
- Instant UX: WebSocket-streamed suggestions with incremental parsing for low latency.
- Deep Codebase Awareness: Vector-based context retrieval for multi-file knowledge and smarter completions.
- Agentic Routing: Multi-agent reasoning via Agno for task routing and complex code transformations.
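
As a rough illustration of the vector-based context retrieval described above, the sketch below ranks files by cosine similarity between a query embedding and per-file embeddings. It is an assumption-heavy toy and not the CodeShiftAI implementation: the `retrieve` helper, the file names, and the 3-dimensional vectors are hypothetical, and a real system would typically use an embedding model and a vector store instead of an in-memory dictionary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k file paths whose embeddings are closest to the query.

    `index` maps a file path to its embedding (hypothetical toy structure).
    """
    scored = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [path for path, _ in scored[:k]]


# Toy embeddings standing in for real model output.
toy_index = {
    "auth.py": [1.0, 0.0, 0.0],
    "db.py": [0.0, 1.0, 0.0],
    "routes.py": [0.7, 0.7, 0.0],
}

if __name__ == "__main__":
    # A query that mostly resembles auth.py, with a little overlap with db.py.
    print(retrieve([1.0, 0.2, 0.0], toy_index, k=2))
```

The retrieved paths would then be read and passed to the model as extra context, which is what enables multi-file completions.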
🎥 Professor Peter — AI video educator (FastAPI • FFmpeg • ElevenLabs)
- Real-Time Lip Sync: Text-to-video generation engine utilizing FFmpeg filters for audio-to-mouth alignment.
- High Fidelity Narration: Integrated ElevenLabs TTS API to render natural, high-fidelity lectures.
- Clean Orchestration: Designed simple REST APIs for queuing content-generation jobs.
🎬 ForgeTube — Script‑to‑video pipeline (SD prompts • FFmpeg • Kokoro TTS)
- LLM Script Segmentation: Translates raw scripts into individual scenes with custom-generated Stable Diffusion image prompts.
- Dynamic Media Stitching: Automated video stitching via FFmpeg with precise timestamp alignment for Kokoro TTS audio.
- Automated Publishing: Generates ready-to-upload YouTube MP4 files complete with background tracks.
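
One way to picture the timestamp alignment step above is to derive each scene's on-screen duration from the length of its narration audio, then hand FFmpeg a concat list. The sketch below only builds the list text and the command arguments, and it does not run FFmpeg. It is not the ForgeTube code: the `Scene` class, the helper names, and the file paths are hypothetical, and the encoder settings are one plausible choice among many.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    image_path: str       # generated image for the scene (hypothetical path)
    audio_seconds: float  # length of the scene's TTS narration


def scene_start_times(scenes):
    """Cumulative start time of each scene, in seconds."""
    starts, clock = [], 0.0
    for scene in scenes:
        starts.append(clock)
        clock += scene.audio_seconds
    return starts


def concat_list(scenes):
    """Build the text of an FFmpeg concat-demuxer list file.

    Each image is shown for the length of its narration. The last image is
    listed again because the concat demuxer otherwise tends to ignore the
    final duration entry.
    """
    lines = []
    for scene in scenes:
        lines.append(f"file '{scene.image_path}'")
        lines.append(f"duration {scene.audio_seconds}")
    if scenes:
        lines.append(f"file '{scenes[-1].image_path}'")
    return "\n".join(lines) + "\n"


def ffmpeg_command(list_path, audio_path, out_path):
    """Argument list for stitching the images and narration into an MP4."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_path,
        "-i", audio_path,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-shortest",
        out_path,
    ]


demo_scenes = [Scene("scene1.png", 3.5), Scene("scene2.png", 2.0), Scene("scene3.png", 4.25)]

if __name__ == "__main__":
    print(scene_start_times(demo_scenes))
    print(concat_list(demo_scenes), end="")
    print(" ".join(ffmpeg_command("scenes.txt", "narration.wav", "out.mp4")))
```

The argument list could be passed to `subprocess.run` once the list file is written to disk. Keeping command construction separate from execution makes the timing logic easy to test without FFmpeg installed.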
- Email: kartikeyatrivedi@outlook.com
- LinkedIn: linkedin.com/in/kartikeyatrivedi
- GitHub: github.com/Kartikeya-trivedi
- Website: kartikeya.me


