AI Engineer with 2.5+ years building end-to-end production AI systems — from GPU-accelerated LLM inference and RAG pipelines to GAN-based hallucination mitigation and multi-agent automation.
I work across the full hardware-to-software AI stack. I was selected globally for Google Summer of Code 2026 at Intel/OpenVINO to build a 5-agent privacy-preserving desktop automation system, and I have contributed 12 verified, merged PRs to Intel's production inference toolchain, the Linux Foundation's canonical RISC-V ISA specification, Harvard/MIT's HNN-Core, and MERL Lab's AI4Org.
| 🏆 | Achievement |
|---|---|
| 🌐 | Google Summer of Code 2026 — Selected Contributor, Intel / OpenVINO Toolkit |
| 🐧 | LFX Mentorship 2025 — Linux Foundation / RISC-V International (merit-based) |
| 🔀 | 12 Verified Merged PRs — Intel · Linux Foundation · Harvard/MIT · MERL Lab |
| ⚡ | Sub-600ms end-to-end VLM+LLM inference on consumer hardware |
| 🎓 | IBM AI Engineer Professional Certificate (6 courses, Coursera 2025) |
Google Summer of Code 2026 · Intel / OpenVINO Toolkit · May 2026 – Sep 2026 · Google-Sponsored · Remote
Selected globally to build a privacy-preserving GUI Desktop Automation Agent — fully local, zero cloud dependency.
5-Agent Pipeline (A2A Protocol + MCP Server): Router → Planning (DeepSeek-R1-Qwen-7B-INT4) → UI Grounding (Phi-3.5-Vision, INT4 via OVMS) → Action Execution (native OS control via MCP) → Reflection (self-correction loop)
- Sub-600ms step latency · INT4 quantization · KV caching · prefix caching · WEIGHTLESS optimization
- Mentored directly by Intel engineers Ethan Yang and Zhuo Wu
- Entire pipeline runs on-device — enterprise-grade privacy by design
Apr 2026 – Present · Walnut Creek, CA · Remote
- Builds production LangChain RAG pipelines with FAISS for enterprise knowledge management, improving LLM answer accuracy and reducing hallucinations
- Owns the full data engineering stack: ingestion → semantic chunking → embedding → FAISS indexing → retrieval
- Develops agentic AI workflows for autonomous multi-step task execution
- Runs model monitoring loops that track retrieval quality, relevance, and latency
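The ingestion → chunking → embedding → indexing → retrieval flow can be illustrated with a minimal, dependency-free sketch. It is not the production pipeline: packing whole sentences into chunks stands in for semantic chunking, a toy sparse bag-of-words vector stands in for a neural embedding model, and brute-force inner-product search stands in for a FAISS index (conceptually close to FAISS's exact `IndexFlatIP`). The document names and texts are made up.

```python
import math
import re
from collections import Counter

def chunk(text: str, max_words: int = 40) -> list[str]:
    """Greedily pack whole sentences into chunks of at most ~max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

def embed(text: str) -> dict[str, float]:
    """Toy sparse bag-of-words embedding, L2-normalized (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}

class FlatIndex:
    """Exact, brute-force inner-product search over normalized vectors."""

    def __init__(self) -> None:
        self.vectors: list[dict[str, float]] = []
        self.payloads: list[tuple[str, str]] = []

    def add(self, vector: dict[str, float], payload: tuple[str, str]) -> None:
        self.vectors.append(vector)
        self.payloads.append(payload)

    def search(self, query: dict[str, float], k: int = 3) -> list[tuple[tuple[str, str], float]]:
        # Cosine similarity equals the dot product because every vector is unit-length.
        scores = [sum(w * vec.get(tok, 0.0) for tok, w in query.items()) for vec in self.vectors]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(self.payloads[i], scores[i]) for i in order[:k]]

# Hypothetical knowledge-base documents.
docs = {
    "vpn.md": "Connect to the VPN before opening the wiki. The VPN client is in the software portal.",
    "leave.md": "Vacation requests go through the HR portal. Managers approve leave within two days.",
}
index = FlatIndex()
for name, text in docs.items():              # ingestion
    for piece in chunk(text):                # chunking
        index.add(embed(piece), (name, piece))  # embedding + indexing

hits = index.search(embed("How do I request vacation leave?"), k=1)  # retrieval
print(hits[0][0][0])
```

In a real pipeline the retrieved chunk text would be placed into the LLM prompt as grounding context; swapping in a learned embedding model and a FAISS index changes the components but not the shape of the loop.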
Jan 2026 – Apr 2026 · Remote
- Built Verimate, an AI platform that automates UVM verification-plan and testbench generation for RTL chip designs in the semiconductor sector
- Stack: LangChain, FAISS, Gemini API, and GPU-accelerated local LLMs
- Reduced hardware verification engineering cycle time
Dec 2025 – Mar 2026 · Santa Clara, CA · Remote
- Implemented transpose-aware LoRA correction so that MatMul layers with transposed activations are handled correctly in INT4/INT8-quantized LLMs
- Migrated LLM compression examples to OpenVINO's production-grade stateful inference flow, which removes manual KV-cache handling
- 3 merged PRs, including NNCF #3814 and NNCF #3864
LFX Mentorship 2025 · Linux Foundation / RISC-V International · Mar 2025 – Jun 2025 · Remote · Mentors from Qualcomm, Ventana, and Synopsys
- 7 merged PRs to riscv-unified-db, the canonical machine-readable RISC-V ISA specification
- Implemented ISA extensions Zilsd (load/store pair for RV32), Zclsd (compressed load/store pair), and Zcmop (compressed may-be-operations)
- Contributed Ruby tooling, GitHub Actions CI/CD automation, and schema and IDL fixes
Jul 2023 – Dec 2025 · Karachi, Pakistan
- Led AI4Org, a hallucination-mitigation framework combining a multi-discriminator GAN with REINFORCE RL; fine-tuned TinyLlama, GPT-2, LLaMA2, and Mistral 7B with PEFT/LoRA on dual AMD RX 7900 XTX GPUs
- Led ArcheV, the first LLM benchmark suite for RISC-V RV32I assembly, evaluating functional correctness, syntactic validity, and ISA edge cases
- Built Vermithor, a 5-stage pipelined RV32I processor in Chisel (Scala) with hazard detection and data forwarding, verified with Verilator
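Vermithor itself is written in Chisel; the Python sketch below only models the standard textbook forwarding and load-use hazard rules of a 5-stage RV32I pipeline, to show what that logic decides each cycle. The `PipeReg` fields and function names are hypothetical and are not taken from the Vermithor source.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """Hypothetical view of the pipeline-register fields the hazard logic reads."""
    rd: int = 0             # destination register index
    rs1: int = 0            # source register 1 index
    rs2: int = 0            # source register 2 index
    reg_write: bool = False
    mem_read: bool = False  # True for loads

def forward_select(rs: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Pick where the EX-stage ALU operand for source register `rs` comes from."""
    # x0 is hard-wired to zero in RV32I, so it is never forwarded.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == rs:
        return "EX/MEM"  # the most recent result takes priority
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == rs:
        return "MEM/WB"
    return "REGFILE"

def load_use_stall(id_ex: PipeReg, if_id: PipeReg) -> bool:
    """Stall one cycle if the instruction in ID reads a register a load in EX will write.

    Conservative: checks both source fields even for formats that do not use rs2.
    """
    return id_ex.mem_read and id_ex.rd != 0 and id_ex.rd in (if_id.rs1, if_id.rs2)

# add x5, x1, x2 sits in MEM while sub x6, x5, x3 is in EX: forward from EX/MEM.
print(forward_select(5, PipeReg(rd=5, reg_write=True), PipeReg()))
# lw x7, 0(x1) in EX followed by add x8, x7, x9 in ID: a one-cycle stall is needed.
print(load_use_stall(PipeReg(rd=7, reg_write=True, mem_read=True), PipeReg(rs1=7, rs2=9)))
```

Forwarding removes most ALU-to-ALU stalls; the load-use case still needs a bubble because the loaded value only exists after the MEM stage.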
| Organization | Repository | PRs | Highlights |
|---|---|---|---|
| 🔵 Intel | openvinotoolkit/nncf + openvino | 3 | Stateful LLM compression · transpose-aware LoRA · PyTorch frontend |
| 🐧 Linux Foundation | riscv-unified-db | 7 | ISA extensions · Ruby tooling · CI/CD automation |
| 🔴 Harvard / MIT | hnn-core | 1 | Documentation (#1001) |
| 🟡 MERL Lab | ai4org | 1 | Hallucination-reduction pipeline |
🤖 GSoC '26 — Privacy-Preserving Desktop Automation Agent (Intel / OpenVINO)
Fully local multi-agent desktop automation: no data leaves the device, giving enterprise-grade privacy by design.
| Agent | Model | Role |
|---|---|---|
| Router | Rule-based | Intent classification & dispatch |
| Planning | DeepSeek-R1-Qwen-7B-INT4 | Multi-step reasoning |
| UI Grounding | Phi-3.5-Vision-INT4 via OVMS | Screen understanding |
| Action Execution | MCP Server | Native OS/app control |
| Reflection | LLM judge | Quality evaluation & retry |
Stack: OpenVINO · OVMS · A2A Protocol · MCP · PyQt · INT4 Quantization · KV Caching · Python
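The Router → Planning → UI Grounding → Action Execution → Reflection flow amounts to a control loop with per-step retries. This is a hypothetical, model-free sketch: each agent is a plain Python function, and the A2A messaging, OVMS model calls, and MCP tool invocations are deliberately left out. `run_pipeline`, the keyword routing rules, and `max_retries` are illustrative assumptions, not the project's interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class StepResult:
    step: str
    ok: bool

@dataclass
class Trace:
    attempts: int = 0
    log: list[str] = field(default_factory=list)

def route(request: str) -> str:
    """Rule-based intent classification (hypothetical keyword rules)."""
    text = request.lower()
    if any(word in text for word in ("open", "click", "type")):
        return "gui_task"
    return "chat"

def run_pipeline(
    request: str,
    plan: Callable[[str], list[str]],
    ground: Callable[[str], dict],
    execute: Callable[[str, dict], StepResult],
    reflect: Callable[[str, StepResult], bool],
    max_retries: int = 2,
) -> tuple[bool, Trace]:
    trace = Trace()
    # Router: only GUI tasks enter the agent pipeline.
    if route(request) != "gui_task":
        trace.log.append("router: not a GUI task")
        return False, trace
    # Planning: break the request into ordered UI steps.
    for step in plan(request):
        for _ in range(max_retries + 1):
            trace.attempts += 1
            target = ground(step)           # UI grounding: locate the element
            result = execute(step, target)  # action execution
            if reflect(step, result):       # reflection: accept or retry
                trace.log.append(f"ok: {step}")
                break
            trace.log.append(f"retry: {step}")
        else:
            # Every attempt for this step was rejected by reflection.
            return False, trace
    return True, trace

# Hypothetical stub agents standing in for the real models and MCP tools.
failures_left = {"type hello": 1}  # make one step fail once

def plan(request: str) -> list[str]:
    return ["open text editor", "type hello"]

def ground(step: str) -> dict:
    return {"x": 10, "y": 20}  # pretend screen coordinates

def execute(step: str, target: dict) -> StepResult:
    if failures_left.get(step, 0) > 0:
        failures_left[step] -= 1
        return StepResult(step, ok=False)
    return StepResult(step, ok=True)

def reflect(step: str, result: StepResult) -> bool:
    return result.ok

ok, trace = run_pipeline("Open the editor and type hello", plan, ground, execute, reflect)
print(ok, trace.attempts, trace.log)
```

Keeping the reflection check inside the per-step loop means a failed action is retried locally instead of replanning the whole task; a real system might escalate to replanning after `max_retries`.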
🔬 AI4Org — GAN-based Hallucination Mitigation for Private LLMs
Privacy-first framework reducing LLM hallucinations entirely on-premise — no external APIs.
- Multi-discriminator GAN pipeline + REINFORCE RL optimization loop
- FAISS semantic retrieval for grounded generation
- Models: TinyLlama · GPT-2 · LLaMA2 · Mistral 7B via PEFT/LoRA/SFT
- Hardware: RTX 2060 + dual AMD RX 7900 XTX, with gradient checkpointing to reduce training memory, plus CI/CD
Stack: PyTorch · Transformers · PEFT · LoRA · FAISS · CUDA · GANs · REINFORCE RL
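REINFORCE is what lets a non-differentiable signal, such as a discriminator's judgment of a sampled answer, steer a generator. The toy below shows only the update rule on a 3-way categorical "policy": the `rewards` list is a hypothetical stand-in for discriminator scores, and no LLM, GAN, or LoRA is involved. It illustrates the technique, not AI4Org's training loop.

```python
import math
import random

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(rewards: list[float], steps: int = 500, lr: float = 0.5, seed: int = 0) -> list[float]:
    """Train a categorical policy with REINFORCE; returns final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]  # sample a "response"
        reward = rewards[action]  # hypothetical discriminator score for that response
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. the logits: one_hot(action) - probs.
        for j in range(len(logits)):
            grad_log_pi = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += lr * advantage * grad_log_pi  # gradient ascent on expected reward
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)

# Three candidate responses; the middle one is scored as most grounded.
probs = reinforce([0.1, 0.9, 0.2])
print([round(p, 3) for p in probs])
```

With an LLM the same rule applies to token log-probabilities of a sampled sequence, with the discriminator's score as the reward.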
⏱️ sktime-MCP — MCP Protocol Layer for Time-Series ML
An MCP server that exposes sktime's full forecasting API so LLM agents can invoke production ML pipelines as structured tool calls, bridging classical ML and modern agentic architectures.
Stack: Python · MCP Protocol · sktime · Scikit-learn
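Conceptually, the server turns each forecasting capability into a named tool that an agent calls with structured JSON arguments. The sketch below mimics that request/response shape in plain Python, loosely following MCP's JSON-RPC `tools/call` message; it is not a spec-complete MCP server and does not import sktime. `naive_forecast` is a hypothetical pure-Python stand-in for a forecaster such as sktime's `NaiveForecaster`, and the tool name and argument names are assumptions.

```python
import json

def naive_forecast(y: list[float], horizon: int, strategy: str = "last") -> list[float]:
    """Hypothetical pure-Python stand-in for a naive forecaster."""
    if not y or horizon < 1:
        raise ValueError("need a non-empty series and horizon >= 1")
    if strategy == "last":
        return [float(y[-1])] * horizon
    if strategy == "mean":
        return [sum(y) / len(y)] * horizon
    if strategy == "drift":
        slope = (y[-1] - y[0]) / (len(y) - 1) if len(y) > 1 else 0.0
        return [y[-1] + slope * h for h in range(1, horizon + 1)]
    raise ValueError(f"unknown strategy: {strategy}")

# Tool registry; the schema is what a client would be shown, not enforced here.
TOOLS = {
    "forecast": {
        "description": "Forecast a univariate series `horizon` steps ahead.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "y": {"type": "array", "items": {"type": "number"}},
                "horizon": {"type": "integer", "minimum": 1},
                "strategy": {"type": "string", "enum": ["last", "mean", "drift"]},
            },
            "required": ["y", "horizon"],
        },
        "handler": lambda args: naive_forecast(args["y"], args["horizon"], args.get("strategy", "last")),
    },
}

def handle_request(raw: str) -> str:
    """Dispatch a JSON-RPC-style `tools/call` request to a registered tool."""
    req = json.loads(raw)
    params = req.get("params", {})
    try:
        if req.get("method") != "tools/call":
            raise ValueError(f"unsupported method: {req.get('method')}")
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            raise ValueError(f"unknown tool: {params.get('name')}")
        value = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": json.dumps(value)}], "isError": False}
    except (KeyError, ValueError) as exc:  # missing arguments or bad values
        result = {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})

request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "forecast",
               "arguments": {"y": [10, 12, 14, 16], "horizon": 2, "strategy": "drift"}},
})
response = json.loads(handle_request(request))
print(response["result"]["content"][0]["text"])  # [18.0, 20.0]
```

Returning errors as a tool result rather than raising keeps the agent in the loop: it can read the message and retry with corrected arguments.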
📊 ArcheV — First LLM Benchmark Suite for RISC-V RV32I
The first standardized evaluation framework for LLM-generated RISC-V assembly code — functional correctness · syntactic validity · ISA edge-case coverage · reproducible JSON output.
Stack: Python · Verilog · llama.cpp · JSON
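A benchmark of this kind ultimately executes a model's assembly and compares the resulting register state with a reference. The toy harness below illustrates that idea on a tiny RV32I subset (`addi`, `add`, `sub`, `xor`), treating 32-bit wraparound, the hard-wired-zero `x0` register, and the 12-bit signed `addi` immediate range as edge cases, and it emits JSON records. It is a hypothetical sketch, not ArcheV's actual harness; the record fields and task names are made up.

```python
import json
import re

MASK = 0xFFFFFFFF  # RV32I registers are 32 bits wide
REG = re.compile(r"^x([0-9]|[12][0-9]|3[01])$")  # x0 .. x31
R_TYPE = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "xor": lambda a, b: a ^ b,
}

class AsmError(ValueError):
    """Raised for instructions this toy checker cannot accept."""

def reg(token: str) -> int:
    match = REG.match(token)
    if not match:
        raise AsmError(f"bad register: {token!r}")
    return int(match.group(1))

def run(asm: str) -> list[int]:
    """Execute a tiny RV32I subset and return the final register file."""
    regs = [0] * 32
    for lineno, raw in enumerate(asm.strip().splitlines(), 1):
        line = raw.split("#")[0].strip()  # drop comments
        if not line:
            continue
        op, _, rest = line.partition(" ")
        args = [a.strip() for a in rest.split(",")]
        if len(args) != 3:
            raise AsmError(f"line {lineno}: expected 3 operands: {line!r}")
        if op == "addi":
            imm = int(args[2], 0)
            if not -2048 <= imm <= 2047:  # 12-bit signed immediate
                raise AsmError(f"line {lineno}: addi immediate out of range")
            value = regs[reg(args[1])] + imm
        elif op in R_TYPE:
            value = R_TYPE[op](regs[reg(args[1])], regs[reg(args[2])])
        else:
            raise AsmError(f"line {lineno}: unsupported op {op!r}")
        rd = reg(args[0])
        if rd != 0:  # writes to x0 are discarded
            regs[rd] = value & MASK
    return regs

def evaluate(task: str, asm: str, expected: dict[int, int]) -> dict:
    """Score one generated snippet; `expected` maps register index to value."""
    record = {"task": task, "syntactic_valid": False, "functionally_correct": False}
    try:
        regs = run(asm)
    except ValueError as exc:  # includes AsmError and malformed integers
        record["error"] = str(exc)
        return record
    record["syntactic_valid"] = True
    record["functionally_correct"] = all(regs[r] == (v & MASK) for r, v in expected.items())
    return record

# Hypothetical LLM outputs paired with expected register values.
candidates = {
    "neg_one": ("addi x5, x0, -1", {5: -1}),
    "diff": ("addi x1, x0, 7\naddi x2, x0, 5\nsub x3, x1, x2", {3: 2}),
    "imm_overflow": ("addi x1, x0, 4096", {1: 4096}),
    "x0_write": ("addi x0, x0, 9\nadd x6, x0, x0", {6: 0}),
}
results = [evaluate(name, asm, exp) for name, (asm, exp) in candidates.items()]
print(json.dumps(results, indent=2))
```

Separating "does it assemble" from "does it compute the right state" mirrors the syntactic-validity and functional-correctness split described above.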
Skill areas: AI / LLM · GPU & ML · Cloud & MLOps · Languages
BSc Software Engineering — UIT University, Karachi · Feb 2022 – Feb 2026
IBM AI Engineer Professional Certificate (Coursera / IBM, 2025)
Generative AI Applications with RAG and LangChain · Fundamentals of AI Agents Using RAG and LangChain · Generative AI Advanced Fine-Tuning for LLMs · Generative AI Engineering and Fine-Tuning Transformers · Deep Learning with PyTorch · AI Capstone Project with Deep Learning
Open to: Production AI · LLM Infrastructure · GPU Optimization · Agentic Systems · Research · Open Source