Shehroz Kashif Shehrozkashif

👋 About Me

AI Engineer with 2.5+ years building end-to-end production AI systems — from GPU-accelerated LLM inference and RAG pipelines to GAN-based hallucination mitigation and multi-agent automation.

I work across the full hardware-to-software AI stack. I was selected globally for Google Summer of Code 2026 at Intel/OpenVINO to build a 5-agent privacy-preserving desktop automation system, and I have contributed 12 verified merged PRs across Intel's production inference toolchain, the Linux Foundation's canonical RISC-V ISA specification, Harvard/MIT's HNN-Core, and MERL Lab's AI4Org.

🏅 Recognition at a Glance

🏆	Achievement
🌐	Google Summer of Code 2026 — Selected Contributor, Intel / OpenVINO Toolkit
🐧	LFX Mentorship 2025 — Linux Foundation / RISC-V International (merit-based)
🔀	12 Verified Merged PRs — Intel · Linux Foundation · Harvard/MIT · MERL Lab
⚡	Sub-600ms end-to-end VLM+LLM inference on consumer hardware
🎓	IBM AI Engineer Professional Certificate (6 courses, Coursera 2025)

💼 Experience

🟣 Google Summer of Code 2026 — AI Engineer @ Intel (OpenVINO)

May 2026 – Sep 2026 · Google-Sponsored · Remote

Selected globally to build a privacy-preserving GUI Desktop Automation Agent — fully local, zero cloud dependency.

5-Agent Pipeline (A2A Protocol + MCP Server)
─────────────────────────────────────────────────────────────────
Router → Planning     → UI Grounding   → Action Execution → Reflection
         DeepSeek-R1    Phi-3.5-Vision   Native OS          Self-Correction
         Qwen-7B-INT4   INT4 via OVMS    via MCP            Loop

Sub-600ms step latency · INT4 quantization · KV caching · prefix caching · WEIGHTLESS optimization
Mentored directly by Intel engineers Ethan Yang and Zhuo Wu
Entire pipeline runs on-device — enterprise-grade privacy by design

🔵 AI Engineer — Skoop

Apr 2026 – Present · Walnut Creek, CA · Remote

Production LangChain RAG pipelines with FAISS for enterprise knowledge management — measurable gains in LLM accuracy and hallucination reduction
Full data engineering stack: ingestion → semantic chunking → embedding → FAISS indexing → retrieval
Agentic AI workflows for autonomous multi-step task execution
Owns model monitoring loops — retrieval quality, relevance, and latency metrics
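
The ingestion → chunking → embedding → indexing → retrieval flow listed above can be sketched in a few lines. This is a toy illustration using only the Python standard library: the bag-of-words `embed` function stands in for a real embedding model, the fixed-size `chunk` function stands in for semantic chunking, and the brute-force `TinyIndex` stands in for a FAISS index. All names are hypothetical and are not taken from any production codebase.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=8):
    """Split text into fixed-size word windows (stand-in for semantic chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy bag-of-words 'embedding': a sparse term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyIndex:
    """Brute-force vector index (assumption: a real pipeline would use FAISS here)."""

    def __init__(self):
        self.items = []  # list of (chunk_text, vector) pairs

    def add(self, chunks):
        self.items.extend((c, embed(c)) for c in chunks)

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Ingest two hypothetical documents, then retrieve the best match for a query.
docs = [
    "Refunds are processed within five business days.",
    "The VPN client must be updated every quarter.",
]
index = TinyIndex()
for doc in docs:
    index.add(chunk(doc))
top = index.search("how long do refunds take", k=1)
```

In a real deployment the retrieved chunks would be passed to the LLM as grounding context; the monitoring metrics mentioned above (retrieval quality, relevance, latency) would be measured around the `search` call.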

🟠 AI Engineer — TheOvalLabs

Jan 2026 – Apr 2026 · Remote

Built Verimate — AI platform automating UVM verification plan and testbench generation for RTL/chip designs (semiconductor sector)
LangChain + FAISS + Gemini API + GPU-accelerated local LLMs
Delivered a measurable reduction in hardware verification engineering cycle time

🔴 Open Source AI Engineer — Intel (OpenVINO / NNCF)

Dec 2025 – Mar 2026 · Santa Clara, CA · Remote

Transpose-aware LoRA Correction — correct handling of MatMul layers with transposed activations in INT4/INT8 quantized LLMs
Migrated LLM compression examples to OpenVINO stateful inference flow — production-grade, no manual KV-cache handling
3 merged PRs across NNCF and OpenVINO, including NNCF #3814 · NNCF #3864

🟢 Software Engineer — LFX Mentorship @ Linux Foundation · RISC-V International

Mar 2025 – Jun 2025 · Remote · Mentored by Qualcomm, Ventana, Synopsys

7 merged PRs to the canonical machine-readable RISC-V ISA specification
Implemented ISA extensions: Zilsd, Zclsd, Zcmop compressed MOP instructions
Ruby tooling, CI/CD automation with GitHub Actions, schema and IDL fixes

PRs: #521 · #530 · #542 · #577 · #617 · #654 · #923

🟡 Research Assistant — MERL Lab (Micro Electronics Research Lab)

Jul 2023 – Dec 2025 · Karachi, Pakistan

Led AI4Org: multi-discriminator GAN + REINFORCE RL hallucination mitigation — fine-tuned TinyLlama / GPT-2 / LLaMA2 / Mistral 7B with PEFT/LoRA on dual AMD RX 7900 XTX
Led ArcheV: first LLM benchmark suite for RISC-V RV32I assembly code — functional correctness, syntactic validity, ISA edge-case evaluation
Built Vermithor: 5-stage pipelined RV32I processor in Chisel (Scala) — hazard detection, forwarding, Verilator verification

🔀 Open Source — 12 Verified Merged PRs

Organization	Repository	PRs	Highlights
🔵 Intel	openvinotoolkit/nncf + openvino	3	Stateful LLM compression · transpose-aware LoRA · PyTorch frontend
🐧 Linux Foundation	riscv-unified-db	7	ISA extensions · Ruby tooling · CI/CD automation
🔴 Harvard / MIT	hnn-core	1	Documentation (#1001)
🟡 MERL Lab	ai4org	1	Hallucination reduction pipeline

🧠 Key Projects

🤖 GSoC '26 — Privacy-Preserving Desktop Automation Agent (Intel / OpenVINO)

Fully local multi-agent desktop automation — zero data leaves the device, enterprise privacy by design.

Agent	Model	Role
Router	Rule-based	Intent classification & dispatch
Planning	DeepSeek-R1-Qwen-7B-INT4	Multi-step reasoning
UI Grounding	Phi-3.5-Vision-INT4 via OVMS	Screen understanding
Action Execution	MCP Server	Native OS/app control
Reflection	LLM judge	Quality evaluation & retry

Stack: OpenVINO · OVMS · A2A Protocol · MCP · PyQt · INT4 Quantization · KV Caching · Python
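
The Router → Planning → UI Grounding → Action Execution → Reflection flow in the table above can be illustrated with plain Python stubs. This is a minimal sketch, not the project's implementation: every agent is a placeholder function, no OpenVINO, OVMS, A2A, or MCP calls are made, and the retry-on-failed-reflection loop is an assumption about how the reflection stage feeds back. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """State passed between agents for one user request."""
    request: str
    intent: str = ""
    plan: list = field(default_factory=list)
    targets: list = field(default_factory=list)
    results: list = field(default_factory=list)
    attempts: int = 0


def router(step):
    # Rule-based intent classification (stub rule: keyword match).
    step.intent = "open_app" if "open" in step.request.lower() else "unknown"
    return step


def planner(step):
    # Placeholder for the reasoning model: emit a fixed two-action plan.
    step.plan = ["locate_icon", "double_click"]
    return step


def ui_grounding(step):
    # Placeholder for the vision model: map each action to fake screen coordinates.
    step.targets = [(100 + 10 * i, 200) for i, _ in enumerate(step.plan)]
    return step


def executor(step):
    # Placeholder for OS control: simulate a failure on the first attempt only.
    step.attempts += 1
    ok = step.attempts > 1
    step.results = [ok for _ in step.plan]
    return step


def reflection(step):
    # Judge the outcome: succeed only if every action reported success.
    return bool(step.results) and all(step.results)


def run(request, max_attempts=3):
    """Route and plan once, then ground, execute, and reflect until success."""
    step = planner(router(Step(request)))
    while step.attempts < max_attempts:
        step = executor(ui_grounding(step))
        if reflection(step):
            return step, True
    return step, False


final, success = run("Open the calculator")
```

Keeping all shared state in one `Step` object makes each agent a pure function of that state, which is one simple way to keep the stages independently testable.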

🔬 AI4Org — GAN-based Hallucination Mitigation for Private LLMs

github.com/merledu/ai4org

Privacy-first framework reducing LLM hallucinations entirely on-premise — no external APIs.

Multi-discriminator GAN pipeline + REINFORCE RL optimization loop
FAISS semantic retrieval for grounded generation
Models: TinyLlama · GPT-2 · LLaMA2 · Mistral 7B via PEFT/LoRA/SFT
Hardware: RTX 2060 + dual AMD RX 7900 XTX · gradient checkpointing · CI/CD

Stack: PyTorch · Transformers · PEFT · LoRA · FAISS · CUDA · GANs · REINFORCE RL

⏱️ sktime-MCP — MCP Protocol Layer for Time-Series ML

MCP server exposing sktime's full forecasting API — enabling LLM agents to invoke production ML pipelines as structured tool calls. Bridges classical ML and modern agentic architectures.

Stack: Python · MCP Protocol · sktime · scikit-learn

📊 ArcheV — First LLM Benchmark Suite for RISC-V RV32I

github.com/merledu/ArcheV

The first standardized evaluation framework for LLM-generated RISC-V assembly code — functional correctness · syntactic validity · ISA edge-case coverage · reproducible JSON output.

Stack: Python · Verilog · llama.cpp · JSON
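
To give a feel for the evaluation axes named above, here is a minimal sketch of a syntactic-validity check with one ISA edge case (the 12-bit signed immediate range of I-type instructions) and a JSON report. It is not ArcheV's actual implementation: it covers only `add`, `sub`, and `addi`, does no functional-correctness testing, and the function names and report fields are hypothetical.

```python
import json
import re

REGISTER = r"x(?:[12]?\d|3[01])"  # x0 .. x31
R_TYPE = re.compile(rf"^(add|sub)\s+({REGISTER}),\s*({REGISTER}),\s*({REGISTER})$")
I_TYPE = re.compile(rf"^(addi)\s+({REGISTER}),\s*({REGISTER}),\s*(-?\d+)$")


def check_line(line):
    """Return a dict describing whether one assembly line is valid."""
    text = line.strip()
    if R_TYPE.match(text):
        # R-type instructions carry no immediate, so the range check passes trivially.
        return {"line": text, "syntax_ok": True, "imm_ok": True}
    m = I_TYPE.match(text)
    if m:
        imm = int(m.group(4))
        # RV32I I-type immediates are 12-bit signed: -2048 .. 2047.
        return {"line": text, "syntax_ok": True, "imm_ok": -2048 <= imm <= 2047}
    return {"line": text, "syntax_ok": False, "imm_ok": False}


def evaluate(program):
    """Check every non-empty line and summarize the results."""
    results = [check_line(l) for l in program.splitlines() if l.strip()]
    passed = sum(r["syntax_ok"] and r["imm_ok"] for r in results)
    return {"total": len(results), "passed": passed, "lines": results}


sample = """
add x1, x2, x3
addi x5, x0, 2047
addi x5, x0, 2048
add x1, x2, x32
"""
report = evaluate(sample)
# sort_keys makes the serialized report byte-for-byte reproducible.
print(json.dumps(report, indent=2, sort_keys=True))
```

In this sample, the third line is well-formed but its immediate is out of range, and the fourth line names a register that does not exist, so two of the four lines pass.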

🛠️ Tech Stack

AI / LLM · GPU & ML · Cloud & MLOps · Languages

🎓 Education & Certifications

BSc Software Engineering — UIT University, Karachi · Feb 2022 – Feb 2026

IBM AI Engineer Professional Certificate (Coursera / IBM, 2025)

Generative AI Applications with RAG and LangChain · Fundamentals of AI Agents Using RAG and LangChain · Generative AI Advanced Fine-Tuning for LLMs · Generative AI Engineering and Fine-Tuning Transformers · Deep Learning with PyTorch · AI Capstone Project with Deep Learning

📬 Let's Connect

Open to: Production AI · LLM Infrastructure · GPU Optimization · Agentic Systems · Research · Open Source

"The gap between a model and a system is where most people give up. That's where I live."
