Shehrozkashif/README.md

👋 About Me

AI Engineer with 2.5+ years building end-to-end production AI systems — from GPU-accelerated LLM inference and RAG pipelines to GAN-based hallucination mitigation and multi-agent automation.

I work across the full hardware-to-software AI stack. I was selected globally for Google Summer of Code 2026 at Intel/OpenVINO to build a 5-agent privacy-preserving desktop automation system, and I have contributed 12 verified merged PRs across Intel's production inference toolchain, the Linux Foundation's canonical RISC-V ISA specification, Harvard/MIT's HNN-Core, and MERL Lab's AI4Org.


🏅 Recognition at a Glance

🏆 Achievement
🌐 Google Summer of Code 2026 — Selected Contributor, Intel / OpenVINO Toolkit
🐧 LFX Mentorship 2025 — Linux Foundation / RISC-V International (merit-based)
🔀 12 Verified Merged PRs — Intel · Linux Foundation · Harvard/MIT · MERL Lab
Sub-600ms end-to-end VLM+LLM inference on consumer hardware
🎓 IBM AI Engineer Professional Certificate (6 courses, Coursera 2025)

💼 Experience

🟣 Google Summer of Code 2026 — AI Engineer @ Intel (OpenVINO)

May 2026 – Sep 2026  ·  Google-Sponsored · Remote

Selected globally to build a privacy-preserving GUI Desktop Automation Agent — fully local, zero cloud dependency.

5-Agent Pipeline (A2A Protocol + MCP Server)
──────────────────────────────────────────────────────────────────────────────
Router → Planning      → UI Grounding    → Action Execution → Reflection
         DeepSeek-R1     Phi-3.5-Vision    Native OS          Self-Correction
         Qwen-7B-INT4    INT4 via OVMS     via MCP            Loop
  • Sub-600ms step latency · INT4 quantization · KV caching · prefix caching · WEIGHTLESS optimization
  • Mentored directly by Intel engineers Ethan Yang and Zhuo Wu
  • Entire pipeline runs on-device — enterprise-grade privacy by design
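
The five-stage flow above can be sketched as plain function composition with a bounded retry loop. This is only an illustration: the `Step` record, the function names, and the stubbed agents are hypothetical stand-ins and do not reflect the project's actual A2A or MCP interfaces or its models.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One unit of work flowing through the pipeline (hypothetical record)."""
    intent: str
    plan: List[str] = field(default_factory=list)
    target: str = ""
    result: str = ""
    attempts: int = 0


def route(request: str) -> Step:
    # Rule-based routing: a keyword check stands in for intent classification.
    intent = "open_app" if "open" in request.lower() else "unknown"
    return Step(intent=intent)


def plan(step: Step) -> Step:
    # A real planner would call an LLM; this stub emits a fixed plan.
    step.plan = ["locate icon", "click icon"]
    return step


def ground(step: Step) -> Step:
    # A real grounder would run a vision model on a screenshot.
    step.target = "icon@(120,48)"
    return step


def run_pipeline(request: str, act: Callable[[Step], str],
                 accept: Callable[[Step], bool], max_retries: int = 2) -> Step:
    """Router -> Planning -> UI Grounding -> Action -> Reflection, with retry."""
    step = ground(plan(route(request)))
    for _ in range(max_retries + 1):
        step.attempts += 1
        step.result = act(step)
        if accept(step):  # Reflection: stop once the result is judged acceptable.
            break
    return step


# Stub executor that fails once, then succeeds, to exercise the retry path.
_calls = {"n": 0}


def flaky_act(step: Step) -> str:
    _calls["n"] += 1
    return "ok" if _calls["n"] > 1 else "miss"


outcome = run_pipeline("Open the calculator", flaky_act, lambda s: s.result == "ok")
print(outcome.intent, outcome.attempts, outcome.result)
```

Bounding the retries keeps a failing action from looping forever; the reflection check here is a simple predicate, where the real system is described as using a self-correction loop.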

🔵 AI Engineer — Skoop

Apr 2026 – Present  ·  Walnut Creek, CA · Remote

  • Production LangChain RAG pipelines with FAISS for enterprise knowledge management — measurable gains in LLM accuracy and hallucination reduction
  • Full data engineering stack: ingestion → semantic chunking → embedding → FAISS indexing → retrieval
  • Agentic AI workflows for autonomous multi-step task execution
  • Owns model monitoring loops — retrieval quality, relevance, and latency metrics
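
The ingestion → chunking → embedding → indexing → retrieval flow listed above can be illustrated with a dependency-free toy. Everything here is an assumption made for the sketch: the bag-of-words `embed` function stands in for a real embedding model, and the brute-force list stands in for a FAISS index. It is not the production pipeline.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def chunk(text: str, max_words: int = 12) -> List[str]:
    """Split on sentence boundaries, then pack sentences into small chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence]).split()) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalised bag of words (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def build_index(chunks: List[str]) -> List[Tuple[str, Dict[str, float]]]:
    # Brute-force list of (chunk, vector) pairs; a vector index would
    # replace this in a real deployment.
    return [(c, embed(c)) for c in chunks]


def retrieve(index: List[Tuple[str, Dict[str, float]]], query: str,
             k: int = 1) -> List[str]:
    """Rank chunks by cosine similarity to the query and return the top k."""
    q = embed(query)
    scored = [(sum(q.get(w, 0.0) * v for w, v in vec.items()), c)
              for c, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


doc = ("Refunds are processed within five days. "
       "Support is available on weekdays. "
       "Invoices are emailed at the start of each month.")
index = build_index(chunk(doc, max_words=8))
top = retrieve(index, "When are invoices emailed?")
print(top)
```

The retrieved chunk would then be placed in the LLM prompt as grounding context; retrieval quality and latency are the quantities a monitoring loop would track.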

🟠 AI Engineer — TheOvalLabs

Jan 2026 – Apr 2026  ·  Remote

  • Built Verimate — AI platform automating UVM verification plan and testbench generation for RTL/chip designs (semiconductor sector)
  • LangChain + FAISS + Gemini API + GPU-accelerated local LLMs
  • Delivered measurable reduction in hardware verification engineering cycle time

🔴 Open Source AI Engineer — Intel (OpenVINO / NNCF)

Dec 2025 – Mar 2026  ·  Santa Clara, CA · Remote

  • Transpose-aware LoRA Correction — correct handling of MatMul layers with transposed activations in INT4/INT8 quantized LLMs
  • Migrated LLM compression examples to OpenVINO stateful inference flow — production-grade, no manual KV-cache handling
  • 3 merged PRs, including NNCF #3814 and NNCF #3864
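
To show why transposed activations matter for a low-rank correction, here is a minimal sketch under stated assumptions. It is not NNCF's code: the `corrected_matmul` helper, the matrix shapes, and the adapter names `lora_a` and `lora_b` are all hypothetical. The point is only that when a MatMul consumes its activation in transposed layout, the adapter branch has to apply the same transpose so the corrected output stays identical.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix, transpose_a: bool = False,
           transpose_b: bool = False) -> Matrix:
    """MatMul with optional transposes on each operand."""
    a = transpose(a) if transpose_a else a
    b = transpose(b) if transpose_b else b
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def corrected_matmul(x: Matrix, w_q: Matrix, lora_a: Matrix, lora_b: Matrix,
                     transpose_a: bool = False) -> Matrix:
    """y = X @ W_q^T + (X @ A^T) @ B^T, where X may be stored transposed.

    Assumed shapes: X [tokens, in], W_q [out, in], A [rank, in], B [out, rank].
    """
    base = matmul(x, w_q, transpose_a=transpose_a, transpose_b=True)
    # The adapter branch must honour the same activation layout as the base.
    hidden = matmul(x, lora_a, transpose_a=transpose_a, transpose_b=True)
    return add(base, matmul(hidden, lora_b, transpose_b=True))


x = [[1, 2], [3, 4]]                # 2 tokens, 2 input features
w_q = [[1, 0], [0, 1], [1, 1]]      # stand-in for dequantized weights
lora_a = [[0.5, 0.5]]               # rank-1 adapter
lora_b = [[1], [0], [2]]

y_plain = corrected_matmul(x, w_q, lora_a, lora_b)
y_transposed = corrected_matmul(transpose(x), w_q, lora_a, lora_b,
                                transpose_a=True)
print(y_plain == y_transposed)
```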

🟢 Software Engineer — LFX Mentorship @ Linux Foundation · RISC-V International

Mar 2025 – Jun 2025  ·  Remote · Mentored by Qualcomm, Ventana, Synopsys

  • 7 merged PRs to the canonical machine-readable RISC-V ISA specification
  • Implemented ISA extensions: Zilsd, Zclsd, and Zcmop (compressed may-be-operations)
  • Ruby tooling, CI/CD automation with GitHub Actions, schema and IDL fixes

PRs: #521 · #530 · #542 · #577 · #617 · #654 · #923


🟡 Research Assistant — MERL Lab (Micro Electronics Research Lab)

Jul 2023 – Dec 2025  ·  Karachi, Pakistan

  • Led AI4Org: multi-discriminator GAN + REINFORCE RL hallucination mitigation — fine-tuned TinyLlama / GPT-2 / LLaMA2 / Mistral 7B with PEFT/LoRA on dual AMD RX 7900 XTX
  • Led ArcheV: first LLM benchmark suite for RISC-V RV32I assembly code — functional correctness, syntactic validity, ISA edge-case evaluation
  • Built Vermithor: 5-stage pipelined RV32I processor in Chisel (Scala) — hazard detection, forwarding, Verilator verification
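
For readers unfamiliar with hazard handling in a 5-stage pipeline, the textbook forwarding and load-use stall conditions can be written down in a few lines. This is a Python sketch of the standard logic, not an excerpt from Vermithor's Chisel source; the signal names are assumptions chosen to mirror the usual pipeline-register naming.

```python
from enum import Enum


class Forward(Enum):
    REGFILE = 0  # no hazard: use the value read from the register file
    MEM_WB = 1   # forward from the MEM/WB pipeline register
    EX_MEM = 2   # forward from the EX/MEM pipeline register


def forward_select(rs: int, ex_mem_rd: int, ex_mem_reg_write: bool,
                   mem_wb_rd: int, mem_wb_reg_write: bool) -> Forward:
    """Choose the source for one ALU operand whose register index is `rs`."""
    # The most recent producer (EX/MEM) takes priority over the older one.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == rs:
        return Forward.EX_MEM
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == rs:
        return Forward.MEM_WB
    return Forward.REGFILE


def load_use_stall(id_ex_mem_read: bool, id_ex_rd: int,
                   if_id_rs1: int, if_id_rs2: int) -> bool:
    """Stall one cycle when the instruction in EX is a load whose result
    is needed by the instruction currently being decoded."""
    return (id_ex_mem_read and id_ex_rd != 0
            and id_ex_rd in (if_id_rs1, if_id_rs2))


# add x5, x1, x2 followed by sub x6, x5, x3: forward x5 from EX/MEM.
print(forward_select(rs=5, ex_mem_rd=5, ex_mem_reg_write=True,
                     mem_wb_rd=0, mem_wb_reg_write=False))
# lw x5, 0(x1) followed by add x6, x5, x2: one-cycle stall is required.
print(load_use_stall(id_ex_mem_read=True, id_ex_rd=5, if_id_rs1=5, if_id_rs2=2))
```

The `!= 0` checks reflect that x0 is hardwired to zero in RISC-V, so a write to x0 never creates a real dependency.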

🔀 Open Source — 12 Verified Merged PRs

| Organization | Repository | PRs | Highlights |
|---|---|---|---|
| 🔵 Intel | openvinotoolkit/nncf + openvino | 3 | Stateful LLM compression · transpose-aware LoRA · PyTorch frontend |
| 🐧 Linux Foundation | riscv-unified-db | 7 | ISA extensions · Ruby tooling · CI/CD automation |
| 🔴 Harvard / MIT | hnn-core | 1 | Documentation (#1001) |
| 🟡 MERL Lab | ai4org | 1 | Hallucination reduction pipeline |

🧠 Key Projects

🤖 GSoC '26 — Privacy-Preserving Desktop Automation Agent (Intel / OpenVINO)

Fully local multi-agent desktop automation — zero data leaves the device, enterprise privacy by design.

| Agent | Model | Role |
|---|---|---|
| Router | Rule-based | Intent classification & dispatch |
| Planning | DeepSeek-R1-Qwen-7B-INT4 | Multi-step reasoning |
| UI Grounding | Phi-3.5-Vision-INT4 via OVMS | Screen understanding |
| Action Execution | MCP Server | Native OS/app control |
| Reflection | LLM judge | Quality evaluation & retry |

Stack: OpenVINO · OVMS · A2A Protocol · MCP · PyQt · INT4 Quantization · KV Caching · Python

🔬 AI4Org — GAN-based Hallucination Mitigation for Private LLMs

github.com/merledu/ai4org

Privacy-first framework reducing LLM hallucinations entirely on-premise — no external APIs.

  • Multi-discriminator GAN pipeline + REINFORCE RL optimization loop
  • FAISS semantic retrieval for grounded generation
  • Models: TinyLlama · GPT-2 · LLaMA2 · Mistral 7B via PEFT/LoRA/SFT
  • Hardware: RTX 2060 + dual AMD RX 7900 XTX · gradient checkpointing · CI/CD
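
As a rough illustration of how several discriminator scores can drive a REINFORCE update, the sketch below trains a softmax policy over three fixed candidate answers. It is a toy under loud assumptions: the two discriminators are hand-written string checks, the candidates are invented, and the real framework fine-tunes language models instead of a three-way policy.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(text: str,
                    discriminators: List[Callable[[str], float]]) -> float:
    """Average the scores of several discriminators into one scalar reward."""
    return sum(d(text) for d in discriminators) / len(discriminators)


def reinforce(candidates: List[str],
              discriminators: List[Callable[[str], float]],
              steps: int = 2000, lr: float = 0.2, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        reward = combined_reward(candidates[i], discriminators)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(i) w.r.t. logit j is (1[j == i] - probs[j]).
        for j in range(len(logits)):
            indicator = 1.0 if j == i else 0.0
            logits[j] += lr * advantage * (indicator - probs[j])
    return softmax(logits)


# Hypothetical discriminators: one checks a fact, one checks for a citation.
factual = lambda text: 1.0 if "30 days" in text else 0.0
cited = lambda text: 1.0 if "policy" in text else 0.0

candidates = [
    "The refund window is 30 days.",
    "The refund window is 30 days, per the policy document.",
    "Refunds are instant and unlimited.",
]
probs = reinforce(candidates, [factual, cited])
best = max(range(len(candidates)), key=probs.__getitem__)
print(candidates[best])
```

Averaging is only one way to combine discriminators; weighted sums or a minimum over scores are equally plausible design choices.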

Stack: PyTorch · Transformers · PEFT · LoRA · FAISS · CUDA · GANs · REINFORCE RL

⏱️ sktime-MCP — MCP Protocol Layer for Time-Series ML

MCP server exposing sktime's full forecasting API — enabling LLM agents to invoke production ML pipelines as structured tool calls. Bridges classical ML and modern agentic architectures.
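
The idea of exposing a forecaster as a structured tool call can be sketched without any SDK. The registry, the `handle_call` dispatcher, the request shape, and the `naive_forecast` tool below are all hypothetical; they imitate the general pattern of a named tool with a JSON-schema-style input description and do not reproduce the MCP wire protocol or sktime's API.

```python
import json
from typing import Any, Callable, Dict, List

TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str, schema: Dict[str, Any]) -> Callable:
    """Register a function as a callable tool with a schema-style signature."""
    def decorator(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description,
                       "input_schema": schema, "fn": fn}
        return fn
    return decorator


@tool(
    name="naive_forecast",
    description="Repeat the last observed value for `horizon` future steps.",
    schema={
        "type": "object",
        "properties": {
            "series": {"type": "array", "items": {"type": "number"}},
            "horizon": {"type": "integer", "minimum": 1},
        },
        "required": ["series", "horizon"],
    },
)
def naive_forecast(series: List[float], horizon: int) -> List[float]:
    # Stand-in for a real forecaster; a server would delegate to the ML library.
    return [series[-1]] * horizon


def handle_call(request_json: str) -> str:
    """Dispatch one tool call encoded as JSON and return a JSON response."""
    request = json.loads(request_json)
    entry = TOOLS.get(request.get("tool"))
    if entry is None:
        return json.dumps({"error": f"unknown tool: {request.get('tool')}"})
    args = request.get("arguments", {})
    missing = [k for k in entry["input_schema"]["required"] if k not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps({"result": entry["fn"](**args)})


response = handle_call(json.dumps(
    {"tool": "naive_forecast",
     "arguments": {"series": [3, 5, 8], "horizon": 2}}))
print(response)
```

Publishing the schema alongside the function is what lets an LLM agent build a valid call without reading the implementation.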

Stack: Python · MCP Protocol · sktime · Scikit-learn

📊 ArcheV — First LLM Benchmark Suite for RISC-V RV32I

github.com/merledu/ArcheV

The first standardized evaluation framework for LLM-generated RISC-V assembly code — functional correctness · syntactic validity · ISA edge-case coverage · reproducible JSON output.
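
To make the three evaluation axes concrete, here is a small sketch that checks syntactic validity, functional correctness, and one ISA edge case for a three-instruction subset of RV32I, then emits a JSON report. It is an assumption-laden toy: the supported subset, the `evaluate` function, and the report fields are invented for illustration and are not ArcheV's actual harness or output schema.

```python
import json
import re
from typing import Dict, List, Optional, Tuple

REG = r"x([0-9]|[12][0-9]|3[01])"  # x0 .. x31
PATTERNS = {
    "addi": re.compile(rf"^addi\s+{REG},\s*{REG},\s*(-?\d+)$"),
    "add": re.compile(rf"^add\s+{REG},\s*{REG},\s*{REG}$"),
    "sub": re.compile(rf"^sub\s+{REG},\s*{REG},\s*{REG}$"),
}


def parse(line: str) -> Optional[Tuple[str, int, int, int]]:
    """Return (op, rd, rs1, operand), or None if the line is not valid."""
    line = line.strip()
    for op, pattern in PATTERNS.items():
        m = pattern.match(line)
        if m:
            rd, rs1, last = (int(g) for g in m.groups())
            if op == "addi" and not -2048 <= last <= 2047:
                return None  # 12-bit signed immediate out of range
            return op, rd, rs1, last
    return None


def run(program: List[str]) -> Optional[List[int]]:
    """Execute the program on a 32-register file; None on a syntax error."""
    regs = [0] * 32
    for line in program:
        parsed = parse(line)
        if parsed is None:
            return None
        op, rd, rs1, last = parsed
        if op == "addi":
            value = regs[rs1] + last
        elif op == "add":
            value = regs[rs1] + regs[last]
        else:
            value = regs[rs1] - regs[last]
        if rd != 0:  # x0 is hardwired to zero
            regs[rd] = value & 0xFFFFFFFF
    return regs


def evaluate(name: str, program: List[str], expected: Dict[int, int]) -> Dict:
    regs = run(program)
    valid = regs is not None
    correct = valid and all(regs[r] == v for r, v in expected.items())
    return {"task": name, "syntactically_valid": valid,
            "functionally_correct": correct}


report = [
    evaluate("sum", ["addi x1, x0, 5", "addi x2, x0, 7", "add x3, x1, x2"], {3: 12}),
    evaluate("x0_write", ["addi x0, x0, 1"], {0: 0}),
    evaluate("bad_imm", ["addi x1, x0, 4096"], {1: 4096}),
]
print(json.dumps(report, indent=2))
```

In a real benchmark the program lines would come from a model's completion, and the expected register state from a reference solution.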

Stack: Python · Verilog · llama.cpp · JSON


🛠️ Tech Stack

AI / LLM

RAG · LangChain · FAISS · OpenVINO · NNCF · PEFT/LoRA · GANs · Agentic AI · MCP · A2A · llama.cpp · Gemini API

GPU & ML

PyTorch · CUDA · HuggingFace · TensorFlow · Scikit-learn · MLflow · INT4/INT8

Cloud & MLOps

Azure · AWS · GCP · Docker · GitHub Actions · Linux

Languages

Python · C++ · Scala · Ruby · SQL · Bash · JavaScript



🎓 Education & Certifications

BSc Software Engineering — UIT University, Karachi  ·  Feb 2022 – Feb 2026

IBM AI Engineer Professional Certificate (Coursera / IBM, 2025)

Generative AI Applications with RAG and LangChain  ·  Fundamentals of AI Agents Using RAG and LangChain  ·  Generative AI Advanced Fine-Tuning for LLMs  ·  Generative AI Engineering and Fine-Tuning Transformers  ·  Deep Learning with PyTorch  ·  AI Capstone Project with Deep Learning


📬 Let's Connect

LinkedIn · Email

Open to: Production AI · LLM Infrastructure · GPU Optimization · Agentic Systems · Research · Open Source


"The gap between a model and a system is where most people give up. That's where I live."


Pinned

  1. riscv/riscv-unified-db (Public)

    Monorepo containing a machine-readable database of the RISC-V specification and artifact generation tools

    Ruby 172 145

  2. Vermithor (Public)

    RISC-V RV32I 5-stage pipelined processor

    Scala

  3. merledu/ai4org (Public)

    Hallucination reduction framework for LLMs using RAG, multi-discriminator RL, and automated data pipelines.

    Python 1 5

  4. intel-openvino-desktop-agent (Public)

    Local desktop automation agent using VLMs + LLMs via Intel OpenVINO

    Python