rajveer100704/README.md

Hi, I'm Rajveer Singh Saggu 👋

AI Systems Engineer • ML Infrastructure Builder • LLM Inference Optimization

I build production-grade AI systems focused on inference optimization, distributed ML infrastructure, agent orchestration, and scalable backend platforms.

Currently pursuing B.Tech in Electronics & Communication Engineering at BIT Mesra (CGPA: 9.0/10.0) while building systems that improve latency, throughput, reliability and deployment efficiency for modern AI applications.


🚀 Current Focus

  • LLM Inference Optimization
  • Triton & CUDA-based Systems
  • AI Infrastructure Engineering
  • Agent Orchestration Frameworks
  • Distributed Systems
  • Production ML Platforms
  • Reliability & Observability

🏆 Highlights

🏅 Amazon ML Summer School Scholar (Top 0.2% Nationwide)

🏅 CDAC Merit Scholar

🏅 Open Source Contributor (GSSOC)

🏅 ML Engineer @ Elevate Labs

🏅 AI Systems Intern @ OutriX

🏅 Cybersecurity Intern @ CDAC India

🏅 Algorithmic Trading Intern @ Lunor AI


Professional Experience

ML Engineer Intern | Elevate Labs

  • Designed PyTorch training and inference pipelines for NLP and computer vision tasks; improved experiment reproducibility through structured preprocessing and automated evaluation tooling.
  • Optimized inference workflows via latency profiling and batching strategies, reducing average inference time by ∼18% across 3 deployed model variants.
  • Built an ML evaluation harness for model validation, benchmarking, and regression testing across 5 model iterations.

AI Systems Intern | OutriX

  • Built an LLM evaluation pipeline processing 1M+ records (automated scoring, regression testing, and failure triage), cutting experimentation turnaround time by 30%.
  • Owned ETL/ELT data workflows feeding inference benchmarking dashboards; instrumented with OpenTelemetry for end-to-end latency observability.
  • Profiled high-throughput AI inference workflows, identifying and optimizing 3 bottleneck stages to reduce p95 latency by ∼18%.
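
As an illustration of the p95 profiling described above, here is a minimal sketch of how per-stage tail latency might be ranked to find bottlenecks. The function names, the stage names, and the nearest-rank percentile method are assumptions made for this example, not the tooling used during the internship.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least q% of values at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def p95_by_stage(timings):
    """Map each (hypothetical) pipeline stage name to its p95 latency."""
    return {stage: percentile(values, 95) for stage, values in timings.items()}


def slowest_stages(timings, k=3):
    """Return the k stage names with the highest p95 latency, worst first."""
    p95 = p95_by_stage(timings)
    return sorted(p95, key=p95.get, reverse=True)[:k]
```

Calling `slowest_stages(timings, k=3)` on a dict of per-stage latency samples returns the three stages most worth optimizing first.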

Cybersecurity Intern | CDAC India

  • Built a 3-stage anomaly detection pipeline on structured network-intrusion datasets (∼50K samples): feature extraction → threshold calibration → alert triage, reducing manual review queue by ∼35%.
  • Implemented distributed validation and monitoring workflows for automated anomaly scoring across multi-source security data streams.
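
The calibration and triage stages of a pipeline like the one above could look roughly like the sketch below. It assumes anomaly scores have already been produced by a feature-extraction stage. The function names and the target false-positive rate are hypothetical, chosen only to illustrate the idea.

```python
def calibrate_threshold(benign_scores, target_false_positive_rate=0.01):
    """Pick a threshold so roughly the target fraction of benign scores exceeds it."""
    if not benign_scores:
        raise ValueError("benign_scores must be non-empty")
    ordered = sorted(benign_scores)
    allowed_false_positives = int(len(ordered) * target_false_positive_rate)
    index = max(0, len(ordered) - 1 - allowed_false_positives)
    return ordered[index]


def triage(scores, threshold):
    """Return (index, score) pairs above the threshold, highest score first."""
    flagged = [(i, s) for i, s in enumerate(scores) if s > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Calibrating on known-benign traffic keeps the alert volume predictable, and sorting by score lets reviewers start with the most anomalous records.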

Algorithmic Trading Intern | Lunor AI

  • Developed deterministic multi-asset trading strategies using SQL-backed financial time-series datasets.
  • Built backtesting systems evaluating Sharpe ratio, volatility and maximum drawdown for strategy validation.
  • Implemented volatility-adjusted optimization techniques improving risk-adjusted returns and portfolio stability.
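
For reference, the backtesting metrics named above are commonly computed along the following lines. This is a minimal sketch using standard textbook definitions. The function names, the 252-period annualization factor, and the simple-returns input format are assumptions, not details of the Lunor AI systems.

```python
import math
import statistics


def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of per-period returns, scaled to a yearly figure."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return divided by the volatility of excess returns."""
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst
```

All three functions take a list of simple per-period returns (for example, daily returns as fractions), so one backtest output can feed every metric.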

Technical Expertise

Languages

Python • C++ • TypeScript • SQL

AI & Machine Learning

PyTorch • Transformers • LLMs • RAG • CNNs • Agent Systems

Inference & Optimization

CUDA • Triton • TensorRT • FlashAttention • Quantization • KV Cache Optimization

Backend & Infrastructure

FastAPI • Redis • PostgreSQL • Docker • Kubernetes • AWS

Observability

OpenTelemetry • MLflow • Monitoring • Performance Profiling

Distributed Systems

AsyncIO • Event-Driven Architecture • Scheduling • Caching • Message Queues


What Interests Me

I enjoy solving engineering problems involving:

  • GPU Utilization Optimization
  • Inference Throughput Scaling
  • Low-Latency Architectures
  • Distributed Scheduling
  • Agent Systems
  • AI Reliability Engineering
  • Production AI Deployment

Connect With Me

📧 rajveer19255@gmail.com

💼 LinkedIn

💻 GitHub

🧠 LeetCode


📊 GitHub Analytics

🔥 GitHub Streak


Building the infrastructure layer powering modern AI systems.

Pinned

  1. AdaptiveRL-Orchestrator Public

    Production-grade adaptive AI inference control plane using hierarchical reinforcement learning for cache-aware request routing and proactive autoscaling across heterogeneous worker pools. Features …

    Python 1

  2. AgentForge-AI Public

    A multi-agent AI system that autonomously plans tasks, generates code, and reviews outputs. Specialized agents collaborate through feedback loops to transform high-level goals into reliable, produc…

    Python 1

  3. QuantForge Public

    AI-driven vector compression and fused-attention engine for efficient LLM inference and large-scale vector search (HuggingFace + vLLM + Triton)

    Python 1

  4. sentinelX Public

    Production-grade AI Control Plane for safe, reliable and observable LLM systems. Features dual-view safety, prompt injection defense, PII protection, multi-model fallback (OpenAI→Ollama), rate limi…

    Python 1

  5. SyncSpace Public

    SyncSpace is a real-time collaboration platform that lets teams communicate, share files, and work together seamlessly in one unified space. It combines chat, voice, video, and smart syncing to kee…

    TypeScript 1

  6. Triton-Inference-Engine Public

    Production-grade Transformer inference engine with Triton-based FlashAttention, static KV cache for O(1) decoding, and async dynamic batching. Achieves 5.7× speedup over PyTorch baseline with real-…

    Python 1