Portfolio LinkedIn Email


What I work on

I build ML systems end-to-end - from training runs on rented GPUs to production inference APIs that actually stay up. My focus is the gap between "it works in the notebook" and "it runs at scale without falling over."

Current obsessions: LLM alignment (why does reward accuracy diverge from factuality?), model compression without killing accuracy, and distributed ML pipelines that don't require a dedicated ops team to maintain.


Benchmarks that matter

| Metric | Result | Project |
|---|---|---|
| DPO Reward Accuracy | 82% (peak 88%) | distill-align-llm |
| Factuality (LLM judge) | 75.7% on a 500-prompt benchmark | distill-align-llm |
| Model Compression | 107.7M -> 65.2M params, 93.2% of F1 retained | clinical-nlp-optimization |
| Inference Speedup | 39 ms -> 10.8 ms, 1.9× faster | clinical-nlp-optimization |
| Weak Label Generation | 19,506 entities from 7,064 PubMed abstracts | clinical-nlp-optimization |
| SLA Compliance | 97% of requests under 50 ms (100-request load test) | clinical-nlp-optimization |
| Training Cost | ~$27 total, SFT + DPO on Llama-3.1-8B | distill-align-llm |
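
The SLA Compliance figure is a simple ratio: the share of load-test requests that finished inside the 50 ms budget. A minimal sketch of that calculation, assuming per-request latencies in milliseconds were already collected and that "under 50 ms" means strictly below the budget. The `sla_compliance` helper and the sample latencies are hypothetical, not data from the actual load test.

```python
from typing import Sequence


def sla_compliance(latencies_ms: Sequence[float], budget_ms: float = 50.0) -> float:
    """Fraction of requests that finished strictly under the latency budget.

    Hypothetical helper: assumes latencies were already measured per request.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    under_budget = sum(1 for latency in latencies_ms if latency < budget_ms)
    return under_budget / len(latencies_ms)


# Made-up load test: 97 fast requests and 3 slow outliers.
latencies = [10.8] * 97 + [62.0, 75.5, 120.0]
print(f"{sla_compliance(latencies):.0%}")  # 97%
```

With only 100 requests, each slow outlier moves the figure by a full percentage point, so a longer run gives a more stable estimate.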

Projects

distill-align-llm - SFT -> DPO alignment on Llama-3.1-8B, Live dashboard

Full alignment pipeline using QLoRA (r=16, α=32, 4-bit NF4). Trained on RunPod for ~$27 total.
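
The r=16, α=32 settings determine two things: LoRA scales its low-rank update B·A by α/r (here 2.0), and each adapted linear layer gains r·(d_in + d_out) trainable weights. The 4-bit NF4 quantization applies to the frozen base weights, so it shrinks memory but doesn't change the adapter count. A small sketch of that arithmetic follows. `LoRAConfigSketch` is a hypothetical stand-in, not PEFT's `LoraConfig`, and the choice of `q_proj`/`v_proj` as target modules is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoRAConfigSketch:
    """Hypothetical stand-in for an adapter config (not the PEFT LoraConfig class)."""

    r: int = 16      # adapter rank, as stated in the README
    alpha: int = 32  # LoRA alpha, as stated in the README

    @property
    def scaling(self) -> float:
        # LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.alpha / self.r

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A is (r x d_in) and B is (d_out x r), so r * (d_in + d_out) new weights.
        return self.r * (d_in + d_out)


cfg = LoRAConfigSketch()
print(cfg.scaling)  # 2.0

# Assumption: adapters on q_proj (4096 -> 4096) and v_proj (4096 -> 1024, grouped-query
# attention) in each of Llama-3.1-8B's 32 decoder layers, using its published dimensions.
per_layer = cfg.adapter_params(4096, 4096) + cfg.adapter_params(4096, 1024)
print(per_layer * 32)  # 6815744
```

Under these assumed target modules the adapters add roughly 6.8M trainable weights, a tiny fraction of the 8B frozen parameters, which is what keeps single-GPU rental costs low.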

The main finding: reward accuracy and factuality are not the same thing. 82% reward accuracy on DPO, but only 17.6% factuality with strict keyword matching on 51 prompts - and 75.7% with a proper 500-prompt LLM-judge benchmark. Same model. The evaluation methodology matters more than people admit.
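
Why the two numbers can diverge becomes clearer from what reward accuracy measures. DPO's implicit reward for a completion y is β·(log π_θ(y|x) − log π_ref(y|x)), and reward accuracy is the share of preference pairs where the chosen completion outscores the rejected one. That only checks the ordering relative to the reference model, not whether either answer is factually right. A minimal sketch, assuming summed sequence log-probabilities are already computed; the class and function names and the β = 0.1 value are assumptions, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PreferencePair:
    # Summed log-probabilities of each completion under the policy and frozen reference.
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    # DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    return beta * (policy_logp - ref_logp)


def reward_accuracy(pairs: Sequence[PreferencePair], beta: float = 0.1) -> float:
    """Fraction of pairs where the chosen completion gets the higher implicit reward.

    beta = 0.1 is an assumed value, not taken from the project.
    """
    if not pairs:
        raise ValueError("need at least one preference pair")
    wins = 0
    for p in pairs:
        chosen = implicit_reward(p.policy_chosen, p.ref_chosen, beta)
        rejected = implicit_reward(p.policy_rejected, p.ref_rejected, beta)
        if chosen > rejected:
            wins += 1
    return wins / len(pairs)


pairs = [
    PreferencePair(-12.0, -20.0, -14.0, -18.0),  # rewards 0.2 vs -0.2: counted as correct
    PreferencePair(-15.0, -11.0, -15.5, -12.0),  # rewards 0.05 vs 0.1: counted as wrong
]
print(reward_accuracy(pairs))  # 0.5
```

Because any positive β scales both rewards equally, it doesn't change the accuracy; it only affects the loss and the reward margins.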

Token probability analysis showed the model knows the answers - it just suppresses them. Median correct token rank after SFT/DPO: position 2. It's a generation suppression problem, not a forgetting problem.
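
A rank analysis like this can be sketched as follows, assuming you already have the next-token logits at the position where the correct answer token should appear. Rank 1 means the correct token was the model's top choice; a median of 2 means it is usually the runner-up rather than absent from the distribution. The helper names and toy logits are hypothetical, and ties are counted in the correct token's favor.

```python
import statistics
from typing import Sequence


def token_rank(logits: Sequence[float], correct_id: int) -> int:
    """1-based rank of the correct token among all vocabulary logits."""
    correct_logit = logits[correct_id]
    # Only tokens scored strictly higher push the correct token down the ranking.
    return 1 + sum(1 for value in logits if value > correct_logit)


def median_rank(examples: Sequence[tuple[Sequence[float], int]]) -> float:
    # Hypothetical helper: each example is (next-token logits, id of the correct token).
    return statistics.median(token_rank(logits, correct_id) for logits, correct_id in examples)


# Toy 4-token vocabulary; real logits would come from the model's forward pass.
examples = [
    ([2.0, 3.5, 1.0, 0.5], 0),  # one token scores higher: rank 2
    ([4.0, 1.0, 0.2, 3.0], 0),  # top choice: rank 1
    ([0.1, 2.0, 2.5, 1.5], 3),  # two tokens score higher: rank 3
]
print(median_rank(examples))  # 2
```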

Stack: PyTorch, HuggingFace TRL, PEFT, bitsandbytes, Streamlit, pytest (44 passing)

Repo Dashboard

clinical-nlp-optimization - Knowledge distillation + distributed NLP pipeline for clinical NER

Compressed Bio_ClinicalBERT (107.7M params) down to DistilClinicalBERT (65.2M), roughly 39% fewer parameters, while retaining 93.2% of the teacher's F1. Deployed as a FastAPI inference server with Prometheus + OpenTelemetry observability.
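
The README doesn't spell out the distillation objective. A common choice, and only an assumption here, is Hinton-style knowledge distillation. The student matches the teacher's temperature-softened tag distribution through a KL term scaled by T², blended with ordinary cross-entropy on the gold tag. The sketch below computes that loss for a single token in plain Python; T = 2.0 and α = 0.5 are placeholder values, and the function names are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    gold_label: int,
    temperature: float = 2.0,  # placeholder, not the project's setting
    alpha: float = 0.5,        # placeholder weight between soft and hard terms
) -> float:
    """Hinton-style KD loss for one token (an assumed objective, not the project's code)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on softened distributions, rescaled by T^2 so its
    # gradient magnitude stays comparable to the hard-label term.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft_term = (temperature ** 2) * kl
    # Ordinary cross-entropy against the gold entity tag at temperature 1.
    hard_term = -math.log(softmax(student_logits)[gold_label])
    return alpha * soft_term + (1 - alpha) * hard_term


teacher = [4.0, 1.0, 0.5]  # made-up logits over three entity tags
student = [3.0, 1.5, 0.2]
loss = distillation_loss(student, teacher, gold_label=0)
print(f"{loss:.4f}")
```

For token classification the same per-token loss would be averaged over all non-padding tokens in a batch.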

The pipeline covers six components end-to-end: distillation, distributed weak labeling on PySpark/EMR, ONNX pruning + INT8 quantization, LangChain agentic evaluation, statistical A/B testing (Mann-Whitney + Wilcoxon), and a production observability stack. In a 100-request load test, 97% of requests completed under the 50 ms SLA.
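
To make the INT8 step concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization: each weight is divided by a scale derived from the largest absolute value, rounded, and clipped to [-127, 127]. This is the generic idea only. ONNX Runtime's quantization tooling offers several schemes (per-tensor vs. per-channel, signed vs. unsigned), and which one the project used isn't stated. The function names and weights are hypothetical.

```python
from typing import Sequence


def quantize_int8(weights: Sequence[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8: q = round(w / scale), clipped to [-127, 127].

    Hypothetical helper illustrating the scheme, not ONNX Runtime's implementation.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> list[float]:
    # Map the 8-bit integers back to approximate float weights.
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.003, 0.9]  # made-up weights
q, scale = quantize_int8(weights)
print(q)  # [42, -127, 0, 90]
print(dequantize(q, scale))
```

Each weight now needs 1 byte instead of 4, and the rounding error per weight is at most half the scale; small weights such as 0.003 collapse to zero, which is one reason accuracy should be re-checked after quantizing.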

Stack: PyTorch, HuggingFace, ONNX Runtime, PySpark, AWS EMR/S3/Lambda, FastAPI, LangChain, Prometheus, Grafana, Terraform

Repo

TheInheritableAgent - Cryptographic AI inheritance, Auth0 for AI Agents Hackathon

When someone passes away, their family inherits their belongings - but never their way of thinking. This lets a parent's AI-extracted decision patterns be inherited by their child through cryptographically scoped Auth0 tokens, while keeping every piece of personal data permanently inaccessible.

The boundary is enforced at the identity layer, not in application code. Safeguards include a 2-of-3 trustee multi-sig, step-up auth for sensitive topics, and multi-generational token delegation in which scopes can only shrink, never expand.
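
The README doesn't show the Auth0 Token Vault wiring, so the sketch below illustrates only the two rules described above as plain in-memory logic: a 2-of-3 trustee threshold, and delegation that may narrow scopes but never widen them. In the real project these rules live in signed tokens at the identity layer. `Grant`, `delegate`, `trustees_approve`, and the scope strings are all hypothetical, not Auth0 or JWT library APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical delegated grant; not an Auth0 or JWT library type."""

    holder: str
    scopes: frozenset[str]
    generation: int = 0


def delegate(parent: Grant, child: str, requested: set[str]) -> Grant:
    # Scopes may only shrink: the child receives a subset of the parent's scopes.
    extra = set(requested) - parent.scopes
    if extra:
        raise PermissionError(f"cannot widen scope: {sorted(extra)}")
    return Grant(child, frozenset(requested), parent.generation + 1)


def trustees_approve(approvals: set[str], trustees: set[str], threshold: int = 2) -> bool:
    # 2-of-3 rule: only approvals from registered trustees count toward the threshold.
    return len(approvals & trustees) >= threshold


trustees = {"trustee_a", "trustee_b", "trustee_c"}
approved = trustees_approve({"trustee_a", "trustee_c", "stranger"}, trustees)

parent = Grant("parent_agent", frozenset({"read:decision_patterns", "read:values"}))
child = delegate(parent, "child", {"read:decision_patterns"})
grandchild = delegate(child, "grandchild", {"read:decision_patterns"})
print(approved, grandchild.generation)  # True 2

try:
    delegate(child, "grandchild", {"read:values"})  # child no longer holds this scope
except PermissionError as err:
    print(err)  # cannot widen scope: ['read:values']
```

Because each generation can only take a subset of its parent's scopes, the set of reachable permissions is monotonically non-increasing down the family tree.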

Stack: Python, Flask, Auth0 Token Vault, JWT, Claude API

Repo

PHI/PII Parser - FHIR-compliant redaction on AWS Lambda

Reads HL7 FHIR Bundle JSON files from S3, detects and redacts PII/PHI fields (name, DOB, SSN, address), and writes a cleaned CSV back to S3. Two deployment modes: FastAPI for local on-demand scanning, Lambda for serverless auto-triggering on every S3 upload.
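
The core redaction step can be sketched with the standard library alone: walk the Bundle's `entry` list, pick out `Patient` resources, mask the name, birth date, SSN identifier, and address, and emit CSV rows. S3 reads/writes and the FastAPI/Lambda entry points are omitted. The output columns, the `[REDACTED]` marker, and the helper names are assumptions. The SSN check relies on the `http://hl7.org/fhir/sid/us-ssn` identifier system commonly used in US FHIR data.

```python
import csv
import io
import json

SSN_SYSTEM = "http://hl7.org/fhir/sid/us-ssn"
REDACTED = "[REDACTED]"
FIELDS = ["id", "gender", "name", "birthDate", "ssn", "address"]  # assumed CSV columns


def redact_patient(resource: dict) -> dict:
    """Flatten a FHIR Patient into a CSV row with PHI fields masked (hypothetical helper)."""
    has_ssn = any(i.get("system") == SSN_SYSTEM for i in resource.get("identifier", []))
    return {
        "id": resource.get("id", ""),
        "gender": resource.get("gender", ""),
        "name": REDACTED if resource.get("name") else "",
        "birthDate": REDACTED if resource.get("birthDate") else "",
        "ssn": REDACTED if has_ssn else "",
        "address": REDACTED if resource.get("address") else "",
    }


def bundle_to_csv(bundle_json: str) -> str:
    bundle = json.loads(bundle_json)
    rows = [
        redact_patient(entry["resource"])
        for entry in bundle.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == "Patient"
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = json.dumps({
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [{"resource": {
        "resourceType": "Patient",
        "id": "p1",
        "gender": "female",
        "name": [{"family": "Doe", "given": ["Jane"]}],
        "birthDate": "1980-04-02",
        "identifier": [{"system": SSN_SYSTEM, "value": "123-45-6789"}],
        "address": [{"city": "Springfield"}],
    }}],
})
print(bundle_to_csv(sample))
```

Masking rather than dropping the columns keeps the CSV schema stable, so downstream consumers can tell "redacted" apart from "never present".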

Stack: Python, FastAPI, AWS S3/Lambda, Pydantic v2, Docker/LocalStack

Repo

Agentic AI Parenting - Multi-agent system on Google ADK + FatSecret

A modular parenting agent built on Google's Agent Development Kit. Root agent delegates to specialized sub-agents: parenting analyst, nutrition meal planner (FatSecret API), and a basic pediatric medical advisor. Stateful sessions remember user context across turns.
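
The delegation pattern itself is simple: a root agent holds session state and hands each message to one specialist. In a Google ADK setup, the root agent's model typically decides the hand-off. The stdlib sketch below substitutes keyword matching purely to show the shape of root-agent delegation plus session memory. Every name here is hypothetical and none of it is ADK's API; the stub sub-agents only echo context instead of calling Gemini or the FatSecret API.

```python
from dataclasses import dataclass, field
from typing import Callable

Handler = Callable[[str, dict], str]


def parenting_analyst(message: str, state: dict) -> str:
    return f"Parenting advice for a child aged {state.get('child_age', 'unknown')}."


def meal_planner(message: str, state: dict) -> str:
    # Stub: a real sub-agent would query a nutrition API here.
    return f"Meal plan for a child aged {state.get('child_age', 'unknown')}."


def medical_advisor(message: str, state: dict) -> str:
    return "General pediatric guidance; consult a doctor for anything urgent."


@dataclass
class RootAgent:
    routes: dict[str, Handler]  # keyword -> sub-agent (stand-in for LLM routing)
    fallback: Handler
    state: dict = field(default_factory=dict)  # session memory kept across turns

    def remember(self, **facts) -> None:
        self.state.update(facts)

    def handle(self, message: str) -> str:
        text = message.lower()
        for keyword, sub_agent in self.routes.items():
            if keyword in text:
                return sub_agent(message, self.state)
        return self.fallback(message, self.state)


root = RootAgent(
    routes={"meal": meal_planner, "fever": medical_advisor, "sick": medical_advisor},
    fallback=parenting_analyst,
)
root.remember(child_age=4)
print(root.handle("What should I cook for a meal tonight?"))  # Meal plan for a child aged 4.
print(root.handle("How do I handle tantrums?"))  # Parenting advice for a child aged 4.
```

Keeping state on the root agent and passing it down means every sub-agent sees the same remembered context without storing its own copy.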

Stack: Python, Google ADK, Gemini, LiteLLM, FatSecret Nutrition API

Repo


Stack

Core ML

Python, PyTorch, HuggingFace, PEFT, ONNX, scikit-learn

Distributed & Cloud

PySpark, AWS, Terraform

Agents & APIs

LangChain, FastAPI, Google ADK, Streamlit

Observability & Infra

Docker, Kubernetes, Prometheus, Grafana, OpenTelemetry


Open to senior ML engineering roles. Best contact: santoshbalu25@gmail.com

Pinned

  1. ColdEmailGenerator (Public)

    Jupyter Notebook

  2. Data-Center-Scale-Computing (Public)

    Python

  3. Netflix-Data-Analysis (Public)

    Jupyter Notebook

  4. SantoshAdabala (Public)

    Config files for my GitHub profile.

  5. SpaceAnalytics (Public)

    Space Analytics - PowerBI

  6. devyanisri/Semeval2024Task8a (Public)

    Jupyter Notebook