amacharla15/README.md

M.S. CS student at CSU Chico and ex-Cognizant software engineer. I build backend systems, LLM inference tooling, and performance-focused C++/Python projects, with recent work in tokenization, observability, benchmarking, and hardware-aware ML performance.

Achievements

  • Built Chaos Reviewer — AI agent (Fetch.ai track @ Cal Hacks) — ~7.2K interactions
  • Certifications:
      • AWS Certified Developer – Associate
      • AWS Machine Learning Foundations
      • Meta Backend Developer (Coursera)

Pinned

  1. Parallel-BPE-Tokenizer

    High-performance GPT-2-style BPE tokenizer in C++ with parallel batch encoding, thread-pool execution, thread-local caching, and benchmark-driven comparison against tiktoken and GPT2TokenizerFast

    C++ 2

  2. CPUinference

    End-to-end LLM inference on CPU: API serving, streaming, benchmarking, memory analysis, and model compression (quantization).

    Python

  3. gpu-profiling-cuda-kernels

    GPU profiling suite & CUDA kernels on A100 80GB — ResNet-50 benchmarks, Nsight Systems profiling, tiled matrix multiplication with shared memory

    Python

  4. Hardware-Aware-Training-Time-Throughput-Prediction

    Hardware-aware CNN training performance predictor for CIFAR-10 on NVIDIA A100—learns sec/epoch from config features and derives throughput (images/sec) from predicted time.

    Python

  5. miniops

    Mixed-language C++/Python tensor ops package with a native CPU backend, pybind11 bindings, modern packaging, CI, and benchmarked speedups over a Python reference implementation.

    Python

  6. Doc-Rag-Agent

    Doc RAG Agent is a custom lightweight document Q&A system built from scratch that retrieves relevant chunks from PDFs, answers using only that evidence, and enforces verified citations (or abstains…

    Python