
Vismay Churiwala

MSE Scientific Computing @ UPenn. BS+MS Physics @ IIT Madras.

I work on GPU programming, physics-informed ML, and distributed systems. Most of what I build is about making something run faster or fit on smaller hardware.

Languages: C++, CUDA, Python, JavaScript, Golang, Java, GLSL

Tools: PyTorch, JAX, WebGPU, OpenGL, LangChain, FAISS, Spark


Projects

GPU & Graphics

  • OpenGJK-GPU — CUDA implementation of GJK/EPA collision detection. Half-warp per distance query, full warp per penetration computation, all data sharing via __shfl_sync(). 37x over CPU at 1000 vertices/polytope. Built with Mattia Montanari (PhysicsX). Used by Google DeepMind and Unity.
  • CUDA Path Tracer — GPU path tracer with SAH-built linear BVH for O(log n) intersection. Three BSDFs, stream compaction (99.96% ray termination by bounce 7), thin-lens DoF, stochastic AA.
  • WebGPU Gaussian Splat Viewer — 3DGS in the browser. Compute preprocessing, GPU radix sort per frame, instanced indirect draw. 153 FPS on 272K gaussians.
  • WebGPU Forward+ & Clustered Deferred — Three rendering techniques on Sponza at 5000 dynamic lights. G-buffer compressed to 64 bits/pixel.
  • Mini Minecraft — C++/OpenGL voxel engine. Infinite terrain, 5 biomes, 3D Perlin caves, multithreaded chunk gen, ray-marched physics. (Demo)
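
GJK and EPA spend most of their time evaluating a support function, which returns the polytope vertex farthest along a search direction. The minimal Python sketch below shows how that could be split across a 16-lane half-warp. It assumes each lane scans a strided slice of the vertices and the lanes then combine their partial results with an XOR butterfly, emulating CUDA's `__shfl_xor_sync`. The lane layout and the helper names (`shfl_xor`, `warp_support`, `minkowski_support`) are hypothetical and not taken from the OpenGJK-GPU repository.

```python
HALF_WARP = 16  # lanes assumed to serve one distance query (must be a power of two)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shfl_xor(values, lane_mask):
    """Emulate __shfl_xor_sync: lane i receives the value held by lane i ^ lane_mask."""
    return [values[i ^ lane_mask] for i in range(len(values))]


def warp_support(vertices, direction, lanes=HALF_WARP):
    """Return the vertex maximising dot(v, direction) via a simulated lane reduction."""
    # Phase 1: lane k scans vertices k, k + lanes, k + 2*lanes, ...
    best = []
    for lane in range(lanes):
        lane_best = (float("-inf"), -1)  # (projection, vertex index)
        for idx in range(lane, len(vertices), lanes):
            proj = dot(vertices[idx], direction)
            if proj > lane_best[0]:
                lane_best = (proj, idx)
        best.append(lane_best)
    # Phase 2: butterfly reduction; after log2(lanes) rounds every lane holds the maximum.
    offset = lanes // 2
    while offset > 0:
        partner = shfl_xor(best, offset)
        best = [max(mine, theirs) for mine, theirs in zip(best, partner)]
        offset //= 2
    return vertices[best[0][1]]


def minkowski_support(verts_a, verts_b, direction):
    """Support point of the Minkowski difference A - B, the quantity GJK iterates on."""
    opposite = [-c for c in direction]
    pa = warp_support(verts_a, direction)
    pb = warp_support(verts_b, opposite)
    return tuple(x - y for x, y in zip(pa, pb))
```

On real hardware, phase 2 would presumably be a few register-to-register shuffle instructions with no shared-memory round trip, which is the usual motivation for a shuffle-only design.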

ML & Research

  • PDE-aware Optimizer — Custom optimizer for PINNs. Scales updates by per-sample PDE gradient variance for second-order-like preconditioning at first-order cost. Tested on Burgers, Allen-Cahn, KdV. (Paper)
  • Diffusion Transformer for Flow Prediction — DDPM + Transformer for Navier-Stokes and LBM flow fields. Found and fixed a loss formulation bug in the original DiffFluid paper. Under 8% L2 error. (Paper)
  • Fast Image Editing — 100x faster than DDIM inversion. SSD-1B + 4-step LCM + ControlNet Canny, 6 seconds per edit on an RTX 3060. CPU offloading gives a 4.2x speedup by avoiding VRAM fragmentation.
  • KronAdaGrad + Polar Express — Replaced Newton-Schulz in KronAdaGrad with Polar Express pair iteration. Also implemented Muon and Polar Express optimizers in JAX/Optax for PINNs.
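
Here is a minimal plain-Python sketch that makes "scales updates by per-sample PDE gradient variance" concrete. It assumes an RMSprop-like rule where the denominator comes from the spread of per-sample PDE-residual gradients within one batch, not from a running average over steps. Adding the squared mean to the variance is a choice made here so the step stays bounded by `lr` when all samples agree. The function name and hyperparameters are hypothetical, and the actual PDE-aware optimizer may use a different rule.

```python
import math


def pde_aware_step(params, per_sample_grads, lr=1e-3, eps=1e-8):
    """One hypothetical update: mean gradient scaled by the per-sample spread.

    params: flattened parameters (list of floats)
    per_sample_grads: one gradient list per collocation point, same length as params
    """
    n = len(per_sample_grads)
    updated = []
    for j, p in enumerate(params):
        g = [sample[j] for sample in per_sample_grads]
        mean = sum(g) / n
        var = sum((x - mean) ** 2 for x in g) / n
        # var + mean**2 is the per-sample second moment E[g^2]; for a fixed mean,
        # parameters whose per-sample gradients disagree more take smaller steps.
        scale = math.sqrt(var + mean ** 2) + eps
        updated.append(p - lr * mean / scale)
    return updated
```

The per-sample statistics come from a single backward pass over the batch, which matches the "first-order cost" framing: no Hessian or Hessian-vector products are formed.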

Systems

  • PennCloud — Distributed KVS in C++ from raw sockets and pthreads. Synchronous replication, coordinator-based failover, 10MB memory ceiling with LRU eviction and tablet splitting. 5K+ req/s. (Demo)
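
A memory ceiling with LRU eviction can be illustrated with a byte-budgeted cache. This is a minimal single-threaded Python sketch built on `collections.OrderedDict`. It assumes an entry's cost is the length of its key plus its value. The class name is hypothetical. PennCloud itself is written in C++ and also handles replication and tablet splitting, which this sketch does not model.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """Key-value cache that evicts least-recently-used entries once a byte budget is exceeded."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # least recently used first, most recently used last

    @staticmethod
    def _cost(key, value):
        return len(key) + len(value)

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        """Insert or overwrite; return the keys evicted to get back under budget."""
        cost = self._cost(key, value)
        if cost > self.capacity:
            raise ValueError("entry larger than the whole budget")
        if key in self.entries:
            self.used -= self._cost(key, self.entries.pop(key))
        self.entries[key] = value
        self.used += cost
        evicted = []
        while self.used > self.capacity:
            old_key, old_value = self.entries.popitem(last=False)
            self.used -= self._cost(old_key, old_value)
            evicted.append(old_key)
        return evicted
```

For the 10MB ceiling in the description, one would construct `ByteBudgetLRU(10 * 1024 * 1024)`. A store serving requests from multiple pthreads would also need a lock around `get` and `put`, because even `get` reorders the entries.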

ThinkCAD (Founder)

LLM agent that turns text prompts into parametric CadQuery models, with constraint validation and STEP/STL export. Uses FAISS-indexed RAG over working examples. Selected as 1 of 8 for Wharton GenAI Labs, with $4K in seed funding.
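
Retrieval over working examples comes down to nearest-neighbour search on embedding vectors. The sketch below does in plain Python what an exact inner-product index such as FAISS's `IndexFlatIP` does on L2-normalised vectors, which is cosine-similarity search. The bag-of-words `embed` function and the example descriptions are toy stand-ins. They are not ThinkCAD's actual embedding model or corpus, which the description does not specify.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: L2-normalised bag-of-words counts (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def inner(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def top_k(query, corpus, k=2):
    """Exact search: score every example by cosine similarity and keep the best k."""
    q = embed(query)
    scored = [(inner(q, embed(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


EXAMPLES = [  # hypothetical descriptions of working CadQuery examples
    "box with a through hole",
    "cylinder with fillet on top edge",
    "plate with four mounting holes",
]
```

In a RAG pipeline, the code for the retrieved examples would then be placed in the LLM prompt as few-shot context before generation.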


vismay@seas.upenn.edu · LinkedIn · Website

Pinned

  1. OpenGJK-GPU

    A CUDA implementation of openGJK, with GJK and EPA algorithms for massively parallel collision detection between convex polytopes. Up to 37x speedup over CPU.

    C++ 5

  2. Diffusion-Transformer-for-Flow-Prediction

    Forked from krispy-kenay/DiffFluid

    DDPM-based U-Net transformer for fluid dynamics prediction, reproducing and extending DiffFluid. Validated on Navier-Stokes vorticity and Lattice Boltzmann (D2Q9) with corrected noise-prediction lo…

    Jupyter Notebook 4

  3. PennCloud

    Distributed cloud platform in C++17 — webmail, file storage, and user accounts backed by a fault-tolerant key-value store with synchronous replication, LRU eviction, tablet splitting, and heartbeat…

    1

  4. CUDA-Path-Tracer

    Forked from CIS5650-Fall-2025/Project3-CUDA-Path-Tracer

    A GPU path tracer in CUDA with physically-based light transport: Lambertian, mirror, and refractive BSDFs. Features stream compaction via Thrust, stochastic anti-aliasing, depth of field, OBJ mesh …

    C++ 1

  5. Fast-Image-Editing-with-Generative-Models

    100x faster text-guided image editing on consumer GPUs (~6s vs ~10min). Combines SSD-1B, LCM-LoRA 4-step inference, and ControlNet-Canny. Benchmarked on PIE-Bench with +23.5% CLIP score over DDIM b…

    Python 2

  6. PDE-aware-optimizer-jax

    This project introduces a PDE-Aware Optimizer that aligns gradients from PDE-residual, boundary, and initial losses, scales steps by the variance of per-sample PDE gradients, and avoids expensive H…

    Python 1