cboiteux2765/GPUTileMathService

GPU Tile Math Service (incremental build)

This repo is structured so you can test each feature individually before wiring components together.

What’s implemented right now (Feature 1)

API-only FastAPI service with an in-memory job store and a simple “executor” that can:

  • compute a small CPU GEMM result summary (for tiny shapes), or
  • simulate a result for larger shapes (deterministic checksum)
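The two executor modes above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repo's actual code: the function names, the `MAX_CPU_ELEMENTS` cutoff, the shape of the returned summary, and the use of SHA-256 for the checksum are all assumptions.

```python
import hashlib
import random

# Hypothetical cutoff: requests with m*n*k above this are simulated, not computed.
MAX_CPU_ELEMENTS = 64 * 64 * 64


def cpu_gemm_summary(m, n, k, seed=0):
    """Multiply random m-by-k and k-by-n matrices in pure Python and summarize C."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(k)] for _ in range(m)]
    b = [[rng.random() for _ in range(n)] for _ in range(k)]
    total = 0.0
    for i in range(m):
        for j in range(n):
            # C[i][j] is the dot product of row i of A and column j of B.
            total += sum(a[i][p] * b[p][j] for p in range(k))
    return {"mode": "cpu", "shape": [m, n], "sum": total}


def simulated_summary(m, n, k, dtype):
    """Derive a checksum from the request parameters only, so it is repeatable."""
    key = f"{m}x{n}x{k}:{dtype}".encode()
    checksum = hashlib.sha256(key).hexdigest()[:16]
    return {"mode": "simulated", "shape": [m, n], "checksum": checksum}


def execute(m, n, k, dtype="fp32", simulate=False):
    """Pick the simulated path when asked to, or when the shape is too large."""
    if simulate or m * n * k > MAX_CPU_ELEMENTS:
        return simulated_summary(m, n, k, dtype)
    return cpu_gemm_summary(m, n, k)
```

Because the simulated checksum depends only on the request parameters, submitting the same large job twice yields the same result, which makes the lifecycle easy to test without a GPU.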

This lets you test:

  • request validation
  • job lifecycle (QUEUED/RUNNING/DONE/FAILED)
  • metrics endpoint
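For orientation, the job lifecycle could be modeled with a small in-memory store like the one below. This is a sketch under stated assumptions: the class name `InMemoryJobStore`, the record fields, and the allowed-transition table are hypothetical and may differ from what `api/` actually does; only the four status names come from this README.

```python
import threading
import uuid
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"


# Assumed transition rules: DONE and FAILED are terminal states.
ALLOWED = {
    JobStatus.QUEUED: {JobStatus.RUNNING, JobStatus.FAILED},
    JobStatus.RUNNING: {JobStatus.DONE, JobStatus.FAILED},
    JobStatus.DONE: set(),
    JobStatus.FAILED: set(),
}


class InMemoryJobStore:
    """Thread-safe dict of job records, keyed by a generated job id."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def create(self, params):
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {
                "id": job_id,
                "status": JobStatus.QUEUED,
                "params": params,
                "result": None,
            }
        return job_id

    def transition(self, job_id, new_status, result=None):
        with self._lock:
            job = self._jobs[job_id]
            if new_status not in ALLOWED[job["status"]]:
                raise ValueError(
                    f"illegal transition {job['status'].value} -> {new_status.value}"
                )
            job["status"] = new_status
            if result is not None:
                job["result"] = result

    def get(self, job_id):
        with self._lock:
            return dict(self._jobs[job_id])  # shallow copy so callers cannot mutate the store
```

Rejecting illegal transitions in the store keeps a job from moving out of a terminal state, which is one way to make the lifecycle checks above deterministic.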

Repo layout

  • api/ FastAPI service (Feature 1)
  • client/ simple CLI client to submit jobs to the API
  • worker_cuda/ placeholder for the standalone CUDA kernel benchmark (Feature 4 later)
  • docs/ notes / architecture

Quickstart (Feature 1: API only)

1) Create a venv and install deps

cd api
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate

pip install -r requirements.txt

2) Run the API

uvicorn app.main:app --reload --port 8000

3) Submit a job

python ../client/submit_job.py --m 64 --n 64 --k 64 --dtype fp32 --repeats 5
python ../client/submit_job.py --m 4096 --n 4096 --k 4096 --dtype fp16 --simulate

4) Check status/results

# replace JOB_ID
curl http://127.0.0.1:8000/v1/jobs/JOB_ID
curl http://127.0.0.1:8000/v1/jobs/JOB_ID/result
curl http://127.0.0.1:8000/metrics
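Instead of re-running curl by hand, the status endpoint can be polled from a short script. The sketch below assumes the status response is JSON with a top-level `status` field holding one of the lifecycle names; that field name and the helper names `fetch_json` and `wait_for_job` are illustrative and not taken from the repo.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_json, interval=0.5, max_polls=60):
    """Poll /v1/jobs/<job_id> until the job reaches a terminal state."""
    for _ in range(max_polls):
        job = fetch(f"{BASE_URL}/v1/jobs/{job_id}")
        # Assumption: the payload exposes the lifecycle state under "status".
        if job.get("status") in ("DONE", "FAILED"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Once `wait_for_job` returns with a `DONE` status, the result can be fetched from `/v1/jobs/JOB_ID/result` as in the curl commands above. The `fetch` parameter exists so the loop can be exercised without a running server.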

Next features we’ll implement (one-by-one)

  1. Feature 2: Redis queue + metadata store (API still runnable alone)
  2. Feature 3: Worker stub (no CUDA yet): pulls from Redis, produces results
  3. Feature 4: Standalone CUDA tiled GEMM binary (benchmark CLI)
  4. Feature 5: Integrate worker + CUDA + batching/streams
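To give a feel for the planned worker stub, here is a rough sketch of a pull-and-execute loop. It uses the standard library's `queue.Queue` as a stand-in for Redis so it runs without extra dependencies; the job record fields, the `None` shutdown sentinel, and the `run_worker` name are assumptions, not the planned design.

```python
import queue


def run_worker(job_queue, results, execute):
    """Pull jobs until a None sentinel arrives, recording DONE or FAILED for each."""
    while True:
        job = job_queue.get()
        if job is None:  # hypothetical shutdown signal
            break
        try:
            output = execute(job["params"])
            results[job["id"]] = {"status": "DONE", "result": output}
        except Exception as exc:
            # A failing job is recorded and the loop moves on to the next one.
            results[job["id"]] = {"status": "FAILED", "error": str(exc)}


if __name__ == "__main__":
    q = queue.Queue()
    q.put({"id": "job-1", "params": {"m": 2, "n": 2, "k": 2}})
    q.put(None)
    out = {}
    run_worker(q, out, lambda p: p["m"] * p["n"] * p["k"])
    print(out)
```

Passing `execute` in as a parameter keeps the loop independent of how results are produced, so the same loop could later call a CUDA binary instead of a Python function.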

About

A CUDA Tile accelerated math kernel job scheduling cloud service
