Speed up json serialization with orjson and custom FastAPI responses #2880

Merged: r4victor merged 13 commits into master from pr_orjson on Jul 10, 2025
r4victor (Collaborator) commented Jul 7, 2025

This PR improves dstack server response times by reducing JSON serialization time. There are two main changes:

  1. Make FastAPI endpoints return Response directly instead of returning pydantic models. This skips FastAPI's jsonable_encoder(), which was a bottleneck previously – it's much slower than the serialization itself.
  2. Use orjson everywhere for serializing pydantic models and API responses.

Notes:

  • When returning responses directly, validators are not run on pydantic models. Previously some root validators were used to change the response structure and add computed fields. These are now replaced with an overridden dict() method. This is a hack for pydantic v1 and won't be necessary with pydantic v2, which offers computed fields and flexible serialization hooks.
  • orjson does not handle all the types that the built-in json module can handle – they need to be handled manually in the default function.
  • Not sure if the orjson optimization will stay relevant for pydantic v2 – we might use its built-in serialization. I'm open to dropping orjson if it proves problematic or unnecessary.
  • Returning a FastAPI Response directly is independent of the pydantic version and is a must if we care about performance.

Benchmarks

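Response-heavy endpoints are up to about 2x faster: runs/list goes from 14.77 to 27.75 requests/sec in the runs below. Other endpoints may also get a small speedup: runs/get goes from 90.02 to 104.28 requests/sec.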

Before:

(venvt) ➜  stuff wrk http://localhost:8000/api/runs/list -H 'Authorization: Bearer -' -s bench_list.lua
Running 10s test @ http://localhost:8000/api/runs/list
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   659.99ms  182.94ms   1.31s    74.50%
    Req/Sec     8.35      5.29    29.00     77.78%
  149 requests in 10.09s, 19.43MB read
Requests/sec:     14.77
Transfer/sec:      1.93MB

(dstack) ➜  stuff wrk http://localhost:8000/api/project/main/runs/get -H 'Authorization: Bearer -' -s bench_get.lua
Running 10s test @ http://localhost:8000/api/project/main/runs/get
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   112.94ms   43.18ms 377.68ms   91.02%
    Req/Sec    46.70     12.59    70.00     75.94%
  901 requests in 10.01s, 5.98MB read
Requests/sec:     90.02
Transfer/sec:    611.31KB

After:

(venvt) ➜  stuff wrk http://localhost:8000/api/runs/list -H 'Authorization: Bearer -' -s bench_list.lua
Running 10s test @ http://localhost:8000/api/runs/list
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   353.03ms   69.35ms 589.02ms   73.93%
    Req/Sec    14.51      7.30    40.00     84.57%
  280 requests in 10.09s, 36.51MB read
Requests/sec:     27.75
Transfer/sec:      3.62MB

(dstack) ➜  stuff wrk http://localhost:8000/api/project/main/runs/get -H 'Authorization: Bearer -' -s bench_get.lua
Running 10s test @ http://localhost:8000/api/project/main/runs/get
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    96.25ms   24.68ms 260.78ms   89.44%
    Req/Sec    52.84     14.97    90.00     75.00%
  1050 requests in 10.07s, 6.96MB read
Requests/sec:    104.28
Transfer/sec:    708.19KB

With #2883 merged:

(venvt) ➜  stuff wrk http://localhost:8000/api/runs/list -H 'Authorization: Bearer -' -s bench_list.lua
Running 10s test @ http://localhost:8000/api/runs/list
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   293.78ms   91.36ms 727.30ms   74.63%
    Req/Sec    17.75      8.87    40.00     76.10%
  337 requests in 10.01s, 17.92MB read
Requests/sec:     33.65
Transfer/sec:      1.79MB
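
For reference, the speedups implied by the Requests/sec figures above can be computed as follows. The numbers are copied from the wrk output; the variable and function names are only illustrative.

```python
# Requests/sec figures copied from the wrk runs above.
runs_list = {"before": 14.77, "after": 27.75, "with_2883": 33.65}
runs_get = {"before": 90.02, "after": 104.28}


def speedup(before: float, after: float) -> float:
    """Throughput ratio, rounded to two decimal places."""
    return round(after / before, 2)


list_speedup = speedup(runs_list["before"], runs_list["after"])
list_speedup_with_2883 = speedup(runs_list["before"], runs_list["with_2883"])
get_speedup = speedup(runs_get["before"], runs_get["after"])

print(f"runs/list: {list_speedup}x, with #2883: {list_speedup_with_2883}x")
print(f"runs/get: {get_speedup}x")
```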

Returning large responses is still quite slow but serialization is no longer the bottleneck. TODO:

  • Migrate to pydantic v2, which will improve parsing models from the DB (the main bottleneck now).
  • Optimize response structure (e.g. do not return all the job submissions for runs/list).

r4victor marked this pull request as ready for review July 8, 2025 04:32
r4victor requested review from jvstme and un-def July 8, 2025 04:32
r4victor merged commit 3d3dfd0 into master Jul 10, 2025
26 checks passed
r4victor deleted the pr_orjson branch July 10, 2025 07:27