make inference endpoint sync#324
Merged
Merged
Conversation
Collaborator
|
should this be applied to all the modalities or it's only needed for sentinel-2? |
favyen2
approved these changes
Jun 24, 2026
joshhvulcan
approved these changes
Jun 24, 2026
Contributor
Author
My plan is to get this fully rolled out and get us through the S2 backlog; and then apply this to the other modalities. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
the main change here is making the
get_detectionshttp endpoint be a synchronous endpoint rather thanasync.Because the
get_detectionsmethod was marked asasync, but contained no actual async code, when it ran it blocked the entireasyncioevent loop. Meaning that the server could only handle a single request at a time effectively.In production we run a healthcheck that ensures the api is up and responding to requests, in order to have a worker respond to those healthchecks while we also ran inference, we had to run fastapi in multi-process mode with
WEB_CONCURRENCY=2However, that change caused uvicorn to become a multi-worker supervisor: a parent process that spawns N worker subprocesses and manages their lifecycle. That supervisor path was crash-looping in production:
The logs showed:
I have built this image and am running it in production ;we're seeing about 2x the throughput already with just this change.