⚡ Bolt: Prevent blocking FastAPI event loop with AsyncGroq #91

Adityasingh-8858 wants to merge 1 commit into
Conversation
- Replaced synchronous `Groq` client with `AsyncGroq` in `backend/main.py`.
- Updated `/ai-voice` and `/initiate-transfer` endpoints to `await` the API calls.
- Documented this critical learning in `.jules/bolt.md`.

Co-authored-by: Deepaksingh7238 <110552872+Deepaksingh7238@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode; when this mode is on, I will only act on comments where you specifically mention me.

New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Pull request overview
This PR updates the FastAPI backend to use Groq’s async client so LLM network calls no longer block the asyncio event loop, improving concurrency for endpoints that call the LLM.
Changes:
- Switched the Groq client import/initialization from `Groq` to `AsyncGroq`.
- Updated the `/ai-voice` and `/initiate-transfer` Groq completion calls to be awaited.
- Added a short internal Bolt note documenting the "don't block the event loop" guideline.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| backend/main.py | Replaces synchronous Groq client usage with AsyncGroq and awaits LLM calls in async endpoints. |
| .jules/bolt.md | Documents the async-client guideline to avoid blocking FastAPI’s event loop. |
```diff
 global groq_client
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
-chat_completion = groq_client.chat.completions.create(
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)
+
+# ⚡ OPTIMIZATION: Use await with AsyncGroq to prevent blocking the FastAPI event loop during network I/O
+chat_completion = await groq_client.chat.completions.create(
```
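For context, a minimal sketch of the resulting endpoint pattern (the route path, request shape, model name, and env-based key loading are assumptions; only the `AsyncGroq` switch and the awaited call come from the diff):

```python
import os

from fastapi import FastAPI
from groq import AsyncGroq

app = FastAPI()
GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "")  # assumed env-based config
groq_client: AsyncGroq | None = None

@app.post("/ai-voice")
async def ai_voice(prompt: str):
    global groq_client
    if groq_client is None:
        # Lazily create the async client on first use, mirroring the diff.
        groq_client = AsyncGroq(api_key=GROQ_API_KEY)
    # Awaiting yields control to the event loop during network I/O,
    # so other requests can be served while the LLM responds.
    chat_completion = await groq_client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama-3.3-70b-versatile",  # placeholder model name
    )
    return {"reply": chat_completion.choices[0].message.content}
```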
```diff
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)

 try:
-    chat_completion = groq_client.chat.completions.create(
+    # ⚡ OPTIMIZATION: Use await with AsyncGroq to prevent blocking the FastAPI event loop during network I/O
+    chat_completion = await groq_client.chat.completions.create(
         messages=[
```
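The hunk above wraps the call in `try:`, but the handler body is not visible here. A hedged sketch of how failures from the awaited call could be surfaced (the helper name, exception handling, and status code are assumptions, not the PR's actual code):

```python
from fastapi import HTTPException
from groq import AsyncGroq

async def create_completion(client: AsyncGroq, messages: list[dict]) -> str:
    # Hypothetical helper; the PR's actual except clause is not shown.
    try:
        chat_completion = await client.chat.completions.create(
            messages=messages,
            model="llama-3.3-70b-versatile",  # placeholder model name
        )
    except Exception as exc:
        # Surface LLM failures as an HTTP error instead of a raw 500.
        raise HTTPException(status_code=502, detail=f"LLM call failed: {exc}")
    return chat_completion.choices[0].message.content
```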
```diff
 global groq_client
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)
```
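All three hunks repeat the same lazy-initialization block. A hypothetical consolidation (the helper name and env-based key loading are assumptions; the PR itself keeps the inline `global` pattern):

```python
import os

from groq import AsyncGroq

GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "")  # assumed env-based config
_groq_client: AsyncGroq | None = None

def get_groq_client() -> AsyncGroq:
    # Hypothetical helper; creates the shared client once on first use.
    global _groq_client
    if _groq_client is None:
        _groq_client = AsyncGroq(api_key=GROQ_API_KEY)
    return _groq_client
```

A single shared client also lets the underlying HTTP connection pool be reused across requests.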
💡 What:

Replaced the synchronous `Groq` network client with `AsyncGroq` in the FastAPI backend (`backend/main.py`), and updated the route endpoints (`/ai-voice` and `/initiate-transfer`) to `await` the completion generation calls.

🎯 Why:
FastAPI executes `async def` endpoints directly on the main asyncio event loop. Previously, the application used a standard, synchronous `Groq` HTTP client to communicate with the LLM API. Whenever this client was invoked, it blocked the entire event loop until the network response returned, so the server could not process any other incoming HTTP requests during that time. This was a severe performance bottleneck.
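To see the difference in isolation, here is a self-contained sketch where `time.sleep` stands in for the synchronous Groq call and `asyncio.sleep` for the awaited one (the 1-second delay is an illustrative assumption):

```python
import asyncio
import time

async def blocking_handler() -> None:
    # Old code path: a synchronous call holds the event loop for its
    # full duration; no other coroutine can run in the meantime.
    time.sleep(1)

async def async_handler() -> None:
    # New code path: awaiting yields to the event loop, so other
    # requests proceed during the network wait.
    await asyncio.sleep(1)

async def main() -> None:
    start = time.perf_counter()
    await asyncio.gather(*(blocking_handler() for _ in range(3)))
    print(f"blocking: {time.perf_counter() - start:.1f}s")  # ~3.0s, serialized

    start = time.perf_counter()
    await asyncio.gather(*(async_handler() for _ in range(3)))
    print(f"awaited:  {time.perf_counter() - start:.1f}s")  # ~1.0s, concurrent

asyncio.run(main())
```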
📊 Impact:

Massively improves concurrent request throughput. By switching to `AsyncGroq` and `await`, the event loop is now freed up during the network I/O wait, allowing the application to handle multiple requests simultaneously without bottlenecking on LLM response times.
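A rough sketch of a concurrency check along the lines of the measurement notes below, assuming the server runs locally on port 8000 and `/ai-voice` accepts a JSON body (the URL, port, and payload shape are assumptions, not from the PR):

```python
import asyncio
import time

import httpx

async def fire(client: httpx.AsyncClient, i: int) -> float:
    # Time a single request; the payload shape is a placeholder assumption.
    start = time.perf_counter()
    await client.post("http://localhost:8000/ai-voice",
                      json={"prompt": f"request {i}"})
    return time.perf_counter() - start

async def main() -> None:
    async with httpx.AsyncClient(timeout=60) as client:
        wall = time.perf_counter()
        latencies = await asyncio.gather(*(fire(client, i) for i in range(10)))
        # With the old sync client, wall time ~ sum(latencies) (serialized);
        # with AsyncGroq, wall time ~ max(latencies) (concurrent).
        print(f"wall: {time.perf_counter() - wall:.1f}s  "
              f"sum: {sum(latencies):.1f}s  max: {max(latencies):.1f}s")

asyncio.run(main())
```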
🔬 Measurement:

Load test the `/ai-voice` or `/initiate-transfer` endpoints with multiple concurrent clients, as sketched above. Previously, the server processed such requests strictly sequentially (each one blocking the loop); now they are handled concurrently, significantly reducing P99 latency under load. Verified backend tests and frontend lint/build to ensure no regressions.

PR created automatically by Jules for task 8230997966922699542 started by @Deepaksingh7238