⚡ Bolt: Use AsyncGroq to prevent event loop blocking #75
Adityasingh-8858 wants to merge 1 commit into
Conversation
Switches to using `AsyncGroq` instead of the synchronous `Groq` client within the FastAPI `async def` endpoints (`/initiate-transfer` and `/ai-voice`). The previous implementation blocked the event loop during HTTP requests to the Groq API. Co-authored-by: Deepaksingh7238 <110552872+Deepaksingh7238@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. There might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode: when this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Pull request overview
This PR updates the FastAPI backend to use Groq’s asynchronous Python client so LLM network calls don’t block the asyncio event loop, improving concurrency for endpoints like /ai-voice and /initiate-transfer.
Changes:
- Replace `Groq` with `AsyncGroq` in `backend/main.py`.
- `await` Groq chat completion calls in async route handlers.
- Add an internal note in `.jules/bolt.md` documenting the async-client guideline.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `backend/main.py` | Switch Groq usage to `AsyncGroq` and `await` LLM requests in async endpoints. |
| `.jules/bolt.md` | Document rationale/guideline for using async API clients in FastAPI. |
```diff
 import time
 import asyncio
-from groq import Groq
+from groq import AsyncGroq
```
Since this PR switches to `AsyncGroq`, the client likely holds an underlying async HTTP session; it should be explicitly closed during app shutdown. The lifespan cleanup currently closes `livekit_api` but not `groq_client`; consider adding shutdown logic to `await` the Groq client close method (e.g., `aclose()`/`close()`) and clear the global to avoid connection/resource leaks across reloads/tests.
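A minimal sketch of the suggested shutdown cleanup, using a dummy stand-in for `AsyncGroq` (the real close method may be `close()` or `aclose()` depending on SDK version) and assuming the app uses FastAPI's lifespan pattern:

```python
import asyncio
from contextlib import asynccontextmanager

class DummyAsyncClient:
    """Stand-in for AsyncGroq; assumes an awaitable close()."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

groq_client = None  # lazily initialized elsewhere in the app

@asynccontextmanager
async def lifespan(app):
    global groq_client
    yield  # application serves requests here
    # Shutdown: release the async HTTP session and clear the global
    # so reloads/tests don't leak connections
    if groq_client is not None:
        await groq_client.close()
        groq_client = None

async def main():
    global groq_client
    groq_client = DummyAsyncClient()
    client = groq_client
    async with lifespan(None):
        pass  # requests would be handled here
    assert client.closed and groq_client is None

asyncio.run(main())
```

In a real FastAPI app the `lifespan` function would be passed as `FastAPI(lifespan=lifespan)`, alongside the existing `livekit_api` cleanup.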
```diff
 global groq_client
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
-chat_completion = groq_client.chat.completions.create(
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)
+chat_completion = await groq_client.chat.completions.create(
```
`groq_client` is lazily initialized without any synchronization. Under concurrent requests, multiple coroutines can observe `groq_client is None` and create multiple `AsyncGroq` instances, leaking resources and wasting connections. Consider guarding initialization with an `asyncio.Lock` or moving initialization into startup/lifespan so exactly one client is created per process.
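The lock-guarded initialization could look like this sketch; `get_groq_client` and `_groq_lock` are hypothetical names, and a dummy class stands in for `AsyncGroq(api_key=...)`:

```python
import asyncio

class DummyAsyncClient:
    """Stand-in for AsyncGroq(api_key=...); counts constructions."""
    instances = 0
    def __init__(self):
        DummyAsyncClient.instances += 1

groq_client = None
_groq_lock = asyncio.Lock()

async def get_groq_client():
    """Return the shared client, creating it exactly once per process."""
    global groq_client
    if groq_client is None:          # fast path: no lock once initialized
        async with _groq_lock:
            if groq_client is None:  # re-check after acquiring the lock
                groq_client = DummyAsyncClient()
    return groq_client

async def main():
    # Many concurrent "requests" race to initialize the client
    clients = await asyncio.gather(*(get_groq_client() for _ in range(50)))
    assert all(c is clients[0] for c in clients)
    assert DummyAsyncClient.instances == 1

asyncio.run(main())
```

Both endpoints could then call `await get_groq_client()` instead of repeating the lazy-init block.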
```diff
 global groq_client
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)
```
This endpoint repeats the same unsynchronized global `groq_client` lazy-init pattern as `/ai-voice`, which can race under concurrency and create multiple `AsyncGroq` instances. Consider extracting a single helper (e.g., `get_groq_client()` that uses an `asyncio.Lock`) and reusing it here to ensure one shared client instance per process.
💡 What: Replaced the synchronous `Groq` client with `AsyncGroq` in `backend/main.py`. The `.chat.completions.create` calls are now `await`ed.

🎯 Why: Calling synchronous network operations inside FastAPI `async def` route handlers blocks the single-threaded asyncio event loop. This starves the server, causing severe latency degradation under concurrency, as no other requests can be processed while waiting for Groq's response.

📊 Impact: Expected to vastly improve concurrency and throughput under load. Time spent waiting for LLM responses will no longer pause the rest of the application.

🔬 Measurement: Verify by generating concurrent AI voice and transfer summaries and observing that the backend can simultaneously serve endpoints like `/rooms`.

PR created automatically by Jules for task 12599511285956236236 started by @Deepaksingh7238