⚡ Bolt: Use AsyncGroq to prevent blocking the FastAPI event loop #77
Adityasingh-8858 wants to merge 1 commit into
Conversation
💡 What: Replaced the synchronous `Groq` client with `AsyncGroq` in `backend/main.py`'s `/ai-voice` and `/initiate-transfer` endpoints.

🎯 Why: Making synchronous network calls inside an `async def` FastAPI endpoint blocks the entire asyncio event loop. By using an asynchronous client and `await`ing the network calls, the event loop can pause and resume, allowing other concurrent requests to be handled efficiently rather than stalling the server.

📊 Impact: Significantly improves backend concurrency and responsiveness under load. Other API requests (e.g., token generation, room listing) will no longer be delayed by the 500ms+ latency of the AI completion calls.

🔬 Measurement: Start the backend server and simulate multiple concurrent requests to the API alongside a request to `/ai-voice`. Before the optimization, other requests would queue and stall until the AI generation completed. With this optimization, concurrent requests are processed immediately.

Co-authored-by: Deepaksingh7238 <110552872+Deepaksingh7238@users.noreply.github.com>
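The effect described above can be demonstrated without FastAPI or Groq at all. In this minimal asyncio sketch, `time.sleep` stands in for the synchronous network call and `asyncio.sleep` for the awaited one; the blocking version serializes five concurrent "requests" while the awaited version lets them overlap:

```python
import asyncio
import time

async def blocking_endpoint():
    time.sleep(0.2)  # synchronous call: freezes the whole event loop

async def async_endpoint():
    await asyncio.sleep(0.2)  # awaited call: the loop serves other tasks meanwhile

async def timed(endpoint, n=5):
    # Fire n concurrent "requests" and measure total wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(endpoint() for _ in range(n)))
    return time.perf_counter() - start

async def main():
    seq = await timed(blocking_endpoint)  # ~1.0s: the five calls run back to back
    par = await timed(async_endpoint)     # ~0.2s: the five calls overlap
    return seq, par

seq, par = asyncio.run(main())
print(f"blocking: {seq:.2f}s, async: {par:.2f}s")
```

The same arithmetic applies to the 500ms+ Groq completion latency: with the synchronous client, every other request waits behind it; with `AsyncGroq` plus `await`, they proceed concurrently.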
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode; when this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Pull request overview
Switches Groq API usage in FastAPI async endpoints from the synchronous client to the asynchronous AsyncGroq client to avoid blocking the event loop during network calls.
Changes:
- Replaced `Groq` with `AsyncGroq` and added `await` for chat completion calls in `/ai-voice` and `/initiate-transfer`.
- Added a short "Bolt" note documenting the anti-pattern and preferred approach for async endpoints.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| backend/main.py | Uses AsyncGroq + await for Groq network calls inside async endpoints. |
| .jules/bolt.md | Adds a learning note documenting why async clients should be used in FastAPI async handlers. |
Comments suppressed due to low confidence (1)
backend/main.py:329
- Current tests exercise the mock Groq path (`FORCE_MOCK_GROQ=1`), so the updated async Groq call path isn't covered. Add a unit/integration test that disables the mock flag and monkeypatches `AsyncGroq` (and `.chat.completions.create`) with an async stub to assert the endpoint awaits the call and returns the expected content.
```python
chat_completion = await groq_client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful realtime voice assistant."},
        {"role": "user", "content": req.prompt},
    ],
    model="llama3-8b-8192",
    temperature=0.6,
    max_tokens=200,
)
```
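The suggested stub can be sketched with only the standard library: `unittest.mock.AsyncMock` plays the role of `AsyncGroq`'s `chat.completions.create`, so the test can assert both that the endpoint awaits the call and that the content comes through. The `handler` below is a hypothetical stand-in for the `/ai-voice` endpoint body, not the project's actual code:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

# Stub client mirroring the AsyncGroq surface the endpoint uses.
stub_client = MagicMock()
stub_client.chat.completions.create = AsyncMock(
    return_value=MagicMock(
        choices=[MagicMock(message=MagicMock(content="stubbed reply"))]
    )
)

async def handler(prompt: str) -> str:
    # Hypothetical stand-in for the /ai-voice body after the optimization.
    completion = await stub_client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama3-8b-8192",
    )
    return completion.choices[0].message.content

result = asyncio.run(handler("hi"))
stub_client.chat.completions.create.assert_awaited_once()  # proves the call was awaited
print(result)  # prints: stubbed reply
```

In the real test, `monkeypatch.setattr` would swap the stub in for the module-level client (with `FORCE_MOCK_GROQ` unset) before calling the endpoint through the test client.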
```diff
         global groq_client
         if groq_client is None:
-            groq_client = Groq(api_key=GROQ_API_KEY)
-        chat_completion = groq_client.chat.completions.create(
+            # ⚡ Bolt: Using AsyncGroq prevents blocking the FastAPI event loop during network requests
+            groq_client = AsyncGroq(api_key=GROQ_API_KEY)
+        chat_completion = await groq_client.chat.completions.create(
```
The lazy global initialization of groq_client is a non-atomic check-then-set inside an async endpoint. Under concurrent load, multiple requests can observe groq_client is None and create multiple AsyncGroq instances (leaking extra connection pools and doing redundant work). Consider guarding initialization with an asyncio.Lock, or initializing a single client during app startup (lifespan) and reusing it.
```diff
         # Attempt real Groq call
         global groq_client
         if groq_client is None:
-            groq_client = Groq(api_key=GROQ_API_KEY)
+            # ⚡ Bolt: Using AsyncGroq prevents blocking the FastAPI event loop during network requests
+            groq_client = AsyncGroq(api_key=GROQ_API_KEY)
```
Same race as in /ai-voice: concurrent requests can initialize groq_client more than once due to the non-atomic if groq_client is None check in an async context. Please protect initialization with a lock or move client creation to startup (lifespan).
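The lifespan alternative can be sketched with `contextlib.asynccontextmanager`, which is the same mechanism FastAPI's `lifespan=` parameter accepts; `FakeAsyncClient` again stands in for `AsyncGroq`:

```python
import asyncio
from contextlib import asynccontextmanager

class FakeAsyncClient:
    """Hypothetical stand-in for AsyncGroq; a real client holds a connection pool."""
    def __init__(self):
        self.closed = False
    async def close(self):
        self.closed = True

state = {}

@asynccontextmanager
async def lifespan(app):
    # Startup: build exactly one shared client for the whole application.
    state["groq_client"] = FakeAsyncClient()
    yield
    # Shutdown: release the client's resources.
    await state["groq_client"].close()

# With FastAPI this would be wired up as: app = FastAPI(lifespan=lifespan),
# and handlers would reuse state["groq_client"] instead of lazily creating one.
async def main():
    async with lifespan(app=None):
        return state["groq_client"] is not None

in_scope = asyncio.run(main())
print(in_scope, state["groq_client"].closed)  # prints: True True
```

Because the client is created once before any request is served, the check-then-set race disappears entirely, and shutdown gets a natural place to close the connection pool.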
⚡ Bolt: optimize FastAPI event loop by using AsyncGroq client
PR created automatically by Jules for task 16569044003844208012 started by @Deepaksingh7238