⚡ Bolt: Prevent event loop blocking with AsyncGroq #71
Conversation
Replaces the synchronous Groq client with AsyncGroq in the `ai-voice` and `initiate-transfer` endpoints. This ensures that the FastAPI asyncio event loop is not blocked during network calls to Groq API, allowing the server to handle concurrent requests efficiently. Co-authored-by: Deepaksingh7238 <110552872+Deepaksingh7238@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Pull request overview
Replaces the synchronous Groq client usage inside async FastAPI endpoints with AsyncGroq and awaits LLM network calls to avoid blocking the asyncio event loop and improve concurrency under load.
Changes:
- Switched the Groq client import/initialization from `Groq` to `AsyncGroq` in `backend/main.py`.
- Added `await` to the Groq chat completion calls in `/ai-voice` and `/initiate-transfer`.
- Added a small async verification script and a short Bolt learning note.
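The concurrency rationale behind these changes can be seen in a self-contained sketch: a blocking call inside a coroutine serializes all work on the event loop, while an awaited call lets other requests proceed. The handlers below are illustrative stand-ins (using sleeps instead of the actual Groq SDK), not code from this PR.

```python
# Demonstrates why a sync call inside an async handler hurts concurrency:
# a blocking sleep stalls every coroutine on the loop; an awaited sleep does not.
import asyncio
import time

async def blocking_handler():
    time.sleep(0.2)           # stand-in for a synchronous Groq network call

async def async_handler():
    await asyncio.sleep(0.2)  # stand-in for an awaited AsyncGroq call

async def timed(n, handler):
    """Run n handlers concurrently and return the total elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

async def main():
    seq = await timed(5, blocking_handler)  # ~1.0s: the 5 tasks run serially
    par = await timed(5, async_handler)     # ~0.2s: the 5 tasks overlap
    print(f"blocking: {seq:.2f}s, async: {par:.2f}s")
    return seq, par

if __name__ == "__main__":
    asyncio.run(main())
```

Under load, the blocking version behaves like the pre-PR endpoints: each LLM call stalls every other in-flight request.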
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| backend/main.py | Uses AsyncGroq and awaits LLM calls in async endpoints to prevent event loop blocking. |
| backend/tests/verify_groq_async.py | Adds a standalone script to verify AsyncGroq calls work end-to-end. |
| .jules/bolt.md | Documents the learning/action about avoiding sync network calls in async FastAPI handlers. |
```diff
 global groq_client
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)

-chat_completion = groq_client.chat.completions.create(
+# Use await to avoid blocking the event loop
+chat_completion = await groq_client.chat.completions.create(
```
The global `groq_client` is now an `AsyncGroq` client but it's never closed during app shutdown. If `AsyncGroq` holds an underlying async HTTP client, this can leak connections/file descriptors and produce "unclosed client" warnings. Consider closing it in the FastAPI lifespan shutdown block (e.g., `await groq_client.aclose()`/`close()` guarded with `hasattr`) and resetting `groq_client = None`.
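A minimal sketch of the cleanup this comment suggests, assuming a FastAPI lifespan context. The `hasattr` guard covers SDK versions that expose `aclose()` versus `close()`; the lifespan wiring in the comments uses assumed names (`lifespan`, `groq_client`), not code from this PR.

```python
# Sketch: close a lazily-created global async client on app shutdown.
import asyncio
from contextlib import asynccontextmanager

async def close_client(client):
    """Close an async client, preferring aclose() when available."""
    if client is None:
        return
    if hasattr(client, "aclose"):
        await client.aclose()
    elif hasattr(client, "close"):
        result = client.close()
        if asyncio.iscoroutine(result):  # some SDKs make close() a coroutine
            await result

# Hypothetical wiring into the FastAPI lifespan (names assumed):
# @asynccontextmanager
# async def lifespan(app):
#     global groq_client
#     yield
#     await close_client(groq_client)
#     groq_client = None
```

Resetting the global to `None` after closing keeps the lazy-initialization path correct across reloads and test runs.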
```diff
 global groq_client
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)

 try:
-    chat_completion = groq_client.chat.completions.create(
+    # Use await to avoid blocking the event loop
+    chat_completion = await groq_client.chat.completions.create(
```
Same as above: AsyncGroq is created lazily and stored globally, but there’s no corresponding shutdown cleanup. Please ensure the client is properly closed in the app lifespan shutdown hook to avoid leaking network resources across reloads/tests.
```python
from groq import AsyncGroq
from livekit import api
from contextlib import asynccontextmanager

try:
```
asynccontextmanager is imported twice in this module (once near the top and again later). Since this PR touches the import block, consider removing the duplicate import to avoid confusion and keep the import section tidy.
```python
start = time.time()
res = await client.chat.completions.create(
    messages=[{"role": "user", "content": "Say hi"}],
    model="llama3-8b-8192",
    temperature=0.1,
    max_tokens=10
)
print(f"Time: {time.time() - start:.2f}s, Res: {res.choices[0].message.content}")
```
This script creates an AsyncGroq client but never closes it. To avoid unclosed-connection warnings (especially when running repeatedly), close the client in a finally block (or use an async context manager if supported).
Suggested change:

```diff
-start = time.time()
-res = await client.chat.completions.create(
-    messages=[{"role": "user", "content": "Say hi"}],
-    model="llama3-8b-8192",
-    temperature=0.1,
-    max_tokens=10
-)
-print(f"Time: {time.time() - start:.2f}s, Res: {res.choices[0].message.content}")
+try:
+    start = time.time()
+    res = await client.chat.completions.create(
+        messages=[{"role": "user", "content": "Say hi"}],
+        model="llama3-8b-8192",
+        temperature=0.1,
+        max_tokens=10
+    )
+    print(f"Time: {time.time() - start:.2f}s, Res: {res.choices[0].message.content}")
+finally:
+    await client.close()
```
💡 **What:** Replaced the synchronous `Groq` client with `AsyncGroq` in the FastAPI endpoints (`/ai-voice` and `/initiate-transfer`) and added `await` to the network calls.

🎯 **Why:** Calling a synchronous network API (like Groq) inside an `async def` FastAPI endpoint blocks the main asyncio event loop. This prevents the server from processing any other concurrent requests while waiting for the LLM response, causing severe bottlenecks under load.

📊 **Impact:** Significantly improves concurrency. The server can now handle multiple simultaneous requests without being stalled by long-running LLM inferences.

🔬 **Measurement:** Run a concurrency benchmark (e.g., using `ab` or `wrk`) against `/initiate-transfer` or `/ai-voice` with multiple parallel connections. The server will remain responsive, unlike before, where requests would queue sequentially.

PR created automatically by Jules for task 8588919464142089387 started by @Deepaksingh7238