
⚡ Bolt: Use AsyncGroq to prevent blocking the FastAPI event loop#77

Open

Adityasingh-8858 wants to merge 1 commit into main from bolt/async-groq-optimization-16569044003844208012

Conversation

@Adityasingh-8858
Collaborator

⚡ Bolt: optimize FastAPI event loop by using AsyncGroq client

💡 What:
Replaced the synchronous `Groq` client with `AsyncGroq` in `backend/main.py`'s `/ai-voice` and `/initiate-transfer` endpoints.

🎯 Why:
Making synchronous network calls inside an `async def` FastAPI endpoint blocks the entire asyncio event loop. By using an asynchronous client and `await`ing the network calls, the event loop can pause and resume, allowing other concurrent requests to be handled efficiently rather than stalling the server.

📊 Impact:
Significantly improves backend concurrency and responsiveness under load. Other API requests (e.g., token generation, room listing) will no longer be delayed by the 500ms+ latency of the AI completion calls.

🔬 Measurement:
Start the backend server and simulate multiple concurrent requests to the API alongside a request to `/ai-voice`. Before the optimization, other requests would queue and stall until the AI generation completed. With this optimization, concurrent requests are processed immediately.
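The effect described above can also be demonstrated offline. In the sketch below (a toy model, not the real server: `time.sleep` and `asyncio.sleep` stand in for the synchronous and asynchronous Groq calls), we measure the latency a small concurrent request observes in each case.

```python
import asyncio
import time

async def endpoint(blocking: bool):
    """Stand-in for /ai-voice; the 0.2 s delay models the AI completion call."""
    await asyncio.sleep(0)  # yield once so the "other request" task can start
    if blocking:
        time.sleep(0.2)           # synchronous call: freezes the whole event loop
    else:
        await asyncio.sleep(0.2)  # awaited call: loop stays free for other tasks

async def other_request():
    """A cheap concurrent request; report how long it actually took."""
    start = time.monotonic()
    await asyncio.sleep(0.01)
    return time.monotonic() - start

async def observed_latency(blocking: bool):
    _, latency = await asyncio.gather(endpoint(blocking), other_request())
    return latency

sync_latency = asyncio.run(observed_latency(True))
async_latency = asyncio.run(observed_latency(False))
print(f"with sync client:  {sync_latency:.3f}s")   # ~0.2s: stalled behind time.sleep
print(f"with async client: {async_latency:.3f}s")  # ~0.01s: served immediately
```

The 10 ms request is delayed to roughly the full 200 ms in the blocking case, which is exactly the queueing behaviour the measurement section describes.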


PR created automatically by Jules for task 16569044003844208012 started by @Deepaksingh7238

Co-authored-by: Deepaksingh7238 <110552872+Deepaksingh7238@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review requested due to automatic review settings April 23, 2026 17:56

Copilot AI left a comment


Pull request overview

Switches Groq API usage in FastAPI async endpoints from the synchronous client to the asynchronous `AsyncGroq` client to avoid blocking the event loop during network calls.

Changes:

  • Replaced `Groq` with `AsyncGroq` and added `await` for chat completion calls in `/ai-voice` and `/initiate-transfer`.
  • Added a short “Bolt” note documenting the anti-pattern and preferred approach for async endpoints.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `backend/main.py` | Uses `AsyncGroq` + `await` for Groq network calls inside async endpoints. |
| `.jules/bolt.md` | Adds a learning note documenting why async clients should be used in FastAPI async handlers. |
Comments suppressed due to low confidence (1)

`backend/main.py:329`

  • Current tests exercise the mock Groq path (`FORCE_MOCK_GROQ=1`), so the updated async Groq call path isn’t covered. Add a unit/integration test that disables the mock flag and monkeypatches `AsyncGroq` (and `.chat.completions.create`) with an async stub to assert the endpoint awaits the call and returns the expected content.

```python
chat_completion = await groq_client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful realtime voice assistant."},
        {"role": "user", "content": req.prompt},
    ],
    model="llama3-8b-8192",
    temperature=0.6,
    max_tokens=200,
)
```
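One possible shape for such a test, sketched here with a local fake because `backend/main.py` is not importable in this context. In a real pytest test you would monkeypatch `backend.main.AsyncGroq` instead; `FakeAsyncGroq` and the simplified handler below are illustrative stand-ins, not the project's actual code.

```python
import asyncio
from types import SimpleNamespace

class FakeAsyncGroq:
    """Async stub shaped like groq.AsyncGroq for the calls this PR makes."""
    def __init__(self, api_key=None):
        async def create(**kwargs):
            # Return a Groq-shaped response: choices[0].message.content
            msg = SimpleNamespace(content="stubbed reply")
            return SimpleNamespace(choices=[SimpleNamespace(message=msg)])
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=create))

async def ai_voice(prompt: str) -> str:
    """Simplified stand-in for the /ai-voice handler body under test."""
    client = FakeAsyncGroq(api_key="test-key")
    completion = await client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama3-8b-8192",
    )
    return completion.choices[0].message.content

# The endpoint genuinely awaits the stub and returns its content.
print(asyncio.run(ai_voice("hi")))  # stubbed reply
```

With the real app, the same idea becomes `monkeypatch.setattr("backend.main.AsyncGroq", FakeAsyncGroq)` plus a `TestClient` request with the mock flag disabled.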


Comment thread backend/main.py
Comment on lines 317 to +321
```diff
 global groq_client
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
-chat_completion = groq_client.chat.completions.create(
+    # ⚡ Bolt: Using AsyncGroq prevents blocking the FastAPI event loop during network requests
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)
+chat_completion = await groq_client.chat.completions.create(
```

Copilot AI Apr 23, 2026


The lazy global initialization of `groq_client` is a non-atomic check-then-set inside an async endpoint. Under concurrent load, multiple requests can observe `groq_client is None` and create multiple `AsyncGroq` instances (leaking extra connection pools and doing redundant work). Consider guarding initialization with an `asyncio.Lock`, or initializing a single client during app startup (lifespan) and reusing it.

Comment thread backend/main.py
Comment on lines 503 to +507
```diff
 # Attempt real Groq call
 global groq_client
 if groq_client is None:
-    groq_client = Groq(api_key=GROQ_API_KEY)
+    # ⚡ Bolt: Using AsyncGroq prevents blocking the FastAPI event loop during network requests
+    groq_client = AsyncGroq(api_key=GROQ_API_KEY)
```

Copilot AI Apr 23, 2026


Same race as in `/ai-voice`: concurrent requests can initialize `groq_client` more than once due to the non-atomic `if groq_client is None` check in an async context. Please protect initialization with a lock or move client creation to startup (lifespan).
