
fix: timeout issues #223

Open
NotYuSheng wants to merge 1 commit into dev from fix/timeout-issues

Conversation

@NotYuSheng
Owner

@NotYuSheng NotYuSheng commented Oct 7, 2025

PR Type

Bug fix, Enhancement


Description

  • Fix timeout issues with HTTP client configuration

  • Remove settings page and update UI elements

  • Improve error handling and logging

  • Optimize image scaling and status responses
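The "Improve error handling and logging" item covers two small changes from the file walkthrough: DEBUG logging for docling in pdf_extraction_service/main.py, and exception details in the caption service's error logs. The following is a minimal stdlib sketch of both. It assumes that docling modules log under the `docling` logger namespace (the PR does not show the exact logger names). The `pdf_extraction_service` logger name and the `call_with_logging` helper are hypothetical.

```python
import logging
from typing import Any, Callable, Optional


# Assumption: docling modules create loggers with logging.getLogger(__name__),
# so they are children of "docling" and inherit a level set on that parent.
def enable_docling_debug() -> logging.Logger:
    # basicConfig only installs a handler if the root logger has none yet.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    docling_logger = logging.getLogger("docling")
    # DEBUG records from docling still reach the root handler: propagation
    # does not re-check the level of ancestor loggers.
    docling_logger.setLevel(logging.DEBUG)
    return docling_logger


# Hypothetical logger name for the extraction service.
logger = logging.getLogger("pdf_extraction_service")


def call_with_logging(fn: Callable[..., Any], *args: Any) -> Optional[Any]:
    """Run fn(*args); on failure, log the full traceback and return None."""
    try:
        return fn(*args)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback,
        # whereas logger.error(f"... {e}") records only the message text.
        logger.exception("Call to %s failed", getattr(fn, "__name__", repr(fn)))
        return None
```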


Diagram Walkthrough

flowchart LR
  A["HTTP Client"] -- "timeout config" --> B["PDF Processing"]
  B -- "status polling" --> C["Background Tasks"]
  C -- "error handling" --> D["UI Updates"]
  D -- "page removal" --> E["Settings Page"]

File Walkthrough

Relevant files

Enhancement

frontend/main.py: Update UI elements and remove settings
  • Changed page icon from 🦸 to 🔍
  • Removed settings page from navigation
  • Removed separator line in sidebar
+1/-9

pdf_extraction_service/main.py: Enable debug logging for docling
  • Added DEBUG logging for docling components
+4/-0

pdf_extraction_service/routers/extractor.py: Optimize image scaling and status responses
  • Changed default image scale from 2.0 to 1.0
  • Added Response parameter to status endpoint
  • Return 202 status code for processing jobs
  • Improved status handling logic
+11/-5

Bug fix

frontend/my_pages/1_upload_UI.py: Fix timeout issues with HTTP clients
  • Removed global httpx client with fixed timeout
  • Added separate clients for upload and polling with custom timeouts
  • Updated function signatures to accept client parameter
  • Improved error handling with proper client cleanup
  • Removed max_attempts limit for infinite polling
+29/-21

pdf_extraction_service/utils/caption.py: Add timeout config for caption service
  • Added custom timeout configuration for HTTP client
  • Improved error logging with exception details
+2/-2     

@gemini-code-assist
Contributor

Summary of Changes

Hello @NotYuSheng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses and resolves various timeout-related issues across the application by fine-tuning HTTP client configurations, enhancing job status polling, and optimizing backend processing parameters. The changes aim to improve the reliability and user experience when handling large or complex PDF documents that require extended processing times, ensuring that operations complete successfully without premature interruptions.

Highlights

  • Timeout Handling: Implemented more robust timeout configurations for HTTP requests in both frontend and backend services to prevent premature timeouts during long-running operations like PDF uploads, processing, and image captioning.
  • Frontend Polling Logic: Refactored the frontend's polling mechanism to continuously check job statuses without a fixed attempt limit, relying on the backend's eventual completion or failure.
  • API Status Codes: Modified the PDF extraction service to return a '202 Accepted' HTTP status code for jobs that are still in progress, providing clearer communication to clients about ongoing background tasks.
  • Image Processing Optimization: Reduced the default image scaling factor in the PDF extraction service from 2.0 to 1.0. This halves the width and height of rendered images (about a quarter of the pixels), which should speed up image processing and reduce resource consumption.
  • UI Refinements: Updated the application's page icon and removed a 'Settings' page from the navigation, streamlining the user interface.
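The status-endpoint change maps a job's state onto the HTTP status code of the poll response. Below is a minimal, framework-agnostic sketch of that mapping. The job state strings are hypothetical, and the choice to return 200 for failed jobs is an assumption: the service's real states and codes may differ. In a FastAPI route that declares a `response: Response` parameter, the returned value would be assigned to `response.status_code`.

```python
from http import HTTPStatus

# Hypothetical job states; the extraction service's actual names may differ.
IN_PROGRESS = {"queued", "processing"}
FINISHED = {"completed", "failed"}


def status_code_for(job_state: str) -> HTTPStatus:
    """Map a background job's state to the HTTP status a poll should return."""
    if job_state in IN_PROGRESS:
        # 202 Accepted: the job exists but is not finished; keep polling.
        return HTTPStatus.ACCEPTED
    if job_state in FINISHED:
        # Assumption: the poll itself succeeded, so return 200 and let the
        # response body carry the outcome (success or failure).
        return HTTPStatus.OK
    # Unknown state: treat it as a job the service does not know about.
    return HTTPStatus.NOT_FOUND
```

On the client side, a 202 from the status endpoint then means "keep polling", and a 200 means the body holds the final result.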

@codiumai-pr-agent-free

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified. No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@codiumai-pr-agent-free

PR Code Suggestions ✨

Explore these optional code suggestions:

Category    Suggestion    Impact
High-level
Reconsider using an infinite polling loop

The polling loop in poll_processing_status was changed to an infinite while True
loop, which is risky. It's recommended to revert to a finite loop with a long
timeout to prevent the application from hanging if a backend job stalls.

Examples:

frontend/my_pages/1_upload_UI.py [94]
    while True:

Solution Walkthrough:

Before:

# frontend/my_pages/1_upload_UI.py
async def poll_processing_status(...):
    ...
    while True:
        all_complete = True
        # ... check job statuses ...
        if all_complete:
            # ... update UI and break ...
            break
        
        # This loop never times out if a job gets stuck
        await asyncio.sleep(delay)

After:

# frontend/my_pages/1_upload_UI.py
async def poll_processing_status(..., max_attempts=600, delay=2): # e.g., 20 minute timeout
    ...
    for attempt in range(max_attempts):
        all_complete = True
        # ... check job statuses ...
        if all_complete:
            # ... update UI and return success ...
            return True
        
        await asyncio.sleep(delay)
    
    # Handle timeout after max_attempts
    status_container.warning("⚠️ Processing timed out.")
    return False
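The suggested finite loop can also be written against a wall-clock deadline. A deadline keeps the loop bounded even when individual polls are slow. The following is a runnable sketch that assumes a hypothetical `fetch_status` coroutine returning a job state string. It is not the code in 1_upload_UI.py.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical terminal states reported by the backend.
DONE_STATES = {"completed", "failed"}


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[str]],
    timeout_s: float = 1200.0,  # 20 minutes, like max_attempts=600, delay=2
    delay_s: float = 2.0,
) -> str:
    """Poll until the job reaches a terminal state or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = await fetch_status()
        if state in DONE_STATES:
            return state
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Surface the stalled job instead of polling forever.
            raise TimeoutError(f"job still {state!r} after {timeout_s:.0f}s")
        # Never sleep past the deadline.
        await asyncio.sleep(min(delay_s, remaining))
```

The deadline is checked only between polls, so each `fetch_status` call should still run on a client with its own request timeout.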
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical flaw where a finite polling loop was replaced with an infinite while True loop, which could cause the UI to hang and consume resources indefinitely if a backend job stalls.

High
Possible issue
Avoid potential crash in finally block

To prevent a potential UnboundLocalError, initialize poll_client to None before
the try block in process_pdf and check for its existence in the finally block
before attempting to close it.

frontend/my_pages/1_upload_UI.py [263-328]

 async def process_pdf(uploaded_file, file_expander, source_lang="", target_lang=""):
     """
     Uploads PDF to backend and stores document metadata in session state.
     Optionally triggers translation if languages are specified.
     """
+    poll_client = None
     # Process pdf through PDF_processor endpoint
     try:
         # Upload the PDF document with longer timeout for file upload
         logger.info(f"Uploading PDF: {uploaded_file}")
 ...
             # Poll for processing status with spinner
             with st.spinner("Processing document..."):
                 await poll_processing_status(doc_id, status_container, progress_text, poll_client, source_lang, target_lang)
 
     except Exception as e:
         with file_expander:
             st.error(f"❌ Error processing PDF: {uploaded_file.name}")
         logger.error(f"Error processing PDF: {e}")
     finally:
         # Close the client after processing
-        await poll_client.aclose()
+        if poll_client:
+            await poll_client.aclose()

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a potential UnboundLocalError crash in the finally block if an exception occurs before poll_client is assigned, and provides a robust fix to prevent it.

Medium

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the frontend HTTP client handling to address timeout issues. It removes the global httpx client in favor of creating them within functions with specific timeouts for different operations like file uploads and status polling. The frontend polling loop has also been changed to run indefinitely, relying on backend status or client timeouts instead of a fixed number of attempts. These are good changes that should improve robustness.

My review focuses on a critical resource management issue introduced in the refactoring of process_pdf where HTTP clients might not be closed properly in case of an error, potentially leading to resource leaks and application crashes. I've provided a suggestion to fix this.


Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant