fix: timeout issues by NotYuSheng · Pull Request #223 · NotYuSheng/OmniPDF

NotYuSheng · 2025-10-07T03:10:43Z

PR Type

Bug fix, Enhancement

Description

Fix timeout issues with HTTP client configuration
Remove settings page and update UI elements
Improve error handling and logging
Optimize image scaling and status responses

Diagram Walkthrough

flowchart LR
  A["HTTP Client"] -- "timeout config" --> B["PDF Processing"]
  B -- "status polling" --> C["Background Tasks"]
  C -- "error handling" --> D["UI Updates"]
  D -- "page removal" --> E["Settings Page"]

File Walkthrough

Relevant files

Enhancement

File	Summary	Changes	Diff
frontend/main.py	`Update UI elements and remove settings`	Changed page icon from 🦸 to 🔍; removed settings page from navigation; removed separator line in sidebar	+1/-9
pdf_extraction_service/main.py	`Enable debug logging for docling`	Added DEBUG logging for docling components	+4/-0
pdf_extraction_service/routers/extractor.py	`Optimize image scaling and status responses`	Changed default image scale from 2.0 to 1.0; added Response parameter to status endpoint; return 202 status code for processing jobs; improved status handling logic	+11/-5
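
For reviewers unfamiliar with per-library log levels, here is a minimal sketch of what "DEBUG logging for docling components" could look like with the standard `logging` module. The logger name `"docling"` and the helper name `enable_docling_debug` are assumptions for illustration; the PR does not show the exact lines that were added.

```python
import logging


def enable_docling_debug() -> None:
    """Raise verbosity for one library while leaving the root logger at INFO."""
    # Root logger stays at INFO so other libraries are not flooded with output.
    logging.basicConfig(level=logging.INFO)
    # Assumed logger name: child loggers such as "docling.<module>" inherit
    # this level unless they set their own.
    logging.getLogger("docling").setLevel(logging.DEBUG)


enable_docling_debug()
```

Setting the level on the parent logger rather than the root keeps the extra output scoped to the one library under investigation.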

Bug fix

File	Summary	Changes	Diff
frontend/my_pages/1_upload_UI.py	`Fix timeout issues with HTTP clients`	Removed global httpx client with fixed timeout; added separate clients for upload and polling with custom timeouts; updated function signatures to accept client parameter; improved error handling with proper client cleanup; removed max_attempts limit for infinite polling	+29/-21
pdf_extraction_service/utils/caption.py	`Add timeout config for caption service`	Added custom timeout configuration for HTTP client; improved error logging with exception details	+2/-2

gemini-code-assist · 2025-10-07T03:11:00Z

Summary of Changes

Hello @NotYuSheng, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses and resolves various timeout-related issues across the application by fine-tuning HTTP client configurations, enhancing job status polling, and optimizing backend processing parameters. The changes aim to improve the reliability and user experience when handling large or complex PDF documents that require extended processing times, ensuring that operations complete successfully without premature interruptions.

Highlights

Timeout Handling: Implemented more robust timeout configurations for HTTP requests in both frontend and backend services to prevent premature timeouts during long-running operations like PDF uploads, processing, and image captioning.
Frontend Polling Logic: Refactored the frontend's polling mechanism to continuously check job statuses without a fixed attempt limit, relying on the backend's eventual completion or failure.
API Status Codes: Modified the PDF extraction service to return a '202 Accepted' HTTP status code for jobs that are still in progress, providing clearer communication to clients about ongoing background tasks.
Image Processing Optimization: Reduced the default image scaling factor in the PDF extraction service, potentially speeding up image processing and reducing resource consumption.
UI Refinements: Updated the application's page icon and removed a 'Settings' page from the navigation, streamlining the user interface.
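
The '202 Accepted' behaviour described above can be sketched without any web framework. The job state names and both function names below are hypothetical; the PR only states that in-progress jobs now return 202, so treating every other state as 200 is an assumption.

```python
from http import HTTPStatus

# Hypothetical state names; the service's real job states are not shown in the PR.
IN_PROGRESS_STATES = {"pending", "processing"}


def status_code_for(job_status: str) -> int:
    """Server side: pick the HTTP status code for a status response."""
    if job_status in IN_PROGRESS_STATES:
        # The request was accepted, but the background work has not finished.
        return int(HTTPStatus.ACCEPTED)
    # Assumption: any other state is reported with a plain 200.
    return int(HTTPStatus.OK)


def should_keep_polling(http_status: int) -> bool:
    """Client side: keep polling only while the server answers 202."""
    return http_status == HTTPStatus.ACCEPTED
```

With this split, a client can decide whether to continue polling from the status code alone, before it parses the response body.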

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

codiumai-pr-agent-free · 2025-10-07T03:11:08Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified: no security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided. Create ticket/issue.
Codebase Duplication Compliance
⚪	Codebase context is not defined. Follow the guide to enable codebase context checks.
Custom Compliance
⚪	No custom compliance provided. Follow the guide to enable custom compliance check.

Compliance status legend

🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

codiumai-pr-agent-free · 2025-10-07T03:12:11Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: High-level
Suggestion: Reconsider using an infinite polling loop
Impact: High

The polling loop in `poll_processing_status` was changed to an infinite `while True` loop, which is risky. It's recommended to revert to a finite loop with a long timeout to prevent the application from hanging if a backend job stalls.

Examples:

frontend/my_pages/1_upload_UI.py [94]

    while True:

Solution Walkthrough:

Before:

    # frontend/my_pages/1_upload_UI.py
    async def poll_processing_status(...):
        ...
        while True:
            all_complete = True
            # ... check job statuses ...
            if all_complete:
                # ... update UI and break ...
                break
            # This loop never times out if a job gets stuck
            await asyncio.sleep(delay)

After:

    # frontend/my_pages/1_upload_UI.py
    async def poll_processing_status(..., max_attempts=600, delay=2):  # e.g., 20 minute timeout
        ...
        for attempt in range(max_attempts):
            all_complete = True
            # ... check job statuses ...
            if all_complete:
                # ... update UI and return success ...
                return True
            await asyncio.sleep(delay)
        # Handle timeout after max_attempts
        status_container.warning("⚠️ Processing timed out.")
        return False

Suggestion importance[1-10]: 9

Why: The suggestion correctly identifies a critical flaw where a finite polling loop was replaced with an infinite `while True` loop, which could cause the UI to hang and consume resources indefinitely if a backend job stalls.

Category: Possible issue
Suggestion: Avoid potential crash in finally block
Impact: Medium

To prevent a potential `UnboundLocalError`, initialize `poll_client` to `None` before the `try` block in `process_pdf` and check for its existence in the `finally` block before attempting to close it.

frontend/my_pages/1_upload_UI.py [263-328]

     async def process_pdf(uploaded_file, file_expander, source_lang="", target_lang=""):
         """
         Uploads PDF to backend and stores document metadata in session state.
         Optionally triggers translation if languages are specified.
         """
    +    poll_client = None
         # Process pdf through PDF_processor endpoint
         try:
             # Upload the PDF document with longer timeout for file upload
             logger.info(f"Uploading PDF: {uploaded_file}")
     ...
             # Poll for processing status with spinner
             with st.spinner("Processing document..."):
                 await poll_processing_status(doc_id, status_container, progress_text, poll_client, source_lang, target_lang)
         except Exception as e:
             with file_expander:
                 st.error(f"❌ Error processing PDF: {uploaded_file.name}")
             logger.error(f"Error processing PDF: {e}")
         finally:
             # Close the client after processing
    -        await poll_client.aclose()
    +        if poll_client:
    +            await poll_client.aclose()

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance[1-10]: 8

Why: The suggestion correctly identifies a potential `UnboundLocalError` crash in the `finally` block if an exception occurs before `poll_client` is assigned, and provides a robust fix to prevent it.
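
As a variation on the finite-loop suggestion, the limit can be expressed as a wall-clock deadline instead of an attempt count, so it does not drift when individual status checks are slow. This is only a sketch: `poll_until_done` and its `check` callback are hypothetical names and not part of the PR.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def poll_until_done(
    check: Callable[[], Awaitable[bool]],
    *,
    deadline_s: float,
    delay_s: float = 2.0,
) -> bool:
    """Call `check` until it returns True or `deadline_s` seconds have elapsed.

    Returns True if the job finished in time, False on timeout.
    """
    # monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        if await check():
            return True
        await asyncio.sleep(delay_s)
    return False
```

The caller can then show a warning when the function returns `False`, in the same way the suggested `max_attempts` version does after its loop.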

gemini-code-assist

Code Review

This pull request refactors the frontend HTTP client handling to address timeout issues. It removes the global httpx client in favor of clients created within functions, each with a timeout suited to its operation, such as file uploads or status polling. The frontend polling loop has also been changed to run indefinitely, relying on backend status or client timeouts instead of a fixed number of attempts. These are good changes that should improve robustness.

My review focuses on a critical resource management issue introduced in the refactoring of `process_pdf`, where HTTP clients might not be closed properly in case of an error, potentially leading to resource leaks and application crashes. I've provided a suggestion to fix this.
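
One way to make the cleanup robust regardless of where an error occurs is to register each client for closing at the moment it is created. The sketch below uses `contextlib.AsyncExitStack` with a stand-in client class; `FakeClient` and `process` are hypothetical and only mimic the `aclose()` method that httpx async clients expose.

```python
import asyncio
from contextlib import AsyncExitStack


class FakeClient:
    """Stand-in for an HTTP client that must be closed with `await aclose()`."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


async def process(fail_during_upload: bool, created: list) -> None:
    """Create two clients; every client created so far is closed on exit."""
    async with AsyncExitStack() as stack:
        upload_client = FakeClient("upload")
        created.append(upload_client)
        stack.push_async_callback(upload_client.aclose)

        if fail_during_upload:
            # The poll client does not exist yet, so nothing tries to close it.
            raise RuntimeError("upload failed")

        poll_client = FakeClient("poll")
        created.append(poll_client)
        stack.push_async_callback(poll_client.aclose)
```

Because a callback is registered only after its client exists, there is no variable that can be unbound in the cleanup path, which is the failure mode the review describes.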

fix: timeout issues

9f95ede

codiumai-pr-agent-free Bot added the Review effort 3/5 label Oct 7, 2025

gemini-code-assist Bot reviewed Oct 7, 2025

NotYuSheng wants to merge 1 commit into dev from fix/timeout-issues
