fix: Ingestion failure due to invalid Docling serve URL #1270

Open
mpawlow wants to merge 7 commits into main from mp/fix/GH-1139-ingestion-failure-invalid-docling-serve-url

Conversation

@mpawlow
Collaborator

@mpawlow mpawlow commented Mar 26, 2026

Issue

  • #1139

Summary

  • Fixed ingestion failures caused by invalid Docling serve URL in Langflow
  • ⚠️ Note: These changes are defensive measures; the underlying trigger for the corruption of DOCLING_SERVE_URL has not been identified yet

Settings / Global Variable Updates

  • Updated _update_langflow_global_variables in src/api/settings.py to always set the DOCLING_SERVE_URL global variable in Langflow, independent of provider configuration
  • Used transform_localhost_url when setting the Docling URL to ensure container-to-container reachability
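
To illustrate the kind of rewrite described above, here is a minimal sketch of a localhost-to-container URL transform. It is not the project's transform_localhost_url; the function name rewrite_localhost, the set of local hostnames, and the default target host.docker.internal are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames treated as "this machine" (an assumption for this sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0"}


def rewrite_localhost(url: str, container_host: str = "host.docker.internal") -> str:
    """Return url with a local hostname swapped for a container-reachable one.

    Hypothetical helper; URLs pointing at any other host are returned unchanged.
    Any userinfo (user:password@) in the original URL is dropped by the rewrite.
    """
    parts = urlsplit(url)
    if parts.hostname not in LOCAL_HOSTS:
        return url
    # Preserve the port if one was given.
    netloc = container_host if parts.port is None else f"{container_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, under these assumptions http://localhost:5001 would become http://host.docker.internal:5001, while a URL that already names a service host would pass through untouched.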

Langflow File Service

  • Added DOCLING_SERVE_URL as an X-Langflow-Global-Var header in both the file ingest and connector (URL ingest) flows in src/services/langflow_file_service.py
  • Applied transform_localhost_url to the Docling URL in both ingest paths so that localhost references are correctly resolved within the container network
  • Minor whitespace cleanup in header construction blocks
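
As a rough sketch of the header construction described above (not the code in src/services/langflow_file_service.py), the snippet below attaches the Docling URL as a per-request global variable header. The helper name build_ingest_headers and the exact header prefix X-LANGFLOW-GLOBAL-VAR- are assumptions; the real header format should be checked against the Langflow version in use.

```python
from typing import Dict, Optional

# Assumed prefix for passing a Langflow global variable via HTTP header.
GLOBAL_VAR_PREFIX = "X-LANGFLOW-GLOBAL-VAR-"


def build_ingest_headers(
    docling_url: str, base: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of base with DOCLING_SERVE_URL added as a global-var header.

    Hypothetical helper: a blank URL is skipped so an empty value never
    overrides whatever Langflow already has stored.
    """
    headers = dict(base or {})
    url = docling_url.strip()
    if url:
        headers[GLOBAL_VAR_PREFIX + "DOCLING_SERVE_URL"] = url
    return headers
```

Skipping blank values is a design choice of this sketch: sending an empty string would reproduce the exact failure mode this PR is guarding against.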

@mpawlow mpawlow self-assigned this Mar 26, 2026
@github-actions github-actions bot added the backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) label Mar 26, 2026
@mpawlow mpawlow force-pushed the mp/fix/GH-1139-ingestion-failure-invalid-docling-serve-url branch from 81dde1f to 9ee8ab5 Compare March 26, 2026 17:50
@lucaseduoli
Collaborator

I believe this may be a problem with Docling itself, not with how the URL is passed, since the URL should be constant.

@mpawlow mpawlow force-pushed the mp/fix/GH-1139-ingestion-failure-invalid-docling-serve-url branch from 9ee8ab5 to c5e15d0 Compare March 27, 2026 15:26
@mpawlow
Collaborator Author

mpawlow commented Mar 27, 2026

@lucaseduoli

  • Your assertion makes sense and matches the behavior I've seen while testing 4.0.0.dev8
  • Overnight, the Docling service mysteriously crashed / went offline without any known active operations
  • Restarting the Docling service is the official workaround for OpenRAG being unable to ingest further documents
  • This PR is operating on the theory that the DOCLING_SERVE_URL is somehow becoming unset or corrupted as indicated by the only relevant error message in the logs:
Error running graph: Error building Component Docling Serve:
Request URL is missing an 'http://' or 'https://' protocol.
  • This might be either a red herring or just a bad error message that gets returned when Docling is in a bad state or offline
  • However, the protocol error only occurs when DOCLING_SERVE_URL arrives at Langflow's Docling Serve component as an empty string (""), or as a bare hostname/IP with no scheme (e.g. localhost:5001 or just localhost).
    • In this scenario, Docling's availability is irrelevant - the error fires before the network stack is ever invoked.
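
The condition described in the last two bullets can be checked without any network access. The sketch below is a stdlib-only approximation of that check, not the validation Langflow or its HTTP client actually performs; has_http_scheme is a hypothetical name.

```python
from urllib.parse import urlsplit


def has_http_scheme(url: str) -> bool:
    """Return True only for URLs with an explicit http/https scheme and a host.

    Purely syntactic: no DNS lookup or connection is attempted, which mirrors
    the point that the protocol error fires before the network is involved.
    """
    parts = urlsplit(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this check, the empty string, localhost:5001, and localhost are all rejected, while http://localhost:5001 is accepted regardless of whether Docling is actually running.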

mpawlow added 7 commits March 27, 2026 10:45
Issue

- #1139

Summary

- Fixed ingestion failures caused by invalid Docling serve URL in Langflow

Settings / Global Variable Updates

- Updated _update_langflow_global_variables in src/api/settings.py to always set the DOCLING_SERVE_URL global variable in Langflow, independent of provider configuration
- Used transform_localhost_url when setting the Docling URL to ensure container-to-container reachability

Langflow File Service

- Added DOCLING_SERVE_URL as an X-Langflow-Global-Var header in both the file ingest and connector (URL ingest) flows in src/services/langflow_file_service.py
- Applied transform_localhost_url to the Docling URL in both ingest paths so that localhost references are correctly resolved within the container network
- Minor whitespace cleanup in header construction blocks
@mpawlow mpawlow force-pushed the mp/fix/GH-1139-ingestion-failure-invalid-docling-serve-url branch from 42fbb03 to 9b3382f Compare March 27, 2026 17:45
@lucaseduoli
Collaborator

That was what I was thinking: it would fail from the start if that were the problem. It seems like this whole issue is caused by Docling's availability. We should see if we can restart Docling when it does not respond.
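
One possible shape for the restart-on-no-response idea, as a sketch only: probe Docling, and if the probe fails, invoke a restart hook and probe again. The names http_probe and ensure_available are hypothetical, the restart hook is left to the caller, and the existence of a health endpoint on the Docling service is an assumption that would need to be confirmed.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET to url answers with a 2xx status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def ensure_available(
    probe: Callable[[], bool],
    restart: Callable[[], None],
    max_restarts: int = 1,
) -> bool:
    """Probe the service; restart and re-probe up to max_restarts times."""
    for _ in range(max_restarts):
        if probe():
            return True
        restart()
    # Final check after the last restart attempt.
    return probe()
```

A caller might pass `lambda: http_probe(docling_url + "/health")` as the probe (the path is an assumption) and a function that restarts the Docling process or container as the restart hook. Keeping both as injected callables makes the loop testable without a running service.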

Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working.

Development

Successfully merging this pull request may close these issues.

[Bug]: Long-Term Stability Issue: Ingestion Fails After Prolonged Application Use with Invalid Docling Serve URL Error
