Advisory Details
Title: Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents
Description:
Summary
An unbounded while loop vulnerability in the toc_transformer function allows an unauthenticated attacker to cause a perpetual Denial of Service (DoS) and rapidly exhaust LLM API credits. By providing a PDF with an intentionally long Table of Contents, the system triggers length-truncated API responses that permanently trap the application into continuously querying the backend LLM API.
Details
The root cause resides in pageindex/page_index.py at line 303 within the toc_transformer() function. The application uses an LLM to structure a raw Table of Contents string into a hierarchical JSON format.
If the LLM's response hits the maximum output token limit (finish_reason == "length"), the application automatically attempts to instruct the model to "continue". Crucially, the while loop lacks any retry counter or iteration limits (unlike the correctly-patched extract_toc_content function which explicitly caps attempts to 5).
Consequently, if the model repeatedly truncates the JSON or rejects the completeness check, the execution falls into an inescapable infinite loop:
while not (if_complete == "yes" and finish_reason == "finished"):
# ... rebuilds prompt and calls ChatGPT_API_with_finish_reason
new_complete, finish_reason = ChatGPT_API_with_finish_reason(model=model, prompt=prompt)
# ...
if_complete = check_if_toc_transformation_is_complete(toc_content, last_complete, model)
# NO ITERATION LIMIT OR BAILOUT CONDITION
PoC
- Generate an adversarial PDF with thousands of sections in the TOC (sufficiently large to cause the LLM to truncate output), or set up a Mock OpenAI proxy that forcibly returns
finish_reason: "length".
- Run the application via the CLI against the malicious PDF:
python run_pageindex.py --pdf_path evil_toc.pdf --model gpt-3.5-turbo
- Observe the process forever attempting to complete the TOC, utilizing 100% of a CPU thread and rapidly emitting requests. (In a real production environment, this drastically drains OpenAI API credits).
Log of Evidence
[*] Setting up Mock API environment variables on port 18080
[*] Triggering PageIndex parsing on the malicious PDF...
[*] Executing: python3 run_pageindex.py --pdf_path evil_toc.pdf --model gpt-3.5-turbo
[Target] Parsing PDF...
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
...
[!] The process has been running for over 15 seconds, stuck in the infinite loop.
Impact
This vulnerability allows a complete and unauthenticated Denial of Service (DoS) by causing process hanging and unbounded API usage, resulting in service unavailability and the immediate financial exhaustion of the backend LLM service billing account.
Affected products
- Ecosystem: python
- Package name: PageIndex
- Affected versions: All versions currently in repository (
main branch)
- Patched versions:
Severity
- Severity: High
- Vector string: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
Weaknesses
- CWE: CWE-835: Loop with Unreachable Exit Condition ('Infinite Loop')
Occurrences
| Permalink |
Description |
| pageindex/page_index.py#L303 |
The vulnerable unbounded while loop within toc_transformer failing to cap API retry attempts. |
Advisory Details
Title: Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents
Description:
Summary
An unbounded
whileloop vulnerability in thetoc_transformerfunction allows an unauthenticated attacker to cause a perpetual Denial of Service (DoS) and rapidly exhaust LLM API credits. By providing a PDF with an intentionally long Table of Contents, the system triggers length-truncated API responses that permanently trap the application into continuously querying the backend LLM API.Details
The root cause resides in
pageindex/page_index.pyat line 303 within thetoc_transformer()function. The application uses an LLM to structure a raw Table of Contents string into a hierarchical JSON format.If the LLM's response hits the maximum output token limit (
finish_reason == "length"), the application automatically attempts to instruct the model to "continue". Crucially, thewhileloop lacks any retry counter or iteration limits (unlike the correctly-patchedextract_toc_contentfunction which explicitly caps attempts to 5).Consequently, if the model repeatedly truncates the JSON or rejects the completeness check, the execution falls into an inescapable infinite loop:
PoC
finish_reason: "length".Log of Evidence
Impact
This vulnerability allows a complete and unauthenticated Denial of Service (DoS) by causing process hanging and unbounded API usage, resulting in service unavailability and the immediate financial exhaustion of the backend LLM service billing account.
Affected products
mainbranch)Severity
Weaknesses
Occurrences
whileloop withintoc_transformerfailing to cap API retry attempts.