[Security] Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents

### Advisory Details
**Title**: Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents

**Description**:
### Summary
An unbounded `while` loop vulnerability in the `toc_transformer` function allows an unauthenticated attacker to cause a perpetual Denial of Service (DoS) and rapidly exhaust LLM API credits. By providing a PDF with an intentionally long Table of Contents, the system triggers length-truncated API responses that permanently trap the application into continuously querying the backend LLM API.

### Details
The root cause resides in `pageindex/page_index.py` at line 303 within the `toc_transformer()` function. The application uses an LLM to structure a raw Table of Contents string into a hierarchical JSON format. 
If the LLM's response hits the maximum output token limit (`finish_reason == "length"`), the application automatically attempts to instruct the model to "continue". Crucially, the `while` loop lacks any retry counter or iteration limits (unlike the correctly-patched `extract_toc_content` function which explicitly caps attempts to 5). 

Consequently, if the model repeatedly truncates the JSON or rejects the completeness check, the execution falls into an inescapable infinite loop:
```python
while not (if_complete == "yes" and finish_reason == "finished"):
    # ... rebuilds prompt and calls ChatGPT_API_with_finish_reason
    new_complete, finish_reason = ChatGPT_API_with_finish_reason(model=model, prompt=prompt)
    # ...
    if_complete = check_if_toc_transformation_is_complete(toc_content, last_complete, model)
    # NO ITERATION LIMIT OR BAILOUT CONDITION
```

### PoC
1. Generate an adversarial PDF with thousands of sections in the TOC (sufficiently large to cause the LLM to truncate output), or set up a Mock OpenAI proxy that forcibly returns `finish_reason: "length"`.
2. Run the application via the CLI against the malicious PDF:
   ```bash
   python run_pageindex.py --pdf_path evil_toc.pdf --model gpt-3.5-turbo
   ```
3. Observe the process forever attempting to complete the TOC, utilizing 100% of a CPU thread and rapidly emitting requests. (In a real production environment, this drastically drains OpenAI API credits).

### Log of Evidence
```text
[*] Setting up Mock API environment variables on port 18080
[*] Triggering PageIndex parsing on the malicious PDF...
[*] Executing: python3 run_pageindex.py --pdf_path evil_toc.pdf --model gpt-3.5-turbo
[Target] Parsing PDF...
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
...
[!] The process has been running for over 15 seconds, stuck in the infinite loop.
```

### Impact
This vulnerability allows a complete and unauthenticated Denial of Service (DoS) by causing process hanging and unbounded API usage, resulting in service unavailability and the immediate financial exhaustion of the backend LLM service billing account.

### Affected products
- **Ecosystem**: python
- **Package name**: PageIndex
- **Affected versions**: All versions currently in repository (`main` branch)
- **Patched versions**: <None>

### Severity
- **Severity**: High
- **Vector string**: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

### Weaknesses
- **CWE**: CWE-835: Loop with Unreachable Exit Condition ('Infinite Loop')

### Occurrences
| Permalink | Description |
| :--- | :--- |
| [pageindex/page_index.py#L303](https://github.com/VectifyAI/PageIndex/blob/main/pageindex/page_index.py#L303) | The vulnerable unbounded `while` loop within `toc_transformer` failing to cap API retry attempts. |


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Security] Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents #174

Advisory Details

Summary

Details

PoC

Log of Evidence

Impact

Affected products

Severity

Weaknesses

Occurrences

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Security] Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents #174

Description

Advisory Details

Summary

Details

PoC

Log of Evidence

Impact

Affected products

Severity

Weaknesses

Occurrences

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions