Problem
When a workspace contains a large number of indexed documents (e.g. 10 000+), using list_documents() as an LLM tool returns the full metadata of every document in a single JSON payload. This causes context window overflows and 500 errors on hosted LLM providers (tested with Azure OpenAI + gpt-oss-120).
Root cause: _load_workspace() loads all document metadata from _meta.json into self.documents at startup, and the only discovery mechanism available to the LLM is iterating the entire dict.
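To give a rough sense of scale, here is a minimal sketch that measures the JSON size of a list-everything response. It does not call PageIndex; `build_fake_workspace` and `payload_chars` are hypothetical helpers, and the metadata values are synthetic stand-ins for what `self.documents` might hold (only the `doc_name` / `doc_description` field names come from this report).

```python
import json


def build_fake_workspace(n_docs: int) -> dict:
    """Build a synthetic stand-in for self.documents (values are made up)."""
    return {
        f"doc-{i}": {
            "doc_name": f"report_{i}.pdf",
            "doc_description": "Quarterly financial report covering revenue, costs and outlook.",
        }
        for i in range(n_docs)
    }


def payload_chars(documents: dict) -> int:
    """Size in characters of the JSON a list-everything tool would return."""
    return len(json.dumps(documents))


small = payload_chars(build_fake_workspace(10))
large = payload_chars(build_fake_workspace(10_000))
print(f"10 docs: {small} chars, 10 000 docs: {large} chars")
```

The payload grows linearly with the number of documents, so even short per-document metadata adds up to a response far larger than what fits comfortably in a single tool result.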
Expected behavior
The library should provide a way to discover relevant documents without loading all metadata into the LLM context.
Options:
- A search_documents(query) method that filters by keyword on doc_name / doc_description
- A global summary / table-of-contents across the workspace that the LLM can use for routing
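As an illustration of the first option, here is a minimal sketch of a keyword filter, written as a standalone function over a dict shaped like `self.documents`. This is an assumption about how it could look, not existing PageIndex API: the `limit` parameter, the `doc_id` key in the result, and the simple substring scoring are all hypothetical choices.

```python
from typing import Dict, List


def search_documents(documents: Dict[str, dict], query: str, limit: int = 20) -> List[dict]:
    """Return at most `limit` documents whose name or description matches the query.

    Hypothetical sketch: scores each document by how many query terms
    appear as substrings in doc_name / doc_description.
    """
    terms = [t for t in query.lower().split() if t]
    if not terms:
        return []

    scored = []
    for doc_id, meta in documents.items():
        haystack = f"{meta.get('doc_name', '')} {meta.get('doc_description', '')}".lower()
        score = sum(1 for t in terms if t in haystack)
        if score:
            scored.append((score, doc_id, meta))

    # Best matches first; ties broken by doc_id so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))

    # Return only the fields the LLM needs for routing, capped at `limit`.
    return [
        {
            "doc_id": doc_id,
            "doc_name": meta.get("doc_name", ""),
            "doc_description": meta.get("doc_description", ""),
        }
        for _, doc_id, meta in scored[:limit]
    ]
```

Capping the result with `limit` keeps the tool response bounded regardless of workspace size, which is the property list_documents() lacks today. A real implementation might prefer token-based or embedding-based matching over plain substring checks.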
Environment
- PageIndex: latest (cloned from main)
- Workspace size: ~10 000 documents
- LLM provider: Azure OpenAI
- Model: gpt-oss-120