`list_documents()` returns all documents -> unusable with large workspaces #300

## Problem

When a workspace contains a large number of indexed documents (e.g. 10 000+), exposing `list_documents()` as an LLM tool returns the full metadata of every document in a single JSON payload. This overflows the context window and causes HTTP 500 errors from hosted LLM providers (tested with Azure OpenAI + gpt-oss-120).

Root cause: `_load_workspace()` loads all document metadata from `_meta.json` into `self.documents` at startup, and the only discovery mechanism available to the LLM is iterating over the entire dict.
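To make the scaling concrete, here is a self-contained sketch that mimics the pattern described above with synthetic data. It does not import PageIndex: `make_documents()` and `list_documents_payload()` are hypothetical stand-ins, the metadata schema (beyond the `doc_name` / `doc_description` fields mentioned in this issue) is assumed, and the token estimate uses the rough 4-characters-per-token rule of thumb rather than a real tokenizer.

```python
import json


def make_documents(n: int) -> dict[str, dict[str, str]]:
    # Hypothetical stand-in for what _load_workspace() puts into self.documents.
    # The exact _meta.json schema is an assumption; only the field names come from the issue.
    return {
        f"doc-{i:05d}": {
            "doc_name": f"report_{i:05d}.pdf",
            "doc_description": f"Quarterly report number {i} covering sales, costs and forecasts.",
        }
        for i in range(n)
    }


def list_documents_payload(documents: dict[str, dict[str, str]]) -> str:
    # Mimics the reported behavior: every entry is serialized into one JSON tool result.
    return json.dumps(
        [{"doc_id": doc_id, **meta} for doc_id, meta in documents.items()]
    )


if __name__ == "__main__":
    for n in (10, 1_000, 10_000):
        payload = list_documents_payload(make_documents(n))
        # ~4 characters per token is only a rough rule of thumb for English text.
        print(f"{n:>6} docs -> {len(payload):>9} chars, ~{len(payload) // 4:>8} tokens")
```

The payload grows linearly with the number of documents; with this synthetic data, 10 000 short entries already exceed a million characters, far more than fits in a single tool result.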

## Expected behavior

The library should provide a way for the LLM to discover relevant documents without loading the metadata of every document into its context.

Options:

  - A `search_documents(query)` method that filters by keyword on `doc_name` / `doc_description`
  - A global summary / table of contents of the workspace that the LLM can use for routing
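A minimal sketch of both options, assuming `self.documents` maps a document ID to a dict with `doc_name` and `doc_description` keys. `search_documents()` and `workspace_summary()` are written here as hypothetical free functions, not existing PageIndex APIs, and the summary (a document count plus the most frequent terms) is only one possible reading of "global summary".

```python
import re
from collections import Counter

# Hypothetical metadata shape: {doc_id: {"doc_name": ..., "doc_description": ...}}.
# Neither function below exists in PageIndex; this only sketches the proposal.

_WORD = re.compile(r"[a-z0-9]+")
_STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}


def _text(meta: dict[str, str]) -> str:
    return f"{meta.get('doc_name', '')} {meta.get('doc_description', '')}".lower()


def search_documents(documents: dict[str, dict[str, str]], query: str,
                     limit: int = 20) -> list[dict[str, str]]:
    """Return at most `limit` documents whose doc_name or doc_description
    contains every keyword in `query` (case-insensitive substring match)."""
    keywords = query.lower().split()
    results = []
    for doc_id, meta in documents.items():
        haystack = _text(meta)
        if all(k in haystack for k in keywords):
            # Return only ID and name to keep the tool result small.
            results.append({"doc_id": doc_id, "doc_name": meta.get("doc_name", "")})
            if len(results) >= limit:
                break
    return results


def workspace_summary(documents: dict[str, dict[str, str]],
                      top_n: int = 30) -> dict[str, object]:
    """Fixed-size overview: document count plus the most common terms."""
    counts: Counter[str] = Counter()
    for meta in documents.values():
        # Count each term once per document so long descriptions don't dominate.
        counts.update(set(_WORD.findall(_text(meta))) - _STOPWORDS)
    return {
        "document_count": len(documents),
        "top_terms": [term for term, _ in counts.most_common(top_n)],
    }


if __name__ == "__main__":
    docs = {
        "d1": {"doc_name": "azure_setup.md", "doc_description": "Deploying models on Azure OpenAI"},
        "d2": {"doc_name": "billing.pdf", "doc_description": "Invoice and billing policy"},
    }
    print(search_documents(docs, "azure openai"))  # [{'doc_id': 'd1', 'doc_name': 'azure_setup.md'}]
    print(workspace_summary(docs, top_n=5))
```

Both functions return a bounded payload (at most `limit` matches or `top_n` terms) regardless of workspace size, so an LLM could call the summary first to pick search terms and then search to narrow down candidates. Plain substring matching is deliberately simple; an inverted index or full-text search would scale better.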

## Environment

  - PageIndex: latest (cloned from main)
  - Workspace size: ~10 000 documents
  - LLM provider: Azure OpenAI
  - Model: gpt-oss-120
