Skip to content

Latest commit

 

History

History
193 lines (128 loc) · 3.75 KB

File metadata and controls

193 lines (128 loc) · 3.75 KB

API Reference

Table of Contents


Core Classes

DocumentReader

Abstract base class for all document readers.

from mcp_documents_reader import DocumentReader

class MyReader(DocumentReader):
    def read(self, file_path: str) -> str:
        # Implement reading logic
        return "content"

Methods:

Method Description
read(file_path: str) -> str Read and extract text from the document

DocxReader

Reads DOCX (Microsoft Word) documents.

from mcp_documents_reader import DocxReader

reader = DocxReader()
content = reader.read("/path/to/document.docx")

Supported Extensions: .docx

Features:

  • Text extraction
  • Table extraction
  • Paragraph formatting

PdfReader

Reads PDF documents.

Note: Starting from v1.2.0, the PDF reader has been migrated from PyPDF2 to pypdf (more secure and better maintained).

from mcp_documents_reader import PdfReader

reader = PdfReader()
content = reader.read("/path/to/document.pdf")

Supported Extensions: .pdf

Features:

  • Text extraction from PDF pages
  • Multi-page support

ExcelReader

Reads Excel spreadsheets.

from mcp_documents_reader import ExcelReader

reader = ExcelReader()
content = reader.read("/path/to/spreadsheet.xlsx")

Supported Extensions: .xlsx, .xls

Features:

  • Multi-sheet support
  • Cell data extraction
  • Sheet name listing

TxtReader

Reads plain text files with automatic encoding detection.

from mcp_documents_reader import TxtReader

reader = TxtReader()
content = reader.read("/path/to/file.txt")

Supported Extensions: .txt

Features:

  • Automatic encoding detection (UTF-8, GBK, etc.)
  • Latin-1 fallback for binary files

Factory Class

DocumentReaderFactory

Factory class for creating appropriate readers based on file extension.

from mcp_documents_reader import DocumentReaderFactory

# Get reader for a file
reader = DocumentReaderFactory.get_reader("document.pdf")

# Check if format is supported
is_supported = DocumentReaderFactory.is_supported("document.pdf")

# Get list of supported extensions
readers_map = DocumentReaderFactory._readers

Methods:

Method Description
get_reader(file_path: str) -> DocumentReader Get appropriate reader for the file
is_supported(file_path: str) -> bool Check if the file format is supported

Supported Extensions:

Extension Reader Class
.txt TxtReader
.docx DocxReader
.pdf PdfReader
.xlsx ExcelReader
.xls ExcelReader

MCP Tools

read_document

Read any supported document type with a unified interface.

Parameters:

Parameter Type Required Description
filename string Yes Document file path (absolute or relative)

Returns: Extracted text content from the document.

Example:

# Read a DOCX file
content = read_document(filename="report.docx")

# Read a PDF file
content = read_document(filename="paper.pdf")

# Read an Excel file
content = read_document(filename="data.xlsx")

# Read a text file
content = read_document(filename="notes.txt")

Error Handling:

  • Returns error message if file not found
  • Returns error message for unsupported formats
  • Returns error message for corrupted files