Abstract base class for all document readers.
```python
from mcp_documents_reader import DocumentReader


class MyReader(DocumentReader):
    def read(self, file_path: str) -> str:
        # Implement reading logic
        return "content"
```

Methods:
| Method | Description |
|---|---|
| `read(file_path: str) -> str` | Read and extract text from the document |
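To illustrate the contract that `read` imposes on subclasses, here is a minimal sketch of how such a base class could be declared with Python's `abc` module. The `DocumentReader` below is a local stand-in with an assumed shape, not the library's actual source, and `UpperCaseReader` is a hypothetical subclass invented for the example.

```python
from abc import ABC, abstractmethod


class DocumentReader(ABC):
    """Stand-in for the library's base class (assumed shape, for illustration)."""

    @abstractmethod
    def read(self, file_path: str) -> str:
        """Read the document at file_path and return its extracted text."""


class UpperCaseReader(DocumentReader):
    """Hypothetical reader that returns a UTF-8 text file in upper case."""

    def read(self, file_path: str) -> str:
        with open(file_path, encoding="utf-8") as handle:
            return handle.read().upper()
```

With this declaration, instantiating `DocumentReader` directly raises `TypeError`, so only subclasses that implement `read` can be created.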
Reads DOCX (Microsoft Word) documents.
```python
from mcp_documents_reader import DocxReader

reader = DocxReader()
content = reader.read("/path/to/document.docx")
```

Supported Extensions: `.docx`
Features:
- Text extraction
- Table extraction
- Paragraph formatting
Reads PDF documents.
Note: Starting from v1.2.0, the PDF reader has been migrated from PyPDF2 to pypdf (more secure and better maintained).
```python
from mcp_documents_reader import PdfReader

reader = PdfReader()
content = reader.read("/path/to/document.pdf")
```

Supported Extensions: `.pdf`
Features:
- Text extraction from PDF pages
- Multi-page support
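As a rough sketch of what multi-page extraction with pypdf can look like, the snippet below walks every page and joins the text. It uses pypdf's own `PdfReader` class (aliased here to avoid confusion with this package's `PdfReader`); the helper names `join_pages` and `extract_pdf_text` are hypothetical, and the package's internal implementation may differ.

```python
from typing import Iterable, Optional


def join_pages(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with blank lines, skipping pages that yielded no text."""
    return "\n\n".join(text.strip() for text in page_texts if text and text.strip())


def extract_pdf_text(file_path: str) -> str:
    """Extract text from every page of a PDF using pypdf."""
    # Imported lazily so join_pages stays usable without pypdf installed.
    from pypdf import PdfReader as PypdfReader

    reader = PypdfReader(file_path)
    return join_pages(page.extract_text() for page in reader.pages)
```

Pages with no extractable text layer (scanned images, for example) are skipped in this sketch rather than producing empty sections.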
Reads Excel spreadsheets.
```python
from mcp_documents_reader import ExcelReader

reader = ExcelReader()
content = reader.read("/path/to/spreadsheet.xlsx")
```

Supported Extensions: `.xlsx`, `.xls`
Features:
- Multi-sheet support
- Cell data extraction
- Sheet name listing
Reads plain text files with automatic encoding detection.
```python
from mcp_documents_reader import TxtReader

reader = TxtReader()
content = reader.read("/path/to/file.txt")
```

Supported Extensions: `.txt`
Features:
- Automatic encoding detection (UTF-8, GBK, etc.)
- Latin-1 fallback for binary files
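One common way to implement this kind of detection is to try a list of candidate encodings in order and fall back to Latin-1, which can decode any byte sequence. The sketch below shows that approach; the candidate list and the function names are assumptions for illustration, and `TxtReader` may use a different strategy internally.

```python
from typing import Sequence

# Candidate encodings tried in order (an assumed list, for illustration only).
CANDIDATE_ENCODINGS = ("utf-8", "gbk")


def decode_with_fallback(
    data: bytes, encodings: Sequence[str] = CANDIDATE_ENCODINGS
) -> str:
    """Decode bytes with the first encoding that succeeds, else Latin-1."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this call never raises.
    return data.decode("latin-1")


def read_text_file(file_path: str) -> str:
    """Read a file as bytes and decode it using the fallback chain."""
    with open(file_path, "rb") as handle:
        return decode_with_fallback(handle.read())
```

Because Latin-1 never fails, the fallback guarantees a string result, though the output for genuinely binary data will not be meaningful text.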
Factory class for creating appropriate readers based on file extension.
```python
from mcp_documents_reader import DocumentReaderFactory

# Get reader for a file
reader = DocumentReaderFactory.get_reader("document.pdf")

# Check if format is supported
is_supported = DocumentReaderFactory.is_supported("document.pdf")

# Get list of supported extensions
readers_map = DocumentReaderFactory._readers
```

Methods:
| Method | Description |
|---|---|
| `get_reader(file_path: str) -> DocumentReader` | Get appropriate reader for the file |
| `is_supported(file_path: str) -> bool` | Check if the file format is supported |
Supported Extensions:
| Extension | Reader Class |
|---|---|
| `.txt` | `TxtReader` |
| `.docx` | `DocxReader` |
| `.pdf` | `PdfReader` |
| `.xlsx` | `ExcelReader` |
| `.xls` | `ExcelReader` |
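The extension table above maps naturally onto a dictionary lookup. The following is a self-contained sketch of that dispatch pattern using local stand-in classes, not the package's real ones. Lower-casing the extension and raising `ValueError` for unknown formats are assumptions made for the example; the actual factory's behavior in those cases is not documented here.

```python
from pathlib import Path
from typing import Dict, Type


class DocumentReader:
    """Stand-in base class for this sketch."""

    def read(self, file_path: str) -> str:
        raise NotImplementedError


class TxtReader(DocumentReader): ...
class DocxReader(DocumentReader): ...
class PdfReader(DocumentReader): ...
class ExcelReader(DocumentReader): ...


class DocumentReaderFactory:
    # Extension-to-reader mapping, mirroring the table above.
    _readers: Dict[str, Type[DocumentReader]] = {
        ".txt": TxtReader,
        ".docx": DocxReader,
        ".pdf": PdfReader,
        ".xlsx": ExcelReader,
        ".xls": ExcelReader,
    }

    @classmethod
    def is_supported(cls, file_path: str) -> bool:
        return Path(file_path).suffix.lower() in cls._readers

    @classmethod
    def get_reader(cls, file_path: str) -> DocumentReader:
        suffix = Path(file_path).suffix.lower()
        if suffix not in cls._readers:
            # Assumed behavior: the real factory may signal this differently.
            raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
        return cls._readers[suffix]()
```

Keeping the mapping in one place means adding a format only requires a new reader class and one dictionary entry.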
Read any supported document type with a unified interface.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `filename` | string | Yes | Document file path (absolute or relative) |
Returns: Extracted text content from the document.
Example:
```python
# Read a DOCX file
content = read_document(filename="report.docx")

# Read a PDF file
content = read_document(filename="paper.pdf")

# Read an Excel file
content = read_document(filename="data.xlsx")

# Read a text file
content = read_document(filename="notes.txt")
```

Error Handling:
- Returns an error message if the file is not found
- Returns an error message for unsupported formats
- Returns an error message for corrupted files
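The three cases above can be sketched as a function that returns a message string instead of raising. This is an illustrative stand-in, not the tool's actual code: the `READERS` registry, the `read_txt` helper, and the exact wording of the error messages are all assumptions.

```python
import os
from typing import Callable, Dict


def read_txt(path: str) -> str:
    """Hypothetical reader used to keep the sketch self-contained."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Hypothetical registry of extension -> reader function.
READERS: Dict[str, Callable[[str], str]] = {".txt": read_txt}


def read_document(filename: str) -> str:
    """Return extracted text, or an error message describing what went wrong."""
    if not os.path.isfile(filename):
        return f"Error: file not found: {filename}"

    extension = os.path.splitext(filename)[1].lower()
    reader = READERS.get(extension)
    if reader is None:
        return f"Error: unsupported format: {extension or '(no extension)'}"

    try:
        return reader(filename)
    except Exception as exc:  # corrupted or otherwise unreadable file
        return f"Error: could not read {filename}: {exc}"
```

Returning messages rather than raising keeps the result a plain string in every case, so callers should check the returned text before treating it as document content.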