A Python utility to split large Microsoft Word documents into smaller chunks of a specific page length. It supports exporting to both DOCX (maintaining original formatting) and PDF.
- Automated Pagination: Automatically detects total page counts and calculates splits.
- Dual Format Support:
  - DOCX: Extracts raw content and formatting into new Word documents.
  - PDF: Exports high-quality, print-optimized PDF chunks.
- Large File Handling: Designed to handle massive documents by automating Word as a background application.
- Format Preservation: Uses `PasteAndFormat` to ensure the original styling, fonts, and layouts remain intact.
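The split calculation mentioned under Automated Pagination can be sketched in a few lines. This is a minimal illustration, not the script's actual code: the helper name `page_ranges` is hypothetical, and it assumes pages are numbered from 1 and that the last chunk may be shorter than the rest.

```python
def page_ranges(total_pages: int, chunk_size: int = 500) -> list[tuple[int, int]]:
    """Return inclusive (first_page, last_page) pairs covering 1..total_pages.

    Hypothetical helper for illustration; 500 mirrors the documented default.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if total_pages < 1:
        return []
    return [
        # The final chunk is clamped so it never runs past the last page.
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


print(page_ranges(1234, 500))  # [(1, 500), (501, 1000), (1001, 1234)]
```

Each pair can then be handed to whichever export routine matches the requested format.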
- Operating System: Windows (required for `pywin32` and Microsoft Word COM).
- Software: Microsoft Word must be installed.
- Python Libraries:
pip install pywin32
Run the script from the command line by providing the input file and the target output directory.
python SplitWordDoc.py <input.docx> <output_folder> [chunk_size] [format]

| Parameter | Description | Default |
|---|---|---|
| `input.docx` | The path to the Word file you want to split. | Required |
| `output_folder` | Where the split files will be saved. | Required |
| `chunk_size` | Number of pages per split file. | 500 |
| `format` | Output type: `docx` or `pdf`. | `docx` |
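The parameter table above maps naturally onto positional command-line arguments with defaults. The sketch below shows one way to parse them with `argparse`; it is an assumption about how such parsing could look, not a copy of what `SplitWordDoc.py` does, and the function name `parse_args` is hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the four documented parameters; the last two are optional."""
    parser = argparse.ArgumentParser(prog="SplitWordDoc.py")
    parser.add_argument("input", help="path to the Word file to split")
    parser.add_argument("output_folder", help="directory for the split files")
    # nargs="?" makes a positional argument optional, so the default applies
    # when the caller leaves it out.
    parser.add_argument("chunk_size", nargs="?", type=int, default=500,
                        help="number of pages per split file")
    parser.add_argument("format", nargs="?", choices=["docx", "pdf"],
                        default="docx", help="output type")
    return parser.parse_args(argv)


args = parse_args(["thesis.docx", "./pdf_parts", "50", "pdf"])
print(args.chunk_size, args.format)  # 50 pdf
```

Because both optional values are positional, `format` can only be given when `chunk_size` is given too, which matches the usage line.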
Split into 100-page Word documents:
python SplitWordDoc.py manual.docx ./output_chunks 100 docx
Split into 50-page PDF segments:
python SplitWordDoc.py thesis.docx ./pdf_parts 50 pdf
The script uses the `win32com` library to interface directly with the `Word.Application` engine.
- PDF Export: Uses Word's built-in `ExportAsFixedFormat` method, so the PDF output comes from Word's own rendering engine.
- DOCX Export: Navigates the document's `Range` objects via `GoTo` (at page level) to copy and paste specific sections into new document instances.
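The two export paths described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the script's actual implementation: the function names are hypothetical, `word` and `doc` are assumed to be a Word `Application` and an open `Document` obtained through `win32com`, and the numeric constants are the values of the corresponding Word enumerations.

```python
# Values of Word object model enumerations used below.
WD_STATISTIC_PAGES = 2              # WdStatistic.wdStatisticPages
WD_GO_TO_PAGE = 1                   # WdGoToItem.wdGoToPage
WD_GO_TO_ABSOLUTE = 1               # WdGoToDirection.wdGoToAbsolute
WD_EXPORT_FORMAT_PDF = 17           # WdExportFormat.wdExportFormatPDF
WD_EXPORT_FROM_TO = 3               # WdExportRange.wdExportFromTo
WD_FORMAT_ORIGINAL_FORMATTING = 16  # WdRecoveryType.wdFormatOriginalFormatting


def count_pages(doc):
    """Ask Word for the document's current page count."""
    return doc.ComputeStatistics(WD_STATISTIC_PAGES)


def export_pdf_chunk(doc, out_path, first_page, last_page):
    """Export an inclusive page range of `doc` to a PDF file."""
    doc.ExportAsFixedFormat(
        OutputFileName=out_path,
        ExportFormat=WD_EXPORT_FORMAT_PDF,
        Range=WD_EXPORT_FROM_TO,
        From=first_page,
        To=last_page,
    )


def export_docx_chunk(word, doc, out_path, first_page, last_page, total_pages):
    """Copy an inclusive page range of `doc` into a new document and save it."""
    # GoTo returns a Range positioned at the start of the requested page.
    start = doc.GoTo(WD_GO_TO_PAGE, WD_GO_TO_ABSOLUTE, first_page).Start
    if last_page < total_pages:
        # The chunk ends where the page after it begins.
        end = doc.GoTo(WD_GO_TO_PAGE, WD_GO_TO_ABSOLUTE, last_page + 1).Start
    else:
        end = doc.Content.End
    doc.Range(start, end).Copy()
    new_doc = word.Documents.Add()
    try:
        new_doc.Content.PasteAndFormat(WD_FORMAT_ORIGINAL_FORMATTING)
        new_doc.SaveAs(out_path)
    finally:
        # Close without prompting, even if pasting or saving failed.
        new_doc.Close(False)
```

In a real run, `word` would typically come from `win32com.client.Dispatch("Word.Application")` and `doc` from `word.Documents.Open(...)`. Passing absolute paths is the safer choice, since Word may resolve relative paths against its own working directory rather than the script's.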
- Reflowable Text: DOCX is a "reflowable" format. While the script selects content based on Word's current pagination, small variations in layout might occur in the new DOCX chunks.
- Background Process: The script runs Word in `Visible = False` mode. If the script crashes, you may occasionally see a Word process (`WINWORD.EXE`) left open in your Task Manager.
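One common way to reduce the chance of a leftover Word process is to wrap the COM session in a context manager that always calls `Quit`. The sketch below is a suggestion, not a description of what the script currently does: `word_session` and its `dispatch` parameter are hypothetical names, and it assumes `win32com.client.DispatchEx` is used so that a separate Word instance is requested instead of attaching to one the user already has open.

```python
from contextlib import contextmanager

WD_DO_NOT_SAVE_CHANGES = 0  # WdSaveOptions.wdDoNotSaveChanges


@contextmanager
def word_session(dispatch=None):
    """Yield a hidden Word application and quit it on exit, even on errors."""
    if dispatch is None:
        # Imported lazily so this module still loads where pywin32 is absent.
        import win32com.client
        dispatch = win32com.client.DispatchEx
    word = dispatch("Word.Application")
    try:
        word.Visible = False
        yield word
    finally:
        # Runs on normal exit and on exceptions; a hard kill of the Python
        # process can still leave Word running.
        word.Quit(WD_DO_NOT_SAVE_CHANGES)
```

Used as `with word_session() as word: ...`, any exception raised while splitting still reaches the `finally` block, so Word is asked to close before the error propagates.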