A Python utility that merges multiple Excel workbooks by combining sheets with identical names. The tool is particularly useful for consolidating data from multiple Excel files while preserving the original sheet structure and handling sensitive data like ID numbers.
- Smart Sheet Merging: Automatically combines sheets with the same name from multiple Excel files
- ID Number Preservation: Special handling for Chinese ID numbers to prevent formatting issues
- Batch Processing: Processes all
.xlsxfiles in a specified directory - Data Type Protection: Maintains proper data types, especially for sensitive columns like ID numbers
- Flexible Input: Works with any Excel file structure and sheet names
- Format Preservation: Maintains original data formatting and relationships
git clone https://github.com/CrueChan/excel-merge-tool.git
cd excel-merge-tool
uv syncgit clone https://github.com/CrueChan/excel-merge-tool.git
cd excel-merge-tool
pip install -e .pip install pandas openpyxl- Prepare your files: Place all Excel files (
.xlsxformat) in a directory - Update the script: Modify the paths in
main.py - Run the merger: Execute the script
from main import merge_excel_workbooks
# Basic usage
input_directory = "./collect" # Folder with Excel files
output_file = "./merged_result.xlsx" # Output file path
merge_excel_workbooks(input_directory, output_file)# Default usage (modify paths in main.py first)
python main.py
# The script will process files in ./collect/ and output to ./merged_result.xlsximport os
from main import merge_excel_workbooks
# Example 1: Basic merge
merge_excel_workbooks("./input_files", "./output.xlsx")
# Example 2: Processing multiple directories
directories = ["./dept1_files", "./dept2_files", "./dept3_files"]
for i, directory in enumerate(directories):
if os.path.exists(directory):
output_file = f"./merged_dept_{i+1}.xlsx"
merge_excel_workbooks(directory, output_file)
print(f"✅ Merged {directory} → {output_file}")- File Discovery: Scans the input directory for all
.xlsxfiles - Sheet Analysis: Reads all sheets from each Excel file
- Data Consolidation: Groups sheets by name and merges data from identical sheets
- Special Processing: Applies special formatting for ID number columns
- Output Generation: Creates a single Excel file with all merged sheets
The tool includes special processing for sheets containing ID numbers (specifically "公用账号使用人" sheets):
- Text Format Preservation: Reads ID numbers as text to prevent Excel's automatic number formatting
- Decimal Point Removal: Removes decimal points that might be added during processing
- Cell Format Setting: Sets Excel cell formatting to text format to maintain ID number integrity
- Leading Zero Protection: Prevents loss of leading zeros in ID numbers
- Automatically detects and handles different column types
- Preserves data relationships across merged files
- Maintains original column order and structure
your-project/
├── main.py # Main script
├── collect/ # Input directory (default)
│ ├── file1.xlsx
│ ├── file2.xlsx
│ └── file3.xlsx
├── merged_result.xlsx # Output file (default)
├── pyproject.toml # Project configuration
├── README.md # This file
├── LICENSE # License file
└── .gitignore # Git ignore rules
collect/
├── department_A.xlsx # Contains: Sales, Inventory, 公用账号使用人
├── department_B.xlsx # Contains: Sales, Inventory, 公用账号使用人
└── department_C.xlsx # Contains: Sales, Inventory, 公用账号使用人
merged_result.xlsx # Contains:
├── Sales # Combined data from all 3 files
├── Inventory # Combined data from all 3 files
└── 公用账号使用人 # Combined data with ID numbers properly formatted
Edit the paths in main.py:
if __name__ == "__main__":
# Customize these paths
input_directory = "./your_input_folder" # Change this
output_excel = "./your_output_file.xlsx" # Change this
merge_excel_workbooks(input_directory, output_excel)To handle additional columns with ID numbers:
# In the merge_excel_workbooks function, modify this section:
if sheet_name in ["公用账号使用人", "your_custom_sheet"]:
df = pd.read_excel(
excel_file,
sheet_name=sheet_name,
dtype={
"使用人身份证号": str, # Existing
"your_id_column": str # Add your column
}
)The script handles common issues:
- Missing columns: Gracefully handles files with different column structures
- Different data types: Automatically reconciles data type differences
- Empty sheets: Skips empty sheets without errors
- File permissions: Reports file access issues clearly
- Memory management: Efficiently processes large files
Merges all Excel workbooks in the specified directory.
Parameters:
input_dir(str): Directory containing input Excel filesoutput_file(str): Path for the merged output Excel file
Returns:
- None (creates output file)
Raises:
FileNotFoundError: If input directory doesn't existPermissionError: If unable to write output fileValueError: If no Excel files found in input directory
- Python 3.12+
- pandas >= 2.0.0
- openpyxl >= 3.1.0
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add some amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Initial release
- Basic Excel merging functionality
- ID number preservation for Chinese ID cards
- Batch processing support
- Special handling for "公用账号使用人" sheets
Q: The merged file has scientific notation for ID numbers A: This is handled automatically for sheets named "公用账号使用人". For other sheets, modify the dtype parameter in the code.
Q: Some data is missing after merging A: Check that all input files have the same column names in sheets with identical names.
Q: Memory error with large files A: The tool processes files sequentially to minimize memory usage, but very large datasets may require chunked processing.
Q: Permission denied error A: Ensure the output directory is writable and the output file isn't open in Excel.
If you encounter any issues or have questions, please:
- Check the troubleshooting section above
- Search existing GitHub Issues
- Create a new issue with detailed information about your problem