An automated recruitment management system for HKN Beta Chapter that processes ECE student data to identify eligible candidates and generate email contact lists.
- Student Data Processing: Automatically processes Cognos ECE student reports to filter eligible candidates
- Multi-Mode Processing: Supports sequential, multiprocessing, and low-memory modes for different dataset sizes
- Major-Based Filtering: Intelligent filtering by major groups (EE, CMPE, Other) with customizable cutoff percentages
- Email Generation: Automated email lookup using Purdue Directory with intelligent name matching
- Comprehensive Reporting: Detailed reports with statistics, error logging, and major breakdowns
- Automated Setup: PowerShell script for complete environment setup and execution
```powershell
# Run the all-in-one setup script (Windows PowerShell)
.\run.ps1
```

Or set up manually:

```bash
# Install dependencies
pip install -r requirements.txt
# Run with Excel files in current directory
python hkn_student_parser.py
# Or test with sample data
python hkn_student_parser.py --sample
```

Requirements:
- Python 3.7+ with pip
- Git (for automated setup)
- Cognos ECE student reports in Excel format (.xlsx)

All dependencies are listed in `requirements.txt` and installed automatically:
- pandas - Data processing and Excel file handling
- numpy - Numerical operations
- tqdm - Progress bars for long-running operations
- python-calamine - Fast Excel reading engine
- requests - HTTP requests for email lookup
- beautifulsoup4 - HTML parsing for directory scraping

1. Obtain Student Data: Run the Cognos 'record copy - PUID entry' report for all ECE students
   - Process in batches (Cognos limit: 1000 entries per report)
   - Save all Excel files with the `.xlsx` extension
2. Place Files: Put all Excel files in the same directory as the script
3. Run Processing: Execute the main script
   ```bash
   python hkn_student_parser.py
   ```
4. Review Output: Check the generated files in the `reports/` directory
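
The script picks up workbooks saved next to it. A minimal sketch of that discovery step, assuming a hypothetical `find_excel_files` helper (not taken from `hkn_student_parser.py`), could use `pathlib` and skip the `~$` owner files Excel creates while a workbook is open:

```python
from pathlib import Path
from typing import List


def find_excel_files(directory: Path) -> List[Path]:
    """Return the .xlsx workbooks in `directory`, sorted by path.

    Hypothetical helper: skips Excel's "~$" owner files, which exist only
    while a workbook is open and are not real reports.
    """
    files = [
        path
        for path in directory.glob("*.xlsx")
        if path.is_file() and not path.name.startswith("~$")
    ]
    if not files:
        raise FileNotFoundError(f"No .xlsx files found in {directory}")
    return sorted(files)
```

For example, `find_excel_files(Path(__file__).resolve().parent)` would return the batches saved in step 2, or fail with the "No .xlsx files found" condition listed under troubleshooting.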
Usage:

```
python hkn_student_parser.py [options]

Options:
  --sample                 Use sample data instead of Excel files
  --no-major               Disable major-based filtering
  --multiprocessing, --mp  Enable parallel file processing
  --low-memory, --lm       Enable memory-efficient mode
  --help, -h               Show help message
```

Processing modes:
- Sequential (default): Direct Excel processing with optimized single-threaded performance
- Multiprocessing (`--mp`): Parallel file processing for multiple large Excel files
- Low-Memory (`--lm`): Memory-efficient processing for very large datasets
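
The documented flags map naturally onto `argparse`. The sketch below is an assumption about how they could be declared, not the script's actual parser; `build_parser` and `select_mode` are hypothetical names:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented options; argparse adds -h/--help on its own.
    parser = argparse.ArgumentParser(prog="hkn_student_parser.py")
    parser.add_argument("--sample", action="store_true",
                        help="Use sample data instead of Excel files")
    parser.add_argument("--no-major", dest="no_major", action="store_true",
                        help="Disable major-based filtering")
    parser.add_argument("--multiprocessing", "--mp", dest="multiprocessing",
                        action="store_true",
                        help="Enable parallel file processing")
    parser.add_argument("--low-memory", "--lm", dest="low_memory",
                        action="store_true",
                        help="Enable memory-efficient mode")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    # Hypothetical helper: names the processing mode the flags select.
    if args.multiprocessing and args.low_memory:
        return "multiprocessing+low-memory"
    if args.multiprocessing:
        return "multiprocessing"
    if args.low_memory:
        return "low-memory"
    return "sequential"
```

With this layout, `--mp --lm` parses to `multiprocessing=True, low_memory=True`, and running with no flags selects the sequential default.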
```bash
# For multiple large files with parallel processing
python hkn_student_parser.py --mp
# For very large datasets with memory constraints
python hkn_student_parser.py --lm
# Combined for maximum efficiency with large multi-file datasets
python hkn_student_parser.py --mp --lm
# Test without real data
python hkn_student_parser.py --sample
# Disable major-specific filtering (use overall GPA ranking)
python hkn_student_parser.py --no-major
```

Generate email addresses for recruited students:
```bash
# Process CSV files in reports/ directory to generate emails
python generateEmails.py
```

Features:
- Automatic CSV file discovery in the `reports/` directory
- Intelligent name matching with fuzzy logic
- Progress tracking with visual progress bars
- Purdue Directory integration with rate limiting
- Outputs a consolidated `emails.csv` file
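
The matching algorithm in `generateEmails.py` is not documented here. As one possible approach, the sketch below scores directory results against a student's name with the standard library's `difflib.SequenceMatcher`; the token sorting, the `best_match` name, and the 0.85 threshold are all assumptions. It covers only the scoring, not the directory request or rate limiting:

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(name: str) -> str:
    # Lowercase, drop commas, and sort tokens so "Smith, John" and
    # "john smith" normalize to the same string.
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def best_match(first: str, last: str,
               candidates: List[Tuple[str, str]],
               threshold: float = 0.85) -> Optional[str]:
    """Return the email whose directory name best matches the student.

    `candidates` holds (display_name, email) pairs, e.g. parsed from a
    directory results page. Returns None when no candidate scores at
    least `threshold` (an assumed value).
    """
    target = normalize(f"{first} {last}")
    best_email: Optional[str] = None
    best_score = 0.0
    for display_name, email in candidates:
        score = SequenceMatcher(None, target, normalize(display_name)).ratio()
        if score > best_score:
            best_email, best_score = email, score
    return best_email if best_score >= threshold else None
```

Sorting tokens lets a "Last, First Middle" directory entry still score highly against "First Last", while the threshold keeps unrelated names from producing an email.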

Output files:
- `seniors.csv` - Eligible senior candidates
- `juniors.csv` - Eligible junior candidates
- `sophomores.csv` - Eligible sophomore candidates
- `report.log` - Comprehensive processing report with statistics
- `errors.log` - Detailed error information and debugging data
- `emails.csv` - Email addresses for recruited students (after email generation)

CSV columns depend on the filtering mode:
- Major-based filtering: `Major Group, Actual Major, Last Name, First Name, PUID`
- Grade-level filtering: `Last Name, First Name, PUID`
- Email output: `First Name, Last Name, Email`
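
A sketch of writing rows in the documented column orders with Python's `csv.DictWriter`; the `write_rows` helper and the constant names are hypothetical, and extra per-student fields (such as GPA) are dropped rather than written:

```python
import csv
from typing import Dict, List, TextIO

# Column orders as documented for each output mode (names are hypothetical).
MAJOR_COLUMNS = ["Major Group", "Actual Major", "Last Name", "First Name", "PUID"]
GRADE_COLUMNS = ["Last Name", "First Name", "PUID"]
EMAIL_COLUMNS = ["First Name", "Last Name", "Email"]


def write_rows(stream: TextIO, columns: List[str],
               rows: List[Dict[str, str]]) -> None:
    # extrasaction="ignore" silently drops keys not in `columns`.
    writer = csv.DictWriter(stream, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
```

When writing to disk, open the file with `newline=""` (for example `open("reports/seniors.csv", "w", newline="")`) so the csv module controls line endings.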

```python
# Cutoff percentages by grade level
SENIOR_CUTOFF = 0.30     # Top 30% of seniors
JUNIOR_CUTOFF = 0.25     # Top 25% of juniors
SOPHOMORE_CUTOFF = 0.20  # Top 20% of sophomores

# Accepted majors for major-based filtering
ACCEPTED_MAJORS = ["ECEB", "CMPE"]  # EE and Computer Engineering

# Additional filters applied automatically:
# - Minimum 10 ECE credits
# - GPA-based ranking within each major group
```

Major groups:
- ECEB: Electrical Engineering students
- CMPE: Computer Engineering students
- Other: All other majors (grouped together for filtering)
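
The cutoffs above amount to a per-group top-N selection. A minimal sketch, assuming each student is a dict with hypothetical `major`, `gpa`, and `ece_credits` keys and that quotas round down (the rounding and tie-breaking rules are not documented, so the real script may differ):

```python
import math
from collections import defaultdict
from typing import Dict, List

SENIOR_CUTOFF = 0.30
JUNIOR_CUTOFF = 0.25
SOPHOMORE_CUTOFF = 0.20
ACCEPTED_MAJORS = ["ECEB", "CMPE"]
MIN_ECE_CREDITS = 10  # hypothetical name for the documented 10-credit minimum


def major_group(major: str) -> str:
    # ECEB and CMPE are ranked separately; every other major is "Other".
    return major if major in ACCEPTED_MAJORS else "Other"


def select_top(students: List[Dict], cutoff: float,
               by_major: bool = True) -> List[Dict]:
    """Keep the top `cutoff` fraction by GPA within each group.

    Assumes quotas round down; the tiny epsilon keeps float error
    (e.g. 0.29 * 100 evaluates to 28.999999999999996) from dropping
    a student.
    """
    groups: Dict[str, List[Dict]] = defaultdict(list)
    for student in students:
        if student["ece_credits"] >= MIN_ECE_CREDITS:
            key = major_group(student["major"]) if by_major else "All"
            groups[key].append(student)
    selected: List[Dict] = []
    for key in sorted(groups):
        ranked = sorted(groups[key], key=lambda s: s["gpa"], reverse=True)
        quota = math.floor(len(ranked) * cutoff + 1e-9)
        selected.extend(ranked[:quota])
    return selected
```

Passing `by_major=False` ranks all eligible students of a grade together, which is one reading of the `--no-major` (overall GPA ranking) option.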

Performance tips:
- Multiple Excel files: Use `--mp` for parallel processing
- Memory constraints: Use `--lm` for reduced memory usage
- Very large datasets: Combine `--mp --lm` for optimal performance

Performance characteristics:
- Typical processing speed: ~200 sheets/second
- Memory usage scales with dataset size
- Multiprocessing scales with available CPU cores
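
One way parallel file processing can scale with CPU cores is a process pool sized to the smaller of the file count and `os.cpu_count()`. This is a sketch with hypothetical names (`worker_count`, `process_files`), not the script's `--mp` implementation:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def worker_count(n_files: int) -> int:
    # One process per file, capped at the number of CPU cores (minimum 1).
    return max(1, min(n_files, os.cpu_count() or 1))


def process_files(paths: Sequence[str], parse: Callable[[str], T],
                  parallel: bool = False) -> List[T]:
    """Parse each file, optionally in separate worker processes.

    `parse` must be a module-level function so it can be pickled.
    """
    if not parallel or len(paths) < 2:
        return [parse(p) for p in paths]
    with ProcessPoolExecutor(max_workers=worker_count(len(paths))) as pool:
        # map() preserves input order, so results line up with `paths`.
        return list(pool.map(parse, paths))
```

On platforms that start workers with the spawn method (Windows, and macOS by default), the code that calls `process_files(..., parallel=True)` must sit under an `if __name__ == "__main__":` guard.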

Error handling:
- Automatic error logging: All errors are saved to `reports/errors.log`
- Graceful failure handling: Processing continues despite individual student parsing errors
- Detailed error context: Includes student names, sheet names, and error types
- Git repository handling: Automatic safe directory configuration for various environments
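
The continue-on-error behavior can be sketched as a loop that catches per-sheet parse errors and logs the context listed above. Here each sheet is a plain dict rather than a real worksheet, and `parse_all` and the logger name are hypothetical:

```python
import logging
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("hkn_parser_sketch")  # hypothetical logger name


def parse_all(sheets: Dict[str, Dict],
              parse_student: Callable[[Dict], Dict]) -> Tuple[List[Dict], int]:
    """Parse every sheet, logging failures instead of aborting the run."""
    parsed: List[Dict] = []
    failures = 0
    for sheet_name, sheet in sheets.items():
        try:
            parsed.append(parse_student(sheet))
        except (KeyError, ValueError) as exc:
            failures += 1
            # Record sheet name, student name, and error type, matching
            # the context errors.log is described as containing.
            logger.error("sheet=%s student=%s error=%s: %s",
                         sheet_name, sheet.get("name", "?"),
                         type(exc).__name__, exc)
    return parsed, failures
```

Attaching a `logging.FileHandler("reports/errors.log")` to the logger would route these records to the documented error log.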

Troubleshooting:
- No .xlsx files found: Ensure the Excel files are in the same directory as the script
- Memory errors: Use the `--lm` flag for large datasets
- Git ownership errors: Handled automatically by the PowerShell setup script
- Permission errors: Run PowerShell as Administrator if needed

Getting help:
- Run `python hkn_student_parser.py --help` for command options
- Check `reports/errors.log` for detailed error information
- Submit issues on GitHub with error messages and context
- Contact HKN Beta chapter representatives for assistance

Project structure:

```
recruitment-filter/
├── hkn_student_parser.py   # Main processing script
├── generateEmails.py       # Email generation utility
├── run.ps1                 # Automated setup script
├── requirements.txt        # Python dependencies
├── reports/                # Output directory
│   ├── *.csv               # Student lists
│   ├── report.log          # Processing report
│   └── errors.log          # Error details
└── README.md               # This file
```
- Optimized Excel Processing: Uses calamine engine for fast Excel reading
- Vectorized Operations: Pandas-based operations for maximum performance
- Memory Management: Configurable memory usage patterns
- Error Recovery: Robust error handling with detailed logging
- Progress Tracking: Visual progress bars for long operations
This project is maintained by HKN Beta Chapter for internal recruitment management.