This project is a fully automated expense processing pipeline designed to simulate a real-world data engineering workflow.
It continuously monitors incoming data, transforms raw financial records into a structured format, and uploads the results to the cloud, all without manual intervention.
The goal of this project was to:
- Build a practical automation system
- Handle messy real-world data
- Implement a CI/CD-style pipeline using Jenkins
- Integrate securely with external APIs (Google Drive)
In real-world scenarios, organizations deal with:
- Unstructured or inconsistent data (CSV files)
- Repetitive manual processing (Excel formatting, categorization)
- Risk of human error in financial tracking
This project solves those problems by:
- Automating the entire workflow
- Cleaning and standardizing input data
- Ensuring consistency in output format
- Reducing manual effort to near zero
It mimics how data pipelines in production systems operate, making it highly relevant for roles in:
- Data Engineering
- Backend Development
- DevOps / Automation
Inbox Folder (CSV Input)
↓
Python Processing Engine
↓
Structured Excel Output
↓
Google Drive Upload
↓
Archive (Processed Files)
- Jenkins runs the pipeline on a scheduled trigger (cron-based)
- The system scans the inbox/ directory
- Identifies the latest folder containing CSV data
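The detection step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `find_latest_csv_folder` is hypothetical, and "latest" is assumed to mean the most recently modified folder (the pipeline may instead order folders by name or by a date in the folder name).

```python
from pathlib import Path
from typing import Optional


def find_latest_csv_folder(inbox: Path) -> Optional[Path]:
    """Return the most recently modified sub-folder of `inbox` that holds a CSV.

    Assumption: "latest" means newest modification time.
    Returns None when the inbox is missing or no folder contains a CSV file.
    """
    if not inbox.is_dir():
        return None
    candidates = [
        folder
        for folder in inbox.iterdir()
        if folder.is_dir() and any(folder.glob("*.csv"))
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda folder: folder.stat().st_mtime)
```

A scheduled run would call `find_latest_csv_folder(Path("inbox"))` and exit early when it returns `None`, so an empty inbox does not count as a failure.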
- Reads raw CSV using Pandas
- Handles inconsistent data formats:
  - Fixes corrupted or invalid date formats
  - Standardizes categories using a mapping config
- Extracts:
  - Date
  - Transaction Type (Income / Expense)
  - Category
  - Amount
  - Notes
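As a rough sketch of the cleaning step, the snippet below coerces dates and amounts and normalizes categories through a lookup table. The column names, the `CATEGORY_MAP` entries, and the function name `clean_expenses` are all assumptions made for illustration; the real mapping lives in configs/mapping.json, whose structure is not shown here. Note also that this sketch simply drops rows whose date or amount cannot be parsed, which is a simpler policy than repairing them.

```python
import pandas as pd

# Hypothetical mapping; the real one would be loaded from configs/mapping.json.
CATEGORY_MAP = {"groceries": "Food", "grocery": "Food", "uber": "Transport"}

# Assumed column names for the five extracted fields.
COLUMNS = ["Date", "Type", "Category", "Amount", "Notes"]


def clean_expenses(raw: pd.DataFrame, category_map: dict) -> pd.DataFrame:
    """Return a cleaned copy of `raw` restricted to the expected columns."""
    df = raw.copy()
    # Unparseable values become NaT / NaN instead of raising.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    df["Amount"] = pd.to_numeric(df["Amount"], errors="coerce")
    df = df.dropna(subset=["Date", "Amount"])
    # Match categories case-insensitively; keep the original label if unmapped.
    key = df["Category"].astype(str).str.strip().str.lower()
    df["Category"] = key.map(category_map).fillna(df["Category"])
    return df[COLUMNS].reset_index(drop=True)
```

Typical use would be `clean_expenses(pd.read_csv(csv_path), CATEGORY_MAP)`.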
- Preserves:
  - Day
  - Year
- Replaces:
  - Month, with the current system month
- Ensures valid and consistent date formatting across all records
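The month replacement described above can be illustrated with a small standard-library helper. The name `replace_month` is hypothetical. One detail is an assumption on my part: when the original day does not exist in the target month (for example, the 31st moved into February), this sketch clamps to the last valid day so the result is always a real date. The project may handle that case differently.

```python
import calendar
from datetime import date
from typing import Optional


def replace_month(original: date, today: Optional[date] = None) -> date:
    """Keep the day and year of `original`, but use the month of `today`.

    Assumption: a day that does not exist in the target month is clamped
    to that month's last day.
    """
    today = today or date.today()
    last_day = calendar.monthrange(original.year, today.month)[1]
    return original.replace(month=today.month, day=min(original.day, last_day))
```

For example, a record dated 15 January 2023 processed in March becomes 15 March 2023, while 31 January 2023 processed in February becomes 28 February 2023 under the clamping assumption.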
- Uses a pre-defined Excel template
- Writes processed data starting from row 9
- Automatically fills:
  - Cell F5: Current Month (e.g., March)
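A minimal OpenPyXL sketch of the template-filling step might look like the following. Row 9 and cell F5 come from the description above; everything else is an assumption: the function names, the use of the workbook's active sheet, and writing each record left to right starting at column A.

```python
from datetime import date
from typing import Iterable, Optional, Sequence

from openpyxl import load_workbook

START_ROW = 9      # first data row, as described above
MONTH_CELL = "F5"  # cell holding the current month name


def fill_sheet(ws, rows: Iterable[Sequence], today: Optional[date] = None) -> None:
    """Write the month name and the data rows into an open worksheet."""
    today = today or date.today()
    ws[MONTH_CELL] = today.strftime("%B")  # e.g. "March" in the default locale
    for offset, row in enumerate(rows):
        # Assumption: columns are filled left to right starting at column A.
        for col, value in enumerate(row, start=1):
            ws.cell(row=START_ROW + offset, column=col, value=value)


def generate_report(template_path: str, output_path: str, rows: Iterable[Sequence]) -> None:
    """Load the template, fill it, and save the result under a new name."""
    wb = load_workbook(template_path)
    fill_sheet(wb.active, rows)
    wb.save(output_path)
```

Saving to a separate output path leaves templates/master_template.xlsx untouched between runs.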
- Uploads generated Excel file to Google Drive
- Uses OAuth 2.0 authentication
- Securely manages access via token.pickle
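The upload step could be sketched along the lines of Google's Python quickstart pattern, shown below. Treat it as an illustration under assumptions rather than the project's uploader.py: the function names, the `drive.file` scope, and uploading to the Drive root without a target folder are all my choices. The Google client imports are placed inside the functions so the module can still be imported where those packages are not installed.

```python
import pickle
from pathlib import Path

# Assumed scope: access limited to files created by this app.
SCOPES = ["https://www.googleapis.com/auth/drive.file"]
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def load_cached_credentials(token_path: Path):
    """Return the pickled credentials, or None if no token file exists yet."""
    if not token_path.exists():
        return None
    with token_path.open("rb") as fh:
        return pickle.load(fh)


def get_credentials(token_path: Path, secrets_path: Path):
    """Reuse, refresh, or interactively create OAuth 2.0 credentials."""
    from google.auth.transport.requests import Request
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = load_cached_credentials(token_path)
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    elif not creds or not creds.valid:
        # First run: opens a browser window for user consent.
        flow = InstalledAppFlow.from_client_secrets_file(str(secrets_path), SCOPES)
        creds = flow.run_local_server(port=0)
    with token_path.open("wb") as fh:
        pickle.dump(creds, fh)
    return creds


def upload_excel(creds, excel_path: Path) -> str:
    """Upload the workbook to Drive and return the new file's ID."""
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    service = build("drive", "v3", credentials=creds)
    media = MediaFileUpload(str(excel_path), mimetype=XLSX_MIME)
    created = service.files().create(
        body={"name": excel_path.name}, media_body=media, fields="id"
    ).execute()
    return created["id"]
```

Because token.pickle is read with `pickle.load`, it should only ever be loaded from a trusted location, which is one more reason to keep it out of version control.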
- Moves processed CSV file to processed/
- Deletes local Excel file after successful upload
- Ensures no duplicate or redundant data
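The cleanup step might be sketched with `shutil` and `pathlib` as below. The function name `archive_and_clean` is hypothetical, and refusing to overwrite an already-archived file is an assumed way of guarding against duplicates; the project may use a different rule.

```python
import shutil
from pathlib import Path


def archive_and_clean(csv_path: Path, processed_dir: Path, excel_path: Path) -> Path:
    """Move the source CSV into the archive and remove the local Excel file.

    Intended to be called only after the upload has succeeded.
    """
    processed_dir.mkdir(parents=True, exist_ok=True)
    target = processed_dir / csv_path.name
    if target.exists():
        # Assumption: an existing archive entry means this file was already processed.
        raise FileExistsError(f"{target} is already archived")
    shutil.move(str(csv_path), str(target))
    excel_path.unlink(missing_ok=True)  # requires Python 3.8+
    return target
```

Archiving before deleting means a failure partway through leaves the Excel file in place, so the run can be inspected or repeated.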
- Python
- Pandas (data transformation)
- OpenPyXL (Excel handling)
- Jenkins (CI/CD pipeline scheduling)
- Google Drive API
- OAuth 2.0 Authentication
- OS / Shutil (file operations & archiving)
Expense-pipeline/
│
├── scripts/
│   ├── main.py
│   ├── processor.py
│   └── uploader.py
│
├── inbox/               # Incoming CSV folders
├── outbox/              # Generated Excel files
├── processed/           # Archived CSV files
│
├── templates/
│   └── master_template.xlsx
│
├── configs/
│   └── mapping.json
│
├── .gitignore
└── README.md
To protect sensitive data:
The following are NOT included in this repository:
- Real financial data
- credentials.json
- token.pickle

Instead:
- Dummy/sample data is used
- A placeholder credentials file is provided
git clone <your-repo-link>
cd Expense-pipeline
pip install -r requirements.txt
- Place your credentials.json in the project root
- Run authentication once to generate token.pickle
python scripts/main.py
- Configure a pipeline job
- Add cron trigger:
H/1 * * * *
- Fully automated pipeline (no manual intervention)
- Handles messy and inconsistent data gracefully
- Template-based Excel generation
- Seamless Google Drive integration
- Clean file lifecycle management
- Secure handling of credentials
- Real-time file detection (using watchdog instead of cron)
- Multi-file batch processing
- Logging system with status tracking
- Cloud deployment (AWS / GCP)
- Dashboard for monitoring pipeline activity
This project goes beyond a simple script; it demonstrates:
- End-to-end pipeline design
- Automation mindset
- Real-world data handling
- Secure API integration
It reflects how modern systems process, transform, and move data at scale.
If you found this interesting or have suggestions, feel free to connect!
LinkedIn: https://www.linkedin.com/in/moses-kenny/
This project is licensed under the MIT License.
You are free to:
- Use
- Modify
- Distribute

With proper attribution.
If you like this project, give it a star on GitHub. It helps a lot!