Moseskenny/Auto-ExpensePipeline

🚀 Automated Expense Processing Pipeline

(Python + Jenkins + Google Drive API)

📌 Overview

This project is a fully automated expense processing pipeline designed to simulate a real-world data engineering workflow.

On each scheduled run it checks for incoming data, transforms raw financial records into a structured format, and uploads the results to the cloud, all without manual intervention.

The goal of this project was to:

  • Build a practical automation system
  • Handle messy real-world data
  • Implement a CI/CD-style pipeline using Jenkins
  • Integrate securely with external APIs (Google Drive)

💡 Why This Project?

In real-world scenarios, organizations deal with:

  • Unstructured or inconsistent data (CSV files)
  • Repetitive manual processing (Excel formatting, categorization)
  • Risk of human error in financial tracking

This project solves those problems by:

  • ✔ Automating the entire workflow
  • ✔ Cleaning and standardizing input data
  • ✔ Ensuring consistency in output format
  • ✔ Reducing manual effort to near zero

It mimics how data pipelines in production systems operate, making it highly relevant for roles in:

  • Data Engineering
  • Backend Development
  • DevOps / Automation

Demo: [Animation]


⚙️ How It Works (Pipeline Flow)

📥 Inbox Folder (CSV Input)
        ↓
🧠 Python Processing Engine
        ↓
📊 Structured Excel Output
        ↓
☁️ Google Drive Upload
        ↓
📦 Archive (Processed Files)
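
The flow above can be read as a small orchestration function. The following is a minimal sketch, not the actual contents of scripts/main.py: `run_pipeline` and its four callable parameters are hypothetical names chosen for illustration, and each stage is passed in so the control flow can be shown without committing to how any stage is implemented.

```python
from pathlib import Path
from typing import Callable, Optional


def run_pipeline(
    find_input: Callable[[], Optional[Path]],
    process: Callable[[Path], Path],
    upload: Callable[[Path], None],
    archive: Callable[[Path, Path], None],
) -> bool:
    """Run one pass of the pipeline; return False when there is nothing to do.

    All four stage callables are hypothetical stand-ins for the real steps.
    """
    csv_path = find_input()           # Inbox folder (CSV input)
    if csv_path is None:
        return False                  # no new data on this run
    excel_path = process(csv_path)    # Python processing, Excel output
    upload(excel_path)                # Google Drive upload
    # Reached only if upload() did not raise, so a failed upload
    # leaves the CSV in the inbox for the next scheduled run.
    archive(csv_path, excel_path)
    return True
```

Keeping the archive step last means a failure in any earlier stage leaves the input untouched, which suits a job that is re-run on a schedule.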

🔄 Step-by-Step Workflow

1. 📂 File Detection

  • Jenkins runs the pipeline on a scheduled trigger (cron-based)
  • The system scans the inbox/ directory
  • Identifies the latest folder containing CSV data
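
As an illustration of the detection step, here is a minimal sketch using only the standard library. The function name `find_latest_csv_folder` is hypothetical, and "latest" is assumed to mean the most recently modified folder; the real pipeline may instead sort by folder name or by a date embedded in it.

```python
from pathlib import Path
from typing import Optional


def find_latest_csv_folder(inbox: Path) -> Optional[Path]:
    """Return the newest subfolder of `inbox` that contains a CSV file.

    Assumption: "newest" is judged by modification time.
    """
    candidates = [
        folder
        for folder in inbox.iterdir()
        if folder.is_dir() and any(folder.glob("*.csv"))
    ]
    if not candidates:
        return None  # nothing to process on this run
    return max(candidates, key=lambda folder: folder.stat().st_mtime)
```

Returning `None` rather than raising lets a scheduled run exit quietly when the inbox is empty.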

2. 🧹 Data Processing

  • Reads raw CSV using Pandas

  • Handles inconsistent data formats:

    • Fixes corrupted or invalid date formats
    • Standardizes categories using a mapping config
  • Extracts:

    • Date
    • Transaction Type (Income / Expense)
    • Category
    • Amount
    • Notes
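
A minimal sketch of this cleaning step with Pandas follows. The column names (`Date`, `Type`, `Category`, `Amount`, `Notes`), the sample entries in `CATEGORY_MAP`, and the function name `clean_expenses` are all assumptions for illustration; the actual headers depend on the input CSV, and the actual mapping lives in configs/mapping.json.

```python
import pandas as pd

# Hypothetical mapping from raw labels (lower-cased) to standard categories.
CATEGORY_MAP = {"groceries": "Food", "grocery": "Food", "uber": "Transport"}

OUTPUT_COLUMNS = ["Date", "Type", "Category", "Amount", "Notes"]


def clean_expenses(raw: pd.DataFrame, category_map: dict) -> pd.DataFrame:
    """Return a copy of `raw` with parsed dates, standard categories, numeric amounts."""
    df = raw.copy()

    # Dates that cannot be parsed become NaT instead of raising an error.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

    # Look categories up case-insensitively; unknown labels are kept as-is.
    original = df["Category"].astype(str).str.strip()
    df["Category"] = original.str.lower().map(category_map).fillna(original)

    # Amounts read as text become floats; unparseable values become NaN.
    df["Amount"] = pd.to_numeric(df["Amount"], errors="coerce")

    return df[OUTPUT_COLUMNS]
```

In use, the raw frame would come from something like `pd.read_csv(csv_path)`. Rows left with `NaT` or `NaN` are easy to filter or report afterwards, which keeps the cleaning step itself free of silent drops.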

3. 📅 Smart Date Handling

  • Preserves:

    • Day
    • Year
  • Replaces:

    • Month → with current system month
  • Ensures valid and consistent date formatting across all records
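
The month replacement can be sketched with the standard library alone. One detail the description leaves open is what happens when the original day does not exist in the current month (for example, the 31st in April). The sketch below clamps to the last valid day, which is an assumption rather than confirmed behaviour; `shift_to_current_month` is a hypothetical name.

```python
import calendar
from datetime import date
from typing import Optional


def shift_to_current_month(original: date, today: Optional[date] = None) -> date:
    """Keep the day and year of `original`, but use the month of `today`.

    Assumption: a day that does not exist in the target month is clamped
    to that month's last day so the result is always a valid date.
    """
    today = today or date.today()
    last_day = calendar.monthrange(original.year, today.month)[1]
    return original.replace(month=today.month, day=min(original.day, last_day))
```

For example, `shift_to_current_month(date(2023, 1, 31), today=date(2024, 4, 10))` gives `date(2023, 4, 30)`: the year 2023 is preserved, the month becomes April, and the day is clamped from 31 to 30.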


4. 📊 Excel Generation

  • Uses a pre-defined Excel template

  • Writes processed data starting from row 9

  • Automatically fills:

    • Cell F5 → Current Month (e.g., March)
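
A minimal OpenPyXL sketch of the template step is shown below. The start row (9) and the month cell (F5) come from the description above; writing from column A onward, using the workbook's active sheet, and the function name `fill_template` are assumptions.

```python
from datetime import date
from typing import Iterable, Optional, Sequence

from openpyxl import load_workbook

FIRST_DATA_ROW = 9   # processed data starts at row 9
MONTH_CELL = "F5"    # receives the current month name


def fill_template(
    template_path: str,
    output_path: str,
    rows: Iterable[Sequence],
    today: Optional[date] = None,
) -> None:
    """Copy the template, write the month name and the data rows, then save."""
    today = today or date.today()
    workbook = load_workbook(template_path)
    sheet = workbook.active  # assumption: the template's first/active sheet

    sheet[MONTH_CELL] = today.strftime("%B")  # e.g. "March"

    for row_offset, record in enumerate(rows):
        for col_offset, value in enumerate(record):
            # Assumption: the first field goes in column A.
            sheet.cell(
                row=FIRST_DATA_ROW + row_offset,
                column=1 + col_offset,
                value=value,
            )

    workbook.save(output_path)
```

Saving to a separate `output_path` leaves templates/master_template.xlsx unchanged between runs.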

5. ☁️ Cloud Upload

  • Uploads generated Excel file to Google Drive
  • Uses OAuth 2.0 authentication
  • Caches the OAuth token locally in token.pickle, which is kept out of the repository
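
The upload step might look roughly like the sketch below, which follows the common pattern for the Google API Python client with a pickled token. It is not the contents of scripts/uploader.py: the function names, the `drive.file` scope, and the optional `folder_id` parameter are assumptions. Running it requires the `google-api-python-client` and `google-auth-oauthlib` packages plus a real credentials.json.

```python
import os
import pickle
from typing import Optional

# Assumed scope: lets the app manage only the files it creates itself.
SCOPES = ["https://www.googleapis.com/auth/drive.file"]
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_file_metadata(local_path: str, folder_id: Optional[str] = None) -> dict:
    """Build Drive file metadata; `folder_id` is a hypothetical target folder."""
    metadata = {"name": os.path.basename(local_path)}
    if folder_id:
        metadata["parents"] = [folder_id]
    return metadata


def load_credentials(token_path: str = "token.pickle",
                     secrets_path: str = "credentials.json"):
    """Reuse the cached token, refreshing or re-authorising only when needed."""
    # Imported here so the helper above works without the Google libraries.
    from google.auth.transport.requests import Request
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        with open(token_path, "rb") as handle:
            creds = pickle.load(handle)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(secrets_path, SCOPES)
            creds = flow.run_local_server(port=0)  # opens a browser once
        with open(token_path, "wb") as handle:
            pickle.dump(creds, handle)
    return creds


def upload_excel(local_path: str, folder_id: Optional[str] = None) -> str:
    """Upload one workbook and return the new Drive file ID."""
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    service = build("drive", "v3", credentials=load_credentials())
    media = MediaFileUpload(local_path, mimetype=XLSX_MIME)
    created = service.files().create(
        body=build_file_metadata(local_path, folder_id),
        media_body=media,
        fields="id",
    ).execute()
    return created["id"]
```

The first call opens a browser for consent and writes token.pickle; later scheduled runs reuse or refresh that token without any interaction. Because pickle files can execute code when loaded, token.pickle should only ever be read from a trusted location.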

6. 🧼 Cleanup & Archival

  • Moves processed CSV file to processed/
  • Deletes local Excel file after successful upload
  • Prevents the same file from being processed twice and leaves no stale output behind
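
A minimal sketch of the cleanup step using `shutil` and `pathlib` follows. The function name `archive_and_clean` and the `upload_ok` flag are hypothetical; the point illustrated is that nothing is moved or deleted unless the upload succeeded.

```python
import shutil
from pathlib import Path


def archive_and_clean(
    csv_path: Path,
    excel_path: Path,
    processed_dir: Path,
    upload_ok: bool,
) -> None:
    """Archive the source CSV and remove the local workbook after a good upload."""
    if not upload_ok:
        return  # keep both files so the next run can retry

    processed_dir.mkdir(parents=True, exist_ok=True)
    # Moving the CSV out of the inbox is what prevents reprocessing.
    shutil.move(str(csv_path), str(processed_dir / csv_path.name))
    # The workbook now lives on Drive, so the local copy is redundant.
    excel_path.unlink()
```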

🧰 Tech Stack

🐍 Backend & Processing

  • Python
  • Pandas (data transformation)
  • OpenPyXL (Excel handling)

⚙️ Automation

  • Jenkins (CI/CD pipeline scheduling)

☁️ Cloud Integration

  • Google Drive API
  • OAuth 2.0 Authentication

🗂️ File Handling

  • OS / Shutil (file operations & archiving)

📁 Project Structure

Expense-pipeline/
│
├── scripts/
│   ├── main.py
│   ├── processor.py
│   └── uploader.py
│
├── inbox/          # Incoming CSV folders
├── outbox/         # Generated Excel files
├── processed/      # Archived CSV files
│
├── templates/
│   └── master_template.xlsx
│
├── configs/
│   └── mapping.json
│
├── .gitignore
└── README.md

🔐 Security Considerations

To protect sensitive data:

❌ The following are NOT included in this repository:

  • Real financial data
  • credentials.json
  • token.pickle

✔ Instead:

  • Dummy/sample data is used
  • A placeholder credentials file is provided

▢️ How to Run

1. Clone the repository

git clone <your-repo-link>
cd Auto-ExpensePipeline

2. Install dependencies

pip install -r requirements.txt

3. Add your Google Drive credentials

  • Place your credentials.json in the project root
  • Run authentication once to generate token.pickle

4. Run manually

python scripts/main.py

5. Run with Jenkins

  • Configure a pipeline job
  • Add cron trigger:
H/1 * * * *

✨ Key Features

  • 🔄 Fully automated pipeline (no manual intervention)
  • 🧠 Handles messy and inconsistent data gracefully
  • 📊 Template-based Excel generation
  • ☁️ Seamless Google Drive integration
  • 📦 Clean file lifecycle management
  • 🔐 Secure handling of credentials

🚀 Future Improvements

  • Real-time file detection (using watchdog instead of cron)
  • Multi-file batch processing
  • Logging system with status tracking
  • Cloud deployment (AWS / GCP)
  • Dashboard for monitoring pipeline activity

📣 Final Thoughts

This project goes beyond a simple script. It demonstrates:

  • End-to-end pipeline design
  • Automation mindset
  • Real-world data handling
  • Secure API integration

It reflects how modern systems process, transform, and move data at scale.


🔗 Connect

If you found this interesting or have suggestions, feel free to connect!

LinkedIn: https://www.linkedin.com/in/moses-kenny/


📜 License

This project is licensed under the MIT License.

You are free to:

  • Use
  • Modify
  • Distribute

With proper attribution.


⭐ If you like this project

Give it a star ⭐ on GitHub. It helps a lot!
