Retail Discount Leadership and Follow on Pricing Pattern Analysis #287

Open
radia-zayeen wants to merge 4 commits into DataBytes-Organisation:main from radia-zayeen:main

@radia-zayeen

Overview

This PR builds out the DiscountMate data analysis pipeline from raw supermarket CSVs through to reusable cross-retailer analysis outputs.

It introduces:

  • A standardized master price dataset across Coles, Woolworths, ALDI, and IGA
  • Discount leader/follower analysis for like-for-like products
  • Private/house-brand comparison across selected categories

Changes

  • Built a master dataset pipeline that cleans retailer files separately, standardizes schema, and combines them into one reusable dataframe.
  • Updated ingestion to read from the nested raw data folder structure and handle schema drift, especially for IGA files.
  • Fixed size parsing so decimal pack sizes like 1.25L are normalized correctly.
  • Added discount event analysis using curated exact-match products, including leader retailer identification.
  • Extended the event analysis to calculate follower retailers and lag time in days.
  • Added a separate house-brand comparison module with curated matching rules and normalized unit-price comparisons.
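To illustrate the size-parsing fix, here is a minimal standard-library sketch of how a decimal pack size such as 1.25L could be normalized to a base unit. It is not the code in MergeScript.py: the function name `parse_size`, the unit table, and the regular expression are all assumptions made for this example.

```python
import re

# Hypothetical unit table: maps a unit to (base unit, conversion factor).
_UNIT_TO_BASE = {
    "ml": ("ml", 1.0),
    "l": ("ml", 1000.0),
    "g": ("g", 1.0),
    "kg": ("g", 1000.0),
}

# The optional "(?:\.\d+)?" group lets "1.25L" be read as 1.25.
# Longer units come first in the alternation so "kg" is not read as "g".
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(ml|kg|l|g)\b", re.IGNORECASE)


def parse_size(text):
    """Return (quantity, base_unit) for the first size found in text, or None."""
    match = _SIZE_RE.search(text)
    if match is None:
        return None
    quantity = float(match.group(1))
    base_unit, factor = _UNIT_TO_BASE[match.group(2).lower()]
    return (quantity * factor, base_unit)


print(parse_size("Cola 1.25L"))       # (1250.0, 'ml')
print(parse_size("Corn Chips 175g"))  # (175.0, 'g')
```

Converting everything to millilitres or grams at parse time means later unit-price comparisons do not need to know which unit the retailer originally used.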

Analysis Added

Discount Leader / Follower

  • Identifies price-drop events over time.
  • Determines which retailer leads each discount event.
  • Calculates follower retailers and lag days.
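The leader/follower logic above can be sketched roughly as follows for a single matched product. This is an illustration under stated assumptions, not the logic in retailer_analysis.py: the function names, the 5% minimum drop, the 14-day follow window, and the alphabetical tie-break are hypothetical, and the prices are made up.

```python
from datetime import date


def first_price_drop(observations, min_drop=0.05):
    """observations: list of (date, price) sorted by date.

    Return the date of the first observation whose price is at least
    min_drop (as a fraction) below the previous one, or None.
    """
    for (_, prev_price), (day, price) in zip(observations, observations[1:]):
        if prev_price > 0 and (prev_price - price) / prev_price >= min_drop:
            return day
    return None


def leader_and_followers(discount_starts, window_days=14):
    """discount_starts: dict of retailer -> date its price drop began.

    Return (leader, {follower: lag_days}). The leader is the earliest
    retailer; ties are broken alphabetically (an assumption).
    """
    if not discount_starts:
        return None, {}
    ordered = sorted(discount_starts.items(), key=lambda item: (item[1], item[0]))
    leader, lead_date = ordered[0]
    followers = {}
    for retailer, start in ordered[1:]:
        lag = (start - lead_date).days
        if lag <= window_days:  # later drops count as separate events
            followers[retailer] = lag
    return leader, followers


# Made-up price history for one like-for-like product.
history = {
    "Coles": [(date(2025, 3, 1), 4.00), (date(2025, 3, 8), 3.00)],
    "Woolworths": [(date(2025, 3, 1), 4.00), (date(2025, 3, 11), 3.20)],
    "IGA": [(date(2025, 3, 1), 4.20), (date(2025, 3, 8), 4.20)],
}
starts = {}
for retailer, obs in history.items():
    drop_day = first_price_drop(obs)
    if drop_day is not None:
        starts[retailer] = drop_day

print(leader_and_followers(starts))  # ('Coles', {'Woolworths': 3})
```

Retailers with no detected drop, such as IGA in this made-up history, simply do not appear as leader or follower.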

House Brands

  • Compares curated private-label products across retailers.
  • Covers milk, cola, and corn chips.
  • Normalizes prices to $/L or $/100g for fairer comparison.
  • Explicitly shows coverage gaps where a retailer does not have a clean comparable private-label SKU.
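As a rough sketch of the unit-price normalization and explicit coverage gaps described above, the following assumes each curated SKU has already been reduced to a price, a quantity, and a base unit (ml or g). The function names and the sample prices are hypothetical and are not taken from house_brand_analysis.py.

```python
def unit_price(price, quantity, base_unit):
    """Return (value, label): $/L for volumes in ml, $/100g for weights in g."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    if base_unit == "ml":
        return price / (quantity / 1000.0), "$/L"
    if base_unit == "g":
        return price / (quantity / 100.0), "$/100g"
    raise ValueError(f"unsupported base unit: {base_unit!r}")


def compare_house_brands(skus, retailers):
    """skus: dict of retailer -> (price, quantity, base_unit).

    Retailers without a curated SKU map to None, so a coverage gap
    stays visible instead of being filled with a poor match.
    """
    result = {}
    for retailer in retailers:
        sku = skus.get(retailer)
        result[retailer] = unit_price(*sku) if sku is not None else None
    return result


# Made-up cola prices; Coles is deliberately missing.
cola = {"Woolworths": (2.50, 1250.0, "ml"), "ALDI": (1.50, 1250.0, "ml")}
print(compare_house_brands(cola, ["Coles", "Woolworths", "ALDI"]))
```

Returning None for a missing retailer keeps the gap in the output table rather than silently dropping the row.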

Notes

  • ALDI discount-history signals remain weaker where historical “was price” data is limited.
  • IGA had one empty file and multiple schema versions; the pipeline now handles these safely.
  • Coles private-label cola did not have a clean comparable match in the current dataset, so that gap is left explicit rather than forced.
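One way an empty file and several schema versions could be handled safely is sketched below with the standard-library csv module. The alias table and column names are illustrative assumptions, not the actual IGA headers, and `read_standardized` is a hypothetical helper rather than a function from this PR.

```python
import csv
import io

# Hypothetical alias table: standard column -> header spellings seen
# across schema versions (illustrative, not the real IGA headers).
COLUMN_ALIASES = {
    "product_name": {"product_name", "name", "title"},
    "price": {"price", "current_price"},
    "was_price": {"was_price", "price_was"},
}


def read_standardized(handle, retailer):
    """Read one retailer CSV into rows that share a single schema."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None:
        return []  # empty file: nothing to read, but no error either

    # Map each source header onto its standard column name, if any.
    rename = {}
    for header in reader.fieldnames:
        key = header.strip().lower()
        for standard, aliases in COLUMN_ALIASES.items():
            if key in aliases:
                rename[header] = standard

    rows = []
    for raw in reader:
        row = {standard: None for standard in COLUMN_ALIASES}
        for header, standard in rename.items():
            row[standard] = raw[header]
        row["retailer"] = retailer
        rows.append(row)
    return rows


old_schema = io.StringIO("Name,Current_Price\nMilk 2L,3.10\n")
print(read_standardized(old_schema, "IGA"))
print(read_standardized(io.StringIO(""), "IGA"))  # []
```

Columns that a given schema version lacks, such as a missing "was price", come through as None, so downstream analysis can tell absent data apart from a zero value.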

MergeScript.py
retailer_analysis.py
house_brand_analysis.py
