Skip to content

r-lode/IA_LiquorAnalytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Iowa Liquor Analytics Project

Project Tasks Status:

Data Preparation

  • Separate Incorrect/Missing Yearly Data from Correct Yearly Data
  • Data cleaning of incorrect data
  • Merge newly cleaned data with original cleaned data
  • Separate remaining data into csv for later cleaning
  • Prepare Yearly Iowa County Population Data for per capita consumption anaylsis
  • Improve markdown clarity
  • retesting measures

SQL Database

  • Establish connection between Linux environment and local MySQL Desktop
  • Import Data into MySQL Database using batch processing
  • Extract data subsets as .txt files to for dashboard building in Tableau

Dashboard Building

  • County Per Capita Consumption Visualization Completed
  • Other visualizations to be determined
  • Publish Dashboards

Table of Contents

  1. Project Description
  2. Example Visualizations
  3. Other Data Links

Project Description

The goal of this project was to clean and analyze a large publicly available dataset using techniques in Python, SQL, and Tableau. For this endeavor, I chose the Sales Data from Iowa Class “E” liquor licensees dataset from data.iowa.gov, which details all the liquor sales in Iowa from January 2012 through the end of September 2023, since I downloaded this data in early November of 2023.

This dataset was very large at 27489743 values. Upon downloading the data, over 20% of the rows had missing values. Compounding this issue was a large number of rows with mismatched and incorrect data types.Thus, cleaning the data was a massive undertaking. It soon became apparent that cleaning 100% of this data was not going to be possible given time constraints and my sanity as many errors were very granular in nature. After an intensive effort, I was able to reduce the number of missing values down to only 1.03% of the original data. However, some simplifying allowances were used to fill in some of the missing geodata. It is possible that a very tiny fraction of the data will have geodata that is slightly off from from the listed address. In such cases, however, the row address will share the same city, zip code, and county as the geo-point used to fill in said data. Again, I emphasize that achieving perfection here was not possible due to time constraints. However, given this allowance, I advise slight caution when using the data generated by this process to conduct analysis at the neighborhood level. My process is outlined in detail in the noteboks in the data folder.

Visualizations

The first visualization I built was a simple line chart showing sales by month. From this, we can see a general seasonal trend of higher in sales in the summer, but lower sales in the winter.

Monthly Sales

I also built a map showing average per capita consumption (in liters) across Iowa counties. Interestingly, Dickinson county had by far the highest rate of any county. Having spent a lot of time at the lakes growing up, I'm fairly certain the huge boating/party scene on Okoboji/Spirit Lake has a big impact on this value.

Per Capita

You can also check out this additional map showing the sales volume by locations in Iowa. Each dot represents a different store location with bigger circles indicating a higher number of sales.

Map Sales

Other Data Links

  1. Link to Dashboard
  2. Iowa County Population Data
  3. Iowa Populated Places
  4. Active and Inactive Iowa Liquor Stores

About

Exploring liquor sales data in Iowa from 2012-2023

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors