JayCata/Working-with-Data-VSP-2023

Working with Big Data Syllabus

Instructor: Josh Catalano (jcatal@student.ubc.ca)
Office Hours: TBD
Class Hours: 9:00 AM - 12:00 PM
Class Room: ANGU 241

Learning Materials

Please bring the following items to every class

  • Laptop
  • Laptop charger
  • Paper notebook (or tablet)
  • Writing utensil (or stylus for your tablet)

Course Overview

With the rapid proliferation of larger and more complicated datasets, wrangling, analyzing, characterizing, interpreting, and generally deriving value from data are becoming increasingly important skills. In this course, students will learn the basics of computational programming and data analysis by applying that knowledge to computer-based coding exercises, real data sets, and their own project. When relevant, we will examine the material we are learning in a big data context.

This course roughly follows (and at points, borrows examples and code from) the material laid out in QuantEcon's lecture notes "Introduction to Economic Modeling and Data Science", with some deviations in topics, course materials, and mathematical sophistication. In particular, we do not cover the economics-specific material, nor do we assume that students have the same mathematical prerequisites. The necessary math will be taught as part of this course. While this course is self-contained and requires no economic knowledge, QuantEcon is a valuable alternative reference for the topics covered in this class and for more advanced topics.

Learning Objectives

Students will begin by learning the fundamentals of coding in Python (a popular open-source programming language) using Jupyter notebooks. Simultaneously, they will learn to use GitHub for code collaboration and basic version control. Afterwards, we provide a hands-on introduction to various topics including

  • Data visualization and mapping
  • Basic mathematics for data analysis
  • Data manipulation using the Pandas Python library
  • Scientific computing using the NumPy Python library

With a working knowledge of these tools, students will be ready to learn about and implement various regression and classification techniques such as

  • Linear Regression
  • Lasso Regression
  • Regression Trees
  • Logistic Regression
  • Classification Trees

Finally, the class will conclude with a lecture on Neural Networks.

Course Structure

This course consists of 13 three-hour classes. Roughly the first half of each class will be dedicated to lecture. After lecture, there will be a short break if time allows. The remainder of the class period will be spent working on a lab assignment or the final project. Students will generally finish the lab assignments in class. Any lab assignment not finished in class must be submitted within 48 hours of the end of that class.

Students will also work on a group project both in and out of class. On the final exam day, students will present their project instead of taking a final exam. More details on that final project and presentation will be shared in class.

Course materials will be provided on GitHub. Canvas will only be used for assignment submission and keeping track of grades.

Grading

  • 50% In-class lab assignments
  • 35% Project
  • 10% Project Presentation
  • 5% Project Proposal

Detailed Timeline

Note that the following is an ambitious timeline for the class and is subject to change.

Day 1 (July 18th)

Lecture: Big Data, course description, introduction to GitHub, Jupyter, & Python
Lab: Sign up for GitHub, access Jupyter Open, create & clone repositories

Day 2 (July 19th)

Lecture: Python fundamentals & collections
Lab Assignment 1: Python fundamentals & collections

Day 3 (July 21st)

Lecture: Control flow & functions
Lab Assignment 2: Control flow & functions

Day 4 (July 24th)

Lecture: Introduction to data & Pandas
Lab: Introduction to project, getting into groups, start project

Day 5 (July 25th)

Lecture: More data & Pandas
Lab Assignment 3: Data & Pandas

Day 6 (July 26th)

Lecture: Introduction to arrays, matrix algebra, & NumPy
Lab Assignment 4: Arrays, matrix algebra, & NumPy

Day 7 (July 27th)

Lecture: Basic plotting & mapping
Lab: Working on project in groups. You should be collecting data, figuring out topic(s)/story, thinking of visualizations, and preparing your proposal.

Day 8 (July 28th)

Lecture: Introduction to regression
Lab: Working on project in groups. You should be coding up preliminary descriptive statistics and visualizations that are relevant to your selected topic(s).

Day 9 (July 31st)

Lecture: More regression
Lab Assignment 5: Regression

Day 10 (August 1st)

Lecture: Introduction to Classification
Lab: Working on project in groups. You should be refining descriptive statistics and figures and beginning to write up some text in Jupyter Notebook.

Day 11 (August 2nd)

Lecture: Introduction to Neural Networks & PyTorch
Lab: Working on project in groups. You should be integrating figures, descriptive statistics, and writing into Jupyter Notebook. You should also start thinking of which classification and/or regression models you will run.

Day 12 (August 3rd)

Lecture: Neural networks & PyTorch continued
Lab: Working on project in groups. You should be running your analyses and thinking about how to frame them.

Day 13 (August 4th)

Lecture: Final notes and end-of-program evaluation survey
Lab: Working on project in groups. You should be finalizing your analyses, polishing the project, running details by me, preparing for the presentation, and doing anything else that needs to be done. At this point, you want to be almost finished.

Day 14 (August 8th)

When you get to class, you will receive an additional hour to work on your project or presentation. At the end of that hour, your project must be submitted to me on Canvas. Then, each group will present their project during the remaining two hours.

About

This is the GitHub repository for the VSP 2023 Working with Data course.
