Skip to content

danielhx2001/College_Scorecard_Data_Cleaning_And_Analysis_Using_PostgreSQL

Repository files navigation

🎓 College Scorecard Institutional Analysis (2018–2019)

📌 Project Overview

This project analyzes institution-level data from the U.S. Department of Education’s College Scorecard to examine how institutional characteristics—such as ownership type, selectivity, and cost of attendance—are associated with student outcomes, including graduation rates, retention rates, and early employment outcomes.

The analysis focuses specifically on four-year, bachelor’s degree–granting institutions, with the goal of producing analysis-ready datasets and interpretable insights relevant to institutional research, higher education policy, and data analytics roles.

🎯 Project Goals

The main objectives of the project include:

  • Clean and prepare a large unstructured public data set for analysis using Postgre SQL
  • Explore how insitutional ownership (Public, Private Non-profit, Private For-profit) relates to students outcomes(retention and graduation)
  • Find out does school selectivity (admission rate) have relationship with students' graduation rates and job placement rates
  • Analyze the cost of attendance and students outcomes such as graduation rates and job placement rates

📊 Dataset

Source: U.S. Department of Education – College Scorecard
Academic Year: 2018–2019
Key Variables:

  • Institutional characteristics (ownership/control, degree offerings, enrollment size)
  • Admissions selectivity (admission rate)
  • Cost of attendance (public/private cost measures)
  • Student outcomes:
    • Retention Rate
    • Graduation Rate
    • Retention Rate
    • Employment outcomes one year after graduation

🛠️ Data Tools Used & Concepts

PostgreSQL

  • Cleaned and prepared large-scale public datasets, including type conversion, standardization of missing values, and construction of analysis-ready variables using COALESCE
  • Structured multi-step transformations with Common Table Expressions (CTEs)
  • Applied window functions (ROW_NUMBER, PERCENTILE_CONT, NTILE) to compute percentiles, rankings, and group-level benchmarks
  • Categorized institutions by ownership, selectivity tiers (admission rate quartiles), and cost tiers (cost of attendance quartiles) using CASE WHEN syntax
  • Compared graduation, retention, and employment outcomes across institutional groups using multiple aggregate functions

💡Conclusions from the analysis

Institution type: Public and private nonprofit schools have higher graduation and retention rates than private for-profit schools; private nonprofits are slightly higher than publics in graduation.

Selectivity: More selective institutions show higher graduation rates, but employment rates are similar across selectivity tiers (job quality/earnings not available).

Cost: Higher-cost institutions have higher graduation rates, while employment rates are relatively consistent across cost tiers.

About

This project analyzes the U.S. DOE College Scorecard Data to explore how institutional characteristics relate to student outcomes

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors