PopHIVE/PopHIVE_DataHub

Overview

Enclosed is the code used to pull data updates, process them, and serve lightweight versions for use on PopHIVE.org. This repository includes the legacy code for regularly interacting with APIs and scripts used to prepare data for visualization on the website.

The code was copied from a repository in the YSPH Data Science and Data Equity (YSPH-DSDE) GitHub Organization into the PopHIVE GitHub Organization on May 6th, 2026. At the time of transfer, the YSPH-DSDE codebase was no longer receiving active contributions and had been archived.

The replacement codebase can be found here.

Dates active: Feb 28th, 2025 – Apr 21st, 2026

Using these data

The data shown on PopHIVE.org are found in ./Data/Webslim/. Most of these files are stored in Parquet format. In R, they can be read with the arrow package. For example:

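# Load the arrow package, which provides read_parquet()
library(arrow)

# URL of one of the parquet files under ./Data/Webslim/
url1 <- 'https://github.com/ysph-dsde/PopHIVE_DataHub/raw/refs/heads/main/Data/Webslim/respiratory_diseases/rsv/ed_visits_by_county.parquet'

# Read the parquet file into a data frame
ds1 <- read_parquet(url1)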

In general, the data closest to the source data are found in the 'value' column. Some datasets also include a 3-week moving average (value_smooth) and a smoothed value scaled to the range 0-100 (value_smooth_scale). The data in 'value' are generally drawn directly from the source data. Exceptions include:

  1. In some datasets where national level data were not provided by the source, a national average is calculated using a population-weighted average.

  2. For Epic Cosmos, if the data are based on fewer than 10 counts, the cell is suppressed. For visualization purposes, this is filled in with a value halfway between 0 and the minimum value reported for that state. These values are indicated with suppressed_flag=1.
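
As a rough illustration of the two exceptions above, the following base R sketch uses small made-up data frames. The column names geography and population, and all numbers, are assumptions for illustration only; only 'value' and suppressed_flag come from the description above. It also assumes that "the minimum value reported for that state" means the smallest non-suppressed value for the state. This is not necessarily how the repository's own scripts do it.

```r
# Hypothetical state-level values; `geography` and `population` are assumed names.
state_df <- data.frame(
  geography  = c("CT", "NY", "TX"),
  value      = c(2.0, 4.0, 3.0),
  population = c(3.6e6, 19.6e6, 30.0e6)
)

# Exception 1: a national value computed as a population-weighted average.
national_value <- weighted.mean(state_df$value, w = state_df$population)

# Hypothetical Epic Cosmos-style series where NA marks a suppressed cell.
cosmos_df <- data.frame(
  geography = c("CT", "CT", "CT", "NY", "NY"),
  value     = c(1.2, NA, 0.8, NA, 2.0)
)

# Flag suppressed cells before filling them.
cosmos_df$suppressed_flag <- as.integer(is.na(cosmos_df$value))

# Minimum reported (non-suppressed) value per state, repeated for every row.
state_min <- ave(cosmos_df$value, cosmos_df$geography,
                 FUN = function(v) min(v, na.rm = TRUE))

# Exception 2: fill suppressed cells halfway between 0 and the state minimum.
# CT: 0.8 / 2 = 0.4; NY: 2.0 / 2 = 1.0
cosmos_df$value <- ifelse(cosmos_df$suppressed_flag == 1,
                          state_min / 2,
                          cosmos_df$value)
```

When analyzing the data, rows with suppressed_flag equal to 1 hold filled-in values rather than reported ones, so they can be dropped, for example with subset(cosmos_df, suppressed_flag == 0).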

Time-stamped archives of the data are available in the Pulled Data folder.
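
For readers who want to recompute smoothed columns from 'value' (for example, on an archived file), here is a minimal base R sketch of a 3-week moving average and a 0-100 rescaling. The text above does not say whether the window is trailing or centered, or how the scaling is anchored, so the trailing window and min-max scaling used here are assumptions, and the numbers are made up.

```r
# Hypothetical weekly values for one series.
value <- c(3, 6, 9, 12, 9, 6)

# 3-week moving average. ASSUMPTION: a trailing window (current week plus the
# two preceding weeks); the first two weeks have no full window and are NA.
value_smooth <- as.numeric(stats::filter(value, rep(1 / 3, 3), sides = 1))

# Rescale the smoothed series to 0-100. ASSUMPTION: min-max scaling, so the
# smallest smoothed value maps to 0 and the largest to 100.
rng <- range(value_smooth, na.rm = TRUE)
value_smooth_scale <- 100 * (value_smooth - rng[1]) / (rng[2] - rng[1])

smoothed <- data.frame(value, value_smooth, value_smooth_scale)
```

stats::filter is called with its namespace because dplyr, which is often loaded alongside arrow, exports a different function named filter.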

FAQ

Can I re-use the data from PopHIVE?

Yes! Much of the data are drawn from publicly available Federal datasets obtained from CDC or data.gov. Other data, including the results of research performed using Epic Cosmos or the data available through Google Health Trends, can be used with appropriate attribution. A suggested citation for these data is 'Results of research performed with Epic Cosmos were obtained from the PopHIVE platform [url for GitHub corresponding to the specific data source].'

Who is it for?

PopHIVE is designed for a broad audience:

- Members of the public who want to understand what’s happening in their communities.
- Clinicians who need to anticipate trends and adjust care.
- Public health departments and local governments who need up-to-date data to allocate resources.
- Researchers, journalists, and advocates working to tell stories and drive policy change.
- Policy makers and decision-makers who need to understand the basics of who, what, and where about health issues occurring in the areas they serve.

Can you show ZIP code-level data?

Because the data is de-identified, we can’t always go down to ZIP code level, especially for sensitive conditions like STIs or mental health outcomes. For some topics, like asthma or heat-related illness, we can show more granular data. Our data team is constantly working to expand local detail while protecting individual privacy.

Will you show additional conditions in the future?

Yes. PopHIVE is evolving based on user needs and feedback. As high-quality, de-identified data becomes available, we plan to expand condition-specific dashboards, such as those for diabetes, maternal health, and behavioral health. Please provide us feedback on what you’d like to see here.

How do I know the data is accurate or reliable?

PopHIVE’s data team continually evaluates the quality and representativeness of the data. In some cases (like diabetes Hemoglobin A1C data), completeness varies, and we are committed to transparency about what the data can and can’t tell us. This is an evolving platform, and we're building new functionality and insights over time.

How are you using electronic health record data from Epic? Isn’t that a violation of HIPAA?

PopHIVE doesn’t change any rules or regulations around health data sharing. We only use de-identified, aggregate data, following all existing privacy laws. We’re not sharing individual patient records; we’re simply making existing public health trends more timely and accessible for the public good.

Are you accepting additional data sets?

Yes! We welcome partnerships and are actively working to expand PopHIVE’s data offerings. If you have a reliable, de-identified dataset that could help improve public understanding of health, we’d love to hear from you. Please submit here.

How can I give feedback on this tool?

We’d love to hear from you. PopHIVE is shaped by the people who use it. Whether you have a technical suggestion, want to request a feature, or share how it helped your community, please submit here.

Currently available data

| Category | Source | Description | File(s) on PopHIVE | Restrictions |
| --- | --- | --- | --- | --- |
| Respiratory Diseases | Google Health Trends | This represents the volume of Google searches for ‘RSV’, statistically adjusted to remove searches related to RSV immunizations. Unadjusted search volumes can be accessed here. | Weekly time series of RSV for multiple indicators | Non-commercial purposes |
| Respiratory Diseases | Epic Cosmos | Percentage of ED visits due to RSV, influenza, or COVID-19, based on ICD-10 coding | Weekly time series of RSV, influenza, and COVID-19 for multiple indicators by state | Can be used with attribution (see FAQ) |
| Respiratory Diseases | CDC National Respiratory and Enteric Virus Surveillance System | Number of positive tests for RSV, by health and human services region. | RSV positive tests by region | - |
| Respiratory Diseases | CDC National Wastewater surveillance program | Viral concentration for RSV, influenza, or SARS-CoV-2 in wastewater. | Weekly time series of RSV, influenza, and SARS-COV-2 for multiple indicators by state | - |
| Respiratory Diseases | CDC National Syndromic Surveillance Program | Percentage of ED visits due to RSV, influenza, or COVID-19. | Weekly time series of RSV, influenza, and COVID-19 for multiple indicators by state | - |
| Respiratory Diseases | [CDC RESP-NET](https://data.cdc.gov/Public-Health-Surveillance/Rates-of-Laboratory-Confirmed-RSV-COVID-19-and-Flu/kvib-3txy/about_data "The CDC's Respiratory Virus Hospitalization Surveillance Network (RESP-NET) monitors laboratory-confirmed hospitalizations associated with influenza, COVID-19, and respiratory syncytial virus (RSV) among children and adults. The data are collected from hospitals in selected counties and county equivalents. This dataset has several important advantages: the area around the hospitals is well described, so rates of disease adjusted for population size can be accurately reported. The selected counties include ~10% of the US population and are demographically representative of the country. Detailed patient demographic information is available, and officials actively search for cases to ensure they capture all cases in the data. A limitation is that the network relies on the clinicians to perform viral ...") | Number of laboratory-confirmed hospitalizations due to the virus per 100,000 people. | Weekly time series of RSV, influenza, and COVID-19 for multiple indicators by state | - |
| Respiratory Diseases | CDC Active Bacterial Core Surveillance (ABCs) | The number of cases of invasive pneumococcal disease by age group, year, and serotype, 1998-2023. For 2018, state-specific breakdowns are provided | Serotype-specific IPD by year, Number of IPD cases by state for 2018 | - |
| Respiratory Diseases | Surveillance for serotype-specific pneumococcal pneumonia | Comparison of invasive pneumococcal disease and pneumonia | Comparison of IPD and pneumonia | - |
| Childhood Immunizations | CDC National Immunization Survey | Estimates of immunization coverage by vaccine, age, and state. | Immunization rates | - |
| Childhood Immunizations | CDC National Immunization Survey | Estimates of immunization coverage by vaccine, age, and state, and by urbanicity of the county/city of residence. | Immunization rates | - |
| Childhood Immunizations | CDC National Immunization Survey | Estimates of immunization coverage by vaccine, age, and state, and by insurance status. | Immunization rates | - |
| Chronic diseases | Epic Cosmos | Percentage of 'active users' in Epic Cosmos who have a history of measurements indicating diabetes (Hemoglobin A1C ≥7%) | | |

Legal Disclaimer

These data and PopHIVE statistical outputs are provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors, contributors, or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the data or the use or other dealings in the data.

The PopHIVE statistical outputs are research tools intended for use in the fields of public health and medicine. They are not intended for clinical decision making, are not intended to be used in the diagnosis or treatment of patients and may not be useful or appropriate for any clinical purpose. Users of the PopHIVE statistical outputs should be aware of their responsibilities to ensure the ethical and appropriate use of this technology, including adherence to any applicable legal and regulatory requirements.

The content and data provided with the statistical outputs do not replace the expertise of healthcare professionals. Healthcare professionals should use their professional judgment in evaluating the PopHIVE statistical outputs.

About

First prototype for automated data retrieval, cleaning, and formatting for display on PopHIVE.org.
