Merged
3 changes: 1 addition & 2 deletions .github/workflows/pull_request.yml
@@ -4,8 +4,7 @@
name: Pull Request

on:
-  pull_request:
-    branches: [ main, staging ]
+  pull_request

jobs:

57 changes: 31 additions & 26 deletions 01-data_types.Rmd
@@ -9,6 +9,7 @@ ottrpal::set_knitr_image_path()

# Clinical Data Types

Clinical data is health-related information collected from patients throughout their healthcare journey. It may come in many forms, and its sensitive nature requires careful management by researchers.

## Learning Objectives

@@ -20,52 +21,57 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1ivDTcLjb2078O0Ge

## Clinical data is unique

Clinical data comes from a wide variety of sources and as such, requires careful consideration when designing, collecting, and analyzing this data. Unlike domains such as finance or other areas in the sciences which predominantly use structured data, clinical data is often heterogeneous, integrating quantitative measurements, categorical data, subjective narratives from patient notes, and objective observations from doctor notes and even image analysis. Free text adds a layer of complexity with unstructured information, reflecting subjective patient experiences or qualitative insights from healthcare professionals. Furthermore, the contrast between patient and doctor notes reflects the dual perspectives of symptoms and formal diagnoses. In essence, clinical data's unique blend of structured and unstructured components, along with its multidisciplinary nature, necessitates specialized methodologies for comprehensive analysis and interpretation in the realm of healthcare.
Clinical data refers to information __collected from patients__ during healthcare delivery, clinical trials, and medical research. Because it comes from such a wide variety of sources, clinical data requires careful consideration when designing studies and when collecting and analyzing the data. Unlike finance or other scientific fields that predominantly use structured data with predictable and consistent formats, clinical data is often heterogeneous, integrating many forms of both structured and unstructured data: quantitative measurements, categorical data, genotyping, images, subjective narratives from patient notes, and objective observations or conclusions. Free text from notes, which captures subjective patient experiences or qualitative insights from healthcare professionals, adds a further layer of complexity because it is unstructured. The contrast between patient and doctor notes also reflects two perspectives: reported symptoms and formal diagnoses. In short, clinical data's blend of structured and unstructured components, along with its multidisciplinary nature, calls for specialized methods of analysis and interpretation. Finally, because clinical data contains sensitive, personal information about patients, its handling and management raise additional security and ethical concerns.

## Major clinical data types

Clinical data can come in many different forms, including

Clinical data can come in many different forms, including demographics, diagnoses, lab results, vital signs, medication records, procedures, genetic reports, images, scanned documents, and notes written by physicians, nurses, and other clinicians. Although any of these types of information might theoretically be available for use in clinical research, some sources are more accessible than others. As they are often stored directly in electronic medical record systems, notes, demographics, observations (labs, medications, procedures, and vitals), and images are the easiest data to work with, and are the focus of most Electronic Health Record (EHR) data research efforts.
* patient demographics
* medical history or records such as diagnoses, lab results, vital signs, medication records, or procedure history
* genetic reports
* health monitor data
* images
* scanned documents
* notes written by physicians, nurses, and other clinicians
* survey or case report form (CRF) responses
* and more

### Structured Data

Observational and demographic data are often collectively referred to as “structured data”, as they are stored in electronic health record databases and often provided to researchers in tabular form. Although details may vary based on the type of EHR being used, the customizations to the EHR for the specific environment in which data was collected, and any pre-processing that might be done by institutional research offices prior to providing data to researchers, details are likely to be familiar across contexts. Structured data types frequently used in EHR research include demographics, diagnoses, lab values, procedures, vitals, and medication records. Often provided as tables indexed off of a patient or visit ID, these tables
often include timestamps and other supporting descriptors. For example, medication orders might specify the drug name, class, dose, unit, quantity, route, frequency, and other instructions.


```{r fig.align='center', echo = FALSE, fig.alt= "Clinical data refers to information collected from patients during healthcare delivery, clinical trials, and medical research and could include demographics, medical history, lab results, imaging, treatment outcomes, and more", out.width="100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1ivDTcLjb2078O0GemkSeCgC1jmxk4fMsiFQaPaer9mQ/edit?slide=id.g35bf5e18bfe_0_0#slide=id.g35bf5e18bfe_0_0")
```

Some sources of clinical data are more prevalent and readily obtainable than others. For instance, notes, demographics, images, and histories or observational records (lab results, vitals, medications, procedures) are often stored directly in electronic medical record systems. This makes them more accessible, so they are the focus of most Electronic Health Record (EHR) data research efforts.

### Unstructured Data / Clinical Notes

### Structured data

Observational records such as test results and demographic data are often collectively referred to as “structured data”, as they are stored in electronic health record databases and often provided to researchers in tabular form. Structured data types frequently used in EHR research consist of comprehensive longitudinal records of a patient's interactions with a healthcare system and may include demographics, diagnoses, lab values, procedures, vitals, and medication records.

```{r, fig.align='center', echo = FALSE, fig.alt= "Electronic health records include many different kinds of records on individuals over time, including clinical notes, family history information, lab results, images, and medication information. The image shows data on the same individual over a period of time.", out.width="100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1ivDTcLjb2078O0GemkSeCgC1jmxk4fMsiFQaPaer9mQ/edit#slide=id.g338f75828af_0_7" )
```

Because these data are structured, they can be stored in tables indexed by a patient or visit ID, which often include timestamps and other supporting descriptors. For example, medication orders might specify the drug name, class, dose, unit, quantity, route, frequency, and other instructions.
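As a rough illustration of such a table, the base R sketch below builds a small, entirely hypothetical medication-order table, indexed by patient ID and timestamped, and then pulls out each patient's most recent order. The column names and values are assumptions chosen for illustration; real EHR extracts will follow their own institution-specific schemas.

```r
# Hypothetical medication-order table (illustrative values only):
# one row per order, indexed by patient ID and timestamped.
med_orders <- data.frame(
  patient_id = c("P001", "P001", "P002"),
  order_time = as.POSIXct(c("2023-01-05 08:30", "2023-02-10 09:15",
                            "2023-01-20 14:00"),
                          format = "%Y-%m-%d %H:%M", tz = "UTC"),
  drug_name  = c("metformin", "metformin", "lisinopril"),
  dose       = c(500, 1000, 10),
  unit       = c("mg", "mg", "mg"),
  route      = c("oral", "oral", "oral"),
  frequency  = c("twice daily", "twice daily", "once daily"),
  stringsAsFactors = FALSE
)

# Sort by patient, then by time, so each patient's orders are chronological.
med_orders <- med_orders[order(med_orders$patient_id, med_orders$order_time), ]

# Keep only the last (most recent) order for each patient.
latest <- med_orders[!duplicated(med_orders$patient_id, fromLast = TRUE), ]
print(latest[, c("patient_id", "order_time", "drug_name", "dose", "unit")])
```

Indexing by patient ID plus a timestamp is what makes this kind of "latest value per patient" query straightforward; the same pattern applies to lab values or vitals.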


Structured data tables often describe entries in terms of codes from standardized vocabularies. Diagnoses might be described with codes from the International Classification of Diseases (ICD) vocabulary, lab tests with Logical Observation Identifiers Names and Codes (LOINC), medications with National Drug Code (NDC), and procedures with Current Procedural Terminology (CPT) codes. These terms, or "billing codes", provide a common foundation that can be invaluable for identifying patients with a specific disease or who have received specified medications, particularly when integrating data from multiple sources.
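For example, a minimal base R sketch of identifying patients by diagnosis code might look like the following. The diagnosis table is hypothetical, and the sketch assumes codes are stored as ICD-10-CM strings that include the decimal point. In ICD-10-CM, codes in category `E11` denote type 2 diabetes mellitus. Real cohort definitions often add further criteria (for example, requiring more than one qualifying code or supporting lab values) rather than relying on a single code match.

```r
# Hypothetical diagnosis table: one row per recorded diagnosis code.
diagnoses <- data.frame(
  patient_id = c("P001", "P002", "P003", "P003"),
  icd10      = c("E11.9", "I10", "E11.65", "J45.909"),
  stringsAsFactors = FALSE
)

# Flag rows whose ICD-10-CM code falls under category E11
# (type 2 diabetes mellitus), then list each matching patient once.
is_t2d <- startsWith(diagnoses$icd10, "E11")
t2d_patients <- unique(diagnoses$patient_id[is_t2d])
print(t2d_patients)
```

Matching on the category prefix rather than on exact codes picks up all of the more specific `E11.x` subcodes in one step.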

```{r, fig.align='center', echo = FALSE, fig.alt= "Example of structured data, a table that includes patient id numbers, billing codes, dates, blood pressure measurements, weight measurements, height measurements, and prescribed medications coded using the National Drug Code (NDC)", out.width="100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1ivDTcLjb2078O0GemkSeCgC1jmxk4fMsiFQaPaer9mQ/edit#slide=id.g3385bea4ad0_0_0" )
```

## Specific types of clinical data
As described above, structured clinical data is expected to look broadly similar across settings, although specific details may vary with the type of EHR being used and with any customizations made for the specific environment or institution in which the data were collected (e.g., specialized pre-processing by institutional research offices before data are provided to researchers). Researchers who obtain data from multiple contexts or institutions should therefore account for such differences.

### Unstructured data / clinical notes

Clinical notes are, perhaps unsurprisingly, generally shared as seemingly straightforward text files. However, the simple format should not be taken as a suggestion that the data are easy to interpret. Some EHR systems contain literally dozens of types of notes, covering specialties such as pathology or surgery; specific moments in care such as admission or discharge; particular procedures such as colonoscopies; patient-provider interactions such as telehealth or phone encounters, and many others. In addition to differing in content, these sources may have different layouts and formats, ranging from free-form reports to structured SOAP (subjective, objective, assessment, and plan) formats or even templated procedure reports. Understanding the types of notes available in a given context and where relevant data might be found is a key step in effectively using clinical notes.


### Physiological
### Monitoring data
```{r, fig.align='center', echo = FALSE, fig.alt= "Unstructured Data - Data without specific format: includes images, pathology reports, radiology reports, clinical notes, discharge summaries and more. Includes an image of an x-ray and some clinical notes that states: Patient reports a sharp pain in the right lower abdomen for the past 24 hours. Temperature 98.6°F, BP 120/80, tenderness in the right lower quadrant. Suspected appendicitis. Refer to surgery for evaluation, initiate IV fluids, and pain management.", out.width="100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1ivDTcLjb2078O0GemkSeCgC1jmxk4fMsiFQaPaer9mQ/edit#slide=id.g3385bea4ad0_0_14" )
```

### Radiology

### Pathology
When used in EHR research, both structured data and clinical notes are generally de-identified to protect patient privacy. Patient ID numbers might be replaced with new identifiers, with linkages maintained by institutional “honest brokers” [@Dhir2008] charged with providing clinical data for research purposes. In some cases, dates may be changed as well. Clinical notes are generally “de-identified” through specialized software designed to remove names, dates, locations, and other sensitive details. Researchers working with institutions to access clinical data should be sure to understand local data de-identification practices.
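The base R sketch below is a deliberately simplified illustration of the ID-replacement and date-shifting ideas. The visit table, the `R0001`-style research IDs, and the shift window of up to 30 days in either direction are all hypothetical choices. The linkage table stands in for what an honest broker would keep private. This sketch is not a substitute for an institution's actual de-identification process, and it does not address free-text notes.

```r
set.seed(42)  # fixed seed only so the illustration is reproducible

# Hypothetical identified visit data held by the institution.
visits <- data.frame(
  patient_id = c("MRN1001", "MRN1001", "MRN2002"),
  visit_date = as.Date(c("2023-03-01", "2023-03-15", "2023-04-02")),
  stringsAsFactors = FALSE
)

# Linkage table kept by the honest broker (never released): each patient
# gets a random research ID and a random per-patient date offset in days.
ids <- unique(visits$patient_id)
linkage <- data.frame(
  patient_id  = ids,
  research_id = sprintf("R%04d", sample(length(ids))),
  shift_days  = sample(-30:30, length(ids), replace = TRUE),
  stringsAsFactors = FALSE
)

# Released data: swap in the research ID and shift every date for a patient
# by that patient's offset, so intervals between their visits are preserved.
released <- merge(visits, linkage, by = "patient_id")
released$visit_date <- released$visit_date + released$shift_days
released <- released[, c("research_id", "visit_date")]
print(released)
```

Using one offset per patient, rather than one per row, is what keeps within-patient timelines analyzable after shifting.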

### Synthetic Data

## How to acquire clinical data

@@ -79,5 +85,4 @@ When used in EHR research, both structured data and clinical notes are generally
### Metadata



## Conclusion
## Summary
77 changes: 77 additions & 0 deletions 01b-specific_data_types.Rmd
@@ -0,0 +1,77 @@
# Specific Clinical Data Types

## Learning Objectives

```{r, fig.align='center', echo = FALSE, fig.alt= "Learning Objectives: 1. Explain why clinical data is unique compared to other types of biomedical research data, 2. Describe the difference between Structured and Unstructured data, 3. List major sources and types of clinical data", out.width="100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1ivDTcLjb2078O0GemkSeCgC1jmxk4fMsiFQaPaer9mQ/edit#slide=id.g3385bea4ad0_0_30" )
```



## Physiological

## Monitoring data

:::bluebox
This section was written by: Jennifer Kelleher, Ph.D.^1^; Abigail S. Robbertz, Ph.D.^1^; and Meghan E. McGrady, Ph.D.^1,2^

**NOTE:** Jennifer Kelleher, Ph.D.^1^ and Abigail S. Robbertz, Ph.D.^1^ contributed equally.

^1^ Center for Adherence and Self-Management, Division of Behavioral Medicine and Clinical Psychology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA

^2^ Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA

The work discussed in this section was also supported by the National Cancer Institute at the National Institutes of Health (R21CA263704, K07CA200668) to MEM. JK and ASR are supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development at the National Institutes of Health (T32HD068223). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

:::

Electronic monitoring devices are digital tools that can be used to track health behaviors over time, such as:

* Sleep
* Physical activity
* Medication adherence
* Calorie intake

Electronic monitoring devices can also be used to assess physical health indicators, including:

* Blood glucose levels
* Blood pressure
* Heart rate and heart rate variability
* Oxygen saturation

Electronic monitoring devices enable researchers to track day-to-day health behaviors in the patient’s **"real-world" setting**. This allows researchers to explore patterns or changes in a patient’s health behavior and provides a richer understanding of **daily behavior over time**.


### Benefits of Monitoring Data

1) Electronic monitoring devices often include data transmission capabilities that enable healthcare providers or researchers to access these data in near **real time**, potentially informing intervention and/or medical decision-making.

1) Electronic monitoring devices also have the potential to produce **more accurate** estimates of health behaviors than alternative strategies (e.g., self-report), as they are not subject to recall bias and can detect attempts to inflate adherence because of social desirability.

### Considerations

:::warning
This section is not exhaustive. Research teams are strongly encouraged to consult with experts with experience and training in collecting and analyzing data from specific devices.

To ensure that outcome variables align with the research question of interest and that [ethical and age/developmental considerations](https://pmc.ncbi.nlm.nih.gov/articles/PMC10798216/) (@psihogios_ethical_2024; @modi_pediatric_2012) have been appropriately accounted for, readers are encouraged to consult researchers in their field who have integrated these measurement strategies into their work.
:::

#### Medication Adherence

There are three major components of medication adherence (the process by which patients take their medications as prescribed):

- Initiation: Starting a prescribed regimen
- Implementation: The extent to which a patient's actual medication-taking behavior corresponds to the prescribed regimen or protocol
- Discontinuation: Stopping a prescribed regimen
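The base R example below is a rough sketch of how the implementation component might be summarized from electronic monitoring data. It computes the proportion of prescribed doses taken over a fixed window. The example rests on several assumptions, which are illustrative choices rather than a standard:

- a hypothetical twice-daily regimen
- a hypothetical log of pill-bottle openings
- each opening counted as one dose taken
- extra openings on one day not making up for missed doses on other days

```r
# Hypothetical electronic-monitor log: timestamps of recorded bottle openings.
openings <- as.POSIXct(c(
  "2024-05-01 08:02", "2024-05-01 20:10",
  "2024-05-02 08:15",
  "2024-05-03 07:55", "2024-05-03 19:40", "2024-05-03 21:05"
), format = "%Y-%m-%d %H:%M", tz = "UTC")

prescribed_per_day <- 2  # assumed twice-daily regimen
study_days <- format(seq(as.Date("2024-05-01"), as.Date("2024-05-04"), by = "day"))

# Count openings per calendar day, keeping days with zero openings.
opening_days <- format(openings, "%Y-%m-%d", tz = "UTC")
per_day <- table(factor(opening_days, levels = study_days))

# Cap each day at the prescribed number so extra openings don't offset misses.
doses_counted <- pmin(as.integer(per_day), prescribed_per_day)

# Implementation: share of prescribed doses taken across the window.
implementation <- sum(doses_counted) / (prescribed_per_day * length(study_days))
print(implementation)
```

Here the monitor records five qualifying openings out of eight prescribed doses. A monitor of this kind records package openings rather than confirmed ingestion, which is one reason to involve experts in choosing how adherence is operationalized.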

For more information see:

- [A new taxonomy for describing and defining adherence to medications (Vrijens et al., 2012)](https://pmc.ncbi.nlm.nih.gov/articles/PMC3403197/)
- [Pediatric self-management: A framework for research, practice, and policy. (Modi et al., 2012)](https://pmc.ncbi.nlm.nih.gov/articles/PMC9923567/)

## Radiology

## Pathology

## Synthetic Data

## Summary