
#Leveraging Current Surveillance, Epidemiology, and End Results (SEER) Data Elements to Characterize Receipt of Neoadjuvant Treatment

BIOF 309 Spring 2019 Final Project

Melissa Bruno

#Background

Neoadjuvant therapy, also referred to as induction therapy, is generally defined as systemic therapy given before localized cancer treatment.
Routine and accurate collection of this treatment sequence is essential for understanding therapeutic effectiveness and for guiding treatment planning in cancer care.
Problem: a standardized definition for neoadjuvant data collection does not exist in the literature.

#Objective

We aim to investigate whether existing Surveillance, Epidemiology, and End Results (SEER) data elements, as collected and transmitted through SEER, can support an algorithm that calculates a score characterizing the likelihood that a patient received neoadjuvant treatment.

#Methods

- Dataset: SEER 2010-2016 colon cancer cases.
- The algorithm will use a set of 35 data elements, selected as the most neoadjuvant-informative variables, for the score calculation.
- These indicator variables will then be translated into the following categories to help calculate neoadjuvant treatment scores: no neoadjuvant treatment, unlikely, possible, definite neoadjuvant treatment, unindicative, and unknown.
- The algorithm will then be validated using the SEER*Medicare linked dataset.
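
The translation of indicator variables into categories could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `categorize_indicator`, the lookup table `HYPOTHETICAL_RULES`, the element name `treatment_sequence`, and the raw values in the table are all hypothetical and do not come from the project. Only the six category labels are taken from the Methods text.

```python
# The six categories named in the Methods section.
CATEGORIES = (
    "no neoadjuvant treatment",
    "unlikely",
    "possible",
    "definite neoadjuvant treatment",
    "unindicative",
    "unknown",
)

# HYPOTHETICAL lookup: (data element, raw value) -> category.
# The element name and raw values are placeholders, not real SEER codes,
# and the assigned categories are illustrative, not clinical guidance.
HYPOTHETICAL_RULES = {
    ("treatment_sequence", "systemic therapy before surgery"): "definite neoadjuvant treatment",
    ("treatment_sequence", "systemic therapy after surgery"): "unlikely",
    ("treatment_sequence", "no systemic therapy"): "no neoadjuvant treatment",
}


def categorize_indicator(element, raw_value):
    """Translate one raw data-element value into one of the six categories."""
    # Missing or blank values carry no usable information about the patient.
    if raw_value is None or str(raw_value).strip() == "":
        return "unknown"
    key = (element, str(raw_value).strip().lower())
    # Values without a rule are treated as not indicative either way.
    return HYPOTHETICAL_RULES.get(key, "unindicative")


print(categorize_indicator("treatment_sequence", "Systemic therapy before surgery"))
```

Keeping the rules in a plain dictionary means each of the 35 elements could get its own entries without changing the function. How the per-element categories would be combined into a single score is not specified here and is left open.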

#First steps with Python

1. Import data as a CSV file
2. View various aspects of the dataset to ensure import was done correctly
3. Data wrangling
4. Exploratory Data Analysis (EDA)
5. Creation of new neoadjuvant variables

#Steps 1-3 (import and wrangling)

    import pandas as pd

    # Read the SEER colon cancer extract into a DataFrame
    neocolon = pd.read_csv('/Users/mbruno2/Documents/neo_colon.csv')

    neocolon.head()      # first five rows
    neocolon.tail()      # last five rows
    neocolon.describe()  # summary statistics for numeric columns
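
Beyond `head()`, `tail()`, and `describe()`, a few further checks may help confirm that the import worked as intended. The sketch below is a minimal example, not part of the original analysis: the helper `summarize_import` is a hypothetical name, and the tiny in-memory CSV with made-up rows and illustrative column names stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for the real file: a tiny in-memory CSV with made-up rows.
# The column names are illustrative, not the actual SEER export.
sample_csv = io.StringIO(
    "Patient ID,Age,Year of diagnosis\n"
    "1,64,2010\n"
    "2,,2013\n"
    "3,71,2016\n"
)
neocolon = pd.read_csv(sample_csv)


def summarize_import(df):
    """Collect basic facts that help confirm a CSV was read as expected."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        # Column types: numbers read as text often signal a parsing problem.
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Count of missing values in each column.
        "missing_per_column": df.isna().sum().to_dict(),
    }


summary = summarize_import(neocolon)
print(summary)
```

Comparing the row and column counts against the source export, and scanning the column types for unexpected text columns, tends to catch delimiter or header problems early.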

Creating a list of variable names:

    for col in neocolon.columns:
        print(col)

Renaming the age variable and viewing observations under 18 years old:

    neocolon.rename(columns={'Age recode with <1 year olds': 'Age'}, inplace=True)
    print(neocolon[neocolon['Age'] < 18])
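
The remaining steps, exploratory data analysis and creation of new neoadjuvant variables, are not shown above. One way they might look is sketched below, purely as an assumption-laden illustration: the column `Treatment sequence`, its values, and the derived column `neoadj_flag` are hypothetical placeholders rather than actual SEER variable names or codes, and the rows are made up.

```python
import pandas as pd

# Made-up rows; "Treatment sequence" and its values are hypothetical
# placeholders, not actual SEER variable names or codes.
neocolon = pd.DataFrame(
    {
        "Age": [45, 67, 72, 58],
        "Treatment sequence": [
            "Systemic therapy before surgery",
            "Systemic therapy after surgery",
            "Systemic therapy before surgery",
            None,
        ],
    }
)

# Step 4 (EDA): frequency of each value, keeping missing values visible.
sequence_counts = neocolon["Treatment sequence"].value_counts(dropna=False)
print(sequence_counts)

# Step 5: derive a new indicator column from the existing one.
# Rows with a missing sequence stay missing instead of becoming False.
before_surgery = neocolon["Treatment sequence"].eq("Systemic therapy before surgery")
neocolon["neoadj_flag"] = before_surgery.where(neocolon["Treatment sequence"].notna())
print(neocolon)
```

Leaving missing values as missing keeps "no evidence" distinct from "evidence against", which mirrors the separate unknown category in the Methods section.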
