<script src="https://rawcdn.githack.com/oscarmorrison/md-page/master/md-page.js"></script>
<!--This script was sourced from https://github.com/oscarmorrison/md-page and is used to format latex/other markdown items on GitHub.-->
<noscript
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Liegel_Capstone.htm</title>
<link rel="stylesheet" href="https://stackedit.io/style.css" />
</head>
<body class="stackedit">
<div class="stackedit__html"><h1 id="uncovering-potential-environmental-and-social-links-to-chronic-disease-frequency-with-penalized-regression">Uncovering Potential Environmental And Social Links to Chronic Disease Frequency With Penalized Regression</h1>
<p>By Taylor Liegel<br>
The page is hosted on GitHub <a href="https://ts2002.github.io/Capstone/Liegel_Capstone_Github.htm.html#">here</a>.
</p>
<h2 id="abstract">Abstract</h2>
<p>In recent years, the effects of the environment on health have become more widely known and discussed. This research leverages penalized regression models to perform variable selection and model creation to predict disease frequency on a census-tract level. Through the incorporation of both environmental and social variables, models were created to reliably predict Cancer and Asthma frequencies and to provide insight into the strongest predictors of both disease types.</p>
<h2 id="introduction">Introduction</h2>
<p>Cumulative impacts are defined by the CDC as the “total harm to human health” resulting from exposure to “environmental burden[s], pre-existing health conditions, and social factors” [1, pp. 5]. One well-known incident of widespread harm originating from environmental hazards occurred in the community of Love Canal, New York. The primarily working-class community was developed on a former chemical waste site, resulting in devastating health effects on its residents years after its foundation [2]. These effects included diseases such as asthma or epilepsy, as well as birth defects [2]. While chemical hazards are only one part of a larger picture of potential detriments to human health, this case illustrates how environmental and socioeconomic factors play a role in which demographics are affected by cumulative impacts and how. The purpose of this research is to use the CDC’s Environmental Justice Index (2022) to identify, at the census-tract level, which environmental and sociological factors are the strongest predictors of specific chronic disease types, and whether they can be used to reliably predict disease frequency [1, pp. 6].</p>
<p>This analysis will be performed by tuning penalized regression models, utilizing soft thresholding to obtain all variables weighted over a specific value, and building models that predict census-tract level disease frequency (Asthma, Cancer) with the resulting subset of features. The specific penalty types that will be used are smoothly clipped absolute deviation (SCAD) [3], ElasticNet [4], and SqrtLasso [5].</p>
<h2 id="description-of-data">Description of Data</h2>
<p>The Environmental Justice Index (2022) is the latest version of the EJI produced by the CDC. The EJI is an agglomeration of several different government datasets sourced from the Census Bureau, the EPA, the Mine Safety and Health Administration, and the CDC [1, pp. 10]. The purpose of the dataset is to quantify, at the census-tract level, the cumulative impacts experienced by communities and the frequency of chronic disease [1, pp. 10]. Data collection covers three major modules: the Environmental Burden Module, the Social Vulnerability Module, and the Health Vulnerability Module, with a total of 36 unique factors and ten sub-domains [1, pp. 10].</p>
<p>The purpose of the Environmental Burden Module is to quantify “the sum of activities that cause environmental pollution or negatively affect environmental and human health” [1, pp. 25]. This includes types of pollution, such as air, water, or toxic waste pollution, but also proximity to congested infrastructure or consequences of specific community layouts. The Social Vulnerability Module intends to quantify factors that may “influence [a community’s ability] to respond to environmental hazards or influence environmental decision-making” [1, pp. 14]. These factors include ethnic demographic information, socioeconomic indicators, household characteristics, and housing types. Finally, the Health Vulnerability Module is intended to demonstrate “the prevalence of certain pre-existing health conditions”, specifically Asthma, Cancer, High Blood Pressure, and Diabetes, which may “[demonstrate] a measurable form of biological susceptibility” [1, pp. 14].</p>
<figure>
<img src="https://www.atsdr.cdc.gov/placeandhealth/eji/img/EJI-Indicators.jpg?_=66144" width="600" height="600">
<div align="center">
<figcaption>Fig 1.1) An overview of EJI's Domains, Sub-modules and Variables [1] </figcaption>
</div>
</figure>
<p>From the pool of features within the dataset, specific variables must be excluded, either because of potential collinearity or because they are themselves the output of another predictive model. This includes features that are census percentile ranks, or combinations of other variables such as the specific domain values. It also includes “EP_TOTCR”, the estimated percentage risk of an individual developing cancer given the presence of different air toxics, which is produced by a predictive model trained on the EPA’s 2014 National Air Toxics Assessment [1, pp. 28].</p>
<p>This left thirty viable features to perform analysis with. These features included:</p>
<ul>
<li>E_OZONE : The annual mean days above O3 regulatory standard : 3 year average (2014-2016)</li>
<li>E_PM : The annual mean days above PM2.5 regulatory standard : 3 year average (2014-2016)</li>
<li>E_DSLPM : Ambient concentration of diesel PM/m3 (2014)</li>
<li>E_NPL: Proportion of tract’s area within 1-mi buffer of EPA National Priority List Site (Superfund site)</li>
<li>E_TRI: Proportion of tract’s area within 1-mi buffer of EPA Toxic Release Inventory Site</li>
<li>E_TSD : Proportion of tract’s area within 1-mi buffer of EPA Treatment, Storage and Disposal site</li>
<li>E_RMP: Proportion of tract’s area within 1-mi buffer of EPA risk management plan site</li>
<li>E_COAL: Proportion of tract’s area within 1-mi buffer of coal mines</li>
<li>E_LEAD : Proportion of tract’s area within 1-mi buffer of lead mines</li>
<li>E_PARK: Proportion of tract’s area within 1-mi buffer of green space (2020)</li>
<li>E_HOUAGE: Percentage of houses built pre-1980 (lead-exposure)</li>
<li>E_RAIL: Proportion of tract’s area within 1-mi buffer of railroad (2020)</li>
<li>E_ROAD : Proportion of tract’s area within 1-mi buffer of high-volume road or highway (2020)</li>
<li>E_AIRPRT: Proportion of tract’s area within 1-mi buffer of airport (2020)</li>
<li>E_IMPWTR: Percent of tract that intersects an impaired/impacted watershed at the HUC12 level.
<ul>
<li>Note: A watershed may be defined as “impaired” as a result of “elevated levels of waterborne pathogens or significant contamination by toxic substances” [1, pp. 39]</li>
</ul>
</li>
<li>EP_MINRTY: Percentage of minority persons (2015-2019)</li>
<li>EP_POV200: Percentage below 200% poverty</li>
<li>EP_NOHSDP: Percentage of persons with no high school diploma (age 25+) estimate (2015-2019)</li>
<li>EP_UNEMP: Percentage of persons who are unemployed (2015-2019)</li>
<li>EP_RENTER: Percentage of persons who rent (2015-2019)</li>
<li>EP_HOUBDN : Percentage of households that make less than 75,000 (2015-2019)</li>
<li>EP_UNINSUR : Percentage of persons who are uninsured</li>
<li>EP_NOINT: Percentage of persons without internet</li>
<li>EP_AGE65 : Persons aged 65 and older estimate (2014-2018)</li>
<li>EP_AGE17 : Persons aged 17 and younger estimate (2014-2018)</li>
<li>EP_DISABL : Percentage of civilian noninstitutionalized population with a disability estimate (2014-2018)</li>
<li>EP_MOBILE : Percentage of mobile homes estimate (2015-2019)</li>
<li>EP_GROUPQ : Percentage of persons in group quarters estimate (2014-2018)</li>
</ul>
<p>For my target values, I used EP_ASTHMA and EP_CANCER, which estimate the percentage of individuals with asthma and cancer, respectively, within a census tract.</p>
<p>More information about specific variables and their collection process can be found at the CDC’s EJI (2022) Data Dictionary [16].</p>
<h2 id="methods">Methods</h2>
<h3 id="pre-processing-methods">Pre-processing methods</h3>
<p>As the dataset was a processed collection of different datasets, relatively little preprocessing was needed. However, some census tracts had no values for certain features, so after subsetting the data to the relevant features and targets, rows with NaN values were dropped from the subset dataframe.</p>
<p>Additionally, as part of the model tuning, I included four different scaling options: a Standard scaler, a Min Max scaler, a Quantile Transformer scaler, and no scaler at all before passing the feature data to a model.</p>
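<p>The subset-then-drop step described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used in this research: the function name <code>drop_incomplete_rows</code> and the toy tract values are invented for the example. With a pandas dataframe the same step is typically a column selection followed by <code>DataFrame.dropna</code>.</p>

```python
import math


def drop_incomplete_rows(rows, columns):
    """Keep only `columns`, then drop any row with a missing value in them."""

    def is_missing(value):
        # Treat both None and float NaN as missing.
        return value is None or (isinstance(value, float) and math.isnan(value))

    # Step 1: subset every row to the relevant features and targets.
    subset = [{col: row.get(col) for col in columns} for row in rows]
    # Step 2: drop rows where any of the kept values is missing.
    return [row for row in subset if not any(is_missing(v) for v in row.values())]


# Toy census-tract records (values are invented for illustration only).
tracts = [
    {"E_OZONE": 1.2, "EP_POV200": 30.1, "EP_ASTHMA": 9.8, "OTHER": 5},
    {"E_OZONE": float("nan"), "EP_POV200": 12.4, "EP_ASTHMA": 8.1, "OTHER": 7},
    {"E_OZONE": 0.4, "EP_POV200": None, "EP_ASTHMA": 10.3, "OTHER": 2},
]
clean = drop_incomplete_rows(tracts, ["E_OZONE", "EP_POV200", "EP_ASTHMA"])
```

<p>Subsetting before dropping matters: a tract is only discarded when a value is missing in a column that is actually used for modeling.</p>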
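<p>To make two of the scaling options concrete, the sketch below applies standardization and min-max scaling to a single feature column in plain Python. It is an illustration under stated assumptions rather than the implementation used here: the function names are hypothetical, the population standard deviation is assumed for standardization, and the quantile transformation is omitted for brevity.</p>

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Center a column to mean 0 and scale it to unit (population) variance."""
    mu = fmean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def min_max_scale(column):
    """Rescale a column linearly so its values lie in [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


standardized = standard_scale([1.0, 2.0, 3.0])
rescaled = min_max_scale([2.0, 4.0, 6.0])
```

<p>Scaling matters for penalized regression because the penalty acts on coefficient magnitudes, and those magnitudes depend on the units of each feature.</p>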
<h2 id="the-analyticalmachine-learning-methods">The analytical/machine learning methods</h2>
<h3 id="penalized-regression--model-introductions">Penalized Regression & Model Introductions</h3>
<p>Penalized regression is often used for variable selection: the objective function penalizes large coefficient values, pushing coefficients toward zero [6]. The goal of a penalized regression method is to approximate the sparsity pattern of the variables, that is, the set of weights in the fitted coefficient vector that are non-zero [7]. Variables identified by the sparsity pattern can be used to develop lower-dimensional models that may perform better than more complex models [6]. In this research, penalized regressors are applied to the EJI (2022) dataset [1] to determine a subset of features related to disease frequency.</p>
<p>The three penalty types used within the penalized regression models are the smoothly clipped absolute deviation (SCAD) penalty [3], ElasticNet [4], and Square-root Lasso [5]. Each uses a different objective function, which consists of a traditional loss function plus a penalty function whose value depends on the weights of the feature coefficients.</p>
<p>The loss function that is added to the penalty function to form the overall objective function in the following examples is the mean squared error (MSE), defined by [7]:</p>
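<p>As a small illustration of reading a sparsity pattern from a fitted model, the sketch below keeps the features whose coefficient magnitude exceeds a cutoff. The function name <code>select_features</code>, the cutoff value, and the example weights are all assumptions made for this example and are not results from this research.</p>

```python
def select_features(names, coefficients, threshold=1e-6):
    """Return the feature names whose |coefficient| exceeds `threshold`."""
    if len(names) != len(coefficients):
        raise ValueError("names and coefficients must have the same length")
    return [name for name, w in zip(names, coefficients) if abs(w) > threshold]


# Invented coefficient vector, as might be returned by a penalized regressor.
names = ["E_OZONE", "E_PM", "EP_POV200", "EP_UNINSUR"]
weights = [0.0, 0.42, -1.3, 0.0000001]
selected = select_features(names, weights, threshold=1e-3)
```

<p>A non-zero cutoff is used rather than an exact comparison with zero because numerical solvers often leave very small residual weights on features that are effectively excluded.</p>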
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac><msubsup><mi mathvariant="normal">Σ</mi><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></msubsup><mo stretchy="false">(</mo><msub><mi>y</mi><mi>i</mi></msub><mo>−</mo><mover accent="true"><msub><mi>y</mi><mi>i</mi></msub><mo>^</mo></mover><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">\frac{1}{n}\Sigma_{i=1}^n(y_{i}-\hat{y_{i}})^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 2.00744em; vertical-align: -0.686em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 1.32144em;"><span class="" style="top: -2.314em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord mathnormal">n</span></span></span><span class="" style="top: -3.23em;"><span class="pstrut" style="height: 3em;"></span><span class="frac-line" style="border-bottom-width: 0.04em;"></span></span><span class="" style="top: -3.677em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.686em;"><span class=""></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.714392em;"><span class="" style="top: -2.453em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span 
class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span class="" style="top: -3.113em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.247em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span style="margin-right: 0.03588em;" class="mord mathnormal">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.03588em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1.11411em; vertical-align: -0.25em;"></span><span class="mord accent"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.69444em;"><span class="" style="top: -3em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span style="margin-right: 0.03588em;" class="mord mathnormal">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.03588em; margin-right: 0.05em;"><span class="pstrut" 
style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span></span><span class="" style="top: -3em;"><span class="pstrut" style="height: 3em;"></span><span class="accent-body" style="left: -0.25em;"><span class="mord">^</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.19444em;"><span class=""></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height: 0.864108em;"><span class="" style="top: -3.113em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></span></span><br>
SCAD was introduced in 2001 by Fan and Li [3] as a penalized regression method that aims to improve on earlier methods such as Lasso and Ridge regression.<br>
The effect of the SCAD penalty on an individual coefficient estimate β<sub>j</sub> is given by the following thresholding rule:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mover accent="true"><mi>θ</mi><mo>^</mo></mover><mo>=</mo><mrow><mo fence="true">{</mo><mtable rowspacing="0.1600em" columnalign="right center left" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi>s</mi><mi>g</mi><mi>n</mi><mo stretchy="false">(</mo><msub><mi>β</mi><mi>j</mi></msub><mo stretchy="false">)</mo><mo stretchy="false">(</mo><mi mathvariant="normal">∣</mi><msub><mi>β</mi><mi>j</mi></msub><mi mathvariant="normal">∣</mi><mo>−</mo><mi>λ</mi><msub><mo stretchy="false">)</mo><mo lspace="0em" rspace="0em">+</mo></msub><mo separator="true">,</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>when</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi mathvariant="normal">∣</mi><msub><mi>β</mi><mi>j</mi></msub><mi mathvariant="normal">∣</mi><mo><</mo><mn>2</mn><mi>λ</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mo stretchy="false">{</mo><mo stretchy="false">(</mo><mi>α</mi><mo>−</mo><mn>1</mn><mo stretchy="false">)</mo><msub><mi>β</mi><mi>j</mi></msub><mo>−</mo><mi>s</mi><mi>g</mi><mi>n</mi><mo stretchy="false">(</mo><msub><mi>β</mi><mi>j</mi></msub><mo stretchy="false">)</mo><mi>α</mi><mi>λ</mi><mo stretchy="false">}</mo><mi mathvariant="normal">/</mi><mo stretchy="false">(</mo><mi>α</mi><mo>−</mo><mn>2</mn><mo stretchy="false">)</mo><mo separator="true">,</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>when</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mn>2</mn><mi>λ</mi><mo>≤</mo><mi mathvariant="normal">∣</mi><msub><mi>β</mi><mi>j</mi></msub><mi mathvariant="normal">∣</mi><mo>≤</mo><mi>α</mi><mi>λ</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" 
displaystyle="false"><mrow><msub><mi>β</mi><mi>j</mi></msub><mo separator="true">,</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>when</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi mathvariant="normal">∣</mi><msub><mi>β</mi><mi>j</mi></msub><mi mathvariant="normal">∣</mi><mo>></mo><mi>α</mi><mi>λ</mi></mrow></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding="application/x-tex">\hat{\theta} = \left\{
\begin{array}{rcl}
sgn(\beta_{j})(|\beta_{j}| - \lambda)_{+},& \text{when} & |\beta_{j}| < 2\lambda \\
\{(\alpha - 1)\beta_{j} - sgn(\beta_{j})\alpha\lambda\}/(\alpha-2), & \text{when} & 2\lambda \leq |\beta_{j}| \leq \alpha\lambda \\
\beta_{j}, & \text{when} & |\beta_{j}| > \alpha\lambda
\end{array}
\right.
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 0.95788em; vertical-align: 0em;"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height: 0.95788em;"><span class="" style="top: -3em;"><span class="pstrut" style="height: 3em;"></span><span style="margin-right: 0.02778em;" class="mord mathnormal">θ</span></span><span class="" style="top: -3.26344em;"><span class="pstrut" style="height: 3em;"></span><span class="accent-body" style="left: -0.16666em;"><span class="mord">^</span></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right: 0.277778em;"></span></span><span class="base"><span class="strut" style="height: 3.60004em; vertical-align: -1.55002em;"></span><span class="minner"><span class="mopen"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05002em;"><span class="" style="top: -2.49999em;"><span class="pstrut" style="height: 3.15em;"></span><span class="delimsizinginner delim-size4"><span class="">⎩</span></span></span><span class="" style="top: -2.49199em;"><span class="pstrut" style="height: 3.15em;"></span><span class="" style="height: 0.016em; width: 0.889em;"><svg width="0.889em" height="0.016em" style="width:0.889em" viewBox="0 0 889 16" preserveAspectRatio="xMinYMin"><path d="M384 0 H504 V16 H384z M384 0 H504 V16 H384z"></path></svg></span></span><span class="" style="top: -3.15001em;"><span class="pstrut" style="height: 3.15em;"></span><span class="delimsizinginner delim-size4"><span class="">⎨</span></span></span><span class="" style="top: -4.29201em;"><span class="pstrut" style="height: 3.15em;"></span><span class="" style="height: 0.016em; width: 0.889em;"><svg width="0.889em" height="0.016em" style="width:0.889em" viewBox="0 
0 889 16" preserveAspectRatio="xMinYMin"><path d="M384 0 H504 V16 H384z M384 0 H504 V16 H384z"></path></svg></span></span><span class="" style="top: -4.30002em;"><span class="pstrut" style="height: 3.15em;"></span><span class="delimsizinginner delim-size4"><span class="">⎧</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55002em;"><span class=""></span></span></span></span></span></span><span class="mord"><span class="mtable"><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord mathnormal">s</span><span style="margin-right: 0.03588em;" class="mord mathnormal">g</span><span class="mord mathnormal">n</span><span class="mopen">(</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mclose">)</span><span class="mopen">(</span><span class="mord">∣</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: 
-0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mord mathnormal">λ</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.258331em;"><span class="" style="top: -2.55em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">+</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.208331em;"><span class=""></span></span></span></span></span></span><span class="mpunct">,</span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mopen">{(</span><span style="margin-right: 0.0037em;" class="mord mathnormal">α</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mord">1</span><span class="mclose">)</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span 
class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mord mathnormal">s</span><span style="margin-right: 0.03588em;" class="mord mathnormal">g</span><span class="mord mathnormal">n</span><span class="mopen">(</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mclose">)</span><span style="margin-right: 0.0037em;" class="mord mathnormal">α</span><span class="mord mathnormal">λ</span><span class="mclose">}</span><span class="mord">/</span><span class="mopen">(</span><span style="margin-right: 0.0037em;" class="mord mathnormal">α</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mord">2</span><span class="mclose">)</span><span class="mpunct">,</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" 
style="height: 3em;"></span><span class="mord"><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mpunct">,</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">when</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">when</span></span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">when</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span 
class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">∣</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel"><</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mord">2</span><span class="mord mathnormal">λ</span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">2</span><span class="mord mathnormal">λ</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">≤</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mord">∣</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 
0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">≤</span><span class="mspace" style="margin-right: 0.277778em;"></span><span style="margin-right: 0.0037em;" class="mord mathnormal">α</span><span class="mord mathnormal">λ</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">∣</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">></span><span class="mspace" style="margin-right: 0.277778em;"></span><span style="margin-right: 0.0037em;" class="mord mathnormal">α</span><span class="mord mathnormal">λ</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></span>where lambda and alpha 
are model hyper-parameters.</p>
<p>The model was intended to overcome some of the drawbacks of other penalized models, such as Lasso, by avoiding excessive bias in the coefficient estimates during the variable selection process.</p>
<p>ElasticNet, proposed by Zou and Hastie in 2005 [4], aims to overcome the problems faced by Lasso and Ridge regression by using a penalty function that is a convex combination of the two approaches’ penalty functions, L1 and L2 respectively [4, 7].</p>
<p>The L1 penalty, used by Lasso [4,7], is defined by:<br>
<span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msubsup><mi mathvariant="normal">Σ</mi><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>p</mi></msubsup><mi mathvariant="normal">∣</mi><msub><mi>β</mi><mi>j</mi></msub><mi mathvariant="normal">∣</mi></mrow><annotation encoding="application/x-tex">\Sigma_{j=1}^p|\beta_{j}|</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 1.19527em; vertical-align: -0.412972em;"></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.7823em;"><span class="" style="top: -2.42314em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span class="" style="top: -3.18091em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.412972em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span></span></span></span></span></span></p>
<p>and the L2 norm, whose square is the penalty used by Ridge [4,7], is defined by:<br>
<span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msqrt><mrow><msubsup><mi mathvariant="normal">Σ</mi><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>p</mi></msubsup><msubsup><mi>β</mi><mi>j</mi><mn>2</mn></msubsup></mrow></msqrt></mrow><annotation encoding="application/x-tex">\sqrt{\Sigma_{j=1}^p\beta^2_{j}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 1.84em; vertical-align: -0.614657em;"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 1.22534em;"><span class="svg-align" style="top: -3.8em;"><span class="pstrut" style="height: 3.8em;"></span><span class="mord" style="padding-left: 1em;"><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.7823em;"><span class="" style="top: -2.42314em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span class="" style="top: -3.18091em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.412972em;"><span class=""></span></span></span></span></span></span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span 
class="vlist" style="height: 0.795908em;"><span class="" style="top: -2.42314em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span><span class="" style="top: -3.0448em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.412972em;"><span class=""></span></span></span></span></span></span></span></span><span class="" style="top: -3.18534em;"><span class="pstrut" style="height: 3.8em;"></span><span class="hide-tail" style="min-width: 1.02em; height: 1.88em;"><svg width="400em" height="1.8800000000000001em" viewBox="0 0 400000 1944" preserveAspectRatio="xMinYMin slice"><path d="M983 90
l0 -0
c4,-6.7,10,-10,18,-10 H400000v40
H1013.1s-83.4,268,-264.1,840c-180.7,572,-277,876.3,-289,913c-4.7,4.7,-12.7,7,-24,7
s-12,0,-12,0c-1.3,-3.3,-3.7,-11.7,-7,-25c-35.3,-125.3,-106.7,-373.3,-214,-744
c-10,12,-21,25,-33,39s-32,39,-32,39c-6,-5.3,-15,-14,-27,-26s25,-30,25,-30
c26.7,-32.7,52,-63,76,-91s52,-60,52,-60s208,722,208,722
c56,-175.3,126.3,-397.3,211,-666c84.7,-268.7,153.8,-488.2,207.5,-658.5
c53.7,-170.3,84.5,-266.8,92.5,-289.5z
M1001 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.614657em;"><span class=""></span></span></span></span></span></span></span></span></span></span></p>
<p>Thus, as a convex combination of both penalty types, the ElasticNet penalty function is:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>α</mi><mo stretchy="false">(</mo><mi>λ</mi><mo>∗</mo><msubsup><mi mathvariant="normal">Σ</mi><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>p</mi></msubsup><mi mathvariant="normal">∣</mi><msub><mi>β</mi><mi>j</mi></msub><mi mathvariant="normal">∣</mi><mo>+</mo><mo stretchy="false">(</mo><mn>1</mn><mo>−</mo><mi>λ</mi><mo stretchy="false">)</mo><mo>∗</mo><msubsup><mi mathvariant="normal">Σ</mi><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>p</mi></msubsup><msubsup><mi>β</mi><mi>j</mi><mn>2</mn></msubsup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\alpha(\lambda *\Sigma_{j=1}^p |\beta_{j}| + (1-\lambda) * \Sigma_{j=1}^p\beta^2_{j})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span style="margin-right: 0.0037em;" class="mord mathnormal">α</span><span class="mopen">(</span><span class="mord mathnormal">λ</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1.19527em; vertical-align: -0.412972em;"></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.7823em;"><span class="" style="top: -2.42314em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span class="" style="top: -3.18091em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.412972em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mopen">(</span><span class="mord">1</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord mathnormal">λ</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1.27708em; vertical-align: -0.412972em;"></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.7823em;"><span class="" style="top: -2.42314em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span class="" style="top: -3.18091em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.412972em;"><span class=""></span></span></span></span></span></span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.864108em;"><span class="" style="top: -2.453em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span><span class="" style="top: -3.113em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.383108em;"><span class=""></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span></span></p>
<p>where lambda is a hyper-parameter between 0 and 1 that sets the relative weight of the two penalties, which is what makes the combination convex, and alpha is a hyper-parameter that scales the overall strength of the penalty.</p>
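<p>As a rough illustration of the penalty above, the following standard-library Python sketch evaluates it for a given coefficient vector. The function name <code>elastic_net_penalty</code> and the argument names <code>alpha</code> and <code>lam</code> are chosen here for illustration only and are not taken from the analysis code.</p>

```python
def elastic_net_penalty(beta, alpha, lam):
    """Evaluate alpha * (lam * sum|b_j| + (1 - lam) * sum b_j^2).

    beta  : iterable of regression coefficients
    alpha : overall penalty strength (hypothetical argument name)
    lam   : mixing ratio in [0, 1] between the L1 and squared-L2 terms
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    l1_term = sum(abs(b) for b in beta)
    l2_squared_term = sum(b * b for b in beta)
    return alpha * (lam * l1_term + (1.0 - lam) * l2_squared_term)


beta = [0.5, -2.0, 0.0]
print(elastic_net_penalty(beta, alpha=1.0, lam=1.0))  # only the L1 term: 2.5
print(elastic_net_penalty(beta, alpha=1.0, lam=0.0))  # only the squared term: 4.25
print(elastic_net_penalty(beta, alpha=2.0, lam=0.5))  # equal mix, doubled: 6.75
```

<p>Setting the ratio to 1 or 0 recovers the Lasso-style or Ridge-style term on its own, which is one way to sanity-check a tuned value of the ratio.</p>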
<p>Square-root Lasso, introduced by Belloni, Chernozhukov, and Wang in 2011, is a modification of the existing Lasso method of penalized regression [5]. The method aims to eliminate the need to know or estimate the standard deviation of the noise when setting the penalty level [5].</p>
<p>Unlike the previous two methods, the penalty function itself is not adjusted. Instead, the MSE loss function is replaced by its square root [5], and the L1 penalty is used as in Lasso regression.</p>
<p>Thus, the overall objective function to be minimized is [5]:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msqrt><mrow><mi>M</mi><mi>S</mi><mi>E</mi></mrow></msqrt><mo>+</mo><msub><mi>L</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">\sqrt{MSE} + L_{1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 1.05887em; vertical-align: -0.08333em;"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.97554em;"><span class="svg-align" style="top: -3em;"><span class="pstrut" style="height: 3em;"></span><span class="mord" style="padding-left: 0.833em;"><span style="margin-right: 0.05764em;" class="mord mathnormal">MSE</span></span></span><span class="" style="top: -2.93554em;"><span class="pstrut" style="height: 3em;"></span><span class="hide-tail" style="min-width: 0.853em; height: 1.08em;"><svg width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.06446em;"><span class=""></span></span></span></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 0.83333em; vertical-align: -0.15em;"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.301108em;"><span class="" style="top: -2.55em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span></span></span></span></span></span></p>
<p>or<br>
<span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msqrt><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac><msubsup><mi mathvariant="normal">Σ</mi><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></msubsup><mo stretchy="false">(</mo><msub><mi>y</mi><mi>i</mi></msub><mo>−</mo><mover accent="true"><msub><mi>y</mi><mi>i</mi></msub><mo>^</mo></mover><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow></msqrt><mo>+</mo><msubsup><mi mathvariant="normal">Σ</mi><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>p</mi></msubsup><mi mathvariant="normal">∣</mi><msub><mi>β</mi><mi>j</mi></msub><mi mathvariant="normal">∣</mi><mi mathvariant="normal">∣</mi></mrow><annotation encoding="application/x-tex">\sqrt{\frac{1}{n}\Sigma_{i=1}^n(y_{i}-\hat{y_{i}})^2} + \Sigma_{j=1}^p|\beta_{j}||</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 2.44em; vertical-align: -0.788405em;"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 1.6516em;"><span class="svg-align" style="top: -4.4em;"><span class="pstrut" style="height: 4.4em;"></span><span class="mord" style="padding-left: 1em;"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 1.32144em;"><span class="" style="top: -2.314em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord mathnormal">n</span></span></span><span class="" style="top: -3.23em;"><span class="pstrut" style="height: 3em;"></span><span class="frac-line" style="border-bottom-width: 0.04em;"></span></span><span class="" style="top: -3.677em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span 
class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.686em;"><span class=""></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.646192em;"><span class="" style="top: -2.42314em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span class="" style="top: -3.0448em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.276864em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span style="margin-right: 0.03588em;" class="mord mathnormal">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.03588em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mord accent"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.69444em;"><span class="" style="top: -3em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span style="margin-right: 0.03588em;" class="mord mathnormal">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.03588em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span></span><span class="" style="top: -3em;"><span class="pstrut" style="height: 3em;"></span><span class="accent-body" style="left: -0.25em;"><span class="mord">^</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.19444em;"><span class=""></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height: 0.740108em;"><span class="" style="top: -2.989em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span class="" style="top: -3.6116em;"><span class="pstrut" style="height: 4.4em;"></span><span class="hide-tail" style="min-width: 1.02em; height: 2.48em;"><svg width="400em" height="2.48em" viewBox="0 0 400000 2592" preserveAspectRatio="xMinYMin slice"><path d="M424,2478
c-1.3,-0.7,-38.5,-172,-111.5,-514c-73,-342,-109.8,-513.3,-110.5,-514
c0,-2,-10.7,14.3,-32,49c-4.7,7.3,-9.8,15.7,-15.5,25c-5.7,9.3,-9.8,16,-12.5,20
s-5,7,-5,7c-4,-3.3,-8.3,-7.7,-13,-13s-13,-13,-13,-13s76,-122,76,-122s77,-121,77,-121
s209,968,209,968c0,-2,84.7,-361.7,254,-1079c169.3,-717.3,254.7,-1077.7,256,-1081
l0 -0c4,-6.7,10,-10,18,-10 H400000
v40H1014.6
s-87.3,378.7,-272.6,1166c-185.3,787.3,-279.3,1182.3,-282,1185
c-2,6,-10,9,-24,9
c-8,0,-12,-0.7,-12,-2z M1001 80
h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.788405em;"><span class=""></span></span></span></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1.19527em; vertical-align: -0.412972em;"></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.7823em;"><span class="" style="top: -2.42314em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span class="" style="top: -3.18091em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.412972em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span><span class="mord"><span style="margin-right: 0.05278em;" class="mord mathnormal">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.05278em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.05724em;" class="mord mathnormal mtight">j</span></span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height: 0.286108em;"><span class=""></span></span></span></span></span></span><span class="mord">∣</span></span></span></span></span></span></p>
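<p>The objective above can be evaluated directly, as in the minimal Python sketch below. The function name <code>sqrt_lasso_objective</code> is hypothetical. The formula as written carries no explicit weight on the L1 term, so the sketch does not either; in practice the penalty is usually multiplied by a tuning constant.</p>

```python
import math


def sqrt_lasso_objective(y, y_hat, beta):
    """Return sqrt(MSE) + sum|b_j| for observed y, fitted y_hat, coefficients beta."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    n = len(y)
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n
    l1_term = sum(abs(b) for b in beta)
    return math.sqrt(mse) + l1_term


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 8.0]   # one residual of -4, so MSE = 16 / 4 = 4
beta = [1.5, -0.5]             # L1 term = 2
print(sqrt_lasso_objective(y, y_hat, beta))  # sqrt(4) + 2 = 4.0
```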
<p>The purpose of using a variety of models is to obtain different approximations of the feature sparsity pattern and to leverage the different strengths of each model to evaluate optimal solutions for different disease types.</p>
<h3 id="scalar-types">Scaler Types</h3>
<p>The three scalers offered as options in the data pre-processing step were the Standard scaler, the Min-Max scaler, and the Quantile Transformer.</p>
<p>The Standard scaler transforms each feature to z scores, computed from the mean and standard deviation of that feature’s values. The z score is defined as [8]:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>z</mi><mo>=</mo><mfrac><mrow><mi>x</mi><mo>−</mo><mi>μ</mi></mrow><mi>σ</mi></mfrac></mrow><annotation encoding="application/x-tex">z = \frac{x-\mu}{\sigma}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 0.43056em; vertical-align: 0em;"></span><span style="margin-right: 0.04398em;" class="mord mathnormal">z</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right: 0.277778em;"></span></span><span class="base"><span class="strut" style="height: 1.94633em; vertical-align: -0.686em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 1.26033em;"><span class="" style="top: -2.314em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span style="margin-right: 0.03588em;" class="mord mathnormal">σ</span></span></span><span class="" style="top: -3.23em;"><span class="pstrut" style="height: 3em;"></span><span class="frac-line" style="border-bottom-width: 0.04em;"></span></span><span class="" style="top: -3.677em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord mathnormal">x</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mord mathnormal">μ</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.686em;"><span class=""></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></span></p>
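<p>A small worked example of the z score, written as a standard-library Python sketch. The helper name <code>standard_scale</code> is hypothetical, and the use of the population standard deviation (dividing by n rather than n - 1) is an assumption made for this illustration.</p>

```python
import statistics


def standard_scale(values):
    """Map one feature's values to z scores: (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        # A constant feature carries no spread; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


feature = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population standard deviation 2
print(standard_scale(feature))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```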
<p>The Min-Max scaler rescales each feature relative to its minimum and maximum observed values [8]. The resulting value lies within a predefined range, typically [0, 1], and is calculated as [9]:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mfrac><mrow><mi>X</mi><mo>−</mo><msub><mi>X</mi><mrow><mi>m</mi><mi>i</mi><mi>n</mi></mrow></msub></mrow><mrow><msub><mi>X</mi><mrow><mi>m</mi><mi>a</mi><mi>x</mi></mrow></msub><mo>−</mo><msub><mi>X</mi><mrow><mi>m</mi><mi>i</mi><mi>n</mi></mrow></msub></mrow></mfrac><mo>∗</mo><mo stretchy="false">(</mo><mi>b</mi><mi>o</mi><mi>u</mi><mi>n</mi><mi>d</mi><mi mathvariant="normal">_</mi><mi>m</mi><mi>a</mi><mi>x</mi><mo>−</mo><mi>b</mi><mi>o</mi><mi>u</mi><mi>n</mi><mi>d</mi><mi mathvariant="normal">_</mi><mi>m</mi><mi>i</mi><mi>n</mi><mo stretchy="false">)</mo><mo>+</mo><mi>b</mi><mi>o</mi><mi>u</mi><mi>n</mi><mi>d</mi><mi mathvariant="normal">_</mi><mi>m</mi><mi>i</mi><mi>n</mi></mrow><annotation encoding="application/x-tex">\frac{X - X_{min}}{X_{max} - X_{min}} * (bound\_max - bound\_min) + bound\_min</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 2.19633em; vertical-align: -0.836em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 1.36033em;"><span class="" style="top: -2.314em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord"><span style="margin-right: 0.07847em;" class="mord mathnormal">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.151392em;"><span class="" style="top: -2.55em; margin-left: -0.07847em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">ma</span><span class="mord mathnormal 
mtight">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mord"><span style="margin-right: 0.07847em;" class="mord mathnormal">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.07847em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">min</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span></span></span><span class="" style="top: -3.23em;"><span class="pstrut" style="height: 3em;"></span><span class="frac-line" style="border-bottom-width: 0.04em;"></span></span><span class="" style="top: -3.677em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span style="margin-right: 0.07847em;" class="mord mathnormal">X</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mord"><span style="margin-right: 0.07847em;" class="mord mathnormal">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.07847em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal 
mtight">min</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.836em;"><span class=""></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1.06em; vertical-align: -0.31em;"></span><span class="mopen">(</span><span class="mord mathnormal">b</span><span class="mord mathnormal">o</span><span class="mord mathnormal">u</span><span class="mord mathnormal">n</span><span class="mord mathnormal">d</span><span style="margin-right: 0.02778em;" class="mord">_</span><span class="mord mathnormal">ma</span><span class="mord mathnormal">x</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1.06em; vertical-align: -0.31em;"></span><span class="mord mathnormal">b</span><span class="mord mathnormal">o</span><span class="mord mathnormal">u</span><span class="mord mathnormal">n</span><span class="mord mathnormal">d</span><span style="margin-right: 0.02778em;" class="mord">_</span><span class="mord mathnormal">min</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1.00444em; vertical-align: -0.31em;"></span><span class="mord mathnormal">b</span><span class="mord mathnormal">o</span><span class="mord mathnormal">u</span><span class="mord 
mathnormal">n</span><span class="mord mathnormal">d</span><span style="margin-right: 0.02778em;" class="mord">_</span><span class="mord mathnormal">min</span></span></span></span></span></span></p>
<p>The Quantile Transformer is a non-linear scaler [8] that transforms the features so that they follow either a uniform or a normal distribution [9] (the uniform output was used for this analysis). When fitting the transformer, a hyper-parameter for the number of quantiles used to build the cumulative distribution function is needed.</p>
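<p>As a rough illustration of the two scaling ideas described above, the following pure-Python sketch rescales a feature into a bounded range and maps a feature to an approximately uniform distribution through its empirical CDF. The function names <code>min_max_scale</code> and <code>uniform_quantile_transform</code> and the sample data are hypothetical, and the rank-based mapping is a simplification: it treats every sample as a quantile rather than using a tunable number of quantiles with interpolation.</p>

```python
from bisect import bisect_right


def min_max_scale(values, bound_min=0.0, bound_max=1.0):
    """Rescale values linearly into [bound_min, bound_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map it to the lower bound.
        return [bound_min for _ in values]
    return [
        (v - lo) / (hi - lo) * (bound_max - bound_min) + bound_min
        for v in values
    ]


def uniform_quantile_transform(values):
    """Map each value to its empirical CDF value in (0, 1]."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many samples are less than or equal to v.
    return [bisect_right(ordered, v) / n for v in values]


# Made-up feature with one large outlier.
feature = [3.0, 100.0, 4.0, 5.0, 1.0]
scaled = min_max_scale(feature, bound_min=-1.0, bound_max=1.0)
uniform = uniform_quantile_transform(feature)
```

<p>With the outlier present, the min-max result squeezes the four small values close to the lower bound, while the quantile mapping spaces all five values evenly, which is the usual motivation for a non-linear scaler.</p>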
<h3 id="model-tuning-with-pso">Model Tuning with PSO</h3>
<p>To optimize the hyper-parameters of the different penalized regressors, an implementation of particle swarm optimization (PSO) was used. Particle swarm optimization, as discussed in Kennedy and Eberhart [12], is based on the concept of bird flocking behavior, specifically the idea of simultaneous group and local behavior. Each particle within the swarm holds a candidate value for each variable being tuned, and after each iteration those values are adjusted by the particle’s velocity [12].</p>
<p>The velocity update used in the analysis, where w, c1, and c2 are hyper-parameters, was [13]:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>v</mi><mi>i</mi></msub><mo stretchy="false">(</mo><mi>t</mi><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo>=</mo><mi>w</mi><mo>∗</mo><msub><mi>v</mi><mi>i</mi></msub><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>+</mo><msub><mi>c</mi><mn>1</mn></msub><mo>∗</mo><mi>U</mi><mi>n</mi><mi>i</mi><mi>f</mi><mo stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mn>1</mn><mo stretchy="false">)</mo><mo>∗</mo><mo stretchy="false">(</mo><msubsup><mi>p</mi><mi>i</mi><mrow><mi>p</mi><mi mathvariant="normal">_</mi><mi>b</mi><mi>e</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>−</mo><msub><mi>p</mi><mi>i</mi></msub><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo stretchy="false">)</mo><mo>+</mo><msub><mi>c</mi><mn>2</mn></msub><mo>∗</mo><mi>U</mi><mi>n</mi><mi>i</mi><mi>f</mi><mo stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mn>1</mn><mo stretchy="false">)</mo><mo>∗</mo><mo stretchy="false">(</mo><msubsup><mi>p</mi><mi>i</mi><mrow><mi>g</mi><mi mathvariant="normal">_</mi><mi>b</mi><mi>e</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>−</mo><msub><mi>p</mi><mi>i</mi></msub><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">v_{i}(t+1) = w * v_{i}(t) + c_{1} * Unif(0,1) * (p_{i}^{p\_best}(t) - p_{i}(t)) + c_{2} * Unif(0,1) * (p_{i}^{g\_best}(t) - p_{i}(t))</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord"><span 
style="margin-right: 0.03588em;" class="mord mathnormal">v</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.03588em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord">1</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right: 0.277778em;"></span></span><span class="base"><span class="strut" style="height: 0.46528em; vertical-align: 0em;"></span><span style="margin-right: 0.02691em;" class="mord mathnormal">w</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord"><span style="margin-right: 0.03588em;" class="mord mathnormal">v</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: -0.03588em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span 
class="mord mathnormal mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 0.61528em; vertical-align: -0.15em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.301108em;"><span class="" style="top: -2.55em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span style="margin-right: 0.10903em;" class="mord mathnormal">U</span><span class="mord mathnormal">ni</span><span style="margin-right: 0.10764em;" class="mord mathnormal">f</span><span class="mopen">(</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right: 0.166667em;"></span><span class="mord">1</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span 
class="base"><span class="strut" style="height: 1.32477em; vertical-align: -0.276864em;"></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 1.04791em;"><span class="" style="top: -2.42314em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="" style="top: -3.2618em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">p</span><span style="margin-right: 0.02778em;" class="mord mtight">_</span><span class="mord mathnormal mtight">b</span><span class="mord mathnormal mtight">es</span><span class="mord mathnormal mtight">t</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.276864em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span></span><span 
class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">))</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 0.61528em; vertical-align: -0.15em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.301108em;"><span class="" style="top: -2.55em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span style="margin-right: 0.10903em;" class="mord mathnormal">U</span><span class="mord mathnormal">ni</span><span style="margin-right: 0.10764em;" class="mord mathnormal">f</span><span class="mopen">(</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right: 0.166667em;"></span><span class="mord">1</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1.32477em; vertical-align: 
-0.276864em;"></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 1.04791em;"><span class="" style="top: -2.42314em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="" style="top: -3.2618em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span style="margin-right: 0.03588em;" class="mord mathnormal mtight">g</span><span style="margin-right: 0.02778em;" class="mord mtight">_</span><span class="mord mathnormal mtight">b</span><span class="mord mathnormal mtight">es</span><span class="mord mathnormal mtight">t</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.276864em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.311664em;"><span class="" style="top: -2.55em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">))</span></span></span></span></span></span></p>
<p>This update combines the particle’s current velocity, the difference between the particle’s personal best position and its current position, and the difference between the global best position and the particle’s current position.</p>
<p>This velocity is then added to the particle’s current position to update the value of the hyper-parameter being tuned.</p>
<p>PSO performs this search in the context of an objective function, which for this analysis consisted of training a model with the hyper-parameters given by a particle’s position and using the test-set MSE of its predictions as the metric.</p>
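<p>The velocity and position updates described above can be sketched in a few lines of pure Python. This is a minimal illustration rather than the implementation used in the analysis: the function name <code>pso_minimize</code>, the default values of <code>w</code>, <code>c1</code>, and <code>c2</code>, the clipping of positions to the search bounds, and the toy objective standing in for "train a model and return its test-set MSE" are all assumptions.</p>

```python
import random


def pso_minimize(objective, bounds, n_particles=10, n_iterations=5,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over box bounds with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random starting positions inside the bounds, zero starting velocities.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    p_best = [list(p) for p in positions]
    p_best_score = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=lambda i: p_best_score[i])
    g_best = list(p_best[best_idx])
    g_best_score = p_best_score[best_idx]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                # Inertia + pull toward personal best + pull toward global best.
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * rng.random() * (p_best[i][d] - positions[i][d])
                    + c2 * rng.random() * (g_best[d] - positions[i][d])
                )
                # Position update, clipped to stay inside the search bounds
                # (the clipping is an assumption of this sketch).
                lo, hi = bounds[d]
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(max(moved, lo), hi)
            score = objective(positions[i])
            if p_best_score[i] > score:
                p_best[i], p_best_score[i] = list(positions[i]), score
                if g_best_score > score:
                    g_best, g_best_score = list(positions[i]), score
    return g_best, g_best_score


def toy_objective(params):
    # Hypothetical stand-in for "train a model, return its test-set MSE".
    return (params[0] - 0.3) ** 2 + (params[1] - 2.0) ** 2


best_params, best_score = pso_minimize(toy_objective, [(0.0, 1.0), (0.0, 5.0)])
```

<p>Because the global best is only replaced when a particle finds a lower score, the returned score can never be worse than the best score among the initial random positions.</p>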
<h3 id="variable-selection-with-soft-thresholding">Variable-Selection with Soft-Thresholding</h3>
<p>To select variables from the resulting penalized regression models, soft-thresholding as described in “Statistical Learning with Sparsity: The Lasso and Generalizations” [11] was used.</p>
<p>The soft-thresholding operator is defined by [11, p. 15]:<br>
<span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>S</mi><mi>λ</mi></msub><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>s</mi><mi>i</mi><mi>g</mi><mi>n</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo stretchy="false">(</mo><mi mathvariant="normal">∣</mi><mi>x</mi><mi mathvariant="normal">∣</mi><mo>−</mo><mi>λ</mi><msub><mo stretchy="false">)</mo><mo lspace="0em" rspace="0em">+</mo></msub></mrow><annotation encoding="application/x-tex">S_{\lambda}(x) = sign(x)(|x| - \lambda)_{+}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord"><span style="margin-right: 0.05764em;" class="mord mathnormal">S</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.336108em;"><span class="" style="top: -2.55em; margin-left: -0.05764em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">λ</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right: 0.277778em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord mathnormal">s</span><span class="mord mathnormal">i</span><span style="margin-right: 0.03588em;" class="mord mathnormal">g</span><span class="mord mathnormal">n</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span 
class="mopen">(</span><span class="mord">∣</span><span class="mord mathnormal">x</span><span class="mord">∣</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 1em; vertical-align: -0.25em;"></span><span class="mord mathnormal">λ</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.258331em;"><span class="" style="top: -2.55em; margin-left: 0em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">+</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.208331em;"><span class=""></span></span></span></span></span></span></span></span></span></span></span></p>
<p>where values whose magnitude exceeds the threshold λ are shrunk toward zero by λ (through addition or subtraction, depending on their sign), and values whose magnitude does not exceed it are set to zero. Subsetting the resulting non-zero values performs variable selection on the approximated sparsity pattern. A soft-thresholding class was defined to implement this concept.</p>
<p>This class took a threshold value as a parameter and applied the transformation to the set of values passed in. All non-zero results, together with their corresponding variables, were placed in a dictionary and sorted from highest to lowest absolute weight.</p>
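<p>A class along the lines described above might look like the following sketch. The class name <code>SoftThreshold</code>, its method names, and the feature names and weights in the example are hypothetical; in particular, whether the stored dictionary values are the thresholded or the original weights is an assumption here (the thresholded ones are stored).</p>

```python
class SoftThreshold:
    """Soft-thresholding operator S_lambda(x) = sign(x) * max(|x| - lambda, 0)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def apply(self, x):
        # Shrink the magnitude by the threshold, flooring at zero.
        magnitude = max(abs(x) - self.threshold, 0.0)
        return magnitude if x >= 0 else -magnitude

    def select(self, names, weights):
        """Return {name: thresholded weight} for non-zero results,
        ordered from largest to smallest absolute value."""
        kept = {}
        for name, weight in zip(names, weights):
            shrunk = self.apply(weight)
            if shrunk != 0.0:
                kept[name] = shrunk
        ordered = sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)
        return dict(ordered)


# Made-up coefficient values for four hypothetical features.
selector = SoftThreshold(threshold=0.025)
selected = selector.select(
    ["feature_a", "feature_b", "feature_c", "feature_d"],
    [0.40, -0.01, -0.75, 0.02],
)
```

<p>In this example the two coefficients with magnitude below the threshold are dropped, and the remaining two are returned with the largest absolute weight first.</p>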
<h2 id="application-of-the-methods-and-the-validation-procedure">Application of the methods and the validation procedure</h2>
<h3 id="model-tuning">Model tuning</h3>
<figure>
<div align="center">
<img src="https://i.imgur.com/hAlaj4x.png" width="400" height="400">
<figcaption>Fig 2.1) An overview of the model creation process</figcaption>
</div>
</figure>
<p>For each disease type (asthma and cancer), a model of each penalized regression type (ElasticNet, Square-root Lasso, and SCAD) was trained on the data and tuned with an adjusted particle swarm optimization. The parameters tuned were the number of epochs, the scaler type, the learning rate, alpha, and model-specific parameters such as the L1 ratio for ElasticNet or lambda for SCAD. The data was split into train and test sets (test ratio = 25%); models were trained on the train set, and the MSE of their predictions on the test set was used as the objective function for PSO.</p>
<p>PSO was run with ten particles and five iterations.</p>
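<p>To make the objective function concrete, the sketch below holds out 25% of a synthetic data set, fits an elastic-net-penalized linear model by full-batch (sub)gradient descent, and returns the test-set MSE for one set of hyper-parameters. Everything here is illustrative: the helper names, the synthetic data, the exact form of the penalty, and the gradient-descent fitting routine are assumptions and are not taken from the analysis code.</p>

```python
import random


def train_test_split(rows, targets, test_ratio=0.25, seed=0):
    """Shuffle indices and hold out test_ratio of the samples."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_ratio))

    def pick(idx):
        return [rows[i] for i in idx], [targets[i] for i in idx]

    return pick(indices[n_test:]), pick(indices[:n_test])


def predict(weights, bias, row):
    return bias + sum(w * x for w, x in zip(weights, row))


def fit_elastic_net(rows, targets, epochs, learning_rate, alpha, l1_ratio):
    """Full-batch (sub)gradient descent on MSE plus an elastic net penalty."""
    n, dim = len(rows), len(rows[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, r) - t for r, t in zip(rows, targets)]
        for d in range(dim):
            grad = 2.0 / n * sum(e * r[d] for e, r in zip(errors, rows))
            sign = (weights[d] > 0) - (0 > weights[d])
            # Penalty gradient: L1 part uses the sign, L2 part uses the weight.
            grad += alpha * (l1_ratio * sign + (1.0 - l1_ratio) * weights[d])
            weights[d] -= learning_rate * grad
        bias -= learning_rate * 2.0 / n * sum(errors)
    return weights, bias


def mse(weights, bias, rows, targets):
    squared = [(predict(weights, bias, r) - t) ** 2 for r, t in zip(rows, targets)]
    return sum(squared) / len(rows)


def objective(params, train, test):
    """Test-set MSE for one position (epochs, learning_rate, alpha, l1_ratio)."""
    epochs, learning_rate, alpha, l1_ratio = params
    weights, bias = fit_elastic_net(*train, int(epochs), learning_rate, alpha, l1_ratio)
    return mse(weights, bias, *test)


# Synthetic data: y = 2*x0 - x1 + small noise; the third feature is irrelevant.
rng = random.Random(1)
X = [[rng.random() for _ in range(3)] for _ in range(40)]
y = [2.0 * r[0] - r[1] + rng.gauss(0.0, 0.05) for r in X]
train, test = train_test_split(X, y)
score = objective((500, 0.1, 0.001, 0.5), train, test)
```

<p>A function of this shape is what a swarm optimizer would call once per particle per iteration, so with ten particles and five iterations the model is refit on the order of fifty to sixty times.</p>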
<h3 id="model-tuning-results">Model tuning results</h3>
<p>For cancer soft-thresholding, a value of <span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mn>2.5</mn><mo>∗</mo><mn>1</mn><msup><mn>0</mn><mrow><mo>−</mo><mn>2</mn></mrow></msup></mrow><annotation encoding="application/x-tex">2.5*10^{-2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 0.64444em; vertical-align: 0em;"></span><span class="mord">2.5</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 0.864108em; vertical-align: 0em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height: 0.864108em;"><span class="" style="top: -3.113em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></span></span></span><br>
was used.</p>
<p>For asthma soft-thresholding, a value of <span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mn>2.6</mn><mo>∗</mo><mn>1</mn><msup><mn>0</mn><mrow><mo>−</mo><mn>2</mn></mrow></msup></mrow><annotation encoding="application/x-tex">2.6*10^{-2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 0.64444em; vertical-align: 0em;"></span><span class="mord">2.6</span><span class="mspace" style="margin-right: 0.222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right: 0.222222em;"></span></span><span class="base"><span class="strut" style="height: 0.864108em; vertical-align: 0em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height: 0.864108em;"><span class="" style="top: -3.113em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></span></span></span><br>
was used.</p>
<p>For the SCAD model:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.1600em" columnalign="center center center center center center center center" columnlines="dashed none none none none none none" columnspacing="1em" rowlines="solid none"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Disease</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Epochs</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Learning Rate</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Alpha</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Lambda</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Scalar</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Asthma</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>7238</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.01</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>1.930</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>2.608</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>None</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Cancer</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>6255</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.01</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi mathvariant="normal">.</mi><mn>748</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.038</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>None</mtext></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">
\begin{array}{c:ccccccc}
\text{Disease} & \text{Epochs} & \text{Learning Rate} & \text{Alpha} & \text{Lambda} &\text{Scalar} \\ \hline
\text{Asthma} & 7238 & 0.01 & 1.930 & 2.608 & \text{None}\\
\text{Cancer} & 6255 & 0.01 & .748 & 0.038 & \text{None}\\
\end {array}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 3.6em; vertical-align: -1.55em;"></span><span class="mord"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.05em;"><span class="pstrut" style="height: 4.05em;"></span><span class="mtable"><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Disease</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Asthma</span></span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Cancer</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="vertical-separator" style="height: 3.6em; border-right-width: 0.04em; border-right-style: dashed; margin: 0px -0.02em; vertical-align: -1.55em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Epochs</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span 
class="mord">7238</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">6255</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Learning Rate</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.01</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.01</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Alpha</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">1.930</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">.748</span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Lambda</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">2.608</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.038</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Scalar</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">None</span></span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">None</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span></span></span><span 
class="" style="top: -4.9em;"><span class="pstrut" style="height: 4.05em;"></span><span class="hline" style="border-bottom-width: 0.04em;"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span></span></span></span></span></span></p>
<p>Selected variables: EP_AGE65, E_PM, EP_AGE17, EP_DISABL<br>
For the ElasticNet model:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.1600em" columnalign="center center center center center center center center" columnlines="dashed none none none none none none" columnspacing="1em" rowlines="solid none"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Disease</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Epochs</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Learning Rate</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Alpha</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>L1 Ratio</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Scalar</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Asthma</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>3894</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.01</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.0</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.344</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>QuantileTransformer</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Cancer</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>4749</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.01</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi mathvariant="normal">.</mi><mn>373</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi mathvariant="normal">.</mi><mn>784</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" 
displaystyle="false"><mtext>QuantileTransformer</mtext></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">
\begin{array}{c:ccccccc}
\text{Disease} & \text{Epochs} & \text{Learning Rate} & \text{Alpha} & \text{L1 Ratio} &\text{Scalar} \\ \hline
\text{Asthma} & 3894 & 0.01 & 0.0 & 0.344 & \text{QuantileTransformer}\\
\text{Cancer} & 4749 & 0.01 & .373 & .784 & \text{QuantileTransformer}\\
\end {array}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 3.6em; vertical-align: -1.55em;"></span><span class="mord"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.05em;"><span class="pstrut" style="height: 4.05em;"></span><span class="mtable"><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Disease</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Asthma</span></span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Cancer</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="vertical-separator" style="height: 3.6em; border-right-width: 0.04em; border-right-style: dashed; margin: 0px -0.02em; vertical-align: -1.55em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Epochs</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span 
class="mord">3894</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">4749</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Learning Rate</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.01</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.01</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Alpha</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.0</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">.373</span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">L1 Ratio</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.344</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">.784</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Scalar</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">QuantileTransformer</span></span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">QuantileTransformer</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 
0.5em;"></span></span></span><span class="" style="top: -4.9em;"><span class="pstrut" style="height: 4.05em;"></span><span class="hline" style="border-bottom-width: 0.04em;"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span></span></span></span></span></span><br>
Selected variables: EP_AGE65, EP_DISABL, E_PARK, EP_AGE17, E_TOTPOP, E_IMPWTR, EP_POV200, EP_LIMENG, E_TRI, E_RMP, EP_NOINT, EP_RENTER, EP_HOUBDN, E_OZONE, E_PM, EP_UNINSUR, EP_UNEMP, E_ROAD, M_TOTPOP, E_NPL, E_HOUAGE, E_DSLPM, E_RAIL, E_COAL<br>
For the Square-root Lasso model:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.1600em" columnalign="center center center center center center center center" columnlines="dashed none none none none none none" columnspacing="1em" rowlines="solid none"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Disease</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Epochs</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Learning Rate</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Alpha</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Scalar</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Asthma</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>1878</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.01</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.0</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>QuantileTransformer</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Cancer</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>8235</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.01</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi mathvariant="normal">.</mi><mn>050</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>QuantileTransformer</mtext></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">
\begin{array}{c:ccccccc}
\text{Disease} & \text{Epochs} & \text{Learning Rate} & \text{Alpha} &\text{Scalar} \\ \hline
\text{Asthma} & 1878 & 0.01 & 0.0 & \text{QuantileTransformer}\\
\text{Cancer} & 8235 & 0.01 & .050 & \text{QuantileTransformer}\\
\end {array}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 3.6em; vertical-align: -1.55em;"></span><span class="mord"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.05em;"><span class="pstrut" style="height: 4.05em;"></span><span class="mtable"><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Disease</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Asthma</span></span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Cancer</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="vertical-separator" style="height: 3.6em; border-right-width: 0.04em; border-right-style: dashed; margin: 0px -0.02em; vertical-align: -1.55em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Epochs</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span 
class="mord">1878</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">8235</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Learning Rate</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.01</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.01</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Alpha</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">0.0</span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">.050</span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.05em;"><span class="" style="top: -4.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Scalar</span></span></span></span><span class="" style="top: -3.01em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">QuantileTransformer</span></span></span></span><span class="" style="top: -1.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">QuantileTransformer</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span></span></span><span class="" style="top: -4.9em;"><span class="pstrut" style="height: 4.05em;"></span><span class="hline" style="border-bottom-width: 0.04em;"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 1.55em;"><span class=""></span></span></span></span></span></span></span></span></span></span></p>
<p>Selected variables: EP_AGE65, EP_AGE17, EP_HOUAGE, EP_DISABL, EP_UNINSUR, EP_NOINT, E_ROAD, EP_HOUBDN, EP_GROUPQ, EP_LIMENG, EP_RENTER, EP_NOHSDP, EP_POV200, EP_MINRTY</p>
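<p>As a rough illustration of how the Alpha and L1 Ratio values in the tables above enter a fitting objective, the sketch below evaluates two penalized losses on toy numbers. The loss forms are an assumption here (one common parameterization of the elastic net and square-root lasso penalties), and the function names, residuals, and coefficients are all hypothetical; the definitions given earlier in this document take precedence where they differ.</p>

```python
import math

def l1_norm(coefs):
    """Sum of absolute coefficient values."""
    return sum(abs(b) for b in coefs)

def l2_norm_squared(coefs):
    """Sum of squared coefficient values."""
    return sum(b * b for b in coefs)

def mse(residuals):
    """Mean squared error of a list of residuals."""
    return sum(r * r for r in residuals) / len(residuals)

def elastic_net_loss(residuals, coefs, alpha, l1_ratio):
    # Assumed form: MSE + alpha * (l1_ratio * L1 + (1 - l1_ratio) * L2^2).
    penalty = alpha * (l1_ratio * l1_norm(coefs)
                       + (1.0 - l1_ratio) * l2_norm_squared(coefs))
    return mse(residuals) + penalty

def sqrt_lasso_loss(residuals, coefs, alpha):
    # Assumed form: sqrt(MSE) + alpha * L1.
    return math.sqrt(mse(residuals)) + alpha * l1_norm(coefs)

# Toy residuals and coefficients, not values from the fitted models.
residuals = [0.5, -1.0, 0.25, 0.75]
coefs = [2.0, 0.0, -1.5]

# Hyperparameters taken from the ElasticNet table above.
cancer_en = elastic_net_loss(residuals, coefs, alpha=0.373, l1_ratio=0.784)
asthma_en = elastic_net_loss(residuals, coefs, alpha=0.0, l1_ratio=0.344)

print("Cancer ElasticNet loss:", cancer_en)
print("Asthma ElasticNet loss:", asthma_en)
print("Plain MSE:", mse(residuals))
```

<p>Under this assumed form, an Alpha of exactly 0.0 removes the penalty term, so the L1 Ratio no longer affects the loss. Whether that applies to the Asthma rows depends on whether the tabulated 0.0 is exact or rounded.</p>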
<h3 id="model-validation--selection">Model Validation & Selection</h3>
<figure>
<div align="center">
<img src="https://i.imgur.com/774d0eg.png" width="400" height="400">
<figcaption>Fig 2.2) An overview of the model validation and selection process</figcaption>
</div>
</figure>
<p>After fitting three model types for each disease, I evaluated them with K-fold cross-validation (k=3). For each target, the model with the lowest mean MSE across the folds was chosen as the final model.</p>
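<p>The selection procedure described above can be sketched in a few lines. This is a minimal standard-library illustration, not the code used for the reported results: the two candidate "models" (a mean baseline and a one-feature least-squares line), the synthetic data, and all function names are hypothetical stand-ins for the SCAD, ElasticNet, and SqrtLasso models.</p>

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices once and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """One-feature ordinary least squares candidate."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def cv_mean_mse(fit, xs, ys, k=3, seed=0):
    """Train on k-1 folds, score MSE on the held-out fold, average over folds."""
    fold_mses = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - predict(xs[i])) ** 2 for i in held_out]
        fold_mses.append(statistics.fmean(errors))
    return statistics.fmean(fold_mses)

# Synthetic data with a strong linear signal (illustrative only).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(90)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

candidates = {"mean_baseline": fit_mean, "least_squares": fit_line}
scores = {name: cv_mean_mse(fit, xs, ys, k=3) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # lowest mean MSE wins

print(scores)
print("Selected:", best)
```

<p>Using the same fold assignment for every candidate keeps the comparison fair, since each model is scored on identical held-out tracts.</p>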
<p>Model performance for Cancer:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.1600em" columnalign="center center center center center center center center" columnlines="dashed none none none none none none" columnspacing="1em" rowlines="solid none none"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Model</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Mean MSE</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>SCAD</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi mathvariant="normal">.</mi><mn>807</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>ElasticNet</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>2.413</mn></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>SqrtLasso</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>1.630</mn></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">
\begin{array}{c:ccccccc}
\text{Model} & \text{Mean MSE} \\ \hline
\text{SCAD} & .807 \\
\text{ElasticNet} & 2.413 \\
\text{SqrtLasso} & 1.630 \\
\end {array}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 4.8em; vertical-align: -2.15em;"></span><span class="mord"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.65em;"><span class="" style="top: -4.65em;"><span class="pstrut" style="height: 4.65em;"></span><span class="mtable"><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.65em;"><span class="" style="top: -4.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Model</span></span></span></span><span class="" style="top: -3.61em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">SCAD</span></span></span></span><span class="" style="top: -2.41em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">ElasticNet</span></span></span></span><span class="" style="top: -1.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">SqrtLasso</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 2.15em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="vertical-separator" style="height: 4.8em; border-right-width: 0.04em; border-right-style: dashed; margin: 0px -0.02em; vertical-align: -2.15em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.65em;"><span class="" style="top: -4.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord 
text"><span class="mord">Mean MSE</span></span></span></span><span class="" style="top: -3.61em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">.807</span></span></span><span class="" style="top: -2.41em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">2.413</span></span></span><span class="" style="top: -1.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">1.630</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 2.15em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span></span></span><span class="" style="top: -6.1em;"><span class="pstrut" style="height: 4.65em;"></span><span class="hline" style="border-bottom-width: 0.04em;"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 2.15em;"><span class=""></span></span></span></span></span></span></span></span></span></span></p>
<p>Model performance for Asthma:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.1600em" columnalign="center center center center center center center center" columnlines="dashed none none none none none none" columnspacing="1em" rowlines="solid none none"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Model</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>Mean MSE</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>SCAD</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>5.489</mn></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>ElasticNet</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>3.367</mn></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>SqrtLasso</mtext></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>3.350</mn></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">
\begin{array}{c:ccccccc}
\text{Model} & \text{Mean MSE} \\ \hline
\text{SCAD} & 5.489 \\
\text{ElasticNet} & 3.367 \\
\text{SqrtLasso} & 3.350 \\
\end {array}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 4.8em; vertical-align: -2.15em;"></span><span class="mord"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.65em;"><span class="" style="top: -4.65em;"><span class="pstrut" style="height: 4.65em;"></span><span class="mtable"><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.65em;"><span class="" style="top: -4.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">Model</span></span></span></span><span class="" style="top: -3.61em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">SCAD</span></span></span></span><span class="" style="top: -2.41em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">ElasticNet</span></span></span></span><span class="" style="top: -1.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord text"><span class="mord">SqrtLasso</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 2.15em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="vertical-separator" style="height: 4.8em; border-right-width: 0.04em; border-right-style: dashed; margin: 0px -0.02em; vertical-align: -2.15em;"></span><span class="arraycolsep" style="width: 0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 2.65em;"><span class="" style="top: -4.81em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord 
text"><span class="mord">Mean MSE</span></span></span></span><span class="" style="top: -3.61em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">5.489</span></span></span><span class="" style="top: -2.41em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">3.367</span></span></span><span class="" style="top: -1.21em;"><span class="pstrut" style="height: 3em;"></span><span class="mord"><span class="mord">3.350</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 2.15em;"><span class=""></span></span></span></span></span><span class="arraycolsep" style="width: 0.5em;"></span></span></span><span class="" style="top: -6.1em;"><span class="pstrut" style="height: 4.65em;"></span><span class="hline" style="border-bottom-width: 0.04em;"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 2.15em;"><span class=""></span></span></span></span></span></span></span></span></span></span></p>
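<p>Ultimately, SCAD was selected as the final model for Cancer and ElasticNet for Asthma. For Asthma, the ElasticNet and SqrtLasso results were nearly identical (mean MSE of 3.367 and 3.350, respectively).</p>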
<h3 id="model-evaluation">Model Evaluation</h3>
<p>To evaluate the final models, I performed a permutation test inspired by [14] and [15], in which I randomly shuffled both the features and targets to simulate the null hypothesis:</p>
<p><span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>H</mi><mn>0</mn></msub><mo>:</mo><mspace linebreak="newline"></mspace><mtext>The features and targets are independent and the model</mtext><mspace linebreak="newline"></mspace><mtext>has not found a significant class structure between the features and targets.</mtext></mrow><annotation encoding="application/x-tex"> H_{0} :
\newline
\text{The features and targets are
independent and the model} \newline \text{has not found a significant class structure between the features and targets.} </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 0.83333em; vertical-align: -0.15em;"></span><span class="mord"><span style="margin-right: 0.08125em;" class="mord mathnormal">H</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.301108em;"><span class="" style="top: -2.55em; margin-left: -0.08125em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">0</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">:</span></span><span class="mspace newline"></span><span class="base"><span class="strut" style="height: 0.88888em; vertical-align: -0.19444em;"></span><span class="mord text"><span class="mord">The features and targets are independent and the model</span></span></span><span class="mspace newline"></span><span class="base"><span class="strut" style="height: 0.88888em; vertical-align: -0.19444em;"></span><span class="mord text"><span class="mord">has not found a significant class structure between the features and targets.</span></span></span></span></span></span></span></p>
<p>This is in contrast to the alternative hypothesis:<br>
<span class="katex--display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>H</mi><mi>A</mi></msub><mo>:</mo><mspace linebreak="newline"></mspace><mtext>The features and targets are not independent and the model</mtext><mspace linebreak="newline"></mspace><mtext>has found a significant class structure between the features and targets.</mtext></mrow><annotation encoding="application/x-tex"> H_{A} :
\newline
\text{The features and targets are not
independent and the model} \newline \text{has found a significant class structure between the features and targets.} </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height: 0.83333em; vertical-align: -0.15em;"></span><span class="mord"><span style="margin-right: 0.08125em;" class="mord mathnormal">H</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height: 0.328331em;"><span class="" style="top: -2.55em; margin-left: -0.08125em; margin-right: 0.05em;"><span class="pstrut" style="height: 2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">A</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height: 0.15em;"><span class=""></span></span></span></span></span></span><span class="mspace" style="margin-right: 0.277778em;"></span><span class="mrel">:</span></span><span class="mspace newline"></span><span class="base"><span class="strut" style="height: 0.88888em; vertical-align: -0.19444em;"></span><span class="mord text"><span class="mord">The features and targets are not independent and the model</span></span></span><span class="mspace newline"></span><span class="base"><span class="strut" style="height: 0.88888em; vertical-align: -0.19444em;"></span><span class="mord text"><span class="mord">has found a significant class structure between the features and targets.</span></span></span></span></span></span></span></p>
<p>In the test, I used K-fold cross-validation with k=3 and 120 permutations. In each iteration, the data was shuffled and split into folds, and the mean MSE across the folds was recorded for that permutation.</p>
<p>For both the SCAD Cancer model and the ElasticNet Asthma model, none of the 120 permuted datasets produced a mean MSE lower than that of the original model. While a larger number of permutations is likely needed to confirm the results, this suggests that the null hypothesis can be rejected and that each model has found a significant class structure between the features and targets.</p>
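<p>A permutation test of this kind can be sketched as follows. This is a simplified standard-library illustration under stated assumptions rather than the code behind the results above: it uses a one-feature least-squares fit on synthetic data, shuffles only the targets (which is enough to break the pairing between features and targets), and estimates the p-value with the common (C + 1) / (n + 1) convention, where C counts permutations scoring at least as well as the original. All names are hypothetical.</p>

```python
import random
import statistics

def cv_mean_mse(xs, ys, k=3):
    """Mean MSE of a one-feature least-squares fit over k interleaved folds."""
    n = len(xs)
    fold_mses = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        mx = statistics.fmean([xs[i] for i in train])
        my = statistics.fmean([ys[i] for i in train])
        sxx = sum((xs[i] - mx) ** 2 for i in train)
        sxy = sum((xs[i] - mx) * (ys[i] - my) for i in train)
        slope = sxy / sxx
        errors = [(ys[i] - (my + slope * (xs[i] - mx))) ** 2 for i in test]
        fold_mses.append(statistics.fmean(errors))
    return statistics.fmean(fold_mses)

def permutation_test(xs, ys, n_permutations=120, k=3, seed=0):
    """Compare the original CV score with scores on target-shuffled data."""
    rng = random.Random(seed)
    original = cv_mean_mse(xs, ys, k)
    permuted_scores = []
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break the pairing between features and targets
        permuted_scores.append(cv_mean_mse(xs, shuffled, k))
    # Count permutations whose MSE is no larger than the original model's.
    at_least_as_good = sum(1 for s in permuted_scores if original >= s)
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return original, permuted_scores, p_value

# Synthetic data with a strong linear signal (illustrative only).
data_rng = random.Random(7)
xs = [data_rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 * x + data_rng.gauss(0, 1) for x in xs]

original, permuted_scores, p_value = permutation_test(xs, ys)
print("Original mean MSE:", original)
print("Best permuted mean MSE:", min(permuted_scores))
print("Empirical p-value:", p_value)
```

<p>With 120 permutations and no permuted score matching or beating the original, this estimator gives (0 + 1) / (120 + 1) ≈ 0.0083, which is also the smallest p-value that 120 permutations can resolve. Running more permutations lowers that floor.</p>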
<h2 id="discussion-and-inferences">Discussion and Inferences</h2>
<figure>
<img src="https://i.imgur.com/uweYv1a.png">
<div align="center">
<figcaption>Fig 3.1) A geographic visualization of the distribution of Cancer in Virginian Census Tracts</figcaption>
</div>
</figure>
<figure>
<img src="https://i.imgur.com/FAdRCvY.png">
<div align="center">
<figcaption>Fig 3.2) A geographic visualization of the distribution of Watershed Impairment in Virginian Census Tracts</figcaption>
</div>
</figure>
<figure>
<img src="https://i.imgur.com/ehl2dQp.png">
<div align="center">
<figcaption>Fig 3.3) A geographic visualization of the distribution of Pre-1980 Homes in Virginian Census Tracts</figcaption>
</div>
</figure>
<p>Overall, using penalized regression models to predict disease frequency at the census-tract level yielded models with a low cross-validated MSE (0.807 for Cancer and 3.367 for Asthma) and insights into the parameters most closely associated with each disease type.</p>
<p>For both Cancer and Asthma, age statistics were found to be important variables associated with disease frequency in the final models. Additionally, both diseases showed a connection with variables related to air quality: Cancer with the annual mean days above the PM2.5 regulatory standard, and Asthma with proximity to busy roads. This suggests that air pollution may be a common factor in predisposition to these chronic diseases.</p>
<p>Figures 3.1, 3.2, and 3.3 above show the frequency of Cancer in census tracts alongside two variables that were selected by at least one model: the percentage of a tract that intersects an impaired watershed and the percentage of homes within the tract that were built before 1980. The maps of both variables share similarities with the map of Cancer frequency, specifically along the northeastern corner and the center-west area. For the distribution of older homes, this may be related to overall population density, but for the distribution of impaired watersheds it may hint at a deeper environmental relation between this factor and chronic disease predisposition.</p>
<p>Some of the selected variables fit preconceived notions about potential risk factors for chronic disease, such as proximity to busy roads being related to Asthma. However, some unexpected variables were also selected, such as the percentage without internet in relation to Asthma or the percentage living in group quarters in relation to Cancer. While a connection between these variables and their corresponding targets is not immediately clear, more research may be warranted to uncover potential links. One concern is that lurking variables not accounted for in the data may play a role in these results.</p>
<h2 id="references">References</h2>
<p>[1] CDC, “Environmental Justice Index (EJI),” Centers for Disease Control and Prevention, 2022. <a href="https://www.atsdr.cdc.gov/placeandhealth/eji/index.html">https://www.atsdr.cdc.gov/placeandhealth/eji/index.html</a><br>
[2] J. Kleiman, “Love Canal: A Brief History | SUNY Geneseo,” Geneseo.edu, 2000. <a href="https://www.geneseo.edu/history/love_canal_history">https://www.geneseo.edu/history/love_canal_history</a><br>
[3] J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties”, Journal of the American Statistical Association, vol. 96, no. 456, p. 1348-1360, 2001. <a href="https://doi.org/10.1198/016214501753382273">https://doi.org/10.1198/016214501753382273</a><br>
[4] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net”, Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 67, no. 2, p. 301-320, 2005. <a href="https://doi.org/10.1111/j.1467-9868.2005.00503.x">https://doi.org/10.1111/j.1467-9868.2005.00503.x</a><br>
[5] A. Belloni, V. Chernozhukov, & L. Wang, “Square-root lasso: pivotal recovery of sparse signals via conic programming”, Biometrika, vol. 98, no. 4, p. 791-806, 2011. <a href="https://doi.org/10.1093/biomet/asr043">https://doi.org/10.1093/biomet/asr043</a><br>
[6] G. Yu, L. Yin, S. Lu, & Y. Liu, “Confidence intervals for sparse penalized regression with random designs”, Journal of the American Statistical Association, vol. 115, no. 530, p. 794-809, 2019. <a href="https://doi.org/10.1080/01621459.2019.1585251">https://doi.org/10.1080/01621459.2019.1585251</a><br>
[7] D. Vasiliu. (2024) “Variable Selection and Regularization Introduction”. [Online]. Available: <a href="https://github.com/dvasiliu/AAML/blob/main/Module%203%20-%20Varaible%20Selection%20via%20Regularization/Variable_Selection_and_Regularization_Introduction.ipynb">https://github.com/dvasiliu/AAML/blob/main/Module 3 - Varaible Selection via Regularization/Variable_Selection_and_Regularization_Introduction.ipynb</a>.<br>
[8] Scikit-Learn. “6.3. Preprocessing data”. [Online]. Available: <a href="https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-scaler">https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-scaler</a><br>
[9] Scikit-Learn. “sklearn.preprocessing.MinMaxScaler — scikit-learn 0.24.1 documentation”. [Online]. Available: <a href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler">https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler</a><br>
[10] Scikit-Learn. “sklearn.preprocessing.QuantileTransformer — scikit-learn 0.24.1 documentation”. [Online]. Available: <a href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html">https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html</a><br>
[11] T. Hastie, R. Tibshirani, & M. Wainwright. “Statistical Learning with Sparsity: The Lasso and Generalizations”. [Online]. Available: <a href="https://hastie.su.domains/StatLearnSparsity/">https://hastie.su.domains/StatLearnSparsity/</a><br>
[12] J. Kennedy and R. Eberhart, “Particle swarm optimization,” <em>Proceedings of ICNN’95 - International Conference on Neural Networks</em>, vol. 4, pp. 1942–1948, 1995, doi: <a href="https://doi.org/10.1109/icnn.1995.488968">https://doi.org/10.1109/icnn.1995.488968</a>.<br>
[13] D. Vasiliu. (2024) “Grid Search Algorithms”. [Online]. Available: <a href="https://github.com/dvasiliu/AAML/blob/main/Module%206%20-%20Grid%20Search%20Algorithms%20for%20Optimization/Grid_Search_Algorithms.ipynb">https://github.com/dvasiliu/AAML/blob/main/Module 6 - Grid Search Algorithms for Optimization/Grid_Search_Algorithms.ipynb</a>.<br>
[14] M. Ojala and G. C. Garriga, “Permutation Tests for Studying Classifier Performance,” <em>Journal of Machine Learning Research</em>, vol. 11, no. 62, pp. 1833–1863, Mar. 2010, doi: <a href="https://doi.org/10.5555/1756006.1859913">https://doi.org/10.5555/1756006.1859913</a>.<br>
[15] M. Anderson and J. Robinson, “Permutation tests for linear models”, Australian & New Zealand Journal of Statistics, vol. 43, no. 1, p. 75-88, 2001. <a href="https://doi.org/10.1111/1467-842x.00156">https://doi.org/10.1111/1467-842x.00156</a><br>
[16] CDC. “Data Dictionary for the Environmental Justice Index 2022”. [Online]. Available: <a href="https://eji.cdc.gov/Documents/Data/2022/EJI_2022_Data_Dictionary_508.pdf">https://eji.cdc.gov/Documents/Data/2022/EJI_2022_Data_Dictionary_508.pdf</a></p>
</div>
</body>
</html>