<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<!--AstroStatistics Home page-->
<title>Censoring: Planet host stars dataset</title>
<!-- InstanceEndEditable -->
<meta http-equiv="Content-Type" content="text/html;">
<meta name="keywords"
content="Center for Astrostatistics, CASt,
Statcodes, SCMA, VoStat">
<link href="assets/cast.css" rel="stylesheet" type="text/css">
</head>
<body style="background-color: rgb(255, 248, 229);" leftmargin="0"
topmargin="0">
<!--body style="background-color: #FFF8E5"-->
<p><img src="assets/HmPg_Ban.gif" alt="Astrostatistics Image" usemap="#Map"
href="http://www.stat.psu.edu" title=""
style="border: 0px solid ; width: 700px; height: 107px;" hspace="0"
vspace="0"> <map name="Map">
<area shape="rect" coords="3,12,126,98" href="http://www.psu.edu"
target="_blank" alt="Penn State University">
<area shape="rect" coords="360,0,508,17"
href="http://www.science.psu.edu" alt="Eberly College of Science">
<area shape="rect" coords="508,0,700,98"
href="http://astrostatistics.psu.edu" alt="Center for Astrostatistics">
<area shape="rect" coords="127,18,508,98"
href="http://astrostatistics.psu.edu" alt="Center for Astrostatistics">
</map>
</p>
<!-- InstanceBeginEditable name="head" -->
<p> <!--center--><a href="index.html">Home</a></p>
<p><img src="http://astrostatistics.psu.edu/rainbow.gif" border="0"
height="4" vspace="2" width="700"><!-- InstanceBeginEditable name="PageTitle" -->
<!-- InstanceEndEditable --><!-- InstanceBeginEditable name="body" --></p>
<table border="0" cellspacing="0" width="700">
<tbody>
<tr>
<td style="vertical-align: top;"> </td>
<td>
<h2 style="text-align: center;">Censoring: Planet host stars<br>
</h2>
<h3>The CASt dataset</h3>
<a
href="censor.dat"
target="_blank">censor.dat</a><br>
<br>
<p><span style="font-weight: bold;">Astronomical background
</span></p>
<p>The following is a common situation in observational
astronomy. A previously identified sample of objects (stars,
galaxies, quasars, X-ray sources, etc.) is observed at some new
wavelength or for some new property. Some of the target objects
are detected and the value of the new property is measured (with a
known measurement error), while others are not detected. The
undetected objects are assigned an upper limit to the value of the
property based on the uncertainty of the unsuccessful measurement.
The result is a new column in a multivariate database where the rows
represent the objects and the columns represent values of various
properties. The new column contains measured values with errors,
together with upper limits. Statisticians call these upper limits
"left-censored" data points. Many astronomical studies encounter such
problems, particularly in extragalactic astronomy.
</p>
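<p>One way to picture such a column is sketched below in Python. The
<code>Measurement</code> record, the <code>classify</code> helper and the
3-sigma threshold are illustrative assumptions, not part of the dataset
or of any particular survey's procedure.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class Measurement:
    value: float    # measured value, or the upper limit if not detected
    error: float    # 1-sigma uncertainty of the measurement
    detected: bool  # False marks a left-censored (upper-limit) entry


def classify(signal, sigma, k=3.0):
    """Hypothetical rule: a detection needs signal > k*sigma.

    Otherwise the entry is recorded as an upper limit at k*sigma,
    i.e. the limit is derived from the measurement uncertainty.
    """
    if signal > k * sigma:
        return Measurement(signal, sigma, True)
    return Measurement(k * sigma, sigma, False)


# Two toy objects with the same noise level: one bright, one faint.
column = [classify(10.0, 1.0), classify(1.0, 1.0)]
```
</pre>
<p>In this sketch the faint object is stored as an upper limit of 3.0
with <code>detected</code> set to false, which is the information a
survival method needs in addition to the value itself.</p>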
<p>A large suite of statistical methods has been developed to
treat right-censored datasets because these frequently arise in
"survival" studies; that is, examination of how long a population
"lives" under various situations when the experiment is stopped before
all members of the population have "died". This situation arises
in actuarial studies (where the objects are ordinary people), industrial
reliability studies (where the objects are often manufactured products), and
biomedical studies (where the objects are usually ill people or test
animal samples). During the 1980s and 1990s, survival analysis methods
were adapted for use in astronomical surveys with nondetections;
reversing the sign of the variable turns astronomical upper limits
(left-censored) into the right-censored form these methods expect.
The <a href="http://astrostatistics.psu.edu/statcodes/sc_censor.html">ASURV</a>
code, used in several hundred astronomical studies to date, implements
a number of survival methods: the Kaplan-Meier univariate maximum
likelihood estimator; Gehan and other two-sample tests; generalized
rank correlation coefficients for bivariate problems; and bivariate
linear regressions.
</p>
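<p>As a minimal sketch of the Kaplan-Meier idea for upper limits (this
is not the ASURV code), the Python fragment below flips the sign of each
value so that left-censored points become right-censored, applies the
product-limit formula, and flips back. The function names and the toy
values are illustrative assumptions. When a detection and a limit share
the same value, the limit is counted as still at risk.</p>
<pre>
```python
from collections import Counter


def kaplan_meier(times, observed):
    """Product-limit estimate of S(t) = P(T > t) for right-censored data.

    times    : list of floats
    observed : list of bools, True = exact value, False = right-censored
    Returns a list of (t, S(t)) at each distinct exact value, ascending.
    """
    events = Counter(t for t, d in zip(times, observed) if d)
    curve = []
    surv = 1.0
    for t in sorted(events):
        # Everything not yet "used up" before t is at risk, censored or not.
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - events[t] / at_risk
        curve.append((t, surv))
    return curve


def km_left_censored(values, detected):
    """Kaplan-Meier for upper limits via the sign flip x -> -x.

    An upper limit on x is a lower limit on -x, so the right-censored
    estimator applies to the flipped data. Because S(-x) = P(-X > -x),
    each returned pair (x, p) estimates p = P(X below x).
    """
    flipped = [-v for v in values]
    curve = kaplan_meier(flipped, detected)
    return sorted((-t, s) for t, s in curve)


# Toy sample: three detections and one upper limit at 2.0.
values = [1.0, 2.0, 3.0, 4.0]
detected = [True, False, True, True]
cdf = km_left_censored(values, detected)
```
</pre>
<p>For the toy sample, the upper limit at 2.0 passes its weight to the
only detection below it, so the estimate places about half of the
probability at 1.0 and a quarter each at 3.0 and 4.0.</p>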
<p>Astronomical censoring problems often differ from those
encountered in ordinary survival applications in various ways:
censoring is not restricted to a single dependent variable but can
occur anywhere in the multivariate dataset; a point can be
simultaneously censored in several properties; distance-dependent
censoring produces non-random censoring patterns; detected points have
heteroscedastic measurement errors; and the censored values are themselves
imprecise because they are based on the measurement errors. Despite these
problems, survival methods are often used because they overcome much of
the bias due to nondetections.
</p>
<p><span style="font-weight: bold;">Dataset</span></p>
<p>Here we present a censored dataset from stellar astronomy
where the authors seek differences in the properties of stars that do
and do not host extrasolar planetary systems. It had already been
established that the probability of finding a planet is a steeply
rising function of the star's metal content, but it was unclear whether
this arises from the metallicity at birth or from later accretion of
planetary bodies. This study focuses on the abundances of the
light elements beryllium (Be) and lithium (Li) that are thought to be
depleted by internal stellar burning, so that excess Be and Li should
be present only in the planet accretion scenario of metal
enrichment. </p>
<p>The dataset and figures below are obtained
from the following paper:
<ul> Are beryllium abundances anomalous in stars with
giant planets? N. C. Santos, G. Israelian, R. J. García
López, M. Mayor, R. Rebolo, S. Randich, A. Ecuvillon, and C. Domínguez
Cerdeña; Astronomy &amp; Astrophysics, 437, 1086-1096 (2004)</ul>
</p>
<p> The columns of the dataset are:
</p>
<ol>
<li>Star name</li>
<li>Sample. Type=1 indicates planet-hosting stars.
Type=2 is the control sample</li>
<li>T<sub>eff</sub>, stellar surface temperature (in kelvin)</li>
<li>log N(Be), log of the abundance of beryllium scaled to the
Sun's abundance (i.e. the Sun has log N(Be)=0.0).</li>
<li>Measurement error of log N(Be), based on model-fitting of
the observed stellar spectrum</li>
<li>log N(Li), log of the abundance of lithium scaled to the
Sun's abundance<br>
</li>
</ol>
<p>The dataset consists of 39 stars known to host planets
(plotted as filled circles) and 29 stars in a control sample (open circles).
Due to internal stellar processes, Be abundances are correlated with
stellar mass, which is traced by stellar surface temperature ("effective
temperature" or T<sub>eff</sub>). Regression lines fitted to the
detections only (top panel) show a slight elevation in Be abundance for
planet-hosting stars, but this difference disappears when a Buckley-James
regression line, which includes the effects of censoring, is considered
(bottom panel). </p>
<p style="text-align: center;"><img alt="Beryllium vs. Teff"
src="Santos_Fig6.gif" style="width: 561px; height: 547px;">
</p>
<p>The scatter plot below shows that Be and Li abundances are
interdependent in a complicated fashion, but little difference is seen
between the planet-hosting and control samples.
</p>
<p style="text-align: center;"><img alt="Beryllium vs. lithium"
src="Santos_Fig9.gif" style="width: 540px; height: 526px;"><br>
</p>
<p><span style="font-weight: bold;">Statistical exercises
</span></p>
<ul>
<li>Use standard univariate survival analysis methods to
construct Kaplan-Meier distributions of Be and Li abundances for the
planet-hosting and control samples. Find means and medians, and
apply two-sample tests for differences.
</li>
<li>Repeat the analysis with heteroscedastic weighting for the
Be abundances.
</li>
<li>Apply bivariate correlation tests and linear regressions to
the Be-vs-Li plot shown above. Note that ASURV implements the
Brown, Hollander &amp; Korwar generalized Kendall's tau, which permits
censoring in both variables.
</li>
<li>Extend bivariate survival methods to the multivariate
case. See the Akritas &amp; Siebert trivariate <a
href="http://astrostatistics.psu.edu/statcodes/sc_censor.html">partial
correlation coefficient</a> for multiply censored data based on
Kendall's tau.
</li>
</ul>
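<p>For the two-sample exercise, a Gehan-type test can be sketched in a
few lines of Python. The version below scores every pair of pooled
observations directly for upper limits and uses the permutation form of
the variance; it is an illustration under simplifying assumptions (ties
score zero, normal approximation for the p-value), not the ASURV
implementation, and the function names and toy samples are
hypothetical.</p>
<pre>
```python
import math


def definitely_less(a, b, b_detected):
    """True when the quantity behind a is surely below the one behind b.

    With upper limits this needs b to be a detection and b > a; whether
    a is a detection or a limit does not matter, since a limit only
    says the true value lies at or below a.
    """
    return bool(b_detected) and b > a


def gehan_test(x, x_det, y, y_det):
    """Gehan-type generalized Wilcoxon test for data with upper limits.

    Returns (w, z, p): the statistic, its standardized value under the
    permutation variance, and a two-sided normal-approximation p-value.
    A positive w means sample x tends to have larger values than y.
    """
    vals = list(x) + list(y)
    dets = list(x_det) + list(y_det)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    scores = []
    for i in range(n):
        below = sum(definitely_less(vals[j], vals[i], dets[i]) for j in range(n))
        above = sum(definitely_less(vals[i], vals[j], dets[j]) for j in range(n))
        scores.append(below - above)
    w = sum(scores[:n1])
    var = n1 * n2 * sum(s * s for s in scores) / (n * (n - 1))
    if var == 0:
        raise ValueError("no informative pairs: all scores are zero")
    z = w / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p


# Toy samples; the last entry of the second sample is an upper limit.
hosts = [5.0, 6.0, 7.0]
hosts_det = [True, True, True]
control = [1.0, 2.0, 8.0]
control_det = [True, True, False]
w, z, p = gehan_test(hosts, hosts_det, control, control_det)
```
</pre>
<p>In the toy example the upper limit at 8.0 cannot be ranked against
any other point, so it contributes nothing to the statistic; only the
six pairs involving the two control detections are counted.</p>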
</div>
</td>
</tr>
</tbody>
</table>
<!-- InstanceEndEditable --><!-- InstanceEnd --><br>
<img src="http://astrostatistics.psu.edu/rainbow.gif" border="0"
height="4" vspace="2" width="700"><br>
<table style="width: 700px; text-align: left;" border="0"
cellpadding="2" cellspacing="2">
<tbody>
<tr>
<td style="text-align: center; vertical-align: top;"><a
href="http://www.nsf.gov/"><img src="assets/nsflogo.gif" alt="NSF"
style="border: 0px solid ; width: 65px; height: 65px;" align="middle"></a><a
href="http://www.stat.psu.edu/"><img src="assets/statistics.jpg"
alt="Department of Statistics"
style="border: 0px solid ; width: 68px; height: 58px;" align="middle"></a><a
href="http://www.science.psu.edu/"><img alt="Eberly College of Science"
src="assets/eberlycol.gif"
style="border: 0px solid ; height: 58px; width: 363px;" align="middle"></a><a
href="http://www.astro.psu.edu/"><img src="assets/astronomy.jpg"
alt="Department of Astronomy and Astrophysics"
style="border: 0px solid ; width: 68px; height: 58px;" align="middle"></a></td>
</tr>
</tbody>
</table>
<div style="text-align: left;"> </div>
</body>
</html>