drafting-doc/model_description.Rnw at master · a4a/drafting-doc · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
\documentclass[a4paper,english]{article}
\usepackage{a4a}

\title{Description of Catch at Age model}
\author{Colin P. Millar, Ernesto Jardim}
\date{}


\begin{document}

\section*{Coverpage}

% set up R environment
<<knitr_opts2, echo=FALSE, message=FALSE, warning=FALSE>>=
library(knitr)
library(formatR)
library(RColorBrewer)
library(lattice)
library(xtable)
thm = knit_theme$get("bclear") #moe, bclear
knit_theme$set(thm)
opts_chunk$set(dev='png', cache=TRUE, fig.align='center',
               warning=FALSE, message=FALSE, dev.args=list(type="cairo"),
               dpi=96, highlight=TRUE, background='#F2F2F2',
               fig.lp="fig:", fig.pos="H", width=70, tidy=TRUE,
               out.width='.9\\linewidth')
@

<<echo=FALSE, cache=FALSE>>=
cat("Document build date:", date(), "\n")
cat("Working directory :\n", "    ", getwd(), "\n")
cat("Current contents of .GlobalEnv:\n\n")
cat("    ", if (length(ls(all = TRUE)) == 0) "<empty>" else ls(all = TRUE))
cat("\nSession information:\n\n")
sessionInfo()
@


\maketitle

\begin{abstract}
\noindent This document presents a decription of the statistical catch-at-age stock assessment
model developed in the JRC Assessment For All (\aFa) initiative. The stock assessment
model framework is a non-linear catch-at-age model implemented in R \url{http://www.r-project.org/},
FLR \url{http://www.flr-project.org/} and ADMB \url{http://www.admb-project.org/} that
can be applied rapidly to a wide range of situations with low parametrization requirements.
The model structure is defined by submodels, which are the different parts that require
structural assumptions. There are 5 submodels in operation: a model for F-at-age, a model
for the initial age structure, a model for recruitment, a (list) of model(s) for abundance
indices catchability-at-age, and a list of models for the observation variance of catch-at-age
and abundance indices. The submodels are formulated through linear models. This opens the
possibility of using the linear modelling tools available in R: see for example the mgcv
\url{http://cran.r-project.org/web/packages/mgcv/index.html} gam formulas and useful overview of R
formulas here \url{http://science.nature.nps.gov/im/datamgmt/statistics/r/formulas/}.
\end{abstract}

\tableofcontents


% ---------------------------------------------------------------------------------------------------
%
%  Section
%
% ---------------------------------------------------------------------------------------------------

\section{Background}

The \aFa stock assessment model is a non-linear catch-at-age model implemented in
R / FLR / ADMB that can be applied rapidly to a wide range of situations with low parametrization
requirements.

In the \aFa assessment model, the model structure is defined by submodels, which are
the different parts of a statistical catch at age model that require structural assumptions. There
are 5 submodels in operation:
\begin{itemize}
	\item a model for F-at-age,
	\item a (list) of model(s) for abundance index catchability-at-age,
	\item a model for recruitment,
	\item a list of models for the observation variance of catch-at-age and abundance indices,
	\item a model for the initial age structure,
\end{itemize}

In practice, we fix the variance models and the initial age structure models, but
in theory these can be changed.

The submodels are formulated through the use of linear models. This opens the possibility of using
the linear modelling tools available in R: see for example the
\href{http://cran.r-project.org/web/packages/mgcv/index.html}{mgcv} gam formulas, or
factorial design formulas using \code{lm()}, a usefull overview of model formulas in R can be
found here \url{http://science.nature.nps.gov/im/datamgmt/statistics/r/formulas/}. In R's linear
modelling language, a constant model is coded as $\sim 1$, while a slope over age would simply
be $\sim age$. For example, we can write a traditional year/age separable F model like
$\sim factor(age) + factor(year)$.

The 'language' of linear models has been developing within the statistical community for
many years, and constitutes an elegant way of defining models without going through the
complexity of mathematical representations. This approach helps to improve
communication among scientists
  \begin{itemize}
  \item \href{http://rspa.royalsocietypublishing.org/content/283/1393/147.short}{1965 J. A. Nelder}, notation for randomized block design
  \item \href{http://www.jstor.org/stable/info/2346786}{1973 Wilkinson and Rodgers}, symbolic description for factorial designs
  \item \href{http://books.google.com/books?isbn=0412343908}{1990 Hastie and Tibshirani}, introduced notation for smoothers
  \item \href{http://books.google.com/books?isbn=041283040X}{1991 Chambers and Hastie}, further developed for use in S
  \end{itemize}


% ---------------------------------------------------------------------------------------------------
%
%  Section
%
% ---------------------------------------------------------------------------------------------------

\section{The basic approach}

\newcommand{\obs}{\text{(obs)}}

The data required to fit the \aFa stock assessment model are observations of catch $C^\obs_{ay}$ for
age $a = a_0, a_0 + 1, \ldots$ and year $y = y_0, y_0 + 1, \ldots$, and observations of abundance
indices $I^\obs_{ays}$ for age $a$ and year $y$ from the $s$th survey or CPUE series,
$s = 1, 2, \ldots$

The model is an age structure model where the number of fish at age $a$ at the start of year $y$
is $N_{ay}$ are assumed to die through the year at a constant rate given by $e^{-Z_{ay}}$, where
$Z_{ay}$ is always positive, and that this rate is solely due to natural causes $M_{ay}$ and
fishing $F_{ay}$. At the start of the following year $y+1$ the number of fish $N_{a+1,y+1}$ is
the number of fish, 1 year older, that survived the perils of year $y$. This results in an
expression that describes the simplest type of population dynamics in the \aFa model

\begin{align}
  \label{eq:pop_decay}
  N_{a+1,y+1} = N_{ay} e^{-Z_{ay}}
\end{align}

And to initialise this population, it requires a vector of numbers of recruiting fish
$R_y = N_{a_0y}$, $y = y_0, y_0 + 1, \ldots$ and a vector of the age structure $N_{ay_0}$,
$a = a_0 + 1, a_0 + 2, \ldots$ in year $y_0$.  These two vectors, along with an estimate of mortality
$Z_{ay}$ are what is required to generate a model of the fish population, and are refered to as
recruitment and initial-age-structure.

Unfortuntely, recruitment and the initial population structure are not directly observed, but the
modelled numbers at age can be used to generate predictions that are directly observed in the form
of survey indices and commercial catches. Abundance indices are (in most cases) observations of the
relative abundance because they do not detect every fish but rather a fixed proportion $Q_{ays}$
that can depend on age and year, and be survey specific.  This model of how the survey index
relates to the numbers of fish in the population, allows a prediction of the survey indices
$I_{ays}$ to be made, which can be compared to the observed survey indices $I^\obs_{ays}$.

\begin{align}
  \label{eq:survey_index}
  I_{ays} &= Q_{ays} N_{ay}
\end{align}

As mentioned, it is also nessisary to estimate the mortality rate in the population. Usually, only
mortality due to fishing is observed, and so natural mortality $M_{ay}$ is assumed to be known, and
is set to a sensible value, guided by expert judgement. Fishing mortality $F_{ay}$, on the other hand
is observed via the numbers removed through fishing. Because $F_{ay}$ and $M_{ay}$ are constant through
the year, catches arise as a fraction of those fish that died $N_{ay} - N_{a+1,y+1}$, and is
written here as the familiar Baranov catch equation. Note the second line arises by substituting
the population equation (\ref{eq:pop_decay}) for $N_{a+1,y+1}$

\begin{align}
  \label{eq:catches}
  C_{ay} &= \frac{F_{ay}}{Z_{ay}}\left(N_{ay} - N_{a+1,y+1} \right) \nonumber \\
         &= \frac{F_{ay}}{Z_{ay}}\left(1 - e^{- Z_{ay}}\right) N_{ay}
\end{align}

This equation, like (\ref{eq:survey_index}) gives predictions based on the population model
and provides a value of catches which can be compared to the observed catches $C^\obs_{ay}$.  Taken
together, the comparison of the observations of catches and survey index to their predictions is the
basic approach for estimating the numbers of fish in the population and the fishing mortality
rate in \aFa.


% ---------------------------------------------------------------------------------------------------
%
%  Section
%
% ---------------------------------------------------------------------------------------------------

\section{\aFa Model details}

Modelled catches $C$ are defined in terms of the three quantities, natural mortality
$M$, fishing mortality $F$ and recruitment $R$, using a modified form of the well
known Baranov catch equation:

\begin{equation}
  C_{ay} = \frac{\bm{F}_{ay}}{\bm{F}_{ay}+M_{ay}}\left(1 - e^{-(\bm{F}_{ay}+M_{ay})}\right) \bm{R}_{y}e^{-\sum (\bm{F}_{ay} + M_{ay})}
\end{equation}

where $a$ and $y$ denote age and year. Modelled survey indices $I$ are defined in
terms of the same three quantities with the addition of survey catchability $Q$:

\begin{equation}
  I_{ays} = \bm{Q}_{ays} \bm{R}_{y}e^{-\sum (\bm{F}_{ay} + M_{ay})}
\end{equation}

where $s$ denotes survey or abundance index and allows for multiple surveys to be
considered. Observed catches $C^{(obs)}$ and the observed survey indices $I^{(obs)}$
are assumed to be log-normally distributed, or equivalently, normally distributed on
the log-scale, with age, year and survey specific observation variance:

\begin{equation}
  \log C^{(obs)}_{ay} \sim \text{Normal} \Big( \log C_{ay}, \bm{\sigma}^2_{ay}\Big) \qquad
  \log I^{(obs)}_{ays} \sim \text{Normal} \Big( \log I_{ays}, \bm{\tau}^2_{ays} \Big)
\end{equation}

The full log-likelihood for the \aFa statistical catch at age model can now be defined
as the sum of the log-likelihood of the observed catches ($\ell_N$ is the log-likelihood
of a normal distribution)

\begin{equation}
  \ell_C = \sum_{ay} w^{(c)}_{ay}\ \ell_N \Big( \log C_{ay}, \bm{\sigma}^2_{ay} ;\ \log C^{(obs)}_{ay} \Big)
\end{equation}

and the log-likelihood of the observed survey indices

\begin{equation}
  \ell_I = \sum_s \sum_{ay} w^{(s)}_{ays}\ \ell_N \Big( \log I_{ays}, \bm{\tau}_{ays}^2 ;\ \log I^{(obs)}_{ays} \Big)
\end{equation}

giving the total log-likelihood

\begin{equation}
  \label{eq:ll_full1}
  \ell = \ell_C + \ell_I
\end{equation}

which is defined in terms of the strictly positive quantites, $M_{ay}$, $F_{ay}$,
$Q_{ays}$ and $R_{y}$, and the observation variances $\sigma_{ay}$ and $\tau_{ays}$.
As such, the log-likelihood is over-parameterised as there are many more parameters
than observations.  In order to reduce the number of parameters, $M_{ay}$ is assumed
known (as is common), and the remaining parameters are written in terms of a linear
combination of covariates $x_{ayk}$, e.g.

\begin{equation}
  \log F_{ay} = \sum_k \beta_k x_{ayk}
\end{equation}

where $k$ is the number of parameters to be estimated and is sufficiently small. Using
this tecnique the quantities $\log F$, $\log Q$, $\log \sigma$ and $\log \tau$
%$\log \text{initial\,age\,structure}$ % this is not present in the above
(in bold in the equations above) can be described by a reduced number of parameters.
The following section has more discussion on the use of linear models in \aFa.


%  subsubsection  -----------------------------------

\subsubsection*{Stock recruitment relationships}

The \aFa statistical catch at age model can addiionally allow for a functional relationship
to be imposed that links predicted recruitment $\tilde{R}$ based on spawning stock biomass
and modelled recruitment $R$, included as a fixed variance random effect.  Options for the
relationship are the hard coded models Ricker, Beverton Holt, smooth hockeystick or
geometric mean. This is implemented by including a third component in the log-likelihood

\begin{equation}
  \ell_{SR} = \sum_y \ell_N \Big( \log \tilde{R}_y(a, b), \phi_y^2 ;\ \log R_y \Big)
\end{equation}

giving the total log-likelihood

\begin{equation}
  \label{eq:ll_full2}
  \ell = \ell_C + \ell_I + \ell_{SR}
\end{equation}

Using the (time varying) Ricker model as an example, predicted recruitment is

\begin{equation}
  \tilde{R}_y(a_y,b_y) = a_y S_{y-1} e^{-b_y S_{y-1}}
\end{equation}

where $S$ is spawning stock biomass derived from the model parameters $F$ and $R$, and
the fixed quantites $M$ and mean weights by year and age. It is assumed that $R$ is
log-normally distributed, or equivalently, normally distributed on the log-scale about
the (log) recruitment predicted by the SR model $\tilde{R}$, with known variance $\phi^2$,
i.e.

\begin{equation}
  \log R_y \sim \text{Normal} \Big( \log \tilde{R}_y, \phi_y^2 \Big)
\end{equation}

which leads to the definition of $\ell_{SR}$ given above. In all cases $a$ and $b$ are
strictly positive, and with the quantities $F$, $R$, etc. linear models are used to
parameterise $\log a$ and/or $\log b$, where relevant.

By default, recruitment $R$ as apposed to the reruitment predicted from a stock recruitment
model $\tilde{R}$, is specified as a linear model with a parameter for each year, i.e.

\begin{equation}
  \log R_y = \gamma_y
\end{equation}

This is to allow modelled recruitment $R_y$ to be shrunk towards the stock recruitment model.
However, if it is considered appropriate that recruitment can be determined exactly by a
relationship with covariates, it is possible, to instead define $\log R$ in terms of a linear
model in the same way as $\log F$, $\log Q$, $\log \sigma$ and $\log \tau$.
%But this is pretty much the same as taking a geometric mean, with a model on log a, and making the variance very small.

%  subsubsection
% ---------------------------------------------------------------------------------------------------
\subsubsection*{Model fitting}

Model fitting is done by optimising the combined likelihood (\ref{eq:ll_full1}) or
(\ref{eq:ll_full2}) in ADMB.


% ---------------------------------------------------------------------------------------------------
%
%  Section
%
% ---------------------------------------------------------------------------------------------------

%\section{Summary}

% TODO


% ---------------------------------------------------------------------------------------------------
%
%  References
%
% ---------------------------------------------------------------------------------------------------

%\bibliographystyle{chicago}
%\bibliography{a4a_refs}

\end{document}