-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy path07-functions.Rmd
More file actions
397 lines (268 loc) · 19 KB
/
07-functions.Rmd
File metadata and controls
397 lines (268 loc) · 19 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
# Writing functions in R {#functions}
Although we have already written various functions in R, in this chapter the writing of R functions will be approached systematically.
## General
A good way to learn about functions or to write a new function is to look at existing ones. As an example consider that we would like to write a function to implement a novel plotting procedure. So we start by taking a look at the existing `plot` function.
```{r, plotContent}
plot
```
This is not very helpful so we give the instruction:
```{r, plotMethods}
methods(plot)
```
If we decide to take a look at `plot.default` we can do so by
```{r, plotDefault}
plot.default
```
Since our new plotting method is aimed at categorical data we decide rather to take a look at `plot.factor`. But this is an asterisked function and hence is not visible:
```{r, plotFactor, error = 0}
plot.factor
```
Asterisked functions can be inspected using the following method:
```{r, plotGetAnywhere}
getAnywhere(plot.factor)
```
(a) How are default values assigned to arguments of functions?
(b) What is the default behaviour of `plot.factor()`?
(c) What tasks can be achieved with `pmatch()` and what is understood by partial matching? What will happen if `plot.factor()` is called with (i) `legend.text = 'AA=Agecat'`; (ii) `leg = 'AA=Agecat'`? Explain.
(d) Discuss the usage of `missing()`.
(e) Give an example of the usage of the function `stop(message= " ")`.
(f) Give an example of the usage of the function `warning(message= " ")`.
(g) What is the usage of the function `warnings()`?
(h) Why can functions be called without specifying any arguments e.g. `q()`?
(i) If the body of a function consists only of a single instruction it is not necessary to enclose it with braces.
(j) The convention is to use the last evaluated statement as a function’s return value. If several objects are to be returned gather them in a list.
(k) The function `return()` with a single object or a list of objects is useful to interrupt a function at some intermediate stage and return an object or a list of objects at that particular stage. This is usually done when a function is under development.
(l) Sometimes there is no meaningful value to return e.g. when a function is written primarily to produce some plot. In cases like this the function `invisible()` can be used as the last statement of the function. As an example of the usage of `invisible()` give the following instructions:
```{r, invisibleExamples}
boxplot(rnorm(100), plot = TRUE)
boxplot(rnorm(100), plot = FALSE)
```
Now look at the end of function `boxplot.default()` to see how `invisible()` has been implemented.
(m) Libraries (packages) of R functions. Attaching and detaching libraries to the search path. (Revise Chapter \@ref(intro))
(n) Creating a new function using scripts or `fix()`. (Revise Chapter \@ref(intro))
(o) Editing an existing function using scripts or `fix()`. (Revise Chapter \@ref(intro))
(p) Note that when writing a function a line can be interrupted at any place and be continued on a next line. *<span style="color:#FF9966">Warning: Be careful not to put the break point where it marks the completion of an executable statement.</span>* Explain.
## Writing a new function
Determining the indices of elements in a vector or matrix that meet a certain condition: the function `where()`
(a) Write the following function:
```{r, functionWhere}
where <- function(x, cond)
{ # Argument cond must evaluate to a logical value
if(!is.matrix(x))
seq(along = x)[cond]
else matrix(c(row(x)[cond], col(x)[cond]), ncol = 2)
}
```
(b) Inspect the *airquality* data set using the command `str(airquality)`.
(c) Use the `where()` function to find the indices of (i) the `NA`s, (ii) the maximum value and (iii) the minimum value in the airquality data set.
(d) Repeat (b) using the built-in function `which()`.
## Checking for object name clashes
(a) What happens if an R object is given the same name as an existing object?
(b) Discuss the usages of the functions `apropos()`, `conflicts()`, `find()` and `match()` for the naming of objects.
(c) Remember that when a function is called the R evaluator first looks in the *<span style="color:#3399FF">global environment</span>* for a function with this name and subsequently in each of the attached packages or date bases in the order shown by `search()`. The evaluator generally stops searching when the name is found for the first time. If two attached packages have functions with the same name one of them will *<span style="color:#FF9966">mask</span>* the object in the other. For example, the function `gam()` exists in two packages: `gam` and `mgcv`. If both were attached the command
```{r, findGam}
library (mgcv)
library (gam)
find("gam")
```
<div style="margin-left: 25px; margin-right: 20px;">
will return both version.
</div>
(d) The operator `::` can be used to access the intended version of `gam()` by using the call `mgcv::gam()` or `gam::gam()`.
(e) When writing R packages the *<span style="color:#3399FF">namespace</span>* of the package provides another mechanism for ensuring that the correct version of a function is used. Note in this regard that the operator `:::` can be used to access objects that are not exported.
## Returning multiple values
### Exercise
::: {style="color: #80CC99;"}
Write an R function that returns the mean, median, variance, minimum, maximum and coefficient of variation of a numeric vector of sample data. The different components must be accessible by name. Test your function with the value of `rnorm(1000)`. *Hint*: Use the construct `list (mean = ..., median = ..., ...)`.
:::
## Local variables and evaluation environments
(a) Where is an object stored that is created by a script or `fix()`?
(b) Where are local objects (objects that are created during the execution of a function) stored?
(c) Explain how the *<span style="color:#3399FF">evaluation environment</span>* works.
(d) What is understood by the *<span style="color:#3399FF">global environment</span>*?
(e) Study the R help-file w.r.t. the operator `<<-`. When is it useful to use this operator? What are the dangers inherent to this operator?
(f) What is understood by the scope of an expression or function?
The symbols which occur in the body of a function can be divided into three classes: *<span style="color:#FF9966">formal parameters</span>*, *<span style="color:#FF9966">local variables</span>* and *<span style="color:#FF9966">free variables</span>*. The formal parameters of a function are those appearing within the parentheses denoting the argument list of the function. Their values are determined by the process of *<span style="color:#FF9966">binding</span>* the actual function arguments to the formal parameters. Local variables are created by the evaluation of expressions in the body of the functions. Variables which are neither formal parameters nor local variables are called free variables. Free variables become local variables when they are assigned to. Consider the following function definition.
```{r, arguments}
fun <- function(datvec) {
mean <- mean(datvec)
print(mean)
plot(datvec)
plot(Traffic)
}
```
In this function, `datvec` is a formal parameter, the object `mean` on the left-hand of the assignment symbol is a local variable (not to be confused with the function `mean()` on the right-hand side of the assignment symbol) while `Traffic` is a free variable. In R the free variable bindings are resolved by first looking in the *<span style="color:#3399FF">environment</span>* in which the function was created. This is called *<span style="color:#FF9966">lexical scope</span>*.
If the following function call is made from the prompt in the working directory `fun(1:25)` the formal parameter `datvec` within the body of the function is assigned the value `1:25` (the actual argument) and its mean is assigned to the local object `mean`. If the free parameter `Traffic` is found in the *<span style="color:#3399FF">global environment</span>* or in a data base on the search path the required graph will be created else an error message will be sent to the console. Perform the above call.
## Cleaning up
(a) Study how the function `on.exit()` is used. This function can be used to reset options that are changed during an R-session back to their original values when the session is ended or a function terminates with an error message. It is also convenient for removal of temporary files.
(b) Study the uses of the functions `.First()` and `.Last()`.
(c) Write a function that automatically opens a graph window with a square plot region when an R-session is started.
## Variable number of arguments: argument `...`
(a) Consider the following situation: You want to write a function for a complex task. At a particular stage a graph of some intermediate results is to be constructed. This requires the calling function to contain a call to the `hist()` function. Here is an example of a chunk of code for executing this task:
```{r, eval = FALSE}
complexfun <- function(datmat,colgraph)
{ datmat <- scale(datmat)
# Several lines of complex code here
hist(datmat, col = colgraph) }
```
A call like `complexfun(rnorm(1000), 'yellow')` can now be executed for the desired result. The problem is that the hist function has several arguments that you would like to be able to access by passing suitable actual values to them through the calling function `complexfun`. Instead of having to resort to provide a complete set of arguments in the argument list of `complexfun` R provides a neat way of addressing this situation: The argument `...` which acts like any other formal argument except that it can represent a variable number of arguments. To see how the argument `...` works change the above function to:
```{r, eval = FALSE}
complexfun2 <- function(datmat, ... )
{ datmat <- scale(datmat)
# Several lines of complex code here
hist(datmat, ... ) }
```
Arguments represented by argument `...` in the argument list of hist are passed to hist through the argument `...` appearing in the arguments list of function `complexfun2`:
```{r, eval = FALSE}
complexfun2(datmat = rnorm(1000), col = 'yellow',
probability = TRUE, xlim = c(-5,5))
```
(b) Write a function that will retrieve the maximum length of any of an unspecified number of arguments of a specified mode. This is another example of the use of the `...` argument:
```{r, maxlen}
maxlen <- function (mode.use="numeric", ...)
{ my.list <- list(...)
out <- 0
for(x in my.list)
print (mode(x)) #if(mode(x) == mode.use) out <- max(out,length(x))
out
}
```
Note that the named argument must be specified as such in the function call:
```{r, maxlenExamples}
maxlen(1:10, 1:15, 1:3, letters)
maxlen(mode.use="numeric", 1:10, 1:15, 1:3, letters)
maxlen(1:10, 1:15, 1:3, letters, mode.use="character")
maxlen(mode.use="character", 1:10, 1:15, 1:3, letters)
```
## Retrieving names of arguments: functions `deparse()` and `substitute()`
There are many practical situations requiring the conversion of mathematical expressions into character strings (text) or, conversely, requiring the conversion of text into mathematical expressions. The tools (functions) provided in R for achieving such conversions are summarized in Figure \@ref(fig:expression).
```{r, expression, echo=FALSE, fig.cap="Converting text into mathematical expression or mathematical expressions into text.", out.width="80%"}
library(knitr)
knitr::include_graphics("pics/expressions.jpg")
```
::: {style="color: #80CC99;"}
* Task: write an R function that will plot two vectors using as axis labels the names of the objects passed as arguments to the function.
:::
It follows from Figure \@ref(fig:expression) that the function `substitute()` takes an expression as argument and returns it unevaluated. In order to evaluate the return value of `substitute()` the function `eval()` must be used. The function `deparse()` takes as argument an unevaluated expression and converts it into a character string. Now we are ready to write the following function:
```{r, eval = FALSE}
labplot <- function (x,y)
{ xname <- deparse(substitute(x))
yname <- deparse(substitute(y))
plot(x,y, xlab=xname, ylab=yname, main = paste("Plot of",
yname,"versus", xname))
}
```
(a) Study and illustrate the usage of function `labplot()`.
(b) From Figure \@ref(fig:expression) it also follows that the function `parse()` does the opposite of `deparse()` by converting a character string into an unevaluated expression. The latter unevaluated expression can be evaluated when needed using `eval()`.
## Operators
Execute the following instruction
```{r, objects31}
objects('package:base')[1:31]
```
in order to obtain some examples of operators available in R.
(a) *<span style="color:#FF9966">Operators are special R functions.</span>* Discuss this statement. In what respects do operators differ from ordinary R functions?
::: {style="color: #80CC99;"}
(b) Write an operator `%E%` to determine the Euclidean distance between two vectors and give an example of its usage. *Hint*: when creating operators with `fix()` or using scripts the name must be given as a character string e.g. `fix("%E%")`.
:::
## Replacement functions
Execute the following instruction
```{r, objectsBase}
objects('package:base')[300:400]
```
and notice that some object names appear in pairs with the name of one member of the pair ending in `<-`. Examples are `dim<-`, `levels<-`, `diag<-`, `names<-`, `rownames<-`, `colnames<-` and `dimnames<-`. Functions having names ending in `<-` are called *<span style="color:#FF9966">replacement</span>* functions. A replacement function appears on the left-hand side of the assignment symbol using the name without the `<-` to replace contents of the objects appearing in its argument list by the contents of the object appearing at the right-hand side of the assignment symbol e.g.:
```{r, rownamesExample}
X <- matrix (1:12, ncol = 3, dimnames =
list (paste0 ("Row", 1:4), paste0 ("X", 1:3)))
a <- rownames(X) # Function rownames in action.
rownames(X) <- 1:nrow(X) # Replacement function 'rownames<-' in action.
```
How can the object `diag<-` be inspected and is it different from the object `diag`? Compare the result of the following function calls:
```{r, getAnywhereDiag}
getAnywhere('diag')
getAnywhere('diag<-')
```
In what respects do replacement functions differ from other functions?
In order to write a replacement function the following rules must be met:
(i) the function name must end in `<-`
(ii) the function must return the complete object with suitable changes made
(iii) the final argument of the function corresponding to the replacement data on the right-hand side of the assignment, must be named `value`
(iv) usually a companion function exists having the same name without the `<-`.
As an example, write a replacement function `undefined()` that will replace missing values in a data object with the values on its right-hand side:
```{r, undefined}
"undefined<-" <- function (x, codes = numeric(), value)
{ if (length(codes) > 0) x[x %in% codes] <- NA
x[is.na(x)] <- value
x
}
```
The above function can be created or edited using `fix("undefined<-")`. Illustrate the usage of `undefined()`.
## Default values and lazy evaluation
(a) The function `match.arg()` is useful for selecting a default value from one of a set of possible values. Consider the following example:
```{r, choice, error = 0}
choice <- function(method=c("PCA","CVA","CA","NONLIN"))
{ match.arg(method) }
choice()
choice("CVA")
choice("xx")
```
(b) Functions in the R language are governed by a principle known as *<span style="color:#FF9966">lazy evaluation</span>* which means that a default value is not evaluated until it is actually needed within the function body. As a result of lazy evaluation it might happen in a function call that some default values are never evaluated.
## The dynamic loading of external routines
Compiled code can run in some instances much faster than corresponding code in R. The functions `.C()` and `.Fortran()` allow users to make use of programs written in *<span style="color:#BE99ff">C</span>* or *<span style="color:#BE99ff">Fortran</span>* in their R functions. How this is done is illustrated below. Study this example carefully and consult the help files for more details when needed.
First an R function is created to compute the matrix product of two matrices:
```{r, eval = FALSE}
matmult <- function (A,B)
{ if(ncol(A) != nrow(B)) stop("A and B not conformable with
respect to matrix multiplication \n")
n <- nrow(A)
q <- ncol(B)
Cmat <- matrix(NA, nrow=n, ncol=q)
for(i in 1:n)
{ for(j in 1:q) Cmat[i,j] <- sum(A[i,] * B[,j])
}
Cmat
}
```
Next a *<span style="color:#BE99ff">Fortran</span>* subroutine is written for performing matrix multiplication. The *<span style="color:#BE99ff">Fortran</span>* code for this subroutine is given below:
````{verbatim}
SUBROUTINE MATM (A1, A2B1, B2, A, B, OUT)
C This subroutine performs matrix multiplication.
C This should be improved with optimized code (such as
C from Linpack, etc.)
IMPLICIT NONE
INTEGER A1, A2B1, B2
DOUBLE PRECISION A(A1,A2B1), B(A2B1,B2), OUT(A1,B2)
C DUMMIES
INTEGER I, J, K
DO 300,J=1,B2
DO 200,I=1,A1
OUT(I,J)=0
DO 100,K=1,A2B1
OUT(I,J)=OUT(I,J)+A(I,K)*B(K,J)
100 CONTINUE
200 CONTINUE
300 CONTINUE
END
````
Next a dynamic link library (*<span style="color:#BE99ff">.dll</span>*) is made from the *<span style="color:#BE99ff">Fortran</span>* subroutine. The easiest way to do this is to use the command `R CMD SHLIB matm.f` from the *<span style="color:#BE99ff">Command Prompt</span>*. The *<span style="color:#BE99ff">.dll</span>* is available as *<span style="color:#BE99ff">C:\\matm64.dll</span>*.
Now an R function is to be written where the *<span style="color:#BE99ff">Fortran</span>* code is called:
```{r, eval = FALSE}
matmult.Fortran <-function (A,B)
{ if(ncol(A) != nrow(B)) stop("A and B not conformable with
respect to matrix multiplication \n")
n <- nrow(A)
q <- ncol(B)
p <- ncol(A)
Cmat <- matrix(0, nrow=n, ncol=q)
storage.mode(A) <- "double"
storage.mode(B) <- "double"
storage.mode(Cmat) <- "double"
value <- .Fortran("matm", as.integer(n), as.integer(p),
as.integer(q), A, B, matprod=Cmat)
value$matprod }
```
In order to use `matmult.Fortran()` the correct *<span style="color:#BE99ff">.dll</span>* must be loaded into the current folder using the function `dyn.load()`:
```{r, eval = FALSE}
dyn.load("full path\\matm64.dll")
```
Compare the answers and execution time of `matmult()` and `matmult.Fortran()` for different sized matrices.
The `Rcpp` package has made the inclusion of *<span style="color:#BE99ff">C++</span>* code into R considerably easier and more robust. For a detailed description of the package see [Rcpp vignette intro](https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-introduction.pdf).