forked from Tazinho/Advanced-R-Solutions
```{r, include=FALSE}
source("common.R")
```
# S3
```{r}
library(sloop)
```
## Basics
1. __<span style="color:red">Q</span>__: Describe the difference between `t.test()` and `t.data.frame()`? When is each function called?
__<span style="color:green">A</span>__: Because of S3's `generic.class()` naming scheme, both functions may initially look similar, while they are in fact unrelated.
- `t.test()` is a *generic* function that performs a t-test.
- `t.data.frame()` is a *method* that gets called by the generic `t()` to transpose data frame input.
Due to R's S3 dispatch rules, `t.test()` would also get called when `t()` is applied to an object of class "test".
2. __<span style="color:red">Q</span>__: Make a list of commonly used base R functions that contain `.` in their name but are not S3 methods.
__<span style="color:green">A</span>__: In recent years the "snake_case" style has become increasingly common for naming functions (and variables) in R. Many functions in base R will nevertheless remain "point.separated", which is why some inconsistency in your R code most likely cannot be avoided.
```{r, eval=FALSE}
# Some base R functions with point.separated names
install.packages()
read.csv()
list.files()
download.file()
data.frame()
as.character()
Sys.Date()
all.equal()
do.call()
on.exit()
```
For some of these functions "tidyverse"-replacements may exist such as `readr::read_csv()` or `rlang::as_character()`, which you could use at the cost of an extra dependency.
<!-- possibly mention https://journal.r-project.org/archive/2012/RJ-2012-018/RJ-2012-018.pdf (The State of Naming Conventions in R) -->
3. __<span style="color:red">Q</span>__: What does the `as.data.frame.data.frame()` method do? Why is it confusing? How could you avoid this confusion in your own code?
__<span style="color:green">A</span>__: The function `as.data.frame.data.frame()` implements the data frame *method* for the `as.data.frame()` *generic*, which coerces objects to data frames.
The name is confusing because it does not clearly communicate the type of the function, which could be a regular function, a generic or a method. Even if we assume a method, the number of `.`s makes it difficult to separate the generic part from the class part of the name.
We could avoid this confusion by applying a different naming convention (e.g. "snake_case") for our class and function names.
4. __<span style="color:red">Q</span>__: Describe the difference in behaviour in these two calls.
```{r}
set.seed(1014)
some_days <- as.Date("2017-01-31") + sample(10, 5)
mean(some_days)
mean(unclass(some_days))
```
__<span style="color:green">A</span>__: `mean()` is a generic function, which selects the appropriate method based on the class of its input. `some_days` has the class "Date", so `mean.Date(some_days)` is used and the result is again a date.
After `unclass()` has removed the class attribute, method dispatch falls back to the default method: `mean.default(unclass(some_days))` calculates the mean of the underlying double vector (the number of days since 1970-01-01).
<!-- When you look into the source code of `mean.Date()` (one line), you will see that the difference in the resulting objects is only the class attribute. -->
<!-- I agree, that inspecting `mean.Date` is interesting, though I am not entirely sure, how the dots work here and how the origin of the date is passed to `.Date`. It looks to me, as if the result of `mean.default(unclass(x))` is then backtransformend into a Date... If we can describe this concisely, we can use it otherwise I think we can safely skip it. (HB, 2019-03-12) -->
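The relationship between the two results can be checked directly. The following is a small sketch with fixed offsets instead of `sample()`; the names `days`, `m_date` and `m_num` are only illustrative.

```{r}
days <- as.Date("2017-01-31") + c(1, 3, 5)

m_date <- mean(days)           # dispatches to mean.Date()
m_num  <- mean(unclass(days))  # dispatches to mean.default()

class(m_date)
class(m_num)

# Apart from the class attribute, both results hold the same number
all.equal(as.numeric(m_date), m_num)
```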
5. __<span style="color:red">Q</span>__: What class of object does the following code return? What base type is it built on? What attributes does it use?
```{r}
x <- ecdf(rpois(100, 10))
x
```
__<span style="color:green">A</span>__: This code returns an object of the class "ecdf" (which inherits from "stepfun" and "function") and contains an empirical cumulative distribution function of its input. The object is built on the base type "closure", and the call that was used to create it (`ecdf(rpois(100, 10))`) is stored in the `call` attribute.
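These properties can be inspected with base R tools. A short sketch (the variable name `e` is only illustrative):

```{r}
e <- ecdf(rpois(100, 10))

typeof(e)             # base type
class(e)              # S3 class vector
names(attributes(e))  # attributes in use
attr(e, "call")       # the stored call
```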
6. __<span style="color:red">Q</span>__: What class of object does the following code return? What base type is it built on? What attributes does it use?
```{r}
x <- table(rpois(100, 5))
x
```
__<span style="color:green">A</span>__: This code returns a "table" object, which is built upon the base type "integer". The attribute "dimnames" is used to name the elements of the integer vector, and the attribute "dim" stores its (here one-dimensional) shape.
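The same kind of inspection works for the table object. Again a sketch, with `tab` as an illustrative name:

```{r}
tab <- table(rpois(100, 5))

typeof(tab)             # base type
names(attributes(tab))  # attributes in use
```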
## Classes
1. __<span style="color:red">Q</span>__: Write a constructor for `data.frame` objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?
__<span style="color:green">A</span>__: Data frames are built on (named) lists with the additional requirement that all elements must have the same length. Besides "names" and "class", they carry the attribute "row.names". Row names must be unique, have the same length as each list element and be of integer or character type.
There are no restrictions on column names beyond those of lists, so one could use special characters and surround the names with backticks (which is not recommended).
A very good constructor implementing these criteria used to be part of the sloop package. It is no longer part of the package, but the source can still be found online (https://github.com/r-lib/sloop/blob/be7ce8a6be660536df4bdd3a31fa54f0d627f2d6/R/data.frame.R#L11).
```{r, error=TRUE}
# Copied from older version of the sloop package
new_data.frame <- function(x, row.names = NULL) {
  stopifnot(is.list(x))
  n <- if (length(x) == 0) 0 else length(x[[1]])
  lengths <- vapply(x, length, integer(1))
  stopifnot(all(lengths == n))
  if (is.null(row.names)) {
    row.names <- .set_row_names(n)
  } else {
    stopifnot(
      is.character(row.names) ||
        is.numeric(row.names)
    )
    stopifnot(
      length(row.names) == n ||
        length(row.names) == 2
    )
  }
  structure(
    x,
    class = "data.frame",
    row.names = row.names
  )
}
# Test
x <- list(a = 1, b = 2)
new_data.frame(x, row.names = "l1")
new_data.frame(x, row.names = 1)
```
2. __<span style="color:red">Q</span>__: Enhance my `factor()` helper to have better behaviour when one or more `values` is not found in `levels`. What does `base::factor()` do in this situation?
__<span style="color:green">A</span>__: `base::factor()` converts these values (silently) into `NA`'s. To improve our `factor()` helper we choose to return an informative error message instead.
```{r, eval = FALSE}
factor <- function(x, levels = unique(x)) {
  new_levels <- match(x, levels)
  # Return error if unseen levels are passed
  if (any(is.na(new_levels))) {
    stop(
      "The following values do not occur ",
      "in the levels of x: ",
      paste(setdiff(x, levels), collapse = ", "),
      ".",
      call. = FALSE
    )
  }
  validate_factor(new_factor(new_levels, levels))
}
```
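For comparison, the silent conversion performed by `base::factor()` can be observed in a small example. The input values are made up for illustration.

```{r}
# "z" does not occur in the supplied levels
f <- base::factor(c("a", "b", "z"), levels = c("a", "b"))
f
is.na(f)
```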
3. __<span style="color:red">Q</span>__: Carefully read the source code of `factor()`. What does it do that our constructor does not?
__<span style="color:green">A</span>__: The original implementation allows a more flexible specification of input for `x`. The input is coerced to character or replaced by `character(0)` (in case of `NULL`). It also ensures that the factor levels are unique. This is achieved by setting the levels via `base::levels<-`, which fails when duplicate values are supplied.
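The uniqueness check can be triggered on purpose. In this sketch the error is caught with `tryCatch()` so that only its message is kept; `res` is an illustrative name.

```{r}
res <- tryCatch(
  base::factor("a", levels = c("a", "a")),
  error = function(e) conditionMessage(e)
)
res
```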
4. __<span style="color:red">Q</span>__: Factors have an optional “contrasts” attribute. Read the help for `C()`, and briefly describe the purpose of the attribute. What type should it have? Rewrite the `new_factor()` constructor to include this attribute.
__<span style="color:green">A</span>__: When factor variables (representing nominal or ordinal information) are used in statistical models, they are typically encoded as dummy variables and by default each level is compared with the first factor level. However, many different encodings ("contrasts") are possible: https://en.wikipedia.org/wiki/Contrast_(statistics)
Within R's formula interface you can wrap a factor in `C()` and specify the contrast of your choice. Alternatively, you can set the "contrasts" attribute of your factor variable, which accepts matrix input (see `?contr.helmert` or similar for details).
```{r}
# Updated factor constructor
new_factor <- function(
  x = integer(),
  levels = character(),
  contrasts = NULL
) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))
  stopifnot(is.null(contrasts) || is.matrix(contrasts))
  structure(
    x,
    levels = levels,
    class = "factor",
    contrasts = contrasts
  )
}
```
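To see which attribute base R itself uses, one can set contrasts on an ordinary factor with `contrasts<-` and inspect the result. This is a sketch with made-up levels; Helmert contrasts are an arbitrary choice.

```{r}
f <- base::factor(c("a", "b", "c"))
contrasts(f) <- contr.helmert(3)

# The contrast matrix is stored under the attribute name "contrasts"
cm <- attr(f, "contrasts")
cm
is.matrix(cm)
```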
5. __<span style="color:red">Q</span>__: Read the documentation for `utils::as.roman()`. How would you write a constructor for this class? Does it need a validator? What would a helper look like?
__<span style="color:green">A</span>__: This function transforms numeric input into Roman numbers (how cool is this!). This class is built on the "integer" type, which results in the following constructor.
```{r}
new_roman <- function(x = integer()) {
  stopifnot(is.integer(x))
  structure(x, class = "roman")
}
```
The documentation tells us that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.
```{r}
validate_roman <- function(x) {
  values <- unclass(x)
  if (any(values < 1 | values > 3899)) {
    stop(
      "Roman numbers are only defined between ",
      "1 and 3899.",
      call. = FALSE
    )
  }
  x
}
```
For convenience, we allow the user to also pass real values to a helper function.
```{r, error=TRUE}
roman <- function(x = integer()) {
  x <- as.integer(x)
  validate_roman(new_roman(x))
}
# Test
roman(c(1, 753, 2019))
roman(0)
```
## Generics and methods
1. __<span style="color:red">Q</span>__: Read the source code for `t()` and `t.test()` and confirm that `t.test()` is an S3 generic and not an S3 method. What happens if you create an object with class `test` and call `t()` with it? Why?
```{r, eval=FALSE}
x <- structure(1:10, class = "test")
t(x)
```
__<span style="color:green">A</span>__: We can see that `t.test()` is a generic, because it calls `UseMethod()`:
```{r}
t.test
# or simply call
sloop::ftype(t.test)
```
`sloop::ftype()` confirms via a call to `sloop:::is_s3_generic` (which then uses `codetools::findGlobals()`) that `t.test()` contains a call to `UseMethod()`.
Interestingly R also provides helpers, which list functions that look like methods, but in fact are not:
```{r}
tools::nonS3methods("stats")
```
When we create an object with class `test`, `t()` will dispatch to `t.test()`. This happens because `UseMethod()` simply searches for functions named `paste0("generic", ".", c(class(x), "default"))`.
Consequently `t.test()` is erroneously treated as a method of `t()`. Because `t.test()` is a generic itself and doesn't find a method called `t.test.test()`, it dispatches to `t.test.default()`.
By defining `t.test.test()`, we demonstrate, that this is really what is happening internally.
```{r, error=TRUE}
x <- structure(1:10, class = "test")
t(x)
t.test.test <- function(x) t.default(x)
t(x)
```
2. __<span style="color:red">Q</span>__: What generics does the `table` class have methods for?
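__<span style="color:green">A</span>__: We find methods specific to the `table` class by searching for functions whose names end in ".table". Note that this name-based search also returns functions that are not methods, such as `as.table()` or `is.table()`.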
```{r}
library(methods)
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
Filter(function(x) grepl("\\.table$", x), names(funs))
```
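A name-based search cannot tell methods apart from functions that merely end in `.table`. As an alternative sketch, `methods()` lists the S3 methods known for a class, and its "info" attribute contains the corresponding generics. The variable names are only illustrative.

```{r}
table_methods  <- methods(class = "table")
table_generics <- attr(table_methods, "info")[["generic"]]
sort(unique(table_generics))
```

Unlike the search above, this also considers methods from all currently loaded packages, not only those in the base package.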
3. __<span style="color:red">Q</span>__: What generics does the `ecdf` class have methods for?
__<span style="color:green">A</span>__: We use the same approach as above. When this is not successful, we repeat it using the superclass. The base package itself does not contain any methods for `ecdf` or `stepfun`. Both classes are defined in the stats package, which is not covered by this search.
```{r}
class(ecdf(1:2))
Filter(function(x) grepl("\\.ecdf$", x), names(funs))
Filter(function(x) grepl("\\.stepfun$", x), names(funs))
Filter(function(x) grepl("\\.function$", x), names(funs))
```
<!-- I think it would be nice to reflect on, why `ecdf` has a special class, but no specific methods tailored to this class. But I'm not sure about this for now. (HB, 2018-03-13) -->
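Because `ecdf()` lives in the stats package, a search restricted to `package:base` cannot find its methods. A sketch that asks `methods()` for the class instead (`ecdf_generics` is an illustrative name):

```{r}
ecdf_generics <- attr(methods(class = "ecdf"), "info")[["generic"]]
ecdf_generics
```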
4. __<span style="color:red">Q</span>__: Which base generic has the greatest number of defined methods?
__<span style="color:green">A</span>__: The generic `print()` clearly has the most defined methods.
```{r}
generics <- Filter(
  function(x) "generic" %in% sloop::ftype(x),
  funs
)
methods_per_generic <- sapply(
  names(generics), function(x) methods(x),
  USE.NAMES = TRUE
)
tail(sort(lengths(methods_per_generic)), 3)
```
5. __<span style="color:red">Q</span>__: Carefully read the documentation for `UseMethod()` and explain why the following code returns the results that it does. What two usual rules of function evaluation does `UseMethod()` violate?
```{r}
g <- function(x) {
  x <- 10
  y <- 10
  UseMethod("g")
}
g.default <- function(x) c(x = x, y = y)
x <- 1
y <- 1
g(x)
```
__<span style="color:green">A</span>__: `g()` dispatches to `g.default()`. `UseMethod()` passes the arguments on as they were supplied in the original call, so `x` is taken from `g()`'s calling environment (the global environment), in which `x` is bound to 1. The reassignment `x <- 10` inside the generic is ignored.
`y` is not defined within `g.default()`. In R versions before 4.4.0, `UseMethod()` additionally makes the local variables of the generic available to the method, so `y` is found with the value 10 and the result is `c(x = 1, y = 10)`. Since R 4.4.0 these local variables are no longer forwarded, and `y` is looked up in the global environment, where it is 1.
When `g.default()` is invoked explicitly instead of through `UseMethod()`, the argument `x` is evaluated in `g.default()`'s calling environment (the body of `g()`), where it is `10`. Free variables like `y` are looked up via lexical scoping in the enclosing (global) environment, where `y` is `1`.
```{r}
g <- function(x) {
  x <- 10
  y <- 10
  g.default(x)
}
g(x)
```
6. __<span style="color:red">Q</span>__: What are the arguments to `[`? Why is this a hard question to answer?
__<span style="color:orange">A</span>__: The subsetting operator `[` is a primitive and generic function as can be inspected via `ftype()`.
```{r}
ftype(`[`)
```
Therefore, ``formals(`[`)`` returns `NULL`, and one possible way to figure out `[`'s arguments would be to inspect the underlying C source code, which can be found online via `pryr::show_c_source(.Primitive("["))`. However, given the differing arguments of `[`'s methods, it seems most probable that `[`'s arguments are `x` and `...`.
```{r}
names(formals(`[.Date`))
names(formals(`[.table`))
names(formals(`[.AsIs`))
```
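The argument names that the three methods above have in common can also be computed rather than read off by eye. A short sketch, with `shared` as an illustrative name:

```{r}
is.primitive(`[`)
formals(`[`)  # NULL, because primitives have no formals

# Argument names shared by all three methods
shared <- Reduce(intersect, list(
  names(formals(`[.Date`)),
  names(formals(`[.table`)),
  names(formals(`[.AsIs`))
))
shared
```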
## Object styles
1. __<span style="color:red">Q</span>__: Categorise the objects returned by `lm()`, `factor()`, `table()`, `as.Date()`, `ecdf()`, `ordered()`, `I()` into the styles described above.
__<span style="color:orange">A</span>__: The returned objects correspond to the following object styles:
- Vector: `factor()`, `table()`, `as.Date()`, `ordered()`
- Record: none of the listed functions
- Scalar: `lm()`, `ecdf()`
- Other: `I()`
2. __<span style="color:red">Q</span>__: What would a constructor function for `lm` objects, `new_lm()`, look like? Use `?lm` and experimentation to figure out the required fields and their types.
__<span style="color:green">A</span>__: The constructor needs to populate the fields of an `lm` object and check their types for correctness.
```{r}
# Learn about lm-attributes
?lm
attributes(lm(cyl ~ ., data = mtcars))
# Define constructor
new_lm <- function(
  coefficients, residuals, effects, rank, fitted.values, assign,
  qr, df.residual, xlevels, call, terms, model
) {
  stopifnot(
    is.double(coefficients), is.double(residuals),
    is.double(effects), is.integer(rank), is.double(fitted.values),
    is.integer(assign), is.list(qr), is.integer(df.residual),
    is.list(xlevels), is.language(call), is.language(terms),
    is.list(model)
  )
  structure(
    list(
      coefficients = coefficients,
      residuals = residuals,
      effects = effects,
      rank = rank,
      fitted.values = fitted.values,
      assign = assign,
      qr = qr,
      df.residual = df.residual,
      xlevels = xlevels,
      call = call,
      terms = terms,
      model = model
    ),
    class = "lm"
  )
}
```
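The "experimentation" part of the question can be sketched by fitting a small model and listing the base type of each field. The model formula is an arbitrary example and `field_types` is an illustrative name.

```{r}
fit <- lm(mpg ~ wt, data = mtcars)

# Base type of every field of the fitted object
field_types <- vapply(unclass(fit), typeof, character(1))
field_types
```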
## Inheritance
1. __<span style="color:red">Q</span>__: How does `[.Date` support subclasses? How does it fail to support subclasses?
__<span style="color:orange">A</span>__:
```{r, eval=FALSE}
# inspect function
`[.Date`
# see how it's used
x <- Sys.Date()
s3_dispatch(x[1])
# find out what oldClass() returns for the object
oldClass(x)
```
One could create a subclass of `Date` and see what `s3_dispatch()` returns when it is called on an object of this subclass. I suspect that the delegation to the internal `[` is related to the issue here.
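Following this idea, a minimal experiment could look as follows. The subclass name `my_date` and the attribute `note` are hypothetical and only serve the illustration.

```{r}
d <- structure(
  Sys.Date() + 0:2,
  class = c("my_date", "Date"),
  note = "extra attribute"
)

d1 <- d[1]
class(d1)         # is the subclass kept?
attr(d1, "note")  # is the additional attribute kept?
```

If the class vector survives but `note` is dropped, this would suggest that `[.Date` restores the class of a subclass but cannot know about its additional attributes.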
2. __<span style="color:red">Q</span>__: R has two classes for representing date time data, `POSIXct` and `POSIXlt`, which both inherit from `POSIXt`. Which generics have different behaviours for the two classes? Which generics share the same behaviour?
__<span style="color:green">A</span>__: To answer this question, we have to get the respective generics:
```{r}
# define helper
get_generics <- function(x) {
  attr(methods(class = x), "info")[["generic"]]
}
# get generics
generics_t <- get_generics("POSIXt")
generics_ct <- get_generics("POSIXct")
generics_lt <- get_generics("POSIXlt")
```
The generics in `generics_t` with a method for the superclass POSIXt potentially share the same behaviour for both subclasses. However, if a generic has a specific method for one of the subclasses, it has to be subtracted:
```{r}
# These generics provide subclass-specific methods
union(generics_ct, generics_lt)
# These generics share (inherited) methods for both subclasses
setdiff(generics_t, union(generics_ct, generics_lt))
```
3. __<span style="color:red">Q</span>__: What do you expect this code to return? What does it actually return? Why?
```{r, eval = FALSE}
generic2 <- function(x) UseMethod("generic2")
generic2.a1 <- function(x) "a1"
generic2.a2 <- function(x) "a2"
generic2.b <- function(x) {
  class(x) <- "a1"
  NextMethod()
}
generic2(structure(list(), class = c("b", "a2")))
```
__<span style="color:green">A</span>__: When we execute the code above, this is what is happening:
* We pass an object of classes `b` and `a2` to `generic2()`, which prompts R to look for a method `generic2.b()`.
* The method `generic2.b()` then changes the class to `a1` and calls `NextMethod()`.
* One would think that this will lead R to call `generic2.a1()`, but in fact, as mentioned in the textbook, `NextMethod()`
> doesn’t actually work with the class attribute of the object, but instead uses a special global variable (.Class) to keep track of which method to call next.
This is why `generic2.a2()` is called instead.
## Dispatch details
1. __<span style="color:red">Q</span>__: Explain the differences in dispatch below:
```{r}
x1 <- 1:5
class(x1)
s3_dispatch(x1[1])
x2 <- structure(x1, class = "integer")
class(x2)
s3_dispatch(x2[1])
```
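__<span style="color:orange">A</span>__: `class()` returns `"integer"` for `x1` and `x2`. However, they are not identical. While `x2` has the attribute "class" with the value `"integer"`, `x1` doesn't have a class attribute. Instead, `x1` is dispatched on its implicit class `c("integer", "numeric")` (see `?class`), so dispatch on `x1` also considers `[.numeric`, whereas dispatch on `x2` only uses the explicit class before falling back to the default and internal methods.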
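The difference between an implicit and an explicit class can be made visible with `oldClass()`, which only reports the class attribute. A short sketch, where `v1` and `v2` mirror `x1` and `x2` from the question:

```{r}
v1 <- 1:5
v2 <- structure(v1, class = "integer")

oldClass(v1)       # NULL: no class attribute, the class is implicit
oldClass(v2)       # the explicit class attribute
identical(v1, v2)  # the attributes differ
```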
2. __<span style="color:red">Q</span>__: What classes have a method for the `Math` group generic in base R? Read the source code. How do the methods work?
__<span style="color:green">A</span>__: The following functions belong to this group (see ?`Math`):
* `abs`, `sign`, `sqrt`, `floor`, `ceiling`, `trunc`, `round`, `signif`
* `exp`, `log`, `expm1`, `log1p`, `cos`, `sin`, `tan`, `cospi`, `sinpi`, `tanpi`, `acos`, `asin`, `atan`, `cosh`, `sinh`, `tanh`, `acosh`, `asinh`, `atanh`
* `lgamma`, `gamma`, `digamma`, `trigamma`
* `cumsum`, `cumprod`, `cummax`, `cummin`
The following classes have a method for this group generic:
```{r}
methods("Math")
```
To read the source code of the S3 methods, we can just enter the name of the method into the console. To get the source code of the S4 methods, we can use `getMethod()`, i.e. `getMethod("Math", "nonStructure")`.
To explain the basic idea, we just overwrite the data frame method:
```{r}
Math.data.frame <- function(x){"hello"}
```
Now all functions from the `Math` group generic will return `"hello"`:
```{r}
abs(iris)
exp(iris)
lgamma(iris)
```
Of course, different functions should perform different calculations. Here `.Generic` comes into play, which provides us with the calling generic as a string:
```{r}
Math.data.frame <- function(x, ...) {
  .Generic
}
abs(iris)
exp(iris)
lgamma(iris)
rm(Math.data.frame)
```
The original source code of `Math.data.frame()` is a good example of how to turn the string returned by `.Generic` into a call to the corresponding function. `Math.factor()` is a good example of a method that is defined simply to provide better error messages.
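As a sketch of this pattern for a user-defined class (the class name `my_units` is hypothetical and not part of base R), `.Generic` can be resolved to the calling function with `match.fun()`, applied to the underlying data, and the class restored afterwards:

```{r}
Math.my_units <- function(x, ...) {
  fun <- match.fun(.Generic)   # the function that was called, e.g. abs
  out <- fun(unclass(x), ...)  # compute on the bare vector
  structure(out, class = "my_units")
}

u <- structure(c(-1.5, 2.25), class = "my_units")
abs_u   <- abs(u)
floor_u <- floor(u)

unclass(abs_u)
unclass(floor_u)
class(abs_u)
```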
3. __<span style="color:red">Q</span>__: `Math.difftime()` is more complicated than I described. Why?
__<span style="color:green">A</span>__: `Math.difftime()` only supports `abs`, `sign`, `floor`, `ceiling`, `trunc`, `round` and `signif`, for which it has to preserve the units of the result. For all other members of the group generic it needs to return a fitting error message.
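Both branches can be tried out on a small `difftime` object. In this sketch the error of the unsupported function is caught with `tryCatch()`; `d`, `abs_d` and `msg` are illustrative names.

```{r}
d <- as.difftime(-5, units = "mins")

# Supported member of the Math group: the units are kept
abs_d <- abs(d)
abs_d

# sqrt() belongs to the Math group but is not supported for difftime
msg <- tryCatch(sqrt(d), error = function(e) conditionMessage(e))
msg
```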