---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file.
     When the version changes, install the package before re-knitting so the dev version is up to date
-->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  fig.height = 5,
  fig.width = 5
#  out.width = "100%"
)

library(nestedLogit)
# get package versions
cran_version <- available.packages(repos = "https://cloud.r-project.org")["nestedLogit", "Version"]
dev_version <- getNamespaceVersion("nestedLogit")

```

<!-- badges: start -->
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Last Commit](https://img.shields.io/github/last-commit/friendly/nestedLogit)](https://github.com/friendly/nestedLogit)
[![CRAN status](https://www.r-pkg.org/badges/version/nestedLogit)](https://cran.r-project.org/package=nestedLogit)
[![R-Universe](https://friendly.r-universe.dev/badges/nestedLogit)](https://friendly.r-universe.dev)
[![Downloads](https://cranlogs.r-pkg.org/badges/nestedLogit?color=brightgreen)](https://www.r-pkg.org:443/pkg/nestedLogit)
[![Docs](https://img.shields.io/badge/pkgdown%20site-blue)](https://friendly.github.io/nestedLogit)

<!-- badges: end -->

# nestedLogit <img src="man/figures/logo.png" style="float:right; height:200px;" />
<!-- **Version 0.3.4** -->
**Version `r dev_version`**; documentation built for `pkgdown` `r Sys.Date()`

The `nestedLogit` package provides functions for fitting _nested dichotomy_ logistic regression models
for a **polytomous** response (with $m > 2$ categories), such as:

* support for political party in Canada (PC, Liberal, NDP, Green, BQ),
* preferred mode of transport (foot, bus, bike, train, plane),
* women's working status (not working, part-time, full-time).

The figure below shows two different ways that an $m=4$-category polytomous response $Y = \{1, 2, 3, 4\}$ can be decomposed into
three ($m-1$) nested dichotomies among the levels.

* In the case shown at the left of the figure, the response categories
are divided first as $\{1, 2\}$ vs. $\{3, 4\}$. Then these compound categories are subdivided
 as the dichotomies $\{1\}$ vs. $\{2\}$ and as $\{3\}$ vs. $\{4\}$.
* Alternatively, as shown at the right of the figure, the response categories
are divided progressively:
first as $\{1\}$ vs. $\{2, 3, 4\}$;
next as $\{2\}$ vs. $\{3, 4\}$; and
finally as $\{3\}$ vs. $\{4\}$.

```{r nested}
#| echo=FALSE,
#| out.width="80%",
#| fig.cap = "**Nested dichotomies**: The boxes show two different ways a four-category response can be represented as three nested dichotomies."
knitr::include_graphics("man/figures/nested.jpg")
```
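
In terms of probabilities, each decomposition expresses the four category probabilities as products of binary (conditional) probabilities, one per dichotomy. As a sketch, using notation introduced here rather than taken from the package, let $\pi_1, \pi_2, \pi_3$ be the probabilities of the second set in each of the three dichotomies, taken in the order listed above. Which set counts as the "second" one is an arbitrary convention. For the case at the left of the figure:

$$
\begin{aligned}
P(Y=1) &= (1-\pi_1)(1-\pi_2), & P(Y=2) &= (1-\pi_1)\,\pi_2, \\
P(Y=3) &= \pi_1\,(1-\pi_3),   & P(Y=4) &= \pi_1\,\pi_3 .
\end{aligned}
$$

For the case at the right of the figure, with $\pi_1, \pi_2, \pi_3$ redefined for its own three dichotomies:

$$
\begin{aligned}
P(Y=1) &= 1-\pi_1,              & P(Y=2) &= \pi_1\,(1-\pi_2), \\
P(Y=3) &= \pi_1\,\pi_2\,(1-\pi_3), & P(Y=4) &= \pi_1\,\pi_2\,\pi_3 .
\end{aligned}
$$

In both cases the four probabilities sum to 1 for any values of $\pi_1, \pi_2, \pi_3$ in $[0, 1]$.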

## ⚙️ Related models for a polytomous response

The basic model for this situation ($m > 2$ response categories) is the standard [**multinomial logistic model**](https://en.wikipedia.org/wiki/Multinomial_logistic_regression) (fit, e.g., by `nnet::multinom()`),
which compares response categories to a _reference level_.

When you can think of the differences among the response categories as a set of nested comparisons
among subsets of the categories, the approach of nested dichotomies is simpler, because:

* Nested dichotomies are **statistically independent**, and hence:
* the likelihood chi-square statistics for the sub-models are **additive**;
* they provide an additive decomposition of tests for the **overall** polytomous response.
* You can think of this as breaking up the overall question of "How do the response categories differ?" into $m-1$
  non-overlapping sub-questions that answer the global one.

When the dichotomies make
sense substantively, this method can be a simpler alternative to the standard **multinomial logistic model**,
which compares response categories to a reference level.
This choice is similar to using **orthogonal contrasts** among factor categories in an ANOVA,
as opposed to using the default reference-level coding.

The benefit is that, with nested logit models, you get to ask substantively more interesting questions **directly**
than you can with the multinomial logit model. The results (for overall tests) are nearly equivalent.
But, you get to **think better** about your problem with the nested logit model.
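
The additivity described above can be checked numerically. The following is a minimal base-R sketch on simulated data, not the package's implementation: it fits the two dichotomies of a three-category response by hand with `glm()`, combines the fitted binary probabilities into category probabilities, and compares the log-likelihood of the full polytomy with the sum of the two binary log-likelihoods. The data, the levels `A`, `B`, `C`, and all variable names are assumptions made for illustration.

```r
set.seed(123)
n <- 300
x <- rnorm(n)

# Simulate a three-category response through two nested binary choices
step1 <- rbinom(n, 1, plogis(0.5 - x))         # {A} vs. {B, C}
step2 <- rbinom(n, 1, plogis(-0.2 + 0.8 * x))  # {B} vs. {C}, used only if not A
dat <- data.frame(
  x = x,
  y = ifelse(step1 == 0, "A", ifelse(step2 == 1, "C", "B"))
)
dat$not_A <- as.integer(dat$y != "A")
dat$is_C  <- as.integer(dat$y == "C")

# Dichotomy 1 uses all cases; dichotomy 2 uses only cases in {B, C}
fit1 <- glm(not_A ~ x, family = binomial, data = dat)
fit2 <- glm(is_C ~ x, family = binomial, data = dat[dat$y != "A", ])

# Combine the binary fitted probabilities into category probabilities
p1 <- predict(fit1, newdata = dat, type = "response")
p2 <- predict(fit2, newdata = dat, type = "response")
probs <- cbind(A = 1 - p1, B = p1 * (1 - p2), C = p1 * p2)

# Log-likelihood of the observed categories vs. sum over the sub-models
obs <- cbind(seq_len(n), match(dat$y, colnames(probs)))
loglik_polytomy <- sum(log(probs[obs]))
loglik_sum <- as.numeric(logLik(fit1)) + as.numeric(logLik(fit2))
all.equal(loglik_polytomy, loglik_sum)
```

The two quantities agree up to floating-point error because each observation contributes to exactly those dichotomies in which its category takes part.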

### Ordered categories

Note that when the response categories are **ordered**, as in
education attained: "HS" < "College" < "BA" < "MA" < "PhD", another attractive model is the
**proportional odds** model (e.g., fit by `MASS::polr()`).
This is a simpler model, but it achieves that simplicity by
making the additional assumption that the coefficients for the
predictors are the same for all of the cumulative logits, that is, at every cut-point between adjacent categories.
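
As a sketch of the contrast, in notation introduced here and using the sign convention documented for `MASS::polr()`, the proportional odds model for a predictor vector $x$ is

$$
\operatorname{logit} P(Y \le j \mid x) = \alpha_j - x^\top \beta, \qquad j = 1, \dots, m-1,
$$

with a separate intercept $\alpha_j$ for each cut-point but a single coefficient vector $\beta$. A nested dichotomies model instead has one binary logit model per dichotomy, each with its own coefficients:

$$
\operatorname{logit} \pi_j(x) = \alpha_j + x^\top \beta_j, \qquad j = 1, \dots, m-1,
$$

where $\pi_j(x)$ denotes the probability of one side of the $j$-th dichotomy, given that the response falls in that dichotomy.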


## 🚀 Installation

You can install the current published version (`r cran_version`) from [CRAN](https://cran.r-project.org/package=nestedLogit),
or the development version (`r dev_version`) from
either [R-universe](https://friendly.r-universe.dev/nestedLogit) or [GitHub](https://github.com/friendly/nestedLogit).

+-------------------+------------------------------------------------------------------------------+
| CRAN version      | `install.packages("nestedLogit")`                                            |
+-------------------+------------------------------------------------------------------------------+
| R-universe        | `install.packages('nestedLogit', repos = 'https://friendly.r-universe.dev')` |
+-------------------+------------------------------------------------------------------------------+
| GitHub            | `remotes::install_github("friendly/nestedLogit")`                            |
+-------------------+------------------------------------------------------------------------------+


## ✨ Package overview

The package provides one main function, `nestedLogit()`, for fitting the set of $(m-1)$
binary logistic regression models for a polytomous response with $m$ levels.
It has the full suite of methods associated with `glm()` and other model-fitting functions.

What is novel in terms of R software design is a grammar for specifying dichotomies
in model formulas, together with the idea of fitting an overall model composed of
submodels that can also be examined, tested, and plotted individually.

### Specifying dichotomies

The essential idea here is to have a simple notation expressing a dichotomy among response levels,
like `{{A, B}, {C, D}}`, and then a way to specify a collection of $m-1$ of these that can be
used productively in data analysis and visualization.

In the `nestedLogit` package these can be specified using helper functions:

* `dichotomy()`: constructs a _single_ dichotomy among the levels of a response factor;
* `logits()`: creates the set of dichotomies, typically using `dichotomy()` for each;
* `continuationLogits()`: provides a convenient way to generate all dichotomies for an ordered response.

For instance, a 4-category response, with levels `r LETTERS[1:4]`, and successive binary splits
for the dichotomies of interest
could be specified as:


```{r}
(ABCD <-
  logits(AB.CD = dichotomy(c("A", "B"), c("C", "D")),
           A.B = dichotomy("A", "B"),
           C.D = dichotomy("C", "D")
         )
)
```

These dichotomies are effectively a tree structure of lists, which can be displayed simply using
`lobstr::tree()`.

```{r tree}
lobstr::tree(ABCD)
```


Alternatively, the nested dichotomies can be specified more compactly as a nested (i.e., recursive) list
with optionally named elements. For example, where people might choose a method of transportation
among the categories `plane`, `train`, `bus`, `car`, a sensible set of three dichotomies could
be specified as:

```{r transport}
transport <- list(
  air = "plane",
  ground = list(
    public = list("train", "bus"),
    private = "car"
  ))

lobstr::tree(transport)
```

An example with this structure is included in `vignette("other-examples", package = "nestedLogit")`.

There are also methods, including `as.matrix.dichotomies()` and `as.character.dichotomies()`,
to facilitate working with `dichotomies` objects in other representations. The `ABCD` example
above corresponds to the matrix below, whose rows represent the dichotomies and columns
are the response levels:

```{r}
as.matrix(ABCD)

as.character(ABCD)
```
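
For an ordered response, the progressive splits that `continuationLogits()` is described as generating can be pictured in the same matrix form. The following base-R sketch does not call the package: the helpers `continuation_splits()` and `splits_to_matrix()`, the `above_` naming pattern, and the 0/1/`NA` coding are all assumptions made here for illustration. The levels are those of the education example above.

```r
# Hypothetical helper (not part of nestedLogit): for ordered levels, split
# each level from all higher ones, giving m - 1 dichotomies.
continuation_splits <- function(levels) {
  m <- length(levels)
  stopifnot(m > 2)
  splits <- lapply(seq_len(m - 1), function(j) {
    list(lower = levels[j], upper = levels[(j + 1):m])
  })
  names(splits) <- paste0("above_", levels[-m])
  splits
}

# Hypothetical helper: one row per dichotomy, one column per level, coded
# 0 for the first set, 1 for the second set, NA for levels not involved.
splits_to_matrix <- function(splits, levels) {
  mat <- matrix(NA_integer_, nrow = length(splits), ncol = length(levels),
                dimnames = list(names(splits), levels))
  for (i in seq_along(splits)) {
    mat[i, splits[[i]]$lower] <- 0L
    mat[i, splits[[i]]$upper] <- 1L
  }
  mat
}

lev <- c("HS", "College", "BA", "MA", "PhD")
splits <- continuation_splits(lev)
coding <- splits_to_matrix(splits, lev)
coding
```

Each row involves one fewer level than the row before it, which is what makes the dichotomies nested: a case enters a later comparison only if it fell in the upper set of all earlier ones.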


The result of `nestedLogit()` is an object of class `"nestedLogit"`. It contains
the set of $(m-1)$ `glm()` models fit to the dichotomies.

### Methods

```{r child="man/partials/methods.Rmd"}
```


## 🎨 Examples

This example uses data on women's labor force participation to fit a nested logit model for
the response, `partic`, representing categories
`not.work`, `parttime` and `fulltime` for 263 women from a 1977
survey in Canada. This dataset is explored in more detail in the
package vignette, `vignette("nestedLogit", package = "nestedLogit")`.

A model for the complete polytomy can be specified as two nested
dichotomies, using helper functions `dichotomy()` and `logits()`, as shown in the example that follows:

* `work`: {not.work} vs. {parttime, fulltime}
* `full`: {parttime} vs. {fulltime}, but only for those working

`nestedLogit()` effectively fits each of these dichotomies
as logistic regression models via `glm(..., family = binomial)`:

```{r wlf-model}
data(Womenlf, package = "carData")

# Use `logits()` and `dichotomy()` to specify the comparisons of interest
comparisons <- logits(work=dichotomy("not.work",
                                     working=c("parttime", "fulltime")),
                      full=dichotomy("parttime", "fulltime"))

m <- nestedLogit(partic ~ hincome + children,
                 dichotomies = comparisons,
                 data=Womenlf)
coef(m)
```
The `"nestedLogit"` object contains the components of the fitted model. The structure can be shown nicely
using `lobstr::tree()`:

```{r}
m |> lobstr::tree(max_depth=1)
```

The separate models for the `work` and `full` dichotomies can be extracted via `models()`. These
are the binomial `glm()` models.
```{r}
models(m) |> lobstr::tree(max_depth = 1)
```

`Anova()` produces analysis-of-deviance (likelihood-ratio) tests for the terms in this model, both for each of the submodels and for the combined responses of the polytomy. The `LR Chisq` and `df` for terms in the combined model are the sums of those for
the submodels.


```{r wlf-anova}
car::Anova(m)
```

### Plots
A basic plot of predicted probabilities can be produced using
the `plot()` method for `"nestedLogit"` objects.
It can be called several times to give multi-panel plots.
By default, a 95% pointwise confidence envelope is added to the plot.
Here, the envelopes are drawn with `conf.level = 0.68`, which gives approximately $\pm 1$ standard error bounds.

```{r wlf-plot}
#| out.width = "100%",
#| fig.asp = 0.55,
#| echo = 1:3
op <- par(mfcol=c(1, 2), mar=c(4, 4, 3, 1) + 0.1)
plot(m, "hincome", list(children="absent"),
     conf.level = 0.68,
     xlab="Husband's Income", legend=FALSE)
plot(m, "hincome", list(children="present"),
     conf.level = 0.68,
     xlab="Husband's Income")
par(op)
```
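
The correspondence between `conf.level = 0.68` and $\pm 1$ standard error can be checked with a short calculation. This sketch assumes that the envelope is based on a normal approximation, which is an assumption here rather than something stated above; the variable names are illustrative.

```r
# Two-sided coverage of a normal interval of +/- 1 standard error
coverage_1se <- 2 * pnorm(1) - 1

# Normal multiplier implied by a two-sided confidence level of 0.68
z_68 <- qnorm(1 - (1 - 0.68) / 2)

round(c(coverage_1se = coverage_1se, z_68 = z_68), 3)
```

Under that assumption, a $\pm 1$ standard error band covers about 68.3% and a 68% level corresponds to a multiplier just under 1, so the two are nearly the same.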

## 📜 Vignettes

* A more general discussion of nested dichotomies logistic regression and detailed examples can be found
in `vignette("nestedLogit", package="nestedLogit")` and in the [pkgdown documentation](https://friendly.github.io/nestedLogit/articles/nestedLogit.html).


* A variety of other plots can be produced using `ggplot()`, as described in the vignette,
`vignette("plotting-ggplot")`  and in the [pkgdown documentation](https://friendly.github.io/nestedLogit/articles/plotting-ggplot.html).

* The problem of calculating uncertainty estimates for nested logit models was solved by
John Fox.
The vignette, `vignette("standard-errors")`, describes the mathematics behind the calculation of
standard errors using the delta method.

* A collection of other examples of datasets for which nested logit models are useful is described in `vignette("other-examples")` and in the
[pkgdown documentation](https://friendly.github.io/nestedLogit/articles/other-examples.html).

## Authors
* John Fox
* Michael Friendly

## References

S. Fienberg (1980) _The Analysis of Cross-Classified Categorical Data_, 2nd Edition, MIT Press, Section 6.6.

J. Fox (2016) _Applied Regression Analysis and Generalized Linear Models_, 3rd Edition, Sage, Section 14.2.2.

M. Friendly and D. Meyer (2016) _Discrete Data Analysis with R_, CRC Press, Section 8.2.