Skip to content

Commit 988946e

Browse files
committed
final edits before cran submission
1 parent 63c94f9 commit 988946e

7 files changed

Lines changed: 179 additions & 107 deletions

File tree

.Rbuildignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,3 +7,5 @@
77
^README\.Rmd$
88
^doc$
99
^Meta$
10+
^CRAN-SUBMISSION$
11+
^cran-comments\.md$

CRAN-SUBMISSION

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
Version: 0.0.1
2+
Date: 2025-06-03 13:29:42 UTC
3+
SHA: 63c94f996558be87bb5adcd50b0f1400ed62ac43

DESCRIPTION

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ Title: Tools for outbreaker2
33
Version: 0.0.1
44
Authors@R:
55
person("Cyril", "Geismar", email = "c.geismar21@imperial.ac.uk", role = c("aut", "cre"))
6-
Description: Helper functions for the outbreaker2 package.
6+
Description: Streamlines the post-processing, summarisation, and visualisation of outbreaker2 output via a suite of helper functions. Facilitates tidy manipulation of posterior samples, integration with case metadata, and generation of diagnostic plots and summary statistics.
77
License: MIT + file LICENSE
88
Encoding: UTF-8
99
Roxygen: list(markdown = TRUE)
@@ -18,6 +18,7 @@ Suggests:
1818
linktree,
1919
epicontacts,
2020
dplyr,
21+
tidyr,
2122
furrr,
2223
ggplot2,
2324
igraph,

NEWS.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# o2ools 0.0.1
2+
3+
* Initial CRAN submission.

cran-comments.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
## R CMD check results
2+
3+
0 errors | 0 warnings | 1 note
4+
5+
* This is a new release.

vignettes/extended_analyses.Rmd

Lines changed: 154 additions & 95 deletions
Original file line numberDiff line numberDiff line change
@@ -14,125 +14,86 @@ knitr::opts_chunk$set(
1414
message = FALSE,
1515
warning = FALSE,
1616
out.width = "100%",
17+
fig.height = 4,
1718
dpi = 300
1819
)
1920
```
2021

2122
```{r setup}
2223
library(dplyr)
24+
library(tidyr)
2325
library(ggplot2)
26+
library(outbreaker2)
2427
library(o2ools)
2528
```
2629

27-
# Extension with other packages
28-
The *o2ools* package provides helper functions to process and summarise the results from *outbreaker2* and to extend the analyses with other packages. In this vignette, we show how to use *o2ools* with *linktree* and *mixtree*.
29-
30-
*linktree* is a framework for estimating *group transmission assortativity*—the tendency for individuals to transmit within their own group rather than to others. The method is described in our [paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0313037), and briefly discussed on [youtube](https://www.youtube.com/watch?v=Ib5dBunDPN0).
31-
32-
*mixtree* is a package for comparing *epidemic forests*—collections of transmission trees. It can help assess the convergence of *outbreaker2* results, and to compare the trees from different models.
33-
34-
## `linktree`
35-
36-
A detailed [vignette](https://cygei.github.io/linktree/articles/linktree_vignette.html) is available, so we do not cover it fully here. Instead, we show how to estimate assortativity from *outbreaker2*'s posterior trees. This requires knowledge of group sizes or their relative proportions. In our example, say the staff-to-patient ratio in the hospital is 1:4. It is also advised to analyse the transmission chains before *saturation* (i.e. the epidemic peak).
30+
# Reproduction number
3731

38-
### Estimate the peak date
39-
We'll estimate the peak date with the *incidence2* package.
32+
We can estimate the case reproductive number ($R_i$), *i.e.* the number of secondary infections generated by each case, with the `get_Ri` function:
4033

4134
```{r}
42-
library(incidence2)
43-
linelist$date_onset <- as.Date(linelist$onset, origin = "2020-01-01")
44-
45-
incid <- incidence2::incidence(linelist,
46-
date_index = "date_onset",
47-
groups = "group")
48-
plot(incid, fill = "group") +
49-
scale_fill_manual(values = c("hcw" = "orange", "patient" = "purple"))+
50-
theme(legend.position = "top")
35+
out_id <- identify(out, linelist$name)
5136
37+
# A matrix of R values
38+
# Columns refer to the case ID
39+
# Rows refer to the MCMC iteration
40+
Ri_mat <- get_Ri(out_id)
5241
```
5342

43+
You can plot the offspring distribution:
5444

5545
```{r}
56-
peak <- estimate_peak(regroup(incid), progress = FALSE)$observed_peak
57-
peak
46+
Ri_mat %>%
47+
mutate(step = row_number()) %>%
48+
pivot_longer(-step, names_to = "name", values_to = "Ri") %>%
49+
group_by(step, Ri) %>%
50+
summarise(n = n(), .groups = "drop_last") %>%
51+
mutate(percent = n / sum(n) * 100) %>%
52+
ungroup() %>%
53+
group_by(Ri) %>%
54+
summarise(
55+
mean = mean(percent),
56+
lower = quantile(percent, 0.025),
57+
upper = quantile(percent, 0.975),
58+
.groups = "drop"
59+
) %>%
60+
ggplot(aes(x = Ri, y = mean)) +
61+
geom_segment(aes(x = Ri, xend = Ri, y = 0, yend = mean), colour = "grey60", size = 0.5) +
62+
geom_point(size = 2) +
63+
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
64+
labs(x = "Number of secondary infections (Ri)",
65+
y = "Mean % of cases",
66+
title = "Posterior offspring distribution with 95% credible intervals")+
67+
theme_bw()
5868
```
5969

70+
# Serial interval
6071

61-
### Mask non-direct transmissions
62-
Group transmission assortativity is estimated from direct transmissions only. We can use the `filter_chain` function to mask non-direct transmissions (i.e. where `kappa`>1).
63-
64-
```{r}
65-
out_direct <- filter_chain(
66-
out,
67-
param = "kappa",
68-
thresh = 1,
69-
comparator = "==",
70-
target = "alpha" # keep alpha values where kappa == 1
71-
)
72-
73-
trees <- get_trees(out_direct,
74-
group = linelist$group,
75-
onset = linelist$date_onset)
76-
77-
# keeps only rows where both onset dates are on or before peak
78-
cut_trees <- lapply(trees, function(tree) {
79-
tree |> filter(from_onset <= peak, to_onset <= peak)
80-
})
72+
The serial interval (SI) is the time between the onset of symptoms in an infector-infectee pair. You can compute the SI's empirical cumulative density functionusing with `get_si()` :
8173

82-
```
83-
### Estimate group transmission assortativity
84-
For a given tree, we can estimate the groups' transmission assortativity coefficients with:
8574
```{r}
86-
library(linktree)
87-
# staff-to-patient ratio
88-
ratio <- c("hcw" = 1, "patient" = 3)
89-
90-
max_post <- trees[[which.max(out$post)]]
91-
delta <- linktree::get_delta(from = max_post$from_group, to = max_post$to_group, f = ratio)
92-
plot(delta)
93-
```
94-
95-
To account for uncertainty in who infected whom, we can compute the posterior distribution of assortativity coefficients. This is done by estimating `delta` for each tree in the posterior sample.
96-
97-
```{r}
98-
deltas <- lapply(trees, function(x) {
99-
linktree::get_delta(from = x$from_group, to = x$to_group, f = ratio)
100-
}) |>
101-
bind_rows(.id = "tree")
102-
103-
ggplot(deltas) +
104-
geom_density(aes(x = est, fill = group), bw = 0.03, color = "white") +
105-
scale_x_continuous(limits = c(-1, 1), breaks = seq(-1, 1, 0.2)) +
106-
scale_fill_manual(values = c("hcw" = "orange", "patient" = "purple")) +
75+
trees <- get_trees(out_id, onset = linelist$onset)
76+
si <- get_si(trees, date_suffix = "onset")
77+
78+
si |>
79+
ggplot(aes(x = x, y = mean))+
80+
geom_point() +
81+
geom_ribbon(aes(ymin = lwr, ymax = upr),
82+
fill = "grey",
83+
alpha = 0.5) +
84+
labs(x = "serial interval", y = "Percent")+
10785
theme_bw()
108-
10986
```
11087

111-
Convert `delta` to `gamma` for interpretatibilty:
88+
# Model convergence
11289

113-
```{r}
114-
gammas <- deltas |>
115-
group_by(group) |>
116-
summarise(
117-
mean = mean(est),
118-
q025 = quantile(est, 0.025),
119-
q975 = quantile(est, 0.975)
120-
) |>
121-
mutate(across(.cols = -group, ~ linktree::delta2gamma(.x)))
122-
123-
head(gammas)
124-
```
90+
*outbreaker2* uses Bayesian inference to account for uncertainty in who infected whom by sampling multiple plausible transmission trees from the posterior distribution. Standard MCMC diagnostics evaluate parameter chains (see `?plot.outbreaker_chains` and `?coda::gelman.diag`) rather than transmission events themselves.
12591

126-
Healthcare workers are nearly 5 times more likely to infect other HCWs whereas patients are 2.6 (`1/0.387`) times more likely to infect other HCWs.
92+
[*mixtree*](https://cygei.github.io/mixtree/) provides a statistical framework for comparing collections of transmission trees (*epidemic forests*) to assess whether they originate from the same generative process. To evaluate MCMC convergence, *mixtree* can be used to compare epidemic forests sampled from multiple parallel MCMC chains. If the chains have converged, the resulting epidemic forests should be statistically indistinguishable.
12793

94+
## Run multiple chains of *outbreaker2*
12895

129-
## `mixtree`
130-
131-
A detailed [vignette](https://cygei.github.io/mixtree/articles/mixtree.html) is available, so we do not cover it fully here. Instead, we show how to use *mixtree* to assess the convergence of *outbreaker2* results.
132-
133-
### Run multiple chains of *outbreaker2*
13496
```{r}
135-
library(outbreaker2)
13697
data <- outbreaker_data(dna = fake_outbreak$dna, dates = linelist$onset, ctd = fake_outbreak$ctd, w_dens = fake_outbreak$w)
13798
13899
# Run multiple chains of outbreaker2 in parallel with furrr
@@ -149,18 +110,19 @@ chains <- future_map(1:n_chains, function(i) {
149110
150111
```
151112

113+
## Extract trees
152114

153-
### Extract trees
154115
```{r}
155116
trees <- lapply(chains, function(x) {
156117
get_trees(x)
157118
})
158119
159120
```
160121

161-
We now have 5 collections of transmission trees, one for each chain. We can use *mixtree* to test that the compared epidemic forests stem from the same generative process.
122+
We now have 5 epidemic forests, one for each chain.
162123

163124
#### Chi-squared test
125+
164126
```{r}
165127
library(mixtree)
166128
@@ -170,11 +132,13 @@ do.call(what = mixtree::tree_test, args = c(trees, list(
170132
test_args = list(simulate.p.value = TRUE, B = 999)
171133
)))
172134
```
173-
We used the chi-squared test to test the null hypothesis that the frequency of infector-infectee pairs is the similar between chains. A p-value of 1 indicates similarity between the chains, suggesting convergence. The chi-squared test allows for multiple introductions, unlike PERMANOVA.
174135

175-
#### PERMANOVA
176-
PERMANOVA only accepts one introduction event, so we need to re-run *outbreaker2* allowing for a single introduction event only. We can do this by setting the `find_import` argument to `FALSE`.
177-
136+
We used the chi-squared test to test the null hypothesis that the frequency of infector-infectee pairs is the similar between forests. A p-value of 1 indicates similarity between the forests, suggesting convergence. The chi-squared test allows for multiple introductions, unlike PERMANOVA.
137+
138+
#### PERMANOVA
139+
140+
PERMANOVA only accepts one introduction event, so we need to re-run *outbreaker2* allowing for a single introduction event only. We can do this by setting the `find_import` argument to `FALSE`.
141+
178142
```{r}
179143
set.seed(123)
180144
chains <- future_map(1:n_chains, function(i) {
@@ -201,4 +165,99 @@ do.call(what = mixtree::tree_test, args = c(trees, list(
201165
)))
202166
```
203167

204-
PERMANOVA tests whether the variance in tree topology is similar between chains. A p-value >0.05 suggests that the topologies of the trees do not differ significantly across the 5 chains, suggesting convergence.
168+
PERMANOVA tests whether the variance in tree topology is similar between *forests*. A p-value \> 0.05 suggests that the topologies of the trees do not differ significantly across the 5 *forests*, suggesting convergence.
169+
170+
# Group transmission patterns
171+
172+
Transmission chains can inform on the patterns of transmission between groups. [*linktree*](https://cygei.github.io/linktree/) provides a framework for estimating *group transmission assortativity* which quantifies the extent to which individuals transmit within their own group compared to others.
173+
174+
This requires knowledge of group sizes or their relative proportions. In our example, say the staff-to-patient ratio in the hospital is 1:3. It is also advised to analyse the transmission chains before *saturation* (*i.e.* the epidemic peak).
175+
176+
## Estimate the peak date
177+
178+
We'll estimate the peak date with the *incidence2* package.
179+
180+
```{r}
181+
library(incidence2)
182+
linelist$date_onset <- as.Date(linelist$onset, origin = "2020-01-01")
183+
184+
incid <- incidence2::incidence(linelist,
185+
date_index = "date_onset",
186+
groups = "group")
187+
peak <- estimate_peak(regroup(incid), progress = FALSE)$observed_peak
188+
peak
189+
190+
```
191+
192+
## Mask non-direct transmissions
193+
194+
Group transmission assortativity is estimated from direct transmissions only. We can use the `filter_chain` function to mask non-direct transmissions (*i.e.* where `kappa`\>1).
195+
196+
```{r}
197+
out_direct <- filter_chain(
198+
out,
199+
param = "kappa",
200+
thresh = 1,
201+
comparator = "==",
202+
target = "alpha" # keep alpha values where kappa == 1
203+
)
204+
205+
trees <- get_trees(out_direct,
206+
group = linelist$group,
207+
onset = linelist$date_onset)
208+
209+
# keeps only rows where both onset dates are on or before peak
210+
cut_trees <- lapply(trees, function(tree) {
211+
tree |> filter(from_onset <= peak, to_onset <= peak)
212+
})
213+
214+
```
215+
216+
## Estimate group transmission assortativity
217+
218+
**To run the below you need to install *linktree* from [GitHub](https://github.com/CyGei/linktree/).**
219+
220+
For a given tree, we can estimate the groups' transmission assortativity coefficients with:
221+
222+
```{r, eval = requireNamespace("linktree", quietly = TRUE)}
223+
library(linktree)
224+
# staff-to-patient ratio
225+
ratio <- c("hcw" = 1, "patient" = 3)
226+
227+
max_post <- trees[[which.max(out$post)]]
228+
delta <- linktree::get_delta(from = max_post$from_group, to = max_post$to_group, f = ratio)
229+
plot(delta)
230+
```
231+
232+
To account for uncertainty in who infected whom, we can compute the posterior distribution of assortativity coefficients. This is done by estimating `delta` for each tree in the posterior sample.
233+
234+
```{r, eval = requireNamespace("linktree", quietly = TRUE)}
235+
deltas <- lapply(trees, function(x) {
236+
linktree::get_delta(from = x$from_group, to = x$to_group, f = ratio)
237+
}) |>
238+
bind_rows(.id = "tree")
239+
240+
ggplot(deltas) +
241+
geom_density(aes(x = est, fill = group), bw = 0.03, color = "white") +
242+
scale_x_continuous(limits = c(-1, 1), breaks = seq(-1, 1, 0.2)) +
243+
scale_fill_manual(values = c("hcw" = "orange", "patient" = "purple")) +
244+
theme_classic() +
245+
theme(legend.position = c(0.1, 0.85))
246+
```
247+
248+
Convert `delta` to `gamma` for interpretatibilty:
249+
250+
```{r, eval = requireNamespace("linktree", quietly = TRUE)}
251+
gammas <- deltas |>
252+
group_by(group) |>
253+
summarise(
254+
mean = mean(est),
255+
q025 = quantile(est, 0.025),
256+
q975 = quantile(est, 0.975)
257+
) |>
258+
mutate(across(.cols = -group, ~ linktree::delta2gamma(.x)))
259+
260+
head(gammas)
261+
```
262+
263+
Healthcare workers are nearly 5 times more likely to infect other HCWs whereas patients are 2.6 (`1/0.387`) times more likely to infect other HCWs.

0 commit comments

Comments
 (0)