ggplot2-exts.github.io/ggraph.Rmd at master · ggplot2-exts/ggplot2-exts.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
---
title: "ggplot2 extensions: ggraph"
---

### ggraph
<https://github.com/thomasp85/ggraph>

ggraph extends the grammar of graphics provided by ggplot2 to cover graph and
network data. This type of data consists of nodes and edges and are not
optimally stored in a single data.frame, as expected by ggplot2. Furthermore the
spatial position of nodes (end thereby edges) are more often defined by the
graph structure through a layout function, rather than mapped to specific
parameters. ggraph handles all of these issues in an extensible way that lets
the user gradually build up their graph visualization with different layers as
expected from ggplot2, without needing to worry too much about how positions and
other metrics are derived.

While node-edge diagrams are the visualization people often relate to graph
visualizations it is by far the only one, and ggraph have, or will have support
for all types of common and uncommon visualization types.

#### A simple example

```{r, eval=TRUE, echo=TRUE, warning=FALSE, message=FALSE}
library(ggraph)
library(igraph)

# To show a simple example of the API we consider a hierarchy
## Create a graph of the flare class hierarchy
flareGraph <- graph_from_data_frame(flare$edges, vertices = flare$vertices)
ggraph(flareGraph, 'igraph', algorithm = 'tree') +
  geom_edge_link() +
  ggforce::theme_no_axes()
```

Here we use the reingold-tilford algorithm to position the nodes in a hierarchy
and draw straight edges between them using the `geom_edge_link` function. There
are ample opportunity for modifications though:

```{r, eval=TRUE, echo=TRUE}
ggraph(flareGraph, 'igraph', algorithm = 'tree', circular = TRUE) +
  geom_edge_diagonal(aes(alpha = ..index..)) +
  coord_fixed() +
  scale_edge_alpha('Direction', guide = 'edge_direction') +
  geom_node_point(aes(filter = degree(flareGraph, mode = 'out') == 0),
                 color = 'steelblue', size = 1) +
  ggforce::theme_no_axes()
```

Here we transform the layout into a circular representation and choose to draw
the edges as diagonals instead of straight lines using `geom_edge_diagonal`. We
indicate the direction of the edge by mapping the alpha value to it, and shows
this using the edge_direction guide. Lastly we plot the leaf nodes using a
filter function.

#### Other layouts
But what if my data is not hierarchical, you ask. Well ggraph have access to all
the layouts implemented in igraph, and then some (more on that later). To show
this we take a look at some colonial Americans:

```{r, echo=TRUE, eval=TRUE}
whigsGraph <- graph_from_adjacency_matrix(whigs %*% t(whigs), mode = 'upper',
                                          weighted = TRUE, diag = FALSE)
V(whigsGraph)$degree <- degree(whigsGraph)
ggraph(whigsGraph, 'igraph', algorithm = 'kk') +
  geom_edge_link0(aes(width = weight), edge_alpha = 0.1) +
  geom_node_point(aes(size = degree), colour = 'forestgreen') +
  geom_node_text(aes(label = name, filter = degree > 150), color = 'white',
                 size = 3) +
  ggforce::theme_no_axes()
```

So it seems Paul Reveres was really in the center of it all (more on this [here](http://kieranhealy.org/blog/archives/2013/06/09/using-metadata-to-find-paul-revere/)).

Here we use the kamada kawai layout to lay out the nodes and draw the edges
using the `geom_edge_link0` (notice the trailing 0 - this is a faster version
than the standard but it doesn't support gradients along the edge). We draw
the nodes scaling the size to the degree and lastly we add node labels to the
nodes with a degree larger than 150.

*What if I don't like nodes and edges?*

Well you could try to make a treemap. Let's turn to our flare hierarchy again:

```{r, eval=TRUE, echo=TRUE}
# Well add some additional information using the treeApply helpers
flareGraph <- treeApply(flareGraph, function(node, parent, depth, tree) {
  tree <- set_vertex_attr(tree, 'depth', node, depth)
  if (depth == 1) {
    tree <- set_vertex_attr(tree, 'class', node, V(tree)$shortName[node])
  } else if (depth > 1) {
    tree <- set_vertex_attr(tree, 'class', node, V(tree)$class[parent])
  }
  tree
})
V(flareGraph)$leaf <- degree(flareGraph, mode = 'out') == 0

ggraph(flareGraph, 'treemap', weight = 'size') +
  geom_treemap(aes(fill = class, filter = leaf, alpha = depth), colour = NA) +
  geom_treemap(aes(size = depth), fill = NA, colour = 'white') +
  geom_node_text(aes(filter = depth == 1, label = shortName), size = 3) +
  scale_fill_brewer(type = 'qual', palette = 3, guide = 'none') +
  scale_size(range = c(3, 0.2), guide = 'none') +
  scale_alpha(range = c(1, 0.6), guide = 'none') +
  ggforce::theme_no_axes()
```

*What if I don't use igraph?*

There is also support for dendrogram objects and almost anything else can be
converted to either igraph or dendrogram. Let's look at a dendrogram object:

```{r, echo=TRUE, eval=TRUE}
irisCluster <- hclust(dist(iris[, -5]), method = 'average')
irisCluster <- as.dendrogram(irisCluster)

# treeApply also works for dendrogram
irisCluster <- treeApply(irisCluster, function(node, children, ...) {
  if (is.leaf(node)) {
    index <- as.integer(attr(node, 'label'))
    attr(node, 'nodePar') <- list(species = as.character(iris$Species[index]))
  } else {
    childSpecies <- sapply(children, function(child) {
      attr(child, 'nodePar')$species
    })
    if (length(unique(childSpecies)) == 1 && !anyNA(childSpecies)) {
      attr(node, 'nodePar') <- list(species = childSpecies[1])
    } else {
      attr(node, 'nodePar') <- list(species = NA)
    }
  }
  node
}, direction = 'up')

ggraph(irisCluster, 'dendrogram', repel = TRUE, ratio = 10) +
  geom_edge_elbow2(aes(colour = node.species),
                   data = gEdges('long', nodePar = 'species')) +
  scale_edge_colour_brewer('Species', type = 'qual', na.value = 'black') +
  ggforce::theme_no_axes()
```

Wait, what is that data argument? Most edge geoms comes in three flavors, a
standard, a fast (the 0 one) and a slow (the 2 one). So why use the slow? It
makes it possible to set interpolate edge aesthetics between the start and end
point. Here we tell that colour should be mapped to the species of the nodes,
and we tell that the edge data should contain the species parameter from the
start and end nodes within the `gEdges()` call. Now we get a nice gradient from
clusters with a common species towards clusters with mixed species.

These are just a couple of examples of what is possible. Due to the layer
approach to plotting edges and nodes you have full control over the look of your
plot in a very expressive way. As it is a direct extension of ggplot2, it also
comes with all the niceties from there, e.g. facetting:

```{r, echo=TRUE, eval=TRUE}
highschoolGraph <- graph_from_data_frame(highschool)

ggraph(highschoolGraph, 'igraph', algorithm = 'kk') +
  geom_edge_fan(aes(alpha = ..index..)) +
  facet_wrap(~year) +
  ggforce::theme_no_axes()
```