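A Sankey plot is a specific type of flow diagram that shows how the values in one set flow into another set. In principle, it implies some logical relationship between the earlier set of values and the later one. More generally, it can also be used to show compositional proportions across several groups. Seen this way, an alluvial plot is a special kind of Sankey plot.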

  • Official ggsankey tutorial: https://github.com/davidsjoberg/ggsankey
# install.packages("devtools")
devtools::install_github("davidsjoberg/ggsankey")

1. Plot anatomy

(1) Components

  • Each column of values represents a stage, and each stage consists of several nodes;
  • Nodes in two adjacent stages can be connected by directed links, called flows.
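In the mtcars example used in section 2, each car is one observation that passes through every stage, so a flow between two adjacent stages stands for the cars sharing that pair of values. The minimal base-R sketch below counts the flows between the cyl and vs stages. It assumes, as ggsankey does when no weights are supplied, that every row contributes one unit of flow width; the variable names `from`, `to` and `flows` are only illustrative.

```r
# Each row of mtcars is one car, i.e. one unit of flow.
# Prefix the values as in section 2 so the stage of each label is visible.
from <- paste0("C", mtcars$cyl)
to   <- paste0("V", mtcars$vs)

# Cross-tabulate: every (from, to) pair with a non-zero count is one flow,
# and the count is what that flow's width is proportional to.
flows <- as.data.frame(table(from = from, to = to))
flows <- flows[flows$Freq > 0, ]
print(flows)
```

Pairs with a zero count (here C8 -> V1) simply produce no flow in the plot.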

(2) Graphical parameters

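  • fill sets the fill color; it can be split further into node.fill and flow.fill;
  • color sets the border color; it can be split further into node.color and flow.color;
  • width sets the width of each node, flow.alpha sets the opacity of the flows, and space sets the spacing between nodes within the same stage.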

2. Preparing the data

library(ggsankey)
library(dplyr)
library(ggplot2)
  • Example data: the mtcars car-model dataset
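df <- mtcars %>%
  dplyr::select(cyl, vs, am, gear) %>%
  # prefix each value with a letter so that labels from different variables
  # (e.g. vs = 1 and am = 1) can be told apart in the plot
  dplyr::mutate(cyl  = paste0("C", cyl),
                vs   = paste0("V", vs),
                am   = paste0("A", am),
                gear = paste0("G", gear))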
head(df)
#                   cyl vs am gear
# Mazda RX4          C6 V0 A1   G4
# Mazda RX4 Wag      C6 V0 A1   G4
# Datsun 710         C4 V1 A1   G4
# Hornet 4 Drive     C6 V1 A0   G3
# Hornet Sportabout  C8 V0 A0   G3
# Valiant            C6 V1 A0   G3

## Reshape from wide to long format
df = make_long(df, cyl, vs, am, gear)
head(df)
# # A tibble: 6 × 4
#   x     node  next_x next_node
#   <fct> <chr> <fct>  <chr>    
# 1 cyl   C6    vs     V0       
# 2 vs    V0    am     A1       
# 3 am    A1    gear   G4       
# 4 gear  G4    NA     NA       
# 5 cyl   C6    vs     V0       
# 6 vs    V0    am     A1 

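## x: the stage the current node belongs to
## node: the current node
## next_x: the next stage the current node flows into
## next_node: the node in the next stage that the current node flows into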
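To make the reshaping concrete, here is a base-R sketch of what make_long() does to the data. The helper make_long_base() is hypothetical and not part of ggsankey: it only mimics the four columns shown above (one row per car per stage), and the real make_long() may differ in details such as factor levels or extra arguments.

```r
# Hypothetical helper, NOT part of ggsankey: reproduces the column layout
# of make_long() (x, node, next_x, next_node) using base R only.
make_long_base <- function(data, stages) {
  n <- nrow(data)
  k <- length(stages)
  # Read the selected columns row by row: car 1 stages 1..k, car 2 stages 1..k, ...
  node <- as.vector(t(as.matrix(data[stages])))
  x    <- rep(stages, times = n)
  # The next stage is the following column; the last stage has none
  next_x <- rep(c(stages[-1], NA), times = n)
  # The next node is the following value of the same car
  next_node <- c(node[-1], NA)
  next_node[seq(k, n * k, by = k)] <- NA  # last stage of each car flows nowhere
  data.frame(x         = factor(x, levels = stages),
             node      = node,
             next_x    = factor(next_x, levels = stages),
             next_node = next_node)
}

# Same prefixed data as above, built without dplyr
cars <- data.frame(cyl  = paste0("C", mtcars$cyl),
                   vs   = paste0("V", mtcars$vs),
                   am   = paste0("A", mtcars$am),
                   gear = paste0("G", mtcars$gear))
long <- make_long_base(cars, c("cyl", "vs", "am", "gear"))
head(long, 4)
```

The first four rows describe the path of the Mazda RX4 (C6 -> V0 -> A1 -> G4), matching the make_long() output shown above.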

3. Drawing the Sankey plot

Main functions

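  • geom_sankey() draws the Sankey plot; its main parameters are listed in 1 (2) above;
  • geom_sankey_text()/geom_sankey_label() add node labels; their parameters are the same as for geom_text()/geom_label().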
ggplot(df, aes(x = x, next_x = next_x,
               node = node, next_node = next_node,
               fill = factor(x),        # color nodes and flows by stage
               label = node)) +
  geom_sankey(color = "grey",   # border color; or node.color= / flow.color=
              #fill = "brown",  # single fill; or node.fill= / flow.fill=
              flow.alpha = 0.6, # flow opacity
              # space = 1,      # spacing between nodes within a stage
              width = 0.1) +    # node width
  geom_sankey_text(size = 5, vjust = 0.5, hjust = -1,
                   color = "black", fontface = "bold") +
  ggsci::scale_fill_aaas() +    # palette from the ggsci package (must be installed)
  theme_sankey(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")


4. Drawing the alluvial plot

Alluvial plots are very similar to Sankey plots, but they have no spaces between nodes and start at y = 0 instead of being centered around the x-axis.
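The difference in layout can be illustrated with a simplified stacking model. The helper stack_nodes() below is hypothetical and is not ggsankey's internal code: it stacks the node heights of a single stage (here the cyl stage of mtcars), either from y = 0 with no gaps (alluvial style) or with a gap of `space` units between nodes and the whole stage centered on y = 0 (Sankey style).

```r
# Hypothetical, simplified model (an assumption, not ggsankey's internals):
# compute the vertical extent of each node in one stage.
stack_nodes <- function(counts, space = 0, center = FALSE) {
  nodes <- names(counts)
  h     <- as.numeric(counts)  # node heights = number of rows per node
  # bottom of each node = heights of the nodes below it + gaps below it
  ymin  <- c(0, cumsum(h)[-length(h)]) + space * (seq_along(h) - 1)
  ymax  <- ymin + h
  if (center) {
    # shift the whole stage so it is centered on y = 0
    total <- sum(h) + space * (length(h) - 1)
    ymin  <- ymin - total / 2
    ymax  <- ymax - total / 2
  }
  data.frame(node = nodes, ymin = ymin, ymax = ymax)
}

cyl_counts <- table(paste0("C", mtcars$cyl))                   # C4, C6, C8
alluvial   <- stack_nodes(cyl_counts)                          # no gaps, starts at 0
sankey     <- stack_nodes(cyl_counts, space = 2, center = TRUE) # gaps, centered
alluvial
sankey
```

For the cyl stage (11, 7 and 14 cars) the alluvial layout spans y = 0 to 32, while the centered layout with space = 2 spans y = -18 to 18.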

Main functions:

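  • geom_alluvial() draws the alluvial plot; its main parameters are the same as for geom_sankey();
  • geom_alluvial_text()/geom_alluvial_label() add node labels; their parameters are the same as for geom_text()/geom_label().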
ggplot(df, aes(x = x, next_x = next_x, 
               node = node, next_node = next_node, 
               fill = factor(node), 
               label = node)) +
  geom_alluvial(flow.alpha = 0.6,
                color="grey") +
  geom_alluvial_label(size = 3, color = "white",
                      fill = "grey") +
  ggsci::scale_fill_d3() +
  theme_alluvial(base_size = 18) +
  theme(legend.position = "none",
        axis.ticks.y =  element_blank(), 
        axis.text.y = element_blank(),
        axis.title = element_blank(),
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")
