R Visualization | Li's Bioinfo-Blog

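R Visualization: ggplot2 Plotting Basics

The ggplot2 package supports many kinds of plots, such as box plots and bar charts, and it also offers many ways to style and refine them. For the former, earlier posts cover ggplot2 bar charts and box plots in detail. For other plot types, such as density plots, line plots, and histograms, see summaries written by others, for example the sthda site below. ...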

R Visualization: Quick Dot Plots, Line Plots, and Bar Charts with ggpubr

Reference tutorial: http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/ ...

R Visualization: Combining Plots with patchwork

I. Combining ggplot2 plots

0. Installing the package and building example plots

    # install.packages("devtools")
    # devtools::install_github("thomasp85/patchwork")
    library(ggplot2)
    library(patchwork)
    p1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
    p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
    p3 <- ggplot(mtcars) + geom_bar(aes(gear)) + facet_wrap(~cyl)
    p4 <- ggplot(mtcars) + geom_bar(aes(carb))
    p5 <- ggplot(mtcars) + geom_violin(aes(cyl, mpg, group = cyl))

1. Basic usage

(1) Operators: + and | both place plots side by side (+ wraps into a grid as more plots are added, while | always keeps a single row); / stacks plots vertically; parentheses () control grouping priority.
(2) Function call: wrap_plots(), whose arguments set the arrangement ...
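The operators and wrap_plots() described above can be sketched as follows. This is a minimal sketch that assumes ggplot2 and patchwork are installed; the object names top_row, stacked, and tiled are hypothetical and only illustrate the layout rules.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
p3 <- ggplot(mtcars) + geom_bar(aes(carb))

# | keeps the plots in a single row
top_row <- p1 | p2

# Parentheses group the first row; / places p3 underneath it
stacked <- (p1 | p2) / p3

# wrap_plots() takes the plots plus layout arguments such as ncol
tiled <- wrap_plots(p1, p2, p3, ncol = 2)

stacked
```

Printing top_row or tiled instead draws the other two layouts.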

R Visualization: Venn Diagrams

Example data: three gene sets stored as a list.

    genes <- paste0("gene", 1:1000)
    set.seed(20210302)
    gene_list <- list(A = sample(genes, 100),
                      B = sample(genes, 200),
                      C = sample(genes, 300))

I. VennDiagram

VennDiagram is a classic package for drawing Venn diagrams. ...
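Before drawing anything, it can help to check the region sizes a Venn diagram should display. The following base R sketch reuses the example data above; the names shared_all, pair_sizes, and only_A are hypothetical helpers, not part of any Venn package.

```r
genes <- paste0("gene", 1:1000)
set.seed(20210302)
gene_list <- list(A = sample(genes, 100),
                  B = sample(genes, 200),
                  C = sample(genes, 300))

# Genes shared by all three sets (the central region of the diagram)
shared_all <- Reduce(intersect, gene_list)

# Size of each pairwise overlap
pairs <- combn(names(gene_list), 2, simplify = FALSE)
pair_sizes <- sapply(pairs, function(p) {
  length(intersect(gene_list[[p[1]]], gene_list[[p[2]]]))
})
names(pair_sizes) <- sapply(pairs, paste, collapse = "&")

# Genes found only in set A
only_A <- setdiff(gene_list$A, union(gene_list$B, gene_list$C))

length(shared_all)
pair_sizes
length(only_A)
```

Note that each pairwise overlap counted here includes the genes shared by all three sets.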

R Visualization: Box Plots with ggplot and ggpubr

I. Basic box plots with ggplot

0. Example data

    library(ggplot2)
    library(patchwork)
    # Group labels are best stored as character; convert numeric labels to a factor
    ToothGrowth$dose = factor(ToothGrowth$dose)
    summary(ToothGrowth)
    #       len        supp     dose
    #  Min.   : 4.20   OJ:30   0.5:20
    #  1st Qu.:13.07   VC:30   1  :20
    #  Median :19.25           2  :20
    #  Mean   :18.81
    #  3rd Qu.:25.27
    #  Max.   :33.90

1. Basic plots

    p1 = ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) +
      geom_boxplot()
    p2 = ggplot(ToothGrowth, aes(x=dose, y=len, fill=supp)) +
      geom_boxplot()
    p1 | p2

2. Outliers

    p1 = ggplot(ToothGrowth, aes(x=dose, y=len)) +
      geom_boxplot(outlier.color = "red", outlier.size = 0.5)
    p2 = ggplot(ToothGrowth, aes(x=dose, y=len)) +
      geom_boxplot(outlier.alpha = 0)
    # An alpha of 0 is fully transparent, so the outliers are effectively not drawn
    p1 + p2

...
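To see which points geom_boxplot() would treat as outliers, the rule can be reproduced by hand. This is a sketch that assumes the default setting (points more than 1.5 times the IQR beyond the quartiles); flag_outliers() and the copy tg are hypothetical names introduced only for illustration.

```r
# Flag values lying more than coef * IQR beyond the first or third quartile
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Work on a copy so the built-in dataset stays untouched
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Apply the rule separately within each dose group, as the box plot does
tg$is_outlier <- unsplit(lapply(split(tg$len, tg$dose), flag_outliers), tg$dose)

# Rows that would be drawn as outlier points
subset(tg, is_outlier)
```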

R Visualization: Standard Bar Charts with ggplot

1. Example data

    library(ggplot2)
    library(patchwork)
    library(carData)  # example data
    head(Salaries)    # faculty salaries
    #        rank discipline yrs.since.phd yrs.service  sex salary
    # 1      Prof          B            19          18 Male 139750
    # 2      Prof          B            20          16 Male 173200
    # 3  AsstProf          B             4           3 Male  79750
    # 4      Prof          B            45          39 Male 115000
    # 5      Prof          B            40          41 Male 141500
    # 6 AssocProf          B             6           6 Male  97000
    table(Salaries$rank, Salaries$sex)
    #             Female Male
    #   AsstProf      11   56
    #   AssocProf     10   54
    #   Prof          18  248

2. Basic usage

    p1 = ggplot(Salaries, aes(x=rank)) +
      geom_bar()
    # Remove the gap between the bars and the x axis
    p2 = ggplot(Salaries, aes(x=rank)) +
      geom_bar() +
      scale_y_continuous(expand=c(0,0))
    # Map the fill color to a variable
    p3 = ggplot(Salaries, aes(x=rank, fill=rank)) +
      geom_bar()
    p1 + p2 + p3

3. The position= argument sets how groups are arranged

    # Default
    p1 <- ggplot(Salaries, aes(x=rank, fill=sex)) +
      geom_bar(position="stack") + labs(title='position="stack"')
    p2 <- ggplot(Salaries, aes(x=rank, fill=sex)) +
      geom_bar(position="dodge") + labs(title='position="dodge"')
    p3 <- ggplot(Salaries, aes(x=rank, fill=sex)) +
      geom_bar(position="fill") + labs(title='position="fill"')
    p1 + p2 + p3 + plot_layout(guides = 'collect')

4. The stat= argument sets how bar heights are obtained

stat="count" (default) counts how many times each category occurs in the data; aes() then only needs x.
stat="identity" uses counts that you supply directly; aes() then needs x for the category and y for the count.

    library(tidyverse)
    dat = Salaries %>%
      group_by(rank) %>%
      dplyr::summarise(n=n()) %>% as.data.frame()
    dat
    #        rank   n
    # 1  AsstProf  67
    # 2 AssocProf  64
    # 3      Prof 266
    p1 = ggplot(dat, aes(x=rank, y=n, fill=rank)) +
      geom_bar(stat = "identity")

    dat = Salaries %>%
      group_by(rank,sex) %>%
      dplyr::summarise(n=n()) %>% as.data.frame()
    dat
    #        rank    sex   n
    # 1  AsstProf Female  11
    # 2  AsstProf   Male  56
    # 3 AssocProf Female  10
    # 4 AssocProf   Male  54
    # 5      Prof Female  18
    # 6      Prof   Male 248
    p2 = ggplot(dat, aes(x=rank, y=n, fill=sex)) +
      geom_bar(stat = "identity")
    p1 + p2

5. Adding count labels with geom_text()

    dat = Salaries %>%
      group_by(rank) %>%
      dplyr::summarise(n=n())
    p1 = ggplot(dat, aes(x=rank, y=n)) +
      geom_bar(stat="identity") +
      geom_text(aes(label=n), vjust = -0.2)
    # vjust < 0 moves the label up; vjust > 0 moves it down

    dat = Salaries %>%
      group_by(rank,sex) %>%
      dplyr::summarise(n=n())
    p2 = ggplot(dat, aes(x=rank, y=n, fill=sex)) +
      geom_bar(stat="identity", position = "dodge") +
      geom_text(aes(label=n), vjust = -0.2,
                position=position_dodge(width=0.9))
    p1 + p2

6. Reordering the bars

With a single grouping variable, setting the factor levels of the category is enough. Alternatively, scale_x_discrete(limits = c(.....)) sets a custom order directly.

    ggplot(Salaries, aes(x=rank)) +
      geom_bar() +
      scale_x_discrete(limits=c("AsstProf", "AssocProf", "Prof"))

    p1 = ggplot(iris, aes(x=Species, y=Sepal.Width)) +
      geom_boxplot() + ggtitle("default factor levels")
    p2 = ggplot(iris, aes(x=fct_reorder(Species, Sepal.Width), y=Sepal.Width)) +
      geom_boxplot() + ggtitle("fct_reorder default levels")
    p3 = ggplot(iris, aes(x=fct_reorder(Species, Sepal.Width, .desc=TRUE), y=Sepal.Width)) +
      geom_boxplot() + ggtitle("fct_reorder descending levels")
    library(patchwork)
    p1 | p2 | p3

A more complex case is ordering within groups. Example: the scores of 5 students in three subjects; with one panel per subject, order the 5 students within each panel by score from low to high (or from high to low).

    grade = data.frame(
      subject=rep(c("Chinese","Math","English"), each=5),
      name=rep(c("A","B","C","D","E"),3),
      score=c(79,65,70,94,82,76,87,80,81,89,88,79,82,95,90))

    library(tidytext)
    # First order the subjects by score from high to low
    # (fct_reorder summarises each subject with the median by default),
    # then order students within each subject from low to high
    grade$subject = fct_reorder(grade$subject, grade$score, .desc=TRUE)
    p1 = ggplot(grade, aes(x=reorder_within(name,score,subject), y=score, fill=name)) +
      geom_bar(stat = "identity") +
      scale_x_reordered() +
      facet_wrap(subject~. ,scales = "free_x")

    # First order the subjects by score from low to high,
    # then order students within each subject from high to low
    grade$subject = fct_reorder(grade$subject, grade$score, .desc=FALSE)
    p2 = ggplot(grade, aes(x=reorder_within(name,-score,subject), y=score, fill=name)) +
      geom_bar(stat = "identity") +
      scale_x_reordered() +
      facet_wrap(subject~. ,scales = "free_x")
    p1 + p2 + plot_layout(guides = 'collect')

Note: reorder_within(item, value, group) must be paired with scale_x_reordered() and facet_wrap(variable~. ,scales = "free_x") ...
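The difference between stat="count" and stat="identity" can be checked on a small built-in dataset. This is a sketch that uses mtcars rather than Salaries so it runs without carData; the names counts, p_count, p_identity, and p_col are hypothetical. It also uses geom_col(), the ggplot2 shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Pre-computed counts per cylinder class (columns: cyl, n)
counts <- as.data.frame(table(cyl = mtcars$cyl), responseName = "n")

# stat = "count": ggplot2 counts the rows itself, so only x is mapped
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# stat = "identity": bar heights are read from the y aesthetic
p_identity <- ggplot(counts, aes(x = cyl, y = n)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for geom_bar(stat = "identity")
p_col <- ggplot(counts, aes(x = cyl, y = n)) +
  geom_col()

# The computed bar heights agree across the three plots
layer_data(p_count)$y
layer_data(p_identity)$y
layer_data(p_col)$y
```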

R Visualization: Choosing Plot Colors and Palettes

1. How colors are represented in R

1.1 Color names

R has 657 built-in color names available: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

    str(colors())
    # chr [1:657] "white" "aliceblue" "antiquewhite" ...
    head(colors()[1:10])
    # [1] "white"         "aliceblue"     "antiquewhite"
    # [4] "antiquewhite1" "antiquewhite2" "antiquewhite3"

    set.seed(111)
    cols = sample(colors(), 5)
    numbers = 1:5; names(numbers) = cols
    barplot(numbers, col=cols)

A transparent color is not among these 657 names, but it can be specified with "transparent". ...
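Color names are one representation; base R can also convert them to RGB components and hex codes, which is how partial transparency is usually expressed. The sketch below uses only grDevices functions shipped with R; red_hex and red_half are hypothetical variable names.

```r
# RGB components (0-255) behind a color name
col2rgb("firebrick3")

# Build a hex code from RGB components
red_hex <- rgb(255, 0, 0, maxColorValue = 255)

# Add 50% transparency; the result is a hex code with an alpha channel
red_half <- adjustcolor("red", alpha.f = 0.5)

# "transparent" is accepted as a color and has an alpha value of 0
col2rgb("transparent", alpha = TRUE)

# Names, hex codes, and hex codes with alpha can be mixed freely
barplot(1:3, col = c("red", red_hex, red_half))
```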

R Visualization: Heatmaps

A brief summary of two R packages for drawing heatmaps: the basic pheatmap package and the more advanced ComplexHeatmap package.

pheatmap

    # install.packages("pheatmap")
    library(pheatmap)
    packageVersion("pheatmap")
    # [1] ‘1.0.12’

0. Example data

    # Note: this first draw happens before set.seed(), so it differs between runs
    exp = matrix(rnorm(300), nrow = 30, ncol = 10)
    set.seed(123)
    exp[1:15, 1:5] = exp[1:15, 1:5] + matrix(rnorm(75,mean = 4), nrow = 15, ncol = 5)
    set.seed(123)
    exp[16:30, 6:10] = exp[16:30, 6:10] + matrix(rnorm(75,mean = 3), nrow = 15, ncol = 5)
    exp = round(exp, 2)
    colnames(exp) = paste("Sample", 1:10, sep = "")
    rownames(exp) = paste("Gene", 1:30, sep = "")
    dim(exp)
    # [1] 30 10
    head(exp)
    #       Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9 Sample10
    # Gene1    4.47    5.74    5.56    3.18    6.38    0.06   -0.09    0.13    0.70    -1.75
    # Gene2    3.49    3.71    2.24    4.23    4.10   -0.70    1.08    0.22   -0.11     0.10
    # Gene3    4.34    0.37    5.64    3.05    2.42   -0.72    0.63    1.64   -1.26    -0.57
    # Gene4    4.25    4.32    6.79    5.30    2.37    0.88   -0.11   -0.22    1.68    -0.97
    # Gene5    3.99    4.45    3.38    4.29    1.74   -1.02   -1.53    0.17    0.91    -0.18
    # Gene6    5.72    2.36    5.39    4.04    6.50    1.96   -0.52    1.17    0.24     1.01

    ## Basic plot
    pheatmap(exp)

1. Clustering parameters

The examples below all act on rows; replace row/rows in the argument names with col/cols to act on columns.

    ## (1) No clustering
    pheatmap(exp, cluster_rows = FALSE)

    ## (2) Cluster, but do not draw the dendrogram
    pheatmap(exp, treeheight_row = 0)

    ## (3) Distance measure
    pheatmap(exp, clustering_distance_rows = "euclidean")  # or "correlation"

    ## (4) Linkage method used between clusters
    pheatmap(exp, clustering_method = "average")  # or "ward.D2", "single", "complete", etc.

    ## (5) Extract the expression matrix as ordered in the clustered heatmap
    ph = pheatmap(exp)
    ph$tree_row$order
    ph$tree_col$order
    ph_exp = exp[ph$tree_row$order, ph$tree_col$order]
    ph_exp[1:4,1:4]
    #        Sample3 Sample5 Sample2 Sample1
    # Gene26    1.05   -0.57    1.44   -0.71
    # Gene19   -1.01   -0.52   -0.26   -0.63
    # Gene24    0.98   -1.24   -0.96   -0.24
    # Gene18    0.33    1.23   -0.49    0.24

2. Colors

    # Default
    colours = colorRampPalette(rev(RColorBrewer::brewer.pal(n = 7, name = "RdYlBu")))(100)
    str(colours)
    # chr [1:100] "#4575B4" "#4979B6" "#4E7DB8" "#5282BB" "#5786BD" "#5C8BBF" "#608FC2" ...

    # Custom palette
    colours = colorRampPalette(c("navy", "white", "firebrick3"))(10)
    # Alternative; the str() output shown below corresponds to this palette
    # colours = colorRampPalette(c("#3288bd", "white", "#d53e4f"))(10)
    str(colours)
    # chr [1:10] "#3288BD" "#5FA2CB" "#8DBCDA" "#BAD7E9" "#E8F1F7" "#FAE9EB" "#F1BEC4" ...
    pheatmap(exp, color = colours)

...
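The row and column order reported in ph$tree_row$order and ph$tree_col$order comes from hierarchical clustering. As a rough sketch, assuming pheatmap's documented defaults of Euclidean distance and complete linkage, the same kind of ordering can be obtained in base R. The matrix mat is a hypothetical stand-in for exp (named differently to avoid masking base::exp), and row_tree, col_tree, and mat_ordered are illustrative names.

```r
set.seed(123)
mat <- matrix(rnorm(300), nrow = 30, ncol = 10,
              dimnames = list(paste0("Gene", 1:30), paste0("Sample", 1:10)))
# Shift one block so that there is a structure to recover
mat[1:15, 1:5] <- mat[1:15, 1:5] + 4

# Cluster rows, then columns (columns are clustered on the transposed matrix)
row_tree <- hclust(dist(mat, method = "euclidean"), method = "complete")
col_tree <- hclust(dist(t(mat), method = "euclidean"), method = "complete")

# Reorder the matrix the way a clustered heatmap would display it
mat_ordered <- mat[row_tree$order, col_tree$order]
mat_ordered[1:4, 1:4]
```

Changing the method arguments here mirrors what clustering_distance_rows and clustering_method do in the pheatmap() call.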

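R Visualization: Chord Diagrams with chordDiagram

Chord diagrams are commonly used to show connections between members of two or more groups. Below is a brief look at the basic usage of chordDiagram() from the circlize package. Reference tutorial: https://jokergoo.github.io/circlize_book/book/the-chorddiagram-function.html

0. Data input

For a matrix: ...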

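R Visualization: Sankey and Alluvial Plots with ggsankey

A Sankey plot is a specific type of flow diagram that describes how one set of values flows into another; in principle it implies a logical relationship between the earlier set and the later one. More broadly, it can also show compositional proportions across multiple groups. From this perspective, an alluvial plot is a special kind of Sankey plot. ...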