The ConsensusClusterPlus package is a commonly used tool in tumor subtyping studies; it was published in Bioinformatics in 2010.
Paper: https://academic.oup.com/bioinformatics/article/26/12/1572/281699
Tutorial: https://bioconductor.org/packages/release/bioc/vignettes/ConsensusClusterPlus/inst/doc/ConsensusClusterPlus.pdf

    # BiocManager::install("ConsensusClusterPlus")
    library(ConsensusClusterPlus)

1. Example expression matrix

The input is a gene expression matrix with genes as row names and samples as column names; normalize it as needed.
Choosing the gene set is a key step. Genes can be selected on statistical grounds (highly variable genes) or on biological grounds (genes related to a specific function).
The data below follow the tutorial.

    library(ALL)
    data(ALL)
    d = exprs(ALL)
    dim(d)
    # [1] 12625   128
    mads = apply(d, 1, mad)
    d = d[rev(order(mads))[1:5000], ]  # keep the 5000 most variable genes, ranked by median absolute deviation (MAD)
    dim(d)
    # [1] 5000  128

    ## Centering: subtract each row's (gene's) median from that row
    d = sweep(d, 1, apply(d, 1, median, na.rm=TRUE))
    d[1:4, 1:4]
    #             01005     01010       03002     04006
    # 36638_at 1.556121 0.9521271 -0.05018082  4.780378
    # 39318_at 1.191353 2.5013225 -2.38793537 -1.199521
    # 38514_at 1.020716 3.2785671  1.55949145 -3.345919
    # 266_s_at 1.829260 0.3624327  1.54913247 -1.286294

2. Sample subtype identification

In practice a single function does all the work: ConsensusClusterPlus().
It takes quite a few parameters, described below.

    ## Default parameter values (the package default for seed is NULL; it is fixed here for reproducibility)
    ConsensusClusterPlus(d, maxK=3, reps=10, pItem=0.8, pFeature=1,
                         title="untitled_consensus_cluster",
                         clusterAlg="hc", distance="pearson",
                         plot=NULL, writeTable=FALSE, seed=42)

    ## Parameter meanings
    # maxK: maximum number of clusters to evaluate; 10~20 is suggested
    # reps: number of resampling iterations; 1000 is suggested
    # pItem and pFeature: proportion of samples and of genes drawn in each resampling
    # title: path where plots and files are saved
    # clusterAlg: clustering algorithm, c("hc","pam","km")
    # distance: distance metric, c("pearson","spearman","euclidean","binary","maximum","canberra","minkowski")
    # plot: whether to plot, and in which format, c(NULL, "pdf", "png", "pngBMP")
    # writeTable: whether to write result tables to file
    # seed: random seed

    results = ConsensusClusterPlus(d, maxK=10, reps=1000, pItem=0.8, pFeature=1,
                                   title="./tmp/",
                                   clusterAlg="hc", distance="pearson",
                                   plot="pdf", writeTable=FALSE, seed=42)

The returned R object is a list.

    ## Inspect the results for k = 3 clusters
    names(results[[3]])
    # [1] "consensusMatrix" "consensusTree"   "consensusClass"  "ml"              "clrs"
    table(results[[3]]$consensusClass)
    #  1  2  3
    # 69 28 31
    dim(results[[3]]$consensusMatrix)
    # [1] 128 128
    results[[3]]$consensusMatrix[1:4, 1:4]
    #           [,1]      [,2]      [,3]      [,4]
    # [1,] 1.0000000 0.3408360 0.7996870 0.3515249
    # [2,] 0.3408360 1.0000000 0.1224806 1.0000000
    # [3,] 0.7996870 0.1224806 1.0000000 0.1190108
    # [4,] 0.3515249 1.0000000 0.1190108 1.0000000
    ## consensus value 0: the two samples never clustered together
    ## consensus value 1: the two samples always clustered together

Graphical output: there are two main kinds of plots.

(1) Heatmap of the clustering result for a specific number of clusters
...
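To make the consensus values mentioned earlier (0 = never clustered together, 1 = always clustered together) more concrete, here is a minimal base-R sketch of how such a matrix can be built by repeated subsampling. It is not the package's internal implementation: the function `toy_consensus`, its arguments, and the simulated matrix `x` are hypothetical and only mirror the `pItem`, `reps`, `clusterAlg="hc"` and `distance="pearson"` settings used above.

```r
# Toy consensus matrix in base R (illustrative sketch, not the package's code).
set.seed(42)

# Hypothetical toy data: 50 features (rows) x 20 samples (columns).
# Samples 1-10 and 11-20 carry opposite per-feature signals.
n_feat <- 50
n_samp <- 20
signal <- rnorm(n_feat, sd = 3)
x <- matrix(rnorm(n_feat * n_samp), nrow = n_feat)
x[, 1:10]  <- x[, 1:10]  + signal
x[, 11:20] <- x[, 11:20] - signal
colnames(x) <- paste0("s", seq_len(n_samp))

toy_consensus <- function(x, k, reps = 100, p_item = 0.8) {
  n <- ncol(x)
  together <- matrix(0, n, n)  # times a pair landed in the same cluster
  sampled  <- matrix(0, n, n)  # times a pair was drawn in the same subsample
  for (r in seq_len(reps)) {
    # Draw a subset of samples (analogous to pItem = 0.8)
    idx <- sort(sample(n, floor(p_item * n)))
    # Distance between samples: 1 - Pearson correlation
    dis <- as.dist(1 - cor(x[, idx]))
    # Average-linkage hierarchical clustering, cut into k groups
    cl <- cutree(hclust(dis, method = "average"), k = k)
    same <- outer(cl, cl, "==")
    together[idx, idx] <- together[idx, idx] + same
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # Fraction of co-sampled runs in which each pair co-clustered;
  # pmax() avoids division by zero for pairs never drawn together
  cm <- together / pmax(sampled, 1)
  dimnames(cm) <- list(colnames(x), colnames(x))
  cm
}

cm <- toy_consensus(x, k = 2)
round(cm[c("s1", "s2", "s11", "s12"), c("s1", "s2", "s11", "s12")], 2)
```

Each entry of `cm` is the fraction of subsamples containing both samples in which they fell into the same cluster, so it always lies between 0 and 1. This is the same kind of quantity stored in `results[[k]]$consensusMatrix`.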