Transcriptome | Li's Bioinfo-Blog

Immune infiltration analysis with xCell, CIBERSORT and related tools

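xCell: the xCell package

```r
devtools::install_github('dviraran/xCell')
library(xCell)
data("xCell.data")
## List the 64 supported cell types (same as the figure below)
colnames(xCell.data$spill$K)
## Key parameters of the prediction function
?xCellAnalysis
# expr: the expression matrix, with gene symbols as row names.
#   Microarray data need no extra normalization;
#   RNA-seq data must be length-normalized (TPM/FPKM/RPKM).
# rnaseq = TRUE: whether the data are RNA-seq; set to FALSE for microarray data
# cell.types.use = NULL: a character vector naming which of the 64 cell types to score
# parallel.sz = 4: number of threads to use (default 4)
```

NOTE: ...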
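
xCell expects length-normalized values for RNA-seq input. If you only have a raw count matrix, a minimal base R conversion to TPM might look like the sketch below. The helper `counts_to_tpm()`, the gene names and the gene lengths are hypothetical. In practice the lengths would come from your annotation (for example the exonic gene length), and the rows must still be gene symbols before the matrix is passed to xCellAnalysis().

```r
# Hypothetical helper: convert a raw count matrix (genes x samples) to TPM.
# gene_length_bp must be in the same order as the rows of counts.
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  # Reads per kilobase: divide each gene's counts by its length in kb
  rpk <- counts / (gene_length_bp / 1000)
  # Scale each sample (column) so that its RPK values sum to one million
  t(t(rpk) / colSums(rpk)) * 1e6
}

# Toy counts for three genes in two samples (hypothetical values)
counts <- matrix(c(100, 200, 300,
                   50,   0, 150),
                 nrow = 3,
                 dimnames = list(c("CD3E", "CD19", "ACTB"), c("S1", "S2")))
tpm <- counts_to_tpm(counts, gene_length_bp = c(1000, 2000, 3000))
colSums(tpm)  # each sample sums to 1e6 by construction
```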

Drawing gene expression matrix heatmaps with the ClusterGVis package

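ClusterGVis is one of a series of visualization packages developed by Dr. Jun Zhang of China Pharmaceutical University. It provides advanced heatmap visualization of gene expression matrices. Below I briefly summarize the usages I found interesting, based on its GitHub page and WeChat tutorials. ...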

Predicting bulk cell composition from scRNA-seq with the MuSiC package

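MuSiC (MUlti-Subject SIngle Cell deconvolution) is an R package from Mingyao Li's group in the Department of Biostatistics, Epidemiology and Informatics at the University of Pennsylvania. It was published in Nature Communications in 2019 and infers the cell-type composition of bulk RNA-seq samples from single-cell transcriptome data. In 2022 the same team published an extended version, MuSiC2, in Briefings in Bioinformatics, which handles more complex scenarios. ...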
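
A toy example shows the core idea of reference-based deconvolution. Bulk expression is modeled as a reference matrix of cell-type mean expression multiplied by a vector of non-negative proportions. The proportions are solved with non-negative least squares (NNLS) and then rescaled to sum to 1. MuSiC itself uses a weighted NNLS that also accounts for cross-subject variability and cell size. The sketch below is plain, unweighted NNLS, solved here with optim's bounded L-BFGS-B method. The helper `nnls_props()` and the simulated reference are hypothetical.

```r
# Hypothetical helper: unweighted NNLS deconvolution, bulk ~ S %*% p with p >= 0,
# solved by bounded optimization, then rescaled so the proportions sum to 1.
nnls_props <- function(S, bulk) {
  f  <- function(p) sum((S %*% p - bulk)^2)                   # squared residual
  gr <- function(p) as.vector(2 * t(S) %*% (S %*% p - bulk))  # its gradient
  fit <- optim(rep(1 / ncol(S), ncol(S)), f, gr, method = "L-BFGS-B",
               lower = 0, control = list(factr = 1e4))
  p <- fit$par
  setNames(p / sum(p), colnames(S))
}

set.seed(42)
# Hypothetical reference: mean expression of 50 genes in 3 cell types
S <- matrix(runif(150, 0, 10), nrow = 50,
            dimnames = list(NULL, c("Tcell", "Bcell", "Myeloid")))
true_p <- c(0.5, 0.3, 0.2)
bulk <- as.vector(S %*% true_p)  # noise-free simulated bulk sample
round(nnls_props(S, bulk), 3)
```

With noise-free input the true proportions are recovered. Real bulk data are noisy, which is part of why MuSiC down-weights genes that are inconsistent across subjects.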

Extracting an expression matrix from RNA-seq fastq.gz files

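The RNA-seq alignment workflow has three main steps: (1) organizing the data; (2) quality control; (3) alignment. Each step involves several tools, and below I briefly summarize the basic workflow. Example data: RNA-seq results for 6 human samples in GSE158623, corresponding to SRR12720999 to SRR12721004 ...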
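
The data-organization step starts with downloading the six runs. The run IDs and download commands can be generated in R, as in the sketch below. It assumes SRA Toolkit's fasterq-dump is installed and on the PATH. The output directory name `raw_fastq` is hypothetical, and QC and alignment commands are not shown.

```r
# The six run accessions listed above, SRR12720999 to SRR12721004
runs <- sprintf("SRR%d", 12720999:12721004)

# One download-and-compress command per run (raw_fastq is a hypothetical directory)
cmds <- sprintf("fasterq-dump --split-3 -O raw_fastq %s && gzip raw_fastq/%s*.fastq",
                runs, runs)
writeLines(cmds)

# To actually execute them from R (requires SRA Toolkit):
# dir.create("raw_fastq", showWarnings = FALSE)
# invisible(lapply(cmds, system))
```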

ConsensusClusterPlus: a tool for tumor subtype clustering

ConsensusClusterPlus is a commonly used tool in tumor subtyping studies. It was published in Bioinformatics in 2010.
Paper: https://academic.oup.com/bioinformatics/article/26/12/1572/281699
Tutorial: https://bioconductor.org/packages/release/bioc/vignettes/ConsensusClusterPlus/inst/doc/ConsensusClusterPlus.pdf

```r
# BiocManager::install("ConsensusClusterPlus")
library(ConsensusClusterPlus)
```

1. Example input: a gene expression matrix with genes as row names and samples as column names, normalized as needed. Choosing the gene set is a key step. Genes can be selected statistically (highly variable genes) or biologically (genes related to a specific function). The data below follow the reference tutorial.

```r
library(ALL)
data(ALL)
d = exprs(ALL)
dim(d)
# [1] 12625   128
mads = apply(d, 1, mad)
d = d[rev(order(mads))[1:5000], ]  # keep the 5000 most variable genes (by MAD)
dim(d)
# [1] 5000  128
## Normalization: subtract each row's (gene's) median from that row
d = sweep(d, 1, apply(d, 1, median, na.rm = TRUE))
d[1:4, 1:4]
#             01005     01010       03002     04006
# 36638_at 1.556121 0.9521271 -0.05018082  4.780378
# 39318_at 1.191353 2.5013225 -2.38793537 -1.199521
# 38514_at 1.020716 3.2785671  1.55949145 -3.345919
# 266_s_at 1.829260 0.3624327  1.54913247 -1.286294
```

2. Subtype identification comes down to a single function, ConsensusClusterPlus(), but it involves several parameter choices:

```r
## Default parameters
ConsensusClusterPlus(d, maxK = 3, reps = 10, pItem = 0.8, pFeature = 1,
                     title = "untitled_consensus_cluster",
                     clusterAlg = "hc", distance = "pearson",
                     plot = NULL, writeTable = FALSE, seed = NULL)
## Parameter meanings
# maxK: maximum number of clusters to evaluate; 10-20 is recommended
# reps: number of resampling iterations; 1000 is recommended
# pItem, pFeature: proportion of samples and of genes drawn in each resampling
# title: directory where plots and tables are saved
# clusterAlg: clustering algorithm, c("hc", "pam", "km")
# distance: distance measure, c("pearson", "spearman", "euclidean", "binary",
#           "maximum", "canberra", "minkowski")
# plot: whether to plot, and in which format, c(NULL, "pdf", "png", "pngBMP")
# writeTable: whether to write results to files
# seed: random seed
results = ConsensusClusterPlus(d, maxK = 10, reps = 1000, pItem = 0.8, pFeature = 1,
                               title = "./tmp/", clusterAlg = "hc", distance = "pearson",
                               plot = "pdf", writeTable = FALSE, seed = 42)
```

The R object returned is a list:

```r
## Inspect the result for k = 3 clusters (results[[k]] holds the result for k clusters)
names(results[[3]])
# [1] "consensusMatrix" "consensusTree" "consensusClass" "ml" "clrs"
table(results[[3]]$consensusClass)
#  1  2  3
# 69 28 31
dim(results[[3]]$consensusMatrix)
# [1] 128 128
results[[3]]$consensusMatrix[1:4, 1:4]
#           [,1]      [,2]      [,3]      [,4]
# [1,] 1.0000000 0.3408360 0.7996870 0.3515249
# [2,] 0.3408360 1.0000000 0.1224806 1.0000000
# [3,] 0.7996870 0.1224806 1.0000000 0.1190108
# [4,] 0.3515249 1.0000000 0.1190108 1.0000000
## consensus value 0: the two samples were never clustered together
## consensus value 1: the two samples were always clustered together
```

The graphical output consists mainly of two kinds of plots: (1) a heatmap of the consensus matrix for a given number of clusters ...
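
A consensus value is the fraction of resampling runs in which two samples that were drawn together also ended up in the same cluster. The base R sketch below illustrates that resampling idea. It assumes hierarchical clustering with average linkage on 1 - Pearson correlation, in the spirit of clusterAlg = "hc", distance = "pearson". The helper `consensus_matrix()` and the simulated data are hypothetical. This is not ConsensusClusterPlus's own code, and it omits feature resampling (pFeature).

```r
# Hypothetical helper: consensus matrix by repeated subsampling of samples.
consensus_matrix <- function(d, k, reps = 100, pItem = 0.8, seed = 42) {
  set.seed(seed)
  n <- ncol(d)
  empty <- matrix(0, n, n, dimnames = list(colnames(d), colnames(d)))
  together <- empty  # times a pair fell into the same cluster
  sampled  <- empty  # times a pair was drawn into the same subsample
  for (r in seq_len(reps)) {
    idx <- sort(sample.int(n, floor(n * pItem)))
    sub <- d[, idx, drop = FALSE]
    hc  <- hclust(as.dist(1 - cor(sub)), method = "average")
    cl  <- cutree(hc, k = k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # Consensus = same-cluster count / co-sampled count (0 if never co-sampled)
  ifelse(sampled > 0, together / sampled, 0)
}

# Simulated data: 200 genes, two groups of 10 samples with distinct profiles
set.seed(1)
pA <- rnorm(200); pB <- rnorm(200)
d <- cbind(sapply(1:10, function(i) pA + rnorm(200, sd = 0.3)),
           sapply(1:10, function(i) pB + rnorm(200, sd = 0.3)))
colnames(d) <- paste0("S", 1:20)
cm <- consensus_matrix(d, k = 2)
round(cm[c(1, 2, 11, 12), c(1, 2, 11, 12)], 2)
```

Within a well-separated group the consensus is 1, and between groups it is 0. Real data give intermediate values, which is what the consensus heatmap and CDF plots help to judge.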

Alternative splicing analysis with the IsoformSwitchAnalyzeR package

https://bioconductor.org/packages/release/bioc/html/IsoformSwitchAnalyzeR.html

1. Background

The usage of Alternative Transcription Start Sites (aTSS), Alternative Splicing (AS) and Alternative Transcription Termination Sites (aTTS) collectively results in the production of different isoforms. Alternative isoforms are widely used, as recently demonstrated by the ENCODE Consortium, which found that on average 6.3 different transcripts are generated per gene, a number that may vary considerably between genes. ...
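
IsoformSwitchAnalyzeR describes an isoform switch in terms of two quantities. The isoform fraction (IF) is an isoform's share of its gene's total expression. The difference in isoform fraction (dIF) is the change in IF between conditions. The package computes these itself from quantification data. The base R sketch below only shows the arithmetic. The data frame, the condition names and the 0.1 cutoff are hypothetical.

```r
# Hypothetical isoform-level expression (e.g. TPM) in two conditions
iso <- data.frame(
  gene    = c("GeneA", "GeneA", "GeneB"),
  isoform = c("GeneA-201", "GeneA-202", "GeneB-201"),
  ctrl    = c(30, 10, 5),
  treat   = c(10, 30, 8)
)

# IF = isoform expression / total expression of its gene (NA if the gene is silent)
isoform_fraction <- function(expr, gene) {
  gene_total <- ave(expr, gene, FUN = sum)
  ifelse(gene_total > 0, expr / gene_total, NA)
}

iso$IF_ctrl  <- isoform_fraction(iso$ctrl, iso$gene)
iso$IF_treat <- isoform_fraction(iso$treat, iso$gene)
iso$dIF      <- iso$IF_treat - iso$IF_ctrl   # change in isoform usage
iso[abs(iso$dIF) > 0.1, ]                    # candidate switching isoforms
```

Here GeneA's two isoforms swap dominance (dIF of -0.5 and +0.5), while single-isoform GeneB cannot switch (dIF = 0) even though its total expression changes.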

Drawing and analyzing nomograms

A routine step in tumor prognosis papers is drawing a nomogram and running the related analyses. The basic tools for drawing one are summarized below. Example dataset:

```r
library(survival)
head(lung)
#   inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
# 1    3  306      2  74   1       1       90       100     1175      NA
# 2    3  455      2  68   1       0       90        90     1225      15
# 3    3 1010      1  56   1       0       90        90       NA      15
```

1. The rms package. Reference usage: https://atm.amegroups.com/article/view/14736/15089

```r
library(rms)
# var.labels = c(age="Age in Years",
#                lac="lactate",
#                sex="Sex of the participant",
#                shock="shock",
#                y="outcome",
#                Y="ordinal")
# label(data) = lapply(names(var.labels),
#                      function(x) label(data[,x]) = var.labels[x])
## Set up the datadist object used by rms summary/plot functions
ddist <- datadist(lung)
options(datadist = 'ddist')
mod.cox <- cph(Surv(time, status) ~ ph.ecog + sex + age, data = lung, surv = TRUE)
## Define helper functions from the fitted model
surv.cox <- Survival(mod.cox)  # survival probability at a given time
med <- Quantile(mod.cox)       # survival-time quantiles
nom.cox <- nomogram(mod.cox,
                    # map total points (via the linear predictor) onto these scales
                    fun = list(function(x) surv.cox(365, x),      # 1-year survival
                               function(x) med(lp = x, q = 0.5)), # median survival time
                    funlabel = c("365-Day Sur. Prob.", "Median Survival Time"),
                    lp = FALSE,             # do not show the linear-predictor axis
                    conf.int = c(0.1, 0.5)  # two confidence levels
                    )
plot(nom.cox,
     col.conf = c('red', 'green'),   # colors of the confidence intervals
     col.grid = c("grey30", "grey")  # colors of the grid lines
     )
# f = cph(Surv(time, status) ~ age + sex + ph.karno, data = lung,
#         x = TRUE, y = TRUE, surv = TRUE)
# pred_score = apply(lung, 1, function(x){
#   pred = Predict(f, age = x["age"], sex = x["sex"], ph.karno = x["ph.karno"])
#   return(pred$yhat)
# }) %>% unlist()
# summary(pred_score)
```

Calibration curve ...
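
Under the Cox proportional hazards model, a patient's survival is S(t | x) = S0(t)^exp(lp), where lp is the linear predictor. Survival at a fixed time therefore falls monotonically as lp rises, and so it also falls as the nomogram's total points rise. This is what lets a nomogram map total points onto a survival-probability axis. The sketch below checks this with the survival package alone, using the same three predictors. The two patients are hypothetical, and complete cases are used for simplicity.

```r
library(survival)

# Complete cases for the three predictors used in the nomogram
lung2 <- na.omit(lung[, c("time", "status", "ph.ecog", "sex", "age")])
fit <- coxph(Surv(time, status) ~ ph.ecog + sex + age, data = lung2)

# Two hypothetical patients: lower-risk vs higher-risk covariates
newpat <- data.frame(ph.ecog = c(0, 2), sex = c(2, 1), age = c(55, 75))
lp <- predict(fit, newdata = newpat, type = "lp")  # linear predictor

# Model-based survival curve for each patient, read off at day 365
sf <- survfit(fit, newdata = newpat)
s365 <- as.vector(summary(sf, times = 365)$surv)
data.frame(newpat, lp = round(lp, 3), surv_365 = round(s365, 3))
```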

Drawing volcano plots with the EnhancedVolcano package

The EnhancedVolcano package builds on ggplot2 to draw attractive volcano plots from differential-expression results with little effort. Below I summarize the basic usage as I understand it. Official, comprehensive tutorial: https://github.com/kevinblighe/EnhancedVolcano

Example differential-expression data:

```r
library(airway)
library(magrittr)
data('airway')
airway$dex %<>% relevel('untrt')
## Map Ensembl IDs to gene symbols
ens <- rownames(airway)
library(org.Hs.eg.db)
symbols <- mapIds(org.Hs.eg.db, keys = ens,
                  column = c('SYMBOL'), keytype = 'ENSEMBL')
symbols <- symbols[!is.na(symbols)]
symbols <- symbols[match(rownames(airway), names(symbols))]
rownames(airway) <- symbols
keep <- !is.na(rownames(airway))
airway <- airway[keep, ]
## Differential expression with DESeq2
library('DESeq2')
dds <- DESeqDataSet(airway, design = ~ cell + dex)
dds <- DESeq(dds, betaPrior = FALSE)
res <- results(dds, contrast = c('dex', 'trt', 'untrt'))
res <- lfcShrink(dds, contrast = c('dex', 'trt', 'untrt'), res = res, type = 'normal')
res <- as.data.frame(res)
head(res)
#             baseMean log2FoldChange      lfcSE       stat       pvalue         padj
# TSPAN6   710.0931707    -0.37807189 0.09851236 -3.8404448 0.0001228116 0.0009522932
# TNMD       0.0000000             NA         NA         NA           NA           NA
# DPM1     521.2572396     0.19826365 0.10931684  1.8155169 0.0694445184 0.1910397405
# SCYL3    237.6068046     0.03234467 0.13821470  0.2371917 0.8125081096 0.9118161375
# C1orf112  58.0358739    -0.08835419 0.25056704 -0.3194810 0.7493618190 0.8773885438
# FGR        0.3194343    -0.08459224 0.15186225 -0.3948862 0.6929268648           NA
```

As shown above, any differential-expression result that contains gene names, fold changes and P values is enough to draw a volcano plot. ...
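
Under the hood, a volcano plot is a scatter of log2 fold change against -log10(P), with genes colored by whether they pass both a fold-change cutoff and a P-value cutoff. The base R sketch below shows that classification and plot. The helper `classify_volcano()`, the toy genes and the cutoffs (|log2FC| > 1, P < 0.05) are illustrative assumptions, not EnhancedVolcano defaults. Pass p_col = "padj" to use adjusted P values instead.

```r
# Hypothetical helper: label genes for a volcano plot from a DE result table
# with a log2FoldChange column and a P-value column.
classify_volcano <- function(res, lfc_cut = 1, p_cut = 0.05, p_col = "pvalue") {
  res <- res[!is.na(res$log2FoldChange) & !is.na(res[[p_col]]), ]
  res$neglog10p <- -log10(res[[p_col]])  # y axis of the volcano plot
  sig <- res[[p_col]] < p_cut
  res$group <- "NS"
  res$group[sig & res$log2FoldChange >  lfc_cut] <- "Up"
  res$group[sig & res$log2FoldChange < -lfc_cut] <- "Down"
  res
}

# Toy input with the same column names as the DESeq2 result
toy <- data.frame(log2FoldChange = c(2.5, -3, 0.2, 1.5, NA),
                  pvalue         = c(1e-6, 1e-4, 0.5, 0.2, NA),
                  row.names      = paste0("GENE", 1:5))
out <- classify_volcano(toy)
table(out$group)
plot(out$log2FoldChange, out$neglog10p,
     col = c(Down = "blue", NS = "grey", Up = "red")[out$group],
     pch = 19, xlab = "log2 fold change", ylab = "-log10(P)")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)  # cutoff lines
```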

Tumor signature scoring and immune infiltration analysis with the IOBR package

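IOBR is an all-in-one tumor data analysis package that combines signature scoring with immune infiltration analysis. It was developed by Prof. Wangjun Liao, Dr. Dongqiang Zeng and colleagues at Nanfang Hospital, Southern Medical University, and was published in Frontiers in Immunology in July 2021. It has since been cited more than 100 times. Below I work through some of its features following its GitHub tutorial. ...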
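
IOBR supports several signature-scoring methods, including PCA, z-score and ssGSEA. The simplest of these is the z-score idea: standardize each signature gene across samples, then average the standardized values per sample. The base R sketch below shows that idea only. The helper `zscore_signature()` and the toy matrix are hypothetical, and this is not IOBR's implementation.

```r
# Hypothetical helper: z-score signature score.
# Genes missing from expr are dropped; constant genes yield NaN and are ignored.
zscore_signature <- function(expr, genes) {
  genes <- intersect(genes, rownames(expr))
  if (length(genes) == 0) stop("none of the signature genes are in expr")
  z <- t(scale(t(expr[genes, , drop = FALSE])))  # standardize each gene across samples
  colMeans(z, na.rm = TRUE)                      # one score per sample
}

expr <- rbind(G1 = c(1, 2, 3, 4),
              G2 = c(2, 4, 6, 8),
              G3 = c(5, 5, 6, 4))
colnames(expr) <- paste0("S", 1:4)
score <- zscore_signature(expr, c("G1", "G2", "GX"))  # "GX" is absent and ignored
round(score, 3)
```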

Predicting bulk cell composition from scRNA-seq with the BayesPrism package

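BayesPrism is an R package developed by Tinyi Chu and colleagues at Cornell University (USA) and published in Nature Cancer in April 2022. In short, it uses single-cell RNA-seq as prior information and estimates the joint posterior distribution of cell-type proportions and cell-type-specific gene expression in bulk samples, P(θ, Z | X, ϕ), where the parameters are defined as follows ...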