📖 Bioinformatics data analysis: pipelines, toolkits, and more

Alternative splicing analysis with the IsoformSwitchAnalyzeR package

https://bioconductor.org/packages/release/bioc/html/IsoformSwitchAnalyzeR.html

1. Background

The usage of alternative transcription start sites (aTSS), alternative splicing (AS), and alternative transcription termination sites (aTTS) collectively results in the production of different isoforms. Alternative isoforms are widely used, as recently demonstrated by the ENCODE Consortium, which found that on average 6.3 different transcripts are generated per gene; a number which may vary considerably per gene. ...

Nomogram plotting and analysis

Drawing a nomogram and running the related analyses is a routine step in tumor-prognosis papers. The basic plotting tools are summarized below.

Example dataset

    library(survival)
    head(lung)
    #   inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
    # 1    3  306      2  74   1       1       90       100     1175      NA
    # 2    3  455      2  68   1       0       90        90     1225      15
    # 3    3 1010      1  56   1       0       90        90       NA      15

1. The rms package

Reference usage: https://atm.amegroups.com/article/view/14736/15089

    library(rms)

    # Optional: attach variable labels (`data` and these column names are placeholders)
    # var.labels = c(age="Age in Years",
    #                lac="lactate",
    #                sex="Sex of the participant",
    #                shock="shock",
    #                y="outcome",
    #                Y="ordinal")
    # label(data) = lapply(names(var.labels),
    #                      function(x) label(data[,x]) = var.labels[x])

    # Register the data distribution first so rms knows the variable ranges
    ddist <- datadist(lung)
    options(datadist = 'ddist')

    mod.cox <- cph(Surv(time, status) ~ ph.ecog + sex + age, data = lung, surv = TRUE)

    # Helper functions derived from the fitted model
    surv.cox <- Survival(mod.cox)  # survival probability at a given time
    med <- Quantile(mod.cox)       # quantiles of survival time

    nom.cox <- nomogram(mod.cox,
                        # functions evaluated from the total points (linear predictor)
                        fun = list(function(x) surv.cox(365, x),       # 1-year survival probability
                                   function(x) med(lp = x, q = 0.5)),  # median survival time
                        funlabel = c("1-Year Sur. Prob.",
                                     "Median Survival Time"),
                        lp = FALSE,              # do not show the linear predictor axis
                        conf.int = c(0.1, 0.5))  # two confidence levels

    plot(nom.cox,
         col.conf = c('red', 'green'),     # colors of the confidence intervals
         col.grid = c("grey30", "grey"))   # colors of the grid

    # Per-patient predictions (kept commented out; the pipe needs magrittr)
    # f = cph(Surv(time, status) ~ age + sex + ph.karno, data = lung,
    #         x = TRUE, y = TRUE, surv = TRUE)
    # pred_score = apply(lung, 1, function(x){
    #   pred = Predict(f, age=x["age"], sex=x["sex"], ph.karno=x["ph.karno"])
    #   return(pred$yhat)
    # }) %>% unlist()
    # summary(pred_score)

Calibration curve ...

Tumor data analysis and visualization with the UCSCXenaShiny package

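UCSCXenaShiny is a Shiny tool (and an R package of the same name) for downloading, analyzing, and visualizing data. It is built on the UCSC Xena platform, which integrates multiple tumor databases. It was developed by Dr. Shixiang Wang of ShanghaiTech University and colleagues, and was published in Bioinformatics in June 2021. The notes below focus on the functions of the R package to get to know its core features. ...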

Drawing volcano plots with the EnhancedVolcano package

The EnhancedVolcano package builds on ggplot2 to draw attractive volcano plots conveniently from differential-analysis results. Below is a short summary of the basic usage as I understand it. The full official tutorial: https://github.com/kevinblighe/EnhancedVolcano

Example differential-expression data

    library(airway)
    library(magrittr)

    data('airway')
    airway$dex %<>% relevel('untrt')

    # Map Ensembl IDs to gene symbols and drop genes without a symbol
    ens <- rownames(airway)

    library(org.Hs.eg.db)
    symbols <- mapIds(org.Hs.eg.db, keys = ens,
                      column = c('SYMBOL'), keytype = 'ENSEMBL')
    symbols <- symbols[!is.na(symbols)]
    symbols <- symbols[match(rownames(airway), names(symbols))]
    rownames(airway) <- symbols
    keep <- !is.na(rownames(airway))
    airway <- airway[keep,]

    # Differential expression: treated vs. untreated, controlling for cell line
    library('DESeq2')
    dds <- DESeqDataSet(airway, design = ~ cell + dex)
    dds <- DESeq(dds, betaPrior=FALSE)
    res <- results(dds,
                   contrast = c('dex','trt','untrt'))
    res <- lfcShrink(dds,
                     contrast = c('dex','trt','untrt'), res=res, type = 'normal')
    res <- as.data.frame(res)
    head(res)
    #             baseMean log2FoldChange      lfcSE       stat       pvalue         padj
    # TSPAN6   710.0931707    -0.37807189 0.09851236 -3.8404448 0.0001228116 0.0009522932
    # TNMD       0.0000000             NA         NA         NA           NA           NA
    # DPM1     521.2572396     0.19826365 0.10931684  1.8155169 0.0694445184 0.1910397405
    # SCYL3    237.6068046     0.03234467 0.13821470  0.2371917 0.8125081096 0.9118161375
    # C1orf112  58.0358739    -0.08835419 0.25056704 -0.3194810 0.7493618190 0.8773885438
    # FGR        0.3194343    -0.08459224 0.15186225 -0.3948862 0.6929268648           NA

As shown above, any differential-analysis result that contains three pieces of information (gene name, fold change, and P value) can be used to draw a volcano plot. ...
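As a rough illustration of the three inputs named above, the sketch below passes a small table to `EnhancedVolcano()`. The `toy` data frame and its values are made up here purely so the snippet runs on its own; the cutoffs are arbitrary choices, not recommendations from the post.

```r
library(EnhancedVolcano)

# Toy differential-expression table (made-up values, for illustration only)
set.seed(1)
toy <- data.frame(
  log2FoldChange = rnorm(200, sd = 2),
  pvalue = runif(200)^3,
  row.names = paste0("gene", 1:200)
)

p <- EnhancedVolcano(toy,
                     lab = rownames(toy),    # gene names used as point labels
                     x = 'log2FoldChange',   # column holding the fold change
                     y = 'pvalue',           # column holding the P value
                     pCutoff = 0.05,         # arbitrary significance threshold
                     FCcutoff = 1)           # arbitrary |log2FC| threshold
p
```

With a DESeq2 result converted by `as.data.frame()`, the same call should work with that data frame in place of `toy`, since it carries the same column names.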

Tumor signature scoring and immune infiltration analysis with the IOBR package

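IOBR is a tumor data analysis tool that combines signature scoring and immune infiltration analysis in one package. It was published in Frontiers in Immunology in July 2021 by Prof. Wangjun Liao, Dr. Dongqiang Zeng, and colleagues at Nanfang Hospital, Southern Medical University, and has been cited more than 100 times. The notes below follow its GitHub tutorial to learn part of its functionality. ...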

Predicting bulk cell composition from scRNA-seq with the BayesPrism package

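BayesPrism is an R package developed by Tin Yi Chu and colleagues at Cornell University (USA), published in Nature Cancer in April 2022. In short, the method uses single-cell RNA-seq as prior information to estimate the joint posterior distribution of cell-type proportions and cell-type-specific gene expression in bulk samples, P(θ, Z | X, ϕ). The parameters are defined as follows ...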

Non-negative matrix factorization with the R package NMF

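(1) NMF stands for non-negative matrix factorization. It decomposes a non-negative data matrix into the product of two non-negative matrices: a basis matrix that represents the features, and a coefficient matrix that gives each sample's weights on those features. The original data can thus be expressed as weighted combinations of a set of non-negative basis vectors, which achieves dimensionality reduction and feature extraction. ...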
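To make the decomposition concrete, here is a minimal base-R sketch of NMF using the classic multiplicative update rules for the squared-error loss. It is not the NMF package's implementation; the function name `nmf_sketch`, the toy matrix, and the iteration count are assumptions chosen for illustration.

```r
# Minimal NMF sketch: V (features x samples) is approximated by W %*% H
nmf_sketch <- function(V, rank, n_iter = 200, seed = 1, eps = 1e-9) {
  stopifnot(is.matrix(V), all(V >= 0))
  set.seed(seed)
  n <- nrow(V)
  m <- ncol(V)
  W <- matrix(runif(n * rank), n, rank)  # basis matrix (features x rank)
  H <- matrix(runif(rank * m), rank, m)  # coefficient matrix (rank x samples)
  for (i in seq_len(n_iter)) {
    # Multiplicative updates keep every entry non-negative
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

# Toy non-negative matrix: 6 features x 5 samples, built from 2 hidden factors
set.seed(42)
V <- matrix(runif(6 * 2), 6, 2) %*% matrix(runif(2 * 5), 2, 5)

fit <- nmf_sketch(V, rank = 2)
rel_err <- norm(V - fit$W %*% fit$H, "F") / norm(V, "F")  # relative reconstruction error
rel_err
```

In the NMF package itself, the corresponding entry point is `nmf(V, rank = 2)`, with `basis()` and `coef()` returning the two factor matrices.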

A brief tutorial for UCSCXenaShiny V2

GitHub repository: https://github.com/openbiox/UCSCXenaShiny
Online app: https://shiny.zhoulab.ac.cn/UCSCXenaShiny/ ...

Machine learning with R (0): the basic mlr3 workflow, V2

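https://mlr3book.mlr-org.com/

    library(mlr3verse)
    library(tidyverse)

    tsks()  # built-in tasks
    lrns()  # learners (machine learning algorithms)
    msrs()  # performance measures

    as.data.table(tsks())  # view any of these dictionaries as a table

1. Task

https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html

    # Built-in tasks
    tsk()
    as.data.table(tsk())
    tsk("mtcars")

    # Custom task
    tsk_mtcars = as_task_regr(mtcars, target = "mpg", id = "cars")
    # `target` names the label column; `id` (optional) sets the task name
    # as_task_classif() is the counterpart for classification

    # Task objects support viewing and modifying the data; see the link above for details.
    # Two points deserve emphasis:
    tsk_mtcars$row_ids
    # These are not ordinary row positions. Once a task is defined its row_ids are fixed,
    # much like row names, which makes later data splitting easier.

    tsk_mtcars_another = tsk_mtcars$clone()
    # Use clone() when you need an independent copy of a task

Classification tasks work in much the same way. Note that for binary classification the positive label must also be specified ...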
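For the binary case, a minimal sketch of declaring the positive class might look like the following. The data frame `df`, the class labels, and the task id are made up from `mtcars` for illustration and are not taken from the original post.

```r
library(mlr3)

# Toy binary data: turn the 0/1 `am` column of mtcars into a two-level factor
df <- mtcars[, c("mpg", "hp", "wt", "am")]
df$am <- factor(df$am, levels = c(0, 1), labels = c("automatic", "manual"))

# `positive` declares which class level counts as the positive label
task_am <- as_task_classif(df, target = "am", positive = "manual", id = "cars_am")

task_am$positive     # the positive class
task_am$negative     # the remaining class
task_am$class_names  # both class levels
```

Which class is positive matters for measures such as sensitivity and specificity, so it is worth setting explicitly rather than relying on the factor level order.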

Machine learning with R (0): the basic mlr3 workflow

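    library(mlr3verse)
    library(tidyverse)

1. Task: training data and objective

    # `data` and "col_target" are placeholders for your own data frame and target column

    ## Classification task
    task_classif = as_task_classif(data, target = "col_target")
    # Depending on the outcome, either twoclass (binary) or multiclass

    ## Regression task
    task_regr = as_task_regr(data, target = "col_target")

    # The fields below are available on any task object
    task = task_classif
    task$ncol
    task$nrow
    task$feature_names
    task$feature_types
    task$target_names
    task$task_type
    task$data()
    task$col_roles

2. Learner: machine learning algorithms

The mlr3learners package provides the basic machine learning algorithms (see the figure below): https://github.com/mlr-org/mlr3learners ...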