Li's Bioinfo-Blog

📖 Bioinformatics Data Analysis: Pipelines, Toolkits, etc.

Databases: Drugs and Drug Targets (TTD)

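1. Introduction to the TTD database. First, the biological definition of a target: a biological target is a structure within an organism that can be recognized or bound by other substances (ligands, drugs, etc.). Common drug targets include proteins, nucleic acids, and ion channels. (Wikipedia) ...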

Create: 2022-05-03 | Update: 2022-05-03 | Words: 1517 | 4 min | Lishensuo

CMap Database: Organization and Usage

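CMap: the LINCS program uses the L1000 technology for large-scale expression profiling of perturbed cell lines, from which differentially expressed genes are derived. The work is divided into two phases, Phase-1 and Phase-2. The data have been organized and uploaded to Aliyun Drive (阿里云盘). This note covers how to manipulate and use the data. ...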

Create: 2022-04-21 | Update: 2022-04-21 | Words: 2399 | 5 min | Lishensuo

MSigDB Gene Set Database

Official introduction: https://www.gsea-msigdb.org/gsea/msigdb/
Download page: http://www.gsea-msigdb.org/gsea/downloads.jsp ...

Create: 2022-04-21 | Update: 2022-05-06 | Words: 2279 | 5 min | Lishensuo
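MSigDB gene sets are distributed as GMT files, where each tab-separated line holds a set name, a description (often a URL), and then the member genes. The sketch below is a minimal plain-Python reader plus an overlap helper; it is not an MSigDB tool, and the set names `SET_A`/`SET_B` in the usage are hypothetical.

```python
def read_gmt(lines):
    """Parse GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"malformed GMT line: {line!r}")
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


def overlap(gene_sets, query):
    """For each gene set, return the (sorted) query genes it contains."""
    query = set(query)
    return {name: sorted(query.intersection(genes)) for name, genes in gene_sets.items()}


# Hypothetical toy input; for a real download use: with open(path) as fh: read_gmt(fh)
example = read_gmt([
    "SET_A\tna\tEZH2\tRBBP7\tTNNC1\n",
    "SET_B\thttp://example.org\tUPK3B\tPTS\t\n",
])
```

Because `read_gmt` accepts any iterable of lines, an open file handle can be passed directly without loading the whole file first.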

Network Structure Analysis and Visualization with the igraph Package

1. Creating and inspecting igraph objects

1.1 Example data

The igraph package provides many functions and approaches for creating igraph objects. Here I use the common data.frame-based approach. The example data are PPI (protein-protein interaction) data from STRINGdb, together with the up/down-regulation status of the corresponding genes.

```r
library(STRINGdb)
library(tidyverse)
string_db <- STRINGdb$new(version="11", species=9606,
                          score_threshold=200, input_directory="")
data(diff_exp_example1)
genes = rbind(head(diff_exp_example1,30), tail(diff_exp_example1,30))
head(genes)
genes_mapped <- string_db$map(genes, "gene")
head(genes_mapped)

ppi = string_db$get_interactions(genes_mapped$STRING_id) %>% distinct()
# Translate STRING IDs back to gene symbols (select columns by name, not position)
edges = ppi %>%
  dplyr::left_join(genes_mapped[, c("gene", "STRING_id")], by=c('from'='STRING_id')) %>%
  dplyr::rename(Gene1=gene) %>%
  dplyr::left_join(genes_mapped[, c("gene", "STRING_id")], by=c('to'='STRING_id')) %>%
  dplyr::rename(Gene2=gene) %>%
  dplyr::select(Gene1, Gene2, combined_score)

# Keep only genes that appear in an edge; label regulation direction
nodes = genes_mapped %>%
  dplyr::filter(gene %in% c(edges$Gene1, edges$Gene2)) %>%
  dplyr::mutate(log10P = -log10(pvalue),
                direction = ifelse(logFC>0,"Up","Down")) %>%
  dplyr::select(gene, log10P, logFC, direction)

### Edge information
head(edges)
#   Gene1   Gene2 combined_score
# 1 UPK3B     PTS            244
# 2 GSTM5  ACOT12            204
# 3 GRHL3  IGDCC4            238
# 4 TNNC1 ATP13A1            222
# 5  NNAT  VSTM2L            281
# 6  EZH2   RBBP7            996

### Node information
head(nodes)
#      gene   log10P    logFC direction
# 1  VSTM2L 3.992252 3.333461        Up
# 2   TNNC1 3.534468 2.932060        Up
# 3    MGAM 3.515558 2.369738        Up
# 4  IGDCC4 3.290137 2.409806        Up
# 5   UPK3B 3.248490 2.073072        Up
# 6 SLC52A1 3.227019 3.214998        Up
```

1.2 Creating the object

Use the graph_from_data_frame() function to create it ...

Create: 2022-04-16 | Update: 2022-05-17 | Words: 2209 | 5 min | Lishensuo
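The edge/node tables above can be mirrored in plain Python (this is not igraph) to see what the R pipeline produces: keep edges above a score cutoff, keep only genes that sit on a kept edge, and label direction as `ifelse(logFC > 0, "Up", "Down")`. The data rows are copied from the printed `head()` output; the `min_score` parameter is an illustrative addition (STRING scores run 0 to 1000, and 400 is commonly used as "medium confidence").

```python
from collections import defaultdict

# (Gene1, Gene2, combined_score) rows copied from head(edges) above
EDGES = [
    ("UPK3B", "PTS", 244),
    ("GSTM5", "ACOT12", 204),
    ("GRHL3", "IGDCC4", 238),
    ("TNNC1", "ATP13A1", 222),
    ("NNAT", "VSTM2L", 281),
    ("EZH2", "RBBP7", 996),
]
# logFC values copied from head(nodes) above (only a subset of genes)
LOGFC = {"VSTM2L": 3.333461, "TNNC1": 2.932060, "UPK3B": 2.073072}


def build_network(edges, logfc, min_score=0):
    """Return (adjacency, node attributes) for an undirected network."""
    adjacency = defaultdict(set)
    for gene1, gene2, score in edges:
        if score >= min_score:
            adjacency[gene1].add(gene2)
            adjacency[gene2].add(gene1)
    nodes = {}
    for gene in sorted(adjacency):  # only genes on a kept edge, like the R filter()
        fc = logfc.get(gene)  # None when logFC is unknown in this subset
        direction = None if fc is None else ("Up" if fc > 0 else "Down")
        nodes[gene] = {"degree": len(adjacency[gene]), "logFC": fc, "direction": direction}
    return dict(adjacency), nodes


adj, nodes = build_network(EDGES, LOGFC)
strong_adj, strong_nodes = build_network(EDGES, LOGFC, min_score=400)
```

With `min_score=400` only the EZH2-RBBP7 edge (996) survives, which shows how strongly the score cutoff shapes the network before any igraph analysis.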

Random Walk with Restart and the RandomWalkRestartMH Package

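1. About RWR

1.1 Algorithm overview

Random Walk with Restart (RWR): given a network of nodes and edges (a PPI protein-protein interaction network is used as the example throughout), select one gene or a set of genes. We want to know which of the remaining genes are most related to the selected gene(s). RWR can be used for this; the basic principle is as follows: ...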

Create: 2022-04-19 | Update: 2022-04-19 | Words: 1887 | 4 min | Lishensuo
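A minimal plain-Python sketch of RWR on an undirected, unweighted network, assuming the standard iteration p_{t+1} = (1 - r) W p_t + r p_0, where W is the column-normalized adjacency matrix and p_0 spreads probability evenly over the seed genes. This is not the RandomWalkRestartMH API, and the restart probability 0.7 is chosen here only for illustration.

```python
def random_walk_with_restart(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Rank nodes by their steady-state RWR score relative to the seeds."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seeds = set(seeds)
    if not seeds:
        raise ValueError("at least one seed is required")
    missing = seeds - set(nodes)
    if missing:
        raise ValueError(f"seeds not in network: {sorted(missing)}")

    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}  # restart term: r * p_0
        for n in nodes:
            # walk term: each node splits (1 - r) of its score among its neighbours
            share = (1.0 - restart) * p[n] / len(neighbours[n])
            for m in neighbours[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    # Highest score = most related to the seed gene(s)
    return sorted(p.items(), key=lambda kv: kv[1], reverse=True)


ranking = random_walk_with_restart([("A", "B"), ("B", "C"), ("C", "D")], ["A"])
```

On the chain A-B-C-D seeded at A, scores fall off with distance from the seed, and the total probability mass stays 1 at every step because each node passes on exactly the share it keeps from restarting.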

Gene2vec: Computing Gene Representations from Gene Pairs

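While studying GeneCompass, I learned that gene regulatory network information (gene pairs) can be used to compute gene embeddings: https://github.com/jingcheng-du/Gene2vec

The key is to install gensim=3.4.0 in a python=3.7 environment. Gensim is a popular tool in the NLP field: https://github.com/jingcheng-du/Gene2vec ...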

Create: 2025-01-23 | Update: 2025-01-23 | Words: 959 | 2 min | Lishensuo
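As I understand the idea, each gene pair is treated as a two-word "sentence" and a word2vec-style skip-gram model learns one vector per gene. The toy sketch below implements skip-gram with negative sampling in plain Python to show the mechanics; it is not the Gene2vec or gensim code. Negatives are drawn uniformly here (gensim uses a smoothed unigram distribution), and the input reuses the STRING PPI pairs printed in the igraph note purely as illustration, not co-expression data.

```python
import math
import random


def train_pair_embeddings(pairs, dim=8, epochs=200, lr=0.05, negatives=2, seed=0):
    """Toy skip-gram with negative sampling: each gene predicts its partner."""
    rng = random.Random(seed)
    vocab = sorted({g for pair in pairs for g in pair})
    index = {g: i for i, g in enumerate(vocab)}
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for a, b in pairs:
            for center, context in ((a, b), (b, a)):  # both directions
                c = index[center]
                targets = [(index[context], 1.0)]  # positive example
                targets += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                grad_in = [0.0] * dim
                for t, label in targets:
                    score = sum(x * y for x, y in zip(w_in[c], w_out[t]))
                    g = lr * (label - sigmoid(score))
                    for k in range(dim):
                        grad_in[k] += g * w_out[t][k]  # uses w_out before its update
                        w_out[t][k] += g * w_in[c][k]
                for k in range(dim):
                    w_in[c][k] += grad_in[k]
    return {g: w_in[index[g]] for g in vocab}


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


# Illustrative input: the STRING edge pairs from the igraph note (PPI, not co-expression)
PAIRS = [("UPK3B", "PTS"), ("GSTM5", "ACOT12"), ("GRHL3", "IGDCC4"),
         ("TNNC1", "ATP13A1"), ("NNAT", "VSTM2L"), ("EZH2", "RBBP7")]
vectors = train_pair_embeddings(PAIRS)
```

With gensim 3.4 itself, a roughly comparable call would be `Word2Vec([list(p) for p in PAIRS], size=8, window=1, min_count=1, sg=1)` (in gensim 4.x `size` was renamed `vector_size`); the exact settings Gene2vec uses should be taken from its repository.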

Converting Small-Molecule Chemical Formats with obabel

Install with conda:

```shell
conda install -c conda-forge openbabel
obabel
# Open Babel 3.1.0 -- Nov 2 2021 -- 08:43:45
```

List the supported formats:

```shell
obabel -L
# charges
# descriptors
# fingerprints
# forcefields
# formats
# loaders
# ops

obabel -L formats | head
# acesin -- ACES input format [Write-only]
# acesout -- ACES output format [Read-only]
# acr -- ACR format [Read-only]
# adf -- ADF cartesian input format [Write-only]
# adfband -- ADF Band output format [Read-only]
# adfdftb -- ADF DFTB output format [Read-only]
# adfout -- ADF output format [Read-only]
# alc -- Alchemy format
# aoforce -- Turbomole AOFORCE output format [Read-only]
```

Format conversion ...

Create: 2022-04-16 | Update: 2022-04-16 | Words: 1296 | 3 min | Lishensuo
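The `obabel -L formats` listing marks some formats `[Read-only]` or `[Write-only]` (no tag means both directions), so it can be parsed to check a conversion before running it. A minimal sketch; the helper names are hypothetical, and the command it builds follows the `obabel -i<fmt> infile -o<fmt> -O outfile` form.

```python
import re

# One listing line, e.g. "acesin -- ACES input format [Write-only]"
FORMAT_LINE = re.compile(r"^\s*(\S+)\s+--\s+(.*?)\s*(?:\[(Read-only|Write-only)\])?\s*$")


def parse_obabel_formats(text):
    """Map format code -> description and read/write capability."""
    formats = {}
    for line in text.splitlines():
        match = FORMAT_LINE.match(line)
        if not match:
            continue  # skip lines that are not "code -- description"
        code, description, mode = match.groups()
        formats[code] = {
            "description": description,
            "read": mode != "Write-only",  # untagged formats are read/write
            "write": mode != "Read-only",
        }
    return formats


def conversion_command(infile, in_fmt, outfile, out_fmt, formats):
    """Build an obabel argument list, refusing unsupported directions."""
    if not formats.get(in_fmt, {}).get("read"):
        raise ValueError(f"{in_fmt!r} cannot be read")
    if not formats.get(out_fmt, {}).get("write"):
        raise ValueError(f"{out_fmt!r} cannot be written")
    return ["obabel", f"-i{in_fmt}", infile, f"-o{out_fmt}", "-O", outfile]


SAMPLE = """acesin -- ACES input format [Write-only]
acesout -- ACES output format [Read-only]
alc -- Alchemy format
"""
formats = parse_obabel_formats(SAMPLE)
```

In practice the listing would come from `subprocess.run(["obabel", "-L", "formats"], capture_output=True, text=True).stdout`, and the returned argument list can be passed to `subprocess.run(cmd, check=True)`.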

Tools for Generating Compound Fingerprints and Descriptors

1. rdkit

```python
# conda install -c conda-forge rdkit
from rdkit import Chem
from rdkit.Chem import MACCSkeys
from rdkit import DataStructs
from rdkit.Chem import Draw
```

1.1 Fingerprint encodings

(1) Topological fingerprints

```python
m = Chem.MolFromSmiles('CCOC')
# Reverse conversion: Chem.MolToSmiles(m)
fp = Chem.RDKFingerprint(m, fpSize=1024)  # fpSize is customizable; the default is 2048
fp.GetNumBits()
# 1024
fp.ToBitString()
```
...

Create: 2023-03-20 | Update: 2023-03-20 | Words: 1070 | 3 min | Lishensuo
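Bit-vector fingerprints like the one above are usually compared with the Tanimoto coefficient: the number of bits set in both, divided by the number set in either. A plain-Python sketch working on `fp.ToBitString()` output; returning 0.0 for two all-zero fingerprints is a convention chosen here.

```python
def tanimoto_from_bitstrings(a, b):
    """Tanimoto similarity of two equal-length '0'/'1' strings."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length (fpSize)")
    on_a = {i for i, bit in enumerate(a) if bit == "1"}
    on_b = {i for i, bit in enumerate(b) if bit == "1"}
    union = on_a | on_b
    if not union:
        return 0.0  # convention for two empty fingerprints (assumption)
    return len(on_a & on_b) / len(union)
```

For example, with `fp1 = Chem.RDKFingerprint(Chem.MolFromSmiles('CCOC'), fpSize=1024)` and a second fingerprint `fp2` of the same size, `tanimoto_from_bitstrings(fp1.ToBitString(), fp2.ToBitString())` should agree with `DataStructs.TanimotoSimilarity(fp1, fp2)` for non-empty fingerprints. Both must share the same `fpSize`.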

Compound Sensitivity Databases: GDSC_CTRL

I. GDSC

GDSC: https://www.cancerrxgene.org/ (the data have been uploaded to Aliyun Drive)

1. Raw data preprocessing

```r
## Preprocessing
# library(tidyverse)
# # RAW folder
# gdsc_drug = read.csv("GDSC_drug.csv")
# colnames(gdsc_drug) = gsub("[.]", "_", colnames(gdsc_drug))
#
# gdsc_cl = read.csv("GDSC_cellline.csv")
# colnames(gdsc_cl) = gsub("[.]", "_", colnames(gdsc_cl))
# gdsc_cl = gdsc_cl %>%
#   reshape2::dcast(Cell_line_Name+Model_ID+COSMIC_ID+TCGA_Classfication+Tissue+Tissue_sub_type~Datasets,
#                   value.var = "number_of_drugs")
#
# GDSC1 = readxl::read_excel("GDSC1_fitted_dose_response_25Feb20.xlsx")
# GDSC1 = GDSC1[,c(-4, -6)]
# GDSC1 = GDSC1[,c(-6, -8, -9)]
# GDSC1 = GDSC1 %>%
#   dplyr::select(DATASET, DRUG_NAME, CELL_LINE_NAME, TCGA_DESC, LN_IC50, AUC, RMSE, Z_SCORE, everything())
# GDSC1 = GDSC1 %>% as.data.frame()
# head(GDSC1)
#
# GDSC2 = readxl::read_excel("GDSC2_fitted_dose_response_25Feb20.xlsx")
# GDSC2 = GDSC2[,c(-4, -6)]
# GDSC2 = GDSC2[,c(-6, -8, -9)]
# GDSC2 = GDSC2 %>%
#   dplyr::select(DATASET, DRUG_NAME, CELL_LINE_NAME, TCGA_DESC, LN_IC50, AUC, RMSE, Z_SCORE, everything())
# GDSC2 = GDSC2 %>% as.data.frame()
# head(GDSC2)
#
# GDSC_merge = rbind(GDSC1, GDSC2)
# head(GDSC_merge)
#
# head(gdsc_cl)
```

2. Sensitivity results

```r
library(tidyverse)  # needed for %>% and dplyr below
GDSC_res = read.csv("GDSC/GDSC_result.csv")
#   DATASET DRUG_NAME CELL_LINE_NAME    TCGA_DESC  LN_IC50      AUC     RMSE   Z_SCORE
# 1   GDSC1 Erlotinib         MC-CAR           MM 2.395685 0.982114 0.022521 -0.189576
# 2   GDSC1 Erlotinib            ES3 UNCLASSIFIED 3.140923 0.984816 0.031840  0.508635
# 3   GDSC1 Erlotinib            ES5 UNCLASSIFIED 3.968757 0.985693 0.026052  1.284229
# 4   GDSC1 Erlotinib            ES7 UNCLASSIFIED 2.692768 0.972699 0.110056  0.088760
# 5   GDSC1 Erlotinib          EW-11 UNCLASSIFIED 2.478678 0.944462 0.087011 -0.111820
# 6   GDSC1 Erlotinib        SK-ES-1 UNCLASSIFIED 2.034050 0.950763 0.016288 -0.528390

## Total number of drugs
GDSC_res %>% dplyr::distinct(DRUG_NAME) %>% dim()
# [1] 449   1

## Number of drugs per dataset
GDSC_res %>% dplyr::distinct(DATASET, DRUG_NAME) %>%
  dplyr::count(DATASET, name = "Drugs")
#   DATASET Drugs
# 1   GDSC1   345
# 2   GDSC2   192

## Number of assays per cell line
GDSC_res %>% dplyr::count(DATASET, CELL_LINE_NAME, name = "assays") %>%
  reshape2::dcast(CELL_LINE_NAME ~ DATASET, value.var = "assays") %>%
  dplyr::arrange(desc(GDSC1)) %>% head()
#   CELL_LINE_NAME GDSC1 GDSC2
# 1           A253   367   179
# 2          AMO-1   367   178
# 3         KCL-22   367   178
# 4         KNS-42   367    NA

summary(GDSC_res$LN_IC50)
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# -10.5793   0.8435   2.6228   2.2052   4.1216  12.3591

summary(GDSC_res$AUC)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.00479 0.78839 0.92309 0.84467 0.97306 0.99984

cor(GDSC_res$LN_IC50, GDSC_res$AUC)
# [1] 0.7534196
```

On IC50 vs. AUC: https://blog.csdn.net/linkequa/article/details/88221975 ...

Create: 2022-10-09 | Update: 2022-10-09 | Words: 1391 | 3 min | Lishensuo
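The dplyr summaries above (distinct drugs per dataset, assays per cell line, and `cor()`, whose default method is Pearson) can be mirrored in plain Python to make the grouping logic explicit. The rows are the six printed in the R output; on so few rows the correlation says nothing about the full-table value of 0.753.

```python
import math
from collections import Counter, defaultdict

# Rows printed above: (DATASET, DRUG_NAME, CELL_LINE_NAME, TCGA_DESC, LN_IC50, AUC)
ROWS = [
    ("GDSC1", "Erlotinib", "MC-CAR", "MM", 2.395685, 0.982114),
    ("GDSC1", "Erlotinib", "ES3", "UNCLASSIFIED", 3.140923, 0.984816),
    ("GDSC1", "Erlotinib", "ES5", "UNCLASSIFIED", 3.968757, 0.985693),
    ("GDSC1", "Erlotinib", "ES7", "UNCLASSIFIED", 2.692768, 0.972699),
    ("GDSC1", "Erlotinib", "EW-11", "UNCLASSIFIED", 2.478678, 0.944462),
    ("GDSC1", "Erlotinib", "SK-ES-1", "UNCLASSIFIED", 2.034050, 0.950763),
]


def drugs_per_dataset(rows):
    """Like distinct(DATASET, DRUG_NAME) %>% count(DATASET)."""
    drugs = defaultdict(set)
    for dataset, drug, *_ in rows:
        drugs[dataset].add(drug)
    return {dataset: len(names) for dataset, names in sorted(drugs.items())}


def assays_per_cell_line(rows):
    """Like count(DATASET, CELL_LINE_NAME)."""
    return Counter((dataset, cell_line) for dataset, _drug, cell_line, *_ in rows)


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson([row[4] for row in ROWS], [row[5] for row in ROWS])
```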

ChemmineR: A Basic R Package for Handling Compound Information

ChemmineR is an R package for basic compound operations. Following its official documentation, its main usage is summarized below: https://www.bioconductor.org/packages/release/bioc/vignettes/ChemmineR/inst/doc/ChemmineR.html

```r
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("ChemmineR")

library("ChemmineR")
# library("ChemmineOB")
```

1. The SDFset format

Basic operations in ChemmineR revolve around the SDFset object, which represents a collection of compounds in SDF format.

```r
data(sdfsample)
sdfset = sdfsample
# valid <- validSDF(sdfset)
# sdfset <- sdfset[valid]
class(sdfset)   # SDFset
length(sdfset)  # 100
c(sdfset[1:4], sdfset[5:8])  # combine
sdfset[1:4]                  # subset
```

Each SDFset is made up of individual SDF objects, each consisting of four main parts:
- header: basic information such as the compound ID
- atomblock: atom information
- bondblock: bond information
- datablock: compound properties / other annotations

```r
sdfset[[1]]
as(sdfset[[1]], "list")

## IDs
cid(sdfset[1:2])    # slot ID
sdfid(sdfset[1:2])  # header ID
cid(sdfset) = sdfid(sdfset)

## Components
header(sdfset[[1]])     # character
atomblock(sdfset[[1]])  # matrix
bondblock(sdfset[[1]])  # matrix
datablock(sdfset[[1]])  # character
blockmatrix = datablock2ma(datablock(sdfset[1:2]))
```

Note: ChemmineR provides functions for computing basic compound properties such as molecular weight; ChemmineOB offers similar functionality. ...

Create: 2023-07-20 | Update: 2022-07-20 | Words: 1052 | 3 min | Lishensuo
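To see where ChemmineR's four components come from in the raw file, here is a plain-Python sketch that splits one V2000 SDF record into a header (name, program line, comment, counts line), atom block, bond block, and data block. It is not ChemmineR, it assumes whitespace-separated atom/bond fields (true for small molecules, not guaranteed for very large ones), and the ethanol record is a hypothetical toy example.

```python
def parse_sdf_record(text):
    """Split one V2000 SDF record into header/atomblock/bondblock/datablock."""
    lines = text.splitlines()
    header = lines[:4]  # name, program line, comment, counts line
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])  # fixed-width fields
    atomblock = []
    for line in lines[4:4 + n_atoms]:
        f = line.split()
        atomblock.append((f[3], float(f[0]), float(f[1]), float(f[2])))  # symbol, x, y, z
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    bondblock = [tuple(int(v) for v in line.split()[:3]) for line in bond_lines]

    # Skip property lines up to "M  END", then read "> <FIELD>" items until "$$$$"
    i = 4 + n_atoms + n_bonds
    while i < len(lines) and lines[i].strip() != "M  END":
        i += 1
    datablock, key = {}, None
    for line in lines[i + 1:]:
        if line.strip() == "$$$$":
            break
        if line.startswith(">"):
            start, end = line.find("<"), line.rfind(">")
            key = line[start + 1:end] if 0 <= start < end else line[1:].strip()
            datablock[key] = []
        elif line.strip() and key is not None:
            datablock[key].append(line)
        else:
            key = None  # a blank line ends the current data item
    return {
        "header": header,
        "atomblock": atomblock,
        "bondblock": bondblock,
        "datablock": {k: "\n".join(v) for k, v in datablock.items()},
    }


# Hypothetical toy record (ethanol heavy atoms only)
RECORD = "\n".join([
    "ethanol",
    "  hypothetical",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.0000    1.2000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  2  3  1  0  0  0  0",
    "M  END",
    "> <PUBCHEM_MOLECULAR_FORMULA>",
    "C2H6O",
    "",
    "$$$$",
])
parsed = parse_sdf_record(RECORD)
```

The counts line is read by fixed column positions because the V2000 format defines those fields by width, which is why the header keeps it as its fourth entry.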
© 2025 Li's Bioinfo-Blog Powered by Hugo & PaperMod