Li's Bioinfo-Blog

Compounds

Gene, Protein, and Compound ID Conversion

1. Converting between gene ID types

1.1 The org.Hs.eg.db package

    library(dplyr)
    library(org.Hs.eg.db)

    # List the ID types that org.Hs.eg.db can map between
    keytypes(org.Hs.eg.db)
    #  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "ENTREZID"
    #  [7] "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"     "GENETYPE"     "GO"
    # [13] "GOALL"        "IPI"          "MAP"          "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"
    # [19] "PATH"         "PFAM"         "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"
    # [25] "UCSCKG"       "UNIPROT"

    gene_symbol <- c("RHO", "CALM1", "MEG3", "GNGT1", "SAG",
                     "RPGRIP1", "TRPM1", "PCP2", "PCP4", "AP1B1")
    gene_ids <- AnnotationDbi::select(org.Hs.eg.db,
                                      keys = as.character(gene_symbol),
                                      columns = c("ENSEMBL", "ENTREZID"),  # target ID types
                                      keytype = "SYMBOL")                  # current ID type
    gene_ids

    ## Remove duplicates: keep the first row for each ENTREZID
    gene_ids %>% dplyr::distinct(ENTREZID, .keep_all = TRUE)
    #     SYMBOL         ENSEMBL ENTREZID
    # 1      RHO ENSG00000163914     6010
    # 2    CALM1 ENSG00000198668      801
    # 3     MEG3 ENSG00000214548    55384
    # 4    GNGT1 ENSG00000127928     2792
    # 5      SAG ENSG00000130561     6295
    # 6  RPGRIP1 ENSG00000092200    57096
    # 7    TRPM1 ENSG00000134160     4308
    # 8     PCP2 ENSG00000174788   126006
    # 9     PCP4 ENSG00000183036     5121
    # 10   AP1B1 ENSG00000100280      162

1.2 The biomaRt package

    library("biomaRt")
    ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    attributes <- listAttributes(ensembl)
    attributes[1:5, ]

    # Optional: disables SSL certificate verification
    # library(httr)
    # httr::set_config(config(ssl_verifypeer = 0L))
    gene_symbol <- c("RHO", "CALM1", "MEG3", "GNGT1", "SAG",
                     "RPGRIP1", "TRPM1", "PCP2", "PCP4", "AP1B1")
    gene_ids2 <- getBM(filters = "hgnc_symbol",
                       attributes = c("hgnc_symbol", "ensembl_gene_id", "entrezgene_id"),
                       values = gene_symbol,
                       mart = ensembl)
    gene_ids2

2. Converting mouse gene symbols to human gene IDs

    musGenes <- c("Hmmr", "Tlx3", "Cpeb4")

    ## Method 1: convert to upper case directly
    ## (only correct when the human ortholog has the same symbol in upper case)
    toupper(musGenes)
    # [1] "HMMR"  "TLX3"  "CPEB4"

    ## Method 2: the biomaRt package (unstable)
    library("biomaRt")
    # library(httr)
    # httr::set_config(config(ssl_verifypeer = 0L))
    human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl",
                     host = "dec2021.archive.ensembl.org")
    mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl",
                     host = "dec2021.archive.ensembl.org")
    genes <- getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol",
                    values = musGenes, mart = mouse,
                    attributesL = c("hgnc_symbol"), martL = human,
                    uniqueRows = TRUE)

    ## Method 3: the MGI database
    # https://support.bioconductor.org/p/129636/
    library(dplyr)
    mouse_human_genes <- read.csv("http://www.informatics.jax.org/downloads/reports/HOM_MouseHumanSequence.rpt",
                                  sep = "\t")

    convert_mouse_to_human <- function(gene_list) {
      output <- c()
      for (gene in gene_list) {
        # Homology class key(s) of the mouse gene
        class_key <- (mouse_human_genes %>%
          filter(Symbol == gene & Common.Organism.Name == "mouse, laboratory"))[["DB.Class.Key"]]
        if (length(class_key) > 0) {
          # Human genes that belong to the same homology class
          human_genes <- (mouse_human_genes %>%
            filter(DB.Class.Key %in% class_key & Common.Organism.Name == "human"))[, "Symbol"]
          output <- c(output, human_genes)
        }
      }
      return(output)
    }
    convert_mouse_to_human(musGenes)
    # [1] "HMMR"  "TLX3"  "CPEB4"

    # Commented-out script: build a genome-wide mouse-to-human symbol table with biomaRt
    # # https://github.com/lishensuo/utils
    # # library("biomaRt")
    # # library(httr)
    # # httr::set_config(config(ssl_verifypeer = 0L))
    # human = useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "dec2021.archive.ensembl.org")
    # mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = "dec2021.archive.ensembl.org")
    #
    # # https://www.gencodegenes.org/mouse/
    # dat = data.table::fread("gencode.vM33.basic.annotation.gtf.gz")
    # dat = subset(dat, V3 == "gene")
    # dat_sub = dat[, "V9"] %>%
    #   tidyr::separate(V9, into = c("gene_id", "gene_type", "gene_name", "mgi_id", "havana_gene"), sep = "; ")
    # dat_sub$gene_name2 = gsub('gene_name "', '', dat_sub$gene_name)
    # dat_sub$gene_name2 = gsub('"', '', dat_sub$gene_name2)
    #
    # genes = getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol",
    #                values = dat_sub$gene_name2,
    #                mart = mouse,
    #                attributesL = c("hgnc_symbol"),
    #                martL = human, uniqueRows = TRUE)
    # write.csv(genes, file = "mgi2hgnc_biomart.csv", row.names = FALSE, quote = FALSE)
    # head(genes)

3. Protein and gene ID conversion

https://www.uniprot.org/uploadlists/ ...
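The MGI homology-table lookup (method 3 above) boils down to grouping rows by their homology class key. The sketch below shows that logic in plain Python on a tiny in-memory table. The rows are a hypothetical excerpt: the class key values are made up, only three columns are kept, and the column names are written with spaces on the assumption that R's read.csv turned them into dots. The function names are illustrative.

```python
import csv
import io

# Hypothetical excerpt in the column layout assumed for
# HOM_MouseHumanSequence.rpt; the DB Class Key values are invented.
TOY_REPORT = (
    "DB Class Key\tCommon Organism Name\tSymbol\n"
    "1\tmouse, laboratory\tHmmr\n"
    "1\thuman\tHMMR\n"
    "2\tmouse, laboratory\tTlx3\n"
    "2\thuman\tTLX3\n"
    "3\tmouse, laboratory\tCpeb4\n"
    "3\thuman\tCPEB4\n"
)


def build_mouse_to_human(report_text):
    """Map each mouse symbol to the human symbols that share its class key."""
    rows = list(csv.DictReader(io.StringIO(report_text), delimiter="\t"))

    # First pass: collect human symbols per homology class key.
    human_by_key = {}
    for row in rows:
        if row["Common Organism Name"] == "human":
            human_by_key.setdefault(row["DB Class Key"], []).append(row["Symbol"])

    # Second pass: attach those human symbols to each mouse symbol.
    mapping = {}
    for row in rows:
        if row["Common Organism Name"] == "mouse, laboratory":
            humans = human_by_key.get(row["DB Class Key"], [])
            mapping.setdefault(row["Symbol"], []).extend(humans)
    return mapping


def convert_mouse_to_human(genes, mapping):
    """Return human symbols for the given mouse symbols; unmapped genes are skipped."""
    return [human for gene in genes for human in mapping.get(gene, [])]


mapping = build_mouse_to_human(TOY_REPORT)
print(convert_mouse_to_human(["Hmmr", "Tlx3", "Cpeb4", "NotAGene"], mapping))
# ['HMMR', 'TLX3', 'CPEB4']
```

Building the dictionary once avoids filtering the whole table for every gene, which matters when the input list covers the whole genome.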

Create: 2022-05-28 | Update: 2025-06-25 | Words: 2488 | 5 min | Lishensuo

Database: TTD for Drugs and Drug Targets

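1. Introduction to the TTD database

First, the biological definition of a target: a biological target is a structure inside a living organism that can be recognized or bound by another substance (a ligand, a drug, etc.). Common drug targets include proteins, nucleic acids, and ion channels. (Wikipedia) ...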

Create: 2022-05-03 | Update: 2022-05-03 | Words: 1517 | 4 min | Lishensuo

Organizing and Using the CMap Database

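The CMap LINCS program uses the L1000 technology to profile large-scale perturbation experiments on cell lines and derive differentially expressed genes. It is divided into two stages, Phase 1 and Phase 2. The data have been organized and uploaded to Aliyun Drive. This note summarizes how to work with and use the data. ...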

Create: 2022-04-21 | Update: 2022-04-21 | Words: 2399 | 5 min | Lishensuo

Converting Small-Molecule Formats with obabel

Install with conda

    conda install -c conda-forge openbabel

    obabel
    # Open Babel 3.1.0 -- Nov 2 2021 -- 08:43:45

List the supported formats

    obabel -L
    # charges
    # descriptors
    # fingerprints
    # forcefields
    # formats
    # loaders
    # ops

    obabel -L formats | head
    # acesin -- ACES input format [Write-only]
    # acesout -- ACES output format [Read-only]
    # acr -- ACR format [Read-only]
    # adf -- ADF cartesian input format [Write-only]
    # adfband -- ADF Band output format [Read-only]
    # adfdftb -- ADF DFTB output format [Read-only]
    # adfout -- ADF output format [Read-only]
    # alc -- Alchemy format
    # aoforce -- Turbomole AOFORCE output format [Read-only]

Format conversion ...

Create: 2022-04-16 | Update: 2022-04-16 | Words: 1296 | 3 min | Lishensuo

Tools for Generating Compound Fingerprints and Descriptors

1. rdkit

    # conda install -c conda-forge rdkit
    from rdkit import Chem
    from rdkit.Chem import MACCSkeys
    from rdkit import DataStructs
    from rdkit.Chem import Draw

1.1 Fingerprint encodings

(1) Topological Fingerprints

    m = Chem.MolFromSmiles('CCOC')
    # Chem.MolToSmiles(m)

    fp = Chem.RDKFingerprint(m, fpSize=1024)
    # fpSize sets the number of bits; the default is 2048
    fp.GetNumBits()
    # 1024
    fp.ToBitString()

...
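A bit string such as the one returned by ToBitString() is typically used to compare compounds. As a minimal sketch of that idea, the function below computes the Tanimoto similarity of two bit strings in plain Python. The two 8-bit strings are made-up toy inputs, not real fingerprints, and the function name is illustrative. RDKit provides its own similarity functions in DataStructs, which would normally be used instead.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two equal-length '0'/'1' bit strings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    # Positions of the bits that are switched on in each fingerprint
    on_a = {i for i, bit in enumerate(bits_a) if bit == "1"}
    on_b = {i for i, bit in enumerate(bits_b) if bit == "1"}
    union = on_a | on_b
    if not union:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(on_a & on_b) / len(union)


# Toy 8-bit fingerprints (hypothetical, for illustration only)
fp_a = "11010000"
fp_b = "10010010"
print(tanimoto(fp_a, fp_b))
# 0.5
```

Both fingerprints must be generated with the same fpSize, otherwise the bit positions are not comparable.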

Create: 2023-03-20 | Update: 2023-03-20 | Words: 1070 | 3 min | Lishensuo

Compound Sensitivity Databases GDSC_CTRL

I. GDSC

GDSC: https://www.cancerrxgene.org/ (data uploaded to Aliyun Drive) ...

Create: 2022-10-09 | Update: 2022-10-09 | Words: 1391 | 3 min | Lishensuo

ChemmineR: A Basic R Package for Handling Compound Information

ChemmineR is an R package that implements basic operations on chemical compounds. The notes below follow its official documentation and cover its main usage: https://www.bioconductor.org/packages/release/bioc/vignettes/ChemmineR/inst/doc/ChemmineR.html

    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install("ChemmineR")

    library("ChemmineR")
    # library("ChemmineOB")

1. The SDFset format

Basic ChemmineR operations revolve around the SDFset object, which represents a collection of compounds in SDF format.

    data(sdfsample)
    sdfset = sdfsample
    # valid <- validSDF(sdfset)
    # sdfset <- sdfset[valid]
    class(sdfset)  # SDFset
    length(sdfset) # 100

    c(sdfset[1:4], sdfset[5:8]) # combine
    sdfset[1:4]                 # subset

Each SDFset is made up of individual SDF objects, and each SDF object has four parts:
  • <<header>>: basic information such as the compound ID
  • <<atomblock>>: atom information
  • <<bondblock>>: bond information
  • <<datablock>>: compound properties and other annotations

    sdfset[[1]]
    as(sdfset[[1]], "list")

    ## ID
    cid(sdfset[1:2])   # slot ID
    sdfid(sdfset[1:2]) # header ID
    cid(sdfset) = sdfid(sdfset)

    ## Component
    header(sdfset[[1]])    # character
    atomblock(sdfset[[1]]) # matrix
    bondblock(sdfset[[1]]) # matrix
    datablock(sdfset[[1]]) # character

    blockmatrix = datablock2ma(datablock(sdfset[1:2]))

Note: ChemmineR provides functions that compute basic compound properties, such as molecular weight. ChemmineOB offers similar functionality. ...
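To make the four parts of an SDF record concrete, here is a minimal sketch in plain Python that splits one V2000 record into header, atom block, bond block, and data block. It does not reproduce ChemmineR's implementation. The record is a hypothetical two-atom toy entry written for this example, and the function name parse_sdf_record is illustrative. Treating the counts line as the fourth header line is an assumption about how the header is laid out.

```python
# Hypothetical toy record, written by hand for illustration
RECORD = "\n".join([
    "toy_methanol",
    "  hypothetical_source",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0",
    "    1.4000    0.0000    0.0000 O   0  0",
    "  1  2  1  0",
    "M  END",
    "> <NAME>",
    "methanol",
    "",
    "> <SOURCE>",
    "toy example",
    "",
    "$$$$",
])


def parse_sdf_record(text):
    """Split one V2000 SDF record into header, atom, bond and data blocks."""
    lines = text.splitlines()
    header = lines[:4]              # name, source, comment, counts line
    n_atoms = int(header[3][0:3])   # fixed-width columns 1-3 of the counts line
    n_bonds = int(header[3][3:6])   # fixed-width columns 4-6 of the counts line
    atom_end = 4 + n_atoms
    bond_end = atom_end + n_bonds
    atomblock = [line.split() for line in lines[4:atom_end]]
    bondblock = [line.split() for line in lines[atom_end:bond_end]]

    # Data items look like "> <FIELD>" followed by value lines and a blank line
    datablock = {}
    field = None
    for line in lines[bond_end:]:
        if line.startswith("$$$$"):      # end-of-record marker
            break
        if line.startswith(">"):
            field = line[line.index("<") + 1:line.index(">", 1)]
            datablock[field] = []
        elif field is not None and line.strip():
            datablock[field].append(line)
        else:
            field = None                 # blank line or "M  END"
    return {
        "header": header,
        "atomblock": atomblock,
        "bondblock": bondblock,
        "datablock": {key: "\n".join(value) for key, value in datablock.items()},
    }


parsed = parse_sdf_record(RECORD)
print(parsed["header"][0])      # toy_methanol
print(len(parsed["atomblock"])) # 2
print(parsed["datablock"])      # {'NAME': 'methanol', 'SOURCE': 'toy example'}
```

An SDF file with many compounds is a sequence of such records separated by the $$$$ line.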

Create: 2023-07-20 | Update: 2022-07-20 | Words: 1052 | 3 min | Lishensuo
© 2026 Li's Bioinfo-Blog Powered by Hugo & PaperMod