Official introduction: https://www.gsea-msigdb.org/gsea/msigdb/

Download page: http://www.gsea-msigdb.org/gsea/downloads.jsp

Example gene set card: https://www.gsea-msigdb.org/gsea/msigdb/cards/CORONEL_RFX7_DIRECT_TARGETS_UP.html. Each card records the gene set's description and its source.

1. Hallmark gene sets

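  • 50 representative gene sets, each describing a well-defined biological state or process in areas such as development, immunity, and signaling; the full list is in the table at the end of this article.

  • Condensed from more than 4,000 gene sets in collections C1-C6; the integration strategy is described in the original article.

  • A good starting point for exploring which pathways are enriched in your data.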

gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/h.all.v7.5.1.symbols.gmt")
length(unique(gset$term))
# 50
head(gset)
#                               term    gene
# 1 HALLMARK_TNFA_SIGNALING_VIA_NFKB    JUNB
# 2 HALLMARK_TNFA_SIGNALING_VIA_NFKB   CXCL2
# 3 HALLMARK_TNFA_SIGNALING_VIA_NFKB    ATF3
# 4 HALLMARK_TNFA_SIGNALING_VIA_NFKB  NFKBIA
# 5 HALLMARK_TNFA_SIGNALING_VIA_NFKB TNFAIP3
# 6 HALLMARK_TNFA_SIGNALING_VIA_NFKB   PTGS2
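
`read.gmt()` returns the long term/gene table shown above. A `.gmt` file stores one gene set per line, tab-separated: the set name, a description (usually a URL), then the member genes. The sketch below reproduces that reshaping in base R on a made-up two-line GMT string; `parse_gmt_lines()` is a hypothetical helper for illustration, not clusterProfiler's actual implementation.

```r
# Hypothetical helper: convert GMT lines into a long term/gene data frame,
# mimicking the shape of clusterProfiler::read.gmt() output.
parse_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Field 1 is the gene set name
  terms <- vapply(fields, `[`, character(1), 1)
  # Drop field 1 (name) and field 2 (description); the rest are genes
  genes <- lapply(fields, function(f) f[-(1:2)])
  data.frame(
    term = rep(terms, lengths(genes)),
    gene = unlist(genes),
    stringsAsFactors = FALSE
  )
}

# Toy input: two made-up gene sets (not real MSigDB content)
toy_gmt <- c(
  "SET_A\thttp://example.org/a\tJUNB\tATF3\tCXCL2",
  "SET_B\thttp://example.org/b\tTP53\tMDM2"
)
gset_toy <- parse_gmt_lines(toy_gmt)
print(gset_toy)
length(unique(gset_toy$term))  # 2
```

For real files, calling `clusterProfiler::read.gmt()` directly is simpler; the sketch only shows what the resulting table represents.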

2. C1: positional gene sets

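  • One gene set per chromosomal cytogenetic band, listing the genes located in that band;
  • Useful for detecting effects tied to chromosomal deletions or amplifications, epigenetic silencing, and similar region-level events.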
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c1.all.v7.5.1.symbols.gmt")
length(unique(gset$gene))
# 40721
head(gset)
#     term      gene
# 1 chr1p11   RPL22P6
# 2 chr1p11     NBPF8
# 3 chr1p11 LINC02798
# 4 chr1p11      H3P4
# 5 chr1p11   MTIF2P1
# 6 chr1p11   SRGAP2C
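
Positional set names encode the chromosome, arm, and band (for example `chr1p11`). Assuming every name follows the `chr<chromosome><p|q><band>` pattern seen above, a regular expression can split the names apart, e.g. to group bands by chromosome. The band names below are toy values:

```r
# Toy positional gene set names following the chr<chrom><arm><band> pattern
bands <- c("chr1p11", "chr1q21", "chr17p13", "chrXq28")

# Capture the chromosome (digits, X or Y) and the arm (p or q).
# Names that do not match the pattern are returned unchanged by sub().
pattern <- "^chr([0-9]+|X|Y)([pq]).*$"
chrom <- sub(pattern, "\\1", bands)
arm   <- sub(pattern, "\\2", bands)

data.frame(band = bands, chrom = chrom, arm = arm)
# Number of band-level gene sets per chromosome
table(chrom)
```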

3. C2: curated gene sets

3.1 CGP: Chemical and genetic perturbations

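  • Differentially expressed genes after chemical, genetic, or signaling-pathway perturbations, usually split into UP and DN sets. The signatures are collected mainly from published studies.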
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c2.cgp.v7.5.1.symbols.gmt")
head(gset)
head(unique(gset$term))
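
Since most CGP signatures come in `_UP`/`_DN` pairs, separating them by suffix is a common first step. A sketch on made-up term names that follow this convention (check the real names with `unique(gset$term)`):

```r
# Made-up CGP-style term names following the _UP / _DN suffix convention
terms <- c("STUDY1_TREATMENT_UP", "STUDY1_TREATMENT_DN",
           "STUDY2_KNOCKDOWN_UP", "STUDY3_MISC")

# Classify each term by its suffix; anything else is labeled "other"
direction <- ifelse(grepl("_UP$", terms), "UP",
             ifelse(grepl("_DN$", terms), "DN", "other"))
# Strip the suffix to recover the shared signature name
base_name <- sub("_(UP|DN)$", "", terms)

data.frame(term = terms, base = base_name, direction = direction)
```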

3.2 CP: Canonical pathways

  • Collected from pathway databases such as KEGG, Reactome, and WikiPathways.
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c2.cp.kegg.v7.5.1.symbols.gmt")
length(unique(gset$term)) #186

gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c2.cp.reactome.v7.5.1.symbols.gmt")
length(unique(gset$term)) #1615

gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c2.cp.wikipathways.v7.5.1.symbols.gmt")
length(unique(gset$term)) #664
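
In these files the term names start with a source prefix (`KEGG_`, `REACTOME_`, `WP_` in v7.5.1; verify for other releases), so if the three collections are pooled, the source can still be recovered from the name. A sketch on toy names:

```r
# Toy term names; in practice use unique(gset$term) from each file
cp_terms <- c("KEGG_APOPTOSIS", "KEGG_CELL_CYCLE",
              "REACTOME_APOPTOSIS", "WP_APOPTOSIS")

# Everything before the first underscore identifies the source database
source_db <- sub("_.*$", "", cp_terms)
table(source_db)
```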

4. C3: regulatory target gene sets

4.1 MIR: microRNA targets

  • Genes targeted by microRNAs, mainly from miRDB v6.0 (mirdb.org; Chen and Wang, 2020).
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c3.mir.v7.5.1.entrez.gmt")
head(gset)
#        term  gene
# 1 MIR153_5P  4232
# 2 MIR153_5P 11163
# 3 MIR153_5P  3267
# 4 MIR153_5P  3843
# 5 MIR153_5P 25926
# 6 MIR153_5P 63976
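
This example reads the `.entrez.gmt` variant, so the `gene` column holds Entrez Gene IDs rather than symbols, and they must match the ID type of your own gene list (or download the `.symbols.gmt` file instead). One way to switch to symbols is a lookup table; in practice it would be built from an annotation resource such as the org.Hs.eg.db package, whereas this sketch uses a tiny hand-written map and a toy term name:

```r
# Toy miRNA-target table with Entrez IDs as character strings
# NOTE: term name and membership are illustrative, not real miRDB data
gset_mir <- data.frame(term = "MIR_TOY", gene = c("7157", "672", "1956", "999999"))

# Small hand-written Entrez -> symbol map (real projects: use an annotation package)
id2symbol <- c("7157" = "TP53", "672" = "BRCA1", "1956" = "EGFR")

# Index by character to avoid factor-code lookups on older R versions
gset_mir$symbol <- unname(id2symbol[as.character(gset_mir$gene)])
# IDs absent from the map become NA; drop them before analysis
gset_mir <- gset_mir[!is.na(gset_mir$symbol), ]
gset_mir
```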

4.2 TFT: Transcription factor targets

  • Genes targeted by transcription factors, mainly from the Gene Transcription Regulation Database (GTRD, gtrd.biouml.org).
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c3.tft.v7.5.1.symbols.gmt")
head(gset)
#                                                                        term     gene
# 1 METHYLCYTOSINE_DIOXYGENASE_TET_UNIPROT_A0A023HHK9_UNREVIEWED_TARGET_GENES PPP1R12A
# 2 METHYLCYTOSINE_DIOXYGENASE_TET_UNIPROT_A0A023HHK9_UNREVIEWED_TARGET_GENES    SPHK2
# 3 METHYLCYTOSINE_DIOXYGENASE_TET_UNIPROT_A0A023HHK9_UNREVIEWED_TARGET_GENES     HAGH
# 4 METHYLCYTOSINE_DIOXYGENASE_TET_UNIPROT_A0A023HHK9_UNREVIEWED_TARGET_GENES     RBL1
# 5 METHYLCYTOSINE_DIOXYGENASE_TET_UNIPROT_A0A023HHK9_UNREVIEWED_TARGET_GENES   GEMIN5
# 6 METHYLCYTOSINE_DIOXYGENASE_TET_UNIPROT_A0A023HHK9_UNREVIEWED_TARGET_GENES    TMED2
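
Some tools take gene sets as a named list rather than a long table (for example the `pathways` argument of fgsea). Base R's `split()` converts one into the other; a sketch on toy rows with made-up term names:

```r
# Toy long-format table like the read.gmt() output above
gset_tf <- data.frame(
  term = c("TF_A_TARGET_GENES", "TF_A_TARGET_GENES", "TF_B_TARGET_GENES"),
  gene = c("PPP1R12A", "SPHK2", "HAGH")
)

# One list element per term, each holding that term's genes
gene_sets <- split(gset_tf$gene, gset_tf$term)
str(gene_sets)
# Gene set sizes
lengths(gene_sets)
```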

5. C4: computational gene sets

5.1 CGN: Cancer gene neighborhoods

  • Gene sets defined by the expression neighborhoods of cancer-associated genes, mined from four compendia: GNF2 (human tissue compendium, Novartis), CAR (Novartis carcinoma compendium), GCM (Global Cancer Map, Broad Institute), and MORF (an unpublished compendium of gene expression data sets).
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c4.cgn.v7.5.1.symbols.gmt")
head(gset)
#        term   gene
# 1 MORF_ATRX  ADCY3
# 2 MORF_ATRX SEC31A
# 3 MORF_ATRX    BTD
# 4 MORF_ATRX  LTBP4
# 5 MORF_ATRX   UTRN
# 6 MORF_ATRX   FIG4
head(unique(gset$term))
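
Gene set sizes vary widely, and enrichment tools usually ignore very small or very large sets (clusterProfiler's `enricher()` exposes this through `minGSSize` and `maxGSSize`). A sketch of applying a size window yourself; the thresholds and the second term are hypothetical:

```r
# Toy long-format table (TOY_SMALL_SET is a placeholder, not a real CGN set)
gset_cgn <- data.frame(
  term = c("MORF_ATRX", "MORF_ATRX", "MORF_ATRX", "TOY_SMALL_SET"),
  gene = c("ADCY3", "SEC31A", "BTD", "UTRN")
)

# Number of distinct genes per term
set_size <- tapply(gset_cgn$gene, gset_cgn$term, function(g) length(unique(g)))

# Hypothetical size window; choose thresholds suited to your own data
min_size <- 2
max_size <- 500
keep <- names(set_size)[set_size >= min_size & set_size <= max_size]
gset_cgn_filtered <- gset_cgn[gset_cgn$term %in% keep, ]
gset_cgn_filtered
```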

5.2 CM: Cancer modules

  • 456 gene modules found to be significantly changed across a variety of cancer conditions.
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c4.cm.v7.5.1.symbols.gmt")
head(gset)
#       term    gene
# 1 MODULE_9 SLC16A4
# 2 MODULE_9   ACTN1
# 3 MODULE_9    PER1
# 4 MODULE_9     VWF
# 5 MODULE_9   DHRS3
# 6 MODULE_9    ID2B
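
Over-representation analysis, which is what `clusterProfiler::enricher()` runs when given `TERM2GENE = gset`, reduces to a hypergeometric test per gene set. A sketch of that test for a single module, with made-up counts standing in for a real gene list and background:

```r
# Made-up counts for one module (illustrative only)
N <- 20000  # genes in the background universe
K <- 200    # genes in the module
n <- 500    # genes in the user's list
k <- 15     # overlap between the list and the module

# P(X >= k) where X ~ Hypergeometric: draw n genes from N, K of which are in the module
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Expected overlap if the list were unrelated to the module
expected <- n * K / N
c(expected = expected, observed = k, p = p_value)
```

In a full analysis this test is repeated for every gene set and the p-values are adjusted for multiple testing.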

6. C5: ontology gene sets

6.1 GO: Gene Ontology

  • Gene sets for the three GO domains: biological process (BP), cellular component (CC), and molecular function (MF); term names carry the prefixes GOBP_, GOCC_, and GOMF_ respectively.
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c5.go.v7.5.1.symbols.gmt")
head(gset)
#                                    term     gene
# 1 GOBP_MITOCHONDRIAL_GENOME_MAINTENANCE     AKT3
# 2 GOBP_MITOCHONDRIAL_GENOME_MAINTENANCE PPARGC1A
# 3 GOBP_MITOCHONDRIAL_GENOME_MAINTENANCE    POLG2
# 4 GOBP_MITOCHONDRIAL_GENOME_MAINTENANCE    PARP1
# 5 GOBP_MITOCHONDRIAL_GENOME_MAINTENANCE     DNA2
# 6 GOBP_MITOCHONDRIAL_GENOME_MAINTENANCE     TYMP
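
Because the combined `c5.go` file mixes all three domains, it can be split on the name prefix. MSigDB also distributes separate BP, CC, and MF files, which avoids this step. A sketch on toy rows (the GOCC_/GOMF_ names and their genes are made up):

```r
# Toy C5 GO rows; only the GOBP_ term is taken from the output above
gset_go <- data.frame(
  term = c("GOBP_MITOCHONDRIAL_GENOME_MAINTENANCE", "GOBP_MITOCHONDRIAL_GENOME_MAINTENANCE",
           "GOCC_TOY_COMPLEX", "GOMF_TOY_BINDING"),
  gene = c("AKT3", "PPARGC1A", "GENE1", "GENE2")
)

# Extract BP / CC / MF from the GOBP_ / GOCC_ / GOMF_ prefix
gset_go$ontology <- sub("^GO(BP|CC|MF)_.*$", "\\1", gset_go$term)
# One term/gene data frame per ontology
go_by_ontology <- split(gset_go[, c("term", "gene")], gset_go$ontology)
names(go_by_ontology)
go_by_ontology$BP
```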

6.2 Human Phenotype Ontology

  • Gene sets associated with human disease phenotypes (term names are prefixed HP_).
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c5.hpo.v7.5.1.symbols.gmt")
head(gset)
#                              term    gene
# 1 HP_MULTICYSTIC_KIDNEY_DYSPLASIA TMEM107
# 2 HP_MULTICYSTIC_KIDNEY_DYSPLASIA  LZTFL1
# 3 HP_MULTICYSTIC_KIDNEY_DYSPLASIA    PEX6
# 4 HP_MULTICYSTIC_KIDNEY_DYSPLASIA    GPC4
# 5 HP_MULTICYSTIC_KIDNEY_DYSPLASIA  TFAP2A
# 6 HP_MULTICYSTIC_KIDNEY_DYSPLASIA    SIX1
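
The long table also answers the reverse question: which phenotype sets contain a given gene? A sketch on toy rows (`HP_TOY_PHENOTYPE` is made up):

```r
# Toy HPO rows
gset_hpo <- data.frame(
  term = c("HP_MULTICYSTIC_KIDNEY_DYSPLASIA", "HP_MULTICYSTIC_KIDNEY_DYSPLASIA",
           "HP_TOY_PHENOTYPE"),
  gene = c("PEX6", "SIX1", "PEX6")
)

# All sets containing one gene of interest
gene_of_interest <- "PEX6"
sets_with_gene <- unique(gset_hpo$term[gset_hpo$gene == gene_of_interest])
sets_with_gene

# Or build the full gene -> terms index at once
terms_by_gene <- split(gset_hpo$term, gset_hpo$gene)
terms_by_gene$SIX1
```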

7. C6: oncogenic signature gene sets

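  • Up- and down-regulated gene sets from cancer cell lines in which known cancer genes are aberrantly expressed or otherwise perturbed.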
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c6.all.v7.5.1.symbols.gmt")
head(gset)
#            term    gene
# 1 GLI1_UP.V1_DN  CACUL1
# 2 GLI1_UP.V1_DN   CCNL2
# 3 GLI1_UP.V1_DN   YIPF1
# 4 GLI1_UP.V1_DN   GTDC1
# 5 GLI1_UP.V1_DN    OPN3
# 6 GLI1_UP.V1_DN SLC22A1
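
Signatures in C6 (as in C2 CGP) can overlap heavily, so it may help to quantify redundancy between two sets before interpreting both. A minimal sketch using the Jaccard index (shared genes divided by all genes in either set); the gene lists are toy subsets, not the full signatures, and `TOY_SIGNATURE` is made up:

```r
# Jaccard index between two gene vectors
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

gset_c6 <- data.frame(
  term = c(rep("GLI1_UP.V1_DN", 3), rep("TOY_SIGNATURE", 3)),
  gene = c("CACUL1", "CCNL2", "YIPF1", "CCNL2", "YIPF1", "OPN3")
)
sets <- split(gset_c6$gene, gset_c6$term)
jaccard(sets[["GLI1_UP.V1_DN"]], sets[["TOY_SIGNATURE"]])  # 0.5
```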

8. C7: immunologic signature gene sets

8.1 ImmuneSigDB

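  • Up- and down-regulated gene sets from differential expression comparisons between immune system states, including abnormal ones, mainly collected from GEO.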
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c7.immunesigdb.v7.5.1.symbols.gmt")
head(gset)
#                                   term  gene
# 1 KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_UP RFLNB
# 2 KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_UP AMPD3
# 3 KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_UP  NSG2
# 4 KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_UP DUSP6
# 5 KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_UP ADCY6
# 6 KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_UP   TEC
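
ImmuneSigDB term names describe the comparison (cell types, time points, direction), so a keyword search on the names is a quick way to find relevant sets. A sketch on toy names; only the first comes from the output above, the other two are made up:

```r
terms_c7 <- c("KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_UP",
              "TOY_CD8_TCELL_VS_CD4_TCELL_DN",
              "TOY_BCELL_VS_MONOCYTE_UP")

# Case-insensitive keyword match on the term name
keyword <- "cd8_tcell"
hits <- terms_c7[grepl(keyword, terms_c7, ignore.case = TRUE)]
hits
```

With real data, `unique(gset$term)` would replace the toy vector, and `gset[gset$term %in% hits, ]` keeps only the matching sets.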

8.2 VAX: vaccine response gene sets

  • Gene sets describing transcriptional responses to vaccination.

9. C8: cell type signature gene sets

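  • Marker genes for different cell types, mainly curated from the literature (largely single-cell sequencing studies of human tissues).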
gset = clusterProfiler::read.gmt("msigdb_v7.5.1_GMTs/c8.all.v7.5.1.symbols.gmt")
length(unique(gset$term)) #700
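
When several collections are analyzed together (for example hallmark plus C8), a column recording each row's collection makes later counting and filtering easy. A sketch with toy data frames standing in for `read.gmt()` output; the object and term names are hypothetical:

```r
# Toy stand-ins for read.gmt() results from two collections
hallmark_toy <- data.frame(term = "HALLMARK_TOY_SET", gene = c("JUNB", "ATF3"))
c8_toy       <- data.frame(term = "TOY_CELL_TYPE_MARKERS", gene = c("CD3E", "CD8A"))

collections <- list(H = hallmark_toy, C8 = c8_toy)
# Tag each table with its collection name, then stack them
combined <- do.call(rbind, Map(function(df, name) {
  df$collection <- name
  df
}, collections, names(collections)))
rownames(combined) <- NULL

# Number of distinct gene sets per collection
n_terms <- tapply(combined$term, combined$collection, function(x) length(unique(x)))
n_terms
```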

The 50 hallmark gene sets (in the GMT file each name carries the HALLMARK_ prefix):

| # | Hallmark name | Process category | Description | Notes | Founder sets | Genes |
|---|---|---|---|---|---|---|
| 1 | APICAL_JUNCTION | cellular component | Apical junction complex consisting of adherens and tight junctions | Affects cell-cell adhesion and intercellular communication | 37 | 200 |
| 2 | APICAL_SURFACE | cellular component | Membrane proteins in the apical domain | | 12 | 44 |
| 3 | PEROXISOME | cellular component | Peroxisomes | A type of organelle | 28 | 107 |
| 4 | ADIPOGENESIS | development | Adipocyte development | | 36 | 200 |
| 5 | ANGIOGENESIS | development | Blood vessel formation | | 14 | 36 |
| 6 | EPITHELIAL_MESENCHYMAL_TRANSITION | development | Epithelial mesenchymal transition | Important in embryonic development, chronic inflammation, tissue remodeling, cancer metastasis, and many fibrotic diseases | 107 | 200 |
| 7 | MYOGENESIS | development | Muscle differentiation | | 64 | 200 |
| 8 | SPERMATOGENESIS | development | Sperm development and male fertility | | 24 | 135 |
| 9 | PANCREAS_BETA_CELLS | development | Genes specific to pancreatic beta cells | | 24 | 40 |
| 10 | DNA_REPAIR | DNA damage | DNA repair | | 44 | 150 |
| 11 | UV_RESPONSE_DN | DNA damage | UV response: down-regulated genes | | 17 | 144 |
| 12 | UV_RESPONSE_UP | DNA damage | UV response: up-regulated genes | | 16 | 158 |
| 13 | ALLOGRAFT_REJECTION | immune | Allograft rejection | | 190 | 200 |
| 14 | COAGULATION | immune | Coagulation cascade | The process by which blood clots form to prevent excessive blood loss | 71 | 138 |
| 15 | COMPLEMENT | immune | Complement cascade | | 71 | 200 |
| 16 | INTERFERON_ALPHA_RESPONSE | immune | Interferon alpha response | | 82 | 97 |
| 17 | INTERFERON_GAMMA_RESPONSE | immune | Interferon gamma response | | 82 | 200 |
| 18 | IL6_JAK_STAT3_SIGNALING | immune | IL6 STAT3 signaling during acute phase response | | 24 | 87 |
| 19 | INFLAMMATORY_RESPONSE | immune | Inflammation | | 120 | 200 |
| 20 | BILE_ACID_METABOLISM | metabolic | Biosynthesis of bile acids | | 28 | 112 |
| 21 | CHOLESTEROL_HOMEOSTASIS | metabolic | Cholesterol homeostasis | | 28 | 74 |
| 22 | FATTY_ACID_METABOLISM | metabolic | Fatty acid metabolism | | 53 | 158 |
| 23 | GLYCOLYSIS | metabolic | Glycolysis and gluconeogenesis | | 87 | 200 |
| 24 | HEME_METABOLISM | metabolic | Heme metabolism | | 36 | 200 |
| 25 | OXIDATIVE_PHOSPHORYLATION | metabolic | Oxidative phosphorylation and citric acid cycle | | 93 | 200 |
| 26 | XENOBIOTIC_METABOLISM | metabolic | Metabolism of xenobiotics | Metabolism of foreign (exogenous) substances | 124 | 200 |
| 27 | APOPTOSIS | pathway | Programmed cell death; caspase pathway | | 80 | 161 |
| 28 | HYPOXIA | pathway | Response to hypoxia; HIF1A targets | HIF1A: hypoxia-inducible factor 1-alpha | 87 | 200 |
| 29 | PROTEIN_SECRETION | pathway | Protein secretion | | 74 | 96 |
| 30 | UNFOLDED_PROTEIN_RESPONSE | pathway | Unfolded protein response; ER stress | | 22 | 113 |
| 31 | REACTIVE_OXYGEN_SPECIES_PATHWAY | pathway | Reactive oxygen species (ROS) pathway | | 13 | 49 |
| 32 | E2F_TARGETS | proliferation | E2F targets | E2F regulates the cell cycle and DNA synthesis in mammalian cells | 420 | 200 |
| 33 | G2M_CHECKPOINT | proliferation | Cell cycle G2/M checkpoint | | 420 | 200 |
| 34 | MYC_TARGETS_V1 | proliferation | MYC targets variant 1 | Genes regulated by the MYC oncogene | 404 | 200 |
| 35 | MYC_TARGETS_V2 | proliferation | MYC targets variant 2 | Genes regulated by the MYC oncogene | 6 | 58 |
| 36 | P53_PATHWAY | proliferation | p53 pathway | | 85 | 200 |
| 37 | MITOTIC_SPINDLE | proliferation | Mitotic spindle assembly | | 108 | 200 |
| 38 | ANDROGEN_RESPONSE | signaling | Androgen response | | 8 | 117 |
| 39 | ESTROGEN_RESPONSE_EARLY | signaling | Early estrogen response | | 61 | 200 |
| 40 | ESTROGEN_RESPONSE_LATE | signaling | Late estrogen response | | 61 | 200 |
| 41 | IL2_STAT5_SIGNALING | signaling | IL2 STAT5 signaling | | 13 | 200 |
| 42 | KRAS_SIGNALING_UP | signaling | KRAS signaling, up-regulated genes | | 14 | 200 |
| 43 | KRAS_SIGNALING_DN | signaling | KRAS signaling, down-regulated genes | | 16 | 200 |
| 44 | MTORC1_SIGNALING | signaling | mTORC1 signaling | | 487 | 200 |
| 45 | NOTCH_SIGNALING | signaling | Notch signaling | | 49 | 32 |
| 46 | PI3K_AKT_MTOR_SIGNALING | signaling | PI3K signaling via AKT to mTORC1 | | 591 | 105 |
| 47 | HEDGEHOG_SIGNALING | signaling | Hedgehog signaling | | 79 | 36 |
| 48 | TGF_BETA_SIGNALING | signaling | TGF beta signaling | | 29 | 54 |
| 49 | TNFA_SIGNALING_VIA_NFKB | signaling | TNFA signaling via NFkB | | 132 | 200 |
| 50 | WNT_BETA_CATENIN_SIGNALING | signaling | Canonical beta catenin pathway | Canonical Wnt signaling | 49 | 42 |