Chapter 1 Introduction
1.1 UCSC Xena Datasets
The UCSC Xena platform, developed by the UC Santa Cruz Genomics Institute, is a comprehensive repository of thousands of processed omics datasets from large cancer research projects (e.g. TCGA, PCAWG and CCLE) and individual research groups, which opens up broad research opportunities.
The hierarchy for storing and querying datasets is as follows:
- The UCSC Xena repository comprises 11 data hubs from various portals;
- Each data hub can include many sample cohorts;
- Each sample cohort can involve multiple omics profiles or phenotype data;
- Each omics profile can have several datasets that differ in normalization method or other factors.
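The hierarchy above can be sketched as a small nested data structure. This is only an illustration: the class names (`Hub`, `Cohort`, `Profile`), the helper `count_datasets`, and every hub, cohort and dataset name below are hypothetical and are not part of UCSC Xena or UCSCXenaShiny.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profile:
    """One omics profile (or phenotype data) within a cohort."""
    name: str
    datasets: List[str] = field(default_factory=list)


@dataclass
class Cohort:
    """One sample cohort, holding one or more profiles."""
    name: str
    profiles: List[Profile] = field(default_factory=list)


@dataclass
class Hub:
    """One data hub, holding one or more cohorts."""
    name: str
    cohorts: List[Cohort] = field(default_factory=list)


def count_datasets(hub: Hub) -> int:
    """Count every dataset stored under a hub, across all cohorts and profiles."""
    return sum(len(p.datasets) for c in hub.cohorts for p in c.profiles)


# Toy data only: these names are placeholders, not real Xena identifiers.
toy_hub = Hub(
    name="exampleHub",
    cohorts=[
        Cohort(
            name="Example Cohort A",
            profiles=[
                # Two datasets for the same profile, e.g. two normalizations.
                Profile("gene expression", ["expr_counts", "expr_tpm"]),
                Profile("phenotype", ["clinical"]),
            ],
        ),
        Cohort(
            name="Example Cohort B",
            profiles=[Profile("copy number", ["cnv_segments"])],
        ),
    ],
)

print(count_datasets(toy_hub))
```

Querying follows the same path in practice: pick a hub, then a cohort, then the profile of interest, and finally the dataset whose processing suits the analysis.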
Note: In UCSCXenaShiny, we group all data (sub)types into four main types: clinicalMatrix, genomicMatrix, genomicSegment and mutationVector, of which genomicMatrix is the key component.
The following figure shows the numbers of cohorts and datasets for each data hub of UCSC Xena. Although some hubs (such as toilHub and pancanAtlasHub) have relatively few datasets, they mainly focus on pan-cancer integration, which can be more valuable for some research. Here, we briefly introduce each hub to help you use them effectively.
(1) tcgaHub (TCGA hub)
- Statistics: 38 cohorts, 715 datasets;
- Source: TCGA Data Coordinating Center (DCC), Jan 2016
- Description: The hub is specific to the TCGA project and covers both individual tumor cohorts and integrative pan-cancer cohorts.
(2) gdcHub (GDC hub)
- Statistics: 42 cohorts, 534 datasets;
- Source: GDC Data Portal (GDC), v18.0, 2019-08-28
- Description: The hub incorporates the TCGA project, with both individual tumor cohorts and integrative pan-cancer cohorts, as well as the TARGET project (childhood cancers).
(3) pancanAtlasHub (Pan-Cancer Atlas Hub)
- Statistics: 1 cohort, 22 datasets;
- Source: Pan-Cancer Atlas publications in Cell.
- Description: The hub collects the curated pan-cancer TCGA data generated by the PanCan Atlas consortium working groups.
(4) toilHub (UCSC Toil RNA-seq Recompute)
- Statistics: 5 cohorts, 51 datasets;
- Source: The Toil pipeline from the UCSC Genomics Institute.
- Description: The hub aims to integrate the pan-cancer RNA-seq data from TCGA, TARGET and GTEx databases.
(5) publicHub (UCSC Public Hub)
- Statistics: 37 cohorts, 114 datasets;
- Source: An extensive collection of public resources.
- Description: The hub collects omics data for various cancers or cell lines (e.g. CCLE, CMAP) from other public studies.
(6) icgcHub (ICGC Xena Hub)
- Statistics: 3 cohorts, 23 datasets;
- Source: International Cancer Genome Consortium (ICGC).
- Description: The hub provides genomics data from ICGC, but gene/protein expression data are included only for the US (TCGA) related projects.
(7) pcawgHub (PCAWG Hub)
- Statistics: 2 cohorts, 53 datasets;
- Source: The Pancancer Analysis of Whole Genomes (PCAWG) study.
- Description: The hub focuses on multi-omics data of cancer whole genomes across many tumor types from the International Cancer Genome Consortium (ICGC).
(8) atacseqHub (ATAC-seq Hub)
- Statistics: 2 cohorts, 9 datasets;
- Source: The chromatin accessibility landscape of primary human cancers
- Description: The hub describes chromatin accessibility of 410 TCGA tumor samples across 23 cancer types using the ATAC-seq technology.
(9) kidfirstHub (Kids First Xena Hub)
- Statistics: 3 cohorts, 50 datasets;
- Source: The Gabriella Miller Kids First Pediatric Research Program
- Description: The hub incorporates the Pediatric Brain Tumor Atlas and the TARGET project.
(10) treehouseHub
- Statistics: 16 cohorts, 44 datasets
- Source: The Treehouse Childhood Cancer Initiative
- Description: The hub is also specific to childhood cancer.
(11) singlecellHub
- Statistics: 16 cohorts, 71 datasets
- Source: Human Cell Atlas (HCA)
- Description: The hub includes several scRNA-seq datasets from HCA, involving cancer or normal tissues of human or mouse origin.
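To compare the hubs at a glance, the statistics listed above can be collected into one table and summarized. The sketch below simply copies the cohort and dataset counts from this section; the names `HUB_STATS` and `datasets_per_cohort` are illustrative and do not come from any package. The totals are sums of per-hub entries, so they should not be read as counts of distinct cohorts, since the same project may be served by more than one hub.

```python
# (cohorts, datasets) per hub, copied from the statistics listed in this section.
HUB_STATS = {
    "tcgaHub": (38, 715),
    "gdcHub": (42, 534),
    "pancanAtlasHub": (1, 22),
    "toilHub": (5, 51),
    "publicHub": (37, 114),
    "icgcHub": (3, 23),
    "pcawgHub": (2, 53),
    "atacseqHub": (2, 9),
    "kidfirstHub": (3, 50),
    "treehouseHub": (16, 44),
    "singlecellHub": (16, 71),
}

# Sums over hub-level entries (not necessarily distinct cohorts).
total_cohorts = sum(cohorts for cohorts, _ in HUB_STATS.values())
total_datasets = sum(datasets for _, datasets in HUB_STATS.values())


def datasets_per_cohort(hub: str) -> float:
    """Average number of datasets per cohort for the given hub."""
    cohorts, datasets = HUB_STATS[hub]
    return datasets / cohorts


# Hubs ordered from fewest to most datasets.
ranked = sorted(HUB_STATS, key=lambda hub: HUB_STATS[hub][1])

print(f"{len(HUB_STATS)} hubs, {total_cohorts} cohort entries, {total_datasets} datasets")
for hub in ranked:
    cohorts, datasets = HUB_STATS[hub]
    print(f"{hub:15s} {cohorts:3d} cohorts {datasets:4d} datasets "
          f"({datasets_per_cohort(hub):.1f} per cohort)")
```

The per-cohort ratio makes the earlier point concrete: pancanAtlasHub holds few datasets in total, but all of them belong to a single integrated pan-cancer cohort.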
1.2 TCGA abbreviations
No. | Abbreviation | Name |
---|---|---|
1 | LAML | Acute Myeloid Leukemia |
2 | ACC | Adrenocortical carcinoma |
3 | BLCA | Bladder Urothelial Carcinoma |
4 | LGG | Brain Lower Grade Glioma |
5 | BRCA | Breast invasive carcinoma |
6 | CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma |
7 | CHOL | Cholangiocarcinoma |
8 | LCML | Chronic Myelogenous Leukemia |
9 | COAD | Colon adenocarcinoma |
10 | CNTL | Controls |
11 | ESCA | Esophageal carcinoma |
12 | FPPP | FFPE Pilot Phase II |
13 | GBM | Glioblastoma multiforme |
14 | HNSC | Head and Neck squamous cell carcinoma |
15 | KICH | Kidney Chromophobe |
16 | KIRC | Kidney renal clear cell carcinoma |
17 | KIRP | Kidney renal papillary cell carcinoma |
18 | LIHC | Liver hepatocellular carcinoma |
19 | LUAD | Lung adenocarcinoma |
20 | LUSC | Lung squamous cell carcinoma |
21 | DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma |
22 | MESO | Mesothelioma |
23 | MISC | Miscellaneous |
24 | OV | Ovarian serous cystadenocarcinoma |
25 | PAAD | Pancreatic adenocarcinoma |
26 | PCPG | Pheochromocytoma and Paraganglioma |
27 | PRAD | Prostate adenocarcinoma |
28 | READ | Rectum adenocarcinoma |
29 | SARC | Sarcoma |
30 | SKCM | Skin Cutaneous Melanoma |
31 | STAD | Stomach adenocarcinoma |
32 | TGCT | Testicular Germ Cell Tumors |
33 | THYM | Thymoma |
34 | THCA | Thyroid carcinoma |
35 | UCS | Uterine Carcinosarcoma |
36 | UCEC | Uterine Corpus Endometrial Carcinoma |
37 | UVM | Uveal Melanoma |
1.3 PCAWG abbreviations
No. | Abbreviation | Name |
---|---|---|
1 | BLCA-US | Bladder Urothelial Cancer - TCGA, US |
2 | BRCA-US | Breast Cancer - TCGA, US |
3 | CESC-US | Cervical Squamous Cell Carcinoma - TCGA, US |
4 | CLLE-ES | Chronic Lymphocytic Leukemia - ES |
5 | COAD-US | Colon Adenocarcinoma - TCGA, US |
6 | DLBC-US | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma - TCGA, US |
7 | ESAD-UK | Esophageal Adenocarcinoma - UK |
8 | GBM-US | Brain Glioblastoma Multiforme - TCGA, US |
9 | HNSC-US | Head and Neck Squamous Cell Carcinoma - TCGA, US |
10 | KICH-US | Kidney Chromophobe - TCGA, US |
11 | KIRC-US | Kidney Renal Clear Cell Carcinoma - TCGA, US |
12 | KIRP-US | Kidney Renal Papillary Cell Carcinoma - TCGA, US |
13 | LAML-US | Acute Myeloid Leukemia - TCGA, US |
14 | LGG-US | Brain Lower Grade Glioma - TCGA, US |
15 | LIHC-US | Liver Hepatocellular carcinoma - TCGA, US |
16 | LIRI-JP | Liver Cancer - RIKEN, JP |
17 | LUAD-US | Lung Adenocarcinoma - TCGA, US |
18 | LUSC-US | Lung Squamous Cell Carcinoma - TCGA, US |
19 | MALY-DE | Malignant Lymphoma - DE |
20 | OV-AU | Ovarian Cancer - AU |
21 | OV-US | Ovarian Serous Cystadenocarcinoma - TCGA, US |
22 | PACA-AU | Pancreatic Cancer - AU |
23 | PRAD-US | Prostate Adenocarcinoma - TCGA, US |
24 | READ-US | Rectum Adenocarcinoma - TCGA, US |
25 | RECA-EU | Renal Cell Cancer - EU/FR |
26 | SARC-US | Sarcoma - TCGA, US |
27 | SKCM-US | Skin Cutaneous melanoma - TCGA, US |
28 | STAD-US | Gastric Adenocarcinoma - TCGA, US |
29 | THCA-US | Head and Neck Thyroid Carcinoma - TCGA, US |
30 | UCEC-US | Uterine Corpus Endometrial Carcinoma - TCGA, US |
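The PCAWG project codes in the table above follow a `CANCER-REGION` pattern, and every entry ending in `-US` is labelled as a TCGA project. A minimal sketch of splitting such a code is shown below. The function names are hypothetical, and the mapping back to a TCGA abbreviation assumes, based only on the two tables in this chapter, that dropping the `-US` suffix yields the TCGA abbreviation.

```python
from typing import Optional, Tuple


def split_project_code(code: str) -> Tuple[str, str]:
    """Split a PCAWG project code such as 'BLCA-US' into (cancer, region)."""
    cancer, _, region = code.partition("-")
    if not cancer or not region:
        raise ValueError(f"Not a CANCER-REGION project code: {code!r}")
    return cancer, region


def tcga_abbreviation(code: str) -> Optional[str]:
    """Return the TCGA abbreviation for '-US' projects, otherwise None.

    Assumption: in the tables above, all '-US' projects are TCGA projects
    and the part before the hyphen matches the TCGA abbreviation.
    """
    cancer, region = split_project_code(code)
    return cancer if region == "US" else None


for code in ["BLCA-US", "LIRI-JP", "OV-AU", "OV-US"]:
    print(code, split_project_code(code), tcga_abbreviation(code))
```

Projects from other regions (e.g. LIRI-JP or OV-AU) have no TCGA counterpart in this scheme, so the helper returns `None` for them.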