Chapter 5 Molecular Data Analysis
For each function, we will introduce its basic use and example output. Users can check all analysis or visualization parameters available by clicking the corresponding title link.
Among most functions, their first parameter is molecular identifier of one data type. Users can also design a molecular signature comprised of multiple molecules (e.g.
TP53 + 2 * KRAS - 1.3 * PTEN
).Users can also modify alternative datasets if available for one molecular type through the
opt_pancan
parameter. (see more viastr(.opt_pancan)
)
5.1 TCGA analysis
Database | Type | Function |
---|---|---|
TCGA | Comparison | vis_toil_TvsN() |
TCGA | Comparison | vis_toil_TvsN_cancer() |
TCGA | Comparison | vis_pancan_anatomy() |
TCGA | Comparison | vis_toil_Mut() |
TCGA | Comparison | vis_toil_Mut_cancer() |
TCGA | Correlation | vis_gene_cor() |
TCGA | Correlation | vis_gene_cor_cancer() |
TCGA | Correlation | vis_gene_TIL_cor() |
TCGA | Correlation | vis_gene_immune_cor() |
TCGA | Correlation | vis_gene_tmb_cor() |
TCGA | Correlation | vis_gene_msi_cor() |
TCGA | Correlation | vis_gene_stemness_cor() |
TCGA | Correlation | vis_gene_pw_cor() |
TCGA | Survival | tcga_surv_plot() |
TCGA | Survival | vis_unicox_tree() |
TCGA | Dimension Reduction | vis_dim_dist() |
5.1.1 Comparison analysis
5.1.1.1 vis_toil_TvsN()
Compare molecular value between tumor and normal samples across pan-cancer. (Custom module)
- Basic use:
vis_toil_TvsN(Gene=, data_type=)
5.1.1.2 vis_toil_TvsN_cancer()
Compare molecular value between tumor and normal samples in one cancer. (Custom module)
- Basic use:
vis_toil_TvsN_cancer(Gene=, data_type=, Cancer=)
5.1.1.3 vis_pancan_anatomy()
- Basic use:
vis_pancan_anatomy(Gene="TP53", Gender=, data_type="mRNA")
## $plot
##
## $data
## # A tibble: 57 × 8
## group Median tissue type.x type organ color value
## <chr> <dbl> <chr> <fct> <fct> <chr> <chr> <dbl>
## 1 ACC_normal 3.60 ACC ACC_normal_GTEx normal adrenal_gland #41a… 3.60
## 2 ACC_tumor 3.85 ACC ACC_tumor_TCGA tumor adrenal_gland #41a… 3.85
## 3 BLCA_normal 4.65 BLCA BLCA_normal_TCGA normal urinary_bladder oran… 4.65
## 4 BLCA_normal 4.65 BLCA BLCA_normal_GTEx normal urinary_bladder oran… 4.65
## 5 BLCA_tumor 5.04 BLCA BLCA_tumor_TCGA tumor urinary_bladder oran… 5.04
## 6 BRCA_normal 4.61 BRCA BRCA_normal_GTEx normal breast #41a… 4.61
## 7 BRCA_normal 4.61 BRCA BRCA_normal_TCGA normal breast #41a… 4.61
## 8 BRCA_tumor 5.36 BRCA BRCA_tumor_TCGA tumor breast #41a… 5.36
## 9 CHOL_normal 2.84 CHOL CHOL_normal_TCGA normal gall_bladder oran… 2.84
## 10 CHOL_tumor 5.31 CHOL CHOL_tumor_TCGA tumor gall_bladder oran… 5.31
## # ℹ 47 more rows
5.1.1.4 vis_toil_Mut()
Compare molecular value between mutation and wild tumor samples across pan-cancer. (Custom module)
- Basic use:
vis_toil_Mut(mut_Gene=, Gene=, data_type=)
5.1.1.5 vis_toil_Mut_cancer()
Compare molecular value between mutation and wild tumor samples in one cancer. (Custom module)
- Basic use:
vis_toil_Mut_cancer(Gene=, data_type=, Cancer=)
5.1.2 Correlation analysis
5.1.2.1 vis_gene_cor()
Calculate the correlation between two molecules value in tumor samples of pan-cancers. (Custom module)
- Basic use:
vis_gene_cor(Gene1=, data_type1=, Gene2=, data_type2=)
5.1.2.2 vis_gene_cor_cancer()
Calculate the correlation between two molecules value in tumor samples of one cancer. (Custom module)
- Basic use:
vis_gene_cor_cancer(Gene1=, data_type1=, Gene2=, data_type2=, cancer_choose=)
vis_gene_cor_cancer(Gene1 = "CSF1R", data_type1 = "mRNA",
Gene2 = "JAK3", data_type2 = "mRNA",
cancer_choose = "ACC")
5.1.2.3 vis_gene_TIL_cor()
Calculate the correlation between one molecule and one type of TIL in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_TIL_cor(Gene= ,data_type= ,sig=)
## [1] "CIBERSORT" "CIBERSORT-ABS" "EPIC" "MCPCOUNTER"
## [5] "QUANTISEQ" "TIMER" "XCELL"
sig = paste(tcga_ids$id_TIL$TIMER$Level3,
tcga_ids$id_TIL$TIMER$Level2, sep = "_")
vis_gene_TIL_cor(Gene = "TP53", data_type = "mRNA",
sig = sig)
5.1.2.4 vis_gene_immune_cor()
Calculate the correlation between one molecule and one type of Immune signature in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_immune_cor(Gene= ,data_type= ,sig=)
tcga_pan_immune_signature <- load_data("tcga_pan_immune_signature")
table(tcga_pan_immune_signature$Source)
##
## Attractors Bindea c7atoms Cibersort ICR Wolf Yasin
## 9 25 32 20 3 68 3
5.1.2.5 vis_gene_tmb_cor()
Calculate the correlation between one molecule and TMB score in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_tmb_cor(Gene= , data_type= )
5.1.2.6 vis_gene_msi_cor()
Calculate the correlation between one molecule and MSI score in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_msi_cor(Gene= , data_type= )
5.1.2.7 vis_gene_stemness_cor()
Calculate the correlation between one molecule and stemness score in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_stemness_cor(Gene= , data_type= )
5.1.2.8 vis_gene_pw_cor()
Calculate the correlation between one molecule and pathway score in tumor samples of one cancer. (Custom module)
- Basic use:
vis_gene_pw_cor(Gene= , data_type= )
vis_gene_pw_cor(Gene = "TP53", data_type = "mRNA",
pw_name = "HALLMARK_ADIPOGENESIS",
cancer_choose = "ACC")
5.1.3 Survival analysis
5.1.3.1 tcga_surv_plot()
Perform the log-rank test of one molecule for one cancer. (Custom module)
- Basic use:
tcga_surv_plot(data=, time= , status= )
# Firstly, prepare the molecular value as well as survival data
data <- tcga_surv_get(item = "TP53",profile = "mRNA",
TCGA_cohort = "LUAD")
head(data)
## # A tibble: 6 × 13
## sampleID value OS OS.time DSS DSS.time DFI DFI.time PFI PFI.time
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 TCGA-05-4420… 4.51 0 912 0 912 0 912 0 912
## 2 TCGA-91-6840… 5.90 0 372 0 372 0 372 0 372
## 3 TCGA-44-6778… 5.30 0 1864 0 1864 0 1864 0 1864
## 4 TCGA-67-3774… 5.22 0 385 0 385 NA NA 0 385
## 5 TCGA-64-1679… 5.46 0 2488 0 2488 0 2488 0 2488
## 6 TCGA-55-6982… 4.54 1 995 1 995 NA NA 1 183
## # ℹ 3 more variables: gender <chr>, age <dbl>, stage <chr>
## Warning in do_once((if (is_R_CMD_check()) stop else warning)("The function
## xfun::isFALSE() will be deprecated in the future. Please ", : The function
## xfun::isFALSE() will be deprecated in the future. Please consider using
## base::isFALSE(x) or identical(x, FALSE) instead.
By default, the median data of molecular data is used to divided into two groups for log-rank test. It can be modified in corresponding paramters.
5.1.3.2 vis_unicox_tree()
Perform the Cox regression analysis of one molecule across pan-cancers. (Custom module)
- Basic use:
vis_unicox_tree(Gene= , data_type= , measure=)
By default, the median data of molecular data is used to divided into two groups for Cox regression analysis. It can be modified in corresponding paramters.
5.1.4 Dimension reduction
5.1.4.1 vis_dim_dist()
Perform dimension reduction analysis of multiple molecules for samples in groups. (Custom module)
- Basic use:
vis_dim_dist(ids=, data_type= ,group_info= )
# Firstly, prepare the grouping information of samples
group_info = tcga_clinical_fine %>%
dplyr::filter(Cancer=="BRCA") %>%
dplyr::select(Sample, Code) %>%
dplyr::rename(Group=Code)
head(group_info)
## # A tibble: 6 × 2
## Sample Group
## <chr> <chr>
## 1 TCGA-3C-AAAU-01 TP
## 2 TCGA-3C-AALI-01 TP
## 3 TCGA-3C-AALJ-01 TP
## 4 TCGA-3C-AALK-01 TP
## 5 TCGA-4H-AAAK-01 TP
## 6 TCGA-5L-AAT0-01 TP
ids = c("TP53", "KRAS", "PTEN", "MDM2", "CDKN1A")
vis_dim_dist(ids = ids, data_type = "mRNA",
group_info= group_info)
5.2 PCAWG analysis
Database | Type | Function |
---|---|---|
PCAWG | Comparison | vis_pcawg_dist() |
PCAWG | Correlation | vis_pcawg_gene_cor() |
PCAWG | Survival | vis_pcawg_unicox_tree() |
5.2.1 Comparsion analysis
5.2.1.1 vis_pcawg_dist()
Compare molecular value between tumor and normal samples across pan-cancer. (Custom module)
- Basic use:
vis_pcawg_dist(Gene= ,data_type= )
5.2.2 Correlation analysis
5.2.2.1 vis_pcawg_gene_cor()
Calculate the correlation between two molecules value in tumor samples of one cancer. (Custom module)
- Basic use:
vis_pcawg_gene_cor(Gene1= ,data_type1 = ,Gene2 = ,data_type2 = ,dcc_project_code_choose=)
5.2.3 Survival analysis
5.2.3.1 vis_pcawg_unicox_tree()
Perform the Cox regression analysis (OS) of one molecule across pan-cancers. (Custom module)
- Basic use:
vis_pcawg_unicox_tree(Gene= , data_type= )
By default, the median data of molecular data is used to divided into two groups for Cox regression analysis. It can be modified in corresponding paramters.
5.3 CCLE analysis
Database | Type | Function |
---|---|---|
CCLE | Comparison | vis_ccle_tpm() |
CCLE | Comparison | vis_gene_drug_response_diff() |
CCLE | Correlation | vis_ccle_gene_cor() |
CCLE | Correlation | vis_gene_drug_response_asso() |
5.3.1 Comparsion analysis
5.3.1.1 vis_ccle_tpm()
Compare molecular value among different tissues of cancer cell lines. (Custom module)
- Basic use:
vis_ccle_tpm(Gene= ,data_type= )
5.3.2 Correlation analysis
5.3.2.1 vis_ccle_gene_cor()
Calculate the correlation between two molecules value in one tissue type of cancer cell lines. (Custom module)
- Basic use:
vis_ccle_gene_cor(Gene1= ,data_type1= ,Gene2= ,data_type2= ,SitePrimary= )
vis_ccle_gene_cor(Gene1 = "CSF1R", data_type1 = "mRNA",
Gene2 = "JAK3", data_type2 = "mRNA",
SitePrimary = "prostate")
vis_gene_drug_response_diff()
andvis_gene_drug_response_asso()
are initially designed for drug pharmacogenomics analysis. In the updated shiny application, we have provided more comprehensive pharmacogenomics analysis.
5.4 General analysis
Database | Type | Function |
---|---|---|
General | Comparison | vis_identifier_grp_comparison() |
General | Correlation | vis_identifier_cor() |
General | Correlation | vis_identifier_multi_cor() |
General | Survival | vis_identifier_grp_surv() |
General | Dimension Reduction | vis_identifier_dim_dist() |
5.4.1 Comparison analysis
5.4.1.1 vis_identifier_grp_comparison()
Compare molecular value between custom groups based on one genomics matrix UCSC Xena dataset. (Custom module)
- Basic use:
vis_identifier_grp_comparison(dataset= , id= ,grp_df= )
# Firstly, prepare custom groups of samples
library(UCSCXenaTools)
cli_df <- XenaGenerate(
subset = XenaDatasets == "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
) %>%
XenaQuery() %>%
XenaDownload() %>%
XenaPrepare()
grp_df = cli_df[, c("sampleID", "pathologic_M")] %>%
dplyr::filter(pathologic_M %in% c("M0", "M1", "MX"))
head(grp_df) # col-1: sample; col-2: grouping info
## # A tibble: 6 × 2
## sampleID pathologic_M
## <chr> <chr>
## 1 TCGA-05-4244-01 M1
## 2 TCGA-05-4249-01 M0
## 3 TCGA-05-4250-01 M0
## 4 TCGA-05-4382-01 M0
## 5 TCGA-05-4384-01 M0
## 6 TCGA-05-4389-01 M0
5.4.2 Correlation analysis
5.4.2.1 vis_identifier_cor()
Calculate the correlation between two molecules value from genomics matrix UCSC Xena datasets. (Custom module)
- Basic use:
vis_identifier_cor(dataset= ,id1= ,dataset= ,id2= )
dataset <- "TcgaTargetGtex_rsem_isoform_tpm"
vis_identifier_cor(dataset1 = dataset, id1 = "TP53",
dataset2 = dataset, id2 = "KRAS")
## Warning: package 'ggpubr' was built under R version 4.2.3
## Warning: package 'ggplot2' was built under R version 4.2.3
5.4.2.2 vis_identifier_multi_cor()
Calculate the pairwise correlation among multiple molecules value from one genomics matrix UCSC Xena dataset. (Custom module)
- Basic use:
vis_identifier_multi_cor(dataset= ,ids= )
5.4.3 Survival analysis
5.4.3.1 vis_identifier_grp_surv()
Perform the log-rank test of one molecule for one genomics matrix UCSC Xena dataset. (Custom module)
- Basic use:
vis_identifier_grp_surv(dataset= , id= , surv_df= )
# Firstly, prepare survival data of samples
library(UCSCXenaTools)
cli_df <- XenaGenerate(
subset = XenaDatasets == "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
) %>%
XenaQuery() %>%
XenaDownload() %>%
XenaPrepare()
surv_df <- cli_df[, c("sampleID", "days_to_death", "vital_status")]
surv_df$vital_status <- ifelse(surv_df$vital_status == "DECEASED", 1, 0)
surv_df = na.omit(surv_df)
head(surv_df) # col-1: sample; col-2: survival time; col-3: survival status
## # A tibble: 6 × 3
## sampleID days_to_death vital_status
## <chr> <dbl> <dbl>
## 1 TCGA-05-4250-01 121 1
## 2 TCGA-05-4395-01 0 1
## 3 TCGA-05-4396-01 303 1
## 4 TCGA-05-4397-01 731 1
## 5 TCGA-05-4402-01 244 1
## 6 TCGA-05-4415-01 91 1
mol_dataset <- "TCGA.LUAD.sampleMap/HiSeqV2_percentile"
vis_identifier_grp_surv(dataset = mol_dataset, id = "KRAS",
surv_df = surv_df)
## Warning in do_once((if (is_R_CMD_check()) stop else warning)("The function
## xfun::isFALSE() will be deprecated in the future. Please ", : The function
## xfun::isFALSE() will be deprecated in the future. Please consider using
## base::isFALSE(x) or identical(x, FALSE) instead.
By default, the best cutoff is decided. User can change it through the
cutoff_mode
parameter.
5.4.4 Dimension reduction
5.4.4.1 vis_identifier_dim_dist()
Perform dimension reduction analysis of multiple molecules for samples in groups. (Custom module)
- Basic use:
vis_identifier_dim_dist(dataset= ,ids= , grp_df= )
# Firstly, prepare the grouping information of samples
library(UCSCXenaTools)
cli_dataset <- "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
cli_df <- XenaGenerate(
subset = XenaDatasets == cli_dataset
) %>%
XenaQuery() %>%
XenaDownload() %>%
XenaPrepare()
grp_df = cli_df[, c("sampleID", "gender")]
head(grp_df) # col-1: sample; col-2: grouping info
## # A tibble: 6 × 2
## sampleID gender
## <chr> <chr>
## 1 TCGA-05-4244-01 MALE
## 2 TCGA-05-4249-01 MALE
## 3 TCGA-05-4250-01 FEMALE
## 4 TCGA-05-4382-01 MALE
## 5 TCGA-05-4384-01 MALE
## 6 TCGA-05-4389-01 MALE
mol_dataset <- "TCGA.LUAD.sampleMap/HiSeqV2_percentile"
ids = c("TP53", "KRAS", "PTEN", "MDM2", "CDKN1A")
vis_identifier_dim_dist(dataset = mol_dataset,
ids = ids,
grp_df = grp_df)