Chapter 5 Molecular data analysis

  1. For each function, we will introduce its basic use and example output. Users can check all analysis or visualization parameters available by clicking the corresponding title link.

  2. Among most functions, their first parameter is molecular identifier of one data type. Users can also design a molecular signature comprised of multiple molecules (e.g. TP53 + 2 * KRAS - 1.3 * PTEN).

  3. Users can also modify alternative datasets if available for one molecular type through the opt_pancan parameter. (see more via str(.opt_pancan))

5.1 TCGA analysis

Table 5.1: Specilized functions to analyze TCGA molecular data
Database Type Function
TCGA Comparison vis_toil_TvsN()
TCGA Comparison vis_toil_TvsN_cancer()
TCGA Comparison vis_toil_Mut()
TCGA Comparison vis_toil_Mut_cancer()
TCGA Correlation vis_gene_cor()
TCGA Correlation vis_gene_cor_cancer()
TCGA Correlation vis_gene_TIL_cor()
TCGA Correlation vis_gene_immune_cor()
TCGA Correlation vis_gene_tmb_cor()
TCGA Correlation vis_gene_msi_cor()
TCGA Correlation vis_gene_stemness_cor()
TCGA Correlation vis_gene_pw_cor()
TCGA Survival tcga_surv_plot()
TCGA Survival vis_unicox_tree()
TCGA Dimension Reduction vis_dim_dist()

5.1.1 Comparison analysis

5.1.1.1 vis_toil_TvsN()

Compare molecular value between tumor and normal samples across pan-cancer. (Custom module)

  • Basic use: vis_toil_TvsN(Gene=, data_type=)
vis_toil_TvsN(Gene="TP53", data_type = "mRNA")
The difference of mRNA TP53 across pan-cancer

Figure 5.1: The difference of mRNA TP53 across pan-cancer

Tips: For parameter data_type, one of 4 molecular types c(“mRNA”, “transcript”, “methylation”, “miRNA”) are supported.

5.1.1.2 vis_toil_TvsN_cancer()

Compare molecular value between tumor and normal samples in one cancer. (Custom module)

  • Basic use: vis_toil_TvsN_cancer(Gene=, data_type=, Cancer=)
vis_toil_TvsN_cancer(Gene="TP53", data_type = "mRNA", Cancer = "BRCA")
The difference of mRNA TP53 in ACC cancer

Figure 5.2: The difference of mRNA TP53 in ACC cancer

Tips: For parameter data_type, all molelcuar types supported in function query_pancan_value() are applicable.

5.1.1.3 vis_toil_Mut()

Compare molecular value between mutation and wild tumor samples across pan-cancer. (Custom module)

  • Basic use: vis_toil_Mut(mut_Gene=, Gene=, data_type=)
vis_toil_Mut(mut_Gene = "TP53", Gene = "TNF", data_type = "mRNA")
The difference of mRNA TNF between TP53-mut and TP53-wild tumor samples across pan-cancer

Figure 5.3: The difference of mRNA TNF between TP53-mut and TP53-wild tumor samples across pan-cancer

Tips: For parameter data_type, one of 4 molelcuar types c(“mRNA”, “transcript”, “methylation”, “miRNA”) are supported.

5.1.1.4 vis_toil_Mut_cancer()

Compare molecular value between mutation and wild tumor samples in one cancer. (Custom module)

  • Basic use: vis_toil_Mut_cancer(Gene=, data_type=, Cancer=)
vis_toil_Mut_cancer(mut_Gene = "TP53", Gene = "TNF", data_type = "mRNA", Cancer = "BRCA")
The difference of mRNA TNF between TP53-mut and TP53-wild tumor samples in BRCA cancer

Figure 5.4: The difference of mRNA TNF between TP53-mut and TP53-wild tumor samples in BRCA cancer

Tips: For parameter data_type, all molelcuar types supported in function query_pancan_value() are applicable.

5.1.2 Correlation analysis

5.1.2.1 vis_gene_cor()

Calculate the correlation between two molecules value in tumor samples of pan-cancers. (Custom module)

  • Basic use: vis_gene_cor(Gene1=, data_type1=, Gene2=, data_type2=)
vis_gene_cor(Gene1 = "CSF1R", data_type1 = "mRNA", Gene2 = "JAK3", data_type2 = "mRNA")
The correlation between mRNA CSF1R and mRNA JAK3 in tumor samples of pan-cancers

Figure 5.5: The correlation between mRNA CSF1R and mRNA JAK3 in tumor samples of pan-cancers

5.1.2.2 vis_gene_cor_cancer()

Calculate the correlation between two molecules value in tumor samples of one cancer. (Custom module)

  • Basic use: vis_gene_cor_cancer(Gene1=, data_type1=, Gene2=, data_type2=, cancer_choose=)
vis_gene_cor_cancer(Gene1 = "CSF1R", data_type1 = "mRNA", 
                    Gene2 = "JAK3", data_type2 = "mRNA", 
                    cancer_choose = "ACC")
The correlation between mRNA CSF1R and mRNA JAK3 in tumor samples of ACC cancer

Figure 5.6: The correlation between mRNA CSF1R and mRNA JAK3 in tumor samples of ACC cancer

5.1.2.3 vis_gene_TIL_cor()

Calculate the correlation between one molecule and one type of TIL in tumor samples across pan-cancers. (Custom module)

  • Basic use: vis_gene_TIL_cor(Gene= ,data_type= ,sig=)
tcga_ids = load_data("pancan_identifier_help")
names(tcga_ids$id_TIL)
## [1] "CIBERSORT"     "CIBERSORT-ABS" "EPIC"          "MCPCOUNTER"   
## [5] "QUANTISEQ"     "TIMER"         "XCELL"
sig = paste(tcga_ids$id_TIL$TIMER$Level3,
            tcga_ids$id_TIL$TIMER$Level2, sep = "_")
vis_gene_TIL_cor(Gene = "TP53", data_type = "mRNA",
                 sig = sig)
The correlation between mRNA TP53 and TIMER TIL in tumor samples across pan-cancers

Figure 5.7: The correlation between mRNA TP53 and TIMER TIL in tumor samples across pan-cancers

5.1.2.4 vis_gene_immune_cor()

Calculate the correlation between one molecule and one type of Immune signature in tumor samples across pan-cancers. (Custom module)

  • Basic use: vis_gene_immune_cor(Gene= ,data_type= ,sig=)
tcga_pan_immune_signature <- load_data("tcga_pan_immune_signature")
table(tcga_pan_immune_signature$Source)
## 
## Attractors     Bindea    c7atoms  Cibersort        ICR       Wolf      Yasin 
##          9         25         32         20          3         68          3
vis_gene_immune_cor(Gene = "TP53", data_type = "mRNA",
                    Immune_sig_type = "Cibersort")
The correlation between mRNA TP53 and Cibersort signature in tumor samples across pan-cancers

Figure 5.8: The correlation between mRNA TP53 and Cibersort signature in tumor samples across pan-cancers

5.1.2.5 vis_gene_tmb_cor()

Calculate the correlation between one molecule and TMB score in tumor samples across pan-cancers. (Custom module)

  • Basic use: vis_gene_tmb_cor(Gene= , data_type= )
vis_gene_tmb_cor(Gene = "TP53", data_type = "mRNA")
The correlation between mRNA TP53 and TMB score in tumor samples across pan-cancers

Figure 5.9: The correlation between mRNA TP53 and TMB score in tumor samples across pan-cancers

5.1.2.6 vis_gene_msi_cor()

Calculate the correlation between one molecule and MSI score in tumor samples across pan-cancers. (Custom module)

  • Basic use: vis_gene_msi_cor(Gene= , data_type= )
vis_gene_msi_cor(Gene = "TP53", data_type = "mRNA")
The correlation between mRNA TP53 and MSI score in tumor samples across pan-cancers

Figure 5.10: The correlation between mRNA TP53 and MSI score in tumor samples across pan-cancers

5.1.2.7 vis_gene_stemness_cor()

Calculate the correlation between one molecule and stemness score in tumor samples across pan-cancers. (Custom module)

  • Basic use: vis_gene_stemness_cor(Gene= , data_type= )
vis_gene_stemness_cor(Gene = "TP53", data_type = "mRNA")
The correlation between mRNA TP53 and stemness score in tumor samples across pan-cancers

Figure 5.11: The correlation between mRNA TP53 and stemness score in tumor samples across pan-cancers

5.1.2.8 vis_gene_pw_cor()

Calculate the correlation between one molecule and pathway score in tumor samples of one cancer. (Custom module)

  • Basic use: vis_gene_pw_cor(Gene= , data_type= )
vis_gene_pw_cor(Gene = "TP53", data_type = "mRNA", 
                pw_name = "HALLMARK_ADIPOGENESIS",
                cancer_choose = "ACC")
The correlation between mRNA TP53 and pathway score in tumor samples in ACC cancer

Figure 5.12: The correlation between mRNA TP53 and pathway score in tumor samples in ACC cancer

5.1.3 Survival analysis

5.1.3.1 tcga_surv_plot()

Perform the log-rank test of one molecule for one cancer. (Custom module)

  • Basic use: tcga_surv_plot(data=, time= , status= )
# Firstly, prepare the molecular value as well as survival data
data <- tcga_surv_get(item = "TP53",profile = "mRNA",
                      TCGA_cohort = "LUAD")
head(data)
## # A tibble: 6 × 13
##   sampleID      value    OS OS.time   DSS DSS.time   DFI DFI.time   PFI PFI.time
##   <chr>         <dbl> <dbl>   <dbl> <dbl>    <dbl> <dbl>    <dbl> <dbl>    <dbl>
## 1 TCGA-05-4420…  4.51     0     912     0      912     0      912     0      912
## 2 TCGA-91-6840…  5.90     0     372     0      372     0      372     0      372
## 3 TCGA-44-6778…  5.30     0    1864     0     1864     0     1864     0     1864
## 4 TCGA-67-3774…  5.22     0     385     0      385    NA       NA     0      385
## 5 TCGA-64-1679…  5.46     0    2488     0     2488     0     2488     0     2488
## 6 TCGA-55-6982…  4.54     1     995     1      995    NA       NA     1      183
## # ℹ 3 more variables: gender <chr>, age <dbl>, stage <chr>
tcga_surv_plot(data, time = "DSS.time", status = "DSS") # OS/DSS/DFI/PFI
## Warning in do_once((if (is_R_CMD_check()) stop else warning)("The function
## xfun::isFALSE() will be deprecated in the future. Please ", : The function
## xfun::isFALSE() will be deprecated in the future. Please consider using
## base::isFALSE(x) or identical(x, FALSE) instead.
The log-rank test (DSS) of mRNA TP53 for LUAD cancer

Figure 5.13: The log-rank test (DSS) of mRNA TP53 for LUAD cancer

By default, the median data of molecular data is used to divided into two groups for log-rank test. It can be modified in corresponding paramters.

5.1.3.2 vis_unicox_tree()

Perform the Cox regression analysis of one molecule across pan-cancers. (Custom module)

  • Basic use: vis_unicox_tree(Gene= , data_type= , measure=)
vis_unicox_tree(Gene = "PTEN", data_type = "mRNA", measure = "OS")
The Cox regression analysis (OS) of mRNA PTEN across pan-cancers

Figure 5.14: The Cox regression analysis (OS) of mRNA PTEN across pan-cancers

By default, the median data of molecular data is used to divided into two groups for Cox regression analysis. It can be modified in corresponding paramters.

5.1.4 Dimension reduction

5.1.4.1 vis_dim_dist()

Perform dimension reduction analysis of multiple molecules for samples in groups. (Custom module)

  • Basic use: vis_dim_dist(ids=, data_type= ,group_info= )
# Firstly, prepare the grouping information of samples 
group_info = tcga_clinical_fine %>% 
  dplyr::filter(Cancer=="BRCA") %>% 
  dplyr::select(Sample, Code) %>% 
  dplyr::rename(Group=Code)
head(group_info)
## # A tibble: 6 × 2
##   Sample          Group
##   <chr>           <chr>
## 1 TCGA-3C-AAAU-01 TP   
## 2 TCGA-3C-AALI-01 TP   
## 3 TCGA-3C-AALJ-01 TP   
## 4 TCGA-3C-AALK-01 TP   
## 5 TCGA-4H-AAAK-01 TP   
## 6 TCGA-5L-AAT0-01 TP
ids = c("TP53", "KRAS", "PTEN", "MDM2", "CDKN1A")
vis_dim_dist(ids = ids, data_type = "mRNA", 
             group_info= group_info)
The dimension reduction analysis (PCA) of 5 mRNA molcules in BRCA cancer samples grouped by tissue codes

Figure 5.15: The dimension reduction analysis (PCA) of 5 mRNA molcules in BRCA cancer samples grouped by tissue codes

5.2 PCAWG analysis

Table 5.2: Specilized functions to analyze PCAWG molecular data
Database Type Function
PCAWG Comparison vis_pcawg_dist()
PCAWG Correlation vis_pcawg_gene_cor()
PCAWG Survival vis_pcawg_unicox_tree()

5.2.1 Comparsion analysis

5.2.1.1 vis_pcawg_dist()

Compare molecular value between tumor and normal samples across pan-cancer. (Custom module)

  • Basic use: vis_pcawg_dist(Gene= ,data_type= )
vis_pcawg_dist(Gene = "TP53", data_type = "mRNA")

5.2.2 Correlation analysis

5.2.2.1 vis_pcawg_gene_cor()

Calculate the correlation between two molecules value in tumor samples of one cancer. (Custom module)

  • Basic use: vis_pcawg_gene_cor(Gene1= ,data_type1 = ,Gene2 = ,data_type2 = ,dcc_project_code_choose=)
vis_pcawg_gene_cor(Gene1 = "CSF1R", data_type1 = "mRNA",
                   Gene2 = "JAK3", data_type2 = "mRNA",
                   dcc_project_code_choose = "BLCA-US")

5.2.3 Survival analysis

5.2.3.1 vis_pcawg_unicox_tree()

Perform the Cox regression analysis (OS) of one molecule across pan-cancers. (Custom module)

  • Basic use: vis_pcawg_unicox_tree(Gene= , data_type= )
vis_pcawg_unicox_tree(Gene = "TP53", data_type = "mRNA")
The Cox regression analysis (OS) of mRNA TP53 across pan-cancers

Figure 5.16: The Cox regression analysis (OS) of mRNA TP53 across pan-cancers

By default, the median data of molecular data is used to divided into two groups for Cox regression analysis. It can be modified in corresponding paramters.

5.3 CCLE analysis

Table 5.3: Specilized functions to analyze CCLE molecular data
Database Type Function
CCLE Comparison vis_ccle_tpm()
CCLE Comparison vis_gene_drug_response_diff()
CCLE Correlation vis_ccle_gene_cor()
CCLE Correlation vis_gene_drug_response_asso()

5.3.1 Comparsion analysis

5.3.1.1 vis_ccle_tpm()

Compare molecular value among different tissues of cancer cell lines. (Custom module)

  • Basic use: vis_ccle_tpm(Gene= ,data_type= )
vis_ccle_tpm(Gene = "TP53", data_type = "mRNA")

5.3.2 Correlation analysis

5.3.2.1 vis_ccle_gene_cor()

Calculate the correlation between two molecules value in one tissue type of cancer cell lines. (Custom module)

  • Basic use: vis_ccle_gene_cor(Gene1= ,data_type1= ,Gene2= ,data_type2= ,SitePrimary= )
vis_ccle_gene_cor(Gene1 = "CSF1R", data_type1 = "mRNA", 
                  Gene2 = "JAK3", data_type2 = "mRNA", 
                  SitePrimary = "prostate")

vis_gene_drug_response_diff() and vis_gene_drug_response_asso() are initially designed for drug pharmacogenomics analysis. In the updated shiny application, we have provided more comprehensive pharmacogenomics analysis.

5.4 General analysis

Table 5.4: Specilized functions to analyze General molecular data
Database Type Function
General Comparison vis_identifier_grp_comparison()
General Correlation vis_identifier_cor()
General Correlation vis_identifier_multi_cor()
General Survival vis_identifier_grp_surv()
General Dimension Reduction vis_identifier_dim_dist()

5.4.1 Comparison analysis

5.4.1.1 vis_identifier_grp_comparison()

Compare molecular value between custom groups based on one genomics matrix UCSC Xena dataset. (Custom module)

  • Basic use: vis_identifier_grp_comparison(dataset= , id= ,grp_df= )
# Firstly, prepare custom groups of samples
library(UCSCXenaTools)
cli_df <- XenaGenerate(
  subset = XenaDatasets == "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
) %>%
  XenaQuery() %>%
  XenaDownload() %>%
  XenaPrepare()
grp_df = cli_df[, c("sampleID", "pathologic_M")] %>%
  dplyr::filter(pathologic_M %in% c("M0", "M1", "MX"))
head(grp_df) # col-1: sample; col-2: grouping info
## # A tibble: 6 × 2
##   sampleID        pathologic_M
##   <chr>           <chr>       
## 1 TCGA-05-4244-01 M1          
## 2 TCGA-05-4249-01 M0          
## 3 TCGA-05-4250-01 M0          
## 4 TCGA-05-4382-01 M0          
## 5 TCGA-05-4384-01 M0          
## 6 TCGA-05-4389-01 M0
mol_dataset <- "TCGA.LUAD.sampleMap/HiSeqV2_percentile"
vis_identifier_grp_comparison(dataset = mol_dataset, id = "TP53", 
                              grp_df = grp_df)
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf

5.4.2 Correlation analysis

5.4.2.1 vis_identifier_cor()

Calculate the correlation between two molecules value from genomics matrix UCSC Xena datasets. (Custom module)

  • Basic use: vis_identifier_cor(dataset= ,id1= ,dataset= ,id2= )
dataset <- "TcgaTargetGtex_rsem_isoform_tpm"
vis_identifier_cor(dataset1 = dataset, id1 = "TP53",
                   dataset2 = dataset, id2 = "KRAS")
## Warning: package 'ggpubr' was built under R version 4.2.3
## Warning: package 'ggplot2' was built under R version 4.2.3

5.4.2.2 vis_identifier_multi_cor()

Calculate the pairwise correlation among multiple molecules value from one genomics matrix UCSC Xena dataset. (Custom module)

  • Basic use: vis_identifier_multi_cor(dataset= ,ids= )
dataset <- "TcgaTargetGtex_rsem_isoform_tpm"
vis_identifier_multi_cor(dataset = dataset,
                         ids = c("TP53", "KRAS", "PTEN"))

5.4.3 Survival analysis

5.4.3.1 vis_identifier_grp_surv()

Perform the log-rank test of one molecule for one genomics matrix UCSC Xena dataset. (Custom module)

  • Basic use: vis_identifier_grp_surv(dataset= , id= , surv_df= )
# Firstly, prepare survival data of samples
library(UCSCXenaTools)
cli_df <- XenaGenerate(
  subset = XenaDatasets == "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
) %>%
  XenaQuery() %>%
  XenaDownload() %>%
  XenaPrepare()
surv_df <- cli_df[, c("sampleID", "days_to_death", "vital_status")]
surv_df$vital_status <- ifelse(surv_df$vital_status == "DECEASED", 1, 0)
surv_df = na.omit(surv_df)
head(surv_df)  # col-1: sample; col-2: survival time; col-3: survival status
## # A tibble: 6 × 3
##   sampleID        days_to_death vital_status
##   <chr>                   <dbl>        <dbl>
## 1 TCGA-05-4250-01           121            1
## 2 TCGA-05-4395-01             0            1
## 3 TCGA-05-4396-01           303            1
## 4 TCGA-05-4397-01           731            1
## 5 TCGA-05-4402-01           244            1
## 6 TCGA-05-4415-01            91            1
mol_dataset <- "TCGA.LUAD.sampleMap/HiSeqV2_percentile"
vis_identifier_grp_surv(dataset = mol_dataset, id = "KRAS", 
                        surv_df = surv_df)
## Warning in do_once((if (is_R_CMD_check()) stop else warning)("The function
## xfun::isFALSE() will be deprecated in the future. Please ", : The function
## xfun::isFALSE() will be deprecated in the future. Please consider using
## base::isFALSE(x) or identical(x, FALSE) instead.
The log-rank test (DSS) of mRNA KRAS for ne specific dataset

Figure 5.17: The log-rank test (DSS) of mRNA KRAS for ne specific dataset

By default, the best cutoff is decided. User can change it through the cutoff_mode parameter.

5.4.4 Dimension reduction

5.4.4.1 vis_identifier_dim_dist()

Perform dimension reduction analysis of multiple molecules for samples in groups. (Custom module)

  • Basic use: vis_identifier_dim_dist(dataset= ,ids= , grp_df= )
# Firstly, prepare the grouping information of samples 
library(UCSCXenaTools)
cli_dataset <- "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
cli_df <- XenaGenerate(
  subset = XenaDatasets == cli_dataset
) %>%
  XenaQuery() %>%
  XenaDownload() %>%
  XenaPrepare()
grp_df = cli_df[, c("sampleID", "gender")]
head(grp_df) # col-1: sample; col-2: grouping info
## # A tibble: 6 × 2
##   sampleID        gender
##   <chr>           <chr> 
## 1 TCGA-05-4244-01 MALE  
## 2 TCGA-05-4249-01 MALE  
## 3 TCGA-05-4250-01 FEMALE
## 4 TCGA-05-4382-01 MALE  
## 5 TCGA-05-4384-01 MALE  
## 6 TCGA-05-4389-01 MALE
mol_dataset <- "TCGA.LUAD.sampleMap/HiSeqV2_percentile"
ids = c("TP53", "KRAS", "PTEN", "MDM2", "CDKN1A")
vis_identifier_dim_dist(dataset = mol_dataset,
                        ids = ids, 
                        grp_df = grp_df)
The dimension reduction analysis (PCA) of 5 mRNA molcules in BRCA cancer samples grouped by tissue codes

Figure 5.18: The dimension reduction analysis (PCA) of 5 mRNA molcules in BRCA cancer samples grouped by tissue codes