Chapter 5 Molecular data analysis
For each function, we will introduce its basic use and example output. Users can check all analysis or visualization parameters available by clicking the corresponding title link.
For most functions, the first parameter is the molecular identifier of one data type. Users can also design a molecular signature composed of multiple molecules (e.g. `TP53 + 2 * KRAS - 1.3 * PTEN`). Where alternative datasets are available for a molecular type, users can select among them through the `opt_pancan` parameter (see more via `str(.opt_pancan)`).
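To make the meaning of a signature formula concrete, the following base R sketch evaluates the example expression on a small toy table. The data frame `expr` and its values are made up for illustration, and this is not necessarily how the package evaluates signatures internally; it only shows what the formula computes for each sample.

```r
# Toy expression values (made-up numbers), one row per sample
expr <- data.frame(
  TP53 = c(4.5, 5.9, 5.3),
  KRAS = c(3.1, 2.8, 3.6),
  PTEN = c(4.0, 4.4, 3.9),
  row.names = c("S1", "S2", "S3")
)

# The signature is an arithmetic expression over gene symbols
signature <- "TP53 + 2 * KRAS - 1.3 * PTEN"

# Evaluate the expression using the columns of `expr` as variables,
# which yields one signature score per sample
score <- eval(parse(text = signature), envir = expr)
names(score) <- rownames(expr)
score
```

Each sample gets a single score, which can then be treated like the value of one molecule in downstream comparison, correlation, or survival analysis.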
5.1 TCGA analysis
Database | Type | Function |
---|---|---|
TCGA | Comparison | vis_toil_TvsN() |
TCGA | Comparison | vis_toil_TvsN_cancer() |
TCGA | Comparison | vis_toil_Mut() |
TCGA | Comparison | vis_toil_Mut_cancer() |
TCGA | Correlation | vis_gene_cor() |
TCGA | Correlation | vis_gene_cor_cancer() |
TCGA | Correlation | vis_gene_TIL_cor() |
TCGA | Correlation | vis_gene_immune_cor() |
TCGA | Correlation | vis_gene_tmb_cor() |
TCGA | Correlation | vis_gene_msi_cor() |
TCGA | Correlation | vis_gene_stemness_cor() |
TCGA | Correlation | vis_gene_pw_cor() |
TCGA | Survival | tcga_surv_plot() |
TCGA | Survival | vis_unicox_tree() |
TCGA | Dimension Reduction | vis_dim_dist() |
5.1.1 Comparison analysis
5.1.1.1 vis_toil_TvsN()
Compare molecular value between tumor and normal samples across pan-cancer. (Custom module)
- Basic use:
vis_toil_TvsN(Gene=, data_type=)
Tips: For parameter `data_type`, one of 4 molecular types, c("mRNA", "transcript", "methylation", "miRNA"), is supported.
5.1.1.2 vis_toil_TvsN_cancer()
Compare molecular value between tumor and normal samples in one cancer. (Custom module)
- Basic use:
vis_toil_TvsN_cancer(Gene=, data_type=, Cancer=)
Tips: For parameter `data_type`, all molecular types supported by the function `query_pancan_value()` are applicable.
5.1.1.3 vis_toil_Mut()
Compare molecular value between mutation and wild tumor samples across pan-cancer. (Custom module)
- Basic use:
vis_toil_Mut(mut_Gene=, Gene=, data_type=)
Tips: For parameter `data_type`, one of 4 molecular types, c("mRNA", "transcript", "methylation", "miRNA"), is supported.
5.1.1.4 vis_toil_Mut_cancer()
Compare molecular value between mutation and wild tumor samples in one cancer. (Custom module)
- Basic use:
vis_toil_Mut_cancer(Gene=, data_type=, Cancer=)
Tips: For parameter `data_type`, all molecular types supported by the function `query_pancan_value()` are applicable.
5.1.2 Correlation analysis
5.1.2.1 vis_gene_cor()
Calculate the correlation between the values of two molecules in tumor samples across pan-cancer. (Custom module)
- Basic use:
vis_gene_cor(Gene1=, data_type1=, Gene2=, data_type2=)
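As a rough illustration of what a per-cancer correlation involves, the base R sketch below correlates two simulated genes within each cancer type. The toy data frame and the choice of Spearman correlation are assumptions made for this example only; the package function may use different defaults and real TCGA data.

```r
set.seed(1)

# Toy data: simulated values of two genes in tumor samples of two cancer types
toy <- data.frame(
  cancer = rep(c("ACC", "BLCA"), each = 20),
  gene1  = rnorm(40)
)
toy$gene2 <- 0.6 * toy$gene1 + rnorm(40, sd = 0.5)

# Run one correlation test per cancer type and collect the results
cor_by_cancer <- do.call(rbind, lapply(split(toy, toy$cancer), function(d) {
  ct <- cor.test(d$gene1, d$gene2, method = "spearman")
  data.frame(
    cancer  = d$cancer[1],
    rho     = unname(ct$estimate),
    p.value = ct$p.value
  )
}))
cor_by_cancer
```

The result has one row per cancer type, holding the correlation coefficient and its p-value.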
5.1.2.2 vis_gene_cor_cancer()
Calculate the correlation between the values of two molecules in tumor samples of one cancer. (Custom module)
- Basic use:
vis_gene_cor_cancer(Gene1=, data_type1=, Gene2=, data_type2=, cancer_choose=)
vis_gene_cor_cancer(Gene1 = "CSF1R", data_type1 = "mRNA",
Gene2 = "JAK3", data_type2 = "mRNA",
cancer_choose = "ACC")
5.1.2.3 vis_gene_TIL_cor()
Calculate the correlation between one molecule and one type of TIL in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_TIL_cor(Gene= ,data_type= ,sig=)
## [1] "CIBERSORT" "CIBERSORT-ABS" "EPIC" "MCPCOUNTER"
## [5] "QUANTISEQ" "TIMER" "XCELL"
sig = paste(tcga_ids$id_TIL$TIMER$Level3,
tcga_ids$id_TIL$TIMER$Level2, sep = "_")
vis_gene_TIL_cor(Gene = "TP53", data_type = "mRNA",
sig = sig)
5.1.2.4 vis_gene_immune_cor()
Calculate the correlation between one molecule and one type of Immune signature in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_immune_cor(Gene= ,data_type= ,sig=)
tcga_pan_immune_signature <- load_data("tcga_pan_immune_signature")
table(tcga_pan_immune_signature$Source)
##
## Attractors Bindea c7atoms Cibersort ICR Wolf Yasin
## 9 25 32 20 3 68 3
5.1.2.5 vis_gene_tmb_cor()
Calculate the correlation between one molecule and TMB score in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_tmb_cor(Gene= , data_type= )
5.1.2.6 vis_gene_msi_cor()
Calculate the correlation between one molecule and MSI score in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_msi_cor(Gene= , data_type= )
5.1.2.7 vis_gene_stemness_cor()
Calculate the correlation between one molecule and stemness score in tumor samples across pan-cancers. (Custom module)
- Basic use:
vis_gene_stemness_cor(Gene= , data_type= )
5.1.2.8 vis_gene_pw_cor()
Calculate the correlation between one molecule and pathway score in tumor samples of one cancer. (Custom module)
- Basic use:
vis_gene_pw_cor(Gene= , data_type= , pw_name= , cancer_choose= )
vis_gene_pw_cor(Gene = "TP53", data_type = "mRNA",
pw_name = "HALLMARK_ADIPOGENESIS",
cancer_choose = "ACC")
5.1.3 Survival analysis
5.1.3.1 tcga_surv_plot()
Perform the log-rank test of one molecule for one cancer. (Custom module)
- Basic use:
tcga_surv_plot(data=, time= , status= )
# Firstly, prepare the molecular value as well as survival data
data <- tcga_surv_get(item = "TP53",profile = "mRNA",
TCGA_cohort = "LUAD")
head(data)
## # A tibble: 6 × 13
## sampleID value OS OS.time DSS DSS.time DFI DFI.time PFI PFI.time
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 TCGA-05-4420… 4.51 0 912 0 912 0 912 0 912
## 2 TCGA-91-6840… 5.90 0 372 0 372 0 372 0 372
## 3 TCGA-44-6778… 5.30 0 1864 0 1864 0 1864 0 1864
## 4 TCGA-67-3774… 5.22 0 385 0 385 NA NA 0 385
## 5 TCGA-64-1679… 5.46 0 2488 0 2488 0 2488 0 2488
## 6 TCGA-55-6982… 4.54 1 995 1 995 NA NA 1 183
## # ℹ 3 more variables: gender <chr>, age <dbl>, stage <chr>
By default, samples are divided into two groups at the median of the molecular value for the log-rank test. This can be changed through the corresponding parameters.
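The median split followed by a log-rank test can be sketched as below with the `survival` package, which is a recommended package shipped with standard R installations. The toy values are made up, and assigning samples strictly above the median to the "High" group is an assumption of this sketch rather than a documented behavior of `tcga_surv_plot()`.

```r
library(survival)

# Toy data (made-up values): molecular value, follow-up time in days, event status
toy <- data.frame(
  value  = c(4.5, 5.9, 5.3, 5.2, 5.5, 4.6, 6.1, 4.9),
  time   = c(912, 372, 1864, 385, 2488, 995, 210, 1500),
  status = c(0, 1, 0, 1, 0, 1, 1, 0)
)

# Split samples into two groups at the median molecular value
toy$group <- ifelse(toy$value > median(toy$value), "High", "Low")

# Log-rank test comparing survival between the two groups
fit <- survdiff(Surv(time, status) ~ group, data = toy)
fit
```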
5.1.3.2 vis_unicox_tree()
Perform the Cox regression analysis of one molecule across pan-cancers. (Custom module)
- Basic use:
vis_unicox_tree(Gene= , data_type= , measure=)
By default, samples are divided into two groups at the median of the molecular value for Cox regression analysis. This can be changed through the corresponding parameters.
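For readers who want to see the underlying idea, here is a minimal sketch of a univariate Cox regression on a median-split group using `survival::coxph()`. The data are simulated, and the grouping rule and reference level ("Low") are assumptions of this example, not a description of how `vis_unicox_tree()` is implemented.

```r
library(survival)
set.seed(42)

# Simulated molecular values for 60 samples
n <- 60
value <- rnorm(n, mean = 5)

# Median split, with "Low" as the reference level
group <- factor(ifelse(value > median(value), "High", "Low"),
                levels = c("Low", "High"))

# Simulated survival times: the High group is given a higher hazard
time   <- rexp(n, rate = ifelse(group == "High", 0.002, 0.001))
status <- rbinom(n, 1, 0.7)
toy    <- data.frame(time, status, group)

# Univariate Cox model; the hazard ratio compares High against Low
fit <- coxph(Surv(time, status) ~ group, data = toy)
summary(fit)$conf.int
```

Repeating this fit for each cancer type and collecting the hazard ratios with their confidence intervals gives the kind of table that a forest-style plot summarizes.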
5.1.4 Dimension reduction
5.1.4.1 vis_dim_dist()
Perform dimension reduction analysis of multiple molecules for samples in groups. (Custom module)
- Basic use:
vis_dim_dist(ids=, data_type= ,group_info= )
# Firstly, prepare the grouping information of samples
group_info = tcga_clinical_fine %>%
dplyr::filter(Cancer=="BRCA") %>%
dplyr::select(Sample, Code) %>%
dplyr::rename(Group=Code)
head(group_info)
## # A tibble: 6 × 2
## Sample Group
## <chr> <chr>
## 1 TCGA-3C-AAAU-01 TP
## 2 TCGA-3C-AALI-01 TP
## 3 TCGA-3C-AALJ-01 TP
## 4 TCGA-3C-AALK-01 TP
## 5 TCGA-4H-AAAK-01 TP
## 6 TCGA-5L-AAT0-01 TP
ids = c("TP53", "KRAS", "PTEN", "MDM2", "CDKN1A")
vis_dim_dist(ids = ids, data_type = "mRNA",
group_info= group_info)
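The idea behind this kind of plot can be sketched with base R alone. The example below runs PCA with `stats::prcomp()` on a simulated expression matrix and attaches the group labels to the first two components; PCA is used here only as one common dimension reduction method, and the matrix, group sizes, and shift applied to one gene are all made up for illustration.

```r
set.seed(123)

# Toy expression matrix: 30 samples x 5 genes
ids  <- c("TP53", "KRAS", "PTEN", "MDM2", "CDKN1A")
expr <- matrix(rnorm(30 * 5), nrow = 30,
               dimnames = list(paste0("S", 1:30), ids))

# Two sample groups; shift one gene in the first group so the groups differ
group <- rep(c("TP", "NT"), each = 15)
expr[group == "TP", "TP53"] <- expr[group == "TP", "TP53"] + 2

# PCA on centered and scaled values
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Keep the first two components together with the group label
coords <- data.frame(pca$x[, 1:2], Group = group)
head(coords)
```

Plotting `PC1` against `PC2` and coloring points by `Group` shows how well the selected molecules separate the sample groups.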
5.2 PCAWG analysis
Database | Type | Function |
---|---|---|
PCAWG | Comparison | vis_pcawg_dist() |
PCAWG | Correlation | vis_pcawg_gene_cor() |
PCAWG | Survival | vis_pcawg_unicox_tree() |
5.2.1 Comparison analysis
5.2.1.1 vis_pcawg_dist()
Compare molecular value between tumor and normal samples across pan-cancer. (Custom module)
- Basic use:
vis_pcawg_dist(Gene= ,data_type= )
5.2.2 Correlation analysis
5.2.2.1 vis_pcawg_gene_cor()
Calculate the correlation between the values of two molecules in tumor samples of one cancer. (Custom module)
- Basic use:
vis_pcawg_gene_cor(Gene1= ,data_type1 = ,Gene2 = ,data_type2 = ,dcc_project_code_choose=)
5.2.3 Survival analysis
5.2.3.1 vis_pcawg_unicox_tree()
Perform the Cox regression analysis (OS) of one molecule across pan-cancers. (Custom module)
- Basic use:
vis_pcawg_unicox_tree(Gene= , data_type= )
By default, samples are divided into two groups at the median of the molecular value for Cox regression analysis. This can be changed through the corresponding parameters.
5.3 CCLE analysis
Database | Type | Function |
---|---|---|
CCLE | Comparison | vis_ccle_tpm() |
CCLE | Comparison | vis_gene_drug_response_diff() |
CCLE | Correlation | vis_ccle_gene_cor() |
CCLE | Correlation | vis_gene_drug_response_asso() |
5.3.1 Comparison analysis
5.3.1.1 vis_ccle_tpm()
Compare molecular value among different tissues of cancer cell lines. (Custom module)
- Basic use:
vis_ccle_tpm(Gene= ,data_type= )
5.3.2 Correlation analysis
5.3.2.1 vis_ccle_gene_cor()
Calculate the correlation between the values of two molecules in one tissue type of cancer cell lines. (Custom module)
- Basic use:
vis_ccle_gene_cor(Gene1= ,data_type1= ,Gene2= ,data_type2= ,SitePrimary= )
vis_ccle_gene_cor(Gene1 = "CSF1R", data_type1 = "mRNA",
Gene2 = "JAK3", data_type2 = "mRNA",
SitePrimary = "prostate")
`vis_gene_drug_response_diff()` and `vis_gene_drug_response_asso()` were initially designed for drug pharmacogenomics analysis. The updated Shiny application provides a more comprehensive pharmacogenomics analysis.
5.4 General analysis
Database | Type | Function |
---|---|---|
General | Comparison | vis_identifier_grp_comparison() |
General | Correlation | vis_identifier_cor() |
General | Correlation | vis_identifier_multi_cor() |
General | Survival | vis_identifier_grp_surv() |
General | Dimension Reduction | vis_identifier_dim_dist() |
5.4.1 Comparison analysis
5.4.1.1 vis_identifier_grp_comparison()
Compare molecular value between custom groups based on one genomic matrix dataset from UCSC Xena. (Custom module)
- Basic use:
vis_identifier_grp_comparison(dataset= , id= ,grp_df= )
# Firstly, prepare custom groups of samples
library(UCSCXenaTools)
cli_df <- XenaGenerate(
subset = XenaDatasets == "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
) %>%
XenaQuery() %>%
XenaDownload() %>%
XenaPrepare()
grp_df = cli_df[, c("sampleID", "pathologic_M")] %>%
dplyr::filter(pathologic_M %in% c("M0", "M1", "MX"))
head(grp_df) # col-1: sample; col-2: grouping info
## # A tibble: 6 × 2
## sampleID pathologic_M
## <chr> <chr>
## 1 TCGA-05-4244-01 M1
## 2 TCGA-05-4249-01 M0
## 3 TCGA-05-4250-01 M0
## 4 TCGA-05-4382-01 M0
## 5 TCGA-05-4384-01 M0
## 6 TCGA-05-4389-01 M0
mol_dataset <- "TCGA.LUAD.sampleMap/HiSeqV2_percentile"
vis_identifier_grp_comparison(dataset = mol_dataset, id = "TP53",
grp_df = grp_df)
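The core of a custom group comparison is joining a grouping table to molecular values by sample ID and then testing for differences. The sketch below shows this in base R on simulated data; the column names, the group labels, and the use of a Kruskal-Wallis test are assumptions for illustration and may differ from what `vis_identifier_grp_comparison()` does by default.

```r
set.seed(7)

# Grouping table: column 1 is the sample ID, column 2 is the group
grp_df <- data.frame(
  sample = paste0("S", 1:30),
  group  = rep(c("M0", "M1", "MX"), each = 10)
)

# Simulated molecular values for the same samples
values <- data.frame(
  sample = paste0("S", 1:30),
  value  = rnorm(30, mean = rep(c(0.5, 0.6, 0.5), each = 10), sd = 0.1)
)

# Join by sample ID so each value carries its group label
merged <- merge(grp_df, values, by = "sample")

# Per-group medians and a non-parametric test across the three groups
aggregate(value ~ group, data = merged, FUN = median)
kt <- kruskal.test(value ~ group, data = merged)
kt$p.value
```

Samples present in only one of the two tables are dropped by `merge()`, so it is worth checking the number of rows after the join.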
5.4.2 Correlation analysis
5.4.2.1 vis_identifier_cor()
Calculate the correlation between the values of two molecules from genomic matrix datasets of UCSC Xena. (Custom module)
- Basic use:
vis_identifier_cor(dataset1= ,id1= ,dataset2= ,id2= )
dataset <- "TcgaTargetGtex_rsem_isoform_tpm"
vis_identifier_cor(dataset1 = dataset, id1 = "TP53",
dataset2 = dataset, id2 = "KRAS")
5.4.2.2 vis_identifier_multi_cor()
Calculate the pairwise correlation among the values of multiple molecules from one genomic matrix dataset of UCSC Xena. (Custom module)
- Basic use:
vis_identifier_multi_cor(dataset= ,ids= )
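A pairwise correlation among several molecules amounts to a correlation matrix. The following base R sketch computes one on simulated values; the matrix and the choice of Pearson correlation are illustrative assumptions and do not reflect the package defaults.

```r
set.seed(2024)

# Simulated values of three molecules across 50 samples
ids  <- c("TP53", "KRAS", "PTEN")
expr <- matrix(rnorm(50 * 3), ncol = 3, dimnames = list(NULL, ids))

# Pairwise correlation matrix (symmetric, with ones on the diagonal)
cor_mat <- cor(expr, method = "pearson")
round(cor_mat, 2)
```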
5.4.3 Survival analysis
5.4.3.1 vis_identifier_grp_surv()
Perform the log-rank test of one molecule from one genomic matrix dataset of UCSC Xena. (Custom module)
- Basic use:
vis_identifier_grp_surv(dataset= , id= , surv_df= )
# Firstly, prepare survival data of samples
library(UCSCXenaTools)
cli_df <- XenaGenerate(
subset = XenaDatasets == "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
) %>%
XenaQuery() %>%
XenaDownload() %>%
XenaPrepare()
surv_df <- cli_df[, c("sampleID", "days_to_death", "vital_status")]
surv_df$vital_status <- ifelse(surv_df$vital_status == "DECEASED", 1, 0)
surv_df = na.omit(surv_df)
head(surv_df) # col-1: sample; col-2: survival time; col-3: survival status
## # A tibble: 6 × 3
## sampleID days_to_death vital_status
## <chr> <dbl> <dbl>
## 1 TCGA-05-4250-01 121 1
## 2 TCGA-05-4395-01 0 1
## 3 TCGA-05-4396-01 303 1
## 4 TCGA-05-4397-01 731 1
## 5 TCGA-05-4402-01 244 1
## 6 TCGA-05-4415-01 91 1
mol_dataset <- "TCGA.LUAD.sampleMap/HiSeqV2_percentile"
vis_identifier_grp_surv(dataset = mol_dataset, id = "KRAS",
surv_df = surv_df)
By default, the best cutoff is determined automatically. Users can change it through the `cutoff_mode` parameter.
5.4.4 Dimension reduction
5.4.4.1 vis_identifier_dim_dist()
Perform dimension reduction analysis of multiple molecules for samples in groups. (Custom module)
- Basic use:
vis_identifier_dim_dist(dataset= ,ids= , grp_df= )
# Firstly, prepare the grouping information of samples
library(UCSCXenaTools)
cli_dataset <- "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
cli_df <- XenaGenerate(
subset = XenaDatasets == cli_dataset
) %>%
XenaQuery() %>%
XenaDownload() %>%
XenaPrepare()
grp_df = cli_df[, c("sampleID", "gender")]
head(grp_df) # col-1: sample; col-2: grouping info
## # A tibble: 6 × 2
## sampleID gender
## <chr> <chr>
## 1 TCGA-05-4244-01 MALE
## 2 TCGA-05-4249-01 MALE
## 3 TCGA-05-4250-01 FEMALE
## 4 TCGA-05-4382-01 MALE
## 5 TCGA-05-4384-01 MALE
## 6 TCGA-05-4389-01 MALE
mol_dataset <- "TCGA.LUAD.sampleMap/HiSeqV2_percentile"
ids = c("TP53", "KRAS", "PTEN", "MDM2", "CDKN1A")
vis_identifier_dim_dist(dataset = mol_dataset,
ids = ids,
grp_df = grp_df)