(1)NMF是非负矩阵分解(Non-negative Matrix Factorization)的缩写。它是将一个非负数据矩阵分解为两个非负矩阵的乘积,其中一个矩阵表示特征的基矩阵,另一个矩阵表示每个样本在这些特征上的系数矩阵。这样的分解可以将原始数据表示为一组非负基向量的加权组合,从而实现数据的降维和特征提取。

Non-Negative Matrix Factorization - GeeksforGeeks

(2)结合Bulk RNA-seq表达矩阵(Original Matrix)可以更直观地理解为拆分成:基因与细胞类型矩阵(基矩阵),以及细胞比例与样本矩阵(系数矩阵)。下面结合R包NMF学习其基础用法

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# 模拟数据:100个样本的30个基因表达矩阵
set.seed(42)
exp_mat = matrix(abs(rnorm(3000)), nrow = 30) 
colnames(exp_mat) = paste0("sp",1:100)
rownames(exp_mat) = paste0("gene",1:30)
dim(exp_mat)
# [1]  30 100
exp_mat[1:4,1:4]
#             sp1       sp2       sp3       sp4
# gene1 1.3709584 0.4554501 0.3672346 1.3921164
# gene2 0.5646982 0.7048373 0.1852306 0.4761739
# gene3 0.3631284 1.0351035 0.5818237 0.6503486
# gene4 0.6328626 0.6089264 1.3997368 1.3911105

1、基础用法

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
library(NMF)

# 支持算法:c("brunet","lee","ls-nmf","nsNMF","offset","pe-nmf","snmf/r","snmf/l") 
nmf_model <- nmf(exp_mat, rank = 3, method = "brunet", seed = 42)

# 获取基矩阵(基因--细胞)
basis_matrix <- basis(nmf_model)
dim(basis_matrix)
# [1] 30  3
head(basis_matrix)
#            [,1]      [,2]      [,3]
# gene1 1.7436283 0.7842223 3.0718832
# gene2 1.0542018 0.9599996 2.8275129
# gene3 2.7498596 1.4277110 1.0828163
# gene4 0.8918389 1.8432274 2.7861665
# gene5 4.2662810 0.8883752 0.3467359
# gene6 5.8054591 0.6420813 0.2967837

# 获取系数矩阵(细胞--样本)
coef_matrix <- coef(nmf_model)
dim(coef_matrix)
# [1]   3 100
head(coef_matrix[,1:3])
#              sp1        sp2        sp3
# [1,] 0.002897117 0.02866644 0.02137146
# [2,] 0.131645724 0.14692969 0.04635707
# [3,] 0.022945508 0.05598917 0.11973042
# [4,] 0.169223269 0.06491003 0.13558182
# [5,] 0.088389864 0.10569255 0.12587923

2、挑选Rank

1
2
3
4
5
6
# 遍历2~10种候选值,每个重复10次
nmf_model_candi <- nmf(exp_mat, rank = 2:10, method = "brunet", nrun=10)

pdf("candi_ranks.pdf", width=8, height=8)
plot(nmf_model_candi)
dev.off()
image-20230617211813871
1
2
3
pdf("candi_ranks_consensusmap.pdf", width=10, height=10)
consensusmap(nmf_model_candi)
dev.off()
image-20230617211951589

3、结果可视化

根据上述结果,选择k=4作为比较合适的rank值

1
2
3
nmf_model = nmf(exp_mat, rank = 4, method = "brunet", nrun=100)
basis_matrix <- basis(nmf_model) 
coef_matrix <- coef(nmf_model)   

参考https://cran.r-project.org/web/packages/NMF/vignettes/heatmaps.pdf,进行绘图

  • 可视化基矩阵(基因–细胞)
1
2
3
pdf("basismap.pdf", width = 10, height = 10)
basismap(nmf_model)
dev.off()
image-20230617212631388
  • 可视化系数矩阵(细胞–样本)
1
2
3
pdf("coefmap.pdf", width = 10, height = 10)
coefmap(nmf_model)
dev.off()
image-20230617212835879
  • 可视化样本consensus聚类
1
2
3
pdf("consensusmap.pdf", width = 10, height = 10)
consensusmap(nmf_model, labCol=NA, labRow=NA)
dev.off()
image-20230617212924641

4、知二求一

1
2
3
4
5
6
set.seed(42)
V = matrix(abs(rnorm(50)), nrow =5)
set.seed(42)
W = matrix(abs(rnorm(20)), nrow =5)
set.seed(42)
H = matrix(abs(rnorm(40)), nrow =4)
  • 根据原始矩阵V基矩阵W,求系数矩阵H
1
2
3
4
nmf_obj <- nmf(V, 4,  W = W, method = "brunet", nrun = 100)
fit(nmf_obj)
H2 <- coef(nmf_obj)
H2[,1:4]
  • 根据原始矩阵V系数矩阵H,求基矩阵W
1
2
3
4
nmf_obj <- nmf(V, 4,  H = H, method = "brunet", nrun = 100)
fit(nmf_obj)
W2 <- coef(nmf_obj)
W2[,1:4]

*在nrun足够大的情况下,可以得到相对稳定的结果