AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification.
Publication date: 2024 May 14
Authors:
Minghe Li, Huike Guo, Keao Wang, Chuanze Kang, Yanbin Yin, Han Zhang
Source:
COMPUTERS IN BIOLOGY AND MEDICINE
Abstract:
Integration analysis of cancer multi-omics data for pan-cancer classification has potential clinical applications in areas such as tumor diagnosis, analysis of clinically significant features, and precision medicine. In these applications, embedding and feature selection on high-dimensional multi-omics data are clinically necessary. Recently, deep learning algorithms have become the most promising methods for integrative analysis of cancer multi-omics data, owing to their powerful capability to capture nonlinear relationships. Given the high dimensionality and heterogeneity of these data, developing effective deep learning architectures for cancer multi-omics embedding and feature selection remains a challenge for researchers. In this paper, we propose a novel two-phase deep learning model named AVBAE-MODFR for pan-cancer classification. AVBAE-MODFR achieves embedding with a multi2multi autoencoder based on the adversarial variational Bayes (AVB) method and then performs feature selection with a dual-net-based feature ranking method. AVBAE-MODFR uses AVBAE to pre-train the network parameters, which improves classification performance and enhances the stability of feature ranking in MODFR. First, AVBAE learns high-quality representations across multiple omics features for unsupervised pan-cancer classification. We design an efficient discriminator architecture that distinguishes between latent distributions in order to update the forward variational parameters. Second, we propose MODFR, which simultaneously evaluates the importance of multi-omics features for feature selection by training a purpose-designed multi2one selector network. Its evaluation approach, based on the average gradient over random mask subsets, can avoid bias caused by input feature drift. We conduct experiments on the TCGA pan-cancer dataset and compare each phase against four state-of-the-art (SOTA) methods. The results show the superiority of AVBAE-MODFR over SOTA methods.
Copyright © 2024 Elsevier Ltd. All rights reserved.
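AVBAE builds on adversarial variational Bayes. In that approach, a discriminator T(x, z) is trained to separate latent codes drawn from the encoder's posterior q(z|x) from codes drawn from the prior p(z). With balanced classes, the optimal discriminator logit equals log q(z|x) - log p(z), so its average over posterior samples estimates the KL term of the variational objective.

The minimal, stdlib-only sketch below illustrates only that density-ratio step. It rests on simplifying assumptions:
- a 1-D latent with no conditioning on x;
- Gaussian q = N(mu, 1) and p = N(0, 1);
- a linear logistic-regression discriminator, which is exact for this pair.

The function names are hypothetical, and this is not the discriminator architecture designed in the paper.

```python
import math
import random
from typing import List, Tuple


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_discriminator(
    q_samples: List[float], p_samples: List[float], lr: float = 1.0, steps: int = 300
) -> Tuple[float, float]:
    """Fit a hypothetical linear discriminator T(z) = a*z + b by logistic regression.

    Label 1 = sample from q (posterior), label 0 = sample from p (prior).
    With balanced classes the optimal logit is T*(z) = log q(z) - log p(z).
    """
    data = [(z, 1.0) for z in q_samples] + [(z, 0.0) for z in p_samples]
    a, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for z, y in data:
            err = sigmoid(a * z + b) - y  # derivative of binary cross-entropy w.r.t. the logit
            grad_a += err * z
            grad_b += err
        # Full-batch gradient descent on the mean loss.
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def kl_estimate(a: float, b: float, q_samples: List[float]) -> float:
    # KL(q || p) = E_q[log q(z) - log p(z)], approximated by the mean of T(z) over q samples.
    return sum(a * z + b for z in q_samples) / len(q_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    mu = 1.0
    q = [rng.gauss(mu, 1.0) for _ in range(2000)]   # assumed "posterior" N(mu, 1)
    p = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # assumed prior N(0, 1)
    a, b = train_discriminator(q, p)
    print(f"T(z) ~ {a:.2f}*z + {b:.2f}  (exact: {mu:.2f}*z + {-mu**2 / 2:.2f})")
    print(f"KL estimate {kl_estimate(a, b, q):.3f} vs exact {mu**2 / 2:.3f}")
```

For these Gaussians the exact log ratio is mu*z - mu^2/2 and KL(q || p) = mu^2/2. The fitted coefficients and the KL estimate can therefore be checked against closed-form values. In an AVB-style autoencoder, T would instead be a neural network taking both x and z.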
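MODFR ranks features using the average gradient over random mask subsets, but the abstract gives no exact formula. As a rough illustration only, the sketch below assumes one plausible reading. The importance of feature j is taken as the mean absolute gradient of a model's output with respect to a binary mask entry m_j, averaged over samples and randomly drawn masks.

Several pieces are hypothetical choices rather than the authors' implementation:
- the names `masked_gradient_importance`, `rank_features`, and `toy_logistic`;
- the keep probability;
- central finite differences in place of backpropagation through a trained multi2one selector network.

```python
import math
import random
from typing import Callable, List, Sequence

# A "model" here is any scalar-valued function of a (masked) feature vector.
Model = Callable[[Sequence[float]], float]


def masked_gradient_importance(
    model: Model,
    samples: Sequence[Sequence[float]],
    n_masks: int = 50,
    keep_prob: float = 0.5,
    eps: float = 1e-4,
    seed: int = 0,
) -> List[float]:
    """Score feature j by the mean |d model(x * m) / d m_j| over samples x and random masks m."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    totals = [0.0] * n_features
    count = 0
    for x in samples:
        for _ in range(n_masks):
            # Random subset of features: 1.0 keeps a feature, 0.0 drops it.
            mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in range(n_features)]
            for j in range(n_features):
                # Central finite difference on mask entry j (stands in for autograd).
                up, down = list(mask), list(mask)
                up[j] += eps
                down[j] -= eps
                f_up = model([xi * mi for xi, mi in zip(x, up)])
                f_down = model([xi * mi for xi, mi in zip(x, down)])
                totals[j] += abs(f_up - f_down) / (2.0 * eps)
            count += 1
    return [t / count for t in totals]


def rank_features(scores: Sequence[float]) -> List[int]:
    """Feature indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def toy_logistic(z: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained selector network (fixed weights)."""
    weights = [3.0, 0.0, -1.0, 0.5]
    logit = sum(w * zi for w, zi in zip(weights, z))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    data = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
    scores = masked_gradient_importance(toy_logistic, data)
    print([round(s, 4) for s in scores])
    print(rank_features(scores))  # -> [0, 2, 3, 1]: larger |weight| ranks higher
```

Averaging over many random masks means a feature's score does not hinge on one fixed configuration of the other inputs. This is one way to read the abstract's claim about avoiding bias from input feature drift. In a real selector network, the finite-difference loop would be replaced by automatic differentiation.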