Research Updates
Novel multi-omics deconfounding variational autoencoders can obtain meaningful disease subtyping.

Published: 2024 Sep 23
Authors: Zuqi Li, Sonja Katz, Edoardo Saccenti, David W Fardo, Peter Claes, Vitor A P Martins Dos Santos, Kristel Van Steen, Gennady V Roshchupkin
Source: Briefings in Bioinformatics

Abstract:

Unsupervised learning, particularly clustering, plays a pivotal role in disease subtyping and patient stratification, especially with the abundance of large-scale multi-omics data. Deep learning models, such as variational autoencoders (VAEs), can enhance clustering algorithms by leveraging inter-individual heterogeneity. However, the impact of confounders (external factors unrelated to the condition, e.g. batch effect or age) on clustering is often overlooked, introducing bias and spurious biological conclusions. In this work, we introduce four novel VAE-based deconfounding frameworks tailored for clustering multi-omics data. These frameworks effectively mitigate confounding effects while preserving genuine biological patterns. The deconfounding strategies employed include (i) removal of latent features correlated with confounders, (ii) a conditional VAE, (iii) adversarial training, and (iv) adding a regularization term to the loss function. Using real-life multi-omics data from The Cancer Genome Atlas, we simulated various confounding effects (linear, nonlinear, categorical, mixed) and assessed model performance across 50 repetitions based on reconstruction error, clustering stability, and deconfounding efficacy. Our results demonstrate that our novel models, particularly the conditional multi-omics VAE (cXVAE), successfully handle simulated confounding effects and recover biologically driven clustering structures. cXVAE accurately identifies patient labels and unveils meaningful pathological associations among cancer types, validating the deconfounded representations. Furthermore, our study suggests that some of the proposed strategies, such as adversarial training, prove insufficient in confounder removal. In summary, our study contributes by proposing innovative frameworks for simultaneous multi-omics data integration, dimensionality reduction, and deconfounding in clustering. Benchmarking on open-access data offers guidance to end-users, facilitating meaningful patient stratification for optimized precision medicine. © The Author(s) 2024. Published by Oxford University Press.
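
As a rough illustration of strategy (i), removal of latent features correlated with confounders, the following minimal sketch filters the columns of a toy latent matrix by their Pearson correlation with a single continuous confounder. It is not the authors' implementation: the function names `pearson` and `drop_confounded_dims`, the 0.3 correlation threshold, and the simulated data are all assumptions made for this example, and a real pipeline would take the latent matrix from a trained VAE encoder.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def drop_confounded_dims(latent, confounder, threshold=0.3):
    """Keep latent dimensions whose |correlation| with the confounder
    is below `threshold` (hypothetical cutoff, chosen for this toy case).

    Returns (kept column indices, filtered latent rows).
    """
    n_dims = len(latent[0])
    keep = [
        j for j in range(n_dims)
        if abs(pearson([row[j] for row in latent], confounder)) < threshold
    ]
    return keep, [[row[j] for j in keep] for row in latent]


# Toy data standing in for a VAE latent space (assumption, not real omics):
# dimension 1 is driven by the confounder, dimensions 0 and 2 are not.
rng = random.Random(0)
n_samples = 500
confounder = [rng.gauss(0, 1) for _ in range(n_samples)]
latent = [
    [rng.gauss(0, 1), 2.0 * c + rng.gauss(0, 0.5), rng.gauss(0, 1)]
    for c in confounder
]

keep, deconfounded = drop_confounded_dims(latent, confounder)
print("kept latent dimensions:", keep)
```

The filtered matrix `deconfounded` would then be passed to a clustering algorithm in place of the full latent space. A linear correlation filter of this kind can only detect linear dependence on the confounder, which is one reason a study might also consider nonlinear, categorical, and mixed confounding effects.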