Novel multi-omics deconfounding variational autoencoders can obtain meaningful disease subtyping

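Impact factor: 7.7
Journal ranking: Biology, Tier 2 / Mathematical and Computational Biology, Tier 1; Biochemical Research Methods, Tier 2
Publication date: 2024 Sep 23
Authors: Zuqi Li, Sonja Katz, Edoardo Saccenti, David W Fardo, Peter Claes, Vitor A P Martins Dos Santos, Kristel Van Steen, Gennady V Roshchupkin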

Abstract

Unsupervised learning, particularly clustering, plays a pivotal role in disease subtyping and patient stratification, especially with the abundance of large-scale multi-omics data. Deep learning models, such as variational autoencoders (VAEs), can enhance clustering algorithms by leveraging inter-individual heterogeneity. However, the impact of confounders (external factors unrelated to the condition, e.g. batch effect or age) on clustering is often overlooked, introducing bias and spurious biological conclusions. In this work, we introduce four novel VAE-based deconfounding frameworks tailored for clustering multi-omics data. These frameworks effectively mitigate confounding effects while preserving genuine biological patterns. The deconfounding strategies employed include (i) removal of latent features correlated with confounders, (ii) a conditional VAE, (iii) adversarial training, and (iv) adding a regularization term to the loss function. Using real-life multi-omics data from The Cancer Genome Atlas, we simulated various confounding effects (linear, nonlinear, categorical, mixed) and assessed model performance across 50 repetitions based on reconstruction error, clustering stability, and deconfounding efficacy. Our results demonstrate that our novel models, particularly the conditional multi-omics VAE (cXVAE), successfully handle simulated confounding effects and recover biologically driven clustering structures. cXVAE accurately identifies patient labels and unveils meaningful pathological associations among cancer types, validating deconfounded representations. Furthermore, our study suggests that some of the proposed strategies, such as adversarial training, prove insufficient in confounder removal. In summary, our study contributes by proposing innovative frameworks for simultaneous multi-omics data integration, dimensionality reduction, and deconfounding in clustering. Benchmarking on open-access data offers guidance to end-users, facilitating meaningful patient stratification for optimized precision medicine.
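
To make strategy (i) concrete, the following is a minimal sketch of removing latent features that correlate with a confounder. It is not the authors' implementation: the abstract does not say how correlation is measured or which cutoff is used, so the Pearson correlation, the `threshold=0.3` default, and the function names `pearson` and `drop_confounded_features` are all assumptions made for illustration. The latent matrix here is toy data standing in for a VAE embedding.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_confounded_features(latent, confounder, threshold=0.3):
    """Keep latent columns whose |correlation| with the confounder is below threshold.

    latent: list of per-sample rows, each a list of latent values.
    confounder: one numeric value per sample (e.g. age).
    threshold: hypothetical cutoff, not taken from the paper.
    Returns (filtered_rows, kept_column_indices).
    """
    n_features = len(latent[0])
    kept = [
        j for j in range(n_features)
        if abs(pearson([row[j] for row in latent], confounder)) < threshold
    ]
    return [[row[j] for j in kept] for row in latent], kept


# Toy embedding: column 0 is a linear function of age (confounded),
# column 1 alternates sign and is nearly uncorrelated with age.
age = [float(a) for a in range(20, 60)]
latent = [[a / 10, 1.0 if i % 2 == 0 else -1.0] for i, a in enumerate(age)]

filtered, kept = drop_confounded_features(latent, age)
print(kept)  # [1]
```

The filtered embedding would then be passed to the clustering step. A correlation filter like this only catches linear dependence on the confounder, which is one reason to compare it against the other three strategies under nonlinear and categorical confounding.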