Frontier Research Digest
Focusing on the latest research on tumors and tumor organoids, with first-hand updates.

Novel multi-omics deconfounding variational autoencoders can obtain meaningful disease subtyping

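Impact factor: 7.7
Journal zone: Biology Zone 2 / Mathematical & Computational Biology Zone 1, Biochemical Research Methods Zone 2
Published: 2024 Sep 23
Authors: Zuqi Li, Sonja Katz, Edoardo Saccenti, David W Fardo, Peter Claes, Vitor A P Martins Dos Santos, Kristel Van Steen, Gennady V Roshchupkin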
DOI: 10.1093/bib/bbae512

Abstract

Unsupervised learning, particularly clustering, plays a pivotal role in disease subtyping and patient stratification, especially with the abundance of large-scale multi-omics data. Deep learning models, such as variational autoencoders (VAEs), can enhance clustering algorithms by leveraging inter-individual heterogeneity. However, the impact of confounders (external factors unrelated to the condition, e.g. batch effect or age) on clustering is often overlooked, introducing bias and spurious biological conclusions. In this work, we introduce four novel VAE-based deconfounding frameworks tailored for clustering multi-omics data. These frameworks effectively mitigate confounding effects while preserving genuine biological patterns. The deconfounding strategies employed include (i) removal of latent features correlated with confounders, (ii) a conditional VAE, (iii) adversarial training, and (iv) adding a regularization term to the loss function. Using real-life multi-omics data from The Cancer Genome Atlas, we simulated various confounding effects (linear, nonlinear, categorical, mixed) and assessed model performance across 50 repetitions based on reconstruction error, clustering stability, and deconfounding efficacy. Our results demonstrate that our novel models, particularly the conditional multi-omics VAE (cXVAE), successfully handle simulated confounding effects and recover biologically driven clustering structures. cXVAE accurately identifies patient labels and unveils meaningful pathological associations among cancer types, validating deconfounded representations. Furthermore, our study suggests that some of the proposed strategies, such as adversarial training, prove insufficient in confounder removal. In summary, our study contributes by proposing innovative frameworks for simultaneous multi-omics data integration, dimensionality reduction, and deconfounding in clustering. Benchmarking on open-access data offers guidance to end-users, facilitating meaningful patient stratification for optimized precision medicine.
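The abstract mentions simulated confounding effects (linear, nonlinear, categorical, mixed) added to real TCGA data. The exact simulation scheme is not given here, so the following is only a hypothetical NumPy sketch of what each kind of effect could look like. The function name `add_confounder`, the sine/square nonlinearity, the three-batch categorical design and the `strength` parameter are all illustrative assumptions.

```python
import numpy as np

def add_confounder(X, kind, rng, strength=1.0):
    """Return (X + simulated confounding effect, confounder values).

    X is a samples x features matrix. The functional forms below are
    hypothetical illustrations, NOT the paper's simulation scheme.
    """
    if kind == "mixed":
        # Assumed "mixed" design: a linear confounder plus a batch effect.
        X1, c1 = add_confounder(X, "linear", rng, strength)
        X2, c2 = add_confounder(X1, "categorical", rng, strength)
        return X2, np.column_stack([c1, c2])

    n, p = X.shape
    loadings = rng.normal(size=p)  # per-feature sensitivity to the confounder
    if kind == "linear":
        c = rng.normal(size=n)                        # e.g. a standardized age
        effect = np.outer(c, loadings)
    elif kind == "nonlinear":
        c = rng.normal(size=n)
        effect = np.outer(np.sin(2 * c) + c ** 2, loadings)
    elif kind == "categorical":
        c = rng.integers(0, 3, size=n)                # e.g. three batches
        effect = rng.normal(size=(3, p))[c]           # one shift vector per batch
    else:
        raise ValueError(f"unknown confounder kind: {kind!r}")
    return X + strength * effect, c
```

For example, `add_confounder(X, "categorical", np.random.default_rng(0))` returns a copy of `X` with a per-batch shift plus the batch label of every sample. Keeping the confounder values lets a deconfounding method (and its evaluation) check how much of them leaks into the learned representation.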
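Strategy (i), removing latent features correlated with confounders, can be pictured as a post-hoc filter on a trained VAE's embedding before clustering. Below is a minimal sketch, assuming a continuous confounder, Pearson correlation as the association measure and an arbitrary cut-off of 0.3. The function name and threshold are hypothetical, and the paper's actual criterion may differ.

```python
import numpy as np

def drop_confounded_latents(Z, c, threshold=0.3):
    """Drop latent dimensions whose |Pearson r| with a confounder exceeds threshold.

    Z: samples x latent_dim embedding from a trained (multi-omics) VAE.
    c: continuous confounder value per sample (a categorical confounder
       would need a different association test, e.g. ANOVA).
    Returns the filtered embedding and the per-dimension correlations.
    """
    Zc = Z - Z.mean(axis=0)
    cc = c - c.mean()
    denom = np.linalg.norm(Zc, axis=0) * np.linalg.norm(cc)
    # Zero-variance dimensions get r = 0 instead of a division by zero.
    r = (Zc * cc[:, None]).sum(axis=0) / np.where(denom == 0, np.inf, denom)
    keep = np.abs(r) <= threshold
    return Z[:, keep], r
```

The filtered embedding would then be passed to a clustering algorithm such as k-means. One drawback is that a latent dimension mixing biological signal with confounding is discarded entirely, which is part of the motivation for the model-based strategies (ii) to (iv).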
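Strategy (ii), the conditional VAE behind cXVAE, feeds the confounders to the network alongside the data. The usual intuition is that the decoder can then explain confounder-driven variation directly, so the latent code has less reason to store it. The sketch below shows only the forward pass of a hypothetical two-omics conditional VAE with random, untrained NumPy weights. The layer sizes, the `dense` helper, the class name and the choice to concatenate `c` to both encoders and the decoder are all assumptions. A real model would be built and trained in an autodiff framework such as PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Hypothetical helper: an untrained affine layer with random weights."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda h: h @ W + b

def relu(h):
    return np.maximum(h, 0.0)

class TinyCXVAE:
    """Forward pass of a two-omics conditional VAE (untrained, illustration only)."""

    def __init__(self, d1, d2, c_dim, hidden=16, latent=4):
        self.d1 = d1
        self.enc1 = dense(d1 + c_dim, hidden)      # omics layer 1 + confounders
        self.enc2 = dense(d2 + c_dim, hidden)      # omics layer 2 + confounders
        self.to_mu = dense(2 * hidden, latent)     # shared latent space
        self.to_logvar = dense(2 * hidden, latent)
        self.dec = dense(latent + c_dim, d1 + d2)  # decoder also conditioned on c

    def forward(self, x1, x2, c):
        h = np.concatenate([relu(self.enc1(np.concatenate([x1, c], axis=1))),
                            relu(self.enc2(np.concatenate([x2, c], axis=1)))], axis=1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)  # reparameterization trick
        out = self.dec(np.concatenate([z, c], axis=1))
        # Split the joint reconstruction back into the two omics layers.
        return out[:, :self.d1], out[:, self.d1:], mu, logvar
```

Categorical confounders such as batch would typically be one-hot encoded before concatenation. For clustering, the latent means `mu` (not the reconstructions) would serve as the deconfounded patient representation.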
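Strategy (iv) adds a regularization term to the VAE loss that discourages the latent space from tracking the confounder. The abstract does not give its form. A plausible choice, assumed here purely for illustration, is the mean absolute Pearson correlation between each latent mean and a continuous confounder, weighted by a hyperparameter `lam`. The sketch below only computes the loss value in NumPy. During training, the same expression would be written in an autodiff framework so that gradients flow through the penalty.

```python
import numpy as np

def deconf_vae_loss(x, x_hat, mu, logvar, c, lam=1.0, eps=1e-8):
    """VAE loss plus a hypothetical confounder-correlation penalty.

    x, x_hat: samples x features; mu, logvar: samples x latent_dim;
    c: continuous confounder per sample; lam: penalty weight (assumed).
    """
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))                           # squared error
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1))  # KL to N(0, I)
    mu_c, c_c = mu - mu.mean(axis=0), c - c.mean()
    r = (mu_c * c_c[:, None]).sum(axis=0) / (np.linalg.norm(mu_c, axis=0) * np.linalg.norm(c_c) + eps)
    penalty = np.mean(np.abs(r))  # mean |Pearson r| between latent means and c
    return recon + kl + lam * penalty, {"recon": recon, "kl": kl, "penalty": penalty}
```

A perfect reconstruction with `mu = 0` and `logvar = 0` gives a total loss of 0. A latent mean that is a scaled copy of the confounder drives the penalty towards 1. `lam` trades reconstruction quality against deconfounding and would need tuning.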