Cross-attention enables deep learning on limited omics-imaging-clinical data of 130 lung cancer patients.
Published: 2024 Jul 05
Authors:
Suraj Verma, Giuseppe Magazzù, Noushin Eftekhari, Thai Lou, Alex Gilhespy, Annalisa Occhipinti, Claudio Angione
Source:
GENES & DEVELOPMENT
Abstract:
Deep-learning tools that extract prognostic factors derived from multi-omics data have recently contributed to individualized predictions of survival outcomes. However, the limited size of integrated omics-imaging-clinical datasets poses challenges. Here, we propose two biologically interpretable and robust deep-learning architectures for survival prediction of non-small cell lung cancer (NSCLC) patients, learning simultaneously from computed tomography (CT) scan images, gene expression data, and clinical information. The proposed models integrate patient-specific clinical, transcriptomic, and imaging data and incorporate Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway information, adding biological knowledge within the learning process to extract prognostic gene biomarkers and molecular pathways. While both models accurately stratify patients into high- and low-risk groups when trained on a dataset of only 130 patients, introducing a cross-attention mechanism in a sparse autoencoder significantly improves the performance, highlighting tumor regions and NSCLC-related genes as potential biomarkers and thus offering a significant methodological advancement when learning from small imaging-omics-clinical samples. Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.
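To illustrate what "cross-attention" means in a multimodal setting like this one, here is a minimal, generic sketch of scaled dot-product cross-attention in plain Python. It is not the authors' architecture: the abstract does not specify how the mechanism is wired into the sparse autoencoder. The sketch assumes that queries come from one modality (hypothetical gene-expression latent tokens) and that keys and values come from another (hypothetical CT image-patch embeddings). Learned projection matrices are omitted for brevity. The returned attention weights show how much each gene token attends to each image patch, which is the kind of signal that can highlight relevant regions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Row-by-column product of two nested-list matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Each query row (one modality) attends over key/value rows (another modality).

    Generic scaled dot-product attention; learned W_Q, W_K, W_V projections are omitted.
    """
    d_k = len(keys[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    output = matmul(weights, values)
    return output, weights


# Hypothetical toy embeddings: 2 gene-expression tokens, 3 image patches (dimension 2).
gene_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

fused, attn = cross_attention(gene_tokens, image_patches, image_patches)
for i, row in enumerate(attn):
    print(f"gene token {i} attention over patches:", [round(w, 3) for w in row])
```

Each row of `attn` sums to 1. In this toy example, each gene token places its largest weight on the image patch most aligned with it. In a trained model, inspecting these weights is one common way to make attention-based fusion interpretable.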