Development of a deep learning model for cancer diagnosis by inspecting cell-free DNA end-motifs.

Published: 2024 Jul 27

Authors: Hongru Shen, Meng Yang, Jilei Liu, Kexin Chen, Xiangchun Li

Source: npj Precision Oncology

Abstract:

Accurate discrimination between patients with and without cancer from cfDNA is crucial for early cancer diagnosis. Herein, we develop and validate a deep-learning-based model entitled end-motif inspection via transformer (EMIT) for discriminating individuals with and without cancer by learning feature representations from cfDNA end-motifs. EMIT is a self-supervised learning approach that models rankings of cfDNA end-motifs. We include 4606 samples subjected to different types of cfDNA sequencing to develop EMIT, and subsequently evaluate the classification performance of linear projections of EMIT representations on six datasets and an additional in-house testing set encompassing whole-genome, whole-genome bisulfite and 5-hydroxymethylcytosine sequencing. The linear projection of EMIT representations achieved area under the receiver operating characteristic curve (AUROC) values ranging from 0.895 (0.835-0.955) to 0.996 (0.994-0.997) across these six datasets, outperforming its baseline by significant margins. Additionally, we show that the linear projection of EMIT representations achieves an AUROC of 0.962 (0.914-1.0) in identifying lung cancer on an independent testing set subjected to whole-exome sequencing. These findings indicate that a transformer-based deep learning model can learn cancer-discriminative representations from cfDNA end-motifs, and that these representations can be exploited to discriminate patients with and without cancer. © 2024. The Author(s).
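The abstract does not say how EMIT tokenizes end-motifs, so the following is only a minimal sketch of what a per-sample "ranking of cfDNA end-motifs" could look like as model input. It assumes 4-mer motifs taken from the first four bases of each fragment end (a common convention in cfDNA fragmentomics, not confirmed by this abstract), skips motifs containing non-ACGT bases, and orders all 256 motifs by relative frequency with ties broken alphabetically. The function names `end_motif_frequencies` and `end_motif_ranking` are hypothetical; this is not the authors' code.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
K = 4  # assumption: 4-mer end motifs, giving 4**4 = 256 possible motifs


def end_motif_frequencies(fragment_ends, k=K):
    """Return the relative frequency of every possible k-mer end motif.

    `fragment_ends` is an iterable of sequences read 5'->3' starting at a
    fragment end (hypothetical input format). Ends shorter than k, or whose
    first k bases contain anything other than A/C/G/T (e.g. N), are skipped.
    """
    motifs = ["".join(p) for p in product(BASES, repeat=k)]
    counts = Counter()
    for seq in fragment_ends:
        motif = seq[:k].upper()
        if len(motif) == k and set(motif) <= set(BASES):
            counts[motif] += 1
    total = sum(counts.values())
    if total == 0:
        # No usable fragments: every motif gets frequency 0.
        return {m: 0.0 for m in motifs}
    return {m: counts[m] / total for m in motifs}


def end_motif_ranking(freqs):
    """Order all motifs from most to least frequent, ties broken alphabetically."""
    return sorted(freqs, key=lambda m: (-freqs[m], m))


if __name__ == "__main__":
    ends = ["CCCAGT", "CCCATT", "AAAGC", "CCCA", "TGNA"]
    freqs = end_motif_frequencies(ends)
    print(end_motif_ranking(freqs)[:3])  # ['CCCA', 'AAAG', 'AAAA']
```

In practice the end sequences would have to be extracted from aligned reads with strand orientation taken into account. The resulting 256-element ranking is one plausible per-sample token sequence for a transformer, but how EMIT actually encodes rankings is not described here.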