Transformer-based representation learning and multiple-instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite-treated plasma cell-free DNA
Impact factor: 4.5
Journal ranking: Medicine, Tier 2 / Oncology, Tier 3
Published: Nov 2024
Authors:
Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang, Qiang Zhang, Kexin Chen, Xiangchun Li
DOI:
10.1002/1878-0261.13745
Abstract
Early cancer diagnosis from bisulfite-treated cell-free DNA (cfDNA) fragments requires tedious data analytical procedures. Here, we present a deep-learning-based approach for early cancer interception and diagnosis (DECIDIA) that achieves accurate cancer diagnosis exclusively from bisulfite-treated cfDNA sequencing fragments. DECIDIA relies on transformer-based representation learning of DNA fragments and weakly supervised multiple-instance learning for classification. We systematically evaluated DECIDIA for cancer diagnosis and cancer-type prediction on a curated dataset of 5389 samples comprising colorectal cancer (CRC; n = 1574), hepatocellular carcinoma (HCC; n = 1181), lung cancer (n = 654), and non-cancer controls (n = 1980). In 10-fold cross-validation on the CRC dataset, DECIDIA achieved an area under the receiver operating characteristic curve (AUROC) of 0.980 (95% CI, 0.976-0.984) in differentiating cancer patients from cancer-free controls, outperforming benchmarked methods based on methylation intensities. Notably, DECIDIA achieved an AUROC of 0.910 (95% CI, 0.896-0.924) on the external, independent HCC test set in distinguishing HCC patients from cancer-free controls, even though no HCC data were used in model development. For cancer-type classification, DECIDIA achieved a micro-average AUROC of 0.963 (95% CI, 0.960-0.966) and an overall accuracy of 82.8% (95% CI, 81.8-83.9). In addition, we distilled four sequence signatures from the raw sequencing reads that exhibited differential patterns between cancer and control and among different cancer types. Our approach represents a new paradigm towards eliminating the tedious data analytical procedures for liquid biopsy based on the bisulfite-treated cfDNA methylome.
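To make the multiple-instance learning step more concrete, the sketch below treats one plasma sample as a "bag" of fragment embeddings and pools them into a single sample-level prediction. It is an illustration only, not DECIDIA's implementation: the abstract does not say how fragments are aggregated, so attention-based pooling is an assumption here. Random vectors stand in for the transformer's fragment embeddings, and every name and parameter (`attention_mil_pool`, `V`, `w`, `clf_w`, `clf_b`, the dimensions) is hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_mil_pool(fragment_embeddings, V, w):
    """Pool (n_fragments, d) embeddings into one (d,) bag vector.

    Assumed pooling rule (not taken from the paper):
    score_i = w^T tanh(V^T h_i), weights = softmax(scores).
    """
    hidden = np.tanh(fragment_embeddings @ V)   # (n_fragments, k)
    scores = hidden @ w                         # (n_fragments,)
    weights = softmax(scores)                   # non-negative, sums to 1
    bag = weights @ fragment_embeddings         # weighted mean, shape (d,)
    return bag, weights


def predict_cancer_probability(fragment_embeddings, V, w, clf_w, clf_b):
    """Sample-level probability from a logistic layer on the pooled vector."""
    bag, weights = attention_mil_pool(fragment_embeddings, V, w)
    logit = bag @ clf_w + clf_b
    prob = 1.0 / (1.0 + np.exp(-logit))
    return prob, weights


# Toy data: random vectors stand in for transformer fragment embeddings.
rng = np.random.default_rng(0)
n_fragments, d, k = 200, 16, 8
embeddings = rng.normal(size=(n_fragments, d))

# Untrained, randomly initialised parameters (hypothetical).
V = rng.normal(size=(d, k))
w = rng.normal(size=k)
clf_w = rng.normal(size=d)
clf_b = 0.0

prob, weights = predict_cancer_probability(embeddings, V, w, clf_w, clf_b)
print(f"predicted probability: {prob:.3f}")
print(f"most attended fragment index: {int(np.argmax(weights))}")
```

Only the sample-level label (cancer versus control) would supervise such a model, which is what makes the setup weakly supervised: no individual fragment needs a label, and the learned attention weights indicate which fragments contributed most to the prediction.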