Transformer-based representation learning and multiple-instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite-treated plasma cell-free DNA
Impact factor: 4.5
Journal ranking zone: Medicine, Zone 2 / Oncology, Zone 3
Publication date: 2024 Nov
Authors:
Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang, Qiang Zhang, Kexin Chen, Xiangchun Li
Abstract
Early cancer diagnosis from bisulfite-treated cell-free DNA (cfDNA) fragments requires tedious data-analysis procedures. Here, we present DECIDIA (deep-learning-based early cancer interception and diagnosis), an approach that achieves accurate cancer diagnosis exclusively from bisulfite-treated cfDNA sequencing fragments. DECIDIA relies on transformer-based representation learning of DNA fragments and weakly supervised multiple-instance learning for classification. We systematically evaluated DECIDIA for cancer diagnosis and cancer-type prediction on a curated dataset of 5389 samples comprising colorectal cancer (CRC; n = 1574), hepatocellular carcinoma (HCC; n = 1181), lung cancer (n = 654), and non-cancer controls (n = 1980). In 10-fold cross-validation on the CRC dataset, DECIDIA distinguished cancer patients from cancer-free controls with an area under the receiver operating characteristic curve (AUROC) of 0.980 (95% CI, 0.976-0.984), outperforming benchmarked methods based on methylation intensities. Notably, DECIDIA achieved an AUROC of 0.910 (95% CI, 0.896-0.924) in distinguishing HCC patients from cancer-free controls on an independent external HCC test set, even though no HCC data were used in model development. In cancer-type classification, DECIDIA achieved a micro-average AUROC of 0.963 (95% CI, 0.960-0.966) and an overall accuracy of 82.8% (95% CI, 81.8-83.9). In addition, we distilled four sequence signatures from the raw sequencing reads that exhibited differential patterns between cancer and control samples and among different cancer types. Our approach represents a new paradigm toward eliminating tedious data-analysis procedures in liquid biopsy based on the bisulfite-treated cfDNA methylome.
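The pairing of per-fragment representations with weakly supervised multiple-instance learning (MIL) can be illustrated with a minimal sketch. It assumes that each sequencing read is one instance and each plasma sample is one "bag" carrying only a sample-level label (cancer vs. control), which is what makes the supervision weak. This is not DECIDIA's implementation. The transformer encoder is swapped for a simple k-mer frequency vector (a hypothetical stand-in). The pooling follows the generic attention-based MIL formulation of Ilse et al. (2018), which may differ from DECIDIA's aggregator. All weights are untrained placeholders, and the function names (`embed_fragment`, `attention_mil_pool`, `predict_cancer_probability`) are hypothetical.

```python
import math
from itertools import product

K = 3  # hypothetical k-mer length for the stand-in fragment encoder
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def embed_fragment(read):
    """Stand-in for a transformer encoder: normalized k-mer frequencies of one read."""
    vec = [0.0] * len(KMERS)
    n = 0
    for i in range(len(read) - K + 1):
        idx = KMER_INDEX.get(read[i:i + K])
        if idx is not None:  # skip k-mers containing N or other ambiguous bases
            vec[idx] += 1.0
            n += 1
    return [v / n for v in vec] if n else vec


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attention_mil_pool(instances, V, w):
    """Attention MIL pooling: a_i = softmax_i(w . tanh(V h_i)); bag = sum_i a_i h_i."""
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v_j * h_j for v_j, h_j in zip(row, h))) for row in V]
        scores.append(sum(w_k * x_k for w_k, x_k in zip(w, hidden)))
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]  # subtract max for a stable softmax
    total = sum(exp)
    attn = [e / total for e in exp]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    return bag, attn


def predict_cancer_probability(reads, V, w, u, b):
    """Sample-level (bag-level) probability computed from unlabeled reads (instances)."""
    instances = [embed_fragment(r) for r in reads]
    bag, attn = attention_mil_pool(instances, V, w)
    logit = sum(u_d * z_d for u_d, z_d in zip(u, bag)) + b
    return sigmoid(logit), attn


if __name__ == "__main__":
    import random

    random.seed(0)
    dim, hidden = len(KMERS), 8
    # Untrained, randomly initialized placeholder weights.
    V = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w = [random.gauss(0, 0.1) for _ in range(hidden)]
    u = [random.gauss(0, 0.1) for _ in range(dim)]
    # Toy reads only; real bisulfite reads would come from aligned or raw FASTQ data.
    reads = ["TTGTTTGGATTTAG", "ACGTTGCGATTGTA", "TGTGTTTTAGGTTTGT"]
    prob, attn = predict_cancer_probability(reads, V, w, u, b=0.0)
    print(f"p(cancer) = {prob:.3f}; attention = {[round(a, 3) for a in attn]}")
```

The attention weights give one score per read, so after training, the highest-weighted fragments indicate which reads drive a sample-level call. This is one way sequence-level patterns could be inspected, though the abstract does not state how DECIDIA's four signatures were derived.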