Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Transformer-based representation learning and multiple-instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite-treated plasma cell-free DNA.

Published: 2024 Oct 08
Authors: Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang, Qiang Zhang, Kexin Chen, Xiangchun Li
Source: Molecular Oncology

Abstract:

Early cancer diagnosis from bisulfite-treated cell-free DNA (cfDNA) fragments requires tedious data analytical procedures. Here, we present a deep-learning-based approach for early cancer interception and diagnosis (DECIDIA) that can achieve accurate cancer diagnosis exclusively from bisulfite-treated cfDNA sequencing fragments. DECIDIA relies on transformer-based representation learning of DNA fragments and weakly supervised multiple-instance learning for classification. We systematically evaluate the performance of DECIDIA for cancer diagnosis and cancer type prediction on a curated dataset of 5389 samples comprising colorectal cancer (CRC; n = 1574), hepatocellular carcinoma (HCC; n = 1181), lung cancer (n = 654), and non-cancer controls (n = 1980). In differentiating cancer patients from cancer-free controls, DECIDIA achieved an area under the receiver operating characteristic curve (AUROC) of 0.980 (95% CI, 0.976-0.984) in 10-fold cross-validation on the CRC dataset, outperforming benchmarked methods that are based on methylation intensities. Notably, DECIDIA achieved an AUROC of 0.910 (95% CI, 0.896-0.924) on the external independent HCC test set in distinguishing HCC patients from cancer-free controls, although no HCC data were used in model development. In cancer-type classification, DECIDIA achieved a micro-average AUROC of 0.963 (95% CI, 0.960-0.966) and an overall accuracy of 82.8% (95% CI, 81.8-83.9). In addition, we distilled four sequence signatures from the raw sequencing reads that exhibited differential patterns in cancer versus control and among different cancer types. Our approach represents a new paradigm towards eliminating the tedious data analytical procedures for liquid biopsy that uses the bisulfite-treated cfDNA methylome. © 2024 The Author(s). Molecular Oncology published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.
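
Editor's illustration: the abstract names weakly supervised multiple-instance learning but does not say how fragment-level representations are aggregated into a sample-level prediction. The sketch below assumes attention-based pooling, one common choice for multiple-instance learning, and should not be read as DECIDIA's actual architecture. Each sample is treated as a "bag" of fragment embeddings (here random vectors standing in for transformer outputs), and all names and parameters (`attention_mil_predict`, `attn_w`, `clf_w`, `clf_b`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_mil_predict(bag, attn_w, clf_w, clf_b):
    """Score one sample, given as a bag of fragment embeddings.

    bag          : list of embedding vectors, one per cfDNA fragment
    attn_w       : hypothetical attention weight vector (assumption)
    clf_w, clf_b : hypothetical logistic classifier parameters (assumption)
    Returns (sample-level probability, per-fragment attention weights).
    """
    # One unnormalised attention score per fragment.
    scores = [math.tanh(dot(attn_w, h)) for h in bag]
    alphas = softmax(scores)
    # The attention-weighted average of fragment embeddings is the sample embedding.
    dim = len(bag[0])
    pooled = [sum(a * h[i] for a, h in zip(alphas, bag)) for i in range(dim)]
    # Only the sample-level label supervises training; fragments carry no labels.
    logit = dot(clf_w, pooled) + clf_b
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, alphas


random.seed(0)
dim = 4
# Six random vectors stand in for transformer-derived fragment embeddings.
bag = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
attn_w = [random.gauss(0, 1) for _ in range(dim)]
clf_w = [random.gauss(0, 1) for _ in range(dim)]
prob, alphas = attention_mil_predict(bag, attn_w, clf_w, clf_b=0.0)
print(f"sample-level probability: {prob:.3f}")
```

In this formulation the attention weights sum to one across the fragments of a sample, so they can be inspected to see which fragments contribute most to the prediction. Whether DECIDIA exposes anything comparable is not stated in the abstract.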