Large language model produces high accurate diagnosis of cancer from end-motif profiles of cell-free DNA.

Published: 2024 Jul 25

Authors: Jilei Liu, Hongru Shen, Kexin Chen, Xiangchun Li

Source: Briefings in Bioinformatics

Abstract:

Instruction-tuned large language models (LLMs) demonstrate exceptional ability to align with human intentions. We present an LLM-based model, the instruction-tuned LLM for assessment of cancer (iLLMAC), that can detect cancer using cell-free deoxyribonucleic acid (cfDNA) end-motif profiles. Developed on plasma cfDNA sequencing data from 1135 cancer patients and 1106 controls across three datasets, iLLMAC achieved an area under the receiver operating characteristic curve (AUROC) of 0.866 [95% confidence interval (CI), 0.773-0.959] for cancer diagnosis and 0.924 (95% CI, 0.841-1.0) for hepatocellular carcinoma (HCC) detection using 16 end-motifs. Performance increased with more motifs, reaching 0.886 (95% CI, 0.794-0.977) for cancer diagnosis and 0.956 (95% CI, 0.89-1.0) for HCC detection with 64 end-motifs. On an external testing set, iLLMAC achieved an AUROC of 0.912 (95% CI, 0.849-0.976) for cancer diagnosis and 0.938 (95% CI, 0.885-0.992) for HCC detection with 64 end-motifs, significantly outperforming benchmarked methods. Furthermore, iLLMAC achieved high classification performance on datasets generated with bisulfite and 5-hydroxymethylcytosine sequencing. Our study highlights the effectiveness of LLM-based instruction-tuning for cfDNA-based cancer detection. © The Author(s) 2024. Published by Oxford University Press.
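
For readers unfamiliar with end-motif profiles, the following is a minimal sketch of how one might be computed. It is not the authors' pipeline. It assumes an end motif is the first k bases at the 5' end of a cfDNA fragment, and that the 16- and 64-motif settings correspond to k = 2 and k = 3 (4^2 = 16, 4^3 = 64), which the abstract does not state. The function name `end_motif_profile` and the example fragments are hypothetical.

```python
from collections import Counter
from itertools import product


def end_motif_profile(fragments, k=2):
    """Return the relative frequency of every k-mer found at the 5' end
    of the given fragment sequences (assumed definition of an end motif)."""
    # Enumerate all 4**k possible motifs so the profile has a fixed length.
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        # Skip fragments that are too short or contain ambiguous bases.
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    # Normalize counts to frequencies; an empty input yields all zeros.
    return {m: (counts[m] / total if total else 0.0) for m in motifs}


# Hypothetical fragments, for illustration only.
example_fragments = ["CCCAGT", "CCTGGA", "ACGTTT", "NNACGT"]
profile = end_motif_profile(example_fragments, k=2)
```

A profile of this kind is a fixed-length numeric vector per sample, which could then be serialized into a text prompt for an instruction-tuned model. How iLLMAC actually encodes the profile is not described in the abstract.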