Large language model produces high accurate diagnosis of cancer from end-motif profiles of cell-free DNA.
Published: 2024 Jul 25
Authors:
Jilei Liu, Hongru Shen, Kexin Chen, Xiangchun Li
Source:
BRIEFINGS IN BIOINFORMATICS
Abstract:
Instruction-tuned large language models (LLMs) demonstrate exceptional ability to align with human intentions. We present an LLM-based model, instruction-tuned LLM for assessment of cancer (iLLMAC), that can detect cancer using cell-free deoxyribonucleic acid (cfDNA) end-motif profiles. Developed on plasma cfDNA sequencing data from 1135 cancer patients and 1106 controls across three datasets, iLLMAC achieved an area under the receiver operating characteristic curve (AUROC) of 0.866 [95% confidence interval (CI), 0.773-0.959] for cancer diagnosis and 0.924 (95% CI, 0.841-1.0) for hepatocellular carcinoma (HCC) detection using 16 end-motifs. Performance increased with more motifs, reaching 0.886 (95% CI, 0.794-0.977) for cancer diagnosis and 0.956 (95% CI, 0.89-1.0) for HCC detection with 64 end-motifs. On an external testing set, iLLMAC achieved an AUROC of 0.912 (95% CI, 0.849-0.976) for cancer diagnosis and 0.938 (95% CI, 0.885-0.992) for HCC detection with 64 end-motifs, significantly outperforming benchmarked methods. Furthermore, iLLMAC achieved high classification performance on datasets with bisulfite and 5-hydroxymethylcytosine sequencing. Our study highlights the effectiveness of LLM-based instruction-tuning for cfDNA-based cancer detection. © The Author(s) 2024. Published by Oxford University Press.
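As an illustration of what an end-motif profile can look like, the following minimal Python sketch counts the k-mers at the 5' ends of cfDNA fragments and normalizes them to frequencies. It is not the authors' pipeline. The function name `end_motif_profile` and the toy fragment sequences are hypothetical, and reading the 16 and 64 end-motifs in the abstract as 2-mers (4^2 = 16) and 3-mers (4^3 = 64) is an assumption that the abstract does not confirm.

```python
from collections import Counter
from itertools import product


def end_motif_profile(fragments, k):
    """Return the relative frequency of each k-mer observed at fragment 5' ends.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Enumerate all 4**k possible motifs so the profile has a fixed length.
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        # Skip fragments that are too short or contain ambiguous bases such as N.
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in motifs}


# Toy fragment sequences (made up for the example, not real data).
fragments = ["CCCAGGT", "CCTGAAC", "AAAGTCC", "CCNATTG", "ccaTTGA"]
profile_2mer = end_motif_profile(fragments, 2)  # 16 features (assumed 2-mers)
profile_3mer = end_motif_profile(fragments, 3)  # 64 features (assumed 3-mers)
```

A fixed-length frequency vector of this kind could then be serialized into a text prompt for an instruction-tuned model. How iLLMAC actually encodes the profile is not described in the abstract.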