Research Updates

Automated classification of brain MRI reports using fine-tuned large language models.

Published: 2024 Jul 12
Authors: Jun Kanzawa, Koichiro Yasaka, Nana Fujita, Shin Fujiwara, Osamu Abe
Source: Brain Structure & Function

Abstract:

This study aimed to investigate the efficacy of a fine-tuned large language model (LLM) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases. This retrospective study included 759, 284, and 164 brain MRI reports for the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Japanese Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned using the training dataset and evaluated on the validation dataset. The model that demonstrated the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists classified the reports in the test dataset into the three groups, and the model's performance on the test dataset was compared with theirs. The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). The model's sensitivity for groups 1/2/3 was 1.000/0.864/0.978, and its specificity for groups 1/2/3 was 0.991/0.993/0.958. No statistically significant differences in accuracy, sensitivity, or specificity were found between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20 to 26 times faster than the radiologists. The area under the receiver operating characteristic curve was 0.994 (95% CI: 0.982-1.000) for discriminating groups 2 and 3 from group 1, and 0.992 (95% CI: 0.982-1.000) for discriminating group 3 from groups 1 and 2. The fine-tuned LLM demonstrated performance comparable to that of radiologists in classifying brain MRI reports, while requiring substantially less time. © 2024. The Author(s).
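The per-group sensitivity and specificity quoted above are one-vs-rest quantities: each group in turn is treated as the positive class and the other two as negative. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `one_vs_rest_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
def one_vs_rest_metrics(confusion, labels):
    """Return (accuracy, per-class metrics) for a square confusion matrix.

    confusion[i][j] is the number of reports whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    per_class = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                          # class k predicted as k
        fn = sum(confusion[k]) - tp                   # class k predicted as another class
        fp = sum(row[k] for row in confusion) - tp    # other classes predicted as k
        tn = total - tp - fn - fp                     # everything else
        per_class[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return correct / total, per_class


# Hypothetical counts for illustration only (NOT taken from the study).
labels = ["group 1 (nontumor)", "group 2 (posttreatment)", "group 3 (pretreatment)"]
toy_confusion = [
    [8, 0, 0],
    [1, 4, 0],
    [0, 1, 6],
]

accuracy, metrics = one_vs_rest_metrics(toy_confusion, labels)
print(f"accuracy = {accuracy:.3f}")
for label, m in metrics.items():
    print(f"{label}: sensitivity = {m['sensitivity']:.3f}, "
          f"specificity = {m['specificity']:.3f}")
```

Because each group's specificity is computed against the pooled reports of the other two groups, a group can have perfect sensitivity while still having specificity below 1, which is the pattern reported for group 1.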