Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.


Large language models can help with biostatistics and coding needed in radiology research.

Published: 2024 Oct 14
Authors: Adarsh Ghosh, Hailong Li, Andrew T Trout
Source: ACADEMIC RADIOLOGY

Abstract:

Original research in radiology often involves handling large datasets, data manipulation, statistical tests, and coding. Recent studies show that large language models (LLMs) can solve bioinformatics tasks, suggesting their potential in radiology research. This study evaluates LLMs' ability to provide statistical and deep learning solutions and code for radiology research.

We used the web-based chat interfaces available for ChatGPT-4o, ChatGPT-3.5, and Google Gemini. EXPERIMENT 1: BIOSTATISTICS AND DATA VISUALIZATION: We assessed each LLM's ability to suggest biostatistical tests and generate the corresponding R code using a Cancer Imaging Archive dataset. Prompts were based on statistical analyses from a peer-reviewed manuscript. The generated code was tested in RStudio for correctness, runtime errors, and the ability to generate the requested visualization. EXPERIMENT 2: DEEP LEARNING: We used the RSNA-STR Pneumonia Detection Challenge dataset to evaluate the ability of ChatGPT-4o and Gemini to generate Python code for a transformer-based image classification model (Vision Transformer ViT-B/16). The generated code was tested in a Jupyter Notebook for functionality and runtime errors.

Of the 8 statistical questions posed, correct statistical answers were suggested for 7 (ChatGPT-4o), 6 (ChatGPT-3.5), and 5 (Gemini) scenarios. The R code output by ChatGPT-4o had fewer runtime errors (6 of the 7 code samples provided) than ChatGPT-3.5 (5/7) and Gemini (5/7). Both ChatGPT-4o and Gemini were able to generate the requested visualizations, with a few runtime errors. Iteratively copying runtime errors from the code generated by ChatGPT-4o into the chat helped resolve them. Gemini initially hallucinated during code generation but provided accurate code when the experiment was restarted. ChatGPT-4o and Gemini both successfully generated initial Python code for the deep learning task.
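The iterative error-resolution workflow described in the abstract — pasting runtime errors back into the chat until the generated code runs — can be sketched in plain Python. This is an illustrative sketch only: `run_code` and `ask_llm` below are hypothetical stand-ins for a code runner and the ChatGPT/Gemini chat interface, not functions from the study.

```python
def refine_code(initial_code, run_code, ask_llm, max_rounds=5):
    """Iteratively execute LLM-generated code and feed any runtime error back.

    run_code(code) -> error message string, or None if the code ran cleanly.
    ask_llm(prompt) -> revised code; a hypothetical stand-in for the chat UI.
    """
    code = initial_code
    for _ in range(max_rounds):
        error = run_code(code)
        if error is None:
            return code  # runs without errors; still needs human validation
        # Paste the runtime error back into the chat, as done in the study
        code = ask_llm(f"This code fails with:\n{error}\nPlease fix it.")
    raise RuntimeError(f"code still failing after {max_rounds} rounds")
```

The loop mirrors the abstract's finding: the model supplies a baseline, the human supplies the error feedback, and a bounded number of rounds guards against the model never converging on working code.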
Errors encountered during implementation were resolved through iteration using the chat interface, demonstrating the utility of LLMs in providing baseline code for further refinement and in resolving runtime errors. LLMs can assist with coding tasks in radiology research, providing initial code for data visualization, statistical tests, and deep learning models, helping researchers with foundational biostatistical knowledge. While LLMs can offer a useful starting point, they require users to refine and validate the code, and caution is necessary due to potential errors, the risk of hallucinations, and data privacy regulations. LLMs can help with coding and statistical problems in radiology research, helping primary authors troubleshoot the coding needed for their work. Copyright © 2024 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.