ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? - A Memorial Sloan Kettering Cancer Center Team Ovary study.
Published: 2024 Jul 22
Authors:
Lindsey Finch, Vance Broach, Jacqueline Feinberg, Ahmed Al-Niaimi, Nadeem R Abu-Rustum, Qin Zhou, Alexia Iasonos, Dennis S Chi
Source:
GYNECOLOGIC ONCOLOGY
Abstract:
We evaluated the performance of a chatbot compared to the National Comprehensive Cancer Network (NCCN) Guidelines for the management of ovarian cancer. Using NCCN Guidelines, we generated 10 questions and answers regarding management of ovarian cancer at a single point in time. Questions were thematically divided into risk factors, surgical management, medical management, and surveillance. We asked ChatGPT (GPT-4) to provide responses without prompting (unprompted GPT) and with prompt engineering (prompted GPT). Responses were blinded and evaluated for accuracy and completeness by 5 gynecologic oncologists. A score of 0 was defined as inaccurate, 1 as accurate and incomplete, and 2 as accurate and complete. Evaluations were compared among NCCN, unprompted GPT, and prompted GPT answers. Overall, 48% of responses from NCCN, 64% from unprompted GPT, and 66% from prompted GPT were accurate and complete. The percentage of accurate but incomplete responses was higher for NCCN vs GPT-4. The percentage of accurate and complete scores for questions regarding risk factors, surgical management, and surveillance was higher for GPT-4 vs NCCN; however, for questions regarding medical management, the percentage was lower for GPT-4 vs NCCN. Overall, 14% of responses from unprompted GPT, 12% from prompted GPT, and 10% from NCCN were inaccurate. GPT-4 provided accurate and complete responses at a single point in time to a limited set of questions regarding ovarian cancer, with best performance in areas of risk factors, surgical management, and surveillance. Occasional inaccuracies, however, should limit unsupervised use of chatbots at this time.
Copyright © 2024 Elsevier Inc. All rights reserved.
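The 0/1/2 rubric and the reported percentages can be illustrated with a short tally. This is a minimal sketch, not the authors' analysis. It assumes that each of the 5 raters scored all 10 responses from each source, giving 50 ratings per source. The rating lists are back-calculated from the percentages in the abstract under that assumption and are not the study's raw data. The function name `score_percentages` is hypothetical.

```python
from collections import Counter

# Rubric as defined in the abstract.
LABELS = {
    0: "inaccurate",
    1: "accurate but incomplete",
    2: "accurate and complete",
}


def score_percentages(scores):
    """Return the percentage of ratings at each rubric level (0, 1, 2)."""
    if not scores:
        raise ValueError("scores must not be empty")
    unknown = set(scores) - set(LABELS)
    if unknown:
        raise ValueError(f"scores outside the 0/1/2 rubric: {sorted(unknown)}")
    counts = Counter(scores)
    return {LABELS[level]: 100 * counts[level] / len(scores) for level in LABELS}


# ASSUMPTION: 10 questions x 5 raters = 50 ratings per source.
# Counts are reconstructed from the reported percentages, e.g. 48% of 50 = 24.
# The "accurate but incomplete" counts are the remainder and are only implied.
reconstructed = {
    "NCCN": [2] * 24 + [1] * 21 + [0] * 5,
    "unprompted GPT": [2] * 32 + [1] * 11 + [0] * 7,
    "prompted GPT": [2] * 33 + [1] * 11 + [0] * 6,
}

for source, scores in reconstructed.items():
    print(source, score_percentages(scores))
```

Under this assumption every reported percentage corresponds to a whole number of ratings out of 50, and the implied share of accurate but incomplete ratings (42% for NCCN, 22% for each GPT-4 condition) is consistent with the abstract's statement that this share was higher for NCCN.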