Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Generative artificial intelligence as a source of breast cancer information for patients: Proceed with caution.

Publication date: 2024 Aug 30
Authors: Ko Un Park, Stuart Lipsitz, Laura S Dominici, Filipa Lynce, Christina A Minami, Faina Nakhlis, Adrienne G Waks, Laura E Warren, Nadine Eidman, Jeannie Frazier, Lourdes Hernandez, Carla Leslie, Susan Rafte, Delia Stroud, Joel S Weissman, Tari A King, Elizabeth A Mittendorf
Source: CANCER

Abstract:

This study evaluated the accuracy, clinical concordance, and readability of the chatbot interface generative pretrained transformer (ChatGPT) 3.5 as a source of breast cancer information for patients. Twenty questions that patients are likely to ask ChatGPT were identified by breast cancer advocates. These were posed to ChatGPT 3.5 in July 2023 and were repeated three times. Responses were graded in two domains: accuracy (4-point Likert scale, 4 = worst) and clinical concordance (information is clinically similar to physician response; 5-point Likert scale, 5 = not similar at all). The concordance of responses with repetition was estimated using the intraclass correlation coefficient (ICC) of word counts. Response readability was calculated using the Flesch-Kincaid readability scale. References were requested and verified.

The overall average accuracy was 1.88 (range, 1.0-3.0; 95% confidence interval [CI], 1.42-1.94), and clinical concordance was 2.79 (range, 1.0-5.0; 95% CI, 1.94-3.64). The average word count was 310 words per response (range, 146-441 words per response) with high concordance (ICC, 0.75; 95% CI, 0.59-0.91; p < .001). The average readability was poor at 37.9 (range, 18.0-60.5) with high concordance (ICC, 0.73; 95% CI, 0.57-0.90; p < .001). There was a weak correlation between ease of readability and better clinical concordance (-0.15; p = .025). Accuracy did not correlate with readability (0.05; p = .079). The average number of references was 1.97 (range, 1-4; total, 119). ChatGPT cited peer-reviewed articles only once and often referenced nonexistent websites (41%).

Because ChatGPT 3.5 responses were incorrect 24% of the time and did not provide real references 41% of the time, patients should be cautioned about using ChatGPT for medical information. © 2024 American Cancer Society.
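For readers unfamiliar with the two quantitative measures used above, the short Python sketch below illustrates the standard Flesch reading-ease formula and a one-way random-effects intraclass correlation coefficient over an n-questions x k-repetitions matrix of word counts. The one-way ICC form and all example numbers are assumptions for illustration only, not the study's actual data or code.

def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Standard Flesch reading-ease formula; lower scores indicate harder text."""
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)


def icc_oneway(matrix: list[list[float]]) -> float:
    """One-way random-effects ICC(1,1) for an n-subjects x k-repetitions matrix,
    e.g. word counts of 20 questions, each asked 3 times."""
    n, k = len(matrix), len(matrix[0])
    grand_mean = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    # Between-subject and within-subject mean squares from a one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - row_means[i]) ** 2 for i, row in enumerate(matrix) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical response: 310 words, 15 sentences, 560 syllables
    print(round(flesch_reading_ease(310, 15, 560), 1))
    # Hypothetical word counts for 3 questions, each repeated 3 times
    print(round(icc_oneway([[300, 310, 295], [420, 430, 441], [150, 146, 160]]), 2))

In this framing, a high ICC across the repeated word counts indicates that ChatGPT's responses to the same question were consistent in length, and a reading-ease score near 37.9 falls in the range generally considered difficult for lay readers.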