The Consistency and Quality of ChatGPT Responses Compared to Clinical Guidelines for Ovarian Cancer: A Delphi Approach.
Publication date: 2024 May 14
Authors:
Dario Piazza, Federica Martorana, Annabella Curaba, Daniela Sambataro, Maria Rosaria Valerio, Alberto Firenze, Basilio Pecorino, Paolo Scollo, Vito Chiantera, Giuseppe Scibilia, Paolo Vigneri, Vittorio Gebbia, Giuseppa Scandurra
Source:
Best Pract Res Cl Ob
Abstract:
In recent years, generative Artificial Intelligence (AI) models, such as ChatGPT, have increasingly been utilized in healthcare. Although AI models show high potential for quick access to sources and for formulating responses to clinical questions, the results obtained with them still require validation through comparison with established clinical guidelines. This study compares AI model responses to eight clinical questions with the Italian Association of Medical Oncology (AIOM) guidelines for ovarian cancer.

The authors used the Delphi method to evaluate responses from ChatGPT and the AIOM guidelines. An expert panel of healthcare professionals assessed the responses for clarity, consistency, comprehensiveness, usability, and quality using a five-point Likert scale. The GRADE methodology was used to assess the quality of the evidence and the strength of the recommendations.

A survey involving 14 physicians revealed that the AIOM guidelines consistently received higher average scores than the AI models, with a statistically significant difference. Post hoc tests showed that the AIOM guidelines differed significantly from all AI models, with no significant difference among the AI models.

While AI models can provide rapid responses, they must match established clinical guidelines regarding clarity, consistency, comprehensiveness, usability, and quality. These findings underscore the importance of relying on expert-developed guidelines in clinical decision-making and highlight potential areas for AI model improvement.