Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer
Published: 2023 Aug 24
Authors:
Alexander Pan, David Musheyev, Daniel Bockelman, Stacy Loeb, Abdo E Kabarriti
Source:
JAMA Oncology
Abstract:
Consumers are increasingly using artificial intelligence (AI) chatbots as a source of information. However, the quality of the cancer information generated by these chatbots has not yet been evaluated using validated instruments.

To characterize the quality of information and the presence of misinformation about skin, lung, breast, colorectal, and prostate cancers generated by 4 AI chatbots, this cross-sectional study assessed the chatbots' text responses to the 5 most commonly searched queries related to the 5 most common cancers using validated instruments. Search data were extracted from the publicly available Google Trends platform, and identical prompts were used to generate responses from 4 AI chatbots: ChatGPT version 3.5 (OpenAI), Perplexity (Perplexity.AI), Chatsonic (Writesonic), and Bing AI (Microsoft). Google Trends' top 5 search queries related to skin, lung, breast, colorectal, and prostate cancer from January 1, 2021, to January 1, 2023, were input into the 4 AI chatbots.

The primary outcomes were the quality of consumer health information, based on the validated DISCERN instrument (scores from 1 [low] to 5 [high]), and the understandability and actionability of this information, based on the corresponding domains of the Patient Education Materials Assessment Tool (PEMAT) (scores of 0%-100%, with higher scores indicating greater understandability and actionability). Secondary outcomes included misinformation, scored using a 5-item Likert scale (scores from 1 [no misinformation] to 5 [high misinformation]), and readability, assessed using the Flesch-Kincaid Grade Level score.

The analysis included 100 responses from the 4 chatbots to the 5 most common search queries for skin, lung, breast, colorectal, and prostate cancer. The quality of the text responses was good (median [range] DISCERN score, 5 [2-5]), and no misinformation was identified. Understandability was moderate (median [range] PEMAT Understandability score, 66.7% [33.3%-90.1%]), and actionability was poor (median [range] PEMAT Actionability score, 20.0% [0%-40.0%]). The responses were written at the college level based on the Flesch-Kincaid Grade Level score.

Findings of this cross-sectional study suggest that AI chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level. These limitations suggest that AI chatbots should be used as a supplementary, not primary, source of medical information.
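The abstract does not state how the top related queries were pulled from Google Trends; as an illustration only, the sketch below uses the unofficial pytrends client (an assumption on our part; the authors may simply have used the Trends website directly).

```python
from pytrends.request import TrendReq

# Unofficial Google Trends client. The study only states that search data
# came from the public Google Trends platform over 2021-01-01 to 2023-01-01,
# so the keyword and approach here are illustrative, not the authors' method.
pytrends = TrendReq(hl="en-US")
pytrends.build_payload(["prostate cancer"], timeframe="2021-01-01 2023-01-01")

# related_queries() returns, per keyword, "top" and "rising" DataFrames.
related = pytrends.related_queries()["prostate cancer"]["top"]
print(related.head(5))  # the 5 most common related search queries
```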
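For context on the 0%-100% PEMAT scores: under AHRQ's published scoring rule, each item is rated agree (1 point), disagree (0 points), or not applicable, and the domain score is points earned divided by applicable items. A minimal sketch with hypothetical ratings:

```python
from typing import Optional

def pemat_domain_score(ratings: list[Optional[int]]) -> float:
    """PEMAT domain score: each item is agree (1), disagree (0),
    or not applicable (None). Score = agree items / applicable items * 100."""
    applicable = [r for r in ratings if r is not None]
    return 100.0 * sum(applicable) / len(applicable)

# Hypothetical actionability ratings for one chatbot response:
# 5 applicable items, 1 rated "agree" -> 20.0%, matching the median reported.
print(pemat_domain_score([1, 0, 0, None, 0, 0]))  # 20.0
```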
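Similarly, the Flesch-Kincaid Grade Level used for the readability outcome is a fixed formula over sentence length and syllable density: 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59. A minimal Python sketch, using a rough vowel-group syllable heuristic rather than the dictionary-based counting a validated readability tool would use:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# A score of roughly 13 or higher corresponds to college-level text.
sample = ("Prostate cancer is a malignant neoplasm arising from the "
          "epithelial cells of the prostate gland.")
print(round(flesch_kincaid_grade(sample), 1))
```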