Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Detection of ChatGPT fake science with the xFakeSci learning algorithm.

Publication date: 2024 Jul 14
Authors: Ahmed Abdeen Hamed, Xindong Wu
Source: Alzheimer's & Dementia

Abstract:

Generative AI tools exemplified by ChatGPT are becoming a new reality. This study is motivated by the premise that "AI-generated content may exhibit distinctive behavior that can be separated from scientific articles". In this study, we show how articles can be generated by means of prompt engineering for various diseases and conditions. We then show how we tested this premise in two phases and proved its validity. Subsequently, we introduce xFakeSci, a novel learning algorithm that is capable of distinguishing ChatGPT-generated articles from publications produced by scientists. The algorithm is trained using network models derived from both sources. To mitigate overfitting, we incorporated a calibration step built upon data-driven heuristics, including proximity and ratios. Specifically, from a total of 3952 fake articles covering three different medical conditions, the algorithm was trained using only 100 articles but calibrated using folds of 100 articles. The classification step was performed using 300 articles per condition. The actual labeling step took place against an equal mix of 50 generated articles and 50 authentic PubMed abstracts. The testing also spanned publication periods from 2010 to 2024 and encompassed research on three distinct diseases: cancer, depression, and Alzheimer's. Further, we evaluated the accuracy of the xFakeSci algorithm against some classical data mining algorithms (e.g., Support Vector Machines, Regression, and Naive Bayes). The xFakeSci algorithm achieved F1 scores ranging from 80% to 94%, outperforming common data mining algorithms, which scored F1 values between 38% and 52%. We attribute the noticeable difference to the introduction of the calibration and proximity distance heuristics, which underpin this promising performance. Indeed, predicting fake science generated by ChatGPT presents a considerable challenge. Nonetheless, the introduction of the xFakeSci algorithm is a significant step toward combating fake science. © 2024. The Author(s).
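
As a purely illustrative sketch, the kind of classical baseline comparison the abstract mentions (Support Vector Machines, regression, and naive Bayes scored by F1 on an equal mix of authentic and generated abstracts) could be set up roughly as below. The loader, the abstracts.jsonl file name, and the TF-IDF feature choice are assumptions for the example, and this does not reproduce xFakeSci's network models or calibration heuristics.

```python
# Minimal sketch of a classical baseline comparison, assuming a JSON-lines file
# of labeled abstracts (1 = generated article, 0 = authentic PubMed abstract).
# This is NOT the authors' pipeline; it only illustrates F1-scored baselines
# such as SVM, logistic regression, and naive Bayes over TF-IDF features.
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def load_labeled_abstracts(path):
    """Hypothetical loader: each line is a JSON object with 'text' and 'label'."""
    texts, labels = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            texts.append(record["text"])
            labels.append(record["label"])
    return texts, labels


texts, labels = load_labeled_abstracts("abstracts.jsonl")  # hypothetical file
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0
)

baselines = {
    "linear SVM": LinearSVC(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": MultinomialNB(),
}

for name, clf in baselines.items():
    # TF-IDF features feed each classifier; F1 is computed on held-out data.
    model = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    model.fit(X_train, y_train)
    print(f"{name}: F1 = {f1_score(y_test, model.predict(X_test)):.2f}")
```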