Research Updates
The articles below are published ahead of their final publication in an issue. Please cite them in the following format: authors (year), title, journal, DOI.

AI for interpreting screening mammograms: implications for missed cancer in double reading practices and challenging-to-locate lesions.

Published: 2024 May 24
Authors: Zhengqiang Jiang, Ziba Gandomkar, Phuong Dung Yun Trieu, Seyedamir Tavakoli Taba, Melissa L Barron, Sarah J Lewis
Source: Best Pract Res Cl Ob

Abstract:

Although the value of adding AI as a surrogate second reader in various scenarios has been investigated, it is unknown whether implementing an AI tool within double reading practice would capture additional subtle cancers missed by both radiologists who independently assessed the mammograms. This paper assesses the effectiveness of two state-of-the-art Artificial Intelligence (AI) models in detecting retrospectively identified missed cancers within a screening program employing double reading practices. The study also explores the agreement between AI and radiologists in locating the lesions, considering various levels of concordance among the radiologists in locating the lesions. The Globally-aware Multiple Instance Classifier (GMIC) and Global-Local Activation Maps (GLAM) models were fine-tuned for our dataset. We evaluated the sensitivity of both models on missed cancers retrospectively identified by a panel of three radiologists who reviewed prior examinations of 729 cancer cases detected in a screening program with double reading practice. Two of these experts annotated the lesions, and based on their concordance levels, cases were categorized as 'almost perfect,' 'substantial,' 'moderate,' and 'poor.' We employed Similarity or Histogram Intersection (SIM) and Kullback-Leibler Divergence (KLD) metrics to compare saliency maps of malignant cases from the AI model with annotations from radiologists in each category. In total, 24.82% of cancers were labeled as "missed." The performance of GMIC and GLAM on the missed cancer cases was 82.98% and 79.79%, respectively, while for the true screen-detected cancers, the performances were 89.54% and 87.25%, respectively (p-values for the difference in sensitivity < 0.05). As anticipated, SIM and KLD from saliency maps were best in 'almost perfect,' followed by 'substantial,' 'moderate,' and 'poor.' Both GMIC and GLAM (p-values < 0.05) exhibited greater sensitivity at higher concordance.
Even in a screening program with independent double reading, adding AI could potentially identify missed cancers. However, the challenging-to-locate lesions for radiologists impose a similar challenge for AI. © 2024. The Author(s).
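The SIM and KLD comparison described in the abstract can be sketched as follows. This is a minimal illustration of how these two saliency-evaluation metrics are commonly defined — each map is normalized to sum to one and then compared element-wise — and not the authors' implementation; the function names, normalization convention, and direction of the KL divergence are assumptions.

```python
import numpy as np

def normalize(m, eps=1e-12):
    """Shift a map to be non-negative and scale it to sum to 1 (a distribution)."""
    m = np.asarray(m, dtype=np.float64)
    m = m - m.min()
    s = m.sum()
    return m / s if s > eps else np.full(m.shape, 1.0 / m.size)

def sim(saliency, annotation):
    """Histogram intersection: sum of element-wise minima.
    1.0 means identical distributions; 0.0 means no overlap."""
    p, q = normalize(saliency), normalize(annotation)
    return float(np.minimum(p, q).sum())

def kld(saliency, annotation, eps=1e-12):
    """KL divergence of the saliency map from the annotation.
    0.0 means identical; larger values mean worse agreement."""
    p, q = normalize(annotation), normalize(saliency)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

With identical maps, `sim` returns 1.0 and `kld` returns approximately 0; a saliency map that places all its mass away from the annotated lesion drives `sim` toward 0 and `kld` sharply upward, which matches the reported ordering across the 'almost perfect' through 'poor' concordance categories.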