Comparative evaluation of image-based vs. text-based vs. multimodal AI approaches for automatic breast density assessment in mammograms.


Publication date: 2024 Jul 20

Authors: Pilar López-Úbeda, Teodoro Martín-Noguerol, Félix Paulano-Godino, Antonio Luna

Source: Comput Meth Prog Bio

Abstract:

In the last decade, there has been growing interest in applying artificial intelligence (AI) systems to breast cancer assessment, including breast density evaluation. However, few models have been developed that integrate textual mammography reports with mammographic images. Our aims are (1) to develop a natural language processing (NLP)-based AI system, (2) to evaluate an external image-based software tool, and (3) to develop a multimodal system that uses a late fusion approach to integrate image and text inferences, for the automatic classification of breast density in mammograms and radiological reports according to the American College of Radiology (ACR) guidelines.

We first compared different NLP models, three based on n-gram term frequency-inverse document frequency (TF-IDF) features and two based on transformer architectures, using 1533 unstructured mammography reports as the training set and 303 reports as the test set. Subsequently, we evaluated an external image-based software tool on 303 mammogram images. Finally, we assessed our multimodal system, which takes into account both the text reports and the mammogram images.

Our best NLP model achieved 88% accuracy, while the external software and the multimodal system achieved 75% and 80% accuracy, respectively, in classifying ACR breast density.

Although our multimodal system outperforms the image-based tool, it currently does not improve on the results of the NLP model for ACR breast density classification. Nevertheless, these promising results open the possibility of more comprehensive studies on the use of multimodal tools for breast density assessment.

Copyright © 2024 Elsevier B.V. All rights reserved.
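Late fusion combines the outputs of separately trained text and image models rather than their raw features. The abstract does not state the fusion rule, so the following is only a minimal sketch under stated assumptions: each model is assumed to output one probability per ACR density category (A to D, following the ACR BI-RADS 5th edition categories), and the fused label is taken as the argmax of a weighted average. The function names `late_fusion` and `accuracy`, the `text_weight` parameter, and the example probabilities are hypothetical and are not taken from the study.

```python
from typing import List, Sequence

# Assumed label set: the four ACR BI-RADS (5th ed.) density categories.
# A = almost entirely fatty, B = scattered areas of fibroglandular density,
# C = heterogeneously dense, D = extremely dense.
ACR_CATEGORIES = ("A", "B", "C", "D")


def late_fusion(text_probs: Sequence[float],
                image_probs: Sequence[float],
                text_weight: float = 0.5) -> str:
    """Fuse per-class probabilities from a text model and an image model.

    Hypothetical rule (not the paper's documented method): weighted average
    of the two probability vectors, then pick the highest-scoring category.
    """
    n = len(ACR_CATEGORIES)
    if len(text_probs) != n or len(image_probs) != n:
        raise ValueError("expected one probability per ACR category")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    fused: List[float] = [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_probs, image_probs)
    ]
    # Argmax over the fused scores; max() keeps the first maximum on ties.
    best = max(range(n), key=lambda k: fused[k])
    return ACR_CATEGORIES[best]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of cases whose predicted category matches the reference label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    text_out = [0.10, 0.60, 0.25, 0.05]   # illustrative report-model output
    image_out = [0.05, 0.30, 0.55, 0.10]  # illustrative image-model output
    print(late_fusion(text_out, image_out))                   # equal weights -> B
    print(late_fusion(text_out, image_out, text_weight=0.2))  # image-heavy -> C
    print(accuracy(["B", "C", "A"], ["B", "C", "D"]))
```

Varying `text_weight` between 0 and 1 moves the fused prediction from the image-only output to the text-only output; how the actual system weighted the two sources is not stated in the abstract.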