MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation.
Publication date: 2024 Aug 22
Authors:
Cheng Chen, Juzheng Miao, Dufan Wu, Aoxiao Zhong, Zhiling Yan, Sekeun Kim, Jiang Hu, Zhengliang Liu, Lichao Sun, Xiang Li, Tianming Liu, Pheng-Ann Heng, Quanzheng Li
Source:
Medical Image Analysis
Abstract:
The Segment Anything Model (SAM), a foundation model for general image segmentation, has demonstrated impressive zero-shot performance across numerous natural image segmentation tasks. However, SAM's performance declines significantly when applied to medical images, primarily due to the substantial disparity between the natural and medical image domains. To effectively adapt SAM to medical images, it is important to incorporate critical third-dimensional information, i.e., volumetric or temporal knowledge, during fine-tuning. At the same time, we aim to harness SAM's pre-trained weights within its original 2D backbone to the fullest extent. In this paper, we introduce a modality-agnostic SAM adaptation framework, named MA-SAM, that is applicable to various volumetric and video medical data. Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small portion of weight increments while preserving the majority of SAM's pre-trained weights. By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from the input data. We comprehensively evaluate our method on five medical image segmentation tasks, using 11 public datasets spanning CT, MRI, and surgical video data. Remarkably, without using any prompt, our method consistently outperforms various state-of-the-art 3D approaches, surpassing nnU-Net by 0.9%, 2.6%, and 9.9% in Dice for CT multi-organ segmentation, MRI prostate segmentation, and surgical scene segmentation, respectively. Our model also demonstrates strong generalization and excels in challenging tumor segmentation when prompts are used. Our code is available at: https://github.com/cchen-cc/MA-SAM. Copyright © 2024 Elsevier B.V. All rights reserved.
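The core mechanism described in the abstract is the injection of 3D adapters into the 2D transformer blocks of SAM's image encoder, with most pre-trained weights kept frozen. The PyTorch sketch below illustrates one way such an adapter could be structured; the module names (`Adapter3D`, `BlockWithAdapter`), the bottleneck size, and the placement of the adapter before the frozen block are illustrative assumptions rather than the authors' exact implementation, which is available in the linked repository.

```python
import torch
import torch.nn as nn

class Adapter3D(nn.Module):
    """Bottleneck adapter with a depthwise 3D convolution that mixes
    information across the slice/frame (third) dimension."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)                 # project down (few trainable params)
        self.conv3d = nn.Conv3d(bottleneck, bottleneck, kernel_size=3,
                                padding=1, groups=bottleneck)  # depthwise 3D mixing
        self.up = nn.Linear(bottleneck, dim)                   # project back up
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        # x: (B*D, H, W, C) tokens from a 2D ViT block that processes D slices per volume
        bd, h, w, c = x.shape
        b = bd // depth
        y = self.act(self.down(x))                              # (B*D, H, W, k)
        y = y.view(b, depth, h, w, -1).permute(0, 4, 1, 2, 3)   # (B, k, D, H, W)
        y = self.act(self.conv3d(y))                            # fuse volumetric/temporal context
        y = y.permute(0, 2, 3, 4, 1).reshape(bd, h, w, -1)      # back to (B*D, H, W, k)
        return x + self.up(y)                                   # residual keeps SAM features intact

class BlockWithAdapter(nn.Module):
    """Wraps a frozen pre-trained 2D transformer block with a trainable 3D adapter."""
    def __init__(self, sam_block: nn.Module, dim: int, depth: int):
        super().__init__()
        for p in sam_block.parameters():   # parameter-efficient: pre-trained weights stay frozen
            p.requires_grad_(False)
        self.block = sam_block
        self.adapter = Adapter3D(dim)
        self.depth = depth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(self.adapter(x, self.depth))
```

Under this sketch, only the adapter parameters (and any other injected weight increments) would be handed to the optimizer during fine-tuning, which is how only a small fraction of weights is updated while SAM's 2D backbone remains intact.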