Predicting the effort required to manually mend auto-segmentations.

Published: 2024 Jun 13

Authors: Da He, Jayaram K Udupa, Yubing Tong, Drew A Torigian

Source: Best Pract Res Cl Ob

Abstract:

Auto-segmentation is one of the critical and foundational steps in medical image analysis. The quality of auto-segmentation techniques influences the efficiency of precision radiology and radiation oncology, since high-quality auto-segmentations usually require only limited manual correction. Segmentation metrics are needed to evaluate auto-segmentation results and to guide the development of auto-segmentation techniques. Widely used metrics typically compare the auto-segmentation with the ground truth in terms of overlapping area (e.g., Dice Coefficient (DC)) or distance between boundaries (e.g., Hausdorff Distance (HD)). However, these metrics may not reliably indicate the manual mending effort required when auto-segmentation results are reviewed in clinical practice. In this article, we study different segmentation metrics to explore appropriate ways of evaluating auto-segmentations against clinical demands. The time experts take to correct auto-segmentations is recorded as a measure of the required mending effort. Five well-defined metrics are examined in a correlation analysis experiment and a regression experiment: the overlapping area-based metric DC, the boundary distance-based metric HD, the boundary length-based metrics surface DC (surDC) and added path length (APL), and a newly proposed hybrid metric, the Mendability Index (MI). In addition to these explicitly defined metrics, we also preliminarily explore the feasibility of using deep learning models, which take segmentation masks and the original images as input, to predict the mending effort. Experiments are conducted on datasets of 7 objects from three different institutions, containing the original computed tomography (CT) images, the ground truth segmentations, the auto-segmentations, the corrected segmentations, and the recorded mending times. According to the correlation analysis and regression experiments for the five well-defined metrics, a variant of MI best indicates the mending effort for sparse objects, while a variant of HD works best for non-sparse objects. Moreover, the deep learning models predict the effort required to mend auto-segmentations well, even without ground truth segmentations, demonstrating the potential of a novel and easy way to evaluate and improve auto-segmentation techniques.
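
To make the two baseline metrics named in the abstract concrete, the following is a minimal sketch of the Dice Coefficient and the Hausdorff Distance on toy 2D masks. It is not the authors' implementation: the function names, the representation of a mask as a set of (row, column) pixel coordinates, and the toy squares are all assumptions for illustration. As a simplification, the Hausdorff Distance here is taken over all foreground pixels by brute force, whereas the abstract describes HD as a boundary-based metric.

```python
import math


def dice_coefficient(auto, truth):
    """Dice coefficient between two sets of foreground pixel coordinates."""
    if not auto and not truth:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(auto & truth) / (len(auto) + len(truth))


def hausdorff_distance(auto, truth):
    """Symmetric Hausdorff distance between two non-empty point sets.

    Brute force, O(len(auto) * len(truth)); fine for toy masks only.
    """
    def directed(a, b):
        # Largest distance from any point in a to its nearest point in b.
        return max(min(math.dist(p, q) for q in b) for p in a)

    return max(directed(auto, truth), directed(truth, auto))


# Hypothetical toy data: a 4x4 ground-truth square and the same square
# shifted by one pixel along the first axis as the "auto-segmentation".
truth_mask = {(x, y) for x in range(0, 4) for y in range(0, 4)}
auto_mask = {(x, y) for x in range(1, 5) for y in range(0, 4)}

dc = dice_coefficient(auto_mask, truth_mask)
hd = hausdorff_distance(auto_mask, truth_mask)
print(f"DC = {dc:.2f}, HD = {hd:.2f}")  # DC = 0.75, HD = 1.00
```

In a study like the one described, values such as these would be computed per case and compared with the recorded mending time; the sketch stops at the metrics themselves because no mending-time data is given here.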