Uncertainty quantification via localized gradients for deep learning-based medical image assessments.

Published: 2024 Jul 09

Authors: Brayden Schott, Dmitry Pinchuk, Victor Santoro-Fernandes, Zan Klanecek, Luciano Rivetti, Alison Deatsch, Scott Perlman, Yixuan Li, Robert Jeraj

Source: Disease Models & Mechanisms

Abstract:

Deep learning models that aid in medical image assessment tasks must be both accurate and reliable to be deployed within clinical settings. While deep learning models have been shown to be highly accurate across a variety of tasks, measures that indicate the reliability of these models are less established. Increasingly, uncertainty quantification (UQ) methods are being introduced to inform users about the reliability of model outputs. However, most existing methods cannot be added to previously validated models because they are not post hoc and they change a model's output. In this work, we overcome these limitations by introducing a novel post hoc UQ method, termed Local Gradients UQ, and demonstrate its utility for deep learning-based metastatic disease delineation. This method leverages a trained model's localized gradient space to assess sensitivities to trained model parameters. We compared the Local Gradients UQ method to non-gradient measures defined using model probability outputs. The performance of each uncertainty measure was assessed in four clinically relevant experiments: (1) response to artificially degraded image quality, (2) comparison between matched high- and low-quality clinical images, (3) false positive (FP) filtering, and (4) correspondence with physician-rated disease likelihood. (1) Response to artificially degraded image quality was enhanced by the Local Gradients UQ method: the median percent difference between matching lesions in non-degraded and most degraded images was consistently higher for the Local Gradients uncertainty measure than for the non-gradient uncertainty measures (e.g., 62.35% vs. 2.16% for additive Gaussian noise). (2) The Local Gradients UQ measure responded more strongly to the difference between matched high- and low-quality clinical images (p<0.05, vs. p>0.1 for both non-gradient uncertainty measures). (3) FP filtering performance was enhanced by the Local Gradients UQ method when compared to the non-gradient methods, increasing the area under the receiver operating characteristic curve (ROC AUC) by 20.1% and decreasing the false positive rate by 26%. (4) The Local Gradients UQ method also corresponded more closely with physician-rated likelihood of malignancy, increasing the ROC AUC for correspondence with physician-rated disease likelihood by 16.2%. In summary, this work introduces and validates a novel gradient-based UQ method for deep learning-based medical image assessments to enhance user trust when using deployed clinical models.

Creative Commons Attribution license.
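
The abstract does not say how the localized gradients are computed. The sketch below therefore illustrates the general idea with one generic gradient-based uncertainty measure, which is not necessarily the authors' formulation. It assumes a toy per-pixel logistic model, uses the model's own thresholded prediction as a pseudo-label, and takes the norm of the loss gradient with respect to the model parameters over a single lesion mask. A probability-based entropy measure stands in for the non-gradient baselines, and a helper computes the percent difference between a degraded and a reference value. All function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def local_gradient_uncertainty(features, weights, bias, mask):
    """Gradient-norm uncertainty for one region (illustrative assumption).

    features: (N, D) array of per-pixel feature vectors
    weights:  (D,) parameters of a toy logistic model
    bias:     scalar bias of the toy model
    mask:     (N,) boolean array selecting the pixels of one lesion
    """
    x = features[mask]
    if x.shape[0] == 0:
        raise ValueError("mask selects no pixels")
    p = sigmoid(x @ weights + bias)
    # Pseudo-label: the model's own thresholded prediction.
    pseudo = (p >= 0.5).astype(float)
    # Derivative of binary cross-entropy w.r.t. the logit, per pixel.
    residual = p - pseudo
    # Average gradient w.r.t. weights and bias over the region.
    grad_w = residual @ x / x.shape[0]
    grad_b = residual.mean()
    return float(np.sqrt(np.sum(grad_w ** 2) + grad_b ** 2))


def mean_entropy_uncertainty(features, weights, bias, mask):
    """Non-gradient baseline: mean binary entropy of the output probabilities."""
    x = features[mask]
    if x.shape[0] == 0:
        raise ValueError("mask selects no pixels")
    p = np.clip(sigmoid(x @ weights + bias), 1e-12, 1.0 - 1e-12)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return float(entropy.mean())


def percent_difference(degraded, reference):
    """Percent change of an uncertainty value relative to a reference value."""
    return 100.0 * (degraded - reference) / reference
```

In this toy setting, a region whose logits sit near zero yields a larger gradient norm than a region with confident predictions. Comparing each measure on matched lesions before and after degradation with `percent_difference` mirrors the kind of comparison reported in experiment (1).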