SMiT: symmetric mask transformer for disease severity detection.
Published: 2023 Sep 12
Authors:
Chengsheng Zhang, Cheng Chen, Chen Chen, Xiaoyi Lv
Source:
DIABETES & METABOLISM
Abstract:
The application of deep learning methods to the intelligent diagnosis of diseases has become a focus of intelligent medical research. In image classification tasks where the lesion area is small and uneven, the background regions involved in training reduce the accuracy with which the extent of the lesion is ultimately determined. Rather than following the traditional approach of building a CNN-based system to assist physicians in diagnosis, we propose a pure transformer framework for the diagnostic grading of pathological images. We propose SMiT, a Symmetric Mask Pre-Training vision transformer model for grading pathological cancer images. SMiT applies a symmetrically identical, high-probability sparsification to the input image token sequence at the first and last encoder layer positions to pre-train the vision transformer; the parameters of the baseline model are then fine-tuned after loading the pre-trained weights, allowing the model to concentrate on extracting detailed features of the lesion region and effectively mitigating the potential feature-dependency problem. SMiT achieved 92.8% classification accuracy on 4500 histopathological images of colorectal cancer preprocessed with Gaussian-filter denoising. We validated the effectiveness and generalizability of the method on the publicly available diabetic retinopathy dataset APTOS2019 from Kaggle, achieving a quadratic weighted Cohen's kappa of 91.9%, an accuracy of 86.91%, and an F1-score of 72.85%, which are 1-2% better than the results of previous CNN-based studies. SMiT uses a simpler strategy to achieve better results that can assist physicians in making accurate clinical decisions, and it offers guidance for making good use of vision transformers in medical imaging. © 2023. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
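
The following is a minimal PyTorch sketch of the idea stated in the abstract: the same high-probability random mask is applied to the image token sequence before the first and before the last encoder layer during pre-training, and the resulting weights are later fine-tuned for grading. The class name, the mask ratio of 0.75, the learnable [MASK] token, and the mean-pooled classification head are illustrative assumptions, not the authors' exact SMiT implementation.

import torch
import torch.nn as nn


class SymmetricMaskViT(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=384, depth=12,
                 heads=6, num_classes=5, mask_ratio=0.75):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        self.mask_ratio = mask_ratio
        # Patch embedding: split the image into non-overlapping patches.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        # Learnable token that replaces masked patches (assumption).
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)])
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def _apply_mask(self, tokens, mask):
        # Replace masked positions with the shared mask token.
        b, n, d = tokens.shape
        mask_tok = self.mask_token.expand(b, n, d)
        return torch.where(mask.unsqueeze(-1), mask_tok, tokens)

    def forward(self, x, pretraining=True):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos_embed
        b, n, _ = tokens.shape
        # One high-probability mask per sample, reused symmetrically.
        mask = torch.rand(b, n, device=x.device) < self.mask_ratio
        for i, layer in enumerate(self.encoder):
            # Identical mask before the first and before the last encoder layer.
            if pretraining and i in (0, len(self.encoder) - 1):
                tokens = self._apply_mask(tokens, mask)
            tokens = layer(tokens)
        tokens = self.norm(tokens)
        # Mean-pool the token sequence for the severity-grade prediction head.
        return self.head(tokens.mean(dim=1))


model = SymmetricMaskViT()
logits = model(torch.randn(2, 3, 224, 224), pretraining=True)
print(logits.shape)  # torch.Size([2, 5])

For fine-tuning, the same module would be run with pretraining=False so the full, unmasked token sequence reaches every encoder layer while the pre-trained weights are updated on the labelled grading data.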