Self-Supervised Triplet Contrastive Learning for Classifying Endometrial Histopathological Images.

Publication date: 2023 Sep 12

Authors: Fengjun Zhao, Zhiwei Wang, Hongyan Du, Xiaowei He, Xin Cao

Source: IEEE Journal of Biomedical and Health Informatics

Abstract:

Early identification of endometrial cancer or precancerous lesions from histopathological images is crucial for precise endometrial medical care, but it is increasingly hampered by the relative scarcity of pathologists. Computer-aided diagnosis (CAD) provides an automated alternative for confirming endometrial diseases with either feature-engineered machine learning or end-to-end deep learning (DL). In particular, advanced self-supervised learning alleviates the dependence of supervised learning on large-scale human-annotated data and can be used to pre-train DL models for specific classification tasks. We therefore develop a novel self-supervised triplet contrastive learning (SSTCL) model for classifying endometrial histopathological images. Specifically, this model consists of one online branch and two target branches. The second target branch includes a simple yet powerful augmentation named random mosaic masking (RMM), which functions as an effective regularization by mapping the features of masked images close to those of intact ones. Moreover, we add a bottleneck Transformer (BoT) model into each branch as a self-attention module to learn global information by considering both content information and relative distances between features at different locations. On a public endometrial dataset, our model achieved four-class classification accuracies of 77.31±0.84%, 80.87±0.48%, and 83.22±0.87% using 20%, 50%, and 100% of the labeled images, respectively. When transferred to the in-house dataset, our model obtained a three-class diagnostic accuracy of 96.81% (95% confidence interval: 95.61-98.02%). On both datasets, our model outperformed state-of-the-art supervised and self-supervised methods. Our model may help pathologists automatically diagnose endometrial diseases with high accuracy and efficiency using limited human-annotated histopathological images.
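
The abstract does not specify how RMM or the training objective is implemented, so the following is only a minimal sketch of the general idea: mask random square tiles of an image, then penalize the distance between the online-branch features and the features of both the intact and the masked target views. The function names (`random_mosaic_mask`, `cosine_distance`, `triplet_alignment_loss`), the tile size, the mask ratio, the zero fill value, and the cosine-distance form of the loss are all illustrative assumptions, not the authors' settings. The encoders and the BoT module are omitted, and plain lists stand in for images and feature vectors.

```python
import math
import random


def random_mosaic_mask(image, tile=4, mask_ratio=0.3, fill=0.0, rng=None):
    """Return a copy of `image` (a list of rows) with random square tiles filled.

    Hypothetical sketch: tile size, mask ratio, and fill value are
    illustrative assumptions, not the values used in the paper.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    masked = [row[:] for row in image]  # leave the input untouched
    # Top-left corner of every tile in the grid.
    tiles = [(r, c) for r in range(0, height, tile) for c in range(0, width, tile)]
    n_masked = round(mask_ratio * len(tiles))
    for r0, c0 in rng.sample(tiles, n_masked):
        for r in range(r0, min(r0 + tile, height)):
            for c in range(c0, min(c0 + tile, width)):
                masked[r][c] = fill
    return masked


def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def triplet_alignment_loss(online, target_full, target_masked):
    """Pull online features toward both target views (assumed loss form)."""
    return cosine_distance(online, target_full) + cosine_distance(online, target_masked)


# Toy usage: an 8x8 "image" of ones split into four 4x4 tiles, half of them masked.
image = [[1.0] * 8 for _ in range(8)]
masked = random_mosaic_mask(image, tile=4, mask_ratio=0.5, rng=random.Random(0))
masked_pixels = sum(v == 0.0 for row in masked for v in row)
```

In this toy setup two of the four tiles are masked, so `masked_pixels` counts 32 of the 64 pixels. In a real pipeline the three feature vectors would come from the online encoder and the two target encoders rather than being supplied by hand.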