Learning from Incorrectness: Active Learning with Negative Pre-training and Curriculum Querying for Histological Tissue Classification.

Publication date: 2023 Sep 08

Authors: Wentao Hu, Lianglun Cheng, Guoheng Huang, Xiaochen Yuan, Guo Zhong, Chi-Man Pun, Jian Zhou, Muyan Cai

Source: IEEE TRANSACTIONS ON MEDICAL IMAGING

Abstract:

Patch-level histological tissue classification is an effective pre-processing method for histological slide analysis. However, classifying tissue with deep learning incurs high annotation costs. Applying active learning (AL) to histological tissue classification is a promising way to work within a limited annotation budget. In practice, however, performance is highly imbalanced across categories, and the tissues in the underperforming categories are equally important for cancer diagnosis. In this paper, we propose an active learning framework called ICAL, which contains Incorrectness Negative Pre-training (INP) and Category-wise Curriculum Querying (CCQ). These two components address the above problem from the perspective of relations between categories and from the perspective of the categories themselves, respectively. In particular, INP exploits a mechanism unique to active learning: it treats the incorrect predictions obtained from CCQ as complementary labels for negative pre-training, so that similar categories are better distinguished during training. CCQ adjusts the query weights according to the learning status of each category under the model trained by INP, and uses uncertainty to evaluate and compensate for the query bias caused by inadequate category performance. Experimental results on two histological tissue classification datasets demonstrate that ICAL achieves performance approaching that of fully supervised learning with less than 16% of the labeled data. Compared with state-of-the-art active learning algorithms, ICAL achieves better and more balanced performance across all categories and remains robust under extremely low annotation budgets. The source code will be released at https://github.com/LactorHwt/ICAL.
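
The abstract does not give the exact formulas, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. For INP it uses one common formulation of negative learning, in which a sample with complementary label k (a class it is known not to belong to) is penalized by -log(1 - p_k). For CCQ it scales entropy-based uncertainty by a per-category weight; the weight used here, one minus the accuracy of the predicted category, is a hypothetical stand-in for the "learning status" mentioned above. All function and parameter names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_learning_loss(logits, complementary_label, eps=1e-12):
    """Loss for a sample known NOT to belong to `complementary_label`.

    Assumed form: -log(1 - p_k). It pushes probability mass away from
    the class that was previously predicted incorrectly.
    """
    probs = softmax(logits)
    return -math.log(max(1.0 - probs[complementary_label], eps))


def entropy(probs, eps=1e-12):
    """Shannon entropy of a predicted distribution (uncertainty score)."""
    return -sum(p * math.log(p + eps) for p in probs)


def category_weighted_query(pool_probs, class_accuracy, budget):
    """Select `budget` unlabeled samples by weighted uncertainty.

    Hypothetical weighting: samples predicted as a poorly learned
    category (low accuracy) get a larger query weight.
    """
    scores = []
    for idx, probs in enumerate(pool_probs):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        weight = 1.0 - class_accuracy[predicted]
        scores.append((weight * entropy(probs), idx))
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:budget]]


# Example: class 1 is poorly learned (accuracy 0.5), so an uncertain
# sample predicted as class 1 is queried before the others.
pool = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
selected = category_weighted_query(pool, class_accuracy=[0.9, 0.5], budget=2)
```

In an active learning loop of this kind, the selected samples would be sent for annotation; any whose model prediction turns out to be wrong could then supply the (sample, wrongly predicted class) pairs consumed by `negative_learning_loss` in the next round.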