A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records.
Published: 2023
Authors:
Sicheng Zhou, Nan Wang, Liwei Wang, Ju Sun, Anne Blaes, Hongfang Liu, Rui Zhang
Source:
Computational and Structural Biotechnology Journal
Abstract:
© 2023 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
Transformer-based language models are prevalent in the clinical domain due to their excellent performance on clinical NLP tasks, yet their generalizability is usually ignored during model development. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, along with classic machine learning models, i.e., the conditional random field (CRF) and the bi-directional long short-term memory CRF (BiLSTM-CRF), across clinical institutes through a breast cancer phenotype extraction task.

Two clinical corpora of breast cancer patients were collected from the electronic health records of the University of Minnesota (UMN) and Mayo Clinic (MC) and annotated following the same guideline. We developed three types of NLP models (i.e., CRF, BiLSTM-CRF, and CancerBERT) to extract cancer phenotypes from clinical texts. We evaluated the generalizability of the models on different test sets under different learning strategies (model transfer vs. locally trained). The entity coverage score was also assessed, along with its association with model performance.

We manually annotated 200 and 161 clinical documents at UMN and MC, respectively. The corpora of the two institutes showed higher similarity between the target entities than between the overall corpora. The CancerBERT models obtained the best performance on the independent test sets from the two clinical institutes and on the permutation test set. The CancerBERT model developed at one institute and further fine-tuned at the other achieved performance comparable to the model developed on local data (micro-F1: 0.925 vs. 0.932).

The results indicate that, among the three types of clinical NLP models, CancerBERT has superior learning ability and generalizability for our named entity recognition task, with the advantage of recognizing complex entities, e.g., entities with different labels.
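The micro-F1 figures quoted above (0.925 vs. 0.932) are micro-averaged over all entity types: true positives, false positives, and false negatives are pooled across types before a single F1 is computed. A minimal sketch of this metric for entity-level NER evaluation is shown below; the entity representation (document id, character span, label) and the phenotype labels in the toy example are assumptions for illustration, not the paper's actual schema.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over sets of (doc_id, span, label) entity tuples.

    Counts are pooled across all entity types, so frequent types weigh
    more heavily than rare ones -- unlike macro averaging, which would
    average per-type F1 scores instead.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)          # exact match on span and label
    fp = len(pred - gold)          # predicted entity not in gold
    fn = len(gold - pred)          # gold entity missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example with hypothetical phenotype labels (not from the paper):
# the second prediction has the right span but the wrong label, so it
# counts as both a false positive and a false negative.
gold = [("doc1", (0, 4), "HormoneReceptor"),
        ("doc1", (10, 14), "TumorSize"),
        ("doc2", (3, 8), "Stage")]
pred = [("doc1", (0, 4), "HormoneReceptor"),
        ("doc1", (10, 14), "Stage"),
        ("doc2", (3, 8), "Stage")]
print(round(micro_f1(gold, pred), 3))  # tp=2, fp=1, fn=1 -> 0.667
```

This exact-match, entity-level scoring (span and label must both agree) is the common convention for clinical NER evaluation; partial-match variants would credit the mislabeled span differently.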