Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Evaluation of a Natural Language Processing Model to Identify and Characterize Patients in the United States With High-Risk Non-Muscle-Invasive Bladder Cancer.

Published: 2023 Sep
Authors: Vikram M Narayan, Despina Siolas, Eric S Meadows, Vladimir Turzhitsky, Arthur Sillah, Kentaro Imai, Andrew J McMurry, Haojie Li
Source: MEDICINE & SCIENCE IN SPORTS & EXERCISE

Abstract:

Treatment of non-muscle-invasive bladder cancer (NMIBC) is guided by risk stratification using clinical and pathologic criteria. This study aimed to develop a natural language processing (NLP) model for identifying patients with high-risk NMIBC retrospectively from unstructured electronic medical records (EMRs) and to apply the model to describe patient and tumor characteristics.

We used three independent EMR-derived data sets including adult patients with a bladder cancer diagnosis in 2011-2020 for NLP model development and training (n = 140), validation (n = 697), and application for the retrospective cohort analysis (n = 4,402). Deep learning methods were used to train NLP recognition of medical chart terminology to identify seven high-risk NMIBC criteria; model performance was assessed using the F1 score, weighted across features. An algorithm was then used to classify each patient as high-risk NMIBC (yes/no). Manually reviewed records served as the gold standard.

The F1 scores after model training were >0.7 for all but one uncommon feature (prostatic urethral involvement). The highest area under the receiver operating characteristic curve (AUC) was observed for Ta (0.897) and T1 (0.897); the lowest AUC was for carcinoma in situ (CIS; 0.617). For high-risk NMIBC classification, positive predictive value was 79.4%, negative predictive value was 93.2%, and false-positive rate was 8.9%. Sensitivity and specificity were 83.7% and 91.1%, respectively. Of 748 patients manually confirmed as having high-risk NMIBC, 196 (26%) had CIS (of whom 19% also had T1 and 23% also had Ta disease); 552 tumors (74%) had no associated CIS.

The NLP model, combined with a rule-based algorithm, identified high-risk NMIBC with good performance and will enable future work to study real-world treatment patterns and clinical outcomes for high-risk NMIBC.
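The classification metrics quoted in the abstract (sensitivity, specificity, positive and negative predictive value, false-positive rate, F1) are all ratios over one 2x2 confusion matrix that compares the model's yes/no call against manual review. The following Python sketch shows how they relate. The abstract does not report the underlying counts, so the counts used here are hypothetical, chosen only so that the ratios land close to the reported percentages; the class and function names are likewise illustrative and not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of model calls vs. the manually reviewed gold standard."""
    tp: int  # model says high-risk, manual review agrees
    fp: int  # model says high-risk, manual review disagrees
    fn: int  # model says not high-risk, manual review says high-risk
    tn: int  # model says not high-risk, manual review agrees


def classification_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return the standard binary-classification ratios as fractions in [0, 1]."""
    sensitivity = cm.tp / (cm.tp + cm.fn)   # share of true high-risk cases found
    specificity = cm.tn / (cm.tn + cm.fp)   # share of non-high-risk cases cleared
    ppv = cm.tp / (cm.tp + cm.fp)           # precision of a "yes" call
    npv = cm.tn / (cm.tn + cm.fn)           # reliability of a "no" call
    fpr = cm.fp / (cm.fp + cm.tn)           # equals 1 - specificity
    f1 = 2 * cm.tp / (2 * cm.tp + cm.fp + cm.fn)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "fpr": fpr,
        "f1": f1,
    }


# HYPOTHETICAL counts (not from the study), picked for illustration only.
example = ConfusionMatrix(tp=170, fp=44, fn=33, tn=450)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these illustrative counts, sensitivity, specificity, PPV, NPV, and false-positive rate print as 83.7%, 91.1%, 79.4%, 93.2%, and 8.9%. The false-positive rate and specificity always sum to 100%, which is why the abstract's 8.9% and 91.1% agree. Note that the F1 scores in the abstract refer to per-feature NLP extraction, not to this patient-level classification; F1 is included here only to show the formula.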