A novel staging system derived from natural language processing of pathology reports to predict prognostic outcomes of pancreatic cancer: a retrospective cohort study.
Published: 2023 Aug 14
Authors:
Bo Li, Beilei Wang, Pengjie Zhuang, Hongwei Cao, Shengyong Wu, Zhendong Tan, Suizhi Gao, Penghao Li, Wei Jin, Zhuo Shao, Kailian Zheng, Lele Wu, Bai Gao, Yang Wang, Hui Jiang, Shiwei Guo, Liang He, Yan Yang, Gang Jin
Source:
BIOMEDICINE & PHARMACOTHERAPY
Abstract:
To construct a novel Tumor-Node-Morphology (TNMor) staging system derived from natural language processing (NLP) of pathology reports to predict outcomes of pancreatic ductal adenocarcinoma (PDAC). This retrospective study with 1,657 participants was based on a large referral center and The Cancer Genome Atlas Program (TCGA) dataset. In the training cohort, NLP was used to extract and screen prognostic predictors from pathology reports to develop the TNMor system, which was then compared with the tumor-node-metastasis (TNM) system in the internal and external validation cohorts. Main outcomes were evaluated by the log-rank test of Kaplan-Meier curves, the concordance index (C-index), and the area under the receiver operating characteristic curve (AUC). The precision, recall, and F1 scores of the NLP model were 88.83%, 89.89%, and 89.21%, respectively. In Kaplan-Meier analysis, survival differences between stages in the TNMor system were more significant than those in the TNM system. In addition, compared with the TNM system, the TNMor system provided an improved C-index (internal validation, 0.58 vs. 0.54, P < 0.001; external validation, 0.64 vs. 0.63, P < 0.001) and higher AUCs for 1-, 2-, and 3-year survival (internal validation: 0.62 vs. 0.54, P < 0.001; 0.64 vs. 0.60, P = 0.017; 0.69 vs. 0.62, P = 0.001; external validation: 0.69 vs. 0.65, P = 0.098; 0.68 vs. 0.64, P = 0.154; 0.64 vs. 0.55, P = 0.032, respectively). Finally, the TNMor system was particularly beneficial for precise stratification of patients receiving adjuvant therapy, with an improved C-index (0.61 vs. 0.57, P < 0.001) and higher AUCs for 1-, 2-, and 3-year survival (0.64 vs. 0.57, P < 0.001; 0.64 vs. 0.58, P < 0.001; 0.67 vs. 0.61, P < 0.001, respectively) compared with the TNM system. These findings suggest that the TNMor system performed better than the TNM system in predicting PDAC prognosis. It is a promising system to screen risk-adjusted strategies for precision medicine. Copyright © 2023 The Author(s). Published by Wolters Kluwer Health, Inc.
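For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is not the authors' implementation: the function name `concordance_index`, the tie handling, and the toy data are illustrative assumptions. A stage number is used as the risk score, so patients in the same stage count as half-concordant.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch, not the study's code).

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- higher value means worse predicted prognosis (e.g. stage)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: skip pairs with identical times
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # same stage: counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): months of follow-up, event flag, stage
toy_times = [5, 10, 15, 20, 25]
toy_events = [1, 1, 0, 1, 0]
toy_stages = [3, 3, 2, 1, 2]
toy_c = concordance_index(toy_times, toy_events, toy_stages)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values (for example 0.58 vs. 0.54) are read.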