Predicting Incident Adenocarcinoma of the Esophagus or Gastric Cardia Using Machine Learning of Electronic Health Records.

Publication date: 2023 Aug 17

Authors: Joel H Rubenstein, Simon Fontaine, Peter W MacDonald, Jennifer A Burns, Richard R Evans, Maria E Arasim, Joy W Chang, Elizabeth M Firsht, Sarah T Hawley, Sameer D Saini, Lauren P Wallner, Ji Zhu, Akbar K Waljee

Source: Disease Models & Mechanisms

Abstract:

Tools that can automatically predict incident esophageal adenocarcinoma (EAC) and gastric cardia adenocarcinoma (GCA) using electronic health records (EHR) to guide screening decisions are needed.

The Veterans Health Administration (VHA) Corporate Data Warehouse was accessed to identify Veterans with ≥1 encounter between 2005 and 2018. Cases diagnosed with EAC (8,430) or GCA (2,965) were identified in the VHA Central Cancer Registry and compared to 10,256,887 controls. Predictors included demographics, prescriptions, laboratory results, and diagnoses between 1 and 5 years prior to the index date. The Kettles Esophageal and Cardia Adenocarcinoma predictioN (K-ECAN) Tool was developed and internally validated using simple random sampling imputation and extreme gradient boosting, a machine learning method. Training was performed in 50%, preliminary validation in 25%, and final testing in 25% of the data.

K-ECAN was well calibrated and had better discrimination (AuROC = 0.77) than previously validated models such as HUNT (AuROC = 0.68) and Kunzmann (AuROC = 0.64), or published guidelines. Using only data between 3 and 5 years prior to the index date slightly diminished its accuracy (AuROC = 0.75). When men were under-sampled to simulate a non-VHA population, the AuROCs of HUNT and Kunzmann improved, but K-ECAN was still the most accurate (AuROC = 0.85). While GERD was strongly associated with EAC, it contributed only a small proportion of the gain in information for prediction.

K-ECAN is a novel, internally validated tool predicting incident EAC/GCA using EHR data. Further work is needed to validate K-ECAN outside VHA and to assess how best to implement it within EHRs.

Copyright © 2023 AGA Institute. Published by Elsevier Inc. All rights reserved.
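
The data-handling steps named in the abstract (random sampling imputation, a 50%/25%/25% split, and evaluation by AuROC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the K-ECAN implementation: the function names are hypothetical, the imputation shown is one common reading of "simple random sampling imputation" (each missing value is replaced by a random draw from the observed values of the same predictor), and the gradient boosting step is omitted, so the scores used for evaluation would have to come from a separately fitted model.

```python
import random


def random_sampling_impute(column, rng):
    """Replace each missing value (None) with a random draw from the
    observed values of the same predictor. Assumed interpretation of
    'simple random sampling imputation'."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    return [v if v is not None else rng.choice(observed) for v in column]


def split_50_25_25(n_rows, rng):
    """Shuffle row indices and cut them into training (50%),
    preliminary validation (25%), and final testing (remaining ~25%)."""
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = n_rows // 2
    n_valid = n_rows // 4
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U)
    identity. labels are 0 (control) or 1 (case); tied scores receive
    the average rank of their tie group."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AuROC needs at least one case and one control")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    lab_values = [13.1, None, 14.2, None, 12.7]  # made-up example data
    print(random_sampling_impute(lab_values, rng))
    print([len(part) for part in split_50_25_25(10, rng)])
    # Placeholder labels and scores; a real workflow would score the
    # held-out test partition with the fitted model.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AuROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls, which is the scale on which the reported values of 0.64 to 0.85 should be read.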