Research Frontier Briefs
Focused on the latest research on tumors and tumor organoids, with first-hand updates.

Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data

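Impact factor: 4.5
Journal ranking: Medicine Zone 2 / Computer Science: Interdisciplinary Applications Zone 3; Medicine: Informatics Zone 3
Published: 2024 Sep
Authors: Yiwen Lu, Jiayi Tong, Jessica Chubak, Thomas Lumley, Rebecca A Hubbard, Hua Xu, Yong Chen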
DOI: 10.1016/j.jbi.2024.104690

Abstract

It has become increasingly common for multiple computable phenotypes from electronic health records (EHR) to be developed for a given phenotype. However, EHR-based association studies often focus on a single phenotype. In this paper, we develop a method aiming to simultaneously make use of multiple EHR-derived phenotypes for reduction of bias due to phenotyping error and improved efficiency of phenotype/exposure associations.

The proposed method combines multiple algorithm-derived phenotypes with a small set of validated outcomes to reduce bias and improve estimation accuracy and efficiency. The performance of our method was evaluated through simulation studies and real-world application to an analysis of colon cancer recurrence using EHR data from Kaiser Permanente Washington.

In settings where there was no single surrogate performing uniformly better than all others in terms of both sensitivity and specificity, our method achieved substantial bias reduction compared to using a single algorithm-derived phenotype. Our method also led to higher estimation efficiency by up to 30% compared to an estimator that used only one algorithm-derived phenotype.

Simulation studies and application to real-world data demonstrated the effectiveness of our method in integrating multiple phenotypes, thereby enhancing bias reduction, statistical accuracy and efficiency. Our method combines information across multiple surrogates using a statistically efficient seemingly unrelated regression framework. Our method provides a robust alternative to single-surrogate-based bias correction, especially in contexts lacking information on which surrogate is superior.
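The abstract describes the approach only at a high level. As a rough, hypothetical illustration of why a small validated subset plus several error-prone surrogates can reduce both bias and variance, the Python sketch below simulates an exposure `x`, a true binary phenotype `y`, and two surrogates whose sensitivity and specificity trade off against each other (all parameter values are assumptions). Naive logistic regressions on either surrogate are attenuated toward zero. The combination step is a generic control-variate correction with bootstrap covariances, not the authors' seemingly unrelated regression estimator: the validated-subset slope is adjusted by the (asymptotically mean-zero) differences between each surrogate's slope fitted on the validated subset and on the full cohort, with all surrogates weighted jointly. Function names such as `fit_logistic` and `control_variate_estimate` are illustrative only.

```python
import numpy as np


def fit_logistic(X, y, iters=30, tol=1e-10):
    """Logistic regression by Newton-Raphson; X must already include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate(n_total, rng, b0=-1.0, b1=0.7):
    """Hypothetical data: exposure x, true phenotype y, two error-prone surrogates."""
    x = rng.normal(size=n_total)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(b0 + b1 * x))))
    # Surrogate 1 (assumed): sensitivity 0.95, specificity 0.80.
    s1 = np.where(y == 1, rng.binomial(1, 0.95, n_total), rng.binomial(1, 0.20, n_total))
    # Surrogate 2 (assumed): sensitivity 0.70, specificity 0.97.
    s2 = np.where(y == 1, rng.binomial(1, 0.70, n_total), rng.binomial(1, 0.03, n_total))
    return x, y, np.column_stack([s1, s2])


def slopes(X, y, S, val):
    """Exposure slopes: [beta_V, gamma_1V - gamma_1, gamma_2V - gamma_2, ...]."""
    beta_v = fit_logistic(X[val], y[val])[1]  # y is only read on the validated rows
    diffs = [fit_logistic(X[val], S[val, k])[1] - fit_logistic(X, S[:, k])[1]
             for k in range(S.shape[1])]
    return np.array([beta_v, *diffs])


def control_variate_estimate(x, y, S, val, n_boot=200, seed=1):
    """Adjust the validated-subset slope using all surrogates as joint control variates."""
    X = np.column_stack([np.ones_like(x), x])
    est = slopes(X, y, S, val)
    rng = np.random.default_rng(seed)
    n = len(x)
    boot = np.empty((n_boot, len(est)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects, keeping their validation flags
        boot[b] = slopes(X[idx], y[idx], S[idx], val[idx])
    cov = np.cov(boot, rowvar=False)
    c, sigma_d = cov[0, 1:], cov[1:, 1:]
    weights = np.linalg.solve(sigma_d, c)  # optimal control-variate weights
    beta_cv = est[0] - weights @ est[1:]
    var_cv = cov[0, 0] - c @ weights
    return est[0], cov[0, 0], beta_cv, var_cv


rng = np.random.default_rng(2024)
x, y, S = simulate(10_000, rng)
val = np.zeros(len(x), dtype=bool)
val[rng.choice(len(x), size=500, replace=False)] = True  # small validated subset

X = np.column_stack([np.ones_like(x), x])
naive = [fit_logistic(X, S[:, k])[1] for k in range(S.shape[1])]
beta_v, var_v, beta_cv, var_cv = control_variate_estimate(x, y, S, val)
print("naive surrogate slopes:", np.round(naive, 3))
print("validation-only:", round(beta_v, 3), "SE", round(var_v ** 0.5, 3))
print("control-variate:", round(beta_cv, 3), "SE", round(var_cv ** 0.5, 3))
```

In this construction the weights are Σ_d⁻¹c, so a surrogate that adds little information beyond the others receives little weight, and the bootstrap variance of the combined estimate, var(β̂_V) − cᵀΣ_d⁻¹c, can never exceed that of the validation-only estimate. How large the gain is depends entirely on the assumed simulation settings.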