Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data

JOURNAL OF BIOMEDICAL INFORMATICS

Impact factor: 4.5

Journal ranking: Medicine, Zone 2 / Computer Science, Interdisciplinary Applications, Zone 3; Medical Informatics, Zone 3

Publication date: 2024 Sep

Authors: Yiwen Lu, Jiayi Tong, Jessica Chubak, Thomas Lumley, Rebecca A Hubbard, Hua Xu, Yong Chen

Abstract

It has become increasingly common for multiple computable phenotypes from electronic health records (EHR) to be developed for a given phenotype. However, EHR-based association studies often focus on a single phenotype. In this paper, we develop a method aiming to simultaneously make use of multiple EHR-derived phenotypes for reduction of bias due to phenotyping error and improved efficiency of phenotype/exposure associations. The proposed method combines multiple algorithm-derived phenotypes with a small set of validated outcomes to reduce bias and improve estimation accuracy and efficiency. The performance of our method was evaluated through simulation studies and real-world application to an analysis of colon cancer recurrence using EHR data from Kaiser Permanente Washington. In settings where there was no single surrogate performing uniformly better than all others in terms of both sensitivity and specificity, our method achieved substantial bias reduction compared to using a single algorithm-derived phenotype. Our method also led to higher estimation efficiency by up to 30% compared to an estimator that used only one algorithm-derived phenotype. Simulation studies and application to real-world data demonstrated the effectiveness of our method in integrating multiple phenotypes, thereby enhancing bias reduction, statistical accuracy and efficiency. Our method combines information across multiple surrogates using a statistically efficient seemingly unrelated regression framework. Our method provides a robust alternative to single-surrogate-based bias correction, especially in contexts lacking information on which surrogate is superior.
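
The idea of combining several error-prone surrogates with a small validated subset can be illustrated with a generic augmented estimator. The sketch below is not the authors' implementation and does not reproduce their seemingly unrelated regression estimator. It is a minimal simulation under assumed settings: the sample sizes, the true coefficients, the sensitivity and specificity values, and the helper names `fit_logistic` and `misclassify` are all hypothetical. The validation-only slope estimate is adjusted by the difference between the surrogate slopes fitted on the validation subset and on the full cohort, weighted by their estimated covariance.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression.
    Returns the coefficients and the per-row influence functions."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hess = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    # Row i is hess^{-1} times the score of observation i.
    infl = np.linalg.solve(hess, (X * (y - p)[:, None]).T).T
    return beta, infl

rng = np.random.default_rng(0)
N, n_val = 5000, 500                      # assumed cohort and validation sizes

# Simulated cohort: one exposure x and a true binary outcome y (assumed model).
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x)))).astype(float)

def misclassify(y, sens, spec):
    """Error-prone surrogate of y with the given sensitivity and specificity."""
    u = rng.random(y.size)
    return np.where(y == 1, u < sens, u > spec).astype(float)

# Two surrogates, neither uniformly better (assumed error rates).
s1 = misclassify(y, sens=0.95, spec=0.80)
s2 = misclassify(y, sens=0.75, spec=0.97)

# Validated outcomes are available only on a simple random subset.
val = rng.choice(N, size=n_val, replace=False)
Xv = X[val]

beta_v, if_b = fit_logistic(Xv, y[val])   # validation-only estimate
g1_v, if_1 = fit_logistic(Xv, s1[val])    # surrogate models, validation subset
g2_v, if_2 = fit_logistic(Xv, s2[val])
g1_f, _ = fit_logistic(X, s1)             # surrogate models, full cohort
g2_f, _ = fit_logistic(X, s2)

# Work with the exposure slope only (column 1).
ib = if_b[:, 1]
ig = np.column_stack([if_1[:, 1], if_2[:, 1]])
sigma_bb = ib @ ib / n_val                # variance of the slope influence
sigma_bg = ib @ ig / n_val                # covariance with surrogate slopes
sigma_gg = ig.T @ ig / n_val              # covariance among surrogate slopes

# Augmentation weights and the augmented estimate.
w = np.linalg.solve(sigma_gg, sigma_bg)
diff = np.array([g1_v[1] - g1_f[1], g2_v[1] - g2_f[1]])
beta_aug = beta_v[1] - w @ diff

# Estimated variances of the validation-only and augmented estimators.
var_val = sigma_bb / n_val
var_aug = var_val - (1.0 / n_val - 1.0 / N) * (sigma_bg @ w)

print("naive surrogate slopes:", g1_f[1], g2_f[1])
print("validation-only slope:", beta_v[1], "variance:", var_val)
print("augmented slope:", beta_aug, "variance:", var_aug)
```

In this sketch the augmented estimator stays anchored to the validated outcomes, so it does not inherit the misclassification bias of either surrogate, while its estimated variance cannot exceed that of the validation-only estimator. Using both surrogates in `ig` rather than one is what lets the adjustment borrow information without first deciding which surrogate is superior.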