Estimation and Inference for High-Dimensional Generalized Linear Models with Knowledge Transfer.

Publication date: 2024

Authors: Sai Li, Linjun Zhang, T. Tony Cai, Hongzhe Li

Source: Disease Models & Mechanisms

Abstract:

Transfer learning provides a powerful tool for incorporating data from related studies into a target study of interest. In epidemiology and medical studies, the classification of a target disease could borrow information from other related diseases and populations. In this work, we consider transfer learning for high-dimensional generalized linear models (GLMs). A novel algorithm, TransHDGLM, that integrates data from the target study and the source studies is proposed. The minimax rate of convergence for estimation is established, and the proposed estimator is shown to be rate-optimal. Statistical inference for the target regression coefficients is also studied. Asymptotic normality of a debiased estimator is established, which can be used to construct coordinate-wise confidence intervals for the regression coefficients. Numerical studies show significant improvement in estimation and inference accuracy over GLMs that use only the target data. The proposed methods are applied to a real data study concerning the classification of colorectal cancer using gut microbiomes, and are shown to enhance classification accuracy in comparison to methods that use only the target data.
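
To illustrate how asymptotic normality typically yields a coordinate-wise confidence interval, the following is a generic Wald-type sketch and not the paper's exact construction. The symbols are assumptions introduced here: $\hat\beta_j^{\mathrm{db}}$ denotes a debiased estimate of the $j$-th target coefficient $\beta_j$, $\widehat{\mathrm{se}}_j$ a consistent estimate of its standard error, and $z_{1-\alpha/2}$ the standard normal quantile.

```latex
% Assumed notation (not from the abstract): \hat\beta_j^{db}, \widehat{se}_j, z_{1-\alpha/2}
\[
\frac{\hat\beta_j^{\mathrm{db}} - \beta_j}{\widehat{\mathrm{se}}_j}
\;\xrightarrow{d}\; N(0,1)
\quad\Longrightarrow\quad
\mathrm{CI}_j(\alpha) =
\Bigl[\,\hat\beta_j^{\mathrm{db}} - z_{1-\alpha/2}\,\widehat{\mathrm{se}}_j,\;
        \hat\beta_j^{\mathrm{db}} + z_{1-\alpha/2}\,\widehat{\mathrm{se}}_j\,\Bigr]
\]
```

Under this sketch, $\mathrm{CI}_j(\alpha)$ covers $\beta_j$ with probability approaching $1-\alpha$; for example, $\alpha = 0.05$ gives $z_{0.975} \approx 1.96$.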