A structured iterative division approach for non-sparse regression models and applications in biological data analysis
Published: 2024 May 23
Authors:
Shun Yu, Yuehan Yang
Source:
Alzheimer's & Dementia
Abstract:
In this paper, we focus on estimating models for data with non-sparse structure, with particular attention to biological data that exhibit highly correlated features. Non-sparse estimation is a challenge in many fields, including biology and finance. We address this problem with a proposed method called structured iterative division, which divides the data into non-sparse and sparse structures and eliminates a large number of irrelevant variables, substantially reducing estimation error while remaining computationally efficient. Numerical and theoretical results demonstrate the competitive advantage of the proposed method across a wide range of problems, and in numerical comparisons with several existing methods it shows excellent statistical performance. We apply the proposed algorithm to two biological problems: gene microarray datasets for the prognostic risk of distant metastasis in breast cancer, and chimeric protein datasets for Alzheimer's disease. Structured iterative division provides insights into gene identification and selection, and it also yields meaningful results in predicting cancer risk and identifying key factors.
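The abstract does not spell out the structured iterative division algorithm itself. To make the general idea of splitting coefficients into a sparse part and a non-sparse part concrete, here is a minimal Python/NumPy sketch of a different, generic technique. It models the coefficients as a sparse-plus-dense decomposition beta = theta + delta, with an l1 (lasso) penalty on theta and a squared-l2 (ridge) penalty on delta. The decomposition is fitted by alternating an exact ridge step with a coordinate-descent lasso step, which is similar in spirit to the "lava" estimator. The function names `sparse_plus_dense` and `lasso_cd`, the penalty weights, and the simulated data are hypothetical illustrations. They are not the authors' code, method, or datasets.

```python
import numpy as np

# Hypothetical illustration only: a generic sparse-plus-dense ("lasso + ridge")
# decomposition, NOT the structured iterative division algorithm from the paper.


def lasso_cd(X, r, lam, theta, n_sweeps=10):
    """Coordinate descent for (1/2n)||r - X theta||^2 + lam * ||theta||_1.

    `theta` is used as a warm start and is updated in place.
    """
    n, p = X.shape
    Xt = np.ascontiguousarray(X.T)           # row j of Xt is column j of X
    col_sq = (Xt * Xt).sum(axis=1) / n       # (1/n)||X_j||^2 for every column
    resid = r - X @ theta                    # residual of the current sparse fit
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of X_j with the partial residual (theta_j added back).
            rho = Xt[j] @ resid / n + col_sq[j] * theta[j]
            # Soft-thresholding gives the exact one-dimensional minimizer.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != theta[j]:
                resid += Xt[j] * (theta[j] - new)  # keep residual in sync
                theta[j] = new
    return theta


def sparse_plus_dense(X, y, lam_sparse, lam_dense, n_iter=1000, tol=1e-6):
    """Alternately minimize, over beta = theta + delta,
    (1/2n)||y - X(theta + delta)||^2 + lam_sparse*||theta||_1 + lam_dense*||delta||_2^2.
    """
    n, p = X.shape
    theta = np.zeros(p)  # sparse component (l1 penalty)
    delta = np.zeros(p)  # dense component (squared-l2 penalty)
    # Ridge normal-equation matrix; it does not change between iterations.
    A = X.T @ X / n + 2.0 * lam_dense * np.eye(p)
    for _ in range(n_iter):
        theta_old, delta_old = theta.copy(), delta.copy()
        # Dense step: exact ridge fit to what the sparse part leaves unexplained.
        delta = np.linalg.solve(A, X.T @ (y - X @ theta) / n)
        # Sparse step: lasso fit to what the dense part leaves unexplained.
        theta = lasso_cd(X, y - X @ delta, lam_sparse, theta)
        change = max(np.abs(theta - theta_old).max(), np.abs(delta - delta_old).max())
        if change < tol:
            break
    return theta, delta


# Usage on simulated data (hypothetical; not the paper's datasets).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = 0.1 * rng.standard_normal(p)   # many small "dense" effects
beta[:3] += [3.0, -2.0, 2.0]          # plus a few large "sparse" effects
y = X @ beta + 0.5 * rng.standard_normal(n)
theta, delta = sparse_plus_dense(X, y, lam_sparse=0.1, lam_dense=0.05)
print("sparse support:", np.flatnonzero(theta))
print("max |beta_hat - beta|:", np.abs(theta + delta - beta).max())
```

Each step is a standard convex subproblem, so the alternation is block coordinate descent on a jointly convex objective. One consequence worth knowing is that, at a joint optimum, the optimality conditions force delta_j = sign(theta_j) * lam_sparse / (2 * lam_dense) wherever theta_j is nonzero. The ratio of the two penalties therefore decides how much of a large effect is absorbed by the dense component.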