Research Updates

Sparse semiparametric canonical correlation analysis for data of mixed types.

Published: Sep 2020
Authors: Grace Yoon, Raymond J Carroll, Irina Gaynanova
Source: Biometrika

Abstract:

Canonical correlation analysis investigates linear relationships between two sets of variables, but often works poorly on modern datasets due to high dimensionality and mixed data types (continuous/binary/zero-inflated). We propose a new approach for sparse canonical correlation analysis of mixed data types that does not require explicit parametric assumptions. Our main contribution is the use of a truncated latent Gaussian copula to model data with excess zeros, which allows us to derive a rank-based estimator of the latent correlation matrix without estimating the marginal transformation functions. The resulting semiparametric sparse canonical correlation analysis method works well in high-dimensional settings, as demonstrated via numerical studies and an application to the analysis of association between gene expression and microRNA data of breast cancer patients.
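
To illustrate why a rank-based estimator can avoid estimating marginal transformations, here is a minimal sketch of the simplest case only: two continuous variables under a Gaussian copula, where the latent correlation r and Kendall's tau satisfy the standard identity r = sin(pi/2 * tau). It is not the paper's estimator for truncated (zero-inflated) or binary variables, whose bridge functions are not given in the abstract. The function names and the simulation setup below are assumptions made for illustration.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a for paired samples; O(n^2), assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def latent_correlation_continuous(x, y):
    """Invert the continuous-continuous bridge: r = sin(pi/2 * tau)."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


def simulate(rho, n, seed=0):
    """Hypothetical data: latent bivariate normal with correlation rho,
    observed only through unknown monotone transforms (exp and cube)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(math.exp(z1))  # monotone transform of z1
        ys.append(z2 ** 3)       # monotone transform of z2
    return xs, ys


if __name__ == "__main__":
    x, y = simulate(rho=0.6, n=400)
    print(latent_correlation_continuous(x, y))
```

Because Kendall's tau depends only on the ordering of the observations, the estimate is unchanged by the monotone transforms applied in `simulate`, so the marginals never need to be modeled in this case.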