COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms.

Published: 2024 Aug 05

Authors: Teemu J Rintala, Vittorio Fortino

Source: PLoS Computational Biology

Abstract:

Recent research on multi-view clustering algorithms for complex disease subtyping often overlooks aspects like clustering stability and critical assessment of prognostic relevance. Furthermore, current frameworks do not allow for a comparison between data-driven and pathway-driven clustering, highlighting a significant gap in the methodology. We present the COPS R-package, tailored for robust evaluation of single and multi-omics clustering results. COPS features advanced methods, including similarity networks, kernel-based approaches, dimensionality reduction, and pathway knowledge integration. Some of these methods are not accessible through R, and some correspond to new approaches proposed with COPS. Our framework was rigorously applied to multi-omics data across seven cancer types, including breast, prostate, and lung, utilizing mRNA, CNV, miRNA, and DNA methylation data. Unlike previous studies, our approach contrasts data- and knowledge-driven multi-view clustering methods and incorporates cross-fold validation for robustness. Clustering outcomes were assessed using the ARI score, survival analysis via Cox regression models including relevant covariates, and the stability of the results. While survival analysis and gold-standard agreement are standard metrics, they vary considerably across methods and datasets. Therefore, it is essential to assess multi-view clustering methods using multiple criteria, from cluster stability to prognostic relevance, and to provide ways of comparing these metrics simultaneously to select the optimal approach for disease subtype discovery in novel datasets. Emphasizing multi-objective evaluation, we applied the Pareto efficiency concept to gauge the equilibrium of evaluation metrics in each cancer case-study. Affinity Network Fusion, Integrative Non-negative Matrix Factorization, and Multiple Kernel K-Means with linear or Pathway Induced Kernels were the most stable and effective in discerning groups with significantly different survival outcomes in several case studies.

Copyright: © 2024 Rintala, Fortino. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
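
The Pareto efficiency concept mentioned in the abstract can be illustrated with a minimal sketch. This is not the COPS implementation (COPS is an R package, and its interface is not shown here). The function names, the method labels, and the scores below are all hypothetical, and every metric is assumed to be oriented so that higher is better. A method is kept if no other method is at least as good on every metric and strictly better on at least one.

```python
from typing import Dict, List


def dominates(a: List[float], b: List[float]) -> bool:
    """Return True if score vector a Pareto-dominates b.

    a dominates b when a is at least as good on every metric
    and strictly better on at least one (higher is better).
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: Dict[str, List[float]]) -> List[str]:
    """Return the names of all methods not dominated by any other method."""
    front = []
    for name, vec in scores.items():
        dominated = any(
            dominates(other_vec, vec)
            for other_name, other_vec in scores.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical, made-up scores for illustration only:
# [agreement with reference labels, stability, survival relevance]
example_scores = {
    "method_A": [0.60, 0.90, 2.0],
    "method_B": [0.55, 0.85, 1.5],  # worse than method_A on every metric
    "method_C": [0.40, 0.95, 3.0],  # trades agreement for stability and survival
}

print(pareto_front(example_scores))
```

In this toy input, method_B is dominated by method_A and is dropped, while method_A and method_C remain because each is better than the other on at least one metric. Choosing between the remaining methods is a judgment about which criteria matter most for the dataset at hand.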