Weighted Centroid Trees: A general approach to summarize phylogenies in single-labeled tumor mutation tree inference.
Publication date: 2024 Jul 10
Authors:
Hamed Vasei, Mohammad-Hadi Foroughmand-Araabi, Amir Daneshgar
Source:
BIOINFORMATICS
Abstract:
Tumor trees, which depict the evolutionary process of cancer, provide a backbone for discovering recurring evolutionary processes in cancer. While they are not the primary information extracted from genomic data, they are valuable for this purpose. One such extraction method involves summarizing multiple trees into a single representative tree, such as a consensus tree or a supertree. We define the weighted centroid tree problem to find the centroid tree of a set of single-labeled rooted trees through the following steps: 1) mapping the given trees into Euclidean space, 2) computing the weighted centroid matrix of the mapped trees, and 3) finding the nearest mapped tree (NMTP) to the centroid matrix. We show that this setup encompasses previously studied parent-child and ancestor-descendant metrics as well as the GraPhyC and TuELiP consensus tree algorithms. Moreover, we show that, while the NMTP problem is polynomial-time solvable for the adjacency embedding, it is NP-hard for ancestry and distance mappings. We introduce integer linear programs for NMTP in different setups, and we also provide a new algorithm for the case of ancestry embedding, called 2-AncL2, that uses a novel weighting scheme for ancestry signals. Our experimental results show that 2-AncL2 outperforms available consensus tree algorithms. We also illustrate our setup's application in providing representative trees for a large real breast cancer dataset, deducing that the cluster centroid trees summarize reliable evolutionary information about the original dataset. https://github.com/vasei/WAncILP. Supplementary materials are available at Bioinformatics online. © The Author(s) 2024. Published by Oxford University Press.
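
The three steps named in the abstract (embed, take a weighted centroid, find the nearest mapped tree) can be illustrated with a minimal Python sketch. This is not the authors' implementation and not 2-AncL2. The parent-array tree encoding, the function names, the example trees, and the weights are all assumptions made for illustration. Step 3 here searches only a given list of candidate trees by brute force, a deliberate simplification: the abstract states that the general problem is NP-hard for the ancestry mapping and is handled with integer linear programs.

```python
import numpy as np

def ancestry_matrix(parent):
    """Ancestry embedding (assumed encoding): parent[v] is the parent of
    node v, or -1 for the root. A[u, v] = 1 if u is a proper ancestor of v."""
    n = len(parent)
    A = np.zeros((n, n))
    for v in range(n):
        u = parent[v]
        while u != -1:          # walk up to the root, marking each ancestor
            A[u, v] = 1.0
            u = parent[u]
    return A

def adjacency_matrix(parent):
    """Adjacency (parent-child) embedding: A[u, v] = 1 if u is the parent of v."""
    n = len(parent)
    A = np.zeros((n, n))
    for v, u in enumerate(parent):
        if u != -1:
            A[u, v] = 1.0
    return A

def weighted_centroid(matrices, weights):
    """Weighted mean of the embedded trees (weights are normalized to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), np.stack(matrices), axes=1)

def nearest_candidate(candidates, centroid, embed):
    """Index of the candidate tree whose embedding is closest to the centroid
    in Euclidean (Frobenius) distance. Brute force over the given candidates
    only; this is NOT a search over all trees."""
    dists = [np.linalg.norm(embed(p) - centroid) for p in candidates]
    return int(np.argmin(dists))

# Hypothetical input: three trees over mutations 0..3, each rooted at 0.
trees = [
    [-1, 0, 1, 1],   # 0 -> 1 -> {2, 3}
    [-1, 0, 1, 2],   # chain 0 -> 1 -> 2 -> 3
    [-1, 0, 0, 1],   # 0 -> {1, 2}, 1 -> 3
]
weights = [2.0, 1.0, 1.0]   # illustrative weights, not from the paper

centroid = weighted_centroid([ancestry_matrix(t) for t in trees], weights)
best = nearest_candidate(trees, centroid, ancestry_matrix)
print(best, trees[best])
```

Passing `adjacency_matrix` instead of `ancestry_matrix` switches the sketch to the parent-child embedding. Restricting the search to the input trees keeps the example short, but it can miss a tree outside the input set that lies closer to the centroid.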