Coherent pathway enrichment estimation by modeling inter-pathway dependencies using regularized regression.
Published: 2023 Aug 23
Authors:
Kim Philipp Jablonski, Niko Beerenwinkel
Source:
BIOINFORMATICS
Abstract:
Gene set enrichment methods are a common tool to improve the interpretability of gene lists as obtained, for example, from differential gene expression analyses. They are based on computing whether dysregulated genes are located in certain biological pathways more often than expected by chance. Gene set enrichment tools rely on pre-existing pathway databases such as KEGG, Reactome, or the Gene Ontology. These databases are increasing in size and in the number of redundancies between pathways, which complicates the statistical enrichment computation.

We address this problem and develop a novel gene set enrichment method, called pareg, which is based on a regularized generalized linear model and directly incorporates dependencies between gene sets related to certain biological functions, for example, due to shared genes, in the enrichment computation. We show that pareg is more robust to noise than competing methods. Additionally, we demonstrate the ability of our method to recover known pathways as well as to suggest novel treatment targets in an exploratory analysis using breast cancer samples from TCGA.

pareg is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/pareg.html) as well as on https://github.com/cbg-ethz/pareg. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here.

Supplementary data are available at Bioinformatics online.

© The Author(s) 2023. Published by Oxford University Press.
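The abstract contrasts two ideas: testing each pathway on its own for "more dysregulated genes than expected by chance", and fitting one regularized regression in which pathways that share genes inform each other. The toy Python sketch below illustrates both under loudly stated assumptions. It is not pareg's implementation or model (pareg is an R package; see the paper for its actual formulation). The hypergeometric over-representation test is one common per-pathway choice. The gene scores, the Jaccard similarity as the measure of "shared genes", the quadratic network penalty, and the function names `hypergeom_pvalue`, `jaccard`, `solve` and `network_ridge` are all hypothetical choices made for illustration.

```python
# Toy Python sketch, NOT the pareg implementation (pareg is an R package).
# Gene scores, the Jaccard similarity, the quadratic network penalty and all
# function names below are illustrative assumptions.
from math import comb


def hypergeom_pvalue(n_total, n_hits, set_size, overlap):
    """One-sided over-representation test: P(X >= overlap) for X hypergeometric,
    i.e. drawing set_size genes from n_total genes of which n_hits are dysregulated."""
    upper = min(set_size, n_hits)
    hits = sum(
        comb(n_hits, k) * comb(n_total - n_hits, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(n_total, set_size)


def jaccard(a, b):
    """Shared-gene similarity between two gene sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def network_ridge(genes, pathways, scores, lam_ridge=1.0, lam_net=1.0):
    """Minimise ||y - X beta||^2 + lam_ridge * ||beta||^2
    + lam_net * sum_{i<j} S_ij * (beta_i - beta_j)^2,
    where X is the gene-by-pathway membership matrix, y the gene scores and
    S the pairwise Jaccard similarity. Closed-form solution:
    (X^T X + lam_ridge * I + lam_net * L) beta = X^T y, with Laplacian L = D - S."""
    names = list(pathways)
    p = len(names)
    x = [[1.0 if g in pathways[nm] else 0.0 for nm in names] for g in genes]
    y = [scores[g] for g in genes]
    s = [[jaccard(pathways[u], pathways[v]) if u != v else 0.0 for v in names]
         for u in names]
    # Normal-equation matrix X^T X, then add both penalty terms.
    a = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    for i in range(p):
        a[i][i] += lam_ridge + lam_net * sum(s[i])  # ridge + Laplacian diagonal D
        for j in range(p):
            a[i][j] -= lam_net * s[i][j]  # Laplacian off-diagonal -S (S_ii = 0)
    rhs = [sum(row[j] * yj for row, yj in zip(x, y)) for j in range(p)]
    return dict(zip(names, solve(a, rhs)))


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(1, 9)]
    pathways = {"A": {"g1", "g2", "g3", "g4"},
                "B": {"g3", "g4", "g5", "g6"},
                "C": {"g7", "g8"}}
    # Hypothetical dysregulation scores (e.g. -log10 p-values); g1..g4 are the hits.
    scores = {g: (3.0 if g in pathways["A"] else 0.2) for g in genes}
    print(hypergeom_pvalue(8, 4, 4, 4))  # pathway A: exactly 1/70
    print(hypergeom_pvalue(8, 4, 4, 2))  # pathway B: exactly 53/70
    print(network_ridge(genes, pathways, scores, lam_net=0.0))
    print(network_ridge(genes, pathways, scores, lam_net=1.0))
```

In this toy example pathways A and B share genes g3 and g4, so the similarity penalty pulls their coefficients toward each other. Setting `lam_net=0.0` (a plain ridge fit) widens the gap between A and B, while the isolated pathway C is unaffected. The quadratic penalty was chosen here only because it keeps the fit in closed form; it is not a claim about which penalty pareg uses.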