GLDADec: marker-gene guided LDA modeling for bulk gene expression deconvolution.
Published: 2024 May 23
Authors:
Iori Azuma, Tadahaya Mizuno, Hiroyuki Kusuhara
Source:
BRIEFINGS IN BIOINFORMATICS
Abstract:
Inferring cell type proportions from bulk transcriptome data is crucial in immunology and oncology. Here, we introduce guided LDA deconvolution (GLDADec), a bulk deconvolution method that uses cell type-specific marker gene names to guide topics and estimates the topic distribution of each sample. Through benchmarking on blood-derived datasets, we demonstrate its high estimation performance and robustness. Moreover, we apply GLDADec to bulk data from heterogeneous tissue and perform comprehensive cell type analysis in a data-driven manner. We show that GLDADec outperforms existing methods in estimation performance, and we evaluate its biological interpretability by examining the enrichment of biological processes in each topic. Finally, we apply GLDADec to tumor samples from The Cancer Genome Atlas, enabling subtype stratification and survival analysis based on the estimated cell type proportions and thereby demonstrating its practical utility in clinical settings. Because it uses marker gene names as partial prior information, this approach can be applied to a variety of bulk data deconvolution scenarios. GLDADec is available as an open-source Python package at https://github.com/mizuno-group/GLDADec. © The Author(s) 2024. Published by Oxford University Press.
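The idea of guiding topics with marker genes can be illustrated with a minimal sketch of seeded LDA fitted by collapsed Gibbs sampling. This is not the GLDADec package or its API, and the abstract does not specify the implementation. The function name `guided_lda_sketch`, its parameters (`seed_boost`, `alpha`, `beta`), and the toy data are all hypothetical. The sketch assumes integer counts, treats each sample as a document and each gene as a word, and gives every marker gene extra prior weight in the topic of its cell type.

```python
import numpy as np


def guided_lda_sketch(counts, marker_topics, n_topics, n_iter=50,
                      alpha=0.1, beta=0.01, seed_boost=5.0, rng_seed=0):
    """Seeded LDA via collapsed Gibbs sampling (illustrative only).

    counts        : (n_samples, n_genes) array of non-negative integer counts
    marker_topics : dict mapping gene index -> topic index (hypothetical format)
    Returns theta : (n_samples, n_topics) estimated topic proportions
    """
    counts = np.asarray(counts, dtype=int)
    rng = np.random.default_rng(rng_seed)
    n_samples, n_genes = counts.shape

    # Topic-gene prior: marker genes get extra pseudo-counts in their topic.
    eta = np.full((n_topics, n_genes), beta, dtype=float)
    for gene, topic in marker_topics.items():
        eta[topic, gene] += seed_boost
    eta_sum = eta.sum(axis=1)

    # Expand the count matrix into one (sample, gene) pair per token.
    s_idx, g_idx = np.nonzero(counts)
    reps = counts[s_idx, g_idx]
    sample_ids = np.repeat(s_idx, reps)
    gene_ids = np.repeat(g_idx, reps)

    # Random initial assignment, except marker tokens start in their topic.
    z = rng.integers(n_topics, size=gene_ids.size)
    for gene, topic in marker_topics.items():
        z[gene_ids == gene] = topic

    # Count tables used by the sampler.
    n_sk = np.zeros((n_samples, n_topics))
    n_kg = np.zeros((n_topics, n_genes))
    np.add.at(n_sk, (sample_ids, z), 1)
    np.add.at(n_kg, (z, gene_ids), 1)
    n_k = n_kg.sum(axis=1)

    for _ in range(n_iter):
        for i in range(gene_ids.size):
            s, g, k = sample_ids[i], gene_ids[i], z[i]
            # Remove the token from the counts, then resample its topic.
            n_sk[s, k] -= 1
            n_kg[k, g] -= 1
            n_k[k] -= 1
            p = (n_sk[s] + alpha) * (n_kg[:, g] + eta[:, g]) / (n_k + eta_sum)
            k = rng.choice(n_topics, p=p / p.sum())
            z[i] = k
            n_sk[s, k] += 1
            n_kg[k, g] += 1
            n_k[k] += 1

    theta = n_sk + alpha
    return theta / theta.sum(axis=1, keepdims=True)


# Toy data: genes 0-2 are expressed by cell type 0, genes 3-5 by cell type 1.
counts = np.array([
    [30, 30, 10, 0, 0, 0],     # sample made only of cell type 0
    [0, 0, 0, 30, 30, 10],     # sample made only of cell type 1
    [15, 15, 5, 15, 15, 5],    # mixed sample
])
marker_topics = {0: 0, 1: 0, 3: 1, 4: 1}  # genes 2 and 5 are left unguided
theta = guided_lda_sketch(counts, marker_topics, n_topics=2)
print(theta.round(2))
```

Each row of `theta` plays the role of the per-sample topic distribution that the abstract interprets as cell type proportions. Only marker gene names are supplied as prior information, so unguided genes such as 2 and 5 are assigned to topics by the sampler.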