Heterogeneous pseudobulk simulation enables realistic benchmarking of cell-type deconvolution methods.
Published: 2024 Jul 01
Authors:
Mengying Hu, Maria Chikina
Source:
GENOME BIOLOGY
Abstract:
Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding the tissue microenvironment, especially in tumor tissues. With the rapid development of deconvolution methods, many benchmarking studies have been published with the aim of evaluating these methods comprehensively. Benchmarking studies rely on cell-type-resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions. In our work, we show that the standard application of this approach, which uses randomly selected single cells regardless of the intrinsic differences between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity, with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers. Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package, available at https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516, enabling further developments in deconvolution methods. © 2024. The Author(s).
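The variance problem described above can be illustrated with a toy sketch. This is not the authors' heterogeneous simulation or the deconvBenchmarking API. It is a simplified stand-in under stated assumptions: one gene, two cell types, and five hypothetical patients whose cell-type means differ. The functions `make_cells`, `pseudobulk`, and `between_sample_sd` are hypothetical names for illustration. "Random" pseudobulk samples draw cells from the pooled cells of all patients, while the simplified "heterogeneous" variant draws every cell of a given type from a single randomly chosen patient. With proportions held fixed, the spread across pseudobulk samples is then compared.

```python
import random
import statistics


def make_cells(n_patients=5, cells_per_patient=200, seed=0):
    """Toy single-cell data (hypothetical): one gene, two cell types,
    with a patient-specific mean expression for each cell type."""
    rng = random.Random(seed)
    cells = {"T": {}, "tumor": {}}
    for cell_type, base in (("T", 5.0), ("tumor", 20.0)):
        for patient in range(n_patients):
            # Inter-patient shift in the cell-type mean (assumed effect size).
            patient_mean = rng.gauss(base, base * 0.5)
            cells[cell_type][patient] = [
                max(0.0, rng.gauss(patient_mean, 1.0))  # clip negatives to 0
                for _ in range(cells_per_patient)
            ]
    return cells


def pseudobulk(cells, proportions, n_cells, rng, heterogeneous):
    """Mean expression of one pseudobulk sample built from n_cells cells."""
    total = 0.0
    for cell_type, fraction in proportions.items():
        k = round(fraction * n_cells)
        if heterogeneous:
            # Simplified stand-in: all cells of this type come from one patient.
            patient = rng.choice(list(cells[cell_type]))
            pool = cells[cell_type][patient]
        else:
            # Standard approach: sample randomly from cells pooled over patients.
            pool = [x for per_patient in cells[cell_type].values() for x in per_patient]
        total += sum(rng.choice(pool) for _ in range(k))
    return total / n_cells


def between_sample_sd(heterogeneous, n_samples=200, n_cells=500, seed=1):
    """Standard deviation of pseudobulk values across samples with fixed proportions."""
    cells = make_cells()
    rng = random.Random(seed)
    proportions = {"T": 0.3, "tumor": 0.7}
    values = [
        pseudobulk(cells, proportions, n_cells, rng, heterogeneous)
        for _ in range(n_samples)
    ]
    return statistics.stdev(values)


if __name__ == "__main__":
    print("random cells  SD:", round(between_sample_sd(False), 3))
    print("heterogeneous SD:", round(between_sample_sd(True), 3))
```

Averaging hundreds of randomly pooled cells washes out patient-level differences, so the random-cell samples cluster tightly around one value. Keeping each cell type tied to a single patient retains that between-sample variation. The sketch only demonstrates the direction of the effect and makes no claim about the magnitudes reported in the paper.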