CAIM: coverage-based analysis for identification of microbiome.
Publication date: 2024 Jul 25
Authors:
Daniel A Acheampong, Piroon Jenjaroenpun, Thidathip Wongsurawat, Alongkorn Kurilung, Yotsawat Pomyen, Sangam Kandel, Pattapon Kunadirek, Natthaya Chuaypen, Kanthida Kusonmano, Intawat Nookaew
Source:
BRIEFINGS IN BIOINFORMATICS
Abstract:
Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital for gaining insight into microbial ecology. Recent advances in sequencing technologies have contributed tremendously to understanding these microbes at species resolution through whole-metagenome shotgun sequencing. In this study, we developed a new bioinformatics tool, coverage-based analysis for identification of microbiome (CAIM), for accurate taxonomic classification and quantification in both long- and short-read metagenomic samples using an alignment-based method. CAIM relies on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional relative-abundance approach. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. Compared with other tools, CAIM maintained consistently good performance across datasets in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina platforms, and the taxonomic profiles were highly similar between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries, and of 44 primary liver cancer patients and 76 controls. Models using the genome-coverage cutoff had better predictive performance than those using relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls, with high-confidence species markers. © The Author(s) 2024. Published by Oxford University Press.
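The two ideas in the abstract, filtering species by genome coverage and estimating abundance from nucleotide counts rather than read counts, can be illustrated with a minimal sketch. This is not CAIM's implementation. The `Alignment` record, the `breadth_of_coverage` and `profile` functions, the `min_breadth` threshold of 0.1, and the toy data are all hypothetical, and the sketch assumes that "genome coverage" means the fraction of reference positions covered and that "nucleotide count" means the sum of aligned bases per species.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    species: str
    start: int  # 0-based start on the reference genome
    end: int    # exclusive end on the reference genome


def breadth_of_coverage(alignments, genome_length):
    """Fraction of reference positions covered by at least one alignment."""
    covered = 0
    current_end = 0
    for start, end in sorted((a.start, a.end) for a in alignments):
        start = max(start, current_end)  # skip positions already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


def normalize(counts):
    """Scale counts so they sum to 1 (empty dict if there is nothing to scale)."""
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


def profile(alignments, genome_lengths, min_breadth=0.1):
    """Return (read-count abundance, nucleotide-count abundance) per species.

    min_breadth is an illustrative cutoff, not a value from the paper.
    """
    by_species = {}
    for a in alignments:
        by_species.setdefault(a.species, []).append(a)

    # Keep only species whose genome is covered broadly enough.
    kept = {
        s: alns for s, alns in by_species.items()
        if breadth_of_coverage(alns, genome_lengths[s]) >= min_breadth
    }

    read_counts = {s: len(alns) for s, alns in kept.items()}
    nt_counts = {s: sum(a.end - a.start for a in alns) for s, alns in kept.items()}
    return normalize(read_counts), normalize(nt_counts)


# Toy data: A has two long reads, B has four short reads, C has one stray hit.
alignments = [
    Alignment("A", 0, 300), Alignment("A", 200, 500),
    Alignment("B", 0, 100), Alignment("B", 100, 200),
    Alignment("B", 200, 300), Alignment("B", 300, 400),
    Alignment("C", 0, 50),
]
genome_lengths = {"A": 1000, "B": 1000, "C": 10000}

by_reads, by_nt = profile(alignments, genome_lengths)
print(by_reads)  # A is about 0.33, B about 0.67
print(by_nt)     # A is 0.6, B is 0.4
```

In this toy case species C is dropped because its single hit covers only 0.5% of its genome, and the two abundance estimates disagree because A's reads are longer than B's. Read length varies widely in long-read data, which is the situation where counting nucleotides instead of reads would be expected to matter most.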