VirRep: a hybrid language representation learning framework for identifying viruses from human gut metagenomes.
Publication date: 2024 Jul 04
Authors:
Yanqi Dong, Wei-Hua Chen, Xing-Ming Zhao
Source:
GENOME BIOLOGY
Abstract:
Identifying viruses from metagenomes is a common step to explore the virus composition in the human gut. Here, we introduce VirRep, a hybrid language representation learning framework, for identifying viruses from human gut metagenomes. VirRep combines a context-aware encoder and an evolution-aware encoder to improve sequence representation by incorporating k-mer patterns and sequence homologies. Benchmarking on both simulated and real datasets with varying viral proportions demonstrates that VirRep outperforms state-of-the-art methods. When applied to fecal metagenomes from a colorectal cancer cohort, VirRep identifies 39 high-quality viral species associated with the disease, many of which cannot be detected by existing methods. © 2024. The Author(s).