Deep5hmC: Predicting genome-wide 5-Hydroxymethylcytosine landscape via a multimodal deep learning model.
Publication date: 2024 Aug 28
Authors:
Xin Ma, Sai Ritesh Thela, Fengdi Zhao, Bing Yao, Zhexing Wen, Peng Jin, Jinying Zhao, Li Chen
Source:
Bioinformatics
Abstract:
5-hydroxymethylcytosine (5hmC) is a crucial epigenetic mark that plays a significant role in regulating tissue-specific gene expression and is essential for understanding the dynamic functions of the human genome. Despite its importance, predicting 5hmC modification across the genome remains challenging, especially given the complex interplay between DNA sequence and epigenetic factors such as histone modifications and chromatin accessibility. Using tissue-specific 5hmC sequencing data, we introduce Deep5hmC, a multimodal deep learning framework that integrates the DNA sequence with epigenetic features, such as histone modification and chromatin accessibility, to predict genome-wide 5hmC modification. The multimodal design of Deep5hmC yields a marked improvement in predicting both qualitative and quantitative 5hmC modification compared to unimodal versions of Deep5hmC and state-of-the-art machine learning methods. This improvement is demonstrated through benchmarking on a comprehensive set of 5hmC sequencing data collected at four developmental stages during forebrain organoid development and across 17 human tissues. Compared to DeepSEA and random forest, Deep5hmC improves AUROC by close to 4% and 17%, respectively, across the four forebrain developmental stages, and by 6% and 27% across the 17 human tissues, when predicting binary 5hmC modification sites; it improves the Spearman correlation coefficient by 8% and 22% across the four forebrain developmental stages, and by 17% and 30% across the 17 human tissues, when predicting continuous 5hmC modification. Notably, Deep5hmC demonstrates its practical utility by accurately predicting gene expression and identifying differentially hydroxymethylated regions in a case-control study of Alzheimer's disease.
Deep5hmC significantly improves our understanding of tissue-specific gene regulation and facilitates the development of new biomarkers for complex diseases. Deep5hmC is available via https://github.com/lichen-lab/Deep5hmC. Supplementary data are available at Bioinformatics online. © The Author(s) 2024. Published by Oxford University Press.
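For readers unfamiliar with the two benchmark metrics named in the abstract, the following is a minimal, dependency-free sketch of how AUROC (for binary 5hmC site prediction) and the Spearman correlation coefficient (for continuous 5hmC prediction) can be computed. It is not taken from the Deep5hmC repository; the function names (`average_ranks`, `auroc`, `spearman`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic (labels are 0 or 1)."""
    ranks = average_ranks(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (made up): 1 = region carries a 5hmC peak, 0 = it does not.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
print("AUROC:", auroc(toy_labels, toy_scores))

# Toy data (made up): observed vs. predicted continuous 5hmC signal.
toy_observed = [1.0, 2.5, 3.0, 4.2]
toy_predicted = [0.8, 2.0, 3.5, 3.1]
print("Spearman:", spearman(toy_observed, toy_predicted))
```

AUROC depends only on how the positive and negative regions are ranked relative to one another, and Spearman correlation likewise depends only on ranks, so both metrics are insensitive to the scale of a model's raw outputs. In practice, library routines such as scikit-learn's `roc_auc_score` and SciPy's `spearmanr` would normally be used instead of hand-written versions.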