Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

mosGraphGPT: a foundation model for multi-omic signaling graphs using generative AI.

Published: 2024 Aug 06
Authors: Heming Zhang, Di Huang, Emily Chen, Dekang Cao, Tim Xu, Ben Dizdar, Guangfu Li, Yixin Chen, Philip Payne, Michael Province, Fuhai Li
Source: Alzheimer's & Dementia

Abstract:

Generative pretrained models represent a significant advance in natural language processing and computer vision: pretrained on large general-purpose datasets, they generate coherent, contextually relevant content and can be fine-tuned for specific tasks. Building foundation models on large-scale omic data is a promising way to decode and understand the complex signaling language patterns within cells. Unlike existing foundation models of omic data, we build a foundation model, mosGraphGPT, for multi-omic signaling (mos) graphs, in which multi-omic data are integrated and interpreted using a multi-level signaling graph. The model was pretrained on multi-omic data of cancers from The Cancer Genome Atlas (TCGA) and fine-tuned on multi-omic data of Alzheimer's disease (AD). The experimental evaluation showed that the model not only improves disease classification accuracy but is also interpretable, uncovering disease targets and signaling interactions. The model code is available on GitHub (https://github.com/mosGraph/mosGraphGPT).
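
To make the idea of a multi-level signaling graph more concrete, here is a minimal sketch of one way such a graph could be laid out. It is not taken from the mosGraphGPT repository: the omic levels in `OMIC_LEVELS`, the function `build_mos_graph`, the node naming scheme, and the placeholder genes `GENE_A` and `GENE_B` are all assumptions made for illustration. Each gene gets one node per omic level, chained within the gene, and signaling edges connect protein-level nodes across genes.

```python
from collections import defaultdict

# Hypothetical omic levels per gene; the abstract does not list them.
OMIC_LEVELS = ["methylation", "mutation", "transcript", "protein"]


def build_mos_graph(genes, signaling_edges):
    """Build a toy multi-level graph as a directed adjacency dict.

    Each gene gets one node per omic level, chained in OMIC_LEVELS order
    (intra-gene edges). Signaling edges connect the protein-level nodes
    of two genes (inter-gene edges).
    """
    adj = defaultdict(set)
    for gene in genes:
        nodes = [f"{gene}:{level}" for level in OMIC_LEVELS]
        for src, dst in zip(nodes, nodes[1:]):
            adj[src].add(dst)
        adj[nodes[-1]]  # register the protein node even if it has no out-edges
    for src_gene, dst_gene in signaling_edges:
        adj[f"{src_gene}:protein"].add(f"{dst_gene}:protein")
    return {node: sorted(neighbors) for node, neighbors in adj.items()}


# Placeholder genes and a placeholder signaling edge, not real biology.
graph = build_mos_graph(["GENE_A", "GENE_B"], [("GENE_B", "GENE_A")])
for node, neighbors in graph.items():
    print(node, "->", neighbors)
```

In a layout like this, per-sample omic measurements would be attached to the nodes as features, and a graph model could then be pretrained on one cohort and fine-tuned on another, as the abstract describes for TCGA and AD.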