Investigating the human and non-obese diabetic mouse MHC class II immunopeptidome using protein language modelling.
Published: 2023 Aug 01
Authors:
Philip Hartout, Bojana Počuča, Celia Méndez-García, Christian Schleberger
Source:
BIOINFORMATICS
Abstract:
Identifying peptides associated with the major histocompatibility complex class II (MHCII) is a central task in the evaluation of the immunoregulatory function of therapeutics and drug prototypes. MHCII-peptide presentation prediction has multiple biopharmaceutical applications, including the in silico safety assessment of biologics and engineered derivatives, or the fast progression of antigen-specific immunomodulatory drug discovery programs in immune disease and cancer. This has resulted in the collection of large-scale datasets on adaptive immune receptor antigenic responses and MHC-associated peptide proteomics. In parallel, recent deep learning advances in protein language modelling (PLM) have shown potential for leveraging large collections of sequence data and improving MHC presentation prediction.

Here, we train a compact transformer model (AEGIS) on human and mouse MHCII immunopeptidome data, including a preclinical murine model, and evaluate its performance on the peptide presentation prediction task. We show that the transformer performs on par with existing deep learning algorithms and that combining datasets from multiple organisms increases model performance. We trained variants of the model with and without MHCII information. In both variants, including peptides presented by the I-Ag7 MHC class II molecule expressed by non-obese diabetic (NOD) mice enabled, for the first time, accurate in silico prediction of presented peptides in a preclinical type 1 diabetes model organism, which has promising therapeutic applications.

The source code is available at https://github.com/Novartis/AEGIS. Supplementary data are available at Bioinformatics online. © The Author(s) 2023. Published by Oxford University Press.