PredLLPS_PSSM: a novel predictor for liquid-liquid protein separation identification based on evolutionary information and a deep neural network.
Publication date: 2023 Aug 22
Authors:
Shengming Zhou, Yetong Zhou, Tian Liu, Jia Zheng, Cangzhi Jia
Source:
BRIEFINGS IN BIOINFORMATICS
Abstract:
The formation of biomolecular condensates through liquid-liquid phase separation (LLPS) has emerged as a universal mechanism for the spatiotemporal coordination of biological activities in cells, and it has been widely observed to directly regulate key cellular processes involved in cancer cell pathology. However, the complexity of protein sequences and the diversity of their inherently disordered conformations pose great challenges for both computational and experimental research on LLPS proteins. Herein, we propose a novel predictor named PredLLPS_PSSM that identifies LLPS proteins based only on sequence evolutionary information. Because real and reliable samples are the cornerstone of building a predictor, we re-collected and collated LLPS proteins from the latest versions of three databases. After comparing the performance of the position-specific scoring matrix (PSSM) with that of word embedding, PredLLPS_PSSM combines PSSM-based information with two deep learning frameworks. Independent tests on three existing independent test datasets and two newly constructed ones demonstrated the superiority of PredLLPS_PSSM over state-of-the-art methods. Furthermore, we tested PredLLPS_PSSM on nine experimentally identified LLPS proteins from three insects that were not included in any of the three databases. In addition, the SHapley Additive exPlanations (SHAP) algorithm and heatmaps were applied to find the amino acids most critical to LLPS. © The Author(s) 2023. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.