iACP-DFSRA: Identification of Anticancer Peptides Based on a Dual-channel Fusion Strategy of ResCNN and Attention.
Publication date: 2024 Oct 01
Authors:
Xin Wang, Zimeng Zhang, Chang Liu
Source:
JOURNAL OF MOLECULAR BIOLOGY
Abstract:
Anticancer peptides (ACPs) have been widely applied in the treatment of cancer owing to their good safety, acceptable side effects, and high selectivity. However, the number of experimentally validated ACPs is limited because identifying ACPs experimentally is extremely expensive. Hence, accurate and cost-effective identification methods for ACPs are urgently needed. In this work, we propose a deep learning-based model, named iACP-DFSRA, for ACP identification. Specifically, we adopted two kinds of sequence encoding, the ProtBert_BFD pre-trained language model and handcrafted features, to encode protein sequences. LightGBM was then used for feature selection, and the selected features were input into a ResCNN and an Attention mechanism, respectively, to extract local and global features. Finally, the concatenated features were deeply fused using the Attention mechanism so that the model pays more attention to key features, and predictions were made by a fully connected layer. The results of 10-fold cross-validation demonstrated that, compared to the latest AACFlow model, the iACP-DFSRA model delivered improved results in most metrics, with a specificity (Sp) of 94.15%, a sensitivity (Sn) of 95.32%, an accuracy (Acc) of 94.74%, and a Matthews correlation coefficient (MCC) of 89.48%. Indeed, the iACP-DFSRA model is the only model with Acc > 90% and MCC > 80% on the independent test dataset. We further demonstrated the superiority of our model on additional datasets. In addition, t-SNE and SHAP interpretation analysis demonstrated that it is crucial to use two channels for feature extraction and to use the Attention mechanism for deep fusion, which helps iACP-DFSRA predict ACPs more effectively. Copyright © 2024 Elsevier Ltd. All rights reserved.
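The abstract reports Sp, Sn, Acc, and MCC without defining them. The sketch below assumes the conventional confusion-matrix definitions, which the abstract itself does not confirm. The function name `confusion_metrics` and the counts are hypothetical: the counts are not taken from the paper and were chosen only because, under these definitions, they land on the reported percentages.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc and MCC from binary confusion-matrix counts.

    Assumes the conventional definitions (not confirmed by the abstract)
    and that both classes are present, so tp + fn > 0 and tn + fp > 0.
    """
    sn = tp / (tp + fn)                    # sensitivity: recall on ACPs
    sp = tn / (tn + fp)                    # specificity: recall on non-ACPs
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty;
    # returning 0.0 in that case is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only (not the paper's confusion matrix).
counts = dict(tp=163, fn=8, tn=161, fp=10)
metrics = confusion_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when the classes are imbalanced, which is one reason it is commonly reported alongside Acc in peptide classification work.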