Data-augmented machine learning scoring functions for virtual screening of YTHDF1 m6A reader protein.
Published: 2024 Oct 12
Authors:
Muhammad Junaid, Bo Wang, Wenjin Li
Source:
COMPUTERS IN BIOLOGY AND MEDICINE
Abstract:
Machine learning is rapidly advancing the drug discovery process, significantly enhancing its speed and efficiency. Innovation in computer-aided drug design is driven primarily by structure- and ligand-based approaches. When the number of known inhibitors for a target is limited, data augmentation strategies are often preferred to improve model performance. In this study, we developed predictive machine learning models for structure-based drug discovery, using multiple traditional machine learning algorithms trained on datasets that account for target and ligand dynamics. To illustrate our approach, we present a composite model that combines classification and regression to predict YTHDF1 inhibitors using PLEC features. YTHDF1, a key m6A reader protein involved in mRNA translation, is implicated in various cancers, making it a promising therapeutic target. Traditional structure-based virtual screening (SBVS) with generic scoring functions has struggled to identify potent YTHDF1 inhibitors because of the protein's unique binding characteristics. To overcome this, we developed YTHDF1-specific machine learning scoring functions (MLSFs) to enhance SBVS efficacy. We employed various data augmentation techniques to generate a comprehensive dataset incorporating multiple conformations of the ligands and of the YTHDF1 protein. We trained 64 YTHDF1-specific MLSFs using four machine learning algorithms and evaluated them on ten test sets, focusing on their predictive and ranking power. Our results demonstrate that the artificial neural network with protein-ligand extended connectivity fingerprints (ANN-PLEC) outperforms the other MLSFs, consistently achieving a high area under the precision-recall curve (PR-AUC) of 0.87. This method shows promise for targets with limited numbers of known active molecules, providing a viable path forward for drug discovery research.
The ANN-PLEC scoring function is freely available on GitHub for other researchers to access and use: https://github.com/JuniML/SBVS-YTHDF1/. Copyright © 2024. Published by Elsevier Ltd.
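For readers unfamiliar with the PR-AUC metric reported above, the following is a minimal sketch of one common way to estimate it, namely average precision over a ranked list of scored compounds. The abstract does not state which estimator the authors used, so this choice is an assumption. The function name `average_precision` and the toy labels and scores are hypothetical illustrations, not data or code from the study.

```python
from itertools import groupby


def average_precision(labels, scores):
    """Estimate the area under the precision-recall curve as average precision.

    labels: iterable of 0/1 values (1 = active compound, 0 = inactive/decoy).
    scores: iterable of predicted scores, higher meaning "more likely active".
    Compounds with identical scores are treated as one threshold step.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one active compound is required")

    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0   # actives recovered so far
    seen = 0       # compounds examined so far
    ap = 0.0
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        group = list(group)
        hits = sum(label for _, label in group)
        true_pos += hits
        seen += len(group)
        # Recall gained at this threshold, weighted by the precision there.
        ap += (hits / n_pos) * (true_pos / seen)
    return ap


# Hypothetical toy data: two actives among four scored compounds.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

Unlike ROC-AUC, the baseline of this metric is the fraction of actives in the test set, which is why it is often preferred for virtual screening sets where actives are rare.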