Using AI-predicted protein structures as a reference to predict loss-of-function activity in tumor suppressor breast cancer genes.
Published: 2024 Dec
Authors:
Rohan Gnanaolivu, Steven N Hart
Source:
Computational and Structural Biotechnology Journal
Abstract:
Most missense variants in the tumor suppressor breast cancer genes BRCA1, BRCA2, PALB2, and RAD51C remain unclassified with respect to loss of function (LOF), which confounds clinical actionability. Classifying these variants is challenging because they are rare, so clinicians rely on in silico predictive methods. Changes in protein stability are associated with function, which makes stability predictors valuable. Predicting the stability effect of a missense variant requires a high-resolution protein structure, but such structures are often unavailable. This study explores using generative AI to predict high-resolution protein structures, which can then be analyzed with in silico protein stability prediction methods to assess LOF activity in ordered regions of the protein. It also determines which in silico protein stability methods and dedicated in silico missense prediction methods in the dbNSFP v4.7 database are appropriate for predicting LOF activity in ordered regions of these four genes. Functional classifications from homology-directed DNA repair (HDR) assays and variant classifications from the ClinVar database provide a reliable dataset for evaluating the performance of these in silico prediction methods. Complex AlphaFold2 structures of the BRCA1 C-terminal (BRCT) domain and the DNA-binding (DB) domain of BRCA2, analyzed with the protein stability tool FoldX, predict LOF activity from missense variants significantly better than experimentally derived structures in ordered regions. With AlphaFold2 structures, the BRCT domain achieved an area under the curve (AUC) = 0.861 (95% CI: 0.858-0.863) and AUC = 0.842 (95% CI: 0.840-0.845), and the DB domain achieved AUC = 0.836 (95% CI: 0.8322-0.841); with experimentally derived structures, the BRCT domain achieved AUC = 0.847 (95% CI: 0.844-0.850) and AUC = 0.835 (95% CI: 0.832-0.837), and the DB domain achieved AUC = 0.830 (95% CI: 0.821-0.8320).
Protein stability does not predict LOF activity from missense variants better than dedicated in silico missense predictors. Overall, we find that AlphaMissense ranks highly relative to all other in silico missense predictors present in the dbNSFP database, with an average AUC = 0.890 (95% CI: 0.886-0.895) in ordered regions across these four cancer genes. The study reveals that protein structures predicted by generative AI can outperform experimentally derived structures when LOF activity is evaluated from predicted protein stability in ordered regions of the genes BRCA1, BRCA2, PALB2, and RAD51C. The study also highlights the predictive performance of AlphaMissense as the premier in silico missense prediction method for predicting LOF activity from missense variants in these four tumor suppressor breast cancer genes. The code for this study can be downloaded for free on GitHub (https://github.com/rohandavidg/CarePred). © 2024 The Authors.
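
The AUC values and 95% confidence intervals quoted in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation (their code is in the CarePred repository linked above). It assumes that each variant has a single predictor score, for example a FoldX-style stability change where a higher value means more destabilizing, and a binary LOF label. The abstract does not say how the intervals were derived, so the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_auc_ci` and the toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """ROC AUC as the probability that a random LOF variant (label 1)
    scores higher than a random functional variant (label 0).
    Ties count as half a win (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap confidence interval.
    Resamples variants with replacement; replicates that contain only
    one class are redrawn because AUC is undefined for them."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        s = [scores[i] for i in sample]
        y = [labels[i] for i in sample]
        if 0 < sum(y) < n:
            stats.append(auc(s, y))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return auc(scores, labels), lo, hi


# Toy data (hypothetical): predicted stability change per variant and
# a binary label, 1 = loss of function, 0 = functional.
ddg = [3.1, 2.4, 0.2, 0.7, 0.5, -0.1, 2.9, 0.9]
lof = [1, 1, 0, 1, 0, 0, 1, 0]

point, lo, hi = bootstrap_auc_ci(ddg, lof)
print(f"AUC = {point:.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```

The same two functions can be applied to any per-variant score, so a stability predictor and a dedicated missense predictor can be compared on one labeled variant set, provided each score is oriented so that higher values indicate LOF.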