An ensemble approach for circular RNA-disease association prediction using variational autoencoder and genetic algorithm.
Publication date: 2024 Aug 31
Authors:
C M Salooja, Arjun Sanker, K Deepthi, A S Jereesh
Source:
Alzheimers & Dementia
Abstract:
Circular RNAs (circRNAs) are endogenous non-coding RNAs with a covalently closed loop structure. They have many biological functions, chiefly regulatory ones, and have been shown to modulate protein-coding genes in the human genome. CircRNAs are linked to various diseases, including Alzheimer's disease, diabetes, atherosclerosis, Parkinson's disease, and cancer. Identifying the associations between circRNAs and diseases is essential for disease diagnosis, prevention, and treatment. The proposed model, variational autoencoder and genetic algorithm circular RNA disease association (VAGA-CDA), predicts novel circRNA-disease associations. First, the experimentally verified circRNA-disease associations are augmented with the synthetic minority oversampling technique (SMOTE) and regenerated using a variational autoencoder, which extracts features from the augmented samples. A genetic algorithm (GA) then applies feature selection to these vectors, reducing their dimensionality. The resulting feature vectors are given to a Random Forest classifier to predict new circRNA-disease associations. The proposed model yields AUC values of 0.9644 and 0.9628 under 5-fold and 10-fold cross-validation, respectively. The results of the case studies indicate the robustness of the proposed model.
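Two stages of the pipeline described in the abstract, SMOTE-style oversampling and genetic-algorithm feature selection, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the variational autoencoder and the Random Forest stages are omitted, the fitness function (class-mean separation minus a size penalty) is a simple stand-in chosen for illustration, and all names, parameters, and the toy data are hypothetical.

```python
import random

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic samples, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

def fitness(mask, X, y, penalty=0.2):
    """Stand-in fitness (an assumption, not the paper's): class-mean
    separation of each selected feature, minus a per-feature penalty."""
    score = 0.0
    for j, keep in enumerate(mask):
        if keep:
            pos = [row[j] for row, label in zip(X, y) if label == 1]
            neg = [row[j] for row, label in zip(X, y) if label == 0]
            score += abs(sum(pos) / len(pos) - sum(neg) / len(neg)) - penalty
    return score

def ga_select(X, y, generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    """Evolve binary feature masks; return the fittest mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[: pop_size // 2]      # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)     # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, X, y))

# Toy data: features 0 and 1 separate the classes, features 2 and 3 are noise.
rng = random.Random(42)
pos = [[1 + rng.gauss(0, 0.1), 1 + rng.gauss(0, 0.1),
        rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(5)]
neg = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(20)]

pos_aug = pos + smote(pos, n_new=15, rng=rng)   # balance the two classes
X = pos_aug + neg
y = [1] * len(pos_aug) + [0] * len(neg)
mask = ga_select(X, y)
print("selected feature mask:", mask)
```

In a fuller pipeline, the selected columns would be passed to a classifier and scored by cross-validated AUC; a classifier-based fitness would also be the more natural choice inside the genetic algorithm, at a higher computational cost.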