Prediction of precancerous cervical cancer lesions among women living with HIV on antiretroviral therapy in Uganda: a comparison of supervised machine learning algorithms.
Published: 2024 Jul 08
Authors:
Florence Namalinzi, Kefas Rimamnuskeb Galadima, Robinah Nalwanga, Isaac Sekitoleko, Leon Fidele Ruganzu Uwimbabazi
Source:
Disease Models & Mechanisms
Abstract:
Cervical cancer (CC) is among the most prevalent cancers in women, with the highest prevalence in low- and middle-income countries (LMICs). It is curable if detected early. Machine learning (ML) techniques can aid early detection and prediction, thereby reducing screening and treatment costs. This study focused on women living with HIV (WLHIV) in Uganda. Its aim was to identify the best predictors of CC and the supervised ML model that best predicts CC among WLHIV.
Secondary data covering 3025 women from three health facilities in central Uganda were used. Multivariate binary logistic regression and recursive feature elimination with random forest (RFERF) were used to identify the best predictors. Five models were then applied to identify the best performer: logistic regression (LR), random forest (RF), K-nearest neighbors (KNN), support vector machine (SVM), and multi-layer perceptron (MLP). The models were evaluated using the confusion matrix and the area under the receiver operating characteristic curve (AUC/ROC).
Duration on antiretroviral therapy (ART), WHO clinical stage, TPT status, viral load status, and family planning were selected by both techniques and were therefore considered the most important predictors of CC. The RF model trained on the RFERF-selected features outperformed the other models, achieving the highest scores: 90% accuracy and an AUC of 0.901.
Early identification of CC and knowledge of its risk factors could help control the disease. RF outperformed the other models regardless of the feature selection technique used. Future research could be expanded to include ART-naïve women in predicting CC.
© 2024. The Author(s).
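The evaluation step can be illustrated with a minimal sketch in plain Python. This is not the authors' code. It builds a binary confusion matrix, derives accuracy as (TP + TN) / total, computes ROC AUC in its Mann-Whitney form (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counting half), and ranks models by AUC. The helper names (`confusion_matrix`, `roc_auc`, `compare_models`), the 0.5 decision threshold, and the labels and scores are hypothetical. Only two of the five models appear, and the numbers are invented, not the study's results.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (1 = lesion present, 0 = absent)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1
        elif t == 0 and p == 1:
            counts["FP"] += 1
        elif t == 0 and p == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def accuracy(cm: Dict[str, int]) -> float:
    """Share of correct predictions: (TP + TN) / total."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # quadratic pairwise comparison, fine for illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compare_models(
    y_true: Sequence[int],
    model_scores: Dict[str, List[float]],
    threshold: float = 0.5,  # hypothetical cut-off for turning scores into labels
) -> List[Tuple[str, float, float]]:
    """Return (model, accuracy, AUC) rows sorted by AUC, best first."""
    rows = []
    for name, scores in model_scores.items():
        y_pred = [1 if s >= threshold else 0 for s in scores]
        cm = confusion_matrix(y_true, y_pred)
        rows.append((name, accuracy(cm), roc_auc(y_true, scores)))
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Invented labels and predicted probabilities, NOT the study's data.
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = {
        "LR": [0.7, 0.4, 0.6, 0.3, 0.5, 0.2, 0.8, 0.6],
        "RF": [0.9, 0.2, 0.7, 0.45, 0.4, 0.1, 0.8, 0.5],
    }
    for name, acc, auc in compare_models(y, scores):
        print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds. Reporting both, as the study does, avoids judging a model on a single cut-off. For real analyses, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_auc_score` compute the same quantities. `sklearn.feature_selection.RFE` with a `RandomForestClassifier` estimator is one way to run an RFE-with-random-forest selection step. The abstract does not state which software the authors used.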