Research Updates

Improving Prediction of Cervical Cancer Using KNN Imputed SMOTE Features and Multi-Model Ensemble Learning Approach

Published: 2023 Sep 04
Authors: Hanen Karamti, Raed Alharthi, Amira Al Anizi, Reemah M Alhebshi, Ala' Abdulmajid Eshmawi, Shtwai Alsubai, Muhammad Umer
Source: Cancers

Abstract:

Objective: Cervical cancer is among the leading causes of death for women in developing countries. Early identification and treatment under expert medical guidance are the most important steps for limiting its aftereffects, and examining Pap smear images is one of the best ways to detect this malignancy. Datasets available for automated cervical cancer detection, however, often contain missing values, which can significantly degrade the performance of machine learning models. Methods: To address these challenges, this study proposes an automated cervical cancer prediction system that handles missing values with a KNN imputer and applies SMOTE up-sampling to the resulting features to achieve high accuracy. The system employs a stacked ensemble voting classifier that combines three machine learning models. Results: With KNN-imputed SMOTE features, the proposed model achieves 99.99% accuracy, 99.99% precision, 99.99% recall, and a 99.99% F1 score. The study compares the proposed model with multiple other machine learning algorithms under four scenarios: with missing values removed, with KNN imputation, with SMOTE features, and with KNN-imputed SMOTE features. It also validates the efficacy of the proposed model against existing state-of-the-art approaches. Conclusions: This study investigates missing values and class imbalance in data collected for cervical cancer detection, and it might help medical practitioners detect cervical cancer in a timely manner and provide patients with better care.
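
The abstract names three building blocks: KNN imputation, SMOTE up-sampling, and a voting ensemble. The dependency-free Python sketch below illustrates the general idea of each one on made-up toy data. It is not the authors' implementation: the abstract does not name the three base models or any hyperparameters, so the function names, the value of `k`, the distance rescaling, and the toy rows are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def nan_distance(a, b):
    """Euclidean distance over features present in both rows (None = missing),
    rescaled by the share of usable features. The rescaling is an assumption."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(squared * len(a) / len(pairs))


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the k nearest
    rows that have it. Assumes at least one other row has the feature."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (nan_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a random
    minority point and one of its k nearest minority neighbours.
    Assumes complete (already imputed) rows and at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def hard_vote(predictions):
    """Majority vote over the label lists produced by several classifiers."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Made-up toy data: two features, None marks a missing value, label 1 is the minority.
rows = [[1.0, 2.0], [1.2, None], [0.9, 2.2],
        [5.0, 6.0], [5.1, 5.8], [4.9, 6.1], [5.2, 6.0], [5.0, None]]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

filled = knn_impute(rows, k=2)  # step 1: impute missing values
minority = [r for r, y in zip(filled, labels) if y == 1]
n_needed = labels.count(0) - labels.count(1)
synthetic = smote(minority, n_needed)  # step 2: balance the classes
balanced_rows = filled + synthetic
balanced_labels = labels + [1] * n_needed

# Step 3: combine three hypothetical model outputs by majority vote.
votes = hard_vote([[1, 0, 0], [1, 1, 0], [0, 1, 0]])
```

In practice these steps would more likely rely on maintained libraries, for example `KNNImputer` and `VotingClassifier` from scikit-learn and `SMOTE` from imbalanced-learn. Whether the study used those exact tools is not stated in the abstract.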