A novel deep machine learning algorithm with dimensionality and size reduction approaches for feature elimination: thyroid cancer diagnoses with randomly missing data
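Impact factor: 7.7
Journal ranking: Biology, Tier 2 / Mathematical and Computational Biology, Tier 1; Biochemical Research Methods, Tier 2
Publication date: 2024 May 23
Authors: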
Onder Tutsoy, Hilmi Erdem Sumbul
Abstract
Thyroid cancer incidence continues to increase even though a large number of inspection tools have been developed recently. Since there is no standard, definite procedure to follow for thyroid cancer diagnosis, clinicians need to conduct various tests. This scrutiny process yields multi-dimensional big data, and the lack of a common approach leads to randomly distributed missing (sparse) data; both are formidable challenges for machine learning algorithms. This paper aims to develop an accurate and computationally efficient deep learning algorithm to diagnose thyroid cancer. In this respect, the singularity in the learning problem that stems from randomly distributed missing data is treated, and dimensionality reduction with inner and target similarity approaches is developed to select the most informative input datasets. In addition, size reduction with the hierarchical clustering algorithm is performed to eliminate considerably similar data samples. Four machine learning algorithms are trained and then tested with unseen data to validate their generalization and robustness. The results yield 100% preciseness in training and 83% in testing on the unseen data. The computational time efficiencies of the algorithms are also examined under equal conditions.
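The abstract does not define the inner and target similarity measures, so the following is only a rough illustration of the general idea, not the authors' method. It assumes that similarity is measured by absolute Pearson correlation, that missing entries are marked with None and skipped pairwise, and that the thresholds inner_max and target_min are arbitrary illustrative values. The function names, column names, and toy data are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation using only positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, target, inner_max=0.9, target_min=0.2):
    """Pick informative, non-redundant features.

    columns: dict mapping feature name -> list of values (None = missing).
    target:  list of labels, same length as each column.
    """
    # Target similarity (assumed definition): |correlation| with the label.
    scored = {name: abs(pearson(col, target)) for name, col in columns.items()}
    ranked = sorted(
        (name for name in scored if scored[name] >= target_min),
        key=lambda name: -scored[name],
    )
    kept = []
    for name in ranked:
        # Inner similarity (assumed definition): |correlation| between inputs.
        # Skip a feature that is nearly redundant with one already kept.
        if all(abs(pearson(columns[name], columns[k])) < inner_max for k in kept):
            kept.append(name)
    return kept


# Toy data with randomly missing entries (hypothetical values).
columns = {
    "a": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "b": [2.0, 4.1, 6.0, None, 10.0, 12.1],  # almost a rescaled copy of "a"
    "c": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],     # unrelated to the target
}
target = [0, 0, 0, 1, 1, 1]

kept = select_features(columns, target)
print(kept)
```

On this toy data, only one of the two nearly identical features survives, and the feature unrelated to the target is dropped. A real pipeline would need a principled treatment of the missing entries and a validated choice of thresholds.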