A novel deep machine learning algorithm with dimensionality and size reduction approaches for feature elimination: thyroid cancer diagnoses with randomly missing data.
Publication date: 2024 May 23
Authors:
Onder Tutsoy, Hilmi Erdem Sumbul
Source:
BRIEFINGS IN BIOINFORMATICS
Abstract:
Thyroid cancer incidence continues to rise even though a large number of examination tools have been developed recently. Because there is no standard, definitive procedure for diagnosing thyroid cancer, clinicians must conduct a variety of tests. This examination process yields multi-dimensional big data, and the lack of a common approach leads to randomly distributed missing (sparse) data; both are formidable challenges for machine learning algorithms. This paper aims to develop an accurate and computationally efficient deep learning algorithm for diagnosing thyroid cancer. To this end, the singularity that randomly distributed missing data causes in the learning problem is treated, and dimensionality reduction with inner and target similarity approaches is developed to select the most informative input datasets. In addition, size reduction with a hierarchical clustering algorithm is performed to eliminate considerably similar data samples. Four machine learning algorithms are trained and then tested with unseen data to validate their generalization and robustness. The results yield 100% training accuracy and 83% testing accuracy on the unseen data. The computational time efficiency of the algorithms is also examined under equal conditions. © The Author(s) 2024. Published by Oxford University Press.
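The abstract names three preprocessing steps (treating missing data, dimensionality reduction by inner and target similarity, and size reduction by hierarchical clustering) without defining them. The following is only an illustrative sketch of how such a pipeline could look, not the authors' method. It assumes column-mean imputation, absolute Pearson correlation as the measure for both inner similarity (feature versus feature) and target similarity (feature versus label), and single-linkage clustering cut at a fixed Euclidean distance. All function names and threshold values are hypothetical.

```python
import math


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.
    Assumption: mean imputation stands in for the paper's missing-data treatment."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(rows, target, inner_max=0.95, target_min=0.1):
    """Return indices of features that are related to the target and are not
    near-duplicates of a feature already kept (thresholds are hypothetical)."""
    cols = [list(c) for c in zip(*rows)]
    target_sim = [abs(pearson(c, target)) for c in cols]
    # Visit features from most to least target-similar.
    ranked = sorted(range(len(cols)), key=lambda j: -target_sim[j])
    kept = []
    for j in ranked:
        if target_sim[j] < target_min:
            continue  # too weakly related to the target
        if all(abs(pearson(cols[j], cols[k])) < inner_max for k in kept):
            kept.append(j)  # not redundant with any kept feature
    return sorted(kept)


def reduce_samples(rows, threshold):
    """Single-linkage hierarchical clustering cut at `threshold`: samples joined
    by a chain of pairwise distances <= threshold share a cluster. Returns the
    lowest row index of each cluster as its representative."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for k in range(i + 1, len(rows)):
            if math.dist(rows[i], rows[k]) <= threshold:
                ra, rb = find(i), find(k)
                if ra != rb:
                    # Keep the smaller index as the root of the merged cluster.
                    parent[max(ra, rb)] = min(ra, rb)
    return sorted({find(i) for i in range(len(rows))})
```

In this sketch the steps would be chained as imputation, then `select_features` on the imputed rows, then `reduce_samples` on the rows restricted to the kept features. The pairwise loop in `reduce_samples` is quadratic in the number of samples, so a library implementation of hierarchical clustering would be preferable for data of realistic size.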