Research Updates

Early diagnosis and personalised treatment focusing on synthetic data modelling: Novel visual learning approach in healthcare.

Published: 2023 Aug 02
Authors: Ahsanullah Yunas Mahmoud, Daniel Neagu, Daniele Scrimieri, Amr Rashad Ahmed Abdullatif
Source: COMPUTERS IN BIOLOGY AND MEDICINE

Abstract:

The early diagnosis and personalised treatment of diseases are facilitated by machine learning. The quality of data affects diagnosis because medical data are usually sparse, imbalanced, and contain irrelevant attributes, resulting in suboptimal diagnosis. To address the impacts of these data challenges, improve resource allocation, and achieve better health outcomes, a novel visual learning approach is proposed. This study contributes to the visual learning approach by determining whether less or more synthetic data are required to improve the quality of a dataset, such as the number of observations and features, according to the intended personalised treatment and early diagnosis. In addition, numerous visualisation experiments are conducted, using statistical characteristics, cumulative sums, histograms, correlation matrices, root mean square error, and principal component analysis to visualise both original and synthetic data and so address the data challenges. Real medical datasets for cancer, heart disease, diabetes, cryotherapy, and immunotherapy are selected as case studies. As a benchmark and point of classification comparison in terms of metrics such as accuracy, sensitivity, and specificity, several models are implemented, such as k-Nearest Neighbours and Random Forest. To simulate algorithm implementation and data, a Generative Adversarial Network is used to create and manipulate synthetic data, whilst Random Forest is implemented to classify the data. An amendable and adaptable system is constructed by combining the Generative Adversarial Network and Random Forest models. The system model presents working steps, an overview, and a flowchart. Experiments reveal that the majority of data-enhancement scenarios allow for the application of visual learning in the first stage of data analysis as a novel approach.
To achieve a meaningful, adaptable synergy between data of appropriate quality and optimal classification performance while maintaining statistical characteristics, visual learning provides researchers and practitioners with practical human-in-the-loop machine learning visualisation tools. Prior to implementing algorithms, the visual learning approach can be used to actualise early and personalised diagnosis. For the immunotherapy data, Random Forest performed best, with precision, recall, F-measure, accuracy, sensitivity, and specificity of 81%, 82%, 81%, 88%, 95%, and 60%, respectively, on the original data, as opposed to 91%, 96%, 93%, 93%, 96%, and 73% on the synthetic data. Future studies might examine the optimal strategies to balance the quantity and quality of medical data. Copyright © 2023 The Author(s). Published by Elsevier Ltd. All rights reserved.
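The abstract names several checks for comparing original and synthetic data, among them statistical characteristics, correlation matrices, root mean square error, and principal component analysis. The following is a minimal sketch of how three of those checks might be computed with NumPy; it is not the authors' implementation. The function names `correlation_rmse` and `pca_project` are hypothetical, the "original" data are random toy values rather than any of the medical datasets, and the "synthetic" data come from a noisy bootstrap resample that stands in for a Generative Adversarial Network.

```python
import numpy as np


def correlation_rmse(original, synthetic):
    """RMSE between the feature correlation matrices of two datasets."""
    corr_orig = np.corrcoef(original, rowvar=False)
    corr_syn = np.corrcoef(synthetic, rowvar=False)
    return float(np.sqrt(np.mean((corr_orig - corr_syn) ** 2)))


def pca_project(data, reference, n_components=2):
    """Project `data` onto the principal axes fitted on `reference`."""
    mean = reference.mean(axis=0)
    # Rows of vt are the principal axes of the centred reference data.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return (data - mean) @ vt[:n_components].T


# Toy "original" data (assumption: random values, not a real medical dataset).
rng = np.random.default_rng(0)
original = rng.normal(size=(100, 4))
original[:, 1] += 0.8 * original[:, 0]  # make two features correlated

# Stand-in for GAN output (assumption): bootstrap rows plus small Gaussian noise.
rows = rng.integers(0, len(original), size=300)
synthetic = original[rows] + rng.normal(scale=0.05, size=(300, 4))

# Statistical characteristics: absolute gap between per-feature means.
mean_gap = np.abs(original.mean(axis=0) - synthetic.mean(axis=0))

# Correlation structure: a value near 0 means the matrices are similar.
rmse = correlation_rmse(original, synthetic)

# PCA: both datasets in the same 2-D space, ready for a scatter plot.
proj_orig = pca_project(original, original)
proj_syn = pca_project(synthetic, original)

print("per-feature mean gap:", np.round(mean_gap, 3))
print("correlation-matrix RMSE:", round(rmse, 4))
print("projected shapes:", proj_orig.shape, proj_syn.shape)
```

Fitting the principal axes on the original data only and projecting both datasets onto them keeps the two point clouds in one coordinate system, so overlaying them in a scatter plot shows where the synthetic samples drift from the original distribution.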