Evaluation of risk factors and survival rates of patients with early-stage breast cancer with machine learning and traditional methods.
Published: 2024 Jul 11
Authors:
Emrah Gökay Özgür, Ayse Ulgen, Sinan Uzun, Gülnaz Nural Bekiroğlu
Source:
INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS
Abstract:
This study aims to identify prognostic factors and to compare prediction methods for post-treatment survival probability in patients with early-stage breast cancer, based on their clinical presentation and pathological information, using Cox proportional hazards (CPH) regression, an accelerated failure time (AFT) model and several machine learning techniques. The study was carried out in three stages: the CPH method was applied first, the AFT model second and the machine learning methods last. The data set consists of 697 breast cancer patients who presented to the oncology clinic of Marmara University Hospital between 1 January 1994 and 31 December 2009. Models built from various patient parameters were compared by C-index, 5-year survival rate and 10-year survival rate. The CPH and AFT analyses identified MetLN and age as significant risk factors, whereas the machine learning methods identified MetLN, age, tumor size, LV1 and extracapsular involvement as risk factors. The C-index values of the resulting models were 69.8 for the CPH model, 70.36 for the AFT model, 72.1 for the random survival forest and 72.8 for the gradient boosting machine. In conclusion, the study highlights the potential of comparing conventional statistical methods with machine learning algorithms to improve the precision of risk factor determination in early-stage breast cancer prognosis. Efforts should also be made to enhance the interpretability of machine learning models so that their results can be effectively communicated to, and used by, clinical practitioners. This would enable more informed decision-making and personalized care in the treatment and follow-up of early-stage breast cancer patients.
Copyright © 2024 Elsevier B.V.
All rights reserved.
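The abstract ranks the CPH, AFT, random survival forest and gradient boosting models by their C-index. As a hedged illustration of what that metric measures, the sketch below computes Harrell's concordance index for right-censored data: among patient pairs whose ordering of survival times is known, it counts the fraction in which the model assigns the higher risk to the patient who failed first. The function name `harrell_c_index` and the toy data are hypothetical and not taken from the study, and this simple version skips all pairs with tied follow-up times, which dedicated survival-analysis libraries treat more carefully. The sketch returns a value between 0 and 1; the abstract's figures (69.8 to 72.8) appear to be the same quantity multiplied by 100, though that reading is an assumption.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Minimal sketch of Harrell's C-index for right-censored survival data.

    times       -- observed follow-up time (event or censoring) per patient
    events      -- True if the event (e.g. death) was observed, False if censored
    risk_scores -- model output; a HIGHER score means HIGHER predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time ends in an observed
        # event; pairs with tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier failure was ranked as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: follow-up times, event flags and model risk scores.
    times = [5, 10, 12, 20]
    events = [True, True, False, True]
    risk = [0.9, 0.6, 0.7, 0.1]
    print(harrell_c_index(times, events, risk))  # 0.8
```

In the toy data, five pairs are comparable (the censored patient at time 12 cannot anchor a pair), and only the pair (10, 12) is ranked in the wrong order, giving 4/5 = 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why differences such as 69.8 versus 72.8 are read as modest gains in discrimination.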