Evaluation of risk factors and survival rates of patients with early-stage breast cancer with machine learning and traditional methods

INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS

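Impact factor: 4.1

Journal ranking tier: Medicine, tier 2 / Health Care Sciences & Services, tier 2; Computer Science: Information Systems, tier 3; Medical Informatics, tier 3

Publication date: October 2024

Authors: Emrah Gökay Özgür, Ayse Ulgen, Sinan Uzun, Gülnaz Nural Bekiroğlu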

DOI: 10.1016/j.ijmedinf.2024.105548

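Summary

This study uses the Cox proportional hazards (CPH) model, machine learning techniques, and the accelerated failure time (AFT) model to predict and compare prognostic factors in early-stage breast cancer patients, based on clinical presentation and pathological information. It proceeds in three stages: CPH analysis first, the AFT model second, and several machine learning methods third. The data set comprises 697 breast cancer patients seen at the Marmara University Hospital oncology clinic between 1 January 1994 and 31 December 2009. Models were built with different parameters and compared by C-index, 5-year survival rate, and 10-year survival rate. MetLN (lymph node metastasis) and age were significant risk factors in the CPH and AFT models, whereas the machine learning models identified MetLN, age, tumor size, LV1, and extracapsular involvement as risk factors. The C-index values were 69.8 (CPH), 70.36 (AFT), 72.1 (random survival forest), and 72.8 (gradient boosting machine). The study stresses comparing traditional statistical methods with machine learning algorithms to identify prognostic risk factors in early-stage breast cancer more accurately. It also calls for more interpretable machine learning models, so that clinicians can understand and apply the results and reach better-informed, individualized treatment decisions.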

Abstract

This study predicts post-treatment survival probabilities and identifies prognostic factors in early-stage breast cancer patients from their clinical presentation and pathological information, and compares three prediction approaches: Cox proportional hazards (CPH) regression, the accelerated failure time (AFT) model, and several machine learning techniques. The study was carried out in three stages: the CPH method was applied first, the AFT model second, and the machine learning methods last. The data set consists of 697 breast cancer patients who presented to the Marmara University Hospital oncology clinic between 01.01.1994 and 31.12.2009. Models built from various patient parameters were compared by C-index, 5-year survival rate, and 10-year survival rate. MetLN and age were significant risk factors in both the CPH and AFT models, while the machine learning methods identified MetLN, age, tumor size, LV1, and extracapsular involvement as risk factors. The C-index was 69.8 for the CPH model, 70.36 for the AFT model, 72.1 for the random survival forest, and 72.8 for the gradient boosting machine. In conclusion, the study highlights the potential of comparing conventional statistical methods with machine learning algorithms to determine risk factors in early-stage breast cancer prognosis more precisely. Efforts should also be made to make machine learning models more interpretable, so that clinical practitioners can understand and use the results. This would support better-informed decisions and personalized care during the treatment and follow-up of early-stage breast cancer patients.
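
The C-index used to rank the four models can be illustrated with a small example. The sketch below computes Harrell's concordance index in plain Python on invented toy data. It is not the authors' implementation: the function name `concordance_index`, the toy values, and the choice to skip pairs with tied times are all assumptions made for illustration. The result is on a 0 to 1 scale, so the reported values (for example 72.8) presumably correspond to 0.728.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient
    with the shorter observed time also has the higher risk score.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: ignore tied times
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.5, 0.3, 0.1]

c_index = concordance_index(toy_times, toy_events, toy_risk)
print(f"C-index: {c_index:.3f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why differences of a few points between models, as in the abstract, are usually read as modest gains in discrimination.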