Evaluation of risk factors and survival rates of patients with early-stage breast cancer with machine learning and traditional methods
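Impact factor: 4.1
Journal partition: Medicine, Zone 2 / Health Care Sciences & Services, Zone 2; Computer Science: Information Systems, Zone 3; Medicine: Informatics, Zone 3
Publication date: Oct 2024
Authors: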
Emrah Gökay Özgür, Ayse Ulgen, Sinan Uzun, Gülnaz Nural Bekiroğlu
Abstract
This study aims to identify prognostic factors and to compare prediction methods for post-treatment survival probabilities in early-stage breast cancer patients, based on their clinical presentation and pathological information. The methods compared are Cox proportional hazards (CPH) regression, the accelerated failure time (AFT) model, and several machine learning techniques. The study was carried out in three stages: the CPH method was applied first, the AFT model second, and the machine learning methods last. The data set consists of 697 breast cancer patients who presented to the Marmara University Hospital oncology clinic between 01.01.1994 and 31.12.2009. Models built from various patient parameters were compared by C-index, 5-year survival rate, and 10-year survival rate. In the CPH and AFT models, MetLN and age were significant risk factors, whereas the machine learning methods identified MetLN, age, tumor size, LV1, and extracapsular involvement as risk factors. The C-index of the resulting models was 69.8 for the CPH model, 70.36 for the AFT model, 72.1 for the random survival forest, and 72.8 for the gradient boosting machine. In conclusion, the study highlights the potential of comparing conventional statistical methods with machine learning algorithms to improve the precision of risk factor determination in early-stage breast cancer prognosis. Efforts should also be made to improve the interpretability of machine learning models, so that clinical practitioners can effectively communicate and use the results. This would enable better-informed decision-making and more personalized care during the treatment and follow-up of early-stage breast cancer patients.
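The models in the abstract are ranked by C-index, so a small illustration of how that metric handles censored follow-up data may help. The sketch below is a plain-Python version of Harrell's concordance index, not the authors' code. The function name `concordance_index`, the toy arrays, and the choice to skip pairs with tied follow-up times are all assumptions made for illustration. A pair of patients counts as comparable only when the one with the shorter follow-up had an observed event; the index is the share of comparable pairs in which that patient also received the higher predicted risk.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times, and skip pairs where the
        # shorter follow-up ended in censoring (the ordering is unknown).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk for the earlier event: correct
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 8, 12, 3, 10]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.4, 0.2, 0.8, 0.5]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

On this toy data 6 of the 7 comparable pairs are ordered correctly, so `toy_c` is 6/7. The abstract's values (69.8 to 72.8) appear to be the same quantity expressed on a 0 to 100 scale, where 50 corresponds to random ordering and 100 to perfect ordering.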