Artificial intelligence based personalized predictive survival among colorectal cancer patients.
Publication date: 2023 Feb 21
Authors:
David Susič, Shabbir Syed-Abdul, Erik Dovgan, Jitendra Jonnagaddala, Anton Gradišek
Source:
Comput Meth Prog Bio
Abstract:
Colorectal cancer is a major health concern. It is now the third most common cancer and the fourth leading cause of cancer mortality worldwide. The aim of this study was to evaluate the performance of machine learning algorithms for predicting survival of colorectal cancer patients 1 to 5 years after diagnosis, and to identify the most important variables.
A sample of 1236 patients diagnosed with colorectal cancer and 118 predictor variables was used. The outcome of interest was a binary variable indicating whether or not the patient survived the number of years in question. Twenty predictor variables were selected using the mutual information score with the outcome. We implemented 11 machine learning algorithms and evaluated their performance with a 5 by 2-fold cross-validation with stratified folds and with paired Student's t-tests. We compared the results with the Kaplan-Meier estimator and Cox's proportional hazard regression.
Using the 20 most important predictor variables for each of the survival years, the logistic regression algorithm achieved an area under the receiver operating characteristic curve of 0.850 (0.014 SD, 0.840-0.860 95% CI) for the 1-year and 0.872 (0.014 SD, 0.861-0.882 95% CI) for the 5-year survival prediction. Using only the 5 most important predictor variables, the corresponding values are 0.793 (0.020 SD, 0.778-0.807 95% CI) and 0.794 (0.011 SD, 0.785-0.802 95% CI). The most important variables for 1-year prediction were number of R residual, M distant metastasis, overall stage, probable recurrence within 5 years, and tumour length, whereas for 5-year prediction the most important were probable recurrence within 5 years, R residual, M distant metastasis, number of positive lymph nodes, and palliative chemotherapy. Biomarkers do not appear among the top 20 most important variables. For all survival intervals, the probability of the top model agrees with the Kaplan-Meier estimate, both in the interval of one standard deviation and in the 95% confidence interval.
The findings suggest that machine learning algorithms can predict the survival probability of colorectal cancer patients and can be used to inform patients and assist decision-making in clinical care management. In addition, this study unveils the most essential variables for estimating short- and long-term survival among patients with colorectal cancer.
Copyright © 2023 The Author(s). Published by Elsevier B.V. All rights reserved.
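The Kaplan-Meier estimator that the abstract names as a comparison baseline can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `kaplan_meier` and `survival_at` and the eight-patient toy cohort are assumptions made up for this example and do not come from the study.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct observed-death time.

    durations: follow-up time per patient (e.g. years since diagnosis)
    events:    1 if death was observed at that time, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # deaths and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical toy cohort (illustrative only, not data from the study).
durations = [0.5, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(durations, events)
for year in (1, 5):
    print(f"S({year}) = {survival_at(curve, year):.3f}")
```

Reading the curve at 1 and 5 years gives the kind of population-level survival probability against which a classifier's predicted probabilities could be compared. Patients censored at a given time are counted as still at risk at that time, which is the usual convention.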