Survival analysis for lung cancer patients: A comparison of Cox regression and machine learning models.
Published: 2024 Aug 26
Authors:
Sebastian Germer, Christiane Rudolph, Louisa Labohm, Alexander Katalinic, Natalie Rath, Katharina Rausch, Bernd Holleczek, Heinz Handels
Source:
INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS
Abstract:
Survival analysis based on cancer registry data is of paramount importance for monitoring the effectiveness of health care. As new methods arise, the compendium of statistical tools applicable to cancer registry data grows. In recent years, machine learning approaches for survival analysis have been developed. The aim of this study is to compare the model performance of the well-established Cox regression and novel machine learning approaches on a previously unused dataset.
The study is based on lung cancer data from the Schleswig-Holstein Cancer Registry. Four survival analysis models are compared: Cox Proportional Hazard Regression (CoxPH) as the most commonly used statistical model, as well as Random Survival Forests (RSF) and two neural network architectures based on the DeepSurv and TabNet approaches. The models are evaluated using the concordance index (C-I), the Brier score and the AUC-ROC score. In addition, to gain more insight into the decision process of the models, we identified the features that have a higher impact on patient survival using permutation feature importance scores and SHAP values.
Using a dataset that includes the cancer stage established by the Union for International Cancer Control (UICC), the best performing model is the CoxPH (C-I: 0.698±0.005), while using a dataset that includes the tumor size, lymph node and metastasis status (TNM) leads to the RSF as the best performing model (C-I: 0.703±0.004). The explainability metrics show that the models rely primarily on the combined UICC stage and the metastasis status, which is consistent with other studies.
The studied methods are highly relevant for epidemiological researchers to create more accurate survival models, which can help physicians make informed decisions about appropriate therapies and management of patients with lung cancer, ultimately improving survival and quality of life.
Copyright © 2024. Published by Elsevier B.V.
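The concordance index used above to rank the models can be illustrated with a small example. The following is a minimal sketch of Harrell's C in plain Python, not the authors' evaluation code. The function name `concordance_index`, its arguments, and the toy patient data are assumptions made for illustration, and pairs with tied survival times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs that the model orders correctly.

    times       -- observed follow-up time per patient
    events      -- True if death was observed, False if the patient was censored
    risk_scores -- model output, where a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): pairs with tied times are ignored
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk assigned to the patient who died earlier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 8, 12, 3, 9]
toy_events = [True, False, True, True, False]
toy_risks = [0.3, 0.4, 0.2, 0.95, 0.5]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

In this toy data, 7 pairs are comparable and 5 of them are ordered correctly, so the index is 5/7. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of about 0.70 between the two.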