Integrative prediction model for radiation pneumonitis incorporating genetic and clinical-pathological factors using machine learning.
Publication date: 2024 Sep
Authors:
Seo Hee Choi, Euidam Kim, Seok-Jae Heo, Mi Youn Seol, Yoonsun Chung, Hong In Yoon
Source:
GENES & DEVELOPMENT
Abstract:
We aimed to develop a machine learning-based prediction model for severe radiation pneumonitis (RP) by integrating relevant clinicopathological and genetic factors, considering the associations of clinical and dosimetric parameters and single nucleotide polymorphisms (SNPs) of genes in the TGF-β1 pathway with RP.

We prospectively enrolled 59 primary lung cancer patients undergoing radiotherapy and analyzed pretreatment blood samples, clinicopathological/dosimetric variables, and 11 functional SNPs in TGF-β pathway genes. Using the Synthetic Minority Over-sampling Technique (SMOTE) and nested cross-validation, we developed a machine learning-based prediction model for severe RP (grade ≥ 2). Feature selection was conducted using four methods (filter-based, wrapper-based, embedded, and logistic regression), and performance was evaluated using three machine learning models.

Severe RP occurred in 20.3% of patients, with a median follow-up of 39.7 months. In our final model, age (> 66 years), smoking history, PTV volume (> 300 cc), and the AG/GG genotype in BMP2 rs1979855 were identified as the most significant predictors. Additionally, incorporating genomic variables alongside clinicopathological variables significantly improved the AUC compared with using clinicopathological variables alone (0.822 vs. 0.741, p = 0.029). The wrapper-based method and the logistic model selected the same feature set, which gave the best performance across all machine learning models (AUC: XGBoost 0.815, RF 0.805, SVM 0.712).

We successfully developed a machine learning-based prediction model for RP, demonstrating age, smoking history, PTV volume, and BMP2 rs1979855 genotype as significant predictors. Notably, incorporating SNP data significantly enhanced predictive performance compared with clinicopathological factors alone.

© 2024 The Author(s).
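The abstract names SMOTE but does not describe how it was combined with the nested cross-validation folds. The following is a minimal sketch of the interpolation step behind SMOTE, written in plain Python rather than with the authors' tooling. The function names `smote` and `oversample_training_fold`, the choice of `k=5`, and the two-feature toy data are all assumptions for illustration. The 47/12 class split only mimics the reported 20.3% event rate in 59 patients (about 12 cases) and is not study data. Oversampling only the training fold, as done here, is a common way to keep synthetic points out of the evaluation data; whether the study did exactly this is not stated.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_training_fold(X_train, y_train, k=5, seed=0):
    """Balance the classes of one training fold; returns new lists and
    leaves the inputs (and any held-out test fold) untouched."""
    minority = [x for x, y in zip(X_train, y_train) if y == 1]
    n_majority = sum(1 for y in y_train if y == 0)
    n_new = max(n_majority - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X_train + new_points, y_train + [1] * len(new_points)


# Hypothetical toy data: 47 negatives and 12 positives with two features.
toy_rng = random.Random(42)
X = [[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(47)]
X += [[toy_rng.gauss(2, 1), toy_rng.gauss(2, 1)] for _ in range(12)]
y = [0] * 47 + [1] * 12

X_bal, y_bal = oversample_training_fold(X, y)
print(len(X_bal), sum(y_bal))
```

In a nested cross-validation loop, `oversample_training_fold` would be called once per inner and outer training split, and the AUC would be computed on the corresponding untouched validation or test split.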