Machine learning-derived peripheral blood transcriptomic biomarkers for early lung cancer diagnosis: Unveiling tumor-immune interaction mechanisms.

Publication date: 2024 Oct 16

Authors: Xiaohua Li, Xuebing Li, Jiangyue Qin, Lei Lei, Hua Guo, Xi Zheng, Xuefeng Zeng

Source: BioFactors

Abstract:

肺癌仍然是全球癌症相关死亡的主要原因。早期检测和全面了解肿瘤-免疫相互作用对于改善患者的治疗效果至关重要。这项研究旨在开发一种新型生物标志物组合，利用外周血转录组学和机器学习算法进行早期肺癌诊断，同时提供对肿瘤免疫串扰机制的见解。利用训练队列 (GSE135304)，我们采用多种机器学习算法，根据外周血转录组特征制定肺癌诊断评分 (LCDS)。 LCDS 模型的性能使用多个验证队列（GSE42834、GSE157086 和内部数据集）中的受试者工作特征 (ROC) 曲线 (AUC) 下面积进行评估。外周血样本取自 20 名肺癌患者和 10 名健康对照受试者，代表在成都市第六人民医院招募的内部队列。我们采用先进的生物信息学技术，通过全面的免疫浸润和通路富集分析来探索肿瘤-免疫相互作用。初步筛选确定了 844 个差异表达基因，随后使用 Boruta 特征选择算法将其细化为 87 个基因。随机森林 (RF) 算法在构建 LCDS 模型方面表现出最高的准确度，平均 AUC 为 0.938。较低的 LCDS 值与免疫评分升高以及 CD4 和 CD8 T 细胞浸润增加显着相关，表明抗肿瘤免疫反应增强。较高的 LCDS 分数与缺氧、过氧化物酶体增殖物激活受体 (PPAR) 和 Toll 样受体 (TLR) 信号通路的激活以及 DNA 损伤修复通路分数的降低相关。我们的研究提出了一种新型的、机器学习衍生的外周血转录组生物标志物组，在早期肺癌诊断中具有潜在的应用。 LCDS 模型不仅在区分肺癌患者和健康个体方面表现出很高的准确性，而且还为肿瘤免疫相互作用和潜在的癌症生物学提供了宝贵的见解。这种方法可能有助于早期肺癌检测，并有助于更深入地了解肿瘤免疫串扰背后的分子和细胞机制。此外，我们关于 LCDS 和免疫浸润模式之间关系的研究结果可能会对未来针对肺癌免疫系统的治疗策略的研究产生影响。© 2024 作者。 BioFactors 由 Wiley periodicals LLC 代表国际生物化学和分子生物学联盟出版。

Lung cancer continues to be the leading cause of cancer-related mortality worldwide. Early detection and a comprehensive understanding of tumor-immune interactions are crucial for improving patient outcomes. This study aimed to develop a novel biomarker panel for early lung cancer diagnosis based on peripheral blood transcriptomics and machine learning algorithms, while also providing insights into tumor-immune crosstalk mechanisms. Using a training cohort (GSE135304), we applied multiple machine learning algorithms to construct a Lung Cancer Diagnostic Score (LCDS) from peripheral blood transcriptomic features. The performance of the LCDS model was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) in multiple validation cohorts (GSE42834, GSE157086, and an in-house dataset). For the in-house cohort, peripheral blood samples were obtained from 20 lung cancer patients and 10 healthy control subjects recruited at the Sixth People's Hospital of Chengdu. Tumor-immune interactions were explored through comprehensive immune infiltration and pathway enrichment analyses. Initial screening identified 844 differentially expressed genes, which were subsequently refined to 87 genes using the Boruta feature selection algorithm. Among the algorithms tested, random forest (RF) achieved the highest accuracy in constructing the LCDS model, with a mean AUC of 0.938. Lower LCDS values were significantly associated with elevated immune scores and increased CD4+ and CD8+ T-cell infiltration, indicating enhanced antitumor immune responses. Higher LCDS values correlated with activation of the hypoxia, peroxisome proliferator-activated receptor (PPAR), and Toll-like receptor (TLR) signaling pathways, as well as with reduced DNA damage repair pathway scores.
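The abstract reports diagnostic performance as ROC AUC and relates LCDS to immune scores. As a minimal sketch, assuming the LCDS is a continuous per-sample score where higher means more cancer-like, both quantities can be computed from ranks: the AUC equals the Mann-Whitney U statistic divided by the number of case-control pairs, and a rank correlation measures the LCDS-immune score association. The abstract does not say which correlation statistic was used, so Spearman's rho is an assumption here; the function names and toy values are hypothetical and are neither the study's code nor its data.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via Mann-Whitney U (labels: 1 = lung cancer, 0 = healthy control)."""
    ranks = average_ranks(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    # U counts case-control pairs where the case scores higher (ties count 0.5).
    return u_pos / (n_pos * n_neg)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy values for illustration only (not data from the study).
    lcds = [0.91, 0.80, 0.75, 0.40, 0.35, 0.10]
    is_cancer = [1, 1, 0, 1, 0, 0]
    immune_score = [0.2, 0.3, 0.5, 0.4, 0.8, 0.9]
    print(roc_auc(lcds, is_cancer))          # 8/9, about 0.889
    print(spearman_rho(lcds, immune_score))  # -33/35, about -0.943
```

In the toy example, a negative rho mirrors the direction described above (lower LCDS with higher immune scores). Rank-based statistics were chosen because they depend only on the ordering of scores, so they apply unchanged whether the LCDS is an RF vote fraction or any other monotone score.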
Our study presents a novel machine learning-derived peripheral blood transcriptomic biomarker panel with potential applications in early lung cancer diagnosis. The LCDS model not only distinguishes lung cancer patients from healthy individuals with high accuracy but also offers insights into tumor-immune interactions and the underlying cancer biology. This approach may facilitate early lung cancer detection and contribute to a deeper understanding of the molecular and cellular mechanisms underlying tumor-immune crosstalk. Furthermore, our findings on the relationship between LCDS and immune infiltration patterns may inform future research on therapeutic strategies targeting the immune system in lung cancer.

© 2024 The Author(s). BioFactors published by Wiley Periodicals LLC on behalf of International Union of Biochemistry and Molecular Biology.