Integrating TCGA and single-cell sequencing data for colorectal cancer: a 10-gene prognostic risk assessment model.
Published: 2023 Sep 13
Authors:
Di Lu, Xiaofang Li, Yuan Yuan, Yaqi Li, Jiannan Wang, Qian Zhang, Zhiyu Yang, Shanjun Gao, Xiulei Zhang, Bingxi Zhou
Source:
GENES & DEVELOPMENT
Abstract:
Colorectal cancer represents a significant health threat, yet a standardized method for early clinical assessment and prognosis remains elusive. To address this gap, this study used the Seurat package to analyze a colorectal cancer single-cell sequencing dataset (GSE178318) and identify marker genes that characterize distinct cell subpopulations. CIBERSORT analysis of colorectal cancer data from The Cancer Genome Atlas (TCGA) revealed significant differences in these cell subpopulations and in their prognostic value. Using WGCNA, we identified modules strongly correlated with these subpopulations and then applied the coxph function of the survival package to select genes within these modules. Stratifying the TCGA dataset by the selected genes revealed notable differences between the resulting subtypes. The prognostic relevance of the differentially expressed genes was assessed through survival analysis, and LASSO regression was used to model prognostic factors. The resulting model, a 10-gene signature derived from these differentially expressed genes by LASSO regression, accurately predicted clinical prognosis, including when tested on external datasets. Specifically, natural killer cells of the C7 subpopulation were significantly associated with colorectal cancer survival and prognosis in the TCGA database. These findings highlight the potential of a 10-gene signature prognostic risk assessment model that integrates single-cell sequencing insights with TCGA data to estimate colorectal cancer risk. © 2023. Springer Science+Business Media, LLC.
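The abstract does not spell out how the 10-gene signature is turned into a risk estimate. A common convention for LASSO-Cox signatures is a risk score equal to the sum of each gene's coefficient times its expression, with patients split into high- and low-risk groups (often at the median score) and the groups compared by a log-rank test. The Python sketch below illustrates that convention only. The gene names (GENE_A to GENE_C) and coefficients are hypothetical placeholders, not the study's 10 genes or weights; the median cutoff is an assumption; and the log-rank test is a plain-Python stand-in, not the authors' R code.

```python
import math
from statistics import median

# HYPOTHETICAL LASSO-Cox coefficients: the abstract does not list the 10 genes
# or their weights, so these names and values are illustrative placeholders.
COEFFICIENTS = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.18}


def risk_score(expression: dict[str, float]) -> float:
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores: list[float]) -> list[str]:
    """Label each patient 'high' or 'low' risk relative to the median score (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def log_rank_test(times, events, groups, group_a="high"):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times: follow-up times; events: 1 = death observed, 0 = censored;
    groups: group label for each patient.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Iterate over distinct times at which at least one death occurred.
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for i in at_risk if groups[i] == group_a)
        d = sum(1 for i in at_risk if times[i] == t and events[i] == 1)
        d_a = sum(
            1 for i in at_risk
            if times[i] == t and events[i] == 1 and groups[i] == group_a
        )
        # Observed minus expected deaths in group_a at this time.
        observed_minus_expected += d_a - d * n_a / n
        # Hypergeometric variance of d_a (accounts for tied deaths).
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Chi-square with 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, scoring a few toy patients with `risk_score`, labeling them with `split_by_median`, and passing follow-up times and event indicators to `log_rank_test` gives a chi-square statistic and p-value for the high- versus low-risk separation. With real data, `survdiff` from the R survival package performs the same test.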