QIGTD: identifying critical genes in the evolution of lung adenocarcinoma with tensor decomposition
Impact factor: 6.1
Journal ranking: Biology, Zone 3 / Mathematical and Computational Biology, Zone 3
Publication date: 2024 Sep 04
Authors:
Bolin Chen, Jinlei Zhang, Ci Shao, Jun Bian, Ruiming Kang, Xuequn Shang
DOI:
10.1186/s13040-024-00386-w
Abstract
Identifying critical genes is important for understanding the pathogenesis of complex diseases. Traditional studies typically compare changes in biomolecules between normal and disease samples, or detect important vertices in a single static biomolecular network, and therefore often overlook the dynamic changes that occur between disease stages. Investigating temporal changes in biomolecular networks and identifying critical genes is nevertheless essential for understanding the occurrence and development of diseases.
This study proposes a novel method called Quantifying Importance of Genes with Tensor Decomposition (QIGTD). It first constructs a time series network by integrating both intra-temporal and inter-temporal network information, preserving connections between networks at adjacent stages according to their local similarities. A tensor describes the connections of this time series network, and a third-order tensor decomposition method is proposed to capture both the topological information of each network snapshot and the time series characteristics of the whole network. QIGTD is also a learning-free and efficient method that can be applied to datasets with a small number of samples.
The effectiveness of QIGTD was evaluated on lung adenocarcinoma (LUAD) datasets, with three state-of-the-art methods (T-degree, T-closeness, and T-betweenness) as benchmarks. Numerical experiments show that QIGTD outperforms these methods in both precision and mean average precision (mAP). Notably, of the top 50 genes, 29 have been verified as highly related to LUAD according to the DisGeNET database, and 36 are significantly enriched in LUAD-related Gene Ontology (GO) terms, including nuclear division, mitotic nuclear division, chromosome segregation, organelle fission, and mitotic sister chromatid segregation.
In conclusion, QIGTD effectively captures the temporal changes in gene networks and identifies critical genes. It provides a valuable tool for studying temporal dynamics in biological networks and can aid in understanding the underlying mechanisms of diseases such as LUAD.
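To make the idea of describing a time series network with a third-order tensor more concrete, the following is a minimal sketch and not the QIGTD algorithm itself. The abstract does not specify the decomposition, so the sketch assumes a generic rank-1 CP (CANDECOMP/PARAFAC) approximation computed by alternating power iteration, stacks the snapshots directly without the inter-temporal coupling the paper describes, and reads the gene-mode factor as an importance score. The names `rank1_cp`, `adjacency`, and the toy four-gene network are hypothetical and chosen only for illustration.

```python
import math


def normalize(vec):
    """Return (unit-length copy of vec, original Euclidean norm)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("zero vector: the tensor needs at least one edge")
    return [x / norm for x in vec], norm


def rank1_cp(tensor, n_iter=100):
    """Rank-1 CP approximation X[t][i][j] ~ lam * c[t] * a[i] * b[j].

    Assumption: a generic alternating (higher-order power) iteration,
    used here as a stand-in for the paper's own decomposition.
    """
    n_time, n_genes = len(tensor), len(tensor[0])
    a = normalize([1.0] * n_genes)[0]
    b = normalize([1.0] * n_genes)[0]
    c = normalize([1.0] * n_time)[0]
    lam = 0.0
    for _ in range(n_iter):
        # Update each factor in turn while holding the other two fixed.
        a = normalize([
            sum(tensor[t][i][j] * b[j] * c[t]
                for t in range(n_time) for j in range(n_genes))
            for i in range(n_genes)
        ])[0]
        b = normalize([
            sum(tensor[t][i][j] * a[i] * c[t]
                for t in range(n_time) for i in range(n_genes))
            for j in range(n_genes)
        ])[0]
        c, lam = normalize([
            sum(tensor[t][i][j] * a[i] * b[j]
                for i in range(n_genes) for j in range(n_genes))
            for t in range(n_time)
        ])
    return lam, a, b, c


def adjacency(n, edges):
    """Build a symmetric, unweighted n x n adjacency matrix (hypothetical helper)."""
    adj = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return adj


# Toy time series network: 4 genes observed at 3 stages (illustrative data only).
snapshots = [
    adjacency(4, [(0, 1), (0, 2)]),
    adjacency(4, [(0, 1), (0, 2), (1, 2), (0, 3)]),
    adjacency(4, [(0, 1), (0, 2), (1, 2), (0, 3)]),
]

# The list of snapshots already is the tensor X[t][i][j].
lam, gene_scores, _, temporal = rank1_cp(snapshots)

# Rank genes by their weight in the gene-mode factor (an assumed scoring rule).
ranking = sorted(range(len(gene_scores)), key=lambda i: -gene_scores[i])
print("gene ranking:", ranking)
print("temporal weights:", temporal)
```

In this toy input, gene 0 is connected to every other gene in the later stages, so it receives the largest gene-mode weight, while the temporal factor gives less weight to the sparser first snapshot. A practical implementation would more likely rely on a tensor library and a higher-rank decomposition; the pure-Python loops here are only meant to keep the sketch self-contained.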