Research Updates

ctGAN: combined transformation of gene expression and survival data with generative adversarial network

Published: 2024 May 23
Authors: Jaeyoon Kim, Junhee Seok
Source: Briefings in Bioinformatics

Abstract:

Recent studies have extensively used deep learning algorithms to analyze gene expression to predict disease diagnosis, treatment effectiveness, and survival outcomes. Survival analysis studies on diseases with high mortality rates, such as cancer, are indispensable. However, deep learning models are plagued by overfitting owing to the limited sample size relative to the large number of genes. Consequently, the latest style-transfer deep generative models have been implemented to generate gene expression data. However, these models are limited in their applicability for clinical purposes because they generate only transcriptomic data. Therefore, this study proposes ctGAN, which enables the combined transformation of gene expression and survival data using a generative adversarial network (GAN). ctGAN improves survival analysis by augmenting data through style transformations between breast cancer and 11 other cancer types. We evaluated the concordance index (C-index) enhancements compared with previous models to demonstrate its superiority. Performance improvements were observed in nine of the 11 cancer types. Moreover, ctGAN outperformed previous models in seven of the 11 cancer types, with colon adenocarcinoma (COAD) exhibiting the largest improvement (median C-index increase of ~15.70%). Furthermore, integrating the generated COAD enhanced the log-rank p-value (0.041) compared with using only the real COAD (p-value = 0.797). Based on the data distribution, we demonstrated that the model generated highly plausible data. In clustering evaluation, ctGAN exhibited the highest performance in most cases (89.62%). These findings suggest that ctGAN can be meaningfully utilized to predict disease progression and select personalized treatments in the medical field.

© The Author(s) 2024. Published by Oxford University Press.
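For readers unfamiliar with the concordance index used as the evaluation metric above, the following is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' evaluation code: the function name `concordance_index`, the toy data, and the simplified handling of tied survival times (tied pairs are skipped) are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk
    Simplification (assumption): pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed first: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the second one censored.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.1, 0.6]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the improvements reported in the abstract reflect how much more often the survival model orders patient pairs correctly after augmentation.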