Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

ML-GAP: machine learning-enhanced genomic analysis pipeline using autoencoders and data augmentation.

Published: 2024
Authors: Melih Agraz, Dincer Goksuluk, Peng Zhang, Bum-Rak Choi, Richard T Clements, Gaurav Choudhary, George Em Karniadakis
Source: Frontiers in Genetics

Abstract:

The advent of RNA sequencing (RNA-Seq) has significantly advanced our understanding of the transcriptomic landscape, revealing intricate gene expression patterns across biological states and conditions. However, the complexity and volume of RNA-Seq data make it challenging to identify differentially expressed genes (DEGs), which are critical for understanding the molecular basis of diseases such as cancer. We introduce a novel Machine Learning-Enhanced Genomic Data Analysis Pipeline (ML-GAP) that incorporates autoencoders and innovative data augmentation strategies, notably the MixUp method, to overcome these challenges. By creating synthetic training examples through linear combinations of input pairs and their labels, MixUp significantly enhances the model's ability to generalize from the training data to unseen examples. Our results demonstrate ML-GAP's superiority in accuracy, efficiency, and insight, and credit the MixUp method with a substantial contribution to the pipeline's effectiveness, greatly advancing genomic data analysis and setting a new standard in the field. This suggests that ML-GAP not only has the potential to detect DEGs more accurately but also opens new avenues for therapeutic intervention and research. By integrating explainable artificial intelligence (XAI) techniques, ML-GAP ensures a transparent and interpretable analysis, highlighting the significance of the identified genetic markers.

Copyright © 2024 Agraz, Goksuluk, Zhang, Choi, Clements, Choudhary and Karniadakis.
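To make the MixUp idea concrete, here is a minimal sketch of the general technique applied to an expression matrix, not the ML-GAP authors' code. It follows the standard MixUp formulation: draw one coefficient λ ~ Beta(α, α) per batch, pair each sample with a randomly permuted partner, and form x̃ = λ·x_i + (1 − λ)·x_j and ỹ = λ·y_i + (1 − λ)·y_j. The function name `mixup`, the default `alpha=0.2`, the use of one-hot labels, and applying MixUp directly to log-transformed counts are all illustrative assumptions; the abstract does not state ML-GAP's α, pairing scheme, or whether mixing happens before or after the autoencoder.

```python
import numpy as np


def mixup(X, Y, alpha=0.2, rng=None):
    """Return a MixUp-augmented copy of a batch (illustrative sketch).

    X: (n_samples, n_genes) expression matrix, e.g. log1p-transformed counts.
    Y: (n_samples, n_classes) one-hot (or soft) label matrix.
    alpha: Beta(alpha, alpha) concentration; 0.2 is an assumed default.
    Returns (X_mix, Y_mix, lam).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Single mixing coefficient for the whole batch, as in the original MixUp.
    lam = rng.beta(alpha, alpha)
    # Pair every sample with a randomly chosen partner from the same batch.
    perm = rng.permutation(X.shape[0])
    # Linear combination of the input pairs ...
    X_mix = lam * X + (1.0 - lam) * X[perm]
    # ... and of their labels, using the same coefficient.
    Y_mix = lam * Y + (1.0 - lam) * Y[perm]
    return X_mix, Y_mix, lam


# Usage on a small synthetic batch (hypothetical data, 8 samples x 100 genes).
rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(lam=5.0, size=(8, 100)).astype(float))
Y = np.eye(2)[np.array([0, 1] * 4)]  # two hypothetical conditions, one-hot
X_aug, Y_aug, lam = mixup(X, Y, alpha=0.2, rng=rng)
print(X_aug.shape, Y_aug.shape)  # (8, 100) (8, 2)
```

Because both the inputs and the labels are mixed with the same λ, each synthetic label row still sums to 1, so the augmented batch can be fed to any classifier trained with a soft-label loss such as cross-entropy. Small α values concentrate λ near 0 or 1, keeping most synthetic samples close to real ones.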