Frontier Briefing
Focused on the latest research in tumors and tumor organoids, so you can follow developments first-hand.

Proteome-wide copy-number estimation from transcriptomics

Impact factor: 7.7
Journal tier: Biology, Tier 1 (Top) / Biochemistry and Molecular Biology, Tier 1
Published: Nov 2024
Authors: Andrew J Sweatt, Cameron D Griffiths, Sarah M Groves, B Bishal Paudel, Lixin Wang, David F Kashatus, Kevin A Janes

Abstract

Protein copy numbers constrain systems-level properties of regulatory networks, but proportional proteomic data remain scarce compared to RNA-seq. We related mRNA to protein statistically using best-available data from quantitative proteomics and transcriptomics for 4366 genes in 369 cell lines. The approach starts with a protein's median copy number and hierarchically appends mRNA-protein and mRNA-mRNA dependencies to define an optimal gene-specific model linking mRNAs to protein. For dozens of cell lines and primary samples, these protein inferences from mRNA outmatch stringent null models, a count-based protein-abundance repository, empirical mRNA-to-protein ratios, and a proteogenomic DREAM challenge winner. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein complexes, suggesting mechanistic relationships. We use the method to identify a viral-receptor abundance threshold for coxsackievirus B3 susceptibility from 1489 systems-biology infection models parameterized by protein inference. When applied to 796 RNA-seq profiles of breast cancer, inferred copy-number estimates collectively re-classify 26-29% of luminal tumors. By adopting a gene-centered perspective of mRNA-protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility of contemporary proteomics.
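
The hierarchical idea in the abstract (start from a protein's median copy number, then append mRNA-protein and mRNA-mRNA dependencies only when they help) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the model class or the selection criterion, so the linear models, the BIC-based choice, the synthetic data, and every function name below (`fit_ols`, `bic`, `select_model`) are assumptions made purely for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares; returns coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


def bic(rss, n, k):
    """BIC of a Gaussian model with k fitted parameters (assumed criterion)."""
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    return n * np.log(rss / n) + k * np.log(n)


def select_model(protein, own_mrna, other_mrna):
    """Choose among three nested candidate models for a single gene.

    All inputs are 1-D arrays on a log scale, one value per cell line.
    Candidates (hypothetical stand-ins for the hierarchy in the abstract):
      median              -> protein is its median copy number
      own_mrna            -> adds dependence on the gene's own mRNA
      own_plus_other_mrna -> adds dependence on a second gene's mRNA
    """
    n = protein.size
    ones = np.ones(n)
    designs = {
        "median": None,
        "own_mrna": np.column_stack([ones, own_mrna]),
        "own_plus_other_mrna": np.column_stack([ones, own_mrna, other_mrna]),
    }
    scores = {}
    for name, X in designs.items():
        if X is None:
            resid = protein - np.median(protein)
            rss, k = float(resid @ resid), 1
        else:
            _, rss = fit_ols(X, protein)
            k = X.shape[1]
        scores[name] = bic(rss, n, k)
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic demo data (not from the paper): the protein depends on both mRNAs.
rng = np.random.default_rng(0)
own = rng.normal(size=200)
other = rng.normal(size=200)
protein = 10.0 + 0.8 * own + 0.5 * other + 0.1 * rng.normal(size=200)

best, scores = select_model(protein, own, other)
print(best, scores)
```

In this toy setup each additional term is kept only if it lowers the BIC, which mirrors the notion of a gene-specific model that is no more complex than the data support. A gene whose protein level does not track any mRNA would tend to stay at the median-only model.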