Proteome-wide copy-number estimation from transcriptomics
Impact factor: 7.7
Journal ranking: Biology, Tier 1 (Top) / Biochemistry and Molecular Biology, Tier 1
Publication date: Nov 2024
Authors:
Andrew J Sweatt, Cameron D Griffiths, Sarah M Groves, B Bishal Paudel, Lixin Wang, David F Kashatus, Kevin A Janes
DOI:
10.1038/s44320-024-00064-3
Abstract
Protein copy numbers constrain systems-level properties of regulatory networks, but proportional proteomic data remain scarce compared to RNA-seq. We related mRNA to protein statistically using best-available data from quantitative proteomics and transcriptomics for 4366 genes in 369 cell lines. The approach starts with a protein's median copy number and hierarchically appends mRNA-protein and mRNA-mRNA dependencies to define an optimal gene-specific model linking mRNAs to protein. For dozens of cell lines and primary samples, these protein inferences from mRNA outmatch stringent null models, a count-based protein-abundance repository, empirical mRNA-to-protein ratios, and a proteogenomic DREAM challenge winner. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein complexes, suggesting mechanistic relationships. We use the method to identify a viral-receptor abundance threshold for coxsackievirus B3 susceptibility from 1489 systems-biology infection models parameterized by protein inference. When applied to 796 RNA-seq profiles of breast cancer, inferred copy-number estimates collectively re-classify 26-29% of luminal tumors. By adopting a gene-centered perspective of mRNA-protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility of contemporary proteomics.
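The hierarchical idea in the abstract (start from a protein's median copy number, then append mRNA-protein and mRNA-mRNA dependencies, and keep the best gene-specific model) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of ordinary least squares, the BIC as the selection criterion, the choice of a single additional mRNA, and all function and variable names (`select_gene_model`, `own_mrna`, `other_mrnas`) are assumptions made for illustration only.

```python
import numpy as np


def bic(residuals, n_params):
    """Bayesian information criterion for a Gaussian error model (lower is better)."""
    n = residuals.size
    rss = max(float(np.sum(residuals ** 2)), 1e-12)  # guard against log(0)
    return n * np.log(rss / n) + n_params * np.log(n)


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef


def select_gene_model(protein, own_mrna, other_mrnas):
    """Score three nested tiers for one gene and return the lowest-BIC tier.

    protein:     (n_samples,) log protein copy numbers across cell lines
    own_mrna:    (n_samples,) log mRNA abundance of the same gene
    other_mrnas: (n_samples, n_genes) log mRNA abundance of other genes
    """
    scores = {}

    # Tier 0: the protein's median copy number, ignoring mRNA entirely.
    scores["median"] = bic(protein - np.median(protein), 1)

    # Tier 1: append a dependency on the gene's own mRNA.
    _, res1 = fit_ols(own_mrna[:, None], protein)
    scores["own_mrna"] = bic(res1, 2)

    # Tier 2: append the one other mRNA that best tracks the tier-1 residuals.
    if other_mrnas.shape[1] > 0:
        corr = np.array([
            abs(np.corrcoef(other_mrnas[:, j], res1)[0, 1])
            for j in range(other_mrnas.shape[1])
        ])
        j_best = int(np.argmax(np.nan_to_num(corr)))
        X2 = np.column_stack([own_mrna, other_mrnas[:, j_best]])
        _, res2 = fit_ols(X2, protein)
        scores["own_plus_other_mrna"] = bic(res2, 3)

    best = min(scores, key=scores.get)
    return best, scores
```

Calling `select_gene_model(protein, own_mrna, other_mrnas)` once per gene returns the name of the selected tier together with the BIC of every tier, so a gene whose protein level does not track any mRNA can fall back to its median copy number.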