CoRAL accurately resolves extrachromosomal DNA genome structures with long-read sequencing.
Published: 2024 Jul 09
Authors:
Kaiyuan Zhu, Matthew Gregory Jones, Jens Luebeck, Xinxin Bu, Hyerim Yi, King L Huang, Ivy Tsz-Lo Wong, Shu Zhang, Paul S Mischel, Howard Chang, Vineet Bafna
Source:
GENOME RESEARCH
Abstract:
Extrachromosomal DNA (ecDNA) is a central mechanism for focal oncogene amplification in cancer, occurring in approximately 15% of early-stage cancers and 30% of late-stage cancers. EcDNAs drive tumor formation, evolution, and drug resistance by dynamically modulating oncogene copy number and rewiring gene-regulatory networks. Elucidating the genomic architecture of ecDNA amplifications is critical for understanding tumor pathology and developing more effective therapies. Paired-end short-read (Illumina) sequencing and mapping have been used to represent ecDNA amplifications as a breakpoint graph, in which the inferred architecture of an ecDNA is encoded as a cycle. Traversals of the breakpoint graph have been used to successfully predict ecDNA presence in cancer samples. However, short-read technologies are intrinsically limited in identifying breakpoints, phasing complex rearrangements and internal duplications, and deconvolving cell-to-cell heterogeneity of ecDNA structures. Long-read technologies, such as those from Oxford Nanopore Technologies, have the potential to improve inference because longer reads are better at mapping structural variants and are more likely to span rearranged or duplicated regions. Here, we propose CoRAL (Complete Reconstruction of Amplifications with Long reads) for reconstructing ecDNA architectures from long-read data. CoRAL reconstructs likely cyclic architectures using quadratic programming that simultaneously optimizes parsimony of reconstruction, explained copy number, and consistency of long-read mapping. CoRAL substantially improves reconstructions in extensive simulations and in 10 datasets from previously characterized cell lines compared with previous short-read and long-read tools. As long-read usage becomes widespread, we anticipate that CoRAL will be a valuable tool for profiling the landscape and evolution of focal amplifications in tumors. Published by Cold Spring Harbor Laboratory Press.
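
The abstract describes a breakpoint graph in which an ecDNA architecture is encoded as a cycle. The following is a minimal illustrative sketch of that idea only; it is not CoRAL's method, which uses quadratic programming and also accounts for copy number and long-read support. The names `Node`, `other_end`, and `find_cycle`, and the toy segments A, B, and C, are hypothetical. Each segment has a left end ("-") and a right end ("+"), junctions connect segment ends, and a simple depth-first search looks for a walk that visits each segment at most once and returns to its start.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy model: a node is one end of a genomic segment.
Node = Tuple[str, str]  # (segment name, end): "-" = left end, "+" = right end


def other_end(node: Node) -> Node:
    """Return the opposite end of the same segment."""
    seg, end = node
    return (seg, "+" if end == "-" else "-")


def find_cycle(junctions: List[Tuple[Node, Node]], start: str) -> Optional[List[str]]:
    """Find one simple cycle that starts by traversing `start` left to right.

    Returns oriented segments such as ["A+", "B+", "C-"], where "-" means the
    segment is traversed right to left, or None if no such cycle exists.
    """
    # Undirected adjacency between segment ends joined by a junction.
    adj: Dict[Node, List[Node]] = {}
    for u, v in junctions:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    entry: Node = (start, "-")

    def dfs(node_in: Node, path: List[str], seen: Set[str]) -> Optional[List[str]]:
        seg, end = node_in
        # Entering at the left end means a forward ("+") traversal.
        path = path + [seg + ("+" if end == "-" else "-")]
        exit_node = other_end(node_in)
        for nxt in adj.get(exit_node, []):
            if nxt == entry:  # the walk closes back onto its starting end
                return path
            if nxt[0] not in seen:
                found = dfs(nxt, path, seen | {nxt[0]})
                if found is not None:
                    return found
        return None

    return dfs(entry, [], {start})


if __name__ == "__main__":
    # A joins B head to tail, B joins C with an inversion, C closes back to A.
    toy_junctions = [
        (("A", "+"), ("B", "-")),
        (("B", "+"), ("C", "+")),
        (("C", "-"), ("A", "-")),
    ]
    print(find_cycle(toy_junctions, "A"))  # prints ['A+', 'B+', 'C-']
```

In this sketch every segment appears at most once per cycle, so it cannot represent the internal duplications mentioned in the abstract; handling repeated segments and multiple coexisting structures is exactly where an optimization-based formulation becomes necessary.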