Readon: a novel algorithm to identify read-through transcripts with long-read sequencing data.
Publication date: 2024 May 28
Authors:
Siang Chen, Hao Wang, Dongdong Zhang, Runsheng Chen, Jianjun Luo
Source:
BIOINFORMATICS
Abstract:
The human genome contains many clustered, transcriptionally active regions in which the transcription complex does not terminate transcription immediately at the termination site of the upstream gene. Instead, it continues through the intergenic region and into the downstream gene, producing read-through transcripts. Several studies have demonstrated regulatory roles of read-through transcripts in tumorigenesis and tumor progression. However, because next-generation sequencing reads are short, progress in discovering read-through transcripts has been slow. To work with third-generation sequencing data, whose reads are long but error-prone, this study developed Readon, a novel minimizer-sketch algorithm that identifies read-through transcripts accurately and quickly.
Readon first splits the reference sequence into distinct active regions. Within each region, it slides a window along the sequence, computes minimizers, and builds specialized structured arrays that serve as the query index. After candidate read-through transcripts pass an initial screen based on alignment anchors, further confirmation steps are applied. Comparative assessments against existing software show that Readon performs better on both simulated data and validated real data. Two downstream tools are also provided: one predicts whether a read-through transcript is likely to undergo nonsense-mediated decay or to encode a protein, and the other visualizes splicing patterns.
Readon is freely available on GitHub (https://github.com/Bulabula45/Readon). Supplementary data are available at Bioinformatics online.
© The Author(s) 2024. Published by Oxford University Press.
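The abstract describes Readon's indexing only at a high level. The Python sketch below illustrates the general minimizer-sketch idea it builds on. It is not Readon's implementation. Several parts are assumptions made for illustration:
- The dict-based index stands in for Readon's "specialized structured arrays".
- The BLAKE2b k-mer hash, the parameters k and w, and the `min_anchors` threshold are hypothetical.
- The rule "anchors in two or more genes of one active region" is a simplified stand-in for Readon's anchor screening.
- Sequences are assumed to be uppercase ACGT on the forward strand only.

```python
import hashlib
import random
from collections import Counter, defaultdict


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit k-mer hash (hypothetical choice; built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")


def minimizers(seq: str, k: int, w: int) -> list[tuple[int, int]]:
    """(hash, position) of the smallest-hash k-mer in every window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    result: list[tuple[int, int]] = []
    last = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        pos = start + min(range(w), key=window.__getitem__)  # leftmost minimum breaks ties
        if pos != last:  # neighbouring windows often share a minimizer; record it once
            result.append((hashes[pos], pos))
            last = pos
    return result


def build_index(genes: dict[str, str], k: int, w: int) -> dict[int, list[tuple[str, int]]]:
    """Map each minimizer hash to the (gene_id, position) pairs where it occurs."""
    index: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for gene_id, seq in genes.items():
        for h, pos in minimizers(seq, k, w):
            index[h].append((gene_id, pos))
    return index


def candidate_genes(read: str, index: dict, k: int, w: int, min_anchors: int = 3) -> list[str]:
    """Genes hit by at least min_anchors anchor matches from the read (hypothetical rule)."""
    hits: Counter = Counter()
    for h, _ in minimizers(read, k, w):
        for gene_id, _ in index.get(h, []):  # .get avoids inserting keys into the defaultdict
            hits[gene_id] += 1
    return sorted(g for g, n in hits.items() if n >= min_anchors)


def is_read_through_candidate(read: str, index: dict, k: int, w: int, min_anchors: int = 3) -> bool:
    """Flag a read whose anchors land in two or more genes of the same active region."""
    return len(candidate_genes(read, index, k, w, min_anchors)) >= 2


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical adjacent genes in one active region (random sequence, not real data).
    genes = {g: "".join(rng.choice("ACGT") for _ in range(500)) for g in ("geneA", "geneB")}
    idx = build_index(genes, k=15, w=10)
    junction_read = genes["geneA"][300:] + genes["geneB"][:200]  # spans the geneA-geneB junction
    single_read = genes["geneA"][50:350]  # stays inside geneA
    print(is_read_through_candidate(junction_read, idx, 15, 10))
    print(is_read_through_candidate(single_read, idx, 15, 10))
```

Minimizers keep only about 2/(w+1) of all k-mers, which makes the index small. Any two sequences that share an exact run of w + k - 1 bases are still guaranteed to share a minimizer, which is why this kind of sketch can tolerate scattered sequencing errors between anchors. A real pipeline would also need canonical (strand-independent) k-mers, error-aware chaining of anchors, and the confirmation steps the abstract mentions.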