Machine learning enables pan-cancer identification of mutational hotspots at persistent CTCF binding sites.
Published: 2024 Jul 02
Authors:
Wenhan Chen, Yi C Zeng, Joanna Achinger-Kawecka, Elyssa Campbell, Alicia K Jones, Alastair G Stewart, Amanda Khoury, Susan J Clark
Source:
Epigenetics & Chromatin
Abstract:
CCCTC-binding factor (CTCF) is an insulator protein that binds to a highly conserved DNA motif and facilitates regulation of three-dimensional (3D) nuclear architecture and transcription. CTCF binding sites (CTCF-BSs) reside in non-coding DNA and are frequently mutated in cancer. Our previous study identified a small subclass of CTCF-BSs that are resistant to CTCF knockdown, termed persistent CTCF binding sites (P-CTCF-BSs). P-CTCF-BSs show high binding conservation and potentially regulate cell-type constitutive 3D chromatin architecture. Here, using ICGC sequencing data, we made the striking observation that P-CTCF-BSs display a highly elevated mutation rate in breast and prostate cancer when compared to all CTCF-BSs. To address whether P-CTCF-BS mutations are also enriched in other cell types, we developed CTCF-INSITE, a tool utilising machine learning to predict persistence based on genetic and epigenetic features of experimentally determined P-CTCF-BSs. Notably, predicted P-CTCF-BSs also show a significantly elevated mutational burden in all 12 cancer types tested. Enrichment was even stronger for P-CTCF-BS mutations with predicted functional impact on CTCF binding and chromatin looping. Using in vitro binding assays, we validated that P-CTCF-BS cancer mutations predicted to be disruptive indeed reduced CTCF binding. Together, this study reveals a new subclass of cancer-specific CTCF-BS DNA mutations and provides insights into their importance in genome organization in a pan-cancer setting. © The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.
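The comparison at the heart of the abstract, a mutation rate at P-CTCF-BSs measured against the rate at all CTCF-BSs, can be illustrated with a minimal sketch. This is not the authors' pipeline and not part of CTCF-INSITE: the `SiteSet` class, the `fold_enrichment` helper, and every count below are hypothetical, chosen only to show how a length-normalised rate and a fold enrichment might be computed.

```python
from dataclasses import dataclass


@dataclass
class SiteSet:
    """Hypothetical container for one class of binding sites."""
    name: str
    n_mutations: int  # somatic mutations overlapping the sites
    total_bp: int     # summed length of the sites, in base pairs

    def rate_per_kb(self) -> float:
        """Mutations per kilobase of binding-site sequence."""
        if self.total_bp <= 0:
            raise ValueError("total_bp must be positive")
        return 1000 * self.n_mutations / self.total_bp


def fold_enrichment(subset: SiteSet, background: SiteSet) -> float:
    """Ratio of the subset's mutation rate to the background rate."""
    return subset.rate_per_kb() / background.rate_per_kb()


# Toy counts, invented for illustration only (not taken from the study).
persistent = SiteSet("P-CTCF-BS", n_mutations=30, total_bp=20_000)
all_sites = SiteSet("all CTCF-BS", n_mutations=500, total_bp=1_000_000)

print(f"{persistent.name}: {persistent.rate_per_kb():.2f} mutations/kb")
print(f"{all_sites.name}: {all_sites.rate_per_kb():.2f} mutations/kb")
print(f"fold enrichment: {fold_enrichment(persistent, all_sites):.1f}x")
```

Normalising by total site length matters because the persistent subclass is small: raw mutation counts would always favour the larger set of all CTCF-BSs. A real analysis would also need a significance test and a matched background, which this sketch omits.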