
Transcriptional features at breakpoint regions

The use of next-generation sequencing technologies has markedly increased our knowledge of genome transcription. Transcriptional features, such as R-loops, have been linked to genomic instability (Sollier et al., 2014; Hatchi et al., 2015).

On the other hand, secondary alterations in E/R leukemia have been suggested to arise predominantly from RAG-mediated off-target activity at sites other than the immunoglobulin genes (Papaemmanuil et al., 2014). RAG1 and RAG2 are normally expressed during B-cell development for recombination of the immunoglobulin gene regions. They are also expressed in leukemic cells, especially in the E/R subtype.

We investigated the structural variation sites in ETV6-RUNX1-positive preB-ALL by analyzing transcriptional features (from GRO-seq data) at breakpoint regions (from WGS data). Transcription start sites (TSSs) in precursor B-cells have been considered susceptible to double-strand breaks, mediated by H3K4me3, which is recognized by the RAG2 enzyme (Matthews et al., 2007; Teng et al., 2015). However, we noticed that the breakpoints often resided a few kilobases from the TSS. Our analysis showed that the breakpoints overlap with RNA pol II stalling and convergent transcription, especially in the case of recurrent breakpoint sites. We showed these signals at the recurrent loci PAX5, BTG1, CDKN2A/B, and RAG1/2, all of which are often hit by deletions in preB-ALL (Mullighan et al., 2007; Papaemmanuil et al., 2014).
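The kind of overlap analysis described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the single-chromosome simplification, the uniform random null model, and the toy coordinates are all assumptions made for illustration. It measures the fraction of breakpoints that fall inside feature intervals (for example, stalling or convergent transcription regions) and compares it with randomly placed positions.

```python
import random
from bisect import bisect_right


def overlap_fraction(breakpoints, intervals, flank=0):
    """Fraction of breakpoint positions inside any half-open [start, end)
    interval, after padding each interval by `flank` bases on both sides."""
    padded = sorted((max(0, s - flank), e + flank) for s, e in intervals)
    merged = []  # merge overlapping intervals so each position is counted once
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    starts = [s for s, _ in merged]
    hits = 0
    for pos in breakpoints:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(breakpoints) if breakpoints else 0.0


def permutation_enrichment(breakpoints, intervals, chrom_length, n_perm=1000, seed=0):
    """Observed overlap fraction and an empirical p-value against breakpoints
    placed uniformly at random on one chromosome (a deliberately simple null)."""
    rng = random.Random(seed)
    observed = overlap_fraction(breakpoints, intervals)
    exceed = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(chrom_length) for _ in breakpoints]
        if overlap_fraction(shuffled, intervals) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: three breakpoints and one feature interval.
toy_breakpoints = [150, 160, 170]
toy_features = [(100, 200)]
observed, p_value = permutation_enrichment(toy_breakpoints, toy_features, 1_000_000)
print(observed, p_value)
```

A uniform null ignores mappability, gene density, and chromatin state, so a real analysis would likely need a matched background; the sketch only shows the shape of the comparison.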

Some of the regions may represent novel enhancer RNAs or lncRNAs. We also showed genome-wide associations between elevated R-loop signal, convergent transcription, and RNA pol II stalling in normal and leukemic human cells. Although all of these features are part of the normal transcriptional process, genomic loci with recurrent breakpoints were particularly enriched for them. RNA pol II stalling was also associated with H3K4me3. Wide H3K4me3 regions have been suggested to ensure transcriptional consistency of genes that are key to cell identity and function (Benayoun et al., 2014).

We classified the structural variation breakpoints into two groups based on the presence of an RSS motif. Both RSS and non-RSS breakpoint sites were found to be enriched at stalled RNA pol II and convergent transcription sites. No clear difference in associations with transcriptional features was seen between the two breakpoint types. However, the breakpoints with an RSS motif had an especially high concurrence of the features, indicating that there might be a higher demand for the exposure of the exact RSS motif to the RAG enzymes. This is supported by the notion that RAG-mediated cleavage occurs on unpaired and unwound DNA (Akamatsu and Oettinger, 1998). In contrast, non-RSS breakpoints could develop during transcription elongation and convergent transcription by mechanisms dependent on the exposure of the region in general.
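As an illustration of what classifying a breakpoint by RSS motif presence can look like, the following sketch scans a flanking sequence for an RSS-like element. It is an assumption-laden example rather than the method used here: it relies on the canonical heptamer (CACAGTG) and nonamer (ACAAAAACC) consensus sequences separated by a 12 or 23 bp spacer, and the function name `has_rss_like` and the mismatch threshold are hypothetical choices.

```python
HEPTAMER = "CACAGTG"    # canonical RSS heptamer consensus
NONAMER = "ACAAAAACC"   # canonical RSS nonamer consensus
SPACERS = (12, 23)      # spacer lengths between heptamer and nonamer

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def has_rss_like(seq, max_mismatch=2):
    """True if either strand contains heptamer + spacer + nonamer with at most
    `max_mismatch` total mismatches to the consensus (threshold is an assumption)."""
    seq = seq.upper()
    for strand in (seq, revcomp(seq)):
        for spacer in SPACERS:
            total = len(HEPTAMER) + spacer + len(NONAMER)
            for i in range(len(strand) - total + 1):
                hept = strand[i:i + len(HEPTAMER)]
                non = strand[i + len(HEPTAMER) + spacer:i + total]
                if mismatches(hept, HEPTAMER) + mismatches(non, NONAMER) <= max_mismatch:
                    return True
    return False


# Hypothetical flanking sequences around two breakpoints.
flank_with_rss = "GG" + HEPTAMER + "T" * 12 + NONAMER + "GG"
flank_without_rss = "A" * 60
print(has_rss_like(flank_with_rss), has_rss_like(flank_without_rss))
```

Cryptic RSSs at off-target sites often deviate from the consensus, so the mismatch tolerance strongly affects which breakpoints end up in each group.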

R-loops are expected to be formed in regions where RNA pol II is stalling (Skourti-Stathaki and Proudfoot, 2014; Jenjaroenpun et al., 2015). In addition to R-loops, convergent transcription has been hypothesized to cause pauses in transcription due to collisions between the crossing polymerases (Prescott and Proudfoot, 2002). Although we only saw a slight enrichment of breakpoints at R-loop forming sequence (RLFS) sites, we showed that these sites overlap with RNA pol II stalling events, and that the DRIP-seq signal level was higher at RLFS-positive sites than at RLFS-negative sites in ES cells. At the time, DRIP-seq data were not available for B-lineage cells, so we used RLFS sequence prediction instead (Jenjaroenpun et al., 2015). The modest enrichment of breakpoints at RLFS sites may be influenced by the large proportion of the genome, especially of TSSs, that is prone to R-loops as predicted by the program (15646 TSSs with an RLFS motif, 8220 without) and reported by others (Ginno et al., 2013; Lim et al., 2015; Sanz et al., 2016).
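The TSS counts quoted above make the size of this background concrete. The short calculation below uses only those two numbers; the fold-enrichment ceiling is an illustrative assumption that applies if breakpoints are compared against TSSs as the background set.

```python
tss_with_rlfs = 15646      # TSSs with a predicted RLFS motif (from the text)
tss_without_rlfs = 8220    # TSSs without a predicted RLFS motif (from the text)

# Share of all TSSs that carry a predicted RLFS motif.
fraction = tss_with_rlfs / (tss_with_rlfs + tss_without_rlfs)

# If every breakpoint-associated TSS were RLFS-positive, the fold enrichment
# over this background could be at most 1 / fraction.
max_fold = 1 / fraction

print(f"{fraction:.1%}")    # 65.6%
print(f"{max_fold:.2f}")    # 1.53
```

With roughly two thirds of TSSs already RLFS-positive, even a perfect association could yield only about a 1.5-fold enrichment at TSSs, which is consistent with the modest enrichment observed.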

Additional mechanisms seem to be needed for breakage to occur. For example, factors affecting the stability of DNA:RNA hybrids could play a role. One of these factors is topoisomerase, which functions in rewinding the strands behind RNA pol II and thereby inhibits R-loop formation (Atkin, Raimer and Wang, 2019). Defects in RNA processing after transcription could slow down the rewinding of DNA at R-loop-prone sites and thereby expose the DNA to breaks. Indeed, mutations in genes affecting RNA splicing, which result in RNA pol II accumulation at certain mis-spliced loci, have recently been implicated in leukemogenesis in AML (Yoshimi et al., 2019).

We used data from embryonic stem (ES) cells to show a general overlap of DNA:RNA hybrids with RLFS motifs and with transcriptional features. We also showed overlap between stalling and convergent transcription using ES cell data, based on the idea that general properties of transcription do not depend on cell type. When we analyzed the overlap between the breakpoints in E/R leukemic cells and the transcriptional features in stem cells, we did not see clear enrichments. This reflects the cell-type-specific expression of the genome.

Our analysis comprised data from B-lymphoid cell lines and preB-ALL leukemic cells (primary cells and cell lines). Nascent RNA signals were used to estimate the transcription occurring at the breakpoint sites before the breakage. Our study shows associations between transcriptional features, but the putative causes behind the associations and their regulation were not explored. In addition to ETV6-RUNX1, we also studied breakpoints resolved from other preB-ALL subtypes.

Although only 7% of the breakpoints in KMT2A-rearranged cases had RSS motif evidence (Andersson et al., 2015), we saw a similar enrichment of convergent transcription and RNA pol II stalling as in the case of ETV6-RUNX1. Similar transcriptional features were also seen for the breakpoints studied from the hyperdiploid and hypodiploid cases. This indicates that comparable transcriptional vulnerability may underlie the generation of the secondary alterations in these subtypes. The analysis could be extended in the future by using a model in which the GRO-seq signal is produced from preleukemic precursor B-cells and the structural variation sites are later resolved from the established leukemic cells. Additional breakpoint data from more patients would also give more certainty about the recurrence of breaks inside specified regions.

Our study provides evidence on how genomic regions, especially those with a nucleotide composition resembling RSS motifs, are further selected for the breakages recurrently seen at B-cell-specific genomic sites. Recurrence is often used to define “driver genes”, which are proposed to be particularly essential for cancer progression. However, given the emerging evidence on transcriptional and epigenomic features, at least some of the recurrently altered genes could instead simply be more prone to double-strand breaks in that specific cell type and state, and not all of them necessarily act as active players in leukemogenesis.