Secondary Structure of Subgenomic RNA M of SARS-CoV-2

SARS-CoV-2 belongs to the Coronaviridae family. Like other coronaviruses, SARS-CoV-2 is enveloped and possesses a positive-sense, single-stranded RNA genome of ~30 kb. Genomic RNA is used as the template for replication and transcription. During these processes, positive-sense genomic RNA (gRNA) and subgenomic RNAs (sgRNAs) are created. Several studies have demonstrated the importance of the genomic RNA secondary structure in SARS-CoV-2 replication. However, the structure of sgRNAs has remained largely unsolved so far. In this study, we probed a model of SARS-CoV-2 sgRNA M in vitro. The model molecule includes the 5′UTR and the coding sequence of gene M. This is the first experimentally informed secondary structure model of sgRNA M, and it presents features likely to be important in sgRNA M function. Knowledge of the sgRNA M structure provides insights into virus biology and could be used for designing new therapeutics.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19) and is responsible for widespread infection and death, concomitant disruptions to health services, travel, trade and education, and a negative impact on people's physical and mental health [1]. SARS-CoV-2 belongs to the Betacoronavirus genus and is a member of the Coronaviridae family, which also includes alpha-, gamma- and deltacoronaviruses [2]. Like other coronaviruses, SARS-CoV-2 is enveloped and possesses three structural proteins: membrane protein (M), spike protein (S) and envelope protein (E), while nucleocapsid protein (N) protects the viral RNA genome by forming a capsid [3]. SARS-CoV-2 also produces sixteen non-structural proteins (nsp1-16) and accessory proteins [4]. SARS-CoV-2 is an RNA virus and possesses a positive-sense, single-stranded RNA genome of ~30 kb [5]. The genomic RNA has a 5′ cap and a 3′ polyA tail and is used for the production of two large overlapping polyproteins (pp): pp1a and pp1ab. Polyproteins pp1a and pp1ab contain the non-structural proteins 1-11 and 1-16, respectively. Many of them, together with the N protein, form the replicase-transcriptase complex (RTC) [6]. Genomic RNA is used as the template for replication and transcription by the RTC. These processes result in the generation of negative-sense RNA intermediates that serve as templates for the production of positive-sense genomic RNA (gRNA) and subgenomic RNAs (sgRNAs). The gRNA is packaged into progeny virions or is used for translation, while sgRNAs encode conserved structural proteins, nucleocapsid proteins and several accessory proteins [6][7][8][9].
In coronaviruses, each sgRNA possesses a short 5′-terminal leader sequence derived from the 5′ end of the genome. Transcription regulatory sequences (TRS) are necessary to add the leader sequence to sgRNAs [10]. TRSs are located at the 3′ end of the leader

The DNA template for the synthesis of sgRNA M was obtained in several steps. Firstly, reverse transcription was carried out using SuperScript III (Thermo Fisher Scientific) with Random Primers Mix (New England BioLabs, Ipswich, MA, USA) on RNA of SARS-CoV-2 from strain Slovakia/SK-BMC5/2020 (received from https://www.european-virus-archive.com, accessed on: 9 July 2020). Next, three PCR reactions using cDNA as template and specific primers (F1 and RM, F2 and RM, F3 and RM, Table 1) were performed to amplify the TRS-M coding sequence and add the leader sequence and transcription promoter on the 5′ end of sgRNA M. After this step, primers FC and RC were used to add an EcoRI site on the 5′ end and a PstI site on the 3′ end of the template of sgRNA M. DNA was purified using the PureLink™ PCR Micro Kit (Thermo Fisher Scientific, Waltham, MA, USA). The DNA template was cloned into pUC19 and sequenced using the M13F and M13R primers to confirm the sequence (Table 1).

RNA Synthesis
The DNA template for in vitro transcription of sgRNA M was obtained by PCR from the modified pUC19 plasmid using primers FM and RM (Table 1). DNA was purified using the PureLink™ PCR Micro Kit (Thermo Fisher Scientific, Waltham, MA, USA). The in vitro transcription reaction was performed using a MEGAscript™ T7 Transcription Kit (Thermo Fisher Scientific) according to the manufacturer's protocol. The RNA product was purified using the RNeasy MinElute Cleanup Kit (Qiagen, Hilden, Germany). The integrity and purity of samples were checked on an agarose gel.

RNA Folding
Before each experiment, RNA was folded in the same manner. RNA was heated to 80 °C in water for 5 min and slowly cooled to 50 °C. At this temperature, folding buffer was added, and samples were slowly cooled to 37 °C. The final concentration of buffer was 300 mM NaCl, 5 mM MgCl2, 50 mM HEPES, pH 7.5. RNA integrity and homogeneity after folding were analyzed by native gel electrophoresis using a 0.8% agarose gel run at 4 °C at low voltage. Under these conditions, one band was observed (Figure S1, Supporting_Information_1).

Chemical Mapping Using NMIA, DMS and CMCT
The folding of RNA was carried out as described above. Next, chemical mapping was conducted according to published procedures with appropriate optimizations [26][27][28]. Briefly, 5.6 mM NMIA, 30 mM CMCT or 0.18% DMS was used in mapping reactions. Chemical mapping was performed at 37 °C with DMS, CMCT or NMIA for 15, 30 or 40 min, respectively. In parallel, control reactions were performed under the same conditions but without mapping reagents. Modified nucleotides were read out by primer extension using a stoichiometry of 2 pmol primer/2 pmol RNA. Primer extension was performed at 55 °C with reverse transcriptase SuperScript III (Thermo Fisher Scientific) using the manufacturer's protocol. Next, cDNA fragments and ddNTP ladders were separated by capillary electrophoresis (Laboratory of Molecular Biology Techniques at Adam Mickiewicz University in Poznan). Primers labeled with 6-FAM were used for the detection of modification by DMS, CMCT or NMIA and for control reactions without mapping reagents. Samples were resolved in two capillaries (reaction and control) with ddNTP ladders. Primers labeled with 5-JOE were used for ddNTP ladders (most often ddATP). The experiments were performed in at least technical triplicate, and the average results are presented. For the reactivity value of each nucleotide, the standard deviation (SD) was calculated (Table S1, Supporting_Information_2).
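The per-nucleotide averaging over technical replicates can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `summarize_replicates`, the use of `None` for a missing measurement and the choice of the sample standard deviation (n − 1 denominator) are assumptions made only for illustration, and the reactivity values are invented.

```python
from statistics import mean, stdev

def summarize_replicates(replicates):
    """Return a (mean, SD) tuple per nucleotide.

    `replicates` is a list of equal-length lists, one per technical replicate.
    None marks a missing measurement (an assumption of this sketch).
    The sample SD (n - 1) is used; the source does not state which SD was applied.
    """
    summary = []
    for values in zip(*replicates):
        observed = [v for v in values if v is not None]
        if len(observed) < 2:
            # SD is undefined for fewer than two observations
            summary.append((None, None))
        else:
            summary.append((mean(observed), stdev(observed)))
    return summary

# Invented reactivities for three nucleotides in three replicates
replicates = [
    [0.10, 0.80, 1.20],
    [0.20, 0.70, 1.00],
    [0.15, 0.90, 1.10],
]
per_nt = summarize_replicates(replicates)
```

Positions measured in fewer than two replicates are returned without a value so that they can be flagged rather than averaged.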

Processing of Chemical Mapping Data
The QuShape program was used to analyze mapping data according to a published method [29]. NMIA reactivities were normalized by the QuShape program using model-free statistics to a scale spanning 0 to ~2, where zero indicates no reactivity and 1.0 is the low average intensity for highly reactive RNA positions [29]. Nucleotides with normalized SHAPE reactivities of 0-0.5, 0.5-0.7 and ≥0.7 correspond to unreactive, moderately reactive and highly reactive positions, respectively. Nucleotides with no data were designated as −999. Normalized SHAPE reactivities from the extension reaction of each primer were processed independently. DMS and CMCT modifications were analyzed in the same way as NMIA reactivities, except that only strong modifications (reactivities ≥ 0.7) were used in the RNAstructure program prediction.
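The reactivity classes defined above can be expressed as a short Python sketch. The function name and the handling of boundary values are assumptions: the text gives the ranges 0-0.5, 0.5-0.7 and ≥0.7 without stating to which class a value of exactly 0.5 belongs, so this sketch assigns it to the moderate class and treats slightly negative normalized values as unreactive.

```python
NO_DATA = -999  # placeholder used in the text for nucleotides without data

def classify_reactivity(value):
    """Map a normalized reactivity to the classes described in the text.

    Assumption of this sketch: 0.5 counts as 'moderate', and values
    below 0 (possible after normalization) count as 'unreactive'.
    """
    if value == NO_DATA:
        return "no data"
    if value >= 0.7:
        return "high"
    if value >= 0.5:
        return "moderate"
    return "unreactive"

# Invented profile: position -> normalized reactivity
profile = {1: 0.05, 2: 0.62, 3: 1.40, 4: NO_DATA}
classes = {pos: classify_reactivity(r) for pos, r in profile.items()}

# Positions that would qualify as strong modifications (reactivity >= 0.7)
strong = [pos for pos, c in classes.items() if c == "high"]
```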
Chemical mapping results were used in the RNAstructure program [30] for the prediction of the secondary structure of sgRNA M. Normalized SHAPE reactivities (as described above) were used in RNAstructure 6.2 through the "Read SHAPE reactivity-pseudo free energy" mode with a slope of 1.8 and an intercept of −0.6 kcal/mol [31]. Strong DMS and CMCT reactivities were introduced in the same prediction using the "chemical modification" mode [32].
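To make the role of the slope and intercept concrete, the sketch below evaluates the log-linear pseudo-free energy term commonly used for SHAPE-directed folding, ΔG_SHAPE = m·ln(reactivity + 1) + b, with the slope m = 1.8 and intercept b = −0.6 kcal/mol stated above. It only illustrates the arithmetic and is not the RNAstructure implementation; the function name, the zero contribution for positions without data and the clamping of negative reactivities to zero are assumptions of this sketch.

```python
import math

def shape_pseudo_energy(reactivity, slope=1.8, intercept=-0.6):
    """Pseudo-free energy (kcal/mol) for one nucleotide in a stacked base pair.

    Log-linear form: slope * ln(reactivity + 1) + intercept.
    Assumptions of this sketch: -999 (no data) contributes nothing, and
    negative reactivities are clamped to 0 before taking the logarithm.
    """
    if reactivity == -999:
        return 0.0
    r = max(reactivity, 0.0)
    return slope * math.log(r + 1.0) + intercept

unreactive = shape_pseudo_energy(0.0)   # bonus for pairing
reactive = shape_pseudo_energy(1.0)     # penalty for pairing
```

With these parameters an unreactive nucleotide receives a small stabilizing bonus for being paired, whereas the term becomes a penalty once the reactivity exceeds roughly 0.4, which is how highly reactive positions are steered toward single-stranded regions.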

Bioinformatic Analysis of Base Pairs Probabilities
The sgRNA M base pair probabilities were obtained using the "Partition Function RNA" mode implemented in the RNAstructure program. SHAPE and chemical mapping results were incorporated as constraints after loading the sequence file in "Partition Function RNA" mode, and a .pfs file was generated. All constraints were obtained as described in the Processing of Chemical Mapping Data section and were the same as those applied for sgRNA M folding. Next, the secondary structure of the sgRNA M model was annotated from the .pfs file using the "Add Probability Color Annotation" mode in the RNAstructure program, version 6.2.

Covariation Analysis
The sequence of the in vitro probed sgRNA M underwent covariation analysis via the cm-builder pipeline; details of this process are available in [19]. Briefly, cm-builder utilizes the programs INFERence of RNA ALignment (INFERNAL) (here, release 1.1.2) [33,34] and R-scape (here, version 1.5.16) [35,36] to align a reference sequence to homologous sequences and then cross-evaluate a structural model for statistically significant covariation that maintains base pairing. The sgRNA M sequences were aligned to a previously generated fasta file [15] of 25,571 Coronaviridae sequences, obtained from the Virus Pathogen Database and Analysis Resource (ViPR database, https://www.viprbrc.org/brc/home.spg?decorator=vipr, accessed on 10 February 2021) [37,38].
Additionally, the sgRNA M sequence was queried against the nucleotide BLAST (Basic Local Alignment Search Tool, https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 10 December 2021) database, which yielded 453 homologous sequences. These sequences were subsequently aligned with the sgRNA M sequence using MAFFT (Multiple Alignment using Fast Fourier Transform) [39]. The resulting alignments were used to calculate the conservation of nucleotides in each base pair of the model structure.
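One way such a per-base-pair conservation value could be computed from an alignment is sketched below in Python. The exact metric used in this study is not specified in the text, so the definition here (percentage of aligned sequences that carry the reference nucleotides at both positions of a pair) is an assumption, as are the function name, the toy sequences and the simplification that the reference contains no gaps, so alignment columns equal reference positions.

```python
def base_pair_conservation(reference, aligned, pairs):
    """Percent of aligned sequences matching the reference at both pair positions.

    reference: reference sequence without gaps (assumed for this sketch)
    aligned:   sequences aligned to the reference, all of the same length
    pairs:     list of (i, j) base pairs, 1-based positions in the reference
    """
    conservation = {}
    for i, j in pairs:
        ref_pair = (reference[i - 1], reference[j - 1])
        matches = sum(
            1 for seq in aligned if (seq[i - 1], seq[j - 1]) == ref_pair
        )
        conservation[(i, j)] = 100.0 * matches / len(aligned)
    return conservation

# Toy hairpin: three base pairs closing a three-nucleotide loop
reference = "GGGAAACCC"
aligned = ["GGGAAACCC", "GAGAAACUC", "GGGAAACCC", "GGGAAACCU"]
pairs = [(1, 9), (2, 8), (3, 7)]
conservation = base_pair_conservation(reference, aligned, pairs)
average = sum(conservation.values()) / len(conservation)
```

Under this definition a compensatory change such as G-C to A-U lowers the conservation of the pair even though pairing is maintained, which is why conservation and covariation are assessed separately.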

Structure Probing of sgRNA M of SARS-CoV-2
The sequence of sgRNA M of SARS-CoV-2 was obtained by adding the leader sequence to the M coding sequence from the SARS-CoV-2 strain Slovakia/SK-BMC5/2020 using PCR reactions. The sgRNA M sequence of our model is identical to that of the SARS-CoV-2 Wuhan-Hu-1 isolate (ID: NC_045512.2). A leader sequence is characteristic of the sgRNAs of the Coronaviridae family, and "leader to body fusion" takes place during discontinuous transcription [12]. Chemical mapping was used to determine the secondary structure of sgRNA M. In vitro transcribed sgRNA M was folded in folding buffer (300 mM NaCl, 5 mM MgCl2, 50 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), pH 7.5) to obtain a single RNA conformation, as assessed on a non-denaturing agarose gel (Figure S1, Supporting_Information_1). Chemical mapping was performed at 37 °C with DMS (methylates N1 of A and N3 of C when unpaired), CMCT (modifies N3 of U and N1 of G when unpaired) and the SHAPE reagent N-methylisatoic anhydride (NMIA) (modifies flexible 2′-hydroxyl groups on the ribose) [26][27][28]. The modifications from chemical mapping were analyzed by reverse transcription followed by capillary electrophoresis (Figures S2 and S3, Supporting_Information_1; Table S1, Supporting_Information_2).

Base Pair Probabilities
To assess prediction quality and identify well-defined structural regions, we calculated the secondary structure partition function using RNAstructure 6.2 and, from this, determined the base pair probabilities for model pairs [40]. For the partition function calculations, experimental data were included (see Section 2.7 for details). The results indicate that several regions contain paired and unpaired nucleotides with more than 90% probability: 1-121, 131-210, 220-502 and 707-766. Additionally, all single-stranded regions are well defined by having a low probability of pairing (Figure 1).

Viruses 2021, 13, x

Figure 1. Predicted probability of nucleotides being paired or single-stranded in sgRNA M using the RNAstructure program. Probability lower than 50% is not colored. The partition function calculation incorporated restraints from strong reactivity of DMS and CMCT as well as SHAPE reactivities converted to pseudo-energies.

Model of Secondary Structure for sgRNA M
To predict the secondary structure of sgRNA M based on the experimental probing data, the results of chemical mapping were used to constrain predictions in the RNAstructure 6.2 program. SHAPE data were loaded as pseudoenergy constraints (the energy contributions of SHAPE-reactive nucleotides were penalized), and DMS and CMCT modifications were included as chemical mapping constraints (highly reactive nucleotides are forbidden from forming Watson-Crick base pairs flanked by Watson-Crick base pairs). The default values for slope and intercept in the RNAstructure 6.2 program were used. These defaults were determined by optimizing modelling accuracy on a SHAPE data set for sequences of known structure [31].
Our model of sgRNA M is highly structured with plenty of accessible bulges and loops (Figure 2). RNA motifs in the sgRNA M model are thermodynamically stable and have high calculated base pair probability. The ΔG°37 of the entire folded secondary structure is −416 kcal/mol. Most of the inaccessible regions defined by chemical mapping correspond to areas containing base pairs. The 5′ end of the sgRNA M model folded into three hairpins: SL1, SL2 and SL3. These three hairpins also occur in the 5′UTR of SARS-CoV-2 in its 5′ 300 nt fragment [20,41,42] and are present in in vitro and in-cell models of the whole genome [20][21][22][23]43]. These hairpins are also in good agreement with a structural-phylogenetic analysis of group IIb coronaviruses [44] and in silico prediction of the whole SARS-CoV-2 genome [16,17].
Moreover, the folding of the 5′ leader sequence of sgRNA M is in agreement with a study of the secondary structure of sgRNA N [14]. That investigation showed that the 5′ leader sequence folds almost autonomously in sgRNA N, with the exception of a few poorly determined long-range interactions [14]. SL1 is the most variable among SARS-CoV-2 variants [41], generally possessing mismatches, bulges and a high number of A-U and U-A base pairs. As a result, SL1 is less thermodynamically stable than SL2 and SL3. On the other hand, this feature is important for the replication of mouse hepatitis virus (MHV), a well-studied member of the Coronaviridae family [45]. SL2 is conserved in all CoVs, typically containing a pentaloop stacked on a five base-pair stem and creating a U-turn motif. This hairpin plays a critical role in MHV replication and translation [46]. SL3 is conserved only in subgroups of beta- and gammaCoVs [4] and contains TRS-L sequences that take part in discontinuous transcription [11,44].
Recently, a prediction of interactions between the SARS-CoV-2 genome and the human proteome indicated that a highly structured region at the 5′ end had a large number of interactions with proteins such as (1) ATP-dependent RNA helicase DDX1, which was previously reported to be essential for avian infectious bronchitis coronavirus replication [18,47], (2) adenosine deaminases acting on RNA (ADAR), which catalyze the hydrolytic deamination of adenosine to inosine, affecting viral protein synthesis, proliferation and infectivity [18,48], and (3) 2′-5′-oligoadenylate synthetases, which control viral RNA degradation [18,49,50]. Some of these proteins could interact with the leader sequence of sgRNAs. This assumption was confirmed by experiments in which DDX1 knockdown reduced the number of sgRNAs in SARS-CoV-1 infected cells [51]. This finding and the preservation of 5′UTR motifs in sgRNA M indicate that similar interactions could occur with sgRNA M. Moreover, interactions of the SARS-CoV-2 genome, as well as sgRNAs, with host RNAs were revealed. However, the SARS-CoV-2 genome and sgRNAs take part in different interactions with host RNAs [23,43].
RNA structure probing coupled with nanopore direct-RNA sequencing was used to map sgRNAs with NAI in living cells, but a structure of sgRNA M was not proposed [23]. We also compared long-range RNA-RNA interactions within the secondary structure of sgRNA M mapped in vitro (Figure 2) with the in vivo RNA-RNA interactome of sgRNA M [43]. Overall, these interactions are different and complex. Moreover, Ziv and coauthors discovered the co-existence of alternative SARS-CoV-2 gRNA and sgRNA topologies, held together by long-range base pairing between regions tens of thousands of nucleotides apart [43].
We additionally compared our sgRNA M model (Figure 2) with a proposed gene M secondary structure [20]. In general, our model of the secondary structure of sgRNA M and the corresponding region of the whole SARS-CoV-2 genome obtained by probing in vivo [20] are different (Figure 3). This difference is in agreement with a previous study of the in vivo RNA-RNA interactome of the full-length SARS-CoV-2 genome and several sgRNAs, in which some structural aspects of viral RNA are also discussed [43]. That investigation revealed that the viral genome and subgenomes adopt alternative topologies inside the cell. Moreover, some long-range RNA-RNA interactions in sgRNAs of SARS-CoV-2 are unique [43].
[...] [20]. However, hairpin 514-592 is three base pairs longer and possesses an additional internal loop in the cellular model. In turn, hairpins 131-150 and 281-297 are almost identical to those of the corresponding region of the SARS-CoV-2 genome mapped in cells [20]. In our model, loops are longer than in the corresponding region of the in-cell model. On the other hand, the small motif 503-512 is the only hairpin structure uniquely characteristic of sgRNA M and does not exist in the context of the SARS-CoV-2 genome. Our sgRNA M secondary structure is also similar to corresponding regions of other published whole-genome SARS-CoV-2 models [22,23]. This similarity between our sgRNA M structure and the in-cell determined structure of the M sequence in the whole-genome context is surprising, since RNA structure in vitro and in vivo could be significantly different: in vivo interactions between RNA and proteins or other molecules can influence secondary structure [52]. These data indicate that sequence and thermodynamics alone are major determinants of the formation of sgRNA M structural motifs. All results point to these stable motifs being functionally significant.
Furthermore, experiments and computational analyses have shown that extensive double-stranded regions have a strong propensity to interact with proteins and act as scaffolds for RNA-binding proteins [53][54][55]. sgRNA M is highly structured, and its stable helices may interact with proteins.

[...] [20]. Yellow rectangle indicates motifs of the leader sequence.

Local Structural Motifs in sgRNA M Are Mostly Independent of Leader Sequence
We compared our sgRNA M model with the corresponding region of an in vitro model of the whole SARS-CoV-2 genome [19] to check the influence of the 5′ leader sequence on the folding of sgRNA. We found that the structures are different. This feature is consistent with a study of sgRNA N and its corresponding region in the SARS-CoV-2 genome. Those data indicated that the same RNA sequences can fold into different structures in the subgenomic and genomic contexts [14]. However, some local motifs (Figure 2 [...] It is possible that the existence and appropriate folding of neighboring regions has some, but relatively small, influence on some local motifs of sgRNA M. Some local structure motifs are identical within in vitro and in vivo SARS-CoV-2 genome models [20].

Covariation Analysis of the sgRNA M Secondary Structure
The covariation analysis of the sgRNA M model presented here, utilizing the cm-builder pipeline with an alignment against 25,571 Coronaviridae sequences, yielded no base pairs with statistically significant covariation. Covariation (i.e., sites of mutated sequence that maintain base pairing and a 2D structure) is often used to support the potential for an RNA motif to be functional, as the structure is preserved even when the primary sequence is not. Importantly, though, a lack of covariation does not indicate a lack of potential functionality.
An analysis of the conservation of the sgRNA M model base pairs against an alignment of 453 homologous sequences showed an average base pair conservation of 97.57%, with stem SL1 (Figure 2) having an average conservation of 50.97%, stem SL2 (Figure 2) averaging 83.48% and the remaining base pairs averaging roughly 100% conservation (Table S1, Supporting_Information_3).
The high degree of conservation of most base pairs in the model may be partially responsible for the lack of detectable covariation. This lack of significant covariation is in line with previous studies [15,19,56], in which few motifs were supported by significant covariation despite extensive evidence of stable, ordered secondary structure. Additionally, as most of the structures presented here exist within the sgRNA M coding sequence, there are additional evolutionary pressures to conserve these sequences so as not to disrupt the protein amino acid sequence.

Possible Influence of Nucleotide Mutation on sgRNA M Structure of SARS-CoV-2 Variants
The SARS-CoV-2 genome constantly evolves: new mutations appear, and virus variants are monitored. To avoid being lethal for the virus, changes in the RNA sequence must preserve the reading frame, the function of proteins and also RNA structure. Emerging mutations should retain base pairs in RNA structure motifs that are important in the viral cycle. Therefore, we analyzed nucleotide mutations in the sequence of the M gene of SARS-CoV-2 variants (Table 3).

Conclusions
For the first time, the secondary structure of sgRNA M was determined based on experimental data from several chemical mapping methods and bioinformatic analyses. The secondary structure model contains unique features likely to be important for sgRNA M functions. The structure also includes several of the same motifs as the genomic M fragment in the SARS-CoV-2 genome (Figure 3). Previously published reactivity of sgRNA M from structure probing in living cells supported the existence of some of the structural motifs of our sgRNA M model [23]. Although covariation analysis shows no base pairs with statistically significant covariation, the mutations of gene M within SARS-CoV-2 variants are largely in agreement with the presented structure and support long-distance helices. This new knowledge about sgRNA M provides insights into virus biology and could be used for anti-SARS-CoV-2 strategies and designing new therapeutics. The structural motifs revealed here, whether unique to sgRNA M or shared with the M region of gRNA, could be promising targets for antisense oligonucleotides, siRNAs and small molecules.

Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v14020322/s1, Supporting_Information_1: Figure S1: Folding of sgRNA M on agarose gel; Figure S2: Example of capillary electrophoresis raw data; Figure S3: sgRNA M nucleotides reactivity diagrams; Supporting_Information_2: Table S1: sgRNA M mapping data; Supporting_Information_3: Table S1: Base pairs counts for secondary structure of sgRNA M.

Conflicts of Interest:
No potential conflict of interest was reported by the authors.