Evaluating the Effect of 3′-UTR Variants in DICER1 and DROSHA on Their Tissue-Specific Expression by miRNA Target Prediction

Untranslated gene regions (UTRs) play an important role in controlling gene expression. 3′-UTRs are primarily targeted by microRNA (miRNA) molecules that form complex gene regulatory networks. Cancer genomes are replete with non-coding mutations, many of which are connected to changes in tumor gene expression that accompany the development of cancer and are associated with resistance to therapy. Therefore, variants that occurred in 3′-UTR under cancer progression should be analysed to predict their phenotypic effect on gene expression, e.g., by evaluating their impact on miRNA target sites. Here, we analyze 3′-UTR variants in DICER1 and DROSHA genes in the context of myelodysplastic syndrome (MDS) development. The key features of this analysis include an assessment of both “canonical” and “non-canonical” types of mRNA-miRNA binding and tissue-specific profiling of miRNA interactions with wild-type and mutated genes. As a result, we obtained a list of DICER1 and DROSHA variants likely altering the miRNA sites and, therefore, potentially leading to the observed tissue-specific gene downregulation. All identified variants have low population frequency consistent with their potential association with pathology progression.


Introduction
Differential gene expression analysis is a widely used approach to understand the biological differences between healthy and disease states in order to investigate altered molecular pathways and to discover new targets for specific treatment. RNA sequencing (RNAseq), whole-exome sequencing (WES), and whole-genome sequencing (WGS) are increasingly applied to identify disease-causing genetic variants in individuals [1]. However, despite the current progress in the application of sequencing technologies, analysis of WES or WGS achieves a diagnosis rate of only up to 50% depending on the disease entity [2]. One of the main challenges with the DNA-based molecular diagnostics approaches is the interpretation of non-coding variants. Many non-coding variants play a regulatory role in controlling RNA abundance, splicing and consequent genes expression [3]. A high prevalence of disease-associated SNPs (more than 90%) was found in non-coding regions of the genome, for example in promoter regions, enhancers, and non-coding RNA genes [4][5][6], indicating their important regulatory role. Mutations in 3 -untranslated regions (UTRs) can disrupt, create, or alter the microRNA (miRNA) binding sites thus affecting the expression level of corresponding genes and thereby deregulating pathways that control cellular functions [7]. Noteworthy, we differentiate the terms "mutation" and "polymorphism" as follows: mutation is an event leading to a disease condition, and polymorphism means a variation in the human genome that is observed in populational databases. miRNAs are estimated to regulate the translation of up to 60% of protein-coding genes [8]; some protein-coding genes are regulated by a single miRNA, while others are regulated by many miRNAs [9]. As fine regulators of gene expression, miRNAs play important roles in key biological processes, such as cell development, proliferation, differentiation, and cell death [10]. miRNAs are a unique class of small (17-22 nucleotides (nt) in length) non-coding single-stranded RNAs that comprise up to 1% of the human genome [11]. miR-NAs play an important regulatory role in gene expression at the posttranscriptional level by facilitating sequence-specific RNA interference. The canonical mechanism of their action is the direct interaction with mRNAs of protein-coding genes, followed by degradation of target mRNA by Argonaute-containing silencing complexes [12]. Pairing with mRNA occurs via complementary binding of miRNA seed region (usually nucleotides 2-8 of the miRNA) with the 3 -UTR or, in rare cases, with the 5 -UTR of the target gene [13]. However, recent studies have revealed roles for miRNA sequences beyond the seed region in specifying target recognition and regulation [14], suggesting more complex mechanisms of protein expression control. The newly discovered mechanisms of non-canonical miRNA regulation reveal a wider set of miRNA functions and additional pathways of gene regulation [15].
There are several types of miRNA-mRNA interaction, each leading to a substantial transcriptional repression. Canonical recognition is defined by a perfect pairing of a 6-8-mer subsequence located at the first 7-8 bases of miRNA (seed) and a segment of target mRNA 3 -UTR [12]. The majority of other non-canonical sites do not lead to significant gene downregulation [16,17], however, some specific interactions can be considered as effective as canonical. For example, the 3 -compensatory site, which modulates extensive pairing to the miRNA 3 -region compensates for a wobble or mismatch to one of the seed positions [12,18,19]. Another type of effective non-canonical site is a centered site which represents 11-12 contiguous pairing to the center of the miRNA [20]. Taking into account the complexity of miRNA-mRNA interactions, the alteration of a 3 -UTR can have numerous functional consequences by either introducing or removing miRNA target sequences or changing the binding efficiency, thereby, directly affecting protein expression. Evaluation of such regulatory variants in 3 -UTRs and their potential impact on tissue-specific gene regulation should be performed for individual pathologies that are characterized by aberrant gene expression. For instance, 3 -UTR mutations of DICER1, DROSHA, and other miRNA processing genes were found to be associated with the increased colorectal cancer risk [21], implying the catastrophic consequences of such, often overlooked and underestimated variants. Moreover, impairment of miRNA-mRNA interaction in miRNA processing genes could lead to altered miRNA expression, which in turn affects the miRNA-mRNA interaction even more, leading to a self-inducing disease mechanism.
Myelodysplastic syndrome (MDS) refers to a heterogeneous group of closely related clonal hematopoietic disorders commonly found in the aging population. All of these disorders are associated with ineffective hematopoiesis with one or more peripheral blood cytopenias and characterized by accumulation of somatic mutations [22] along with genome instability and high incidence of secondary cancerogenic genetic events determining frequent transformation of MDS into acute myeloid leukemia (AML) [23]. The set of genetic alterations defines the disease prognosis and the choice of specific therapy. One of the hypotheses on MDS etiology states that mutations in mesenchymal stromal cells (MSCs) can disrupt the microenvironment which triggers MDS initialization. In particular, deletion of the miRNA processing endonuclease DICER1 selectively in mesenchymal osteoprogenitors in murine model induces markedly disordered hematopoiesis with several features of MDS, indicating the role of this gene in mesenchymal "stroma" as a primary regulator of tissue function [24]. Moreover, further investigations demonstrated the downregulated expression of both miRNA processing endonucleases DICER1 and DROSHA in MSCs from myelodysplastic syndrome patient cells compared to normal cells. This observation was accompanied by underlying miRNA repression compared to healthy controls [25]. Recent analysis of MDS clinical data revealed a high mutational burden in DICER1 (54%) and DROSHA (17%), and the detected SNPs were predominantly located in 5 -UTR and 3 -UTR regions [26]. Taken together, these observations suggest that during MDS progression, DICER1 and DROSHA genes are under mutational pressure directed at the dysregulation of miRNA processing function and, as a result, leading to impaired protein expression. The triggering mechanism of such global dysregulation might be due to increased targeting of miRNA or by newly formed miRNA pairing to DICER1 and DROSHA due to the acquired mutations in regulatory regions (mainly in 3 -UTR) during tumor progression. Consequently, the present study is focused on evaluation of 3 -UTR SNPs in DICER1 and DROSHA genes through the prediction of putative miRNA target subsequences and their comparison due to the acquired mutations. SNP prioritization is based on bioinformatics analysis of structural and functional properties of miRNA pairing to the altered and original 3 -UTR loci. We propose to filter out those miRNAs that are not detectable in the normal tissue; here, we used the miRNA profile of MSCs that is specific to the scope of MDS. The proposed pipeline can be further applied to evaluate noncoding SNPs in other genes and tissues of interest.

Data Collection
All known 3 -UTRs polymorphisms of DICER1 and DROSHA were taken from dbSNP (https://www.ncbi.nlm.nih.gov/snp/, accessed on 5 February 2021), as well as mutations discovered in probes with "haematopoietic and lymphoid" histology subtype from COS-MIC database (https://cancer.sanger.ac.uk/cosmic, accessed on 5 February 2021) and from the recent study of these genes in patients with MDS [26]. 3 -UTR sequences were taken from NM_001195573 transcript for DICER1, and from NM_013235 transcript for DROSHA.

"Canonical" Binding Analysis
MicroSNiPer (MPI-IE, Freiburg, Germany) [27] tool interrogates the 3 -UTR and predicts whether a SNP within the target site will disrupt/eliminate or enhance/create a miRNA binding site. It applies the modified FASTA algorithm [28] for finding significant matches between 3 -UTR and miRNAs from miRbase database release v19 (http://www.mirbase.org/, accessed on 17 February 2021) [29]. The alignment between the 3 -UTR and the miRNA requires an uninterrupted match of at least 6 nucleotides from the 5 -end of miRNA which would mimic the canonical seed site. In our analysis, we used the following parameters: a minimal length set to 6 nucleotides and G-U wobbles allowed. A table of miRNAs was derived with their corresponding seeds at the original and mutated 3 -UTR of DROSHA and DICER1.

Energy Evaluation of mRNA-miRNA Duplex
The IntaRNA v2.0 (University of Freiburg, Freiburg, Germany) [30] algorithm implements the RNA energy parameters from the Vienna RNA package v2.* (University of Vienna, Vienna, Austria) [31], enforcing the seed interaction to be energetically favorable. It was used with default parameters (single-site, loop-based RNA-RNA interaction with minimal free energy model, taking only energetically favorable interactions with negative energy, the temperature is set to 37 • C) on the set of original and mutated 3 -UTR subsequences of 50 nucleotides against the miRNAs of the whole miRbase database release v19 [29]. This tool was also used to supply MicroSNiPer results with their duplex energy values: to be consistent, each miRNA predicted by MicroSNiPer was tested to form exactly the same hybrid configuration in IntaRNA. The total number of predicted miRNA-RNA interactions was 74,977 (73,178 from IntaRNA and 1799 from MicroSNiPer). Sites with insufficient binding energy were omitted with the limit of −8.5 kcal/mol [32].

Tissue-Specific miRNA Profiling
The miRNA profile in MSCs was taken from Clark et al., 2014 [33]. From this set, 26 miRNAs derived from human bone marrow MSCs were used as the most abundant miRNAs in MSCs. In a more recent study, miRNAs of bone marrow-derived MSCs were also studied [34,35]. Moreover, it was shown that MSCs and MSCs' exosomal vesicles have, in general, similar miRNA expression [36]. We built the final set of 336 miRNAs from all available publications dealing with the bone marrow-derived MSCs exosomes [35,[37][38][39][40].

Resulting Genes Variant Selection
miRNAs that were shown to be abundant in MSCs were used in the main datasets. Additionally, sets of miRNAs commonly detected in MSCs were used as an additional dataset. Two main parameters were calculated for each variant: the change in the number of miRNAs hybridizing to the 3 -UTR, and the difference in the sum of hybridization energies. Distributions of both parameters were analyzed, and the "statistical outliers" were taken further as the most significant result. The variants that lead to an increase in the number of miRNAs and a decrease in hybridization energy sum are shown in Tables 1 and 2. All analyzed variants, the data of interacting miRNAs and the code can be found at the links provided in the Data Availability Statement. Population allele frequencies were obtained for selected variants from dbSNP, which collects data from multiple databases (ALFA

Building the Pipeline for 3 -UTR SNPs Analysis Considering Tissue-Specific miRNA Expression
General approach to evaluate the effect of 3 -UTR SNPs on miRNA binding is based on prediction of the miRNA-mRNA interaction. Most of the current tools mainly compute the putative binding sites by estimating the strict seed pairing along with site accessibility, conservation, and base pairing stability [41][42][43][44][45][46]. Some methods specialize on prediction of the SNP impact on putative miRNA targets [27,47,48]. In order to build the generalized approach for miRNA target prediction depending on the acquired SNPs, the pipeline should cover the following important features: (1) evaluate both "canonical" sites and "noncanonical" miRNA binding with 3 -UTR, known as effective expression regulators [49], on the basis of the defined set of interaction patterns and hybridization energies [50,51]; (2) analyze not only catalogued variants but also novel SNPs and indels; (3) allow tissuespecific prediction of found miRNAs for altered gene sites, because the miRNA composition varies across tissues [52].
Taking these features into account, we propose the pipeline shown in Figure 1. The Mi-croSNiPer application has been chosen as the only one flexible, customizable tool to analyze any input SNP position in 3 -UTR sequence. This tool interrogates the 3 -UTR and predicts whether a SNP within the target site will disrupt/eliminate or enhance/create a miRNA binding site. MicroSNiPer's algorithm is based on seed region binding in the miRNA as the major criterion for prediction [27]. The important feature of MicroSNiPer is the characterization of both novel SNPs and "in-house" transcripts. In addition to the "seed-oriented" method of "canonical" miRNA target evaluation, the IntaRNA algorithm was incorporated into the proposed pipeline. It represents one of the most widely used state-of-the-art approaches for general RNA-RNA interaction prediction. It enables screening a large target sequence and searching the most energetically favorable interactions, incorporating seed constraints and interaction site accessibility along with enhanced parameterization as well as fully customizable control [30]. The main feature introduced in the proposed pipeline is filtering the variants with the most altering potential for tissue-specific miRNA binding compared with the original sequence. The cutoff value for hybridization energy can be chosen for each RNA-RNA duplex followed by selection of only those interactions with miRNAs that are abundant in specific tissue of interest.
Hereby, the lists of interacting miRNAs were generated for each target allele (original and mutated) of each locus (a subsequence in the vicinity of which the variant occured) for DICER1 and DROSHA by using MicroSNiPer and IntaRNA algorithms. The distribution plot of hybridization energies for all predicted interactions between miRNA and DROSHA 3 -UTR for original and mutated loci (Figure 2) demonstrates slight differences in results obtained by the two methods. Our pipeline covers evaluation of both "canonical" and "non-canonical" types of interactions, considering their thermodynamic significance for miRNA-mRNA hybridization.  "canonical" and "non-canonical" types of interactions, considering their thermodynamic significance for miRNA-mRNA hybridization. Figure 2. The distribution of miRNA-RNA hybridization energy values for all predicted interactions between miRNA and DROSHA 3′-UTR with and without SNPs. MicroSNiPer finds mostly strong canonical seeds compared to IntaRNA. The difference between "canonical" and "non-canonical" interaction energies calculated by IntaRNA is probably due to the higher demand of "non-canonical" seeds for internucleotide bonds.
Only a subset of miRNAs that are found in MSCs was taken into consideration. Interaction sites with insufficient (> −8.5 kcal/mol) hybridization energy were discarded as this energy value was shown to be required for suboptimal duplex formation [32]. Moreover, all interactions were validated to match a pattern of canonical (6-8-mer site covering the first 7-8 bases of miRNA) or non-canonical seeds (3′-compensatory site and centered site, described earlier).
Finally, for each locus, differences in sums of hybridization energies with all interacting miRNAs were calculated, as well as the differences in numbers of interacting miRNAs between original and mutated allele. The "statistical outliers" as the most significant distinctive values identified the variants that probably shift the gene expression. DICER1 and DROSHA were shown to be downregulated in patients with MDS [25], therefore, we focused only on variants that increase the target mRNA affinity for the pool of miRNAs, therefore, decreasing the overall gene expression.

Variants in DROSHA 3′-UTR Predicted to Downregulate Gene Expression
Using our pipeline, we analyzed 211 3′-UTR variants of DROSHA (193 SNPs and 18 indels). Variants that are predicted to downregulate gene expression (showing significant difference in miRNA binding compared to the original sequence) are listed in Table 1. It can be concluded the majority of efficient downregulating variants were predicted by MicroSNiPer having "canonical" seed interactions with found miRNA in MSCs. Nevertheless, the results of both algorithms should be considered side-by-side and complement each other as they are based on different sets of duplexes formed. Figure 2. The distribution of miRNA-RNA hybridization energy values for all predicted interactions between miRNA and DROSHA 3 -UTR with and without SNPs. MicroSNiPer finds mostly strong canonical seeds compared to IntaRNA. The difference between "canonical" and "non-canonical" interaction energies calculated by IntaRNA is probably due to the higher demand of "non-canonical" seeds for internucleotide bonds.
Only a subset of miRNAs that are found in MSCs was taken into consideration. Interaction sites with insufficient (> −8.5 kcal/mol) hybridization energy were discarded as this energy value was shown to be required for suboptimal duplex formation [32]. Moreover, all interactions were validated to match a pattern of canonical (6-8-mer site covering the first 7-8 bases of miRNA) or non-canonical seeds (3 -compensatory site and centered site, described earlier).
Finally, for each locus, differences in sums of hybridization energies with all interacting miRNAs were calculated, as well as the differences in numbers of interacting miRNAs between original and mutated allele. The "statistical outliers" as the most significant distinctive values identified the variants that probably shift the gene expression. DICER1 and DROSHA were shown to be downregulated in patients with MDS [25], therefore, we focused only on variants that increase the target mRNA affinity for the pool of miRNAs, therefore, decreasing the overall gene expression.

Variants in DROSHA 3 -UTR Predicted to Downregulate Gene Expression
Using our pipeline, we analyzed 211 3 -UTR variants of DROSHA (193 SNPs and 18 indels). Variants that are predicted to downregulate gene expression (showing significant difference in miRNA binding compared to the original sequence) are listed in Table 1. It can be concluded the majority of efficient downregulating variants were predicted by Mi-croSNiPer having "canonical" seed interactions with found miRNA in MSCs. Nevertheless, the results of both algorithms should be considered side-by-side and complement each other as they are based on different sets of duplexes formed.

Variants in DICER1 3 -UTR Predicted to Downregulate Gene Expression
The total 959 DICER1 3 -UTR variants (854 SNPs and 105 indels) were analyzed with respect to their impact on the gene expression. Variants with the highest potential to downregulate gene expression are listed in Table 2. These 64 SNPs are likely to influence the DICER1 expression in MSCs. DICER1 has an unusually long 3 -UTR (>4000 nt) [53], which implies high potential for miRNA binding and might indicate the importance of post-transcriptional gene expression regulation. Moreover, it was shown to be targeted by several miRNAs, for example, miR-103/107, miR-192 [54] and miRNAs of let-7 family [55]. Therefore, the variants found to be altering the miRNA binding in MSCs, especially under mutational pressure in cancer progression, might have an impact on gene expression.

Discussion
Advances in high-throughput sequencing technologies have revolutionized the field of medical genetics, providing important insights into the genetic basis of many diseases, and has opened the path to promising preventive, diagnostic, and therapeutic strategies [56]. It is widely accepted that MDS-related mutations occur mainly in genes involved in mRNA splicing, epigenetic regulation, signal transduction, transcription regulation, tumor suppression, and adhesion [57,58]. Recent studies have demonstrated that mutations in miRNA-processing endonucleases DICER1 and DROSHA are associated with MDS progression [26,59]. Impaired DICER1 and DROSHA expression was reported in mesenchymal stromal cells of MDS patients [25]. Taken together the observed downregulation of these genes in some MDS patients [25] and significant prevalence of mutations in their UTRs [26], call for analysis of these variants in the context of DICER1 and DROSHA regulation, specifically during MDS progression.
Due to the important role of miRNAs in gene regulation, identification of their targets is essential, especially in terms of mutational pressure in cancer along with dysregulation of protein expression. Although the experimental approaches to evaluate the functional effect of SNPs in 3 -UTR exist [70][71][72], they are highly demanding in terms of resources and time, and the assessment of all known gene mutations at once is not always feasible. Therefore, the accurate in silico methods for predicting functional consequences of SNPs that are found in 3 -UTRs are necessary to guide further experiments. Here, we present a modified pipeline for analysis of 3 -UTR variants, which takes into account both "canonical' and "non-canonical" miRNA target evaluation as well as tissue-specific miRNA expression. Several previous studies aimed to analyze non-coding regulatory variants and to predict their functional effect on disease progression. For example, Sabina et al., 2015 [50] calculated the minimum free energies by applying each miRNA to the 3 -UTR sequence of the H2AFX gene using the RNAcofold tool from Vienna RNA suite [31]. When the whole 3 -UTR sequence is used in RNAcofold, complementary sites may emerge, establishing intra-molecular loops that can shield the potential binding site. Other oligo-and polynucleotides, as well as the rest of the target RNA sequence may also interfere with potential miRNA-binding sites. Since effects of many possible interactions in the complex cell environment cannot be accurately predicted, we proposed to use two sets of short (50 nt) subsequences of the 3 -UTR sequences in the variant vicinity as a target in our pipeline. The search of expression-changing candidate miRNAs should be based on two main criteria: (1) calculation of thermodynamic properties (implemented in, for instance, the Vienna RNA package v2.* [31], which is incorporated in the IntaRNA [30] algorithm), and (2) classification of targets as effective and ineffective, which may come from definitions of "canonical" and "non-canonical" seeds or from applying deep learning methods [73]. That is why the algorithms of MicroSNiPer (a "canonical" interaction predictor) and IntaRNA (an extended flexible RNA-RNA binding analyzer) were incorporated into the proposed pipeline in order to evaluate a wide spectrum of miRNA targeting sites. Moreover, it should be noted that each tissue possesses its own miRNA profile, and consequently the set of query miRNAs should be limited to miRNAs that are abundant (or at least shown to exist) in the cells of a studied tissue. The present pipeline can be applied for evaluation of other genes targets in various tissues of interest.
In conclusion, we analyzed a set of 3 -UTR variants in DICER1 and DROSHA genes, and, as a result, identified mutations with the highest potential impact on the miRNA binding in MSCs. Population analysis showed that all variants predicted to downregulate DICER1 and DROSHA can be considered as rare (population allele frequency < 0.5%). This resulting list of candidate variants should be studied experimentally to reveal their association with downregulation of gene expression and involvement in MDS progression.

Data Availability Statement:
The data presented in this study are openly available via following links: all analyzed variants are listed in supplementary tables (http://epicenter.1spbgmu.ru/static/ files/all_variants.zip, accessed on 6 July 2021); the raw data of interacting miRNAs with corresponding variants and their hybridization energies were deposited at http://epicenter.1spbgmu.ru/static/ files/json.zip, accessed on 6 July 2021; the code is available online at http://epicenter.1spbgmu.ru/ static/files/DICER1_DROSHA_miRNA_research.zip, accessed on 6 July 2021.