Bioinformatics Analysis of Evolution and Human Disease Related Transposable Element-Derived microRNAs

Transposable elements (TEs) can insert into various parts of the genome, and through these insertion events they can give rise to new functional elements, one of which is the microRNA (miRNA). miRNAs are non-coding RNAs of 19 to 24 nucleotides, and numerous miRNAs are derived from TEs. In this study, to support general knowledge of TEs and miRNAs derived from TEs, several bioinformatics tools and databases were used to analyze miRNAs derived from TEs from two perspectives: evolution and human disease. The distribution of TEs in diverse species shows that almost half of the genome is covered by TEs in mammals, and less than half in other vertebrates and invertebrates. Based on selected studies of evolution-related miRNAs, a total of 51 miRNAs derived from TEs were found and analyzed. For human disease-related miRNAs, a total of 34 miRNAs derived from TEs were compiled from previous studies. In summary, abundant miRNAs derived from TEs have been found; however, their functions remain largely uncharacterized. Therefore, this study provides a theoretical understanding of miRNAs derived from TEs by using various bioinformatics tools.

In 1993, the first miRNA was discovered in Caenorhabditis elegans (C. elegans). miRNAs are non-coding RNAs of 19 to 24 nucleotides that regulate gene expression by targeting mRNAs for cleavage or translational inhibition [42][43][44][45]. miRNA function is associated with oncogenesis, immunity, development, and cell differentiation. Generally, the miRNA binding sites of a target mRNA lie in its 3′ untranslated region (UTR). A miRNA recognizes complementary binding sites in the target gene through its seed region, which is approximately 6 nucleotides long. Numerous studies have identified miRNAs derived from TEs [30,31,33,34,36], and some of these miRNAs are strongly correlated with human disease [46][47][48][49][50][51][52] as well as evolution [53][54][55][56][57]. Considering the many functional roles of TEs, miRNAs derived from TEs may mimic the functions of TEs.
Bioinformatics tools are useful in the initial steps before experiments begin, when primary information about the subject of the study is gathered. For TEs, analyzing sequences and predicting structure are important for understanding TE function. A TE-based resource, RepeatMasker, provides the proportion of each type of TE among diverse species [15]. Additional TE-related bioinformatics tools and databases determine which TEs have inserted into a target sequence; however, TE-based databases and programs are still limited [58][59][60][61]. In contrast, numerous miRNA-based databases provide basic information about miRNAs, miRNA-related cancers, target genes, transcription factors (TFs), and so on [60][62][63][64][65][66][67].
In this study, evolution-related and human disease-related miRNAs derived from TEs were collected from published research papers. The identified miRNAs were then analyzed with several bioinformatics tools to provide fundamental information about miRNAs derived from TEs.

Bioinformatics Analysis of Transposable Elements
The distribution of TEs in various species (human, chimpanzee, gorilla, orangutan, gibbon, macaque, rhesus, marmoset, mouse, horse, cow, cat, dog, chicken, zebrafish, and C. elegans) was obtained from the RepeatMasker Genomic Datasets [15]. The species with whole-genome sequences are listed in both the UCSC Genome Browser and RepeatMasker [15,60]. Table 1 shows the percentage of each TE in the genome, including TEs that are absent. Each element is named by its superfamily, such as SINE, LINE, or long terminal repeat (LTR), followed by the specific name of the element. 'Other' after a superfamily name denotes unspecified elements. The primates, comprising humans and seven other species, were chosen to present the percentage of each TE from an evolutionary perspective.
Orangutan had the highest percentage of TEs in the genome (48.5%), whereas chicken (9.3%) and C. elegans (9%) had the lowest. Among the primates and other mammals (human, chimpanzee, gorilla, orangutan, gibbon, macaque, rhesus, marmoset, mouse, horse, cow, cat, and dog), the various TEs are well represented in each genome, with the exception of the SVA element, which was exclusive to humans. C. elegans contained only a few TEs (LINE-CR1, LTR-other, DNA-TcMar, DNA-hAT, and DNA-other). Zebrafish, the only fish in the table, contains the highest percentage of DNA transposon-other (19.6%).
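The percentages in Table 1 are the kind of summary that can be computed from RepeatMasker annotation output. The following is a minimal sketch, not the procedure used in this study: it assumes the standard whitespace-delimited RepeatMasker `.out` layout (query begin and end in columns 6 and 7, repeat class/family in column 11), and the function name `te_coverage_by_class`, the toy records, and the toy genome size are hypothetical.

```python
from collections import defaultdict


def te_coverage_by_class(out_lines, genome_size):
    """Return {repeat class/family: percent of the genome covered}.

    Assumes RepeatMasker .out columns: score, div, del, ins, query,
    begin, end, (left), strand, repeat name, class/family, ...
    Overlapping annotations are counted twice in this simple sketch.
    """
    covered = defaultdict(int)
    for line in out_lines:
        fields = line.split()
        # Skip blank lines and header lines (their first field is not a number).
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        covered[fields[10]] += end - begin + 1
    return {cls: 100.0 * bp / genome_size for cls, bp in covered.items()}


# Toy records on a hypothetical 1000 bp sequence (not real annotation data).
toy_out = [
    "   SW   perc perc perc  query   position in query   matching repeat",
    "score   div. del. ins.  sequence begin end (left)   repeat class/family",
    "",
    " 1504  1.3 0.4 1.3 chrToy 101 300 (700) + AluSz6 SINE/Alu 1 200 (112) 1",
    "  900 10.0 0.0 0.0 chrToy 501 600 (400) C L1MC4 LINE/L1 (10) 100 1 2",
]
toy_percentages = te_coverage_by_class(toy_out, genome_size=1000)
print(toy_percentages)
```

For a real genome, the lines would be read from the downloaded annotation file and `genome_size` set to the assembly length; merging overlapping annotations first would avoid double counting.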

Life 2020, 10, x FOR PEER REVIEW 4 of 15

Selection of microRNA-Related Papers and Bioinformatic Analyses of Transposable Element-Derived microRNAs
miRNAs related to the keywords 'evolution and primates' and 'human disease and cancer' were searched in the National Center for Biotechnology Information (NCBI) PubMed database [68] and Google Scholar [69] (Figure 1). Each paper contained numerous miRNAs, and information on each miRNA was retrieved from miRBase v22.1 (http://www.mirbase.org) [66]. Each miRNA was then located in the human genome (GRCh38) using the UCSC Genome Browser (http://genome.ucsc.edu) [60]. To identify miRNAs derived from TEs among the human disease- and cancer-related and the evolution- and primate-related miRNAs, a total of 41 papers were selected from NCBI PubMed and Google Scholar: 31 studies on human disease and cancer and 10 studies on evolution and primates. miRNAs derived from TEs may be fully or partially derived from a TE, and some contain more than one TE in their sequence.
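The distinction between fully and partially TE-derived miRNAs can be expressed as an interval-overlap check between a miRNA locus and TE annotations. The sketch below is illustrative only and is not the authors' procedure, which relied on visual inspection in the UCSC Genome Browser. The function name `classify_te_origin` and all coordinates are hypothetical; the TE names are reused from the text purely as labels.

```python
def classify_te_origin(mirna, te_annotations):
    """Classify a miRNA locus by its overlap with TE annotations.

    mirna: (chrom, start, end); te_annotations: list of
    (chrom, start, end, name). Coordinates are 1-based and inclusive.
    Returns (label, names of overlapping TEs).
    """
    chrom, start, end = mirna
    covered = set()   # miRNA positions covered by at least one TE
    names = []
    for te_chrom, te_start, te_end, name in te_annotations:
        if te_chrom != chrom:
            continue
        lo, hi = max(start, te_start), min(end, te_end)
        if lo <= hi:  # the two intervals overlap
            names.append(name)
            covered.update(range(lo, hi + 1))
    if not names:
        return "not TE-derived", []
    label = "fully" if len(covered) == end - start + 1 else "partially"
    return label, names


# Hypothetical coordinates on a toy chromosome.
toy_mirna = ("chrToy", 100, 199)
two_tes = [("chrToy", 50, 149, "L1MC4"), ("chrToy", 150, 250, "AluSz6")]
one_te = [("chrToy", 50, 149, "L1MC4")]

print(classify_te_origin(toy_mirna, two_tes))
print(classify_te_origin(toy_mirna, one_te))
```

A miRNA whose sequence is covered end to end by one or more TEs is labeled fully derived; any smaller overlap is labeled partially derived, and the returned list shows when more than one TE contributes.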

Bioinformatic Analyses of Evolution Related Transposable Element-Derived microRNAs
The evolution- and primate-related miRNAs derived from TEs in the 10 studies were located in the UCSC Genome Browser to determine their position in the human genome and the type of TE from which each miRNA is derived (Table 2). Of the 51 miRNAs derived from TEs related to evolution and primates, 16 were derived from the LINE family, 19 from SINE, 3 from LTR, and 15 from DNA transposons. Ten of the miRNAs were derived from more than one TE, and interestingly, hsa-miR-548a-2 and hsa-miR-619 are each derived from different types of TEs. For instance, hsa-miR-548a-2 is derived from two LTR16A2 elements at the termini of the miRNA with one DNA transposon, MADE1, in the middle, and hsa-miR-619 contains one L1MC4 and one AluSz6. Of the 51 evolution- and primate-related miRNAs derived from TEs, 21 miRNAs derived from different types of TEs were chosen for further bioinformatics analysis. These miRNAs were analyzed with the ECR Browser to briefly check their conservation in chimpanzee, rhesus, mouse, cow, dog, chicken, and zebrafish [72]. Additionally, the structure of each of the 21 miRNAs was predicted with the RNAfold web server, which computes the minimum free energy (MFE) secondary structure of an RNA sequence [73]. Strong base-pairing probabilities are shown in red, with values close to 1, and weak base-pairing probabilities are shown in blue, with values close to 0.
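RNAfold reports a predicted secondary structure in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. As a small sketch of how such output could be summarized, the code below extracts the base pairs and the paired fraction from a dot-bracket string. It does not compute an MFE or pairing probabilities; the function names and the toy structure are hypothetical and do not correspond to any miRNA in this study.

```python
def paired_positions(dot_bracket):
    """Return sorted (i, j) index pairs (0-based) for a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def paired_fraction(dot_bracket):
    """Fraction of nucleotides that take part in a base pair."""
    return 2 * len(paired_positions(dot_bracket)) / len(dot_bracket)


# Toy hairpin: a 4 bp stem closing a 3 nt loop (not a real RNAfold result).
toy_structure = "((((...))))"
print(paired_positions(toy_structure))
print(round(paired_fraction(toy_structure), 3))
```

A low paired fraction is only a rough indicator of a weak structure; the MFE value and pairing probabilities reported by RNAfold remain the primary measures.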

Bioinformatic Analyses of Human Diseases Related Transposable Element-Derived microRNAs
The human disease- and cancer-related miRNAs derived from TEs in the 31 studies were located in the UCSC Genome Browser to determine their position in the human genome and the type of TE from which each miRNA is derived (Table 3). As mentioned previously, a miRNA derived from TEs need not originate from a single TE; it can be derived from more than one TE of different families. From a total of 34 human disease- and cancer-related miRNAs derived from TEs, 16 miRNAs

The secondary structures of the 21 evolution- and primate-related miRNAs derived from TEs were predicted with the RNAfold web server [73]. Almost all of the MFE structures showed strong base pairing, with the exception of hsa-miR-1202, which showed the weakest MFE structure.

Discussion
Bioinformatics tools are useful and important when little information is available about the subject under study. Numerous bioinformatics tools are available online and can be used directly or downloaded. Several bioinformatics tools exist for miRNAs; however, TE-related tools are still insufficient. Using a TE-related bioinformatics database, the distribution of TEs was summarized (Table 1) [15]. TEs are widely scattered in the genomes of most species. From an evolutionary perspective, the SVA element is exclusive to humans. The Alu element of the SINE family is present in primates and mouse but absent from the other mammals (horse, cow, cat, and dog), chicken, zebrafish, and C. elegans. Within LINE, the proportion of the CR1 element is very low in all of the mammals, the fish, and C. elegans; however, one study provided evidence that the CR1 element is moderately scattered in avians, crocodilians, turtles, and lepidosaurians, also known as diapsid reptiles [84]. Among the mammals, the ERVK element of the LTR family is present in the primates from human through rhesus monkey, and in mouse and cow. Most ERVK studies have been performed in primates, yet mouse shows the highest percentage of the ERVK element among all the species. One study mentioned that human and mouse contain numerous LTR-derived TF binding sites (TFBSs), which contribute to the binding of other TFs; it did not explain why mouse has a high percentage of the ERVK element, although this might be related to mouse embryonic stem cells [85][86][87].
Approximately half of the genome is covered by TEs in mammals and zebrafish, and roughly 9 percent in chicken and C. elegans, and these widely distributed TEs are capable of generating miRNAs and TFBSs [30,31,34,87]. Based on the miRNA studies, miRNAs derived from TEs were sorted into two groups: primate and evolution, and human disease and cancer. Table 2 lists 51 primate- and evolution-related miRNAs derived from TEs, and Table 3 lists 34 human disease- and cancer-related miRNAs derived from TEs. First, to analyze the primate- and evolution-related miRNAs, the ECR Browser was used to check the conservation of a few selected miRNAs derived from TEs. Conservation matters for these miRNAs because it guides the selection of target species or samples before actual experiments. Previous studies checked the conservation of the target miRNAs they found in order to apply them to primate- and evolution-related studies [54,70,88]. The ECR Browser is used to predict the conservation of a target gene, miRNA, or specific genomic region. The selected primate- and evolution-related miRNAs derived from TEs are generally well conserved across the mammals; however, a few are not conserved in some species, with no apparent pattern (Figure 2). To examine conservation precisely, the sequence of each miRNA needs to be downloaded for each species. The TargetScan database provides the sequences of conserved miRNAs in target genes, and this approach is more accurate than prediction with the ECR Browser [89,90]. The RNAfold result indicates the strength of base pairing, as well as the MFE structure of each miRNA, by color. The miRNA predicted to have the weakest structure is hsa-miR-1202.
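Once the sequence of a miRNA has been downloaded for each species and aligned, conservation can be quantified as percent identity against the human sequence. The sketch below illustrates this on pre-aligned sequences of equal length; it is not the method used by the ECR Browser or TargetScan, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(reference, other):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored; a gap in
    only one sequence counts as a mismatch.
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(reference.upper(), other.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned sequences (not real miRNA sequences).
toy_alignment = {
    "human":      "ACGUACGUAC",
    "species_a":  "ACGUACGAAC",   # one substitution
    "species_b":  "ACGU--GUAC",   # two-nucleotide deletion
}
for species, seq in toy_alignment.items():
    print(species, percent_identity(toy_alignment["human"], seq))
```

Treating a one-sided gap as a mismatch is a design choice of this sketch; other conventions exclude gapped columns from the denominator.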
The human disease- and cancer-related miRNAs derived from TEs were analyzed in relation to TFs. The TransmiR database provides information on TFs that regulate or correlate with miRNAs. First, all 34 human disease- and cancer-related miRNAs derived from TEs in Table 3 were examined to check the correlation between miRNAs derived from TEs and TFs for the human diseases and cancers provided in the TransmiR database (Figure 3). Four miRNAs derived from TEs were found in three human diseases and cancers provided by TransmiR. Other miRNA and TF studies have used TransmiR to predict which TFs regulate or correlate with their target miRNAs and applied the predictions in further bioinformatics analyses or experiments [91,92]. In addition, a few TFs were identified for the human disease- and cancer-related miRNAs derived from TEs. As shown in Figure 4, some miRNAs derived from TEs interact with numerous TFs, whereas others interact with few. A study of miRNA-548m and MYC supported the TransmiR data with the finding that inhibition of the stroma-inducing miRNA-548m leads to c-Myc overexpression [93]. Using the TransmiR database, it was hypothesized that the enhancer activity of miRNAs derived from TEs is increased by TFs, and one report indeed mentioned that the enhancer activity of the TE-derived miRNA OF-miRNA-307 might be induced by TFs near the site where OF-miRNA-307 binds in the 3′ UTR of its target gene [94].
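The contrast between miRNAs that interact with many TFs and those that interact with few can be tabulated from a list of TF-miRNA pairs. The sketch below assumes the pairs have already been extracted as (TF, miRNA) tuples; it does not reproduce the TransmiR file format or interface, and the function name `tfs_per_mirna` and all TF and miRNA names are placeholders rather than results from this study.

```python
from collections import defaultdict


def tfs_per_mirna(tf_mirna_pairs):
    """Group TF-miRNA pairs into {miRNA: sorted list of distinct TFs}."""
    regulators = defaultdict(set)
    for tf, mirna in tf_mirna_pairs:
        regulators[mirna].add(tf)
    return {mirna: sorted(tfs) for mirna, tfs in regulators.items()}


# Placeholder pairs (hypothetical names, with one duplicate record).
toy_pairs = [
    ("TF_A", "miR_X"),
    ("TF_B", "miR_X"),
    ("TF_C", "miR_X"),
    ("TF_A", "miR_X"),
    ("TF_A", "miR_Y"),
]
toy_network = tfs_per_mirna(toy_pairs)

# Rank miRNAs by the number of distinct TFs associated with them.
for mirna, tfs in sorted(toy_network.items(), key=lambda kv: -len(kv[1])):
    print(mirna, len(tfs), tfs)
```

Using a set per miRNA keeps repeated records for the same TF from inflating the count.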
The aim of this study was to introduce basic bioinformatics tools used in studies of TEs and miRNAs derived from TEs. Evolution- and human disease-related miRNAs derived from TEs were identified from published papers and analyzed with bioinformatics tools. Abundant miRNAs are derived from TEs, and they are closely related to primates and evolution and to human disease and cancer. Here, fundamental information on miRNAs derived from TEs was analyzed using several bioinformatics tools.