Genome-Wide Screening for Pathogenic Proteins and microRNAs Associated with Parasite–Host Interactions in Trypanosoma brucei

Simple Summary: Tsetse flies are blood-sucking insect vectors belonging to the order Diptera and are widely distributed across sub-Saharan Africa. These vectors can transmit the pathogenic parasite Trypanosoma brucei (T. brucei), which causes a disease called African trypanosomiasis. Discovering effective therapeutic drugs and vaccines with minor side effects remains an ongoing effort, both to treat the disease and to prevent epidemics. Genome technology has recently played an important role in uncovering the molecular mechanisms of many vector-borne diseases, which facilitates the identification of potential drug targets. To better understand the pathological contribution of parasite–host interactions to the disease, we studied the genomic and transcriptomic profiles of T. brucei. We identified, for the first time, a panel of pathogenic proteins and microRNAs in T. brucei supported by their molecular functions. Our study may open a new avenue for designing preventive and therapeutic strategies to control this insect vector.

Abstract: Tsetse flies are a type of blood-sucking insect living in diverse locations in sub-Saharan Africa. These insects can transmit the unicellular parasite Trypanosoma brucei (T. brucei), which causes African trypanosomiasis in mammals. There remain huge unmet needs for prevention, early detection, and effective treatment of this disease. Currently, few studies have investigated the molecular mechanisms of parasite–host interactions underlying African trypanosomiasis, mainly due to a lack of understanding of the T. brucei genome. In this study, we dissected the genomic and transcriptomic profiles of T. brucei by annotating the genome and analyzing gene expression. We found that about 5% of T. brucei proteins have homologs in the human proteome, whereas more than 80% of T. brucei proteins have homologs in other trypanosomes. Sequence alignment analysis showed that 142 protein homologs were shared among T. brucei and mammalian genomes. We identified several novel proteins with pathogenic potential supported by their molecular functions in T. brucei, including 24 RNA-binding proteins and six variant surface glycoproteins. In addition, 26 novel microRNAs were characterized, among which five miRNAs were not found in the mammalian genomes. Topology analysis of the miRNA-gene network revealed three genes (RPS27A, UBA52 and GAPDH) involved in the regulation of critical pathways related to the development of African trypanosomiasis. In conclusion, our work opens a new door to understanding parasite–host interaction mechanisms by resolving the genome and transcriptome of T. brucei.


Introduction
Tsetse flies are a type of blood-sucking insect that live in diverse locations in sub-Saharan Africa. Unlike many other insect vectors, tsetse flies feed solely on and proliferate

Protein Function Annotation
In the current version (Assembly ID: ASM244v1) of the T. brucei genome, 65.2% (5709/8758) of the proteins were annotated as "uncharacterized protein" or "hypothetical protein" (UCP). We applied a two-step strategy to re-annotate these proteins. We first performed homology searching by sequence alignment, a traditional way to predict protein function [11]. Pairwise sequence alignment was conducted (20 June 2022) between T. brucei and other animals using BLAST (version 2.13.0) with a cutoff E-value = 1 × 10⁻⁶. Because some sequence alignment similarities were relatively low, their homology was ambiguous and needed to be checked by secondary structure similarity. Proteins for which BLAST identified no homologs were therefore subjected to SSEalign [17], a homology detection tool based on the secondary structure of proteins. The statistics of the resulting protein homologs were calculated and plotted.
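The first step of this strategy can be illustrated with a minimal Python sketch. It assumes the BLAST results were exported in the standard 12-column tabular layout (`-outfmt 6`, where the E-value is the 11th column); the protein and subject identifiers in the demo are hypothetical and are not taken from the study. Queries that obtain no hit at or below the cutoff are collected as candidates for the second, structure-based step.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-6  # cutoff stated in the text


def best_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} with the lowest-E-value hit per query.

    Assumes BLAST tabular output (-outfmt 6): column 1 = query, column 2 = subject,
    column 11 = E-value.
    """
    hits = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # alignment too weak to call a homolog
        if query not in hits or evalue < hits[query][1]:
            hits[query] = (subject, evalue)
    return hits


# Hypothetical demo data: three query proteins, two of which lack a confident hit.
demo_output = io.StringIO(
    "Tb_0001\tsubj_A\t62.5\t240\t90\t0\t1\t240\t5\t244\t3e-40\t310\n"
    "Tb_0001\tsubj_B\t41.0\t200\t118\t2\t3\t202\t9\t208\t1e-10\t95\n"
    "Tb_0002\tsubj_C\t28.0\t90\t65\t1\t10\t99\t4\t93\t0.002\t33\n"
)
all_queries = {"Tb_0001", "Tb_0002", "Tb_0003"}

hits = best_hits(demo_output)
needs_structure_check = sorted(all_queries - set(hits))

print(hits)
print(needs_structure_check)
```

Keeping only the best hit per query is a design choice of this sketch; the text does not state how multiple hits for one protein were resolved.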

Identification of TbrMam Proteins
We searched for protein homology between T. brucei and mammals, given that mammals are the main hosts of trypanosomes. Based on proteome annotation details in the UniProt database, we selected ten well-studied species (Homo sapiens, Mus musculus, Rattus norvegicus, Bos taurus, Canis familiaris, Equus caballus, Cavia porcellus, Oryctolagus cuniculus, Sus scrofa, Rhesus macaque) as representative mammals and compared the protein profile of each with the T. brucei proteins using BLAST. Proteins with BLAST E-values less than 1 × 10⁻⁶ were considered to be present in both T. brucei and mammals. We denoted these shared proteins as TbrMam proteins in this study. Metascape (www.metascape.org) is a web tool that provides comprehensive system-level gene annotation for biologists [18]. Functional enrichment analysis of the TbrMam proteins was performed (10 July 2022) with Metascape to find the enriched molecular pathways and ontology terms. ShinyGO is an application developed in R and integrated with many genomic and biological pathway resources [19]. Genes that encode the TbrMam proteins were also analyzed with ShinyGO to characterize their genomic specificity.
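As a rough illustration of how a shared protein set could be derived from per-species BLAST results, the sketch below intersects the sets of T. brucei protein IDs that obtained a hit in each mammal. It follows one reading of the procedure, in which a protein must have a homolog in every selected species; the species keys and protein IDs are hypothetical placeholders.

```python
def shared_across_species(hits_by_species):
    """Return the protein IDs that have a hit in every species.

    hits_by_species maps a species name to an iterable of T. brucei protein IDs
    that passed the E-value cutoff against that species.
    """
    id_sets = [set(ids) for ids in hits_by_species.values()]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


# Hypothetical demo with three species instead of ten.
demo_hits = {
    "species_1": {"Tb_0001", "Tb_0002", "Tb_0005"},
    "species_2": {"Tb_0001", "Tb_0005", "Tb_0009"},
    "species_3": {"Tb_0001", "Tb_0003", "Tb_0005"},
}

tbrmam_like = shared_across_species(demo_hits)
print(sorted(tbrmam_like))
```

If a protein were instead required to match only one mammal, a set union would replace the intersection.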

Proteins Related to Pathogenicity
Trypanosomes have evolved various mechanisms for survival in their hosts. Three classes of proteins have been reported to contribute to trypanosome pathogenicity [20] and were scrutinized in this study: (1) RNA-binding protein (RBP): Trypanosomes lack specific promoters for protein-coding genes and rely heavily on RBPs to control RNA fate [21]. RBPs are critical to post-transcriptional gene regulation. Therefore, characterization of RBPs is a key premise for elucidating their pathogenic functions; (2) Variant surface glycoprotein (VSG): VSGs are major surface antigens recognized by the host immune system during trypanosome infection [22]. The modulation of VSGs by trypanosomes can prevent long-lasting immunity in the host; (3) Phosphatidylinositol 3-kinase (PI3K): PI3K is a heterodimeric complex composed of a regulatory subunit and a catalytic subunit. PI3K is a lipid kinase that regulates cellular processes such as proliferation, differentiation and survival in the trypanosome. A previous report suggested that this protein is essential for autophagosome formation in trypanosomes [23].

miRNA Identification
miRNAs play crucial roles in various diseases. Some miRNAs have been selected to target "undruggable" proteins and approved for pre-clinical trials [24]. Although many practical difficulties need to be overcome, miRNA-mediated methods represent a possible future treatment for some diseases [25]. This type of RNA is highly conserved among different animals; thus, novel miRNAs can be identified in a newly studied species by homolog searching. The EST-based method has been reported as a standard practice for identifying novel miRNAs in a studied species [13,14]. Bowtie is a widely used tool for aligning short sequences and RNAfold is an effective tool for predicting RNA structure; thus, these tools and their corresponding parameters were applied in this project. We aligned all known animal miRNAs against the EST sequences of T. brucei using Bowtie [26], tolerating at most two mismatches. The frequency of minimum free energy (FMFE) of the matched sequence fragments was then predicted by RNAfold [27]. A cutoff of FMFE ≤ 0.05 was used to assess the plausibility of the predicted structures. The remaining sequence fragments were regarded as novel miRNAs in this study. Furthermore, the identified miRNAs were compared with their homologs in other species, and the homolog distribution was plotted.
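The mismatch tolerance can be made concrete with a small Python sketch. This is not Bowtie's algorithm (Bowtie uses an index-based search and also handles the reverse strand); it is only a brute-force sliding-window comparison, shown to clarify what "at most two mismatches" means for a short query. The sequences are toy examples, and the function names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_matches(mirna, est, max_mismatch=2):
    """Return [(start, mismatches)] for every window of the EST that matches
    the miRNA with at most max_mismatch substitutions (no gaps, forward strand only).
    """
    query = mirna.upper().replace("U", "T")  # miRNAs are RNA; ESTs are stored as DNA
    target = est.upper()
    k = len(query)
    matches = []
    for start in range(len(target) - k + 1):
        distance = hamming(query, target[start:start + k])
        if distance <= max_mismatch:
            matches.append((start, distance))
    return matches


# Toy sequences (hypothetical, not from the study).
toy_mirna = "UGAGGUAG"
exact_est = "AAATGAGGTAGCCC"
one_off_est = "AAATGAGCTAGCCC"

print(find_matches(toy_mirna, exact_est))
print(find_matches(toy_mirna, one_off_est))
```

In the workflow described above, windows that pass this filter would then be folded and screened by the structural criterion before being accepted as candidate miRNAs.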

Homolog Distribution of Identified miRNAs
The homolog distribution can help researchers better understand the function of the studied molecules. The identified miRNAs were compared with their homologs in other species, and the homolog distribution was plotted. Because mammals are the main hosts of T. brucei, all known mammalian miRNAs were compared with the miRNAs identified in this study. The miRNAs unique to T. brucei were then selected for further analysis.

Network Analysis
miRWalk [28] and miRTarBase [29] are commonly used RNA databases that contain a large amount of miRNA function information; the target genes of the identified miRNAs were predicted using these two databases. Cytoscape is a widely used tool for drawing molecule-molecule interaction networks. The predicted target genes, together with the miRNAs, were used to construct a miRNA-gene network visualized by Cytoscape [30]. To identify reliable biomarkers in the miRNA-gene network, we analyzed the nodes of the network using the Cytoscape plug-in "NetworkAnalyzer", a common toolkit for analyzing network topology. We selected several critical miRNAs and explored their molecular functions.

Insects 2022, 13, 968

Potential Drug Search
In 2021, around 750 human cases of trypanosome-induced diseases were reported by the World Health Organization [31]. Currently available drugs for African trypanosomiasis are relatively unsatisfactory. Some drugs have been reported to have high levels of toxicity that resulted in fatal side effects; thus, searching for potential new drugs is an urgent need. We searched (28 August 2022) the DrugBank database (http://www.drugbank.ca/) for potential drugs for African trypanosomiasis. The drug names and their targets were recorded.

Homolog Distribution of Proteins
We compared the proteins of T. brucei with those of other animals by sequence alignment. We found that 87.6% of the T. brucei proteins had homologs in Trypanosoma equiperdum, while 73.8% of T. brucei proteins shared homology with Trypanosoma vivax (Figure 2A). Compared with previous results in other parasites, such as Leishmania [32], the protein homolog percentage among trypanosomes was relatively high. The higher proportion of protein homologs between trypanosomes suggests that they have evolved similar physiological characteristics for survival.
Because mammals are the main hosts of T. brucei, we also compared the protein homology between T. brucei and each mammal. A previous study showed that most vertebrates and invertebrates shared at least 30% protein homologs when analyzed by sequence alignment [33]. However, fewer than 5% of T. brucei proteins had homologs in the studied mammals (Figure 2B). For example, only 4.97% of T. brucei proteins had homologs in Homo sapiens (human), and only 4.80% had homologs in Sus scrofa (boar). We further compared the similarity of protein homologs between T. brucei and Homo sapiens and found that the protein similarities between the two species were mostly concentrated in the 35~40% range (Figure S1). Most of the protein similarities were below 60%, while a few were higher than 80%. The trypanosome proteome was thus markedly different from that of its mammalian hosts, indicating that most trypanosome proteins could be considered as drug target candidates for treatment.

Functional Enrichment of TbrMam Proteins
Mammals are the main hosts of this trypanosome; thus, the protein homologs among T. brucei and ten common mammals were compared by BLAST. We detected a set of 142 proteins present in both T. brucei and all studied mammalian genomes. These 142 proteins formed the TbrMam proteins. Enrichment analysis revealed that the term "Neutrophil degranulation" (R-HSA-6798695) was significantly over-represented in the TbrMam proteins, with a p-value = 1.48 × 10⁻¹¹ in the category of "Reactome Gene Sets" (Table 1). It was reported that neutrophils could enhance early infection of T. brucei [34]; thus, these proteins may function as pathogenic factors that attack the host immune system. In addition, the term "intracellular protein transport" (GO:0006886) was significantly enriched in the TbrMam proteins, with a p-value = 5.62 × 10⁻¹¹ in the category of "GO Biological Process", which suggested that T. brucei may use these proteins to acquire essential nutrients from mammalian hosts. We further investigated the disease categories associated with the TbrMam proteins using Metascape to identify which diseases were related to African trypanosomiasis. The top 20 enriched diseases are listed in Figure 3, where the term "HIV coinfection" was the most significantly associated disease (p-value = 6.3 × 10⁻¹⁰). Interestingly, trypanosome infection has been reported to be accompanied by HIV coinfection, and the phenomenon is regarded as a clinical event of great relevance [35,36]. It has been reported that the presence of an intracellular pathogen (such as a trypanosome) could impair HIV-1 transduction in in vitro cellular experiments [37]. Conversely, HIV aspartyl peptidase inhibitors could also act on pathogenic trypanosomes [38]. We believe that other enriched diseases could also be associated with trypanosome infection, which warrants future investigation.

Genomic Specificity of TbrMam Genes
We applied ShinyGO to analyze the genomic specificity of the T. brucei genes that encode the TbrMam proteins. We compared four aspects: exon number, coding sequence length, GC content and 3′-UTR length (Figure 4). When exon numbers were compared between the TbrMam genes and the remaining genes in the genome, the Chi-squared test gave a p-value of 0.07, which is marginal and above the conventional 0.05 threshold. In contrast, a significantly low p-value (0.00065) was obtained when the coding sequence length was compared by Student's t-test. These results indicated that the identified TbrMam genes could share strong transcription characteristics, which might be involved in the survival of T. brucei in the host. Moreover, an extremely low p-value (0.00067) was observed for 3′-UTR length, while an acceptable p-value (0.041) was obtained for GC content. Because the 3′-UTR is the binding region of miRNAs, the 3′-UTRs of these genes might regulate gene expression in the trypanosome-host interaction.

Proteins Related to Pathogenicity
Trypanosomes have evolved many mechanisms for survival in the host. Three types of proteins have been reported to contribute to T. brucei pathogenicity (Table 2). The details of these proteins are discussed as follows.
Variant surface glycoprotein (VSG) is an important protein that has long been thought to be activated by the hydrolysis of its glycolipid membrane anchor [39]. In this study, we identified six novel VSGs in T. brucei by homology searching. All these VSGs showed an E-value of 0 in the sequence alignment, which is the lowest value reported by BLAST. Five VSGs shared an identity > 99%, suggesting that they had almost the same amino acid sequences. These results indicated that the identified VSGs were highly reliable in T. brucei.
RNA-binding proteins (RBPs) play a particularly important role in regulating gene expression in trypanosomes [40]. Therefore, characterization of the RNA molecules bound by RBPs represents a key step in elucidating their function. Using sensitive database searching, a set of 24 novel RBPs was identified in T. brucei. Most of these RBPs are highly similar to known RBPs, with an alignment identity > 90%. In particular, 16 RBPs showed an alignment identity of 100%, meaning that these RBPs are identical to known sequences. These results also indicated that RBPs were conserved among different trypanosomes.

Phosphatidylinositol-3 kinase (PI3K) is one of the most frequently activated pathogenic signaling proteins in human disease [41]. Two PI3Ks were identified in T. brucei by the sequence alignment method. Multiple sequence alignment of PI3K in various species conducted using Clustal Omega [42] suggested a very low similarity of PI3K between trypanosomes and mammals, while high similarities were observed among trypanosomes (Figure 5). These results suggest that PI3K might serve as a reliable drug target to control trypanosomes.

miRNA Identification
miRNAs are a type of conserved non-coding RNA involved in post-transcriptional regulation. Although many practical difficulties need to be overcome, miRNA-mediated methods represent a possible future treatment for some diseases [43]. Currently, no trypanosome miRNAs have been reported in the miRBase database; thus, a systematic identification of T. brucei miRNAs was needed. The EST-based method is an effective method for identifying novel miRNAs in published genomes [13,14]. By aligning T. brucei sequences against known miRNAs in the miRBase database, 26 miRNAs were identified. Attributes of these miRNAs, including length, EST accession and mismatch, are listed in Table 3. The length of these miRNAs ranged from 18 nt to 23 nt. The mismatch value is an important parameter for discriminating miRNAs from other non-coding RNAs. One miRNA (tbr-miR-466i-5p) showed mismatch = 0, suggesting that it is identical to a known miRNA (mmu-miR-466i-5p). A set of 15 identified miRNAs showed only one base difference (mismatch = 1) from known miRNAs. The 26 identified miRNAs were categorized into 23 different families based on the miRNA classification. Only three miRNA families (miR-3613, miR-466, miR-551) had more than one member (tbr-miR-3613 and tbr-miR-3613-5p; tbr-miR-466f-3p and tbr-miR-466i-5p; tbr-miR-551-3p and tbr-miR-551b-3p).
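The family grouping reported above can be reproduced approximately from the miRNA names alone. The following Python sketch uses a naming heuristic, which is an assumption of this example rather than the classification procedure used in the study: the family is taken to be the "miR-" prefix plus the leading number, ignoring the letter suffix and the arm designation (-3p/-5p). Only names that appear in the text are used as input.

```python
import re
from collections import defaultdict

# Three-letter species prefix, then "miR-" and the family number.
FAMILY_PATTERN = re.compile(r"^[a-z]{3}-(miR-\d+)")


def family_of(name):
    """Return the family label implied by a miRNA name, e.g. 'miR-466'."""
    match = FAMILY_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized miRNA name: {name!r}")
    return match.group(1)


def group_by_family(names):
    """Map each family label to the sorted list of member names."""
    families = defaultdict(list)
    for name in names:
        families[family_of(name)].append(name)
    return {family: sorted(members) for family, members in families.items()}


names_from_text = [
    "tbr-miR-3613", "tbr-miR-3613-5p",
    "tbr-miR-466f-3p", "tbr-miR-466i-5p",
    "tbr-miR-551-3p", "tbr-miR-551b-3p",
    "tbr-miR-1599",
]

families = group_by_family(names_from_text)
multi_member = sorted(f for f, members in families.items() if len(members) > 1)
print(multi_member)
```

A curated family assignment, such as the one maintained by miRBase, may differ from this name-based grouping for some miRNAs.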

Homolog Distribution of Identified miRNAs
T. brucei can cause African trypanosomiasis, which is one of the neglected tropical diseases worldwide [44]. After searching the miRBase database, 57.7% (15/26) of the T. brucei miRNAs were found to have homologous miRNAs in the Homo sapiens (human) genome (Table 4). Although T. brucei is a protozoan, it shared fewer homologous miRNAs with Platyhelminthes (such as Schmidtea mediterranea) than with mammals (such as human). Moreover, some miRNAs showed only one homolog in other species, meaning that this type of miRNA was rare in animals. A set of 21 miRNAs had homologs detected in mammals, while five miRNAs (tbr-miR-2491-3p, tbr-miR-752-3p, tbr-miR-8406-3p, tbr-miR-1599 and tbr-miR-4171-5p) were unique to T. brucei, with no homologs found in mammalian hosts. Interestingly, only one T. brucei miRNA (miR-2491-3p) was identified in the tsetse fly but could not be found in the human genome. Since the tsetse fly is the vector of trypanosomes and can transmit T. brucei to humans, we hypothesized that this miRNA might be a critical molecule involved in T. brucei infection.

Table 4. Homolog distribution of the identified miRNAs compared with other species. A "YES" block indicates that a homolog of the corresponding miRNA was present in the corresponding species. Blue blocks indicate mammals, while yellow blocks indicate non-mammal species. Abbreviations of the species: bta, Bos taurus; cli, Columba livia; cin, Ciona intestinalis; cpo, Cavia porcellus; dme, Drosophila melanogaster; gga, Gallus gallus; hsa, Homo sapiens; mmu, Mus musculus; sma, Schistosoma mansoni; sme, Schmidtea mediterranea; ssc, Sus scrofa; tch, Tupaia chinensis.

[Table 4 column groups: Mammal, Non-Mammal; species columns: hsa, mmu, bta, cpo, ssc, gga, sma, sme, tch, dme, cli, cin]

Network Analysis
To explore the regulatory mechanisms of miRNAs in T. brucei, we constructed the miRNA-gene network using Cytoscape. The resulting network was composed of 15 miRNAs and 551 edges (Figure S2). Most genes were found to be co-regulated by more than two miRNAs, and these genes were presumed to work systematically through co-regulating miRNAs in specific biological processes. We then applied the "NetworkAnalyzer" toolkit inside Cytoscape to calculate the betweenness centrality of each edge in the network (Table S1). We further selected the top edges to construct a sub-network (Figure 6). In the resulting sub-network, several key nodes were found: RPS27A, UBA52 and GAPDH. RPS27A was reported to regulate proliferation, promote cell progression and inhibit apoptosis of leukemia cells. This gene might act as a controller of microglia activation in triggering neurodegeneration in trypanosome infection [45]. UBA52 regulates ubiquitination of the ribosome and sustains embryonic development, and is essential for virus replication during virus-host interaction in chicken [46]. GAPDH is a multifunctional protein present in both eukaryotes and prokaryotes. This molecule has been reported to be correlated with redox status in trypanosomes [47]. These results suggest that the identified nodes are critical molecules involved in T. brucei infection.
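To make the edge-ranking step more tangible, here is a minimal pure-Python sketch of edge betweenness centrality for an unweighted, undirected network, following Brandes' accumulation scheme. It is not the NetworkAnalyzer implementation, and the reported values are unnormalized, so they may be scaled differently from Cytoscape's output. The miRNA and gene labels in the toy network are hypothetical placeholders.

```python
from collections import defaultdict, deque


def edge_key(a, b):
    """Order-independent key for an undirected edge."""
    return tuple(sorted((a, b)))


def edge_betweenness(edges):
    """Unnormalized edge betweenness for an unweighted, undirected graph.

    For each edge, counts the shortest paths between node pairs that pass
    through it (fractionally when several shortest paths exist).
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    scores = {edge_key(a, b): 0.0 for a, b in edges}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Back-propagate dependencies from the farthest nodes to the source.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                share = path_count[v] / path_count[w] * (1.0 + dependency[w])
                scores[edge_key(v, w)] += share
                dependency[v] += share
    # Every node pair was visited from both ends, so halve the totals.
    return {edge: value / 2.0 for edge, value in scores.items()}


# Hypothetical toy network: two miRNAs sharing one target gene.
toy_edges = [("miR-A", "gene1"), ("miR-A", "gene2"), ("miR-B", "gene2"), ("miR-B", "gene3")]
scores = edge_betweenness(toy_edges)
top_edges = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores)
print(top_edges)
```

In this toy chain the two edges touching the shared gene score highest, which mirrors the idea that co-regulated genes sit on many shortest paths of a miRNA-gene network.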

Drug Target Search
By searching for potential drugs in the DrugBank database [48], we identified six candidates for treating T. brucei infection (Table 5). Some of these drugs are in the experimental stage, while others, such as Decitabine and Mebendazole, have been approved by the FDA for the treatment of other diseases. In this study, Decitabine was predicted to target Adenylate kinase (XP_827176.1) in T. brucei. Decitabine is a pyrimidine nucleoside analog used for the treatment of myelodysplastic syndromes by inducing alterations in gene expression [49]. Mebendazole was predicted to target Alpha tubulin (XP_001218940.1) by drug target analysis. Mebendazole is a benzimidazole anthelmintic used to treat parasite infections. It acts by interfering with carbohydrate metabolism and inhibiting the polymerization of microtubules [50]. We believe these drugs could also be considered for the treatment of T. brucei infection.

Discussion
Tsetse flies are a type of blood-sucking insect that lives in diverse locations in sub-Saharan Africa. They are insect vectors that transmit the parasite T. brucei. This parasite can cause a tropical disease called African trypanosomiasis. Because the treatment for this disease is not satisfactory, exploring the T. brucei genome in more detail is urgent [2]. With the development of bioinformatics technology, more and more tools have become available, which offers an unprecedented opportunity for genome-wide identification of proteins and miRNAs in this parasite.
A set of 8758 proteins is reported in the current version (ASM244v1) of the T. brucei genome, among which the functions of 5709 proteins remain unknown. The proteins of T. brucei and other trypanosomes were compared by sequence alignment. The protein homolog proportion between two studied trypanosomes was larger than 80%, with the highest proportion found between T. brucei and T. equiperdum. However, when the trypanosome was compared with mammalian genomes, the corresponding homolog proportion was extremely low. All the mammals shared less than 5% protein homology with T. brucei, meaning that the trypanosome is highly different from mammals at the genome level.
To investigate the possible evolutionary mechanism underlying the molecular function of this parasite, the protein homologs shared between T. brucei and mammals were studied. A set of 142 proteins was found to be present in both T. brucei and mammalian genomes, and these proteins were denoted as TbrMam proteins. By Metascape enrichment analysis, we found that these TbrMam proteins were significantly enriched in related functions, such as "Cellular responses to stress" and "cell redox homeostasis". This result suggested that T. brucei may use some proteins to acquire and transport essential nutrients from mammalian hosts. Trypanosomes live in the blood of mammals, where high oxidative stress is present [51]. Thus, trypanosomes have evolved a strong antioxidant defense system to cope with these stressors. A previous study reported that trypanosomes have evolved a rapid and precise response to oxidative stress and maintain redox homeostasis [52].
By ShinyGO genomic analysis, we further found that these TbrMam genes were significantly enriched in four genomic aspects (number of exons, coding sequence length, GC content and 3′-UTR length). In addition, by comparison with proteins in other trypanosomes, we identified 32 novel pathogenic proteins (24 RBPs, 6 VSGs and 2 PI3K proteins) in T. brucei. These proteins were shared among different trypanosomes. These pathogenic proteins are presumed to be deeply involved in trypanosome-host interaction and could be used as drug targets in future studies.
Using the EST-based approach, we identified 26 new miRNAs in T. brucei. A set of 15 identified miRNAs showed only one base difference from known miRNAs, indicating that the results of our miRNA identification are highly reliable. By comparison with miRNAs in other animals, we found that human miRNAs shared the greatest number of T. brucei miRNAs, suggesting that trypanosomes may share a similar miRNA evolution mechanism with humans. Five miRNAs (tbr-miR-2491-3p, tbr-miR-752-3p, tbr-miR-8406-3p, tbr-miR-1599 and tbr-miR-4171-5p) were found to have no homologs in mammalian hosts. miR-1599 was previously found specifically in the chicken genome. miR-1599 could bind to Toll-like receptors, which recognize diverse pathogen-associated molecular patterns and play a critical role in the immune response [53].
We then constructed a miRNA-gene network using Cytoscape. Network topological analysis suggested that several nodes (RPS27A, UBA52 and GAPDH) are located in the center of the network. These central nodes might serve as drug targets for trypanosome infection. Previous studies suggested that trypanosomes depend entirely on glycolysis for their energy supply [4]. Therefore, targeting this special energy metabolism is one of the effective strategies to control trypanosomes [54]. Within glycolysis, GAPDH is the most important enzyme. GAPDH was suggested as a potential drug target and was validated in a previous study [55]. Inhibitors of trypanosome GAPDH may represent promising lead structures for the design of innovative anti-trypanosome agents.

Conclusions
We applied a series of bioinformatics tools to study the protein-coding genes and miRNAs in T. brucei. Through comparative analyses of T. brucei and mammalian genomes, we identified several novel proteins and miRNAs, which could be used as novel biomarkers for studying parasite-host interactions. These molecules could serve as effective targets for future drug development against African trypanosomiasis.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/insects13110968/s1, Figure S1: The percentage of protein similarity between T. brucei and human; Figure S2: Full view of miRNA-gene interaction network; Table S1: TbrMam protein list; Table S2: betweenness centrality of edge in miRNA-gene interaction network.

Data Availability Statement: All data are available from the NCBI genome database (gene), UniProt database (protein) and miRBase database (miRNA).