Genetic Diversity and Population Genetic Structure Analysis of Plasmodium knowlesi Thrombospondin-Related Apical Merozoite Protein (TRAMP) in Clinical Samples

The simian malaria parasite Plasmodium knowlesi causes a high number of zoonotic infections in Malaysia. The thrombospondin-related apical merozoite protein (TRAMP) is an essential ligand for binding to the erythrocyte cell surface, whereby it facilitates the invasion. This study is the first attempt to determine the genetic diversity, phylogeography, natural selection and population structure from 97 full-length PkTRAMP gene sequences originating from Malaysia. We found low levels of nucleotide diversity (π~0.0065) for the full-length gene despite samples originating from geographically separated regions (i.e., Peninsular Malaysia and Malaysian Borneo). The rate of synonymous substitutions was significantly higher than that of non-synonymous substitutions, indicating a purifying selection for the full-length gene within the clinical samples. The population genetic analysis revealed that the parasite population is undergoing a significant population expansion. The analysis of the amino acid sequence alignment of 97 PkTRAMP sequences identified 15 haplotypes, of which a major shared haplotype was noted Hap 1 (n = 68, Sarawak; n = 34, Sabah; n = 12, Peninsular Malaysia; n = 22). The phylogenetic analysis using DNA sequences identified two clusters that separated due to geographical distance and three mixed clusters with samples from both Peninsular Malaysia and Malaysian Borneo. Population structure analyses indicated two distinct sub-populations (K = 2). Our findings point to the potential for independent parasite evolution, which could make zoonotic malaria control and elimination even more challenging.


Introduction
Malaria represents one of the most fatal diseases in the developing world, occurring mainly in tropical and sub-tropical regions. Despite considerable advances in our understanding of the disease, malaria continues to be one of the most serious threats to public health, globally contributing to a high morbidity and mortality rate. According to the World Malaria Report 2022, there were an estimated 241 million malaria cases and 627,000 deaths in 2020. Notably, as per the report, there was a significant rise in malaria incidences during the COVID-19 pandemic, thereby affecting various malaria control strategies [1]. Therefore, newer control measures are urgently needed to combat the disease. Among the five Plasmodium spp. known to cause human malaria, P. knowlesi, a simian malaria parasite, emerged as a major concern in Southeast Asian countries [2,3]; specifically, a high number of cases are being reported from Malaysian Borneo (Sabah and Sarawak) and Peninsular Malaysia [4], thus resulting in malaria researchers continuing to develop effective control measures such as the identification of potential malaria vaccine candidates. Previous P. knowlesi whole-genome and genetic studies from clinical samples of Sarawak and Malaysian Borneo revealed Macaca fascicularis and Macaca nemestrina to be the primary monkey hosts [5,6]. In vitro studies have revealed that the parasite can adapt to both human and macaque RBCs [7]. P. knowlesi is known to share microscopic similarities with P. falciparum, P. malariae and P. vivax [8]. Physiological symptoms of P. knowlesi infections include diarrhea, vomiting and multi-organ failure, such as kidney failure, and in extreme cases may even lead to death [9]. P. knowlesi is an ideal parasite model for malaria research because of its ability to invade primate, i.e., Rhesus macaque, red blood cells (RBCs) [10]. The erythrocytic life-cycle of P. knowlesi is very short, being completed in 24 h, a process that otherwise takes 48-72 h in other P. species [11]. These rapid developments of parasitemia in the host makes P. knowlesi even more life-threatening. The P. genome sequencing makes it possible to detect new proteins that have important functions in the parasite life-cycle. Among them are the host-specific cell invasion proteins that trigger the parasite to invade the RBCs and are therefore viewed as important malaria vaccine targets [12]. The invasion of malaria parasites inside the host RBCs is a complex process involving molecular interactions between the parasite ligand and host cell receptor [13]. However, for vaccine development and efficacy, antigenic variation among field isolates is a major challenge. Various genome sequencing studies from clinical samples of P. knowlesi exhibited extensive genomic dimorphism [6]. Although various erythrocytic-stage antigens of P. knowlesi were studied from clinical samples, e.g., Duffy binding protein (DBP) and several merozoite surface proteins (MSP-1, MSP-1 paralog and MSP-3), none of them were found to be under a strong positive natural selection [5,14]. Therefore, for any merozoite surface protein, it is important to determine the level of polymorphisms, the type of natural selection and its importance, in order to study it as a vaccine candidate. Apicomplexan parasites possess a highly conserved region known as the thrombospondin structural homology repeats (TSR), which plays a major role in host cell invasion. In Plasmodium spp., the TSR has been identified and is widely characterized for the thrombospondin-related adhesive protein (TRAP) gene, which is involved in sporozoite gliding motility and mosquito salivary gland invasion. One such important TSR-containing gene is the thrombospondin-related apical merozoite protein (TRAMP), which binds to the host cell surface and increases the binding affinity of the host-ligand association [15]. Located in the merozoites, TRAMP facilitates the entry of parasites into RBCs [16]. P. falciparum TRAMP is a protein that is present in the organelles present at the apical portion of merozoites within intra-erythrocytic schizonts [17]. Orthologs of this gene have been reported in other P. spp., showing the specific conserved function of host cell invasion by the TRAMP gene. Similarly, the orthologs of Pf TRAMP have also been observed in P. vivax. All this information on the TRAMP gene makes it an ideal malaria vaccine candidate, and yet no prior study has been conducted on the genetic diversity of the P. knowlesi TRAMP gene. Therefore, the present study aimed to compare the PkTRAMP gene to its orthologs, i.e., P. vivax, P. falciparum and P. coatneyi, and to determine the genetic diversity, natural selection, population structure and phylogeographic analysis of a full-length PkTRAMP gene obtained from 97 clinical samples originating from 3 regions of Malaysia, i.e., Sarawak, Sabah from Malaysian Borneo and Peninsular Malaysia.

PkTRAMP Sequence Data
Ninety-seven full length PkTRAMP gene sequences originating from Peninsular Malaysia (n = 42) and Malaysian Borneo (n =55; Betong (n = 12), Kapit (n =21), Sarikei (n = 9) and Sabah (n = 13)) were retrieved from public databases (https://www.ncbi.nml.nih.gov (accessed on 23 June 2022)) and (https://www.ebi.ac.uk/ena/browser/home (accessed on 23 June 2022)) from clinical samples (along with the H_strain, PKNH_1437600). A map representing the various geographical regions of the samples used in this study is depicted in Figure 1 and Supplementary Table S1. Signal peptide for the full-length PkTRAMP gene was identified using Signal IP 5.0 prediction software [18]. Alignments of sequence data were performed using the CLUSTAL-W program in MegAlign Lasergene v 7.0 (DNASTAR, Madison, WI, USA) software. To determine the relationship among the PkTRAMP sequences (clinical samples from Malaysian Borneo and Peninsular Malaysia), a phylogenetic analysis was performed using nucleotide sequences by using the Maximum Likelihood (ML) method with 1000 bootstraps in MEGA 5.0 software [19]. To determine the evolutionary relationships, TRAMP ortholog sequences in P. falciparum (PF3D7_1218000), P. vivax (PVX_123575), P. coatneyi (PCOAH_00055120) and P. knowlesi H-strain (PKNH_1437600) were also included.

PkTRAMP Sequence Data
Ninety-seven full length PkTRAMP gene sequences originating from Peninsular Malaysia (n = 42) and Malaysian Borneo (n =55; Betong (n = 12), Kapit (n =21), Sarikei (n = 9) and Sabah (n = 13)) were retrieved from public databases (https://www.ncbi.nml.nih.gov (accessed on 23rd June 2022)) and (https://www.ebi.ac.uk/ena/browser/home (accessed on 23rd June 2022)) from clinical samples (along with the H_strain, PKNH_1437600). A map representing the various geographical regions of the samples used in this study is depicted in Figure 1 and Supplementary Table S1. Signal peptide for the full-length PkTRAMP gene was identified using Signal IP 5.0 prediction software [18]. Alignments of sequence data were performed using the CLUSTAL-W program in MegAlign Lasergene v 7.0 (DNASTAR, Madison, WI, USA) software. To determine the relationship among the PkTRAMP sequences (clinical samples from Malaysian Borneo and Peninsular Malaysia), a phylogenetic analysis was performed using nucleotide sequences by using the Maximum Likelihood (ML) method with 1000 bootstraps in MEGA 5.0 software [19]. To determine the evolutionary relationships, TRAMP ortholog sequences in P. falciparum (PF3D7_1218000), P. vivax (PVX_123575), P. coatneyi (PCOAH_00055120) and P. knowlesi H-strain (PKNH_1437600) were also included.

Sequence Diversity and Natural Selection
DnaSP v5.10 software was used to study the sequence diversity (π) [20]. The software was also used to evaluate the number of polymorphic sites, synonymous substitutions (silent mutations) and nonsynonymous substitutions (replacement change), singletons (a variant of nucleotide that appears once in sequence), the number of haplotypes (H), parsimony informative sites (sites with a minimum of two nucleotides that appear at least twice) and haplotype diversity for the full-length PfTRAMP gene [20] along with the reference strain H_strain, PKNH_1437600. Nucleotide diversity was graphically represented using a window length of 80 bp and a step size of 15 bp in DnaSP software. Polymorphism and phylogenetic analyses were performed in MEGA 5.0 software [19].
Furthermore, to determine natural selection, Nei and Gojobori's method was used to compute the rate of nonsynonymous substitutions per nonsynonymous site (dN) and the rate of synonymous substitutions per synonymous site (dS). In addition to this, to calculate the natural selection, other analyses such as Tajima's D, Fu & Li's D* and F* neutrality tests were also conducted using the DnaSP v5.10 software. Under neutrality, Tajima's D value is expected to be zero. A significant and positive value of Tajima's D represents

Sequence Diversity and Natural Selection
DnaSP v5.10 software was used to study the sequence diversity (π) [20]. The software was also used to evaluate the number of polymorphic sites, synonymous substitutions (silent mutations) and nonsynonymous substitutions (replacement change), singletons (a variant of nucleotide that appears once in sequence), the number of haplotypes (H), parsimony informative sites (sites with a minimum of two nucleotides that appear at least twice) and haplotype diversity for the full-length Pf TRAMP gene [20] along with the reference strain H_strain, PKNH_1437600. Nucleotide diversity was graphically represented using a window length of 80 bp and a step size of 15 bp in DnaSP software. Polymorphism and phylogenetic analyses were performed in MEGA 5.0 software [19].
Furthermore, to determine natural selection, Nei and Gojobori's method was used to compute the rate of nonsynonymous substitutions per nonsynonymous site (dN) and the rate of synonymous substitutions per synonymous site (dS). In addition to this, to calculate the natural selection, other analyses such as Tajima's D, Fu & Li's D* and F* neutrality tests were also conducted using the DnaSP v5.10 software. Under neutrality, Tajima's D value is expected to be zero. A significant and positive value of Tajima's D represents positive/balancing selection, while a negative value indicates negative selection or population expansion. Furthermore, DnaSP software was also used to represent the graphical illustration of Tajima

Population Structure and Genetic Differentiation Analysis
To estimate the population structure of P. kTRAMP, STRUCTURE v 2.3.4 software was used [21]. An admixture model was selected within the software to determine the most likely number of populations (K). For each population, K was set from 1 to 10 and a single run was conducted with 15 iterations. A Markov Chain Monte Carlo generation of 500,000 was used for each run following a burn-in period of 50,000 steps. The STRUCTURE Harvester [22] online server was used to estimate the most likely K-value by computing ∆K values that were based on the rate of change in log probability (LnP(D)) between consecutive K-values. ARLEQUIN version 3.5.1.3 software (University of Berne, Bern, Switzerland) [23] was further used to compute the pair-wise differences (F ST ) between P. knowlesi parasite populations originating from Peninsular Malaysia and Malaysian Borneo. F ST values can be interpreted as no genetic differentiation when the value is 0, lower when <0-0.05, moderate when 0.05-0.15, and high differentiation when F ST is 0.15-0.25.

PkTRAMP Sequence Identity within Ortholog members and Diversity within P. knowlesi Population
The signal peptide of the PkTRAMP protein was identified between amino acid positions 1 and 20, as shown in Supplementary Figure S1. The analysis of amino acid sequences revealed that the P. knowlesi TRAMP (reference H strain) protein has 90.29%, 88.24% and 61.95% sequence identities with its orthologs in P. coatneyi, P. vivax and P. falciparum, respectively. The schematic representations of the TRAMP gene structure and inter-species amino acid polymorphism were observed in the thrombospondin structural homology repeat (TSR) domain and transmembrane domain (TM) and are shown in Figure 2A. The inter-species phylogenetic relationships between TRAMP orthologs are shown in Figure 2B.

Nucleotide Diversity and Polymorphisms
The sequence analysis of the full-length PkTRAMP gene (n = 97) identified 53 polymorphic sites, of which 36 were synonymous and 17 were nonsynonymous substitutions sites. Among these, 23 sites were from two variants, 69 haplotypes with a haplotype diversity 0.99 ± 0.003 (Table 1). The overall nucleotide diversity of 97 full-length PkTRAMP was found to be π = 0.00652 ± 0.00028 (Table 1). The region-wise analysis of 97 clinical samples, i.e., Peninsula Malaysia (n = 42), Sarawak (n = 42) and Sabah (n = 13), is shown in Table 1. The nucleotide diversity of Peninsular Malaysia (π = 0.00662 ± 0.00035) is found to be greater than that of Sabah (π = 0.00615 ± 0.00051) and Sarawak (π = 0.00457 ± 0.00029) ( Table 1). The haplotype diversity values of Peninsular Malaysia, Sabah and Sarawak are found to be in the same range (Table 1). In these three locations, the rate of synonymous substitutions is found to be higher. The higher number of singleton sites within TRAMP gene samples originating from Peninsular Malaysia and Sarawak has led to a higher amount of haplotype diversity and number of haplotypes, as shown in Table 1.

Natural Selection in PkTRAMP
To determine whether natural selection contributes to polymorphism in the PkTRAMP full-length gene, multiple tests were conducted at the intra-species level. The full-length gene analysis of 97 sequences showed negative values for dS-dN, Tajimas's D and Fu & Li D* and F*, indicating a purifying selection and population expansion ( Table 1). The values obtained for Fu & Li D* are significant for the overall samples and Sarawak samples (Table 1). A graphical representation of the nucleotide diversity and Tajima D values, indicating regions of high diversity (showing peaks) within the gene, is shown in Figure 4A,B.

Natural Selection in PkTRAMP
To determine whether natural selection contributes to polymorphism in the PkTRAMP full-length gene, multiple tests were conducted at the intra-species level. The full-length gene analysis of 97 sequences showed negative values for dS-dN, Tajimas's D and Fu & Li D* and F*, indicating a purifying selection and population expansion ( Table  1). The values obtained for Fu & Li D* are significant for the overall samples and Sarawak samples (Table 1) Figure 4A,B.

Phylogenetic Analysis
The phylogenetic analysis of 97 full-length PkTRAMP nucleotide sequences using the ML method showed an intermixed clustering of samples along with some pure clusters that constituted samples originating from a single region. The clusters were named as pure cluster and mixed cluster. Three pure clusters were identified, of which two included samples from Peninsular Malaysia (pure clusters 1 & 2) and one from Malaysian Borneo (pure cluster 3), as were three mixed clusters, which constituted samples from different regions of Malaysia ( Figure 5). Overall, the phylogenetic tree indicated the presence of two types of parasite populations.

Phylogenetic Analysis
The phylogenetic analysis of 97 full-length PkTRAMP nucleotide sequences using the ML method showed an intermixed clustering of samples along with some pure clusters that constituted samples originating from a single region. The clusters were named as pure cluster and mixed cluster. Three pure clusters were identified, of which two included

Genetic Differentiation and Population Structure
The population structure of the three distinct geographical regions was estimated using pair-wise genetic differentiation (FST) values using ARLEQUIN software version 3.5.1.3. A moderate level of genetic differentiation was observed between samples from Sarawak and Sabah (FST = 0.01146; p < 0.05), followed by Peninsular Malaysia and Sabah (FST = 0.00045; p < 0.05), as shown in Table 2. Significant differentiation values were observed between the considered regions. Bayesian inference produced the two most significant genetic populations based on their significant K value (K = 2), indicated in red and green colors ( Figure 6A) and the corresponding ∆K =730 values are shown in ( Figure 6B)

Genetic Differentiation and Population Structure
The population structure of the three distinct geographical regions was estimated using pair-wise genetic differentiation (F ST ) values using ARLEQUIN software version 3.5.1.3. A moderate level of genetic differentiation was observed between samples from Sarawak and Sabah (F ST = 0.01146; p < 0.05), followed by Peninsular Malaysia and Sabah (F ST = 0.00045; p < 0.05), as shown in Table 2. Significant differentiation values were observed between the considered regions. Bayesian inference produced the two most significant genetic populations based on their significant K value (K = 2), indicated in red and green colors ( Figure 6A) and the corresponding ∆K =730 values are shown in ( Figure 6B).

Discussion
Located at the merozoite surface, blood-stage antigens play a significant role in the parasite invasion of the host RBC membrane. During merozoite egress from the liver cells, these antigens, being directly exposed to host defensive immunity, represent an important vaccine candidate. In the context of P. knowlesi, no study has been conducted to genetically characterize the PkTRAMP gene. This study is perhaps a preliminary attempt, involving the comparative analysis of the P. knowlesi TRAMP gene with its orthologous sequences, which are P. falciparum, P. vivax and P. coatneyi, in order to understand the antigenic variability, natural selection and population structure in clinical samples from Malaysia. P. knowlesi protein PkTRAMP contains Thrombospondin structural repeats (TSR) and the transmembrane domain. At various levels of the parasite's life cycle, the TSR domain plays an important role in the molecular interaction between the parasite and diverse host tissues, guiding motility and host cell recognition [12]. The characterization of the TSR domain showed that its expression in P. knowlesi merozoites implicated its importance in erythrocytic cell invasion [16]. In the case of P. falciparum, the TRAMP gene is proteolytically cleaved and then released during the host cell invasion in the erythrocytic cell [17]. The main aim of this study was to genetically characterize the PkTRAMP gene from clini-

Discussion
Located at the merozoite surface, blood-stage antigens play a significant role in the parasite invasion of the host RBC membrane. During merozoite egress from the liver cells, these antigens, being directly exposed to host defensive immunity, represent an important vaccine candidate. In the context of P. knowlesi, no study has been conducted to genetically characterize the PkTRAMP gene. This study is perhaps a preliminary attempt, involving the comparative analysis of the P. knowlesi TRAMP gene with its orthologous sequences, which are P. falciparum, P. vivax and P. coatneyi, in order to understand the antigenic variability, natural selection and population structure in clinical samples from Malaysia. P. knowlesi protein PkTRAMP contains Thrombospondin structural repeats (TSR) and the transmembrane domain. At various levels of the parasite's life cycle, the TSR domain plays an important role in the molecular interaction between the parasite and diverse host tissues, guiding motility and host cell recognition [12]. The characterization of the TSR domain showed that its expression in P. knowlesi merozoites implicated its importance in erythrocytic cell invasion [16]. In the case of P. falciparum, the TRAMP gene is proteolytically cleaved and then released during the host cell invasion in the erythrocytic cell [17]. The main aim of this study was to genetically characterize the PkTRAMP gene from clinical samples from Sarawak, Sabah and Peninsular Malaysia, where a high number of cases of P. knowlesi is being reported. The low level of polymorphism in the TSR domain revealed from the inter-species analysis of the TRAMP gene indicates that it might be an excellent candidate; however, immunological and functional studies need to be conducted. The amino acid alignment analysis of Pf TRAMP and PvTRAMP revealed that both species share approximately a 88.2% identity, indicating the potential use of this antigen for vaccine development [12,15]. Moreover, the amino acid sequence analysis of P. knowlesi reference Hstrain was found to have 90.29%, 88.24% and 61.95% sequence identities with its orthologs P. coatneyi, P. vivax and P. falciparum, respectively. A comparison of amino acid haplotypes of PkTRAMP revealed that one major haplotype (Haplotype-1) had the highest frequency with samples from all regions of Malaysia.
The test for the natural selection analysis (dS-dN) indicated that the PkTRAMP gene was under negative selection. Furthermore, the negative values of other population genetic analysis, e.g., Taj's D and Li & Fu's D* and F*, were also indicative of PkTRAMP being under purifying selection. The significant values for Li & Fu's D* and F* were also indicative of population expansion, owing to a high number of singleton sites. The graphical analysis of the nucleotide diversity and Taj's D values showed the highest peaks towards the end of the 3 -region (nucleotide positions: 800-900), which had the highest Taj'D value, implicating a higher degree of polymorphism compared to the rest of the gene. Several similar studies with a strong negative selection pressure on P. knowlesi merozoite surface antigens have been reported [5,[24][25][26][27][28].
The phylogenetic analysis's findings revealed that among the 97 PkTRAMP, some clusters formed as a result of geographic clustering, but host-specific clustering resulting from adaptation to the natural hosts Macaca fascicularis and Macaca nemestrina, as identified by other previous studies [24,25,27], was not noted. Three pure clusters (geographical clustering) were identified, i.e., clusters 1, 2 and 3, two of which originated from Peninsular Malaysia and one from Malaysian Borneo. These clusters observed in the phylogenetic analysis may be due to the adaption and population expansion of the parasite to the primary hosts, i.e., the macaque species. This was represented by the genetic diversity in spite of samples being collected from the same location. However, one should note that a robust population genetic structure analysis using STRUCTURE revealed that only two major sub-populations (K = 2) exist in Malaysia, similar to previous studies [29].

Conclusions
This is the first study to determine the level of polymorphism, natural selection and population structure of the PkTRAMP gene from clinical samples from Malaysia. The low level of genetic diversity observed within the PkTRAMP sequences could be helpful to vaccine-oriented studies; however, further immunological and functional studies are necessary. The multiple pure and mixed clusters identified in this study were indicative of sub-populations coexisting in the regions, with a population expansion being evident. Future studies should investigate the genetic diversity of this gene among P. knowlesi samples from all Southeast Asian countries.