In Silico Evaluation of the Haplotype Diversity, Phylogenetic Variation and Population Structure of Human E. granulosus sensu stricto (G1 Genotype) Sequences

Echinococcus granulosus sensu lato is the causative agent of cystic echinococcosis (CE), a neglected zoonotic disease with an important role in human morbidity. In this study, we aimed to investigate the haplotype diversity, genetic variation, population structure and phylogeny of human E. granulosus sensu stricto (s.s.) (G1 genotype) isolates submitted to GenBank from different parts of the world by analyzing sequences of the mitochondrial CO1 and ND1 genes. The sequences of the mt-CO1 (401 bp; n = 133) and mt-ND1 (407 bp; n = 140) genes were used to analyze the haplotypes, polymorphism and phylogeny of 273 E. granulosus s.s. (G1 genotype) isolates. Mutations were observed at 31 different points in the mt-CO1 gene sequences and at 100 different points in the mt-ND1 gene sequences. Furthermore, 34 haplotypes of the mt-CO1 sequences and 37 haplotypes of the mt-ND1 sequences were identified. Tajima's D, Fu's Fs, and Fu's LD showed high negative values in both mt-CO1 and mt-ND1 gene fragments. The haplotype diversities in the sequences retrieved from GenBank in this study indicate that the genetic variation in human isolates of E. granulosus s.s. in western countries is lower than in eastern countries. This may be due to demographic expansion driven by the animal trade and to natural selection.


Introduction
Cystic echinococcosis (CE) is a neglected parasitic zoonotic disease caused by Echinococcus granulosus sensu lato and has an important role in human morbidity. CE is distributed worldwide, especially in Asia, Africa, Europe, South America, Canada and Australia [1][2][3]. In 2014, CE was ranked as the second most important foodborne parasitic disease globally by the Food and Agriculture Organization (FAO) and World Health Organization (WHO), and E. granulosus s.l. infections are responsible for billions of dollars of economic loss per year [4][5][6].
Two types of host play an important role in the life cycle of E. granulosus s.l. Carnivores, mostly dogs, are final hosts, and many mammals, including humans, are intermediate hosts. Accidental ingestion of eggs shed in the feces of infected final hosts leads to infection of different internal organs of the intermediate host, mainly the liver and lungs [12][13][14].
Although CE is a benign disease, it can progress with high morbidity and mortality as a result of unexpected and serious complications [15]. The clinical symptoms vary with the size, location and condition of the cyst. After ingestion, the embryos are released from the eggs in the small intestine, where they penetrate the mucosa, enter the bloodstream and reach many organs. Although most infected individuals harbor a single cyst, multiple cysts or cysts in multiple organs are observed in 20-40% of individuals. Most cysts occur in the liver (>65%), followed by the lung (25%), while they are less common in the spleen, kidney, bone, heart and central nervous system [16,17].
Communities in which sheep breeding is widespread contribute greatly to this distribution, with E. granulosus s.s. (G1 genotype) playing an important role in transmission to humans [32][33][34].
Due to the maternal inheritance and high mutation rates of mitochondrial (mt) DNA sequences, these sequences are commonly analyzed to determine the genetic structure of the population and the degree of close kinship [35]. In many studies, partial sequences of mt-CO1 and mt-ND1 genes have been used successfully to distinguish genetic variants among Echinococcus species and between E. granulosus strains [36][37][38].
Global evaluation of genetic variation among human isolates of E. granulosus s.s. (G1 genotype) is important to reveal the population dynamics of the parasite. In the current study, we evaluated the haplotype diversity, genetic variation, population structure and phylogeny of human E. granulosus s.s. (G1 genotype) submitted to GenBank from different parts of the world by analyzing the mt-CO1 and mt-ND1 gene sequences.

Data Collection
The mt-CO1 (n = 382) and mt-ND1 (n = 199) gene sequences of E. granulosus s.s. (G1 genotype) from human (Homo sapiens) isolates that had been submitted to the National Center for Biotechnology Information (NCBI), USA (www.ncbi.nlm.nih.gov) database up to 6 April 2022 were retrieved by filtering, yielding a dataset of 581 gene sequences in total.

Alignment and Phylogenetic Analysis
All the gene sequences were loaded into the CLC Sequence Viewer 8 [39] in FASTA format. All sequences were trimmed from both ends and were then aligned using the mt-CO1 (accession no. MG672129) and mt-ND1 (accession no. KU925413) reference sequences. After the removal of short gene sequences, the remaining 273 gene sequences [401 bp mt-CO1 (n = 133) and 407 bp mt-ND1 (n = 140)] were used for bioinformatic analysis. Individual phylogenetic trees were created from the sequences of both gene regions using the neighbor-joining (NJ) method and the Jukes-Cantor nucleotide distance measure. Statistical support for the branches was obtained using 1000 bootstrap replicates. Taenia saginata and T. solium sequences were added as outgroups to show the degree of relatedness.
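As a minimal sketch of the Jukes-Cantor distance named above, the following Python computes the proportion of differing sites between two aligned sequences and applies the standard correction d = -(3/4) ln(1 - 4p/3). The function names `p_distance` and `jukes_cantor` are hypothetical and chosen for illustration; this is not the CLC Sequence Viewer implementation, and sites with gaps or ambiguous bases are simply skipped here as an assumption.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: only columns where both sequences carry A, C, G or T are
    compared; gaps and ambiguity codes are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor(p))
```

A matrix of such pairwise distances is the input that a neighbor-joining procedure would cluster into a tree.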

Haplotype Analysis and Networking
The haplotype analysis was carried out using the DnaSP 6 program, in which the sequences were analyzed in FASTA format [40]. The haplotype and nucleotide diversity values, nucleotide and haplotype numbers and neutrality indices were calculated to determine the genetic structure of both gene regions. The sequences were converted to Nexus format [41], and a haplotype network was generated using the PopArt (Population Analysis with Reticulate Trees) program [42] for a visual representation of the relationships between haplotypes.
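The idea behind haplotype analysis can be sketched in a few lines of Python: identical aligned sequences are collapsed into haplotypes, and haplotype diversity is computed from their frequencies as Hd = n/(n - 1) × (1 - Σ p_i²). The function names are hypothetical, the toy sequences are invented for illustration, and the sketch assumes gap-free aligned sequences of equal length; it is not a description of how DnaSP works internally.

```python
from collections import Counter


def haplotype_counts(aligned_seqs):
    """Collapse identical aligned sequences into haplotypes with counts."""
    return Counter(seq.upper() for seq in aligned_seqs)


def haplotype_diversity(counts):
    """Haplotype diversity Hd = n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two sequences are required")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Toy alignment (illustrative only, not data from this study).
    seqs = ["AAAA", "AAAA", "AAAT", "AACA"]
    counts = haplotype_counts(seqs)
    print(len(counts), "haplotypes; Hd =", round(haplotype_diversity(counts), 4))
```

A haplotype with a count of one in `counts` corresponds to what the text calls a single haplotype.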

Results
In this study, we analyzed a total of 273 gene sequences of E. granulosus s.s. (G1 genotype) isolates obtained from the NCBI database, consisting of 133 mt-CO1 sequences from 15 countries and 140 mt-ND1 sequences from 16 countries (Table 1). The distribution of the collected mt-CO1 and mt-ND1 sequences over the world is shown in Figure 1.

Polymorphism and Haplotype Analysis
Mutations were observed at 31 different points within the mt-CO1 gene sequences, with the longest conserved areas detected between 116 bp and 167 bp. Within the mt-ND1 sequences, mutations were observed at 100 different points, with conserved areas detected between 361 bp and 407 bp. No protein-coding domain was found in either of the datasets. Analysis of 133 mt-CO1 gene sequences revealed 34 different haplotypes (Table 2). Among these, Hap03 was the main haplotype, comprising 79 gene sequences, and 23 haplotypes were each represented by a single sequence. Analysis of 140 mt-ND1 gene sequences revealed 37 haplotypes (Table 3). Among these, Hap01 was the main haplotype, comprising 83 gene sequences, and 28 haplotypes were each represented by a single sequence.
The mt-ND1 haplotype network consisted of 37 haplotypes (Figure 3). A comparison of the main haplotype with the others in this network revealed between one and 50 mutations. The main haplotype was Hap01, accounting for 59.28% (83/140) of the haplotype network, followed by Hap05, accounting for 10.71% (15/140). Single haplotypes constituted 75.67% (28/37) of the haplotypes in the network. Single haplotypes were from Uzbekistan (n = 17), Slovenia (n = 4), China (n = 4), Algeria (n = 2) and Iraq (n = 1). The nucleotide positions of the mt-CO1 and mt-ND1 genes among the haplotypes are presented in Supplementary Tables S1 and S2.
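Counts of mutated positions and conserved stretches of the kind reported in this section can be derived directly from an alignment. The Python sketch below lists the variable columns and finds the longest run of invariant columns. The function names are hypothetical, the toy alignment is invented, and the sketch assumes gap-free sequences of equal length.

```python
def variable_sites(aligned_seqs):
    """Return 0-based indices of columns containing more than one nucleotide."""
    return [i for i, column in enumerate(zip(*aligned_seqs))
            if len(set(column)) > 1]


def longest_conserved_run(aligned_seqs):
    """Return (start, end), 1-based inclusive, of the longest invariant stretch.

    Returns (0, 0) when every column is variable.
    """
    variable = set(variable_sites(aligned_seqs))
    length = len(aligned_seqs[0])
    best, best_len, run_start = (0, 0), 0, None
    # Iterate one position past the end so a trailing run is closed too.
    for i in range(length + 1):
        conserved = i < length and i not in variable
        if conserved and run_start is None:
            run_start = i
        elif not conserved and run_start is not None:
            if i - run_start > best_len:
                best_len = i - run_start
                best = (run_start + 1, i)
            run_start = None
    return best


if __name__ == "__main__":
    # Toy alignment (illustrative only, not data from this study).
    seqs = ["AAAAAA", "ATAAAA", "AAAAAT"]
    print(variable_sites(seqs), longest_conserved_run(seqs))
```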

Phylogenetic Tree
The results of the phylogenetic analysis were consistent with the haplotype network. The phylogenetic tree generated by aligning the mt-CO1 gene sequences is shown in Figure 4A. In this tree, Hap04 (EU006783), Hap12 (DQ356874), Hap16 (MK229315) and Hap23 (AB688619) were the haplotypes farthest apart, with mutations at seven points. The phylogenetic tree generated by aligning the mt-ND1 gene sequences is shown in Figure 4B. In this tree, Hap10 (MN696602) and Hap36 (MN231833) were the haplotypes farthest apart, with mutations at 50 points. Taenia saginata and T. solium were added as outgroups in both phylogenetic trees.

Gene Flow, Diversity and Neutrality Analysis
The diversity and neutrality indices of the mt-CO1 and mt-ND1 groups are shown in Table 4. Tajima's D (Tajima, 1989) and Fu's Fs (Fu, 1997) values were calculated to determine whether populations were subject to selection pressure. Tajima's D, Fu's Fs and Fu's LD all showed high negative values in both the mt-CO1 and mt-ND1 regions, providing evidence of an excess of rare alleles.
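To make the sign of Tajima's D concrete, the following Python sketch implements the standard formula from Tajima (1989): D compares the mean number of pairwise differences with the number of segregating sites scaled by a1 = Σ 1/i, and divides by the estimated standard deviation of that difference. The function name `tajimas_d` and its parameter names are hypothetical. The sketch is not the DnaSP implementation and it does not compute a p-value.

```python
import math


def tajimas_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D from sample size n, number of segregating sites S and
    the mean number of pairwise nucleotide differences k (Tajima, 1989)."""
    if n < 2 or seg_sites < 1:
        raise ValueError("need n >= 2 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Illustrative inputs only (not data from this study).
    print(round(tajimas_d(10, 16, 3.888889), 3))
```

D is negative whenever the mean pairwise difference is smaller than S/a1, which happens when many of the segregating sites are carried by only a few sequences.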

Discussion
Genetic diversity and population structure of E. granulosus s.s. (G1 genotype) were investigated in the current study. This was carried out using mt-CO1 and mt-ND1 sequence data retrieved from GenBank; these genes are commonly used for the differentiation of Echinococcus species. The results obtained in this study provide information about gene flow and population dynamics in human E. granulosus s.s. infections globally. A total of 133 mt-CO1 (401 bp) and 140 mt-ND1 (407 bp) gene sequences of E. granulosus s.s. (G1 genotype) human isolates already registered in the NCBI database were used for in silico analyses to determine the genetic diversity and variations of the E. granulosus s.s. (G1 genotype) human isolates.
Although the prevalence and incidence of CE have decreased significantly in recent years, it remains an important public health concern, especially in some countries and geographical regions that cannot implement a control program due to economic difficulties [43]. In addition, it is an important problem for human health in developing countries where animal husbandry is intense and sheep meat is widely consumed [44]. The incidence of CE increases with age and is more common between the ages of 20 and 40 years. The incidence of the disease is higher in societies with a low socio-economic status [45].
The results of the current study show a high global haplotype diversity within the G1 genotype. The 273 sequences analyzed represented a total of 34 haplotypes for mt-CO1 and 37 for mt-ND1. High genetic diversity within E. granulosus s.s. has also been reported by Kinkar et al. [46], who analyzed 212 samples (near-complete mitochondrial sequences) and found 171 haplotypes (overall haplotype diversity was 0.994). The main reason for the difference in haplotype numbers between the studies is the length of the gene regions analyzed. Therefore, more haplotypes can be determined by sequencing longer mitochondrial gene fragments.
Neutrality indices such as Tajima's D, Fu's Fs, and Fu's LD were used to assess nucleotide variability and population expansion [47]. Tajima's D test evaluates the deviation of populations from the standard neutral model: a positive Tajima's D value reflects an excess of heterozygosity, as expected when heterozygosity confers a selective advantage, while a negative value indicates that a particular allele has a selective advantage over the other allele. A negative value also indicates a rapid increase in the population [48,49]. In our study, Tajima's D values were strongly negative in both the mt-CO1 and mt-ND1 gene fragments, indicating a high probability of population increase in the future. In addition, the lower Tajima's D value of the mt-ND1 gene sequences (−2.80355) compared with that of the mt-CO1 gene sequences (−2.47269) indicates a higher rate of population growth in the former. The negative values of Tajima's D suggest population expansion. Animal movements among countries indicate that this expansion may continue in the coming years. Fu's Fs is sensitive to population growth, and a significantly negative value (p < 0.05) indicates that the populations have common growth patterns and belong to the same gene pool [50,51]. Our analysis yielded highly negative and statistically significant Fu's Fs values in both the mt-CO1 and mt-ND1 haplotype groups, indicating that these populations are subject to expansion globally.
Nucleotide diversity was examined to determine the degree of polymorphism in the population. We determined that the nucleotide diversity of the mt-ND1 gene sequences (0.00611) was higher than that of the mt-CO1 gene sequences (0.00255). In addition, haplotype diversity was assessed to evaluate the uniqueness of haplotypes within the population. In our study, the haplotype diversity values of the mt-CO1 (0.640) and mt-ND1 (0.639) gene sequences were very similar.
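Nucleotide diversity is commonly defined as the average number of nucleotide differences per site between two sequences drawn from the sample. The Python sketch below computes it by brute force over all sequence pairs. The function name is hypothetical, the toy alignment is invented, and the sketch assumes gap-free aligned sequences of equal length, so it may differ in detail from the values that a dedicated program reports.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site across all sequence pairs."""
    n = len(aligned_seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(aligned_seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / n_pairs / length


if __name__ == "__main__":
    # Toy alignment (illustrative only, not data from this study).
    print(round(nucleotide_diversity(["AAAA", "AAAT", "AATT"]), 4))
```

Multiplying the result by the alignment length gives the mean number of pairwise differences, the quantity that enters Tajima's D.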
In total, 34 haplotypes were identified in our analysis of the mt-CO1 gene sequences. The main haplotype constituted 59.39% of the total network, and there were 23 single haplotypes. Thirty-seven different haplotypes were identified in our analysis of the mt-ND1 gene sequence. The main haplotype constituted 59.28% of the total network, and there were 28 single haplotypes. The major haplotypes represent a single ancestor.
In total, 31 different mutations were detected across the 401 bp mt-CO1 gene sequences, and 100 different mutations were detected within the 407 bp mt-ND1 sequences. The high number of mutations may reflect the long and complex evolutionary history of E. granulosus. The genetic diversity within E. granulosus s.s. (G1 genotype) is very high worldwide, and the complex phylogeographic patterns emerging from the phylogenetic and geographic analyses suggest that the current distribution of E. granulosus s.s. (G1 genotype) has been shaped by the intensive animal trade [46]. The high number of haplotypes detected in some Asian and Middle Eastern countries (China, Mongolia, Pakistan, Iran, etc.) in this study may indicate that E. granulosus s.s. (G1 genotype) has existed in these countries for longer than in some western countries (Finland and Spain).

Conclusions
E. granulosus s.s. (G1 genotype) poses an important problem in communities where sheep breeding is common. Although different molecular studies have been conducted to date, this study is the first bioinformatics study to evaluate the genetic structure and gene flow of human isolates of E. granulosus s.s. (G1 genotype) collected worldwide. In this study, all the sequence data reported from humans for E. granulosus s.s. (G1 genotype), the most common species in humans, were screened, and the aggregated data were presented comparatively. We think that this study can help fill the knowledge gaps on the subject. Our findings also represent an important step for future epidemiological, bioecological, vaccine and diagnostic studies that could yield efficient treatments for species/strains.
The funders had no role in the study design, the data collection and analysis, the decision to publish, or the preparation of the manuscript.