The Complete Mitochondrial Genome of the Hermit Crab Diogenes edwardsii (Anomura: Diogenidae) and Phylogenetic Relationships within Infraorder Anomura

A complete mitochondrial genome (mitogenome) can provide important information for gene rearrangement, molecular evolution and phylogenetic analysis. Currently, only a few mitogenomes of hermit crabs (superfamily Paguridae) in the infraorder Anomura have been reported. This study reports the first complete mitogenome of the hermit crab Diogenes edwardsii assembled using high-throughput sequencing. The mitogenome of Diogenes edwardsii is 19,858 bp in length and comprises 13 protein-coding genes, 2 ribosomal RNA genes, and 22 transfer RNA genes. There are 28 and six genes observed on the heavy and light strands, respectively. The genome composition was highly A + T biased (72.16%), and exhibited a negative AT-skew (−0.110) and positive GC-skew (0.233). Phylogenetic analyses based on the nucleotide dataset of 16 Anomura species indicated that D. edwardsii was closest related to Clibanarius infraspinatus in the same family, Diogenidae. Positive selection analysis showed that two residues located in cox1 and cox2 were identified as positively selected sites with high BEB value (>95%), indicating that these two genes are under positive selection pressure. This is the first complete mitogenome of the genus Diogenes, and this finding helps us to represent a new genomic resource for hermit crab species and provide data for further evolutionary status of Diogenidae in Anomura.


Introduction
In the past twenty years, the mitochondrial genome (mitogenome) has been widely used in the studies of molecular evolution and reconstruction of phylogeny [1,2]. Nucleotide sequences or amino acid data, gene arrangement and RNA secondary structures of mitogenomes are greatly useful for exploring the relationships between metazoan lineages [2,3]. Generally, a metazoan mitogenome is a closed circular DNA molecule, ranging in size from 14 to 20 kb, and typically contains 37 genes: 13 protein-coding genes (PCGs) (cob, cox1-3, atp6, atp8, nad1-6, and nad4l), 2 rRNA genes (rrnS and rrnL), and 22 tRNA genes [4]. In addition, there are always several noncoding regions in the mitogenome, and the longest noncoding "AT-rich" region is known as the control region, which contains some of the initiation sites for genome transcription and replication [4]. Mitochondrial DNA generally forms an independent unit of genetic information that evolves independently of the nuclear genome [5]. Due to their haploid characteristics, maternal inheritance, limited recombination and rapid rate of evolution, mitogenomes have been commonly used for Raw reads were filtered into clean reads prior to assembly, and then undesirable reads were removed. These were reads with adaptors; low quality reads (i.e., those showing a quality score below 20 (Q < 20)); reads which comprised more than 10% of unknown bases (N); and any duplicated sequences. The mitochondrial genome was reconstructed using a combination of de novo and reference-guided assemblies, and the following three steps were performed to assemble the mitogenome. Firstly, the filtered reads were assembled into contigs using SPAdes 3.14.1 (http://bioinf.spbau.ru/spades, accessed on 24 March 2022). Then, BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 24 March 2022) was used to align contigs with reference mitogenomes from Anomura species, and aligned contigs (≥80% similarity and query coverage) were ordered according to the reference mitogenomes. Finally, GapCloser 1.12 with default parameters were used to map the clean reads to the assembled mitogenome, and most gaps were filled with local assembly.
The mitochondrial genes were annotated using the online MITOS tool (http://mitos. bioinf.uni-leipzig.de/index.py, accessed on 24 March 2022), and the protein-coding genes, transfer RNA (tRNA) genes, and ribosome RNA (rRNA) genes were predicted with default parameters. BLAST searches for reference mitochondrion genes were used to determine the location of each coding gene. Manual corrections of genes for start/stop codons were performed in SnapGene Viewer by referring to the reference mitogenomes. The circular map of mitogenome of D. edwardsii was drawn by the CGview tool (http://stothard.afns. ualberta.ca/cgview_server/, accessed on 24 March 2022).

Sequence Analysis
The nucleotide composition and codon usage were computed using DnaSP 5.1 [21]. The AT and GC skews were calculated with the following formulas: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) [22], where A, T, G, and C are the occurrences of the four nucleotides. The complete mitochondrial DNA sequence was deposited in GenBank with the accession number OP047688.

Phylogenetic Analysis
Along with the complete mitogenome DNA sequence from D. edwardsii, all currently complete mitochondrial sequences of 16 Anomura species were used in phylogenetic analysis (Table 1). Ovalipes punctatus (GenBank: NC_042695) and Callinectes sapidus (GenBank: NC_006281) were used as outgroups (Table 1). Our data set was based on nucleotide sequences from 13 mitochondrial PCGs (cox1, cox2, cox3, cob, atp6, atp8, nad1, nad2, nad3, nad4, nad4l, nad5, and nad6). Multiple alignments of the whole 13 genes were conducted using MUSCLE 3.8.31. Ambiguously aligned regions and gaps in aligned sequences were removed using Gblocks server 0.91b [21] with the default setting. ModelTest 2.1.10 [22] was used to select the best-fit evolutionary models GTR + I + G for the nucleotide dataset. The Maximum likelihood (ML) analysis of phylogenetic trees and model parameters were performed by RAxML 8.2.8 [23], and the best model was determined through AIC and BIC scores. Topological robustness for the ML analysis was evaluated with 100 replicates of bootstrap support. The phylogenetic tree (http://www.atgc-montpellier.fr/phyml/, accessed on 22 July 2022) was constructed by PhyML v3.0 and graphically edited with the iTOL 3.4.3 (https://itol.embl.de/itol.cgi, accessed on 22 July 2022).

Positive Selection Analysis
As reported in [24], comparing the nonsynonymous/synonymous substitution ratios (ω = dN/dS) has become a useful approach for quantifying the effect of natural selection on molecular evolution: ω > 1, =1 and <1 represent positive selection, neutrality and purifying selection, respectively [25]. The codon-based maximum likelihood (CodeML) method implemented in the PAML package [26] was applied to estimate the dN/dS ratio ω. The combined database of 13 mitochondrial PCGs was used for the selection pressure analyses. The ML phylogenetic tree was used as the working topology in the CodeML analyses.
We used branch models in the present study to evaluate positive selection in the infraorder Anomura. First, under an assumption of no adaptive evolution in the gene sequences, the one-ratio model (M0) estimated the distribution of ω values as a benchmark, which assumes a single ω ratio for all branches in the phylogenetic relationships [27]. Then, the two-ratio model (M2) that allows the background lineages and foreground lineages to have distinctive ω ratio values was used to detect positive selection acting on branches of interest [28,29]. Last, a free-ratio model (M1), which allows ω ratio variation among branches, was used to estimate ω values on each branch [29]. The one-ratio model (M0) and the free-ratio model (M1) were compared to determine whether different clades in Anomura had different ω values. In addition, the one-ratio model (M0) and the two-ratio model (M2) were used to compare whether Paguroidea species underwent more selection pressure than other Anomura species. Here, ω0 and ω1 represent the values for the background branch and foreground branch, respectively. Pairwise models were compared with the critical values of the Chi square (χ 2 ) distribution using likelihood ratio tests (LRTs). The test statistic was estimated as twice the difference in log likelihood (2∆L) and the degrees of freedom were estimated as the difference in the number of parameters for each model.
Additionally, branch-site models were fitted to examine the positive selection on some sites among the Anomura species. Branch-site models allow ω to vary both among sites in the protein and across branches on the tree. Branch-site model A (positive selection model) was used to identify the positive selected sites among the lineages of Paguroidea species (marked as foreground lineages). The sites with ω > 1 suggest that model A fits the data significantly better than the corresponding null model. Bayes Empirical Bayes analysis was used to calculate posterior probabilities to identify positive selection sites on the foreground lineages if the LRTs was significant [30].

Dedwardsii Mitogenome Organization
The Illumina HiSeq runs resulted in 59,106,998 paired-end reads from the D. edwardsii library with an insert size of approximately 450 bp. Selective-assembly analysis showed that 185,891 reads were assembled into a 19,858 bp circular molecule, representing the complete mitogenome of D. edwardsii ( Figure 1 and Table 2). This length is longer than the mitogenome length of other reported Anomura species, which ranges from 15,324 bp in N. maculatus to 17,910 bp in Munidaisos ( Table 1). The genome encodes 37 genes, including 13 PCGs, 2 rRNA genes, and 22 tRNA genes. A total of 28 genes are encoded on the heavy (H) strand, and nine genes are encoded on the light (L) strand ( Table 2). The intergenic regions are distributed in 28 locations, and the longest region is 1971 bp (between trnS2-tca and nad1). Meanwhile, 31 overlapping nucleotides are located in four pairs of neighboring genes for the mitogenome. These overlapping nucleotides vary in length from 1 to 18 bp, and the longest overlap is located between rrnL and trnV-gta. The D. edwardsii mitogenome has a nucleotide composition of 32.11% A, 10.68% C, 17.16% G, and 40.05% T and an overall AT content of 72.16%. For the D. edwardsii mitogenome, the AT skew is −0.110, and the GC skew is 0.233, which indicates bias toward T and G. This asymmetry often occurs on Anomura species [11,26,31,32]. In mammals, the duration of single-stranded state of the "heavy-stranded" genes during mitochondrial DNA replication can explain this asymmetry [32]. Whether or not the same explanation works on our results remains difficult to predict at this time due to the scarcity of information regarding DNA replication of invertebrate mitochondria.    For most families of the order Decapoda, congeners belonging to the same family generally possess identical gene arrangement [11], which has particularly been confirmed based on the reported crab mitogenomes [26][27][28]. However, comparing with the reported mitogenomes of the species C. infraspinatus and D. arrosor belonging to the same family Diogenidae, the D. edwardsii mitogenome possesses a distinctive gene arrangement [11,29]. For instance, nad5 is located between trnH-cac and trnF-ttc in the mitogenome of D. edwardsii, instead of the common location between trnF-phe and trnH-his in the mitogenomes of C. infraspinatus and D. arrosor. The gene rearrangement found in the D. edwardsii mitogenome deepens our understanding of this genomic feature about hermit crabs, and represents a special gene order in the genus Diogenes.

Protein-Coding Genes
The total length of all 13 PCGs of D. edwardsii is 10,888 bp, accounting for 54.83% of the complete length of the mitogenome ( Table 2). The 13 PCGs ranged in size from 159 bp (atp8) to 1707 bp (nad5), and all 13 PCGs are encoded on the heavy strand (Table 2). Furthermore, there are two reading-frame overlaps on the same strand: atp6 and atp8, and atp6 and cox3 each share seven and one nucleotides ( Table 2). The PCGs encode 3629 amino acids, among which Ser (16.19%) and Leu (9.84%) are the most and frequently used, and Met (1.64%) and Trp (1.64%) are the least frequently used, respectively. The A + T contents of PCGs and third codon positions in D. edwardsii mitogenome are 69.08% and 72.30%, respectively. The strong AT-bias in the third codon positions is similar to many species in the infraorder Anomura, i.e., D. arrosor, Shinkaia crosnieri and P. longicarpus [11,31,32].
Codon usage bias is a phenomenon in which specific codons are used more frequently than other synonymous codons by certain organisms during the translation of genes to proteins [11]. As shown in Figure 2, a total of 61 available codons are used in D. edwardsii mitogenome. Start and stop codons varied among the PCGs. Four of 13 PCGs use ATT (cox1, atp8, nad6, and nad1) and ATA (nad2, cob, nad4, and nad5) as start codons, and three PCGs use ATG (cox2, atp6, and cox3) and nad3 uses GTG. Seven of 13 PCGs use TAA (cox1, cox2, atp6, cox3, cob, nad1, and nad4l) as a stop codon, and five PCGs use TAG (nad2, atp8, nad4, nad5, and nad3). Finally, nad6 exhibits an incomplete (A) stop codon. The presence of an incomplete stop codon is a common phenomenon in both invertebrate and vertebrate mitochondrial genes [30,33,34]. It is assumed that truncated stop codons are completed via post-transcriptional polyadenylation [35]. Previous studies have shown that truncated stop codons were common in the metazoan mitogenomes and may be corrected by posttranscriptional polyadenylation [36,37]. The relative synonymous codon usage (RSCU) values indicated that the six most commonly used codons are TTA(Leu), TCT(Ser), CCT(Pro), ACT(Thr), GCT(Ala), and AGA(Ser) (Figure 2). Based on RSCU and amino acid composition in the PCGs, comparative analyses showed that the acid composition kept similar with most hermit crabs, whereas the codon usage pattern of D. Edwardsii is unidentified with some Paguroidea species. The codon TCT(Ser) instead of TTA(Leu) was often found to be used most frequently in Paguroidea species mitogenomes, i.e., D. arrosor, B. latro, P. longicarpus, and Coenobita rugpsus [9,11,32,38], while without the most frequency in D. edwardsii. TAA (cox1, cox2, atp6, cox3, cob, nad1, and nad4l) as a stop codon, and five PCGs use TAG (nad2, atp8, nad4, nad5, and nad3). Finally, nad6 exhibits an incomplete (A) stop codon. The presence of an incomplete stop codon is a common phenomenon in both invertebrate and vertebrate mitochondrial genes [30,33,34]. It is assumed that truncated stop codons are completed via post-transcriptional polyadenylation [35]. Previous studies have shown that truncated stop codons were common in the metazoan mitogenomes and may be corrected by post-transcriptional polyadenylation [36,37]. The relative synonymous codon usage (RSCU) values indicated that the six most commonly used codons are TTA(Leu), TCT(Ser), CCT(Pro), ACT(Thr), GCT(Ala), and AGA(Ser) (Figure 2). Based on RSCU and amino acid composition in the PCGs, comparative analyses showed that the acid composition kept similar with most hermit crabs, whereas the codon usage pattern of D. Edwardsii is unidentified with some Paguroidea species. The codon TCT(Ser) instead of TTA(Leu) was often found to be used most frequently in Paguroidea species mitogenomes, i.e., D. arrosor, B. latro, P. longicarpus, and Coenobita rugpsus [9,11,32,38], while not being used in the D. edwardsii mitogenome.

Ribosomal RNA and Transfer RNA Genes
The rrnL and rrnS genes of D. edwardsii are 1370 bp (AT% = 67.2) and 796 bp (AT% = 67.7) in length, respectively. The total length of rRNA genes accounts for 10.09% of the D. edwardsii mitogenome. Both of these two genes are encoded on the light strand. The AT and GC content of rRNAs are 77.42% and 22.58%, and the AT skew and GC skew are −0.008 and 0.010, respectively, suggesting a slight bias toward the use of T and G.
A total of 22 tRNA genes were identified in the mitogenome of D. edwardsii (Table 2).

Ribosomal RNA and Transfer RNA Genes
The rrnL and rrnS genes of D. edwardsii are 1370 bp (AT% = 67.2) and 796 bp (AT% = 67.7) in length, respectively. The total length of rRNA genes accounts for 10.09% of the D. edwardsii mitogenome. Both of the two genes are encoded on the light strand. The AT and GC content of rRNAs are 77.42% and 22.58%, and the AT skew and GC skew are −0.008 and 0.010, respectively, suggesting a slight bias toward the use of T and G.
A total of 22 tRNA genes were identified in the mitogenome of D. edwardsii ( Table 2). The length of tRNA genes is 1491 bp, accounting for 7.51% of the complete length of the mitogenome. A total of 15 genes (trnL1-cta, trnK-aaa, trnD-gac, trnR-cga, trnN-aac, trnE-gaa, trnT-aca, trnS2-tca, trnP-cca, trnH-cac, trnF-ttc, trnC-tgc, trnL2-tta, trnG-gga, trnS1-aga) are encoded on the heavy strand and the rest four genes (trnL-atc, trnM-atg, trnY-tac, trnV-gta) are encoded on the light strand. The AT skew of tRNA genes is 0.013, and the GC skew is 0.108, indicating a bias toward A and G. In addition, the codons with A and T in the third position are used more frequently than other synonymous codons. These characteristics reflect codon usage with A and T bias at the third codon position, similar to the bias found in many metazoans [39][40][41].

Phylogenetic Relationships
In the present study, the phylogenetic relationship of D. edwardsii was constructed on the sequences of the concentrated 13 PCGs from the mitochondrial genomes of 16 Anomura species and two outgroups ( Figure 3). Phylogenetic analyses were carried out based on nucleotide sequences of 13 PCGs by the ML method. Anomura was divided into two main branches in our phylogenetic tree, which was identified in the previous researches [24,42]. The first clade comprised five families (Porcellanidae, Munididae, Paguridae, Lithodidae, and Kiwaidae) and the second clade contained two families (Coenobitidae and Diogenidae). It is not difficult to find that the two families of the superfamily Paguroidea belong to two branches of the phylogenetic tree, which presents challenges for the current classification based on the morphology. Even if inter-superfamiliar relationships are not fully resolved, clear trends become visible. The phylogenetic relationships of these seven families show as being ((((Porcellanidae + Munididae) + Paguridae) + Lithodidae) + Kiwaidae) + (Coenobitidae + Diogenidae). trnT-aca, trnS2-tca, trnP-cca, trnH-cac, trnF-ttc, trnC-tgc, trnL2-tta, trnG-gga, trnS1-aga) are encoded on the heavy strand and the rest four genes (trnL-atc, trnM-atg, trnY-tac, trnV-gta) are encoded on the light strand. The AT skew of tRNA genes is 0.013, and the GC skew is 0.108, indicating a bias toward A and G. In addition, the codons with A and T in the third position are used more frequently than other synonymous codons. These characteristics reflect codon usage with A and T bias at the third codon position, similar to the bias found in many metazoans [39][40][41].

Phylogenetic Relationships
In the present study, the phylogenetic relationship of D. edwardsii was constructed on the sequences of the concentrated 13 PCGs from the mitochondrial genomes of 16 Anomura species and two outgroups ( Figure 3). Phylogenetic analyses were carried out based on nucleotide sequences of 13 PCGs by the ML method. Anomura was divided into two main branches in our phylogenetic tree, which was identified in the previous researches [24,42]. The first clade comprised five families (Porcellanidae, Munididae, Paguridae, Lithodidae, and Kiwaidae) and the second clade contained two families (Coenobitidae and Diogenidae). It is not difficult to find that the two families of the superfamily Paguroidea belong to two branches of the phylogenetic tree, which presents challenges for the current classification based on the morphology. Even if inter-superfamiliar relationships are not fully resolved, clear trends become visible. The phylogenetic relationships of these seven families show as being ((((Porcellanidae + Munididae) + Paguridae) + Lithodidae) + Kiwaidae) + (Coenobitidae + Diogenidae).  atp6, atp8, cox1, cox2, cox3, cob, nad1, nad2, nad3, nad4, nad4L, nad5, and nad6). Species belonging to the same superfamily are marked with the same color. Numbers on branches are Bayesian posterior probabilities (percent).
Consistent with the traditional morphological classification, the phylogenetic analyses based on molecular methods showed that D. edwardsii clustered with the Diogenidae species C. infraspinatus and D. arrosor. However, the members of the family Coenobitidae B. latro and C. brevimanus are clustered within the same clade. As reported, C. infraspinatus possesses a favor of codon TAT(Ser) and owns the highest Ser content in amino acid com-  atp6, atp8, cox1, cox2, cox3, cob, nad1, nad2, nad3, nad4, nad4L, nad5, and nad6). Species belonging to the same superfamily are marked with the same color. Numbers on branches are Bayesian posterior probabilities (percent). Consistent with the traditional morphological classification, the phylogenetic analyses based on molecular methods showed that D. edwardsii clustered with the Diogenidae species C. infraspinatus and D. arrosor. However, the members of the family Coenobitidae B. latro and C. brevimanus are clustered within the same clade. As reported, C. infraspinatus possesses a favor of codon TAT(Ser) and owns the highest Ser content in amino acid composition [29], which showed the same pattern with D. edwardsii. This supports the phylogenetic analysis result that these two species possess a close relationship. The ML bootstrap values supporting D. edwardsii and C. infraspinatus as well as D. arrosor were not high (<80), which may be explained by the different gene arrangements between D. edwardsii and these two species, as discussed in Section 3.1. Thus, more species mitogenomes in the family Diogenidae still need to be investigated to fulfill the systematic phylogeny within the infraorder Anomura. Our newly acquired mitogenome data and phylogenetic results can also be better used to provide a basis for studies of the mitochondrial evolution of Anomura.

Positive Selection Analysis
The ω(dN/dS) ratio calculated in branching model analysis D. edwardsii's 13 mitochondrial PCGs was 0.02684 under the one-ratio model (M0) ( Table 3), indicating that these genes undergo constrained selection pressure to maintain primary function. When using two-ratio model, the ω ratio of the D. edwardsii branch possess no significant difference compared with that in another branch (ω1 = 0.02703 and ω0 = 0.02683), indicating similarity in the selection pressure between the species in the family Coenobitidae and Diogenidae and other Anomura species. However, in the comparison of the one-ratio model (M0) and the free-ratio model (M1), the LRTs indicated that the free-ratio model fit the data better than the one-ratio model ( Table 3), representing that the ω ratios are distinctive among different lineages. Positive selection often occurs over a short evolutionary time and performs on only a few sites [43,44]. Thus, the signals of positive selection are often overwhelmed by those for successive purifying selection that occur at most sites in the gene sequence [43][44][45]. In the current study, branch-site models were used to determine possible positively selected sites in D. edwardsii (Table 4). Two residues, located in cox1 and cox2, were identified as positive selected sites in D. edwardsii (>95%), respectively, indicating that these two genes are under positive selection pressure. It is known that mitochondrial PCGs are crucial in the oxidative phosphorylation pathway. Cytochrome c oxidase catalyzes the oxygen terminal reduction, and the catalytic core is encoded by three mitochondrial PCGs (cox1, cox2, and cox3) [45,46]. Cytochrome c oxidase is confirmed to be a particularly important site of the positive selection in marine anoxic adaptation [47,48]. This study also indicates the mechanism applicability of Cytochrome c oxidase on the hermit crabs living in the specific marine environment. The cytochrome c oxidase requires reactive oxygen species (ROS) when the living cells are exposed to anoxia. The increase in ROS concentration helps to stabilize Hif-1α, which then results in the induction of Hif-1-dependent nuclear hypoxic genes [48][49][50]. In this study, the positive selection sites in cox1 and cox2 of D. edwardsii indicate a phylogenetic adaption of ATP synthase of hermit crabs in the presence of reduced oxygen availability or increased energy requirements. Cox1 and cox2 in mitochondrion are involved in the regulation of platelet aggregation and vasomotor to maintain the stability of physiological functions of cells and tissues [45]. Functional modification mediated by positive selection pressure in D. edwardsii may increase the affinity between the enzyme and oxygen, and then efficiently maintain basic metabolic levels under a hypoxia environment in the marine [45]. The marine environment is characterized by a lack of high pressure and low dissolved oxygen, both of which could directly affect the mitochondrial aerobic respiration process [51]. The positive selected sites found in cox1 and cox2 may be associated with the adaptive evolution of D. edwardsii within the hermit crabs living in the marine environment. Our study provides a foundation for understanding the mitochondrial evolution mechanisms of D. edwardsii to obtain adequate energy and maintain essential metabolic levels in specific marine ecosystem.

Conclusions
In this study, we determined and described the complete mitogenome of D. edwardsii, supplementing the first mitogenome information of the genus Diogenes (Anomura: Paguroidea: Diogenidae). Our phylogenetic tree based on molecular methods showed that D. edwardsii possessed the closest relationship with the species C. infraspinatus in the same family. Positive selection analysis showed that cox1 and cox2 of D. edwardsii were under positive selection pressure, indicating the potential mechanisms of this species that efficiently maintain basic metabolic levels under a hypoxia environment in the marine ecosystem. Our results provide useful mitogenome information for a better understanding of the hermit crabs, as well as the phylogenetics of D. edwardsii in the infraorder Anomura.

Data Availability Statement:
The mitochondrial genome has been deposited in the NCBI with accession number OP047688.