Complete Mitochondrial Genome Sequence and Phylogenetic Analysis of Procambarus clarkii and Cambaroides dauricus from China

To enhance the management and protection of crayfish genetic diversity and germplasm resources in Cambaroides dauricus (C. dauricus), a common species of Procambarus clarkii (P. clarkii) was used as a control group to compare the whole mitochondrial genome sequence using Illumina sequencing technology. This study found that the mitochondrial genome of C. dauricus is 15580 bp in length, with a base composition of A (31.84%), G (17.66%), C (9.42%), and T (41.08%) and a C + G content of 27.08%. The C + G in the D-loop is rich in 17.06%, indicating a significant preference. The mitochondrial genome of C. dauricus contains 13 protein-coding genes, 22 tRNA genes, and 2 rRNA genes, with most of the genes labeled in the negative direction, except for a few genes that are labeled in the positive direction. The start codons of the ten coding sequences are ATG, and the quintessential TAA and TAG are the stop codons. This study also found that the Ka/Ks ratios of most protein-coding genes in the mitochondria of both shrimps are lower than 1, indicating weak natural selection, except for nad 2, nad 5, and cox 1. The Ka/Ks ratio of cox 3 is the lowest (less than 0.1), indicating that this protein-coding gene bears strong natural selection pressure and functional constraint in the process of mitochondrial genetic evolution of both shrimps. Furthermore, we constructed phylogenetic analyses based on the entire sequence, which effectively distinguishes the high body from other shrimp species of the genus based on the mitochondrial genome. This study provides molecular genetic data for the diversity investigation and protection of fishery resources with Chinese characteristics and a scientific reference for the evolutionary study of Procambarus.


Introduction
Although Cambaroides dauricus (C. dauricus) is only found in the Heilongjiang, Jilin, and Liaoning provinces of Northeast China, it has potential for freshwater cultivation due to its desirable characteristics such as flavorful meat, high nutrient content, rapid fertility, dietary versatility, and continuous improvement pace [1]. Meanwhile, Procambarus clarkii (P. clarkii) is the most commonly cultivated freshwater crayfish species worldwide, especially in the middle and lower reaches of the Yangtze River in China [2,3]. A temperature of 25 • C is ideal for its growth and development, and P. clarkii has become a significant source of high-quality protein [4] and a valuable economic resource. However, while there have been many studies on the reproductive habits [5][6][7][8] and larval nutrition [9,10] of wild P. clarkii, the artificial propagation technique is still immature, resulting in overfishing and depletion of juvenile resources [11].
Fortunately, C. dauricus, a close relative of P. clarkii, has been actively researched for its fundamental ecology and aquaculture technologies, and is more tolerant to cold temperatures, with an optimal water temperature range of 16-21 • C [12][13][14]. Additionally, research on both species in agriculture has increased, although the mitochondrial genomes of these two shrimp species have not been extensively studied in comparison to other shrimp species. Therefore, this study aims to combine molecular genetics to clarify population genetic diversity, breed connections, and the phylogenetic process of variation generation in both species to better understand Astacidae genotypes.
As is widely known, cross-species genetic diversity is determined by changes in heterozygotes and variables that impact gene frequency, including migration, mutation, selection, and genetic drift [15]. Population genetic analysis, on the other hand, seeks to describe the extent of interactive visualization of genetic differences and to explain the variability in genetic mutations [15]. For aquaculture, mitochondrial genome DNA (mtDNA) is purely maternally inherited and self-replicating [16,17]. As a result, mtDNA has been broadly utilized in the domains of phylogeny, population genetics, and adaptive evolution in fish [18][19][20][21], as well as being used more rarely in the studies of shrimp. Because it has been demonstrated that using longer mitochondrial genome sequences to examine evolutionary connections across species is far more reliable [22,23], no relevant research with comparative study of the entire sequences of the mitochondrial genomes of adjacent shrimp species is available. Thus, the fundamental properties and sequencing discrepancies of the mitochondrial genomes of shrimp from two distinct cultivars are compared in this study; factors including mitogenomic organizations, gene rearrangements, nucleotide compositions, protein-coding genes (PCG)codon usages, secondary structures of transfer genes (tRNA), and control regions (CRs)were examined in C. dauricus and P. clarkii, which were rebuilt to assess the validity of the newly available molecular data for the evolutionary connections of Astacidae further.

Structural Characteristics of the Mitochondrial Genomes
In this study, we annotated and submitted the completed mitochondrial genome sequences of P. clarkii (accession no: OL542520) and C. dauricus (accession no: OL542521) to GenBank. The mitogenomes of both species were typical double-stranded circular molecules, with lengths of 15,937 bp ( Figure 1A) for P. clarkii and 15,580 bp ( Figure 1B) for C. dauricus, and they displayed similar mitogenome maps. The base composition of P. clarkii was A (31.84%), G (17.66%), C (9.42%), and T (41.08%), with a significantly richer content of C + G (27.08%) than A + T (72.92%), indicating a preference for CG. Similarly, C. dauricus also exhibited a CG preference, with a higher proportion of C + G (27.50%) than A + T (72.50%).
The mitochondrial DNA structure of P. clarkia and C. dauricus is consistent with that of other crayfish, comprising 37 genes in the standard set, including 13 protein-coding genes (cox 1-3, ATP6, ATP8, ND1-6, ND4L, and Cytb), 22 tRNA genes, 2 rRNA genes, and a control region (D-loop region). These genes are listed in Tables 1 and 2 for P. clarkii and C. dauricus, respectively. In P. clarkii, four pairs of adjoining genes had 9 overlapping nucleotides, while in C. dauricus, nine gene limitations had 36 overlapping nucleotides. There were 227 intergenic nucleotides (IGNs) at 20 locations in P. clarkii and 478 IGNs at 19 locations in C. dauricus, suggesting a loose structure of the mitochondrial DNA. The entire mitochondrial genomes of both species were heavily skewed towards A and T nucleic acids with 72.92% and 72.50%, respectively, with P. clarkii having a negative AT skew and a positive GC skew, while C. dauricus had a positive AT skew and a negative GC skew.

Structural Analysis of rRNA, tRNA, and D-Loop Regions
Tables 1 and 2 reveal that both P. clarkii and C. dauricus have rrnS and rrnL in their mitochondrial genome. In P. clarkii, rrnS is 793 bp long and is located between trnN-aac and trnV-gta, while rrnL is 1050 bp long and is located between trnV-gta and trnL1-cta.
Similarly, in C. dauricus, rrnS is 791 bp long and is positioned between trnN-aac and trnVgta, while rrnL is 1054 bp long and is placed between trnV-gta and trnL1-cta. rrnL is more conserved than rrnS in both species.
The length of the 22 tRNAs in P. clarkii is 1422 bp, ranging from 61 to 68 bp, with trnA-gca being the shortest and trnQ-caa being the longest. Fifteen tRNAs are labeled in the negative direction, and trnQ-caa, trnS1-aga, trnN-aac, trnS2-tca, trnT-aca, trnC-tgc, and trnY-tac are marked in the positive direction.
Similarly, in C. dauricus, the total length of the 22 tRNAs is 1480 bp, ranging from 61 to 69 bp, with trnA-gca and trnF-ttc being the shortest and trnQ-caa being the longest. Fifteen tRNAs are also labeled in the negative direction, and trnQ-caa, trnS1-aga, trnN-aac, trnT-aca, trnS2-tca, trnC-tgc, and trnY-tac are marked in the positive direction.

Predicted Structures of rRNA, tRNA, and D-Loop Regions
For P. clarkii ( Figure 2), 6 tRNAs, namely trnS1-aga, trnP-cca, trnH-cac, trnF-ttc, trnDgac, and trnE-gaa, lacked the dihydrouracil arm (DHU arm), while the other 15 tRNAs contained typical "clover" secondary structures that could be predicted online using tRNAscan-SE. The C-T conversion in the amino acid arm caused U-U mismatches in trnQ-caa ( Figure 2A) and trnW-tga ( Figure 2I), resulting in the deletion of the DHU ring in trnL2-tta ( Figure 2L). The control region (D-loop) of P. clarkii is located between the trnE-gaa and trnQ-caa genes, with a total length of 844 bp, and no tandem repeats were found using the tandem repeat finder software. The base content and percentage were as follows: A 42.54%, T 40.40%, C 4.86%, G 12.20%. A + T were rich in 82.94% and C + G were rich in 17.06%, which showed a clear preference for C + G and an anti-adenine phenomenon.
For C. dauricus (Figure 3), the DHU arm was missing in trnS1-aga, trnN-aac, trnFttc, trnI-atc, trnK-aaa, trnD-gac, trnG-gga, trnA-gca, and trnR-cga, while the other 13 tR-NAs contained typical "clover" secondary structures, which were predicted online using tRNAscan-SE. The C-T conversion in these peptide arms caused a U-U mismatch in trnQcaa ( Figure 3A) and trnW-tga ( Figure 3I). The control region (D-loop) of C. dauricus is located between the trnE-gaa and trnQ-caa genes, with a total length of 791 bp, and no tandem repeats were found using the software Tandem Repeat Finder. The base content and percentage were as follows: A 43.87%, T 40.58%, C 3.79%, G 11.76%. A + T was rich in 84.45%, while C + G was rich in 15.55%, indicating a clear preference for C + G and an anti-adenine phenomenon.

Analysis of the Codon Preference Profiles in Protein-Encoding Genes
In this study, we separately analyzed the codons of protein-encoding genes in P. clarkii and C. dauricus using frequency and relative synonymous codon usage (RSCU), as presented in Tables 3 and 4. For P. clarkii (Table 3), a total of 3750 codons were encoded, with the bold font indicating the codon with the most usage for the same amino acid. RSCU values were greater than 1 for all codons, except AUG (M), UGG (W), and CGA (R), which means that the remaining codons were used with preference. Furthermore, the UNN-type codons accounted for 35.92% of the total number of codons. With the exception of UGG (W, 0.69), the RSCU of other UNN-type codons was greater than 1, indicating that the second and third codon sites were G and thus used more frequently. For C. dauricus (Table 4), a total of 3733 codons were encoded, and the bold font indicates the codon with the most usage for the same amino acid. RSCU values were greater than 1 for all codons, except AGU (S), UGG (W), and CGA (R), indicating preference for the remaining codons. Furthermore, the UNN-type codons accounted for 35.59% of the total number of codons. With the exception of UGG (W, 0.85), the RSCU of other UNN-type codons was greater than 1, indicating that the second and third codon sites were G and thus used more frequently.

Analysis of the Codon Preference Profiles in Protein-Encoding Genes
In this study, we separately analyzed the codons of protein-encoding clarkii and C. dauricus using frequency and relative synonymous codon usag presented in Tables 3 and 4. For P. clarkii (Table 3), a total of 3750 codons w with the bold font indicating the codon with the most usage for the same RSCU values were greater than 1 for all codons, except AUG (M), UGG (W), a which means that the remaining codons were used with preference. Furth UNN-type codons accounted for 35.92% of the total number of codons. With t of UGG (W, 0.69), the RSCU of other UNN-type codons was greater than 1, in the second and third codon sites were G and thus used more frequently. For (Table 4), a total of 3733 codons were encoded, and the bold font indicates the the most usage for the same amino acid. RSCU values were greater than 1 fo except AGU (S), UGG (W), and CGA (R), indicating preference for the remai Furthermore, the UNN-type codons accounted for 35.59% of the total numbe With the exception of UGG (W, 0.85), the RSCU of other UNN-type codons than 1, indicating that the second and third codon sites were G and thus us quently.

The Genetic Variation of the Mitochondrial Genome
PAML was used to calculate the non-synonymous substitution rate (Ka) and synonymous substitution rate (Ks) of protein-coding genes in the mitochondrial genomes of P. clarkii and C. dauricus. The results showed that with the exceptions of nad2, nad5, and cox1, the Ka/Ks ratio of the remaining 13 protein-coding genes was less than 1 (ranging from 0.0582 to 0.7266), indicating that these genes were subjected to specific negative purification selection during the evolutionary processes of coding sequences in both species, as shown in Table 5.  The Ka/Ks ratios of the nad5 (2.2423) and nad2 (1.3190) genes were found to be greater than 1, indicating that these genes were subject to positive or productive natural selection. The Ka and Ks values of the nad2 gene were similar, suggesting that this gene underwent mostly neutral selection with inadequate natural selection pressure. However, the difference between the Ka and Ks values was much larger for the nad5 gene, resulting in a higher Ka/Ks ratio and demonstrating that biological evolution was a strong driving force in the evolutionary process of this gene.
The cox1 gene had a Ka/Ks ratio of 2.1049, indicating that it underwent certain selection pressure during evolution. The Ka/Ks ratio of the cox3 gene was the lowest among all genes studied, with a value of less than 0.1. This suggests that protein-coding genes were under strong natural selection pressure, bound by the protein function encoded by the gene, to ensure the normal biological function of the protein encoded by the cox3 gene. Thus, the cox3 gene plays an important role in the survival and evolution of both P. clarkii and C. dauricus.

The Phylogenetic Analysis
The evolutionary model was used to construct a phylogenetic tree (Figure 4) based on the nucleotide sequences of 13 protein-coding genes from 18 mitochondrial genomes. The results indicate that each shrimp species has its own branch, and the shrimps of Orconectes, Procambraus Cambaroides, and Austropotamobius first gather into a single fine branch. Then, the three species of Pacifastacus leniusculus, Astacus astacus, and Austropotamobius form a small branch of parallel evolution, forming an independent evolutionary branch with Cambaroides. Additionally, Procambraus and Orconectes join into a single evolutionary branch. These results are consistent with the taxonomic kinship of Astacidae, which is in line with the traditional classification based on morphological identification. all genes studied, with a value of less than 0.1. This suggests that protein-coding genes were under strong natural selection pressure, bound by the protein function encoded by the gene, to ensure the normal biological function of the protein encoded by the cox3 gene. Thus, the cox3 gene plays an important role in the survival and evolution of both P. clarkii and C. dauricus.

The Phylogenetic Analysis
The evolutionary model was used to construct a phylogenetic tree (Figure 4) based on the nucleotide sequences of 13 protein-coding genes from 18 mitochondrial genomes. The results indicate that each shrimp species has its own branch, and the shrimps of Orconectes, Procambraus Cambaroides, and Austropotamobius first gather into a single fine branch. Then, the three species of Pacifastacus leniusculus, Astacus astacus, and Austropotamobius form a small branch of parallel evolution, forming an independent evolutionary branch with Cambaroides. Additionally, Procambraus and Orconectes join into a single evolutionary branch. These results are consistent with the taxonomic kinship of Astacidae, which is in line with the traditional classification based on morphological identification.

Discussion
In this study, the complete mitochondrial genome sequence of P. clarkii and C. dauricus was determined, with a total length of 15,937 bp and 15,580 bp. The base composition of P. clarkii was A (42.54%), T (40.40%), C (4.86%), and G (12.20%), and the composition ratio of C. dauricus was A (43.87%), T (40.58%), C (3.79%), and G (11.76%). The above situation follows an apparent A + T preference; the C + G content was about 17.06% and 15.55%, respectively, far less than the A + T content (82.94%, 84.45%). The mitochondrial base structure of vertebrates is similar [24,25]. Among them, the high guanine (G) content in the body of the fish is similar to that of other bony fish such as Seriola lalandi (17.84%) [26], Sillago aeolus (18.75%) [27], and Oryzias celebensis (17.60%) [28], exhibiting significant resistance to bird uric acid. This study systematically examined the two mitochondrial genomes' genomic structural characteristics, base composition, codon preference, and protein-coding genes. The results showed that 13 proteins were similar in their mitochondrial genomes, showing significant anti cytosine phenomena in P. clarkii and C. dauricus.
We found that the overlap length of the two animals was between 1 and 46 bp. The largest overlap fragment was between the protein-coding genes cox2 and trnLys(k) (46 bp), while the overlap fragment between ATP8 and ATP6 was 6 bp, the overlap fragment between cox3 and trnGly(G) was 1 bp, and the overlap fragment between trnSer(UCN) and CYTB was 16 bp for P. clarkii and 19 bp for C. dauricus. The overlap fragment between trnMet(M) and ND2 was 23 bp for P. clarkii and 20 bp for C. dauricus, while the overlap fragment between trnTyr(Y) and cox1 was 7 bp for P. clarkii and 2 bp for C. dauricus. In addition, P. clarkii had a 1 bp overlap fragment between ND2 and trnTrp(W), and ND3 and trnAla(A). In fish, the overlap fragment is generally only 7-10 bp [29], while in mammals, it is generally 40-46 bp [30].
The start codons for P. clarkii were ATG or ATT, and the stop codons were TAA or TAG in this study, and the start codons for C. dauricus were ATG, ATT, or ATT, and the stop codons were TAA or TAG. Both species showed positive and negative chain specificity. The codon preference analysis showed that the frequency of NNU-type codons was the highest, which was consistent with the preference of the base T in their protein-coding gene composition, and similar to the results of crustaceans [31].
The 22 tRNA genes in the mitochondrial genomes of the two animal species show no variation. In P. clarkii, 15 tRNA genes have typical cloverleaf structures, while trnS1-aga, trnP-cca, trnH-cac, trnF-ttc, trnD-gac, and trnE-gaa lack the dihydrouridine arm (DHU stem) and form a simple loop at the corresponding position. In C. dauricus, 13 tRNA genes have typical cloverleaf structures, while trnS1-aga, trnN-aac, trnF-ttc, trnI-atc, trk-aaa, trnD-gac, trnG-gga, trnA-gca, and trnR-cga lack the dihydrouridine arm and form a simple loop at the corresponding position. The secondary structures of different tRNAs may differ due to the effects of missing aminoacyl arms, unstable structures, and non-typical base pairing, which may be caused by external environmental factors [32].
The Ka/Ks ratio can be used to determine the degree of selective pressure on proteincoding genes. Ka/Ks > 1 indicates positive selection, Ka/Ks = 1 indicates neutral evolution, and Ka/Ks < 1 indicates purifying selection [33,34]. The Ka/Ks values of nad5, nad2, and cox1 are greater than 1, indicating that these genes are affected by positive or productive natural selection, while the other genes have values less than 1, indicating that they are affected by purifying selection. The Ka/Ks value is generally considered the best parameter to represent the selective pressure on protein-coding genes.
According to the above data, we know that Cambarus phylogenetic trees were constructed partly through the maximum likelihood method based on their complete mitochondrial genome sequence. Orconectes, Procambraus Cambaroides, and Austropotamobius create a single acceptable subsidiary, followed by Pacifastacus leniusculus, Astacus astacus, and Austropotamobius forming one tiny strand of parallel evolution to form an independent evolutionary branch with the Cambaroides. This is consistent with the classification results obtained by Zheng Wenjuan et al. using a single gene, 16S rRNA [35,36].

Sample Collection and Identification and DNA Extraction
P. clarkii were collected at the Yangtze River Fisheries Research Institute of Chinese Academy of Fishery Sciences while C. dauricus were selected at Qingdingzi forest farm (Huinan County, Tonghua City, China). The morphological traits of these specimens were promptly recognized by the Heilongjiang River Fishery Research Institute of Chinese Academy of Fishery Sciences, and then they were preserved in 95% ethanol and kept at −80 • C. Total DNA was isolated from muscle tissues according to the manufacturer's instructions using an E.Z.N.A. ® Tissue DNA Kit (Omega, Norcross, GA, USA).

Sequence Assembly
Total genomic DNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) method and 500-bp paired-end library construction was applied using the NEBNext Ultra DNA Library Prep Kit for Illumina sequencing. Sequencing was carried out on the Illumina NovaSeq 6000 platform (BIOZERON Co., Ltd., Shanghai, China). Approximately 2 Gb of raw data from Cambaroides dauricus (GenBank No. OL542521) was generated, with 150 bp paired-end read lengths. For PacBio library construction and sequencing, more than 5 µg of sheared and concentrated DNA was subjected to size selection using the Blue Pippin system (Sage Science, Beverly, MA, USA). Approximately 20 kb SMRTbell libraries were prepared according to the manufacturer's instructions (PacBio, Menlo Park, CA, USA). One SMRT cell and 4.5 Gb data composed of 35.9 million reads were sequenced on the PacBio Sequel platform.
De novo assembly with GetOrganelle v1.6.4 referencing the mt genome of closely related species Procambarus clarkii-ref (GenBank No. OL542520) produced contigs of the mt genome. A number of potential mitochondrion reads were extracted from the pool of Illumina reads using BLAST searches against the mt genomes of related species Procambarus clarkii-ref and the GetOrganelle result. The mitochondrion Illumina reads were obtained to perform cp genome de novo assembly using the SPAdes-3.13.1 package. Clean PacBio long reads were aligned against the GetOrganelle and SPAdes assembled scaffolds using the BLASR program. All aligned PacBio reads were extracted to perform self-correction and mt genome de novo assembly using the Canu v2.1.1 package, followed by error correction using Pilon v1.22. The GetOrganelle assembly contigs were optimized using the scaffolds from SPAdes-3.13.0 result. Finally, the assembled sequences were reordered and oriented according to the reference mt genome, thus generating the final assembled mitochondrion genomic sequence.

Gene Annotation
The mitochondrion genes were annotated using the online MITOS tool, using default parameters to predict protein-coding genes, transfer RNA (tRNA) genes, and ribosome RNA (rRNA) genes [37]. The position of each coding gene was determined using BLAST searches against ref mt genes. Manual corrections of genes for start/stop codons were performed in SnapGene Viewer by referencing the ref mt genome. The circular mt genome map of P. clarkii and C. dauricus was drawn using the CGview tool (http://stothard.afns. ualberta.ca/cgview_server/, accessed on 19 May 2023). Functional annotations were performed using sequence similarity Blast searches with a typical cut-off E-value of 10 −5 against several publicly available protein databases: the NCBI non-redundant (Nr) protein database, Swiss-Prot, Clusters of Orthologous Groups (COGs), and Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) terms.

Sequence Feature Analysis
The nucleotide makeup and codon usage of mitochondrial genome sequences of P. clarkii and C. dauricus were estimated using MEGA [38], while calculations for AT and GC offset were used [39]. Then, DNA SP 5.0 [40] was used to examine gene features and variation locations for the principal mRNA transcripts.

The Phylogenetic Analysis of Genetic Variation in the Mitochondrial Genome
Mega 7.0 software was used to perform the preference and phylogenetic analysis on the high body base composition and codons. Table 6 shows the mitochondrial genome sequences of 18 scad species obtained from the GenBank. A maximum likelihood evolutionary tree was constructed after finishing multiple sequence alignments and 1000 bootstrap tests.

Conclusions
This study suggests that P. clarkii and C. dauricus may belong to the same population, as indicated by their similar mitochondrial genome structure. However, there may be some degree of geographic isolation between the two species. Despite this, P. clarkii and C. dauricus were effectively differentiated from other crayfish genera by utilizing both morphological and bioinformatic approaches. These findings can contribute to the development of techniques for species identification, population division, and the preservation and sustainable use of germplasm resources. Ultimately, this research can help promote the sustainable and healthy growth of the aquaculture industry in China and worldwide.
Supplementary Materials: The supporting information can be downloaded at: https://www.mdpi. com/article/10.3390/ijms241411282/s1. Author Contributions: L.L. and Y.X. were responsible for the design, validation, resources, database collecting, writing, and preparation of this study. S.W., R.Z., and K.G. analyzed data and edited preliminary drafts. W.X. contributed to methods and several types of software. Z.Z. contributed to the visualization. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The complete mitogenome sequence with gene annotation has been submitted to the NCBI GenBank under the accession numbers OL542520 and OL542521. The sequence data utilized in this study can be found in the Supplementary Materials (S1-S4).