Characterization of the Complete Mitochondrial Genome of the Spotted Catfish Arius maculatus (Thunberg, 1792) and Its Phylogenetic Implications

The spotted catfish, Arius maculatus (Siluriformes), is an important economical aquaculture species inhabiting the Indian Ocean, as well as the western Pacific Ocean. The bioinformatics data in previous studies about the phylogenetic reconstruction of Siluriformes were insufficient and incomplete. In the present study, we presented a newly sequenced A. maculatus mitochondrial genome (mtDNA). The A. maculatus mtDNA was 16,710 bp in length and contained two ribosomal RNA (rRNA) genes, thirteen protein-coding genes (PCGs), twenty-two transfer RNA (tRNA) genes, and one D-loop region. The composition and order of these above genes were similar to those found in most other vertebrates. The relative synonymous codon usage (RSCU) of the 13 PCGs in A. maculatus mtDNA was consistent with that of PCGs in other published Siluriformes mtDNA. Furthermore, the average non-synonymous/synonymous mutation ratio (Ka/Ks) analysis, based on the 13 PCGs of the four Ariidae species, showed a strong purifying selection. Additionally, phylogenetic analysis, according to 13 concatenated PCG nucleotide and amino acid datasets, showed that A. maculatus and Netuma thalassina (Netuma), Occidentarius platypogon (Occidentarius), and Bagre panamensis (Bagre) were clustered as sister clade. The complete mtDNA of A. maculatus provides a helpful dataset for research on the population structure and genetic diversity of Ariidae species.


Introduction
In metazoans, the typical complete mitochondrial genome (mtDNA) is usually a circular, double-stranded molecule, with sizes ranging from 13 to 20 kb, containing 13 proteincoding genes (PCGs), 2 ribosomal RNA (rRNA) genes, and 22 transfer RNA (tRNA) genes. Moreover, an A + T-rich region, which includes the initial sites for RNA transcription and mtDNA replication, is regarded as the non-coding region or the control region (CR) [1]. Owing to its maternal inheritance, rapid evolutionary rate, short coalescence time, conserved gene content, small genome size, and low levels of sequence recombination [2,3], mtDNA is widely used in various research fields, such as species identification and taxonomic resolution [4,5], comparative and molecular evolution [6][7][8], population genetics [9], and non-synonymous (Ka) and synonymous substitutions of many species [5,[10][11][12].
Moreover, mtDNA is commonly known as a helpful molecular marker for phylogenetic analyses among fish taxa. A single mitochondrial gene fragment has limitations in resolving complex phylogenetic relationships in plentiful fish lineages [13]. The additional informative sites from complete mtDNA allow these detailed branches and higher-level relationships to be more adequately resolved [14]. Consequently, in the present study, the

Samples and Mitogenome Sequencing
Adult A. maculatus fish (about 2400 g) was obtained from Beibu Gulf, China (longitude 21 • 36 ′ 50 N and latitude 108 • 44 ′ 00 E) in June 2019 and directly frozen. Genomic DNA was extracted from muscle tissues, according to the instructions of Genomic DNA Extraction Ver.5.0 kit (TaKaRa, Kyoto, Japan). The concentration of the isolated gDNA was detected using the NANODROP 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA). The quality of the extracted gDNA was evaluated by electrophoresis with 1% agarose gel and stained with Gel Red™ (Biotium). Then, normalized genomic DNA (4 µg) was used to prepare the paired-end library, according to the instructions of the NEBNext DNA sample libraries kit (NEB, New England). The quantification and size of the library was estimated using a Bioanalyzer 2100 High Sensitivity DNA chip (Agilent, CA, USA). Sequencing of the normalized library (2 nM) was performed on a HiSeq 2500 platform (2 × 101 bp paired-end reads) (San Diego, IL, USA).

Genome Sequence Analysis
To confirm tRNAs, the tRNAscan-SE Search Server 1.21 program was used [27,28]. OGDRAW1.2 was used to create the gene map of A. maculatus mtDNA, and hand annotation was completed [29]. An estimate of strand skew was developed using a previous study's formulae [30]. By using "models-> Compute Codon Usage Bias" in MEGA 6.0, relative synonymous codon usage (RSCU) was calculated [31]. The nonsynonymous/ synonymous mutation (Ka/Ks) ratio and codon usage in the 13 PCGs were calculated using DnaSP 5.10.01 to investigate the evolutionary branching of the Ariidae lineage [32]. In addition, we determined the skew of AT and GC in the whole mtDNA, PCG, tRNA, rRNA, or control region sequence, using the following formula: AT skew = (A − T) /(A + T) and GC skew = (G − C)/ (G + C) [33].

Phylogenetic Analysis
Phylogenetic analysis of Siluriformes was carried out using 13 PCG nucleotide and amino acid sequences from 21 species. Based on MUSCLE v.3.8.31, each of the 13 PCG nucleotide and amino acid sequences from all 21 species was individually aligned (http: //www.drive5.com/muscle/, accessed on 8 May 2021) [34] and then aggregated into a sequence matrix to reconstruct the phylogeny. The 21 mitogenome data were all downloaded from the NCBI database. Twenty-one species were divided into 15 genera and 7 families in the order Siluriformes. To test the nucleic acid and amino acid models, we used jModelTest2.1.7 (https://code.google.com/p/jmodeltest2/, accessed on 8 May 2021) [35] and Prottest3.2 [36]. Akaike information criterion(AIC), was considered the best model for tree formation. The maximum likelihood (ML) tree was implemented in RAxML 8.0.12 [37] under the GTR-γ model and MtMam+I+G model for nucleic acid and amino acid trees, respectively, and node support was calculated with 1000 bootstrap replications (random seed value of 1,234,567). Further, MrBayes 3.2.5 [38] for Bayesian inference (BI) was used to reconstruct phylogenetic trees with 10,000,000 generations. The BI analysis used the CAT-GTR model, and two independent Markov chain Monte Carlo (MCMC) chains were run for 10,000 cycles. Phylogenetic trees were generated using FigTree v1.4.2 (http://tree. bio.ed.ac.uk/software/figtree/, accessed on 8 May 2021).

Genome Size and Organization
Raw data of approximately 1.5 G with read lengths of 150 bp were generated. The mtDNA sequences covered 100% of the genome and were approximately 57X deep. The total number of bases (bp) was 965,100, and the read number was 6434. The whole mtDNA of Arius maculatus was a circular double-chain DNA molecule with a length of 16,710 bp (Gen-Bank: MN604079; Figure 1, Table 1), which is comparable to the mtDNA of other Siluriformes species, ranging from 16,471 bp (Pangasius larnaudii) to 16,830 bp (Ariopsis seemanni) [39] (Table S2) (Table S2). Moreover, the mtDNA of A. maculatus comprised 2 rRNA genes, 13 PCGs, 22 tRNA genes, and a D-loop region. The arrangement of the genes in the A. maculatus mtDNA was identical to that of other reported Ariidae mtDNAs (Table 1) [22][23][24]. Of these genes, 29 (12 PCG, 2 rRNA, and 15 tRNA) were Genes 2022, 13, 2128 4 of 17 located in the heavy strand (H-strand); the rest (1 PCG and 8 tRNA) were located in the light strand (L-strand) ( Table 1). As valid species markers and genus authentication features, these typical features have also been observed in other Siluriformes [39][40][41][42].    The nucleotide composition of the mtDNA was A (29.63%), T (25.42%), C (29.65%), and G (15.30%), with a high A + T nucleotide content (55.05%), which was 54.88, 55.52, 52.75, and 62.55% for the PCGs, tRNAs, rRNAs, and D-loop region, respectively ( Table 2). Among Siluriformes, A. maculatus had the lowest A + T nucleotide composition. With more As than Ts, the AT skew (0.0764) observed here was similar to that in O. platypogon (0.0765), N. thalassina (0.0775), and B. panamensis (0.0716), which are evolutionarily closely related. Most Siluriformes, however, exhibited a positive AT skew in their mtDNA (Table 2). GC skews ranged from −0.3308 in O. platypogon to −0.2799 in S. soldatovi (Table 2). In A. maculatus, the mtDNA was negative (−0.3194), showing that it had a GC skew more toward Cs than Gs.

Protein-Coding Gene Features
The PCG sequences make up 68.26% of the A. maculatus mtDNA, with 11,407 bp. Furthermore, 19 Siluriformes were shown to have AT skews and GC skews that differed from nucleotide composition ( Table 2). Among Siluriformes species, A. maculatus mtDNA had a moderate AT skew value (0.0489) of the PCG region. Other species, however, showed a negative GC skew (−0.3888) [43][44][45]. In addition, thirteen PCGs with AT and GC skews were also calculated in Figure S1, indicating that they were mutually consistent and closely related [43,45].  Note: The A + T biases of whole mitogenome, protein-coding genes, tRNA, rRNA, and control regions were calculated by AT-skew= (A − T) / (A + T) and GC-skew= (G -C) / (G + C), respectively.
A. maculatus encodes 3792 amino acids in its 13 PCGs. Moreover, codon usage is displayed in Table 3. A. maculatus PCGs were dominated by the following amino acids: leucine (Leu, 17.1%), alanine (Ala, 8.84%), isoleucine (Ile, 8.07%), and threonine (Thr, 7.68%), whereas those encoding cysteine (Cys, 0.71%) were rare (Table 3). RSCU analysis of the 13 PCGs indicated that the codons encoding Leu and serine (Ser) were most abundant in A. maculatus (Figure 2). In the PCGs of the eight species examined, there was homology in amino acid content and codon distribution between those species (Figure 3). It was deduced that conserved amino acid sequences were found in Siluriformes [39,42,48]. Furthermore, A or T in the third position was overused compared with other synonymous codons [4]. For example, codons for leucine (TTG) and serine (TCG) were rare, whereas CTA and TCA were widespread (Figure 4).

Transfer RNAs and Ribosomal RNAs
Twenty-two classical structures of tRNAs were validated in A. maculatus mtDNA, with lengths ranging from 66 (tRNA Cys ) to 75 bp (tRNA Leu ), with a total length of 1558 bp ( Table 1). The lowest A + T content of tRNAs was found in A. maculatus (55.52%), N. thalassina (55.52%), and S. soldatovi (55.44%), and the highest was found in Mystus cavasius (55.84%). (Table 2). On the H strand, there were fourteen tRNA genes encoded, whereas the residues are on the L strand indicated that there were four tRNA genes ( Table 1). The AT (0.1283) and GC skews (−0.1631) of A. maculatus were similar to those of several sequenced Siluriformes mtDNAs, such as O. platypogon and N. thalassina ( Table 2). The predicted tRNAs are shown in Figure 5. Except for tRNA Ser (GCT), which lacks the dihydrouridine 'DHU' arm, all tRNAs formed typical clover-leaf secondary structures in A. maculatus ( Figure 5). The tRNA Ser 'DHU' arm is a large loop substitute for the conserved stem-and-loop structure. This representative characteristic [1] was also observed in the mtDNA of other Siluriformes species, including Ompok bimaculatus [48], Hemibagrus sp. [49], Silurus lanzhouensis [50], and Chrysichthys nigrodigitatus [51]. Twelve tRNA genes had at least one G-T mismatch, which caused a weak bond. In the amino acid acceptor stems of tRNA Cys (GCA) and tRNA Met (CAT) , five T-T mismatches were observed. (Figure 5). It was also found that tRNA Leu (TAA) contained an A-G mismatch. In tRNA sequences, the RNA-editing mechanism, well-known in vertebrate mtDNA, can correct unmatched base pairs [52].    mtDNA of other Siluriformes species, including Ompok bimaculatus [48], Hemibagrus sp. [49], Silurus lanzhouensis [50], and Chrysichthys nigrodigitatus [51]. Twelve tRNA genes had at least one G-T mismatch, which caused a weak bond. In the amino acid acceptor stems of tRNA Cys (GCA) and tRNA Met (CAT) , five T-T mismatches were observed. (Figure 5). It was also found that tRNA Leu (TAA) contained an A-G mismatch. In tRNA sequences, the RNA-editing mechanism, well-known in vertebrate mtDNA, can correct unmatched base pairs [52]. According to the rRNA gene content, all rRNA genes had 52.75% A + T, which indicated a trend toward A + C, as observed in other Siluriformes [22,24]. The AT and GC skews were positive (0.2369) and negative (−0.1463), respectively ( Table 2). The A. macu- According to the rRNA gene content, all rRNA genes had 52.75% A + T, which indicated a trend toward A + C, as observed in other Siluriformes [22,24]. The AT and GC skews were positive (0.2369) and negative (−0.1463), respectively ( Table 2). The A. maculatus 12S rRNA subunit gene was 958 bp, and its 16S rRNA subunit gene was 1675 bp, respectively. As in other fish [4], the two genes were located between tRNAPhe and tRNALeu, and separated by the tRNAVal gene ( Figure 1, Table 1). The rRNA gene content of A. maculatus was similar to that of other Siluriformes [41].

The Control Region
In A. maculatus, the D-loop region was 1075 bp in length, which was longer than in the majority of Siluriformes and was only shorter than that in A. seemanni. A + T content was 62.55%, which is similar to A + T content in other Siluriformes species (Table 2). This was consistent with those of previous reports of other teleosts [45]. Additionally, both AT and GC skews were strongly negative ( Table 2).

Overlapping and Intergenic Spacer Regions
Nine gene boundaries overlapped between adjacent genes, ranging in size from 1 to 10 bp. There was a 10 bp overlap between ATP8 and ATP6 (Table 1), which was observed in several other Siluriformes mtDNA sequences. In addition, 12 intergenic spacers, ranging in size from 1 to 31 bases, contained 66 nucleotides. tRNA Asn and tRN Cys constitute the longest intergenic spacer regions (31 bp) (Table 1), which was identified with the results in Clarias fuscus [41].

Synonymous and Nonsynonymous Substitutions
In general, Ka/Ks ratios reflect evolutionary relationships between homogenous and heterogeneous species and selective pressure at the molecular level [53,54]. A ratio of Ka/Ks > 1, Ka/Ks = 1, and Ka/Ks > 1 indicate that there has been positive selection, neutral mutation, and negative selection, respectively [55]. Four Ariidae mtDNAs (A. maculatus, A. arius, N. thalassina, and O. platypogon) were investigated for their evolutionary rate differences, and the Ka and Ks substitution rates were used to calculate sequence divergences. In all 13 PCGs of the four Ariidae, the average Ka/Ks was 0.1747 and varied from 0.015 (COXII between OP/AM, OP/AA, or OP/NT) to 1.672 (ND4 between AM/NT) ( Figure 6). The Ka/Ks ratios indicated that there had been strong purifying selections on multiple genes. In other words, this result showed that natural selection occurred against deleterious mutations with negative selective coefficients [56]. A high percentage of AM/NT and AA/NT variable sites was observed in ND4 (1.672) and ND2 (1.190) among the groups, respectively, whereas the percentage in the COXII gene were the lowest, indicating that ND4 and ND2 underwent positive selection and COXII was the most selectively pressured mitochondrial protein. Furthermore, compared with other species, the ratio of Ka/Ks in A. maculatus and A. arius or N. thalassina was the lowest in all 13 protein-coding genes, implying that these three Ariidae fish had a closer phylogenetic relationship with each other than with O. platypogon, which is consistent with their traditional taxonomy. There were likely differences in selection pressures between the genes, and consequently, they evolved differently. It is interesting to note that the ND2 and ND4 genes have the highest ratios, indicating strand-independent selection pressures.

Phylogeny
Two methods (BI and ML) were used to establish phylogenetic relationships between 21 species, including the concatenated nucleic acid ( Figure 7A) and amino acid ( Figure 7B) sequences of the 13 PCGs. Phylogenetic tree topologies of the two superfamilies (Ariidae, Pangasiidae, Bagridae, Mochokidae, Cl ariidae, Siluridae, and Sisoridae) were similar, and strong statistics supported the following relationship among them (Figure 7). The clustering pattern of seven superfamilies was obviously consistent with previous studies [21][22][23][24]. (According to the ML method, 15 closely related genera were identified within the seven superfamilies, and A. maculatus (Arius) was most closely related to A. arius (Arius), N. thalassina (Netuma), O. platypogon (Occidentarius), and B. panamensis (Bagre), which was consistent with a recent study's findings about nucleotide sequence identity [22,23]. Netuma was most closely related to Arius. To determine the location of Ariidae within Siluriformes, further taxon sampling was required within Ariidae and related superfamilies.

Phylogeny
Two methods (BI and ML) were used to establish phylogenetic relationships between 21 species, including the concatenated nucleic acid ( Figure 7A) and amino acid ( Figure 7B) sequences of the 13 PCGs. Phylogenetic tree topologies of the two superfamilies (Ariidae, Pangasiidae, Bagridae, Mochokidae, Cl ariidae, Siluridae, and Sisoridae) were similar, and strong statistics supported the following relationship among them (Figure 7). The cluster-ing pattern of seven superfamilies was obviously consistent with previous studies [21][22][23][24]. (According to the ML method, 15 closely related genera were identified within the seven superfamilies, and A. maculatus (Arius) was most closely related to A. arius (Arius), N. thalassina (Netuma), O. platypogon (Occidentarius), and B. panamensis (Bagre), which was consistent with a recent study's findings about nucleotide sequence identity [22,23]. Netuma was most closely related to Arius. To determine the location of Ariidae within Siluriformes, further taxon sampling was required within Ariidae and related superfamilies.   Table S1.

Conclusions
In conclusion, the present study represents common and characteristic features of A. maculatus and other 26 Siluriformes mtDNA, and reveals their phylogenetic relationship. The phylogenetic results strongly support the close relationship between Arius, Netuma, Occidentarius, and Bagre. Our results will provide insight into the basics of evolutionary biology, molecular identification, and conservation of the diverse Siluriformes species, as well as the gene rearrangement process and matrilineal inheritance of A. maculatus.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes13112128/s1, Supplementary Figure S1: Graphical illustration showing the AT-and GC-skew in the PCGs of the mtDNA of A. maculatus; Supplementary Table S1: Primers used to verify the accuracy of the assembled mtDNA sequence; Supplementary Table S2: The information of Superfamily, Genera, Species, Size, Genbank number, and Identity in the Siluriformes.
Author Contributions: X.L. and C.L. implemented the molecular genetic studies; M.Y. participated in data analyses and contributed to prepare the figures and tables; M.Y. and Z.Y. conducted data analyses and drafted the manuscript; K.Z. directed this study and revised the manuscript. All authors have read and agreed to the published version of the manuscript. Informed Consent Statement: This study did not involve humans.

Data Availability Statement:
The generated mitochondrial DNA has been submitted to the GenBank database under the accession number MN604079. Moreover, the experimental data involved in this article can be obtained by the corresponding author.