Three Complete Mitochondrial Genomes of Erotylidae (Coleoptera: Cucujoidea) with Higher Phylogenetic Analysis

Simple Summary Erotylid beetles are phytophagous and mycophagous. Phylogenetic studies on this family were mainly based on morphological characters or several gene fragments. Research on the mitochondrial genome of Erotylidae is rare. Therefore, we sequenced and analyzed three complete mt genomes of Erotylinae with a comparative mt genomic analysis of Erotylinae and Languriinae for the first time to reveal mitochondrial genome characterizations and reconstruct phylogenetic relationships of this group. The comparative analyses showed the mt genome characterizations of Erotylinae are similar to Languriinae. These results provided a comprehensive framework and worthy information for the future research of this family. Abstract The family Erotylidae belongs to the superfamily Cucujoidea, which are phytophagous and mycophagous. So far, only two representative complete mitochondrial (mt) genomes of Erotylidae have been sequenced. Mitochondrial genomes of Tritoma metasobrina, Neotriplax arisana, and Episcapha opaca, which all belong to the subfamily Erotylinae, were sequenced using next-generation sequencing technology to better understand the diversity of mt genomes of Erotylidae. A comparative mt genomic analysis was conducted on the three sequenced representatives of Erotylinae and Languriinae sp. (Languriinae). The size of the complete mt genome of the 4 species ranged from 15,581 bp to 16,502 bp in length, including 37 genes (13 protein-coding genes, 22 transfer RNAs, and 2 ribosomal RNAs) and the control region. The arrangements of their mt genomes are highly consistent with other Coleoptera species. The start codons of two PCGs (ND1 and ND5) and the stop codons of one PCG (ATP8) were illustrated differences between Languriinae sp. and the other three species. All tRNAs of these 4 species exhibited cloverleaf secondary structures except that the dihydorouridine (DHU) arm of tRNASer(AGN) was absent. The phylogenetic analyses using both Bayesian inference (BI) and maximum likelihood (ML) methods all supported that Erotylidae as monophyletic. Erotylinae was monophyletic being the sister group to Xenocelinae. Languriinae was closely related to ‘Erotylinae-Xenocelinae’. Our results recovered Languriinae nested within Erotylidae.


Introduction
The diverse family Erotylidae belongs to the superfamily Cucujoidea (Coleoptera: Polyphaga), which includes about 260 genera and 3500 species in the world [1]. The larvae and adults of Erotylidae have different feeding habits [2]. The highest numbers of species are contained in the basidiomycete fungus-feeding Erotylinae and the mainly phytophagous-feeding Languriinae, whereas other subfamilies include fewer species with mixed diets [2]. The monophyly and composition of Erotylidae have never been explicitly determined since Latreille established this family [3]. The monophyly of Erotylidae was Insects 2021, 12, 524 2 of 14 supported by Leschen and Lawrence based on morphological analysis of Cucujoidea [4][5][6]. In Leschen's study, the sister group with Erotylidae was not clear [4]. The monophyly of the family Erotylidae was also put forward by Hunt and Mcelrath according to the molecular phylogenetic study [7,8] and the results showed Erotylidae was closely related to Protocucujoidae, Helotidae, and Monotomidae. But the paraphyly of Erotylidae has been recovered by a molecular phylogeny based on several nuclear and mitochondrial genes [9]. The paraphyletic or polyphyletic Erotylidae was also proposed by the morphological characters in Cucujoidea [10].
The relationships within Erotylidae are still controversial, especially between Erotylinae and Languriinae. Languriidae should be merged into Erotylidae as indicated by Crotch as well as Wegrzynowicz and Leschen [11][12][13]. This conclusion was also supported by Robertson, who analyzed the 18S rRNA and 28S rRNA gene sequences of 61 species [14]. But there has also been some research that showed Languriidae and Erotylidae to be separate families [15][16][17][18][19]. Arrow showed that there were significant morphological differences between Erotylidae and Languriidae, such as the procoxal cavities open and the larval spiracles are divided into two pairs in Languriidae, whereas in Erotylidae the procoxal cavities are closed and the larval spiracles are of a simple annular shape [15]. This result was also supported by Crowson, Sen, and Leschen [16][17][18]. Kai indicated that Languriidae should be regarded as a separated family based on their feeding characteristics [19]. So, the phylogenetic position of Languriinae and the internal phylogenetic relationships of Erotylidae are still unclear.
Phylogenetic investigations of Erotylidae have remained at the level of morphology and mitochondrial (mt) markers for a long time. In recent years, mt genome has been widely used in the analysis of species heredity and molecular evolution due to their advantages such as small molecular weight, ease to operate, fast mutation speed, and maternal inheritance [20][21][22]. The complete mt genome can provide more informative genetic information as well as a suite of genomic-level characters, such as RNA secondary structures and gene arrangements in comparison with the individual mt genes. [23,24].
Currently, the study of mt genomes from Erotylidae is still scarce. A total of 10 mt genomes label as Erotylidae were available in GenBank (accession date, 9 January 2021). Of them, one mt genome (accession No. KT696227.1) was likely mislabeled, because manifested high sequence identity (99% for COX1 gene) with Thamiaraea americana. Another 8 mt genomes for 6 unidentified species of Erotylidae sp., Loberonotha olivascens, and Tritoma bipustulata, are highly incomplete. Here, the mt genomes of three species, Tritoma metasobrina Chûjô, 1941, Neotriplax arisana Miwa, 1929 and Episcapha opaca Heller, 1920 were sequenced using next-generation sequencing to better understand the diversity of mt genomes and the phylogenetic of Erotylidae [25][26][27]. A comparative mt genomic analysis of two subfamilies in Erotylidae was conducted including T. metasobrina, E. opaca, N. arisana (Erotylinae), and Languriinae sp. (Languriinae). The results will lay a foundation for the study of Erotylidae mitogenomes. The phylogenetic relationships were reconstructed to demonstrate the phylogenetic position of Erotylidae as well as to explore the phylogeny of Cucujoidea. The results will lay a foundation for the study of Erotylidae as well as Cucujoidea.
Sequencing libraries (Illumina NovaSeq) were prepared using genomic DNA with an average insert size of 400 bp, and were sequenced on the Illumina Hiseq platform with 150 bp paired-end reads at Majorbio (Shanghai, China). According to the principle of second-generation sequencing data, the original offline data (Raw Data) was subjected to fastp [28] quality control filtering to obtain Clean Data, and the analysis of Clean Data was performed according to GATK Best Practice for mutation detection. MitoZ [29] was used to conduct assembly and annotation, then checked by manual proofreading according to its relative species. TRNA scan-SE Search Server v1.21 [30] was used to identify the tRNA genes and then manually proofread. The secondary structures of two rRNAs (rrnS, rrnL) were predicted by RNA Structure (http://rna.urmc.rochester.edu/ RNAstructureWeb/) (accessed on 25 December 2020). [31]. All PCGs were annotated by alignment with homologous genes from T. metasobrina, N. arisana, and E. opaca using Geneious 8.0.5 software (Biomatters, Auckland, New Zealand) [32]. MEGA 7.0 was used to calculate the A + T content and relative synonymous codon usage (RSCU) for PCG analysis [33]. The bias of base usage was measured by AT-skew and was calculated to AT-skew = (A − T)/ (A + T) [34]. The numbers of synonymous substitutions (Ks) and nonsynonymous substitutions (Ka), and the ratios of Ka/Ks for each PCG were also measured in the software KaKs_Calculator 2.0. [35] (Table S1). MEGA 7.0 [33] was used to analyze the genetic distances based on Kimura-2-parameter among the 4 mt genomes were analyzed by. Genome organization and base composition, PCGs, RSCU, tRNAs, rRNAs, CR, intergenic spacer, and overlapping regions of the mt genomes were compared between Erotylidae species. The newly sequenced mitogenome sequences of T. metasobrina, E. opaca, and N. arisana were submitted to GenBank with the accession numbers MZ014622, MZ014623, and MZ014624, respectively.

Phylogenetic Analyses
In this study, the phylogenetic analyses were conducted based on 17 mt genomes from GenBank (Available online: http://www.ncbi.nlm.nih.gov) (accessed on 9 January 2021). including three newly sequenced ( Table 1). Most of the mt genomes chosen were complete. The ingroup taxa included 15 species from Cucujoidea representing 7 families and the outgroup taxa included Curculionidae and Chrysomelidae for their close relationships with Cucujoidea [36,37]. Clustal_X [38] was used to conduct DNA alignment from the amino acid alignment of PCGs. MEGA 7.0 was used to connect all alignment sequences excluding the stop codon. Bayesian inferences were conducted with PhyloBayes 3 [39] on DNA sequence alignment with the CAT-GTR model, and on the amino acid alignment with the CAT-Poisson model [40][41][42], respectively. Two Markov chain Monte Carlo (MCMC) chains were employed. The maximum likelihood analysis was conducted with IQ-Tree in Phylosuite [43,44] on DNA sequence alignment with the GTR + F + I + G4 model selected by ModelFinder [45]. Branch supports were evaluated through the ultra-fast bootstrapping method [46] with 1000 replicates (Table S1). The inferred phylogenetic trees were viewed and illustrated using FigTree [47].

General Features of Erotylidae mt Genomes
Including the newly sequenced mt genomes of T. metasobrina, E. opaca, and N. arisana, a total of four mt genomes representing three tribes and two subfamilies of Erotylidae were used in our comparative analyses. The mt genomes of four specimens were typical circular double-strand molecules, ranging from 15,581 bp to 16,502 bp in length. The mt genomes included 13 Protein-coding genes (PCGs), 22 transfer RNAs (tRNAs) and 2 ribosomal RNAs (rRNAs), and the control region (CR) (Figure 1). Twenty-three genes were encoded at the majority strand (J), while the remaining 14 genes were oriented at the minority strand (N). The 4 species displayed a 929-1785 bp CR, identified by the position between the rrnS and tRNA Ile . The overlap nucleotides were located between tRNA Tyr /COX1 of T. metasobrina (8 bp), tRNA Trp /tRNA Cys of E. opaca (8 bp), ATP8/ATP6 of N. arisana (4 bp) and ATP8/ATP6 and ND4/ND4L of Languriinae sp. (7 bp). In addition to the CR, the largest non-coding region of these 4 species was located between tRNA Ile and tRNA Gln in T. metasobrina with a length of 145 bp ( Table 2).
The mt genomes of Erotylidae had a significant bias towards A and T, with the nucleotide composition of A and T in total ranging from 74.80% in T. metasobrina to 78.70% in E. opaca [48]. The content of A + T in the PCGs, tRNAs, rRNAs and CR were ranging from 72.90% to 77.79%, 77.48% to 79.20%, 80.20% to 80.4% and 75.50% to 81.59%. AT-skews and GC-skews were used to measure the strand bias of nucleotide composition of metazoans mt genomes [49]. The AT-skew was −0.01 to 0.04 and the GC-skew was −0.29 to −0.20 (Table 3).

Protein-Coding Genes
Most PCGs started with ATN, except for several genes that started with GTG or TTG. The PCGs stopped with TAA/TAG or truncated termination codons with T/TA-tRNA [50]. The start codons of two PCGs (ND1 and ND5) and stop codons of one PCG (ATP8) were illustrated differences between Languriinae sp. and the other three species ( Table 2).
The number of each codon, the amino acid(aa) compositions of PCGs, and relative synonymous codon usage (RSCU) values in four species were given in Figure 2. A total of 3641-3691 codons were used excluding termination codons. The frequency of A and U in the third site of four species was much higher than C and G. The four most frequently used codons were CGA, AGA, CCU, and GGA. The highest value of RSCU was 2.84 of AGA in Languriinae sp. The lowest value of RSCU was 0 of UGC and CUG in E. opaca. Leucine, Isoleucine, Phenylalanine, and Methionine were the most frequent coding amino acids (aa). The most frequently used codons were TTA, ATT, TTT, and ATA indicating the preference of nucleotide composition A/T.   Analysis of pairwise genetic distance showed similar results with ND6 (0.641), ATP8 (0.581), and ND2 (0.552) evolving relatively faster, while COX1 (0.256) and ND1 (0.274) were slower (Figure 3). Average non-synonymous (Ka)/synonymous (Ks) substitution rate ratios can be used to estimate the evolutionary rate of mitogenome PCGs [51]. We calculated Ka/Ks ratios for each PCG of 4 species in this study, which was shown in Figure 3. Ratios ranged from 0.012 for COX1 to 0.118 for ND6, which indicated that all PCGs were under purifying selection [52]. Our results showed ND6 and ATP8 exhibited relaxed purifying selection, while COX1 was under the strongest purifying selection. Analyses of nucleotide diversity, genetic distance, and evolutionary rate are useful for designing specific markers among different groups. Like other Coleoptera [53], our comprehensive analysis showed that COX1 had the lowest evolution rate and evolves under comparative relaxed purifying selection, two genes (ND6 and ATP8) exhibited a faster evolution rate and diversity than other PCGs. Analysis of pairwise genetic distance showed similar results with ND6 (0.641), ATP8 (0.581), and ND2 (0.552) evolving relatively faster, while COX1 (0.256) and ND1 (0.274) were slower (Figure 3). Average non-synonymous (Ka)/synonymous (Ks) substitution rate ratios can be used to estimate the evolutionary rate of mitogenome PCGs [51]. We calculated Ka/Ks ratios for each PCG of 4 species in this study, which was shown in Figure  3. Ratios ranged from 0.012 for COX1 to 0.118 for ND6, which indicated that all PCGs were under purifying selection [52]. Our results showed ND6 and ATP8 exhibited relaxed purifying selection, while COX1 was under the strongest purifying selection. Analyses of nucleotide diversity, genetic distance, and evolutionary rate are useful for designing specific markers among different groups. Like other Coleoptera [53], our comprehensive analysis showed that COX1 had the lowest evolution rate and evolves under comparative relaxed purifying selection, two genes (ND6 and ATP8) exhibited a faster evolution rate and diversity than other PCGs.

Transfer RNAs, Ribosomal RNAs, and Control Region
The predicted secondary structures of tRNAs were shown in Figure 4. The 22 tRNAs of four species were typical and included all 20 types of amino acids (aa). Almost all tRNAs could be folded into cloverleaf structures except tRNA Ser(AGN) whose DHU arm simply formed a loop. In the tRNAs secondary structure of the 4 species, the anticodons were consistent. The anticodon of tRNA Ser (AGN) was UCU, which was not a common GCU. UCU was used as the anticodon for the polyphagia group, which was the common form of the metazoans [48]. The aminoacyl (AA) stem length was conservatively 7 bp. Anticodon (AC) arm length was 5 bp except for tRNA Lys , tRNA Met , and tRNA Leu(UUR) . Almost all tRNAs had the same length of the anticodon (AC) loop (7 nucleotides) except tRNA Leu(UUR) (9 nucleotides). The length of the TψC arm varied from 4 to 6 bp while the TψC loop from 3 to 11 nucleotides. The dihydrouridine (DHU) stem varied from 3 to 4 bp except for tRNA Ser(AGN) . 11-13 pairs of G-U which form weak attraction and constitute bonds situated at the T arm, the AA arm, the AC arm, and the DHU arm, respectively.

Transfer RNAs, Ribosomal RNAs, and Control Region
The predicted secondary structures of tRNAs were shown in Figure 4. The 22 tRNAs of four species were typical and included all 20 types of amino acids (aa). Almost all tRNAs could be folded into cloverleaf structures except tRNA Ser(AGN) whose DHU arm simply formed a loop. In the tRNAs secondary structure of the 4 species, the anticodons were consistent. The anticodon of tRNA Ser (AGN) was UCU, which was not a common GCU. UCU was used as the anticodon for the polyphagia group, which was the common form of the metazoans [48]. The aminoacyl (AA) stem length was conservatively 7 bp. Anticodon (AC) arm length was 5 bp except for tRNA Lys , tRNA Met , and tRNA Leu(UUR) . Almost all tRNAs had the same length of the anticodon (AC) loop (7 nucleotides) except tRNA Leu(UUR) (9 nucleotides). The length of the TψC arm varied from 4 to 6 bp while the TψC loop from 3 to 11 nucleotides. The dihydrouridine (DHU) stem varied from 3 to 4 bp except for tRNA Ser(AGN) . 11-13 pairs of G-U which form weak attraction and constitute bonds situated at the T arm, the AA arm, the AC arm, and the DHU arm, respectively. The rrnL was located in tRNA Leu (CUN) and tRNA Val , and its length ranged from 1052 to 1379 bp. The longest rrnL was shown in Languriinae sp. The shortest rrnL was shown in E. opaca. The rrnS was located in tRNA Val and CR in the four mt genomes of Erotylidae with the length ranging from 759 to 777 bp. The shortest rrnS was shown in E. opaca. The longest rrnS was shown in N. arisana. The AT content of rrnL and rrnS ranged from 80.70% to 81.70% and 77.90% to 81.03%, respectively, which showed a high AT bias. The highest AT content of rrnL and rrnS was found in T. metasobrina and E. opaca (Table 3).
The largest noncoding region located between rrnS and tRNA Ile was CR, which is the place that controls replication and transcription [54,55]. The length of CRs were determined and ranged from 929 bp in E. opaca to 1785 bp in T. metasobrina (Figure 1). This region contained the highest proportion of A and T, ranging from 75.5% to 81.59%. The AT-skew was −0.02 to 0.17 and the GC-skew was −0.66 to −0.27, which indicated that A and T are more numerous than C and G (Table 3).

Phylogenetic Analyses
Phylogenetic analyses were performed both on the nucleotide dataset (PCG123R) and the amino acid dataset (AA). The PCG123R dataset includes 12,592 sites for 13 PCGs and two rRNA genes. The AA dataset is composed of 3592 amino acids translated from the PCGs. The results of phylogenetic analyses were shown in Figure 5. Irrespective of the methods used, analyses on the PCG123R dataset and the AA dataset reached the same topology ( Figure 5). Most nodes were highly supported.

Conclusions
The mt genomes sequences of T. metasobrina, E. opaca, N. arisana, and Languriinae sp. were obtained by next-generation sequencing. Their mt genomes had a total length of 16,502 bp in T. metasobrina, 15,581 bp in E. opaca, 16,478 bp in N. arisana, and 16,082 bp in Languriinae sp., respectively. Each of the mt genomes was composed of 13 PCGs, 2 rRNAs, 22 tRNAs, and a control region. In general, mt genomes for members of Erotylidae were similar in genome size and gene order to other species of Cucujoidea. The base composition was biased towards A and T as was commonly found in other insects [48]. However, the content of A + T is the highest in rrnL among four species of Erotylidae were have compared here. Most PCGs were initiated with the typical ATT/ATG codon and terminated with TAA/TAG or T/TA-tRNA. However, the start codon for ND1 of all three spe- Erotylidae was recovered as monophyletic in all analyses. This result was also supported based on morphological characters [4][5][6] and molecular evidence [7,8]. The sistergroup relationship between Erotylidae and "Cryptophagidae-Laemophloeidae-Cucujidae-Silvanidae'-'Monotomidae-Nitidulidae'" was highly supported and together being the sister group to Silvanus bidentatus (Silvanidae) in all analyses. The result was also supported by analyzing nuclear protein-coding genes in Coleoptera [56]. Within the family Erotylidae, three species of Erotylinae were as a monophyletic group, sister to Xenoceninae. The sister relationship between Languriinae and other sampled subfamilies (Erotylinae and Xenocelinae) was recovered by analyses of both BI and ML. The result showed that Languriinae should belong to Erotylidae, which supported the earlier results from Crotch, Leschen, Wegrzynowicz, and Robertson [11][12][13][14].

Conclusions
The mt genomes sequences of T. metasobrina, E. opaca, N. arisana, and Languriinae sp. were obtained by next-generation sequencing. Their mt genomes had a total length of 16,502 bp in T. metasobrina, 15,581 bp in E. opaca, 16,478 bp in N. arisana, and 16,082 bp in Languriinae sp., respectively. Each of the mt genomes was composed of 13 PCGs, 2 rRNAs, 22 tRNAs, and a control region. In general, mt genomes for members of Erotylidae were similar in genome size and gene order to other species of Cucujoidea. The base composition was biased towards A and T as was commonly found in other insects [48]. However, the content of A + T is the highest in rrnL among four species of Erotylidae were have compared here. Most PCGs were initiated with the typical ATT/ATG codon and terminated with TAA/TAG or T/TA-tRNA. However, the start codon for ND1 of all three species of Erotylinae, T. metasobrina, E. opaca, and N. arisana is TTG, while that for species of Languriinae is ATT. The stop codons for ATP8 are TAG in Languriinae species but are TAA in Erotylinae species. The start codons of ND5 were GTG in Languriinae sp., not typical ATN condon [47]. The phylogenetic reconstructions with BI and ML methods all supported the monophyly of Erotylidae and Erotylinae. This representative of Languriinae fitted well to the clade Erotylidae as shown in previous research. This subfamily was recovered being the sister group to 'Erotylinae-Xenocelinae'. However, only one mt genome of Languriinae and one mt genome of Xenocelinae were sampled in this study. To better understand the phylogenetic position as well as the relationships within Erotylidae, more mt genomes from Xenocelinae and Languriinae will be needed in the future. Due to the few mt genomes in Cucujoidea, increased taxon sampling is required to resolve the relationships within the superfamily Cucujoidea.