Comparative Analysis of Two Pear Pests, Cacopsylla jukyungi and Cacopsylla burckhardti (Hemiptera: Psyllidae), Based on Complete Mitochondrial Genomes and Comparison to Confamilial Species

: Mitochondrial genome sequences have been used in diverse fields of biology. In this study, we sequenced the complete mitochondrial genomes (mitogenome) of two pear pests: Cacopsylla jukyungi , the most damaging insect pest to commercial pears in South Korea, and Cacopsylla burckhardti (Hemiptera: Psyllidae). The two mitogenomes were compared to confamilial species to accumulate genetic information and understand evolutionary characteristics of the family Psyllidae. The 15,438 bp-and 14,799 bp-long complete mitogenomes of C. jukyungi and C. burckhardti, respectively, had many features typical of insect mitogenomes; however, at 1283 bp, the C. jukyungi mitogenome had an unusually long A+T-rich region, which was composed of two identical 540-bp repeat sequences. Among the intergenic spacer regions, the one located at the ND1 and trnS 2 junction was relatively well conserved in length (mostly within 23–36 bp). This region had a high sequence identity in all Psyllidae, possessing a 5-bp consensus sequence (CGGTA), which is speculated to have a functional role. Though the A+T-rich region in available Psyllidae mitogenomes varied substantially in length (662–1430 bp) and sequence divergence, all species had a conserved sequence stretch at the 3 (cid:48) -end of srRNA , which is also speculated to have a functional role. Genetic divergence among genes indicated the lowest variability in srRNA , lrRNA , and COI , whereas ATP8 and ND6 showed the highest variability at both family and genus ( Cacopsylla ) levels. Our data provide evidence that the family Psyllidae, including current C. jukyungi and C. burckhardti , have evolutionary unique features that were previously undetected, along with the unique A+T-rich region structure in C. jukyungi .


Next-Generation Sequencing
For library construction, 50 ng of genomic DNA was isolated and randomly sheared using the Covaris System (Woburn, MA, USA) to generate inserts of~300 bp fragments. Library construction was performed using the TruSeq Nano DNA Kit (Illumina, San Diego, CA, USA) following the manufacturer's instructions. The quality and size of DNA libraries were assessed using the Agilent 2100 BioAnalyzer (Agilent Technologies, Palo Alto, CA, USA). Libraries were quantified by qPCR using the CFX96 Real-Time System (BioRad, Hercules, CA, USA). After normalization, sequencing of the prepared library was conducted on the MGISEQ-2000 sequencing platform (MGI Tech Co. Ltd., Shenzhen, China). A quality analysis of the raw sequence data was performed using FastQC software [34]. Adapter sequence reduction and trimming of 5 -and 3 -ends were performed using Skewer ver. 0.2.2. [35]. Base-calling errors or insertions/deletions (indels) were corrected from the filtered set of reads using the alignment-based error correction tool Karect [36]. Consequently,~3.99 giga-bases (Gb) of nucleotides from 133 million C. jukyungi reads and 4.38 Gb of nucleotides from 146 million C. burckhardti reads were obtained. The Phred quality score (Q) indicated that base call accuracy was 87.75% for C. jukyungi and 88.15% for C. burckhardti from the Q30 score.

Assembly and Gap Filling
The two mitogenomes were constructed using MITObim ver. 1.9 [37] by de novo assembly of the Illumina reads with Cacopsylla citrisuga [25]. The assembled mitogenomes were remapped with the whole genome sequence reads using Bowtie2 [38] before conducting manual curation. Mismatch calling and correction of the assembled sequences were conducted using GATK [39]. Finally, primary annotation of protein-coding genes (PCGs), tRNAs, rRNAs, and the A+T-rich region of each mitogenome was carried out using the MITOS Web Server [40] (http://mitos.bioinf.uni-leipzig.de/index.py, accessed on 10 April 2022).
Gap filling was usually not necessary, but the C. jukyungi A+T-rich region was unexpectedly long (1282 bp) and needed confirmation. Thus, a long fragment (LF) encompassing lrRNA and COI, in which the A+T-rich region is included, was amplified using a primer designed from the C. jukyungi mitogenome. Amplification of the LF was conducted using LA Taq (Takara Biomedical, Tokyo, Japan) under the following conditions: 94 • C for 4 min, 30 cycles at 98 • C for 10 s and 48 • C for 15 min, and a final extension step at 72 • C for 10 min. The PCR product was purified with a PCR purification kit (Bioneer, Daejeon, Korea) and sequenced using barcode-tagged sequencing technology (Celemics, Inc., Seoul, Korea) and the Illumina MiSeq platform (Illumina, San Diego, CA, USA).

Gene Annotation
The identification, boundary delimitation, and secondary structure folding of tRNAs were performed using the MITOS Web Server by following the protocols presented by Cameron [41]. Twenty-one tRNA genes were well-identified, but trnS1, which has a truncated dihydrouridine (DHU) arm, was detected using a hand-drawn secondary structure in conjunction with an alignment of the predicted trnS1 regions of C. citrisuga [25]. Start and stop codons of PCGs were further confirmed by alignment against mitochondrial PCGs of Psyllidae species. The nucleotide sequences of the PCGs were translated based on the invertebrate mitochondrial DNA (mtDNA) genetic code. Sequence data were deposited into the GenBank database under accession numbers ON553958 for C. jukyungi and ON411626 for C. burckhardti.

Comparative Genome Analyses
For the comparative analysis, 11 Psyllidae mitogenomes were downloaded from the GenBank database. These sequences, along with the two mitogenome sequences obtained in the present study, were compared for several genomic characteristics. The A/T content of each gene and whole genome was calculated using DNASTAR (Madison, WI, USA). Codon usage was determined by MEGA 6 [42], and the gene overlap and intergenic spacer sequences were hand-counted. The genetic distance for each gene was calculated at within-Psyllidae and within-Cacopsylla levels using unrooted pairwise distances estimated with PAUP ver. 4.01b10 [43]. Genetic distances were then plotted with minimum, maximum, and median values presented with boxplots using JMP software ver. 15.0.0 (SAS Institute, Carry, NC, USA).

General Mitochondrial Genome Features
Mitogenome sizes were 15,438 bp in C. jukyungi and 14,799 bp in C. burckhardti (Table 1). These sizes are within the range previously reported for complete Psyllidae mitogenomes, which range from 14,790 bp (Arytainilla spartiophila) to 16,047 bp (Russelliana solanicola) ( Table 2). The two mitogenomes contained 37 genes (13 PCGs, 22 tRNA genes, and two rRNA genes), which is typical in animals, and one major non-coding A+T-rich region [44]. In both species, thirteen PCGs had the typical ATN start codon, whereas the stop codons were incomplete in COII, ND5, and ND4 in C. jukyungi and COII, COIII, ND5, and ND4 in C. burckhardti; in these cases, stop codons were present only as a single thymine ( Table 1). The gene arrangement was identical to that typically observed in other insects [41].
The number of codons of the two species (3598 and 3597 in C. jukyungi and C. burckhardti, respectively) was well within the range found in other Psyllidae species (Table 2). Regions with the highest A/T content were, in descending order, the A+T-rich region, 82.20%; srRNA, 77.06%; lrRNA, 76.40%; tRNAs, 75.29%; whole genome, 73.69%; and PCGs, 72.10% in C. jukyungi. On the other hand, C. burckhardti A/T content was greater in tRNAs (76.38%) than in lrRNA (76.09%) compared to that of C. jukyungi. Similarly, other species of Psyllidae also showed variable order in A/T content among genes, such that srRNA, lrRNA, and tRNAs were variable, whereas those of the A+T-rich region, whole genome, and PCGs were consistent ( Table 2).
A total of 22 tRNA genes (one for each amino acid and two for leucine and serine) were identified for each mitogenome sequenced in this study ( Figure S1 for C. jukyungi and Figure S2 for C. burckhardti). All tRNAs except trnS1, which lacked the DHU loop, folded into the expected cloverleaf secondary structures in both species. This incomplete trnS1 structure has been detected frequently in the mitogenomes of other animals, including insects [45]. The postulated tRNA cloverleaf structure of C. jukyungi and C. burckhardti harbors an invariant 7 bp in the aminoacyl stem, 5 bp in the anticodon stem, and 7 bp in the anticodon loop, whereas the DHU and TΨC arms, particularly within the loops, were variable in length (3-5 bp in stems, but 0 bp in trnS1 DHU stems; 3-9 bp in loops; Figures S1 and S2). A total of 38 and 40 unmatched base pairs were detected in C. jukyungi and C. burckhardti tRNAs, respectively. Of these unmatched base pairs, 18 and 19 were G-U base-pairs, while 20 and 21 were non-Watson-Crick in C. jukyungi and C. burckhardti, respectively. The biased A/T content shown in the whole genome and genes reflects a biased usage of A/T-containing codons (Table 3). Among 64 codons, TTA (leucine), ATT (isoleucine), TTT (phenylalanine), and ATA (methionine), which are composed only of A and T nucleotides, were most frequently used, accounting for 31.43% of codons in C. jukyungi and 32.0% in C. burckhardti. In other species of Psyllidae, this bias is also obvious, with the frequency of these A/T-containing codons ranging from 28.91% in Cacopsylla coccinea to 36.59% in a Heteropsylla sp. (Table 3). tRNAs are denoted as one-letter symbols in accordance with the IUPAC-IUB single-letter amino acid codes, except those encoding leucine and serine, which are labeled L1 for the CTN codon family, L2 for the TTR codon family, S1 for the AGN codon family, and S2 for the TCN codon family. * D, direction (F and R, forward and reverse transcriptional directions, respectively) and ** AC, anticodon.

Intergenic Spacer Sequences
The genes of C. jukyungi and C. burckhardti mitogenomes were interleaved with 84 and 53 bp intergenic spacer sequences (ISSs), which were spread over 13 and 10 regions, respectively, ranging in size from 1 to 27 bp in both species (Table 4). In contrast, genes (including the A+T-rich region) overlapped by 39 and 43 bp in 9 and 10 regions in C. jukyungi and C. burckhardti, respectively, with the size of these overlapping sequences  (Table 4). The majority of ISS are short (1-10 bp), with a few exceptions, and variable in their position and length. Other species of Psyllidae also present similar overlapping sequence and ISS patterns to those of the two species sequenced in this study, with some notable exceptions. Cyamophila willieti [26] has an exceptionally long 871 bp ISS at the trnI-trnQ junction, whereas other species have either no ISS, a 2-4-bp long ISS, or a 3-bp long overlap at this junction. Furthermore, Freysuila caesalpiniae and R. solanicola [30] also have longer ISS at the trnS 2 and ND1 junction (273 and 469 bp, respectively; Table 4). An ISS at the trnS 2 and ND1 junction is common in all species of Psyllidae except F. caesalpiniae and R. solanicola. The length of this ISS is highly conserved at 23-36 bp, with the length in the two species considered in this study both being 27 bp. Sequence alignment of this junction allowed us to detect a well-conserved stretch of pentanucleotides, CGGTA, at the trnS 2 and ND1 junction of all species, regardless of ISS length (Figure 1).

The A+T-Rich Region
The A+T-rich region of C. jukyungi was 1282 bp in size, which is nearly two-fold larger than that of C. burckhardti (662 bp; Table 2). Likewise, the size of this region is highly variable among species in Psyllidae, ranging from 355 bp (Heteropsylla sp.) to 1430 bp (R. solanicola) ( Table 2). The C. jukyungi sequenced in this study showed exceptional structure, having the second-longest A+T-rich region in the family. It had two identical 540-bp repeat sequences surrounded by 46, 52, and 104 bp of non-repeat sequences, which do not have any homology (Figure 2A).

The A+T-Rich Region
The A+T-rich region of C. jukyungi was 1282 bp in size, which is nearly two-fold larger than that of C. burckhardti (662 bp; Table 2). Likewise, the size of this region is highly variable among species in Psyllidae, ranging from 355 bp (Heteropsylla sp.) to 1430 bp (R. solanicola) ( Table 2). The C. jukyungi sequenced in this study showed exceptional structure, having the second-longest A+T-rich region in the family. It had two identical 540-bp repeat sequences surrounded by 46, 52, and 104 bp of non-repeat sequences, which do not have any homology (Figure 2A).
The longest R. solanicola A+T-rich region is more notable, having both a tandem repeat region and a peculiar poly-running microsatellite DNA-like region ( Figure 2B). The tandem repeat region consisted of nine complete copies (49 bp) and one partial copy (28 bp), which lacks the end part. Eight of the nine copies are identical, but the fifth copy has one substitution (C T) compared to the common copy. This repeat region is surrounded by 184 and 59 bp of non-repeat sequences. This 460-bp long repeat region is one source of longer sequences in this species, but the species additionally has a microsatellite DNA-like region ( Figure 2B). This microsatellite-like region is unusual in Psyllidae, in that each copy is blended with different types of nucleotide repeats, such as CA, TA, GT, and ATA, infrequently interrupted by non-repeat sequences. This region also spans~323 bp, making it another source of longer A+T-rich regions in this species. In other species of Psyllidae, such a microsatellite-like region also is present, but the length is much shorter than that of R. solanicola and is not blended with different nucleotide repeats as much as those of R. solanicola (data not shown).
In a comparison of the A+T-rich region, one highly conserved sequence stretch (conserved element [CE]), which spans~33 bp, was detectable in Psyllidae, including in the two species analyzed in this study ( Figure 2C). Located closer to the 5 -end of srRNA (beginning of the A+T-rich region), the CE was composed of poly-T and poly-A sequences, each of which abuts non-poly-running sequences. This CE was present in all species of Psyllidae, and was highly conserved in sequence identity and position in the family. In the case of C. jukyungi, which has two identical 540-bp repeat sequences, this CE also was duplicated within the repeats (Figure 2A). The longest R. solanicola A+T-rich region is more notable, having both a tandem repeat region and a peculiar poly-running microsatellite DNA-like region ( Figure 2B). The tandem repeat region consisted of nine complete copies (49 bp) and one partial copy (28 bp), which lacks the end part. Eight of the nine copies are identical, but the fifth copy has one substitution (C ⇨ T) compared to the common copy. This repeat region is surrounded by 184 and 59 bp of non-repeat sequences. This 460-bp long repeat region is one source of longer sequences in this species, but the species additionally has a microsatellite DNAlike region ( Figure 2B). This microsatellite-like region is unusual in Psyllidae, in that each copy is blended with different types of nucleotide repeats, such as CA, TA, GT, and ATA, infrequently interrupted by non-repeat sequences. This region also spans ~323 bp, making it another source of longer A+T-rich regions in this species. In other species of Psyllidae, such a microsatellite-like region also is present, but the length is much shorter than that of R. solanicola and is not blended with different nucleotide repeats as much as those of R. solanicola (data not shown).
In a comparison of the A+T-rich region, one highly conserved sequence stretch (conserved element [CE]), which spans ~33 bp, was detectable in Psyllidae, including in the two species analyzed in this study ( Figure 2C). Located closer to the 5′-end of srRNA (be-

Potential Motif Sequences in Intergenic Spacer Sequences
In contrast to the variability in length and position of the ISSs, one located at the trnS2 and ND1 junction is conserved in majority of Psyllidae and, more importantly, all species of Psyllidae consistently have penta-nucleotides, CGGTA at the junction (Figure 1). Considering that the evolutionary pressure on nucleotide substitution is higher in the ISS than in genic regions, an identical sequence stretch in an ISS may not be maintained in all members of Psyllidae if no functional role is granted. Previously, the ISS at the trnS2 and ND1 junction in insects was reported to contain conserved motif sequences, specifically TTAG-

Potential Motif Sequences in Intergenic Spacer Sequences
In contrast to the variability in length and position of the ISSs, one located at the trnS 2 and ND1 junction is conserved in majority of Psyllidae and, more importantly, all species of Psyllidae consistently have penta-nucleotides, CGGTA at the junction (Figure 1). Considering that the evolutionary pressure on nucleotide substitution is higher in the ISS than in genic regions, an identical sequence stretch in an ISS may not be maintained in all members of Psyllidae if no functional role is granted. Previously, the ISS at the trnS 2 and ND1 junction in insects was reported to contain conserved motif sequences, specifically TTAGTAT in the order Lepidoptera, TAGTA in the order Coleoptera, and TAGTA in the hemipteran superfamily Fulgoroidea, with a slight sequence alteration in some species [17,[46][47][48]. This motif sequence has been noted for its role as a binding site for the mitochondrial transcription termination peptides (mtTERM), which terminate the transcription of PCGs [49,50]. Consistently located after CytB, which is the last PCG encoded in the major strand in the circular mitogenome, the motif sequence was identified as the site responsible for the binding of the mtTERM that signals the termination of transcription after CytB is transcribed. Currently, we strongly speculate that these pentanucleotide stretches found in Psyllidae may have such a functional role; however, further sampling of this region, along with studies of the region's functional role, will be required to determine the precise location and length of the motif.

The A+T-Rich Region Structure and Conserved Element
The size variation of the A+T-rich region among Psyllidae was substantial (355-1430 bp; Table 2), and such large variation resulted from the presence of large repeat sequences in C. burckhardti and R. solanicola (Figure 2A,B). One of the common interpretations of the occurrence of such identical repeats in animal mitogenomes includes slipped-strand mispairing, in concert with unequal crossing over during DNA replication, resulting in an expanded repeat [51][52][53].
Previous studies have analyzed the A+T-rich region of some hemipteran groups, although no species in Psyllidae were considered [17,54,55]. These studies also found that hemipteran species often have a long A+T-rich region (>1500 bp in most and >2000 bp in a few species) and have tandem repeat sequences of variable copy number and length. For example, three species of Flatidae in the hemipteran infraorder Fulgoromorpha had 1702-1836 bp-long A+T-rich regions. These A+T-rich regions are commonly structured into four smaller regions, composed of one repeat region, a large non-repeat region, another repeat region, and a short non-repeat region [17]. However, copy number and length in each repeat region differed among all species, although each species commonly had four dividable regions. Therefore, the structural composition of the A+T-rich region in the Fulgoromorpha, particularly Flatidae, may require further analysis before generalizations can be made. Similarly, generalizing about the A+T-rich region in Psyllidae also is difficult in that, for the two species that have the longest A+T-rich regions (R. solanicola and C. jukyungi), these regions' structure differs. In addition, other species in Psyllidae do not have any notable repeat elements. Thus, an extended exploration of the A+T-rich region of Psyllidae will be necessary to generalize the structural composition of this region across the family.
Due to the high variability in length and composition of the A+T-rich region in Hemiptera, detection of CE, except for simple poly-running sequences, has not been feasible [17,54,55]. However, a comparison of this region allowed us to detect a CE within the region ( Figure 2C). Such a CE in the insect A+T-rich region has been searched for because it contains a stretch of sequences that are responsible for functional roles such as signaling for replication and transcription initiation [56][57][58]. In the insect order Lepidoptera, an ATAGA motif that is located immediately upstream of the poly-T sequence, close to the 5 -end of srRNA in the A+T-rich region, has been suggested to be the precise position of the origin of replication. The poly-T sequence has been suggested to function as a structural signal for protein recognition in the initiation of replication for minor-strand mtDNA in Bombyx mori Linnaeus (Bombycidae: Lepidoptera) [58]. Indeed, both the motif and poly-T sequences have been found to be well-conserved in diverse taxonomic groups within Lepidoptera, although the length of poly-T sequence varies [24,47,59]. Currently, no functional study of the hemipteran mitogenome that includes the CE of the A+T-rich region is available. However, the findings that the CE was located at a relatively conserved position with high sequence homology in all species of Psyllidae may suggest that the CE has a functional signal, although current understanding is too limited to make such a conclusion.

Individual Gene Divergence
The species in the genus Cacopsylla have relatively higher sequence variation even in the genes with lower variation (10.86-14.86% in srRNA, lrRNA, and COI). Thus, the result of current mitogenome comparison emphasizes that pest species identification in the genus can casually be possible with any mitochondrial gene, although two other Cacopsylla species occurring in Korea (C. maculatili and C. sandolbaea) are additionally required for their mitogenome sequences for further accurate estimation of sequence divergence. Given the limited availability in mitogenome sequences for conspecifics, further data are required, but available data indicate that COI and ND5, which were not highly variable relative to other genes at both the family and genus levels, were the most variable genes at the intra-species level. These results suggest that evolutionary pressure for nucleotide substitution among mitochondrial genes differs among taxonomic levels. Thus, for population-level studies, particularly those involving C. burckhardti, COI and ND5 could be suitable markers to uncover population structures if mtDNA is considered. On the other hand, ATP8 and ND6 can be considered for phylogenetic inference up to the family level if gene segments of mtDNA are considered.
In the case of Fulgoroidea, in which 2-8 mitogenomes are available per species, a higher genetic divergence was detected in ATP6, ND5, and COI in Laodelphax striatellus; ND5, lrRNA, and COIII in Lycorma delicatula; ND2, ND4L, and ATP6 in Nilaparvata lugens; and ND4, ND2, and srRNA in Sogatella furcifera [17]. This result differs from that of the C. burckhardti mitogenome, for which the three most variable genes were ND5, COI, and COIII, although these genes appeared at least once as highly variable genes in L. striatellus and L. delicatula. These results further reinforce the importance of understanding variability among mitochondrial genes of target species to draw meaningful results in the study of intraspecific variation. One of the utilities of mitogenome sequences is their role as molecular markers for intraspecific variation. These markers are involved in diagnostics (e.g., origins of inset-driven foods), history tracing (e.g., origins of invasive populations, genetically compatible populations for restoration), and population dynamics (e.g., genetic diversity and gene flow), in which one or more portions of the mitochondrial genic regions are used [17,18,[22][23][24]60,61]. In such studies, COI is often used to trace population history and dynamics as well as conduct species identification using DNA barcoding [21,62]. Nevertheless, this gene alone infrequently provided low variability and low numbers of haplotypes within species, particularly in introduced species [24,[63][64][65][66][67]. In such cases, understanding the variability among mitochondrial genes may provide the knowledge needed to select the proper genes for the study of population structure and diversity.

Conclusions
The addition of two mitogenome sequences to the Psyllidae family that contains notorious pear pests will enhance our understanding of the mitogenome characteristics of the family, particularly by comparative approach, which we employed firstly for the family in this study. The Psyllidae including the two mitogenomes sequenced in this study have several features shared with other members (e.g., codon usage, arrangement, presence of motif sequences at the trnS 2 and ND1 junction, and presence of a conserved element in the A+T-rich region). However, the family Psyllidae, in detail, has divergent characteristics compared to non-Psyllidae members: conserved CGGTA at the trnS 2 and ND1 junction, a conserved element in the A+T-rich region, and different groups of genes with higher or lower intra-specific variation. Furthermore, the C. jukyungi sequenced in this study is unique in that the A+T-rich region has two identical 540-bp repeat sequences surrounded by non-repeat sequences, whereas the majority of other species in Psyllidae do not have repeat sequences. Our data suggest that sequencing of C. jukyungi and C. burckhardti mitogenomes and comparing their genomic characteristics to the family Psyllidae offers a promising avenue for pursuing further mitogenome sequences of the family members and extended taxonomic groups for a better understanding of the evolution of mitogenome characteristics.

Informed Consent Statement: Not applicable.
Data Availability Statement: The genome sequence data supporting this study's findings are openly available in GenBank of NCBI at https://www.ncbi.nlm.nih.gov under the accession nos. ON553958 for C. jukyungi and ON411626 for C. burckhardti. The associated BioProject, Sequence Read Archive, and Bio-Sample numbers are PRJNA843483 for both species, SRX16308894 and SAMN29783646 for C. jukyungi, and SRX15496259 and SAMN28744766 for C. burckhardti, respectively.

Conflicts of Interest:
The authors declare no conflict of interest.