An Improved Chromosome-Level Genome Assembly of the Firefly Pyrocoelia pectoralis

Simple Summary The nocturnal firefly Pyrocoelia pectoralis is an endemic and endangered species in China. To more fully understand the role of sexual dimorphism and the evolution of courtship behavior in this species and improve its conservation, an improved chromosome-level genome assembly of P. pectoralis was conducted, and a high-quality draft of the genome was generated. Our research provides an insight into the evolution of courtship behavior in fireflies. Abstract The endemic and endangered Chinese firefly Pyrocoelia pectoralis is a sexually dimorphic, nocturnal species. A previous attempt by this team to assemble a draft genome of P. pectoralis using PacBio and Illumina HiSeq X Ten platforms was limited in its usefulness by high redundancy and contamination. This prompted us to conduct an improved chromosome-level genome assembly of P. pectoralis. Ten chromosomes were further assembled based on Hi-C data to a 532.25 Mb final size with a 52.87 Mb scaffold N50. The total repeat lengths in the genome of P. pectoralis amount to 227.69 Mb; 42.78%. In total, 12,789 genes could be functionally annotated using at least one public database. Phylogenetic inference indicated that P. pectoralis and P. pyralis diverged ~51.41 million years ago. Gene family expansion and contraction analysis of 12 species were performed, and 546 expanded and 2660 contracted gene families were identified in P. pectoralis. We generated a high-quality draft of the P. pectoralis genome. This genome assembly should help promote research on the species’ sexual dimorphism and its unique courtship behavior, which involves a combination of pheromonal and bioluminescent signals. It also can serve as a resource for accelerating genome-assisted improvements in the conservation of this species.


Introduction
Fireflies (Coleoptera: Lampyridae) are the most common representatives of terrestrial bioluminescent animals [1][2][3].The firefly Pyrocoelia pectoralis (Olivier) (Lampyridae: Lampyrinae) is endemic to China and has terrestrial larvae, which play an important ecological role as biological agents to control the land snail Bradybaena ravida [4].The snail is widely distributed throughout China, Japan, Korea, and Russia, and the species is known to cause considerable damage to various vegetables, peaches, grapes, and corn [5].During courtship, the stationary females of P. pectoralis glow continually and are presumed to also emit a sex pheromone to attract flying, bioluminescent males [6].The adults of this species are sexually dimorphic (Figure 1) with wingless females that are considerably larger than the males (♂15 mm versus ♀25 mm total body lengths).In many ways, their pre-copulation Insects 2024, 15, 43 2 of 12 behavior resembles the firefly Rhagophthalmus ohbai Wittmer, which was analyzed and described by Lau et al. [7], who showed that the males in search of their wingless females use their large eyes with elevated spectral sensitivity to light of longer wavelengths in the ventral eye half to detect the yellowish light emitted by the females on the ground.
Insects 2024, 15, 43 2 of 12 considerably larger than the males (♂ 15 mm versus ♀ 25 mm total body lengths).In many ways, their pre-copulation behavior resembles the firefly Rhagophthalmus ohbai Wittmer, which was analyzed and described by Lau et al. [7], who showed that the males in search of their wingless females use their large eyes with elevated spectral sensitivity to light of longer wavelengths in the ventral eye half to detect the yellowish light emitted by the females on the ground.Fireflies generally and the species P. pectoralis in particular are highly appreciated for their luminescence and in some places, the bioluminescence of these insects attracts tourists trying to see and photograph the spectacle.To lose the species would, therefore, be a double tragedy.The second reason why this species ought to be saved from becoming extinct is a non-applied purely zoological one related to basic science, as the phylogenetic relationships between lampyrid species and their evolutionary histories have still not been fully worked out.For the two reasons explained above, a study focusing on this firefly's genome is, therefore, timely and important.
As the cost of sequencing continuously decreases, genomic-scale data generation through high-throughput sequencing technologies can now routinely be used in studies on reproduction, phylogenetic relationships, and species delimitation [8].Highthroughput sequencing technologies in genomics can enhance insect breeding and support diversity studies and conservation goals by generating suitable genetic markers [9][10][11].To explore the evolution of courtship signals in fireflies, genome evolutionary analysis, such as the evolution of genome size and the expansion and contraction of gene content in P. pectoralis, becomes necessary.Using PacBio and Illumina HiSeq X Ten platforms, the first firefly draft genome (original genome Ppec-1.0) was assembled (genome size 760.4Mb, scaffold N50 3.04 Mb) (Table S1) [12].We detected non-firefly genes such as mitochondrial, bacterial, viral, fungal, and other no-hit genes (Table S2) when we performed annotation of the original genome Ppec-1.0.Additionally, BUSCO assessment showed the completeness of complete and single-copy in the original genome, Ppec-1.0 is low at 60.8%, while complete and duplicated BUSCOs (D) are high at 38.3% (Table S3).Thus far, contamination and high redundancy have limited its usefulness.
In this study, we used Hi-C technology to improve the P. pectoralis genome assembly (Hi-C genome Ppec-2.0) in the chromosomal level.We also assembled and annotated its Fireflies generally and the species P. pectoralis in particular are highly appreciated for their luminescence and in some places, the bioluminescence of these insects attracts tourists trying to see and photograph the spectacle.To lose the species would, therefore, be a double tragedy.The second reason why this species ought to be saved from becoming extinct is a non-applied purely zoological one related to basic science, as the phylogenetic relationships between lampyrid species and their evolutionary histories have still not been fully worked out.For the two reasons explained above, a study focusing on this firefly's genome is, therefore, timely and important.
As the cost of sequencing continuously decreases, genomic-scale data generation through high-throughput sequencing technologies can now routinely be used in studies on reproduction, phylogenetic relationships, and species delimitation [8].High-throughput sequencing technologies in genomics can enhance insect breeding and support diversity studies and conservation goals by generating suitable genetic markers [9][10][11].To explore the evolution of courtship signals in fireflies, genome evolutionary analysis, such as the evolution of genome size and the expansion and contraction of gene content in P. pectoralis, becomes necessary.Using PacBio and Illumina HiSeq X Ten platforms, the first firefly draft genome (original genome Ppec-1.0) was assembled (genome size 760.4Mb, scaffold N50 3.04 Mb) (Table S1) [12].We detected non-firefly genes such as mitochondrial, bacterial, viral, fungal, and other no-hit genes (Table S2) when we performed annotation of the original genome Ppec-1.0.Additionally, BUSCO assessment showed the completeness of complete and single-copy in the original genome, Ppec-1.0 is low at 60.8%, while complete and duplicated BUSCOs (D) are high at 38.3% (Table S3).Thus far, contamination and high redundancy have limited its usefulness.
In this study, we used Hi-C technology to improve the P. pectoralis genome assembly (Hi-C genome Ppec-2.0) in the chromosomal level.We also assembled and annotated its mitochondrial genome.Gene family evolution and phylogenetic reconstruction analyses were performed.Collectively, our findings represent a valuable resource for studies on the Insects 2024, 15, 43 3 of 12 species' sexual dimorphism and its unique courtship behavior that uses a combination of pheromonal and bioluminescent signals.They can also serve as a resource for accelerating genome-assisted improvements in conservation.

Sample Collection and Feeding Scheme
Male and female individuals of P. pectoralis were lab-reared for two generations at the Huazhong Agricultural University in Wuhan (China).The original population was collected in Ezhou City, Hubei Province, in October 2018.Larvae were bred in transparent plastic boxes (20 cm diameter × 6 cm high) and provided with crushed land snails (B.ravida) as prey [7].

Karyotype Analysis
Mitotic and meiotic chromosomes were obtained from the gonads of ten fifth instar larvae following [13].The gonads were removed into a 10 mg/mL colchicine solution (in insect saline solution) for 120 min and then subjected to hypotonic treatment for 30 min.All the gonads were fixed in Carnoy I (three parts methanol and one part acetic acid), for 60 min.For preparation of the slides, the gonads were macerated in 45% acetic acid until a cell suspension was acquired, which was then spread over a slide and dried on a metal plate at 40 • C. To determine the number, size, and morphology of chromosomes, the slides were stained with DAPI for 10 min.Images were obtained with a Leica TCS SP8 confocal microscopy station (Leica Camera, Wetzlar, Germany).Adobe Photoshop (version 2021, Adobe, San Jose, CA, USA) was used to arrange karyotypes, and karyotypes were organized in decreasing order of size.For the karyotype analysis, the chromosomes of P. pectoralis were stained with DAPI (blue).

Hi-C Library Preparation and Chromosome Assembly by Hi-C Data
We previously applied the Illumina HiSeq X Ten and PacBio platforms to sequence the genome of P. pectoralis [12].However, high redundancy and contamination were detected and limited its usage.To reduce sequence redundancy and contamination, we performed a chromosome-level assembly.We constructed the Hi-C library and obtained sequencing data via the Illumina Novaseq platform.A whole fresh adult female P. pectoralis body was vacuum-infiltrated in a nuclei-isolation buffer supplemented with 2% formaldehyde.The fixed tissue was then ground to powder before re-suspending in a nuclei-isolation buffer to obtain a suspension of nuclei.The purified nuclei were digested with 100 units of DpnII and marked by incubating with biotin-14-dCTP.The ligated DNA was sheared into 300-600 bp fragments and was then blunt-end repaired and A-tailed, followed by purification through biotin-streptavidin-mediated pulldown.Finally, the Hi-C libraries were quantified and sequenced using the Illumina Novaseq platform.
In total, 594 million paired-end reads were generated from the libraries.Quality controlling of Hi-C raw data was performed using Hi-C-Pro [14].Firstly, low-quality sequences (quality scores < 20), adaptor sequences, and sequences shorter than 30 bp were filtered out using fastp, and then the clean paired-end reads were mapped to the draft assembled sequence using bowtie2 (v2.3.2) (-end-to-end --very-sensitive -L 30) to obtain the unique mapped paired-end reads.Valid interaction paired reads were identified and retained by HiC-Pro (v2.8.1) [14] from unique mapped paired-end reads for further analysis.Invalid read pairs, including dangling-end, self-cycle, re-ligation, and dumped products were filtered by HiC-Pro (v2.8.1).The scaffolds were further clustered, ordered, and oriented onto 10 pseudo chromosomes by LACHESIS (https://github.com/shendurelab/LACHESIS(accessed on 18 January 2022)), with parameters CLUSTER_MIN_RE_SITES = 100, CLUS-TER_MAX_LINK_DENSITY = 2.5, CLUSTER NONINFORMATIVE RATIO = 1.4,ORDER MIN N RES IN TRUNK = 60, and ORDER MIN N RES IN SHREDS = 60.Finally, placement and orientation errors exhibiting obvious discrete chromatin interaction patterns were manually adjusted.These 10 pseudochromosomes correspond to the 10 chromosomes for P. pectoralis.To check the completeness and quality of the assembly, BUSCO version 5.1.3[15,16] was used to search the 1367 benchmarking universal single-copy orthologous genes in insecta_odb10.

Genome Annotation
The simple repeat sequences (SSRs) of the Hi-C genome Ppec-2.0 were analyzed using GMATA v2.2 [22] software, while a Tandem Repeats Finder (TRF v4.07b) [20] recognized all tandem repeat elements in the Hi-C genome Ppec-2.0.To obtain a better estimation of tandem repeats, the putative satDNA elements of the IlluminaHiSeq X paired-end unassembled short read sequences were identified using the RepeatExplorer and TAREAN pipelines in the Galaxy platform [23].
Transposable elements (TEs) in the P. pectoralis genome were then identified using a combination of ab initio and homology-based methods.Briefly, an ab initio repeat library for P. pectoralis was first predicted using MITE-hunter [24] and RepeatModeler (v1.0.11; http://www.repeatmasker.org/RepeatModeler/,accessed on 20 April 2022) with default parameters, in which LTR_FINDER [25], LTR_harverst [26], and LTR_ retriever [27] were used to obtain as much reliable information as possible on long terminal repeat (LTR) retrotransposons.The obtained library was then aligned to the TE class Repbase (http: //www.girinst.org/repbase,accessed on 20 April 2022) to classify the type of each repeat family.For further identification of the repeats throughout the genome, RepeatMasker version 4.0.7.(http://www.repeatmasker.org,accessed on 20 April 2022) was applied to search for known and novel TEs by mapping sequences against the de novo repeat library and Repbase TE library.Overlapping transposable elements belonging to the same repeat class were collated and combined.

Phylogenetic Analysis
On the basis of the identified orthologous gene sets with OrthMCL v2.0.9 [34], a molecular phylogenetic analysis was performed using the shared single-copy genes.Briefly, the coding sequences were extracted from the single-copy families, and each ortholog group was multiple-aligned using Mafft v7.313 [35].Poorly aligned sequences were then eliminated using Gblocks v0.91b [36], and the GTRGAMMA substitution model of RAxML v8.2.10 [37] was used for the phylogenetic tree construction with 1000 bootstrap replicates.The generated tree file was displayed with MEGA CC v10.1.8[38].Based on the phylogenetic tree, the RelTime of MEGA-CC was utilized to compute the mean substitution rates along each branch and estimate the species' divergent time.Three fossil calibration times were obtained from the Time Tree database (http://www.timetree.org/,accessed on 20 April 2022) as the time control, including the divergence times of Drosophila melanogaster versus Tribolium castaneum, in which the estimated time is 308 MYA (234-370 MYA), and Ignelater luminosus versus Photinus pyralis, in which the estimated time is 133 MYA (103-127 MYA).

Species-Specific Genes and Gene Family Expansion and Contraction
Proteins with no homologs in the other 11 insect genomes were extracted as speciesspecific genes, including P. pectoralis-specific unique genes and unclustered genes.Functional annotation of species-specific genes and enrichment tests were performed using information from homologs in the Gene Ontology (http://www.geneontology.org/,accessed on 18 January 2022) and KEGG (Kyoto Encyclopedia of Genes and Genomes) databases.
TE expansion and contraction of gene families were determined using CAFE v5.0.02937 [39].The results from the phylogenetic tree with divergence times were used as inputs.A p-value of 0.05 was used to identify families that were significantly expanded and contracted.Gene ontology (GO) enrichment of expanded orthogroups and species-specific orthogroups of P. pectoralis were analyzed and visualized by REVIGO v1.8.1 [40].

Chromosome-Level Genome Assembly
We obtained high-quality metaphase chromosome spreads with large metaphase areas, large numbers of chromosomes, and few chromosome overlaps (Figure S1a).Ten pairs of chromosomes were observed (2n = 20) (Figure S1b) and showed that this individual was most probably a female.We quantified our observations and found that 52.3% of 21 mitotic plate cells had 19 or 20 isolated chromosomes.However, there was no evidence of a Y chromosome.We used Hi-C technology to improve the genome assembly to the chromosomal level.A total of 83.42 Gb of high-quality sequencing data was generated from a 350 bp insert size Hi-C library, and the quality assessment of the Hi-C data is shown in Table S4.In total, removing redundant genes and contamination led to a final genome assembly of 532.25 Mb, containing 363 scaffolds with N50 of 52.87 Mb (Table S1).The BUSCO assessment indicated that the completeness of the Hi-C genome Ppec-2.0 is 94.7%, lower than the original genome Ppec-1.0 with 99.1% (Table S3).We super-scaffolded the P. pectoralis genome assembly into 10 pseudo-chromosomal linkage groups (Figure S2), with a size of 521.95 Mb and 337 scaffolds (98.07%) anchored on all chromosomes (Figure 2a, Table S5).
The maximum proportion of the mitogenome is taken up by PCGs, occupying 66.08% of the whole mitogenome.The rRNA and tRNA accounted for 11.95% and 8.42%, respectively, and the non-coding regions accounted for 13.54%.There are two A+T-rich regions in the mitogenome, one having six tandem repeats with a period size of 133 bp, while the other is the control region containing the origin of replication.
From the codon usage analysis, seven PCGs, including atp6, cox2, cox3, cytb, nad3, nad4, and nad4l, used the same ATG/ATA as the starting codon.In the other six PCGs, starting codons are different, i.e., nad1 used TTG, nad6 used ATC, and ATT was used in four other PCGs.Only five PCGs used the complete termination codon, such as TAA or TAG; the other PCGs used the incomplete termination codon, such as T.

Genome Annotation
Repeated sequences were mined and annotated.In P. pectoralis, the total length of the repeats is 227.69Mb (42.78% of the genome) (Table S6).With a value of 42.78%, the total repeats ratio of P. pectoralis occupies the middle ground but is lower than the ratios of Lamprigera yunnana (65.37%) and Photinus pyralis (46.33%) and higher than A. terminalis (33.76%) and Aquatica lateralis (27.46%) [41,42] (Table S7).To be specific, the number of TEs (transposable elements) was identified to be 839,559, with the sequence percentage being 38.8%.Among these TEs, the dominant repetitive sequence type involves the DNA transposon, accounting for 18.21% (~96.92Mb).The detailed statistics of TEs are listed in Table S6.Four major types of TEs are identified and compared with other firefly species (Table S7).Sequence divergence rate analyses showed that TE sequences of P. pectoralis form a peak with a low divergence rate of ~2.3% (Figure S4), indicating a recent expansion of TEs in P. pectoralis.
The content of tandem repeat elements was analyzed and compared between Illumina HiSeq X paired-end unassembled short read sequences and Hi-C genome Ppec-2.0. Figure S5a shows the size distribution of superclusters and information about the number of reads that were actually analyzed in unassembled Illumina sequences.From the 1,360,133 analyzed reads, 23,488 reads in clusters were annotated as organellar (mitochondrial or plastid) sequences from sequencing control DNA, leaving 1,336,645 reads representing nuclear DNA (Figure S5b).Considering the read length of 120 bp, the analyzed reads represented 0.307× coverage of the nuclear genome, providing sufficient sensitivity to analyze highly and moderately repeated sequences.The analysis revealed that satellite DNA accounts for up to 1.56% of the genome (20,857 reads) (Figure S5b).Meanwhile, a total of 16,467 (0.04% of the genome) simple repeat sequences (SSRs) was identified in the Hi-C genome Ppec-2.0, as well as a total of 20,713 (0.88% of the genome) tandem repeat sequences (Table S6).
The total of the protein-coding genes is much lower than the original genome Ppec-1.0 with 23,092 protein-coding genes [12].We defined the models of protein-coding genes in the Hi-C genome Ppec-2.0 using the de novo prediction, transcriptome, and homology-based methods, producing a total of 13,292 protein-coding genes with an average gene length of 14,806 bp and CDS 1481 bp (Table S8).The average exon number per gene was 5.18.The average intron length was 3188 bp (Table S8).The total number of coding genes was smaller than in L. yunnana, A. terminalis, and P. pyralis, where the respective numbers were 19,443, 20,436, and 20,646; however, the number is close to that of other insects.This may be related to the degree of redundancy during genome assembly.The values of other parameters were close to those of the published genomes (P.pyralis, L. yunnana, and A. terminalis), indicating the reliability of the annotation results.

Gene Family Identification and Phylogenetic Relationships
To infer the evolutionary status and trace the phylogenetic placement of P. pectoralis, gene family clustering was performed using OrthoMCL.The gene family clusters were divided into five categories: (1) multiple-copy orthologs have multiple copies in one species; (2) single-copy orthologs have only one copy in one species; (3) the other orthologs are the rest of the orthologs; (4) unclustered genes have no homology with others; and (5) unique paralogs are genes that only exist in one specific species (Figure S7a).A total of 9543 gene families (12,325 total genes) was identified.P. pectoralis contained 3016 multiple-copy ortholog genes, 268 unique genes, and 967 unclustered genes.There are 1235 speciesspecific ortholog genes in P. pectoralis.In addition, 1709 single-copy ortholog genes were defined in all species, which were aligned to develop a super-sequence for each species that was used to construct a phylogenetic tree (Figure S7b, Table S11).Phylogenetic results showed that P. pectoralis is closely related to P. pyralis of the subfamily Lampyrinae.Phylogenetic inference also indicated that L. yunnana represents a sister taxon to Luciolinae and that P. pectoralis and P. pyralis diverged ~51.41 million years ago (Figure 2b).

Patterns of Gene Family Expansion and Contraction
Our analysis of gene family expansion and contraction of the 12 species identified 546 expanded and 2660 contracted gene families (p-value < 0.05) in P. pectoralis (Figure 2b).GO and KEGG enrichment analyses revealed that in the P. pectoralis-expanded gene, families were enriched by the nucleotide-binding pathway (including DNA replication, DNA recombination, nucleosome assembly, DNA-templated transcription, initiation, and DNA integration in the nucleosome), the oxidoreductase pathway (including oxidoreductase activity, oxidoreductase activity, acting on the CH-CH group of donors, 2-alkenal reductase [NAD(P)+] activity, and 15-oxoprostaglandin 13-oxidase activity), and some important receptors related to signal transduction pathways (including ionotropic glutamate receptor activity, olfactory receptor activity and sensory perception of smell) (Table S12).Conversely, P. pectoralis showed significantly contracted gene families in the ABC transporters process, bile secretion process, cAMP signaling pathway, and fatty acid biosynthesis process (Table S13).In addition, odorant binding genes (GO: 0005549) were contracted.

Figure 1 .
Figure 1.The courtship behavior of the firefly P. pectoralis.(a) A female is courting with her abdomen curling up and emitting a bright green light.(b) Mating, male on top of the female.

Figure 1 .
Figure 1.The courtship behavior of the firefly P. pectoralis.(a) A female is courting with her abdomen curling up and emitting a bright green light.(b) Mating, male on top of the female.

Figure 2 .
Figure 2. Chromosomal characteristics and phylogeny of P. pectoralis.(a) Chromosomal features.From outer to inner circles: I chromosomal, II gene density, III GC content, IV TE density, drawn in 0.5 Mb non-overlapping windows; (b) a maximum-likelihood phylogenetic tree is shown for P. pectoralis and 11 other insects.Drosophila melanogaster was used as the outgroup.The bootstrap value of all nodes is supported at 100/100.Support at nodes are divergence times (million years).Pie charts

Figure 2 .
Figure 2. Chromosomal characteristics and phylogeny of P. pectoralis.(a) Chromosomal features.From outer to inner circles: I chromosomal, II gene density, III GC content, IV TE density, drawn in 0.5 Mb non-overlapping windows; (b) a maximum-likelihood phylogenetic tree is shown for P. pectoralis and 11 other insects.Drosophila melanogaster was used as the outgroup.The bootstrap value of all nodes is supported at 100/100.Support at nodes are divergence times (million years).Pie charts and numbers below represent the proportion and specific values of the gene families of expansion (green) and contraction (red), respectively.