Construction of a Full-Length Transcriptome of Western Honeybee Midgut Tissue and Improved Genome Annotation

Honeybees are indispensable pollinators with pivotal ecological, economic, and scientific value. However, a full-length transcriptome for Apis mellifera assembled with third-generation nanopore sequencing technology has yet to be reported. Here, nanopore sequencing of the midgut tissues of uninoculated and Nosema ceranae-inoculated A. mellifera workers was conducted, and the full-length transcriptome was then constructed and annotated based on high-quality long reads. The sequences and annotations of the current A. mellifera reference genome were subsequently improved. A total of 5,942,745 and 6,664,923 raw reads were produced from the midguts of workers at 7 and 10 days post-inoculation (dpi) with N. ceranae, respectively, while 7,100,161 and 6,506,665 raw reads were generated from the midguts of the corresponding uninoculated workers. After strict quality control, 6,928,170, 6,353,066, 5,745,048, and 6,416,987 clean reads were obtained, with a length distribution ranging from 1 kb to 10 kb. Additionally, 16,824, 17,708, 15,744, and 18,246 full-length transcripts were detected in the four groups, respectively, which merged into 28,019 nonredundant ones. Together with the transcripts in the existing reference annotation, 43,666, 30,945, 41,771, 26,442, and 24,532 full-length transcripts could be annotated to the Nr, KOG, eggNOG, GO, and KEGG databases, respectively. Additionally, 501 novel genes (20,326 novel transcripts) were identified for the first time, among which 401 (20,255), 193 (13,365), 414 (19,186), 228 (12,093), and 202 (11,703) were respectively annotated to each of the aforementioned five databases. The expression and sequences of three randomly selected novel transcripts were confirmed by RT-PCR and Sanger sequencing. The 5′ UTR of 2082 genes, the 3′ UTR of 2029 genes, and both the 5′ and 3′ UTRs of 730 genes were extended. Moreover, 17,345 SSRs, 14,789 complete ORFs, 1224 long non-coding RNAs (lncRNAs), and 650 transcription factors (TFs) from 37 families were detected.
Findings from this work not only refine the annotation of the A. mellifera reference genome, but also provide a valuable resource and basis for relevant molecular and -omics studies.


Introduction
Honeybees, which are recognized as social insects, play a pivotal part in the pollination of up to 70% of crop species and wild plants worldwide [1,2]. Consequently, they are of significant importance to agricultural economics, food security, environmental ecology, and scientific research. Given its gentle nature, its strong foraging and productivity capacities, and the ease with which large colonies can be maintained, the western honeybee (Apis mellifera) enjoys global favor [3,4].
Third-generation sequencing technologies, commonly referred to as long-read sequencing technologies, enable the direct sequencing of large DNA fragments. This offers significant advantages in de novo genome assembly and metagenomics [5]. Nanopore sequencing technology, one of the leading third-generation sequencing technologies, is capable of generating reads up to 100,000 bases in length [6] and thereby has substantial advantages in the identification of full-length transcripts. A full-length transcriptome benefits molecular studies ranging from the identification of alternative splicing (AS) and alternative polyadenylation (APA) to the precise quantification of genes and transcripts, especially when no reference genome is available for the organism [7][8][9][10]. Nanopore sequencing has now provided full-length transcriptomes of animals, plants, and microorganisms such as Muscovy ducklings (Cairina moschata) [11], Asparagus [12], and Saccharomyces cerevisiae [13]. In insects, the full-length transcriptomes of species such as Cydia pomonella L. [14] and Bactrocera dorsalis [15] have been reported. Separately, the full-length transcriptome of the plant Fraxinus chinensis [16] was also studied.
Second-generation sequencing technology has been widely applied in dissecting many aspects of honeybees, such as genetics [17], ethology [18], and host-pathogen interaction [19]. For instance, following deep sequencing utilizing the Illumina platform, Manfredini et al. analyzed the change in gene-expression patterns in the brains of A. mellifera queens from virgin to mated reproductive status and discovered that the mating process significantly altered the expression of genes related to vision, chemoreception, metabolism, and immunity [18]. By comparison, third-generation-sequencing-based studies on honeybees are currently very limited. Recently, Zheng et al. [20] reported the first full-length transcriptome of A. mellifera based on PacBio single-molecule sequencing technology, with systematic identification of AS events and APA sites as well as detection of differentially expressed transcripts among queen, drone, and worker bees. However, studies on the nanopore-sequencing-based full-length transcriptome of A. mellifera have been lacking until now.
The long reads generated by nanopore sequencing have been utilized in the refinement of reference genomes across multiple species, providing enhancements even for reference genomes for which chromosomal resolution has already been achieved [21][22][23][24][25]. For instance, Chen et al. employed full-length transcriptome data acquired via nanopore sequencing to refine the reference genome of Nosema ceranae [26]. This process resulted in the structural optimization of 2340 genes within the N. ceranae genome, featuring extensions at the 5′ end in 1182 genes and at the 3′ end in 1158 genes. In 2006, the A. mellifera genome (Amel_4.0) was first sequenced, revealing key genomic features; however, gene prediction was limited, indicating the need for improvement [27]. A subsequent version (Amel_4.5) published by Elsik et al. [28] in 2014, although more comprehensive, remained fragmented, with significant gaps in areas like centromeres and telomeres. In 2019, Wallberg et al. [29] enhanced the assembly to Amel_HAv3.1 using advanced sequencing techniques, achieving higher contiguity and structural integrity close to the chromosomal level. Nanopore sequencing is believed to offer an opportunity for improving the reference genome of A. mellifera.
In the current work, midgut samples of uninoculated and N. ceranae-inoculated A. mellifera workers were prepared and sequenced using nanopore sequencing technology. The full-length transcripts were identified, followed by construction and annotation of the full-length transcriptome of A. mellifera. Additionally, detection, annotation, and verification of novel genes and transcripts were conducted, and the structures of genes annotated in the A. mellifera reference genome were then optimized. Moreover, prediction and investigation of simple sequence repeats (SSRs), transcription factors (TFs), open reading frames (ORFs), and long non-coding RNAs (lncRNAs) were performed. In a follow-up study, the differential expression profile of the full-length transcripts in uninoculated and N. ceranae-inoculated A. mellifera workers and their potential functions will be investigated to decipher the host response to N. ceranae infection. Our data could not only enrich and improve the annotations of the current reference genome of A. mellifera, but also provide a solid basis for facilitating future molecular and -omics studies on A. mellifera.

Bee and Fungi
Three A. mellifera colonies were reared in the teaching apiary of the College of Bee Science and Biomedicine, Fujian Agriculture and Forestry University, Fuzhou, China. N. ceranae was previously prepared and conserved at the Honeybee Protection Laboratory of the College of Bee Science and Biomedicine, Fujian Agriculture and Forestry University, Fuzhou, China.

Fungal Inoculation and Midgut Sample Preparation
At 24 h after emergence, A. mellifera workers (n = 35) in the treatment group were each immobilized and fed 5 µL of 50% (w/v) sucrose solution containing 1 × 10⁶ N. ceranae spores, while workers (n = 35) in the control group were each immobilized and fed 5 µL of 50% (w/v) sucrose solution without spores. There was one cage each for the treatment and control groups. Workers in the cages were reared in two separate incubators at 34 ± 0.5 °C and 60%-70% RH. After initial feeding, both treatment and control groups were provided with a feeder containing 4 mL of 50% (w/v) sucrose solution without spores, which was replaced daily throughout the whole experiment. Each cage was carefully checked every 24 h, and dead honeybees were removed each day. At 7 days post-inoculation (dpi) and 10 dpi, the midgut tissues of three workers from each of the treatment and control groups were dissected and transferred into clean Eppendorf (EP) tubes. The samples in the treatment and control groups collected at 7 dpi were named AmT1 and AmCK1, whereas the samples harvested at 10 dpi were named AmT2 and AmCK2, respectively. The midgut samples were quickly placed in liquid nitrogen and then kept in a −80 °C cryogenic refrigerator until the nanopore sequencing and molecular experiments were conducted.

Total RNA Extraction, cDNA Library Construction, and Nanopore Sequencing
The total RNA of the midgut samples in the above-mentioned four groups was extracted using the TRIzol Kit (Thermo Fisher Scientific, Bremen, Germany). Reverse transcription was then performed with a Maxima H Minus Reverse Transcriptase Kit (Thermo Fisher Scientific, Bremen, Germany). The cDNA library for ONT sequencing was constructed using the ONT 1D ligation sequencing kit SQK-LSK109 (Oxford Nanopore Technologies, Oxford, UK) according to the manufacturer's instructions. Full-length transcriptome sequencing of the constructed cDNA libraries was conducted on a PromethION sequencing platform (Oxford Nanopore Technologies, Oxford, UK). The duration of the sequencing reaction was 72 h. The nanopore-generated raw data were deposited in the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra/?term= (accessed on 19 April 2024)) and linked to the SRA number SUB14364771.

Data Quality Control and Full-Length Transcript Identification
Using the local base caller of the MinKNOW software (v.1.4.3), the sequencing data in the original FAST5 format were converted to raw reads in FASTQ format. Next, all raw reads were filtered to remove low-quality reads (Q score < 7) and short reads (<500 bp). Based on the principle of nanopore cDNA sequencing, a read with primer sequences identified at both ends was regarded as a full-length transcript sequence. The identified transcript sequences were aligned to the N. ceranae reference genome (assembly ASM98816v1); the aligned sequences were removed, and the remaining data were subjected to subsequent analyses.
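The read-filtering step above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names are hypothetical, Phred+33 quality encoding is assumed, and the per-read Q score is assumed to be derived from the mean per-base error probability, a common convention for nanopore reads. Only the two thresholds (Q score < 7 and length < 500 bp) come from the text.

```python
import math
from typing import Iterable, Iterator, Tuple

MIN_Q = 7.0    # reads with a mean Q score below 7 are discarded (threshold from the text)
MIN_LEN = 500  # reads shorter than 500 bp are discarded (threshold from the text)

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def mean_q(quality: str) -> float:
    """Mean read quality from a Phred+33 string, averaged in error-probability space."""
    probs = [10 ** (-(ord(c) - 33) / 10) for c in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def clean_reads(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only reads that pass both the length and the quality filter."""
    for name, seq, qual in records:
        if len(seq) >= MIN_LEN and mean_q(qual) >= MIN_Q:
            yield name, seq, qual
```

Averaging in probability space rather than averaging the Q values directly is a design choice of this sketch; it penalizes reads that contain stretches of very poor bases.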

Identification and Annotation of Novel Transcripts and Novel Genes
We aligned the full-length transcripts identified in this study to the existing transcripts in the reference genome of A. mellifera (assembly Amel_HAv3.1) to identify novel transcripts and novel genes. Subsequently, these novel transcripts and novel genes were aligned to the Nr, Swiss-Prot, Pfam, KOG, eggNOG, GO, and KEGG databases to obtain the corresponding annotations.

Molecular Validation of Novel Transcripts
Specific upstream primers (F) and downstream primers (R) for three randomly selected novel transcripts (ONT.5166.8, ONT.6348.2, and ONT.6348.3) were designed using Primer Premier v5.0 software. Total RNA was isolated from the midgut tissues of uninoculated and N. ceranae-inoculated 8-day-old workers using an RNA-extraction kit (Plomag, Beijing, China), after which reverse transcription was conducted with a NeuScript II 1st strand cDNA synthesis kit (Nuoweizan, Nanjing, China). The obtained cDNA served as the template for RT-PCR amplification. The reaction was performed using an RT-PCR kit (Yisheng, Shanghai, China), with all procedures strictly adhering to the manufacturer's instructions. The thermal-cycling conditions were as follows: an initial denaturation step at 94 °C for 5 min, followed by 30 cycles of denaturation at 94 °C for 30 s, annealing at 56 °C for 30 s, and extension at 72 °C for 10 min. The amplified products were detected by 1.8% agarose gel electrophoresis, and the target fragments were purified, ligated into the pMD-19T vector (TaKaRa, Beijing, China), transformed into Escherichia coli DH5α competent cells, and identified by PCR. Bacterial cultures with a positive signal were subjected to Sanger sequencing by Sangon Biotech (Shanghai, China).

Structural Optimization of Annotated Genes in the A. mellifera Reference Genome
Gffcompare v0.12.7 software [37] was utilized to compare the transcripts identified in this study with the known transcripts annotated in the A. mellifera reference genome (Amel_HAv3.1). Based on the comparison result, the boundaries of annotated genes were optimized by extending the upstream and/or downstream untranslated region (UTR).
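The boundary-extension idea can be illustrated with a small sketch. It is not Gffcompare's algorithm and not necessarily the authors' procedure: the `Interval` class and `utr_extension` function are hypothetical, and the sketch assumes that the long-read transcript has already been assigned to the annotated gene, lies on the same strand, and leaves the coding region unchanged, so that any extra span at either end is UTR.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Interval:
    start: int   # 1-based, inclusive (GFF convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def utr_extension(gene: Interval, transcript: Interval) -> Tuple[int, int]:
    """Return (five_prime_bp, three_prime_bp) by which the transcript extends the gene."""
    if gene.strand != transcript.strand:
        raise ValueError("gene and transcript must be on the same strand")
    low_side = max(0, gene.start - transcript.start)  # extension at the lower coordinate
    high_side = max(0, transcript.end - gene.end)     # extension at the higher coordinate
    if gene.strand == "+":
        return low_side, high_side
    # On the minus strand the 5' end sits at the higher coordinate.
    return high_side, low_side


# Example: a plus-strand gene annotated at 1000-5000 and a long read spanning 950-5100
print(utr_extension(Interval(1000, 5000, "+"), Interval(950, 5100, "+")))
```

The strand check matters because the same coordinate difference counts as a 5′ extension on one strand and a 3′ extension on the other.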

Prediction of SSR, ORF, TF Family, and LncRNA
The full-length transcripts longer than 500 bp were screened from the non-redundant full-length transcripts, and the SSR loci were then predicted using MISA v2.1 software (http://pgrc.ipk-gatersleben.de/misa/ (accessed on 3 February 2024)) with the default parameters [37]. TransDecoder v5.7.1 software (https://github.com/TransDecoder/TransDecoder/wiki (accessed on 3 February 2024)) was employed to detect potential CDS and ORFs from all full-length transcripts, and those ORFs with both a start codon and a stop codon were considered complete ORFs [38]. The sequences of predicted proteins from all full-length transcripts were aligned to the transcription factor (TF) database by hmmscan v2.41.2 (https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan (accessed on 3 February 2024)) to obtain the predicted TF families. From the identified full-length transcripts, a combination of CPC [39], CNCI [40], CPAT [41], and Pfam Scan [42] was employed to predict lncRNAs, and the intersection of the four predictions was regarded as a high-confidence set of lncRNAs.
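To make the notion of a perfect SSR concrete, the sketch below finds perfect repeats of 1 to 6 bp motifs with a regular expression. It is a simplified stand-in for MISA, not MISA itself: the function names are hypothetical, compound SSRs are not handled, and the minimum repeat counts (10, 6, 5, 5, 5, and 5 for motif lengths 1 to 6) are the commonly cited MISA defaults, assumed here because the paper states only that default parameters were used.

```python
import re

# Motif length -> minimum number of repeats (assumed thresholds, see the note above)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit: str) -> bool:
    """True if the motif is not itself a repetition of a shorter motif (e.g. 'AA', 'ATAT')."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(seq: str):
    """Return (type, motif, repeats, start, end) tuples sorted by start.

    Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies of it
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                repeats = len(m.group(0)) // unit_len
                hits.append((f"p{unit_len}", unit, repeats, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[3])


example = "GGC" + "A" * 12 + "CGC" + "AT" * 7 + "GCT" + "CAG" * 5 + "TT"
for hit in find_perfect_ssrs(example):
    print(hit)
```

The type labels p1 to p6 follow the naming used for perfect repeats in the SSR density figure; the primitive-motif check prevents a run such as A×12 from also being reported as a dinucleotide repeat of "AA".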

Processing and Quality Control of Nanopore Sequencing Data
Here, nanopore sequencing of the AmCK1, AmCK2, AmT1, and AmT2 groups produced 7,100,161, 6,506,665, 5,942,745, and 6,664,923 raw reads, respectively, with N50 values of 1347 bp, 1388 bp, 1328 bp, and 1394 bp and average lengths of 1178 bp, 1201 bp, 1148 bp, and 1196 bp (Table 1). The length distribution of raw reads ranged from 1 kb to more than 10 kb, with the largest group of raw reads distributed around 1 kb in length (Figure S1A-D). Additionally, the Q-value distribution of these raw reads was in the range Q6-Q16, with a large number of raw reads exhibiting a quality value of Q9 (Figure S1E-H). After quality control, 6,928,170, 6,353,066, 5,745,048, and 6,416,987 clean reads were obtained from the four groups (Table 2). The length distribution of the clean reads ranged from 1 kb to more than 10 kb, and the largest group consisted of reads about 1 kb in length (Figure S2). After removal of redundant full-length clean reads within each group (Table 3) and merging of the four groups, a total of 28,019 non-redundant full-length transcripts were obtained. In addition, the length of full-length transcripts reached up to ~8 kb, with the greatest number distributed around 2 kb in length (Figure S3).
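Because N50 and average length are reported side by side, a short sketch of how N50 is conventionally computed may help; the function name is illustrative and the toy lengths are not data from this study. N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases, so it weights long reads more heavily than the arithmetic mean does.

```python
def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]      # toy values, not study data
print(n50(toy_lengths))                     # N50 of the toy set
print(sum(toy_lengths) / len(toy_lengths))  # arithmetic mean for comparison
```

In the toy set the two longest reads already hold half of the 32 bases, so the N50 is larger than the mean, which mirrors the pattern seen in the values reported above.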

Annotation of the Full-Length Transcripts
Based on the union of the transcripts identified in our study and those in the existing reference genome, a total of 43,666 full-length transcripts were successfully annotated to the Nr database. Among the matched species, A. mellifera (30,678) had the greatest number of annotated full-length transcripts, followed by Apis dorsata (3711) and Apis florea (3059) (Tables 4 and S1, Figure 1A). There were 30,945 full-length transcripts annotated to 25 functional categories in the KOG database. The top three categories were general function prediction only (5642); signal transduction mechanisms (5236); and post-translational modification, protein turnover, and chaperones (2767) (Tables 4 and S1, Figure 1B). In addition, 41,771 full-length transcripts were annotated to 25 functional categories in the eggNOG database, including function unknown (20,417); post-translational modification, protein turnover, and chaperones (3300); and intracellular trafficking, secretion, and vesicular transport (2923), as shown in Tables 4 and S1 and Figure 1C. In the GO database, 26,442 full-length transcripts were annotated to 53 functional terms, of which 16 were associated with cellular components such as cell (8511) and membrane (9987), 15 were related to molecular functions such as catalytic activity (10,083) and transporter activity (2033), and 22 were relevant to biological processes such as cellular process (10,391) and single-organism process (7121) (Tables 4 and S1, Figure 2A). As presented in Tables 4 and S1 and Figure 2B, 24,532 full-length transcripts could be annotated to 231 KEGG pathways, including endocytosis (642), protein processing in endoplasmic reticulum (589), carbon metabolism (527), RNA transport (504), and oxidative phosphorylation (488).

Identification and Annotation of Novel Genes
In total, 501 novel genes were identified. In the Nr database, 255 novel genes could be annotated to A. mellifera, followed by A. dorsata (74) and A. florea (55) (Tables 5 and S1, Figure S4A). In the KOG database, 193 novel genes could be annotated to 25 functional categories, such as signal transduction mechanisms (32), general function prediction (31),

Identification, Annotation, and Validation of Novel Transcripts
In total, 20,326 novel transcripts were identified; of these, 20,255 (Nr), 13,365 (KOG), 19,186 (eggNOG), 12,093 (GO), and 11,703 (KEGG) were annotated (Figure S5, see also Table S1). RT-PCR results showed that fragments of the expected size were amplified from three randomly selected isoforms, including ONT.5166.8 (about 170 bp), ONT.6348.2 (about 290 bp), and ONT.6348.3 (about 150 bp) (Figure 3A). Additionally, the results of Sanger sequencing suggested that the sequences of these amplification fragments were consistent with those of the isoforms predicted from nanopore sequencing (Figure 3B-D). Together, these results verified the expression and sequences of these three isoforms, as well as the reliability of the nanopore sequencing data.

Structural Optimization of Annotated Genes in the A. mellifera Reference Genome
Based on the identified genes, the structures of 4111 annotated genes in the A. mellifera reference genome were optimized. Among these, the 5′ UTRs of 2082 genes, the 3′ UTRs of 2029 genes, and both the 5′ and 3′ UTRs of 730 genes were extended (Table 6).

Discussion
Here, based on long reads from nanopore sequencing of the midgut tissues of uninoculated and N. ceranae-inoculated workers, a total of 28,019 full-length transcripts were identified, with an N50 of 1876 bp and an average length of 1531 bp. Previously, the full-length transcriptomes of two widespread fungal pathogens, N. ceranae and Ascosphaera apis, were constructed by our group using nanopore sequencing [43,44]. Recently, by using long reads produced by nanopore sequencing of cDNA libraries of larval guts, our team constructed and annotated the full-length transcriptome of the Asian honeybee, Apis cerana, which includes 40,562 full-length transcripts [45]. In this work, the midgut tissues of both uninoculated and N. ceranae-inoculated workers were subjected to nanopore sequencing because the major objectives of this research were (1) to construct and annotate the first nanopore-based full-length transcriptome of A. mellifera and (2) to improve the annotation of the current reference genome based on nanopore long reads. A higher-quality full-length transcriptome with more complete annotations is expected when data from the midguts of both uninoculated and N. ceranae-inoculated workers are combined. Our next step is to dissect the mechanism underlying the response of A. mellifera workers to N. ceranae invasion at the isoform level on the basis of the high-quality long reads obtained in this study.
Notably, the number of full-length transcripts discovered in this work exceeds the number of annotated transcripts in the A. mellifera reference genome (assembly Amel_HAv3.1), which was constructed using a series of recent sequencing technologies including PacBio, 10× Chromium, BioNano, and Hi-C [29]. This indicates that there is still room for improving the transcript annotation of a chromosome-level genome using long reads produced by nanopore sequencing. Additionally, 43,712 (99.94%) full-length transcripts were annotated to at least one of the above-mentioned five databases. However, 25 (0.06%) full-length transcripts could not be annotated to any of these five databases, reflecting the need for continued cloning and functional study of A. mellifera genes and isoforms. The constructed A. mellifera full-length transcriptome is a valuable resource for relevant molecular studies, such as the detection of genetic variants and the cloning and functional investigation of various isoforms [46][47][48].
Long-read data produced by nanopore sequencing have also been applied to optimize the structures of annotated genes in the reference genomes of various animals, plants, and microorganisms [11,49,50]. In comparison with the genome of A. mellifera previously constructed using second-generation sequencing, the current reference genome of A. mellifera has a contig N50 of 5.381 Mbp and a scaffold N50 of 13.62 Mbp, representing a 120-fold improvement in contig-level contiguity and a 14-fold increase in scaffold-level contiguity [29]. On the basis of the full-length transcriptome data, we have optimized the annotated genes in the A. mellifera reference genome: the 5′ UTRs of 2082 existing genes have been extended, with extensions ranging from 1 bp to 162,043 bp, while the 3′ UTRs of 2029 existing genes have also been extended, with extensions spanning from 1 bp to 150,208 bp. In view of the close relationship between UTRs and the regulation of gene expression in eukaryotes [51,52], the structural improvement of A. mellifera genes is of great importance for the cloning of full-length gene sequences and for studies of the regulation of gene expression and transcription.
In recent years, nanopore sequencing has been employed to assemble high-quality genomes of diverse species such as Arabidopsis [53], Chrysomallon squamiferum [54], and Mycoplasma bovis [55]. However, the current cost of nanopore-based genome sequencing is still high. In contrast, third-generation transcriptome sequencing is much more cost-effective. Accumulating evidence has shown that nanopore sequencing is highly efficient in exploring novel genes and transcripts [56,57]. Bayega et al. employed nanopore sequencing to elucidate transcription dynamics during early embryonic development in Bactrocera oleae and identified 1768 novel genes and 79,810 isoforms, significantly enhancing the transcriptome diversity [57]. Here, we discovered 501 novel genes and 20,326 novel transcripts, among which 489 (20,255), 193 (13,365), 414 (19,186), 228 (12,093), and 202 (11,703) novel genes (transcripts) could be annotated to the Nr, KOG, eggNOG, GO, and KEGG databases, respectively. These newly discovered genes and transcripts can further enrich the annotations in the A. mellifera reference genome. Additional work is needed to dissect the functions of these new genes and transcripts.
SSRs, which have several advantages such as simple experimental manipulation, good reproducibility, and high multi-allelicity, exhibit high levels of intraspecific and interspecific variation, making them useful for analysis of genetic diversity and genetic structure [16,58]. We previously identified 6312 A. mellifera SSRs by utilizing RNA-seq datasets from the gut tissues of worker larvae, among which the most abundant types were dinucleotide repeats (3435, 54.42%) and trinucleotide repeats (2051, 32.49%) [59]. Here, using nanopore sequencing data from the midgut tissues, 17,345 SSRs were identified, with the greatest number being mononucleotide repeats (11,616, 67.0%). This suggests that greater quantities and more types of SSRs were detected using long reads generated from nanopore sequencing, a result similar to the findings in other animals [11,60,61]. These increased SSR resources can establish a solid foundation for future studies on the conservation and genetic breeding of A. mellifera [62]. Also, these SSRs will facilitate the interpretation of the genetic relationships among A. mellifera and their closely related species from the perspective of functional molecular markers [63][64][65].
TFs modulate the expression of target genes by binding to cis-acting elements within the promoter regions of these genes [59]. Previous studies have shown that TFs play important roles in insect physiological processes [63,64]. In Drosophila, the two TFs belonging to the ZBTB family, Chinmo and Broad, played antagonistic roles in the process of adult disc regeneration, affecting the self-renewal and regenerative potential of epithelial progenitor cells [66]. Here, 650 members of 37 TF families were identified, including 101 members of the ZBTB family, 84 members of the zf-C2H2 family, and 33 members of the Homeobox family, providing a valuable resource for continued investigation of their functions in physiological and pathological processes in A. mellifera.
LncRNAs are crucial regulators in diverse biological processes, ranging from gene expression [67] and chromatin regulation [68] to cellular development [69] and the stress response [70]. Studies based on second-generation sequencing have demonstrated that lncRNAs in A. mellifera were potentially engaged in transcriptional regulation, ovarian development, midgut growth, and the immune response [71]. Here, using nanopore sequencing data, 94 sense lncRNAs, 315 antisense lncRNAs, 387 intronic lncRNAs, and 428 intergenic lncRNAs were discovered, with most of these lncRNAs ranging from 250 nt to 5988 nt in length. Although fewer lncRNAs were discovered in this work (1224) than in our previous RNA-seq-based study (6353) [72], the average length was much longer. This offers an opportunity for cloning the full lengths of these lncRNAs and investigating their regulatory functions and mechanisms of action.

Figure 2. GO (A) and KEGG (B) database annotation of A. mellifera full-length transcripts.

Figure 4. Density statistics of various types of SSRs.c: mixed SSRs containing at least two perfect SSRs at a distance less than 100 bp; c*: mixed SSRs with overlapping positions; p1: perfect single-base repeat, p2: perfect double-base repeat, p3: perfect three-base repeat, p4: perfect four-base repeat, p5: perfect five-base repeat, p6: perfect six-base repeat.

Figure 5. Length distribution of amino acids encoded by complete A. mellifera ORFs.

Figure 6. Counts of A. mellifera TF families and members. The number above each column indicates the quantity of involved members.

Figure 7. Number (A) and type (B) of A. mellifera lncRNAs. (A) Venn diagram of lncRNAs predicted by four software programs; (B) counts of various types of lncRNAs.

Table 2. Overview of full-length clean reads.

cDNA Library Number of Clean Reads Number of Full-Length Clean Reads Percentage of Full-Length Clean Reads
After redundant full-length clean reads had been removed, 16,824, 17,708, 15,744, and 18,246 non-redundant full-length transcripts were detected in the four groups mentioned above, with N50 values of 1889 bp, 1830 bp, 1797 bp, and 1858 bp and average lengths of 1503 bp, 1478 bp, 1516 bp, and 1546 bp, respectively (Table 3).

Table 3. Overview of Apis mellifera full-length transcripts.

Table 4. Overview of the annotation of full-length transcripts in A. mellifera. The figures enclosed in parentheses denote the number of annotated transcripts. Only the principal annotation details are shown.

Table 5. Overview of the annotation of novel genes in A. mellifera. The figures enclosed in parentheses denote the number of annotated novel genes. Only the principal annotation details are shown.

Table 6. Detailed information about structural optimization of annotated genes in the A. mellifera reference genome (10 presented only).

Table 7. Search results of A. mellifera SSRs based on MISA.
