1. Introduction
Olfaction is the primary sensory system by which fish detect chemical signals in their aquatic environment and plays a crucial role in behaviors such as foraging, reproduction, migration, population recognition, and predator avoidance [
1]. Compared to other senses such as vision and hearing, olfaction offers several advantages, including longer perception duration, greater detection range, and high sensitivity to various water-soluble compounds, especially in turbid or low-visibility environments [
2,
3,
4]. Fish olfactory recognition primarily relies on the olfactory rosette, where olfactory sensory neurons express multiple receptor gene families that play a crucial role in detecting chemical cues, including main olfactory receptors (MORs), class A G protein–coupled receptors (ORAs), class C receptors (OlfCs), trace amine–associated receptors (TAARs), and formyl peptide receptors (FPRs). Collectively, these gene families form a highly diverse and specialized olfactory system [
5,
6,
7,
8]. Numerous studies have shown that the morphological structure of the olfactory rosette and the types and expression patterns of olfactory receptor genes vary significantly among fish species and are closely associated with their ecological niches, feeding habits, and environmental adaptations [
9,
10,
11,
12,
13,
14].
Fish feeding habits can be categorized as herbivory, carnivory, omnivory, and filter feeding based on dietary preferences and food selection under natural or aquaculture conditions. From a broad perspective, factors such as species genetic background, developmental stage, environmental conditions, food availability, and artificial feeding interventions can all influence the feeding habits of fish [
15,
16,
17]. At a more specific level, feeding behavior is influenced not only by physiological factors such as digestive tract structure and digestive enzyme activity, but also by sensory systems, particularly the olfactory system, which is central to food selection and feeding behavior [
18,
19,
20]. Research has shown that olfactory receptor genes in fish with different dietary habits exhibit significant differences in gene family composition and expression patterns [
12,
21,
22]. For instance, carnivorous species tend to express receptors that detect amino acids and peptides, whereas herbivorous species preferentially express receptors associated with plant-derived compounds [
23,
24]. Although the essential role of olfaction in feeding regulation has been preliminarily demonstrated, systematic investigations into the expression profiles, functional differentiation, and evolutionary dynamics of olfactory genes in fish with contrasting diets remain limited. In particular, it remains unclear whether herbivorous and carnivorous species exhibit consistent expression trends or signals of positive selection in their olfactory systems.
The family Xenocyprididae, which belongs to the class Actinopteri and order Cypriniformes, represents a taxonomically diverse and ecologically adaptable group of freshwater fish in China. These species are predominantly distributed in the freshwater basins of the East Asian plains, with virtually no natural populations found upstream of the Tiger Leaping Gorge in the Jinsha River or the Hukou Waterfall in the Yellow River [
25,
26]. Some species have also been introduced to western China, Europe, and North America. At present, the family comprises approximately 45 genera and 160 recognized species, exhibiting highly diversified feeding strategies that range from filter-feeding and omnivory to specialized herbivory and carnivory [
27,
28,
29]. Due to their relatively close phylogenetic relationships yet markedly distinct dietary preferences, Xenocyprididae serve as an ideal model for investigating the adaptive mechanisms and regulatory processes underlying feeding behavior in fish. Previous studies have mainly addressed their phylogeny, morphological divergence, and the evolution of feeding-related structures, revealing preliminary associations between dietary traits and morphological features such as mouth morphology and gill raker configuration [
30,
31]. However, within this family, the transcriptional regulation and molecular evolutionary patterns of the olfactory system in the context of divergent feeding habits remain insufficiently characterized, leaving it unclear whether dietary divergence is reflected at the molecular level of olfactory regulation and adaptation. Clarifying these mechanisms is not only important for understanding sensory adaptation and evolutionary processes in fishes, but may also provide useful insights for aquaculture, such as feed optimization and husbandry practices, and for conservation efforts by emphasizing the chemosensory requirements that influence habitat use.
Building on this context, the present study focuses on four representative Xenocyprididae species with distinct feeding habits: two herbivorous species (Ctenopharyngodon idella and Megalobrama amblycephala) and two carnivorous species (Elopichthys bambusa and Culter alburnus). C. idella is widely distributed across major river systems in China and primarily feeds on aquatic macrophytes, while M. amblycephala mainly consumes algae and plant detritus. In contrast, E. bambusa and C. alburnus are typical carnivorous species that prey predominantly on small fish and crustaceans. This study focuses on olfactory organ tissues, aiming to explore the adaptive characteristics of the olfactory system in fish with different feeding habits from two dimensions: transcriptional regulation and molecular evolution. To this end, we generated RNA-seq datasets derived from olfactory rosettes and performed de novo transcriptome assemblies, followed by systematic cross-species comparisons. Specifically, the study (1) identified olfactory-related candidate genes based on transcriptome annotation and analyzed their functional composition and pathway enrichment profiles; (2) performed clustering and trend analysis of expression profiles based on single-copy orthologous genes to uncover feeding-related transcriptional regulatory patterns; and (3) identified genes showing significant expression trends and positively selected genes based on log2FC and Ka/Ks ratios, respectively, and examined their potential roles in ecological adaptation through enrichment analysis. This work may provide new molecular insights and a theoretical basis for understanding the olfactory mechanisms underlying dietary differentiation in freshwater fish.
2. Materials and Methods
2.1. Sample Collection and Olfactory Rosette Processing
Samples were collected in January 2024 from four Xenocyprididae species inhabiting the Jingjiang section of the Yangtze River, China (31.95428° N, 120.12725° E). These included the carnivorous E. bambusa and C. alburnus, and the herbivorous C. idella and M. amblycephala. For each species, six healthy adult individuals were selected. The average total length and body mass of the sampled individuals were: C. idella (585.17 ± 13.12 mm, 3563.50 ± 63.97 g), M. amblycephala (297.83 ± 5.78 mm, 545.05 ± 33.65 g), C. alburnus (447.83 ± 18.19 mm, 906.95 ± 12.39 g), and E. bambusa (582.83 ± 13.39 mm, 2377.12 ± 28.96 g). Age and gonadal stage were not assessed; however, all specimens were confirmed as adults based on body size and external morphology, which minimized potential variation associated with developmental or reproductive status. Following the administration of MS-222 (120 mg/L) anesthesia for 90 s, the olfactory rosettes were rapidly dissected. The tissues were rinsed with phosphate-buffered saline (PBS), transferred to RNA stabilization solution, and flash-frozen in liquid nitrogen. In order to concentrate on interspecific expression trends and reduce individual variation, the olfactory rosettes from six individuals of each species were pooled in equal amounts for RNA extraction and sequencing. All animal experiments were approved by the Ethics Committee of Shanghai Ocean University (Approval No. SHOU-DW-2023-208) and conducted in accordance with relevant animal use regulations.
2.2. RNA Extraction, Library Construction, and Sequencing
Total RNA was extracted using TRIzol reagent (Invitrogen, Waltham, MA, USA), and its concentration and purity were assessed with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). The integrity of the RNA was evaluated by agarose gel electrophoresis, and the RNA Quality Number (RQN) was determined using an Agilent 5300 Fragment Analyzer (Agilent Technologies, Santa Clara, CA, USA). The mRNA was enriched using Oligo(dT)-attached magnetic beads and fragmented into approximately 300 base pair fragments with a fragmentation buffer. First-strand cDNA was synthesized using random primers and reverse transcriptase, followed by second-strand synthesis to generate double-stranded cDNA. The termini of the cDNA fragments were then subjected to a repair process that involved the addition of an A-tail. Subsequently, Y-shaped adapters were ligated to the fragments. The ligated products were then purified, size-selected, and amplified by PCR to obtain the final libraries. The high-throughput sequencing was performed on the Illumina NovaSeq 6000 platform (Illumina Inc., San Diego, CA, USA).
2.3. De Novo Transcriptome Assembly and Quality Assessment
Raw sequencing reads were subjected to quality control using fastp (
https://github.com/OpenGene/fastp (accessed on 11 September 2025)) to remove adapter sequences, low-quality reads, reads with high N content, and overly short sequences, resulting in high-quality clean data. To preserve species-specific transcriptional information, Trinity (
https://github.com/trinityrnaseq/trinityrnaseq/wiki (accessed on 11 September 2025)) was employed to perform de novo assembly independently for each species using their respective clean data sets. The initial assembly results were optimized using TransRate (
http://hibberdlab.com/transrate/ (accessed on 11 September 2025)), and redundant transcripts were removed using CD-HIT (
http://weizhongli-lab.org/cd-hit/ (accessed on 11 September 2025)) to generate a non-redundant set of transcripts. Assembly completeness was assessed with BUSCO (
http://busco.ezlab.org (accessed on 11 September 2025)) based on the presence of conserved single-copy orthologous genes. In addition, N50 values and transcript length distributions were calculated to evaluate assembly quality. Finally, the clean reads from each sample were mapped back to the assembled reference transcripts to obtain alignment statistics. The Illumina sequence data generated during the current study are accessible through BioProject accession number PRJNA1163501.
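The N50 statistic mentioned above can be illustrated with a minimal sketch. The function name and the example lengths are hypothetical and are not data from this study; the assembly tools used here report N50 themselves, so this only shows how the value is defined.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical transcript lengths in bp (illustrative only).
example_lengths = [200, 350, 500, 1200, 1800, 2500, 3000]
example_n50 = n50(example_lengths)
```

Because N50 is weighted by sequence length, a small number of long transcripts can raise it considerably, which is why it is usually read alongside BUSCO completeness and read mapping rates.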
2.4. Functional Annotation of Unigenes
Functional annotation was conducted on the non-redundant unigenes assembled for each species. Initially, open reading frames (ORFs) were predicted using TransDecoder (
https://github.com/TransDecoder/TransDecoder (accessed on 11 September 2025)). The resulting sequences were then aligned against six major databases: NR, Swiss-Prot, Pfam, EggNOG, GO, and KEGG. The annotation coverage for each database was calculated concurrently. Protein functions were assigned based on sequence similarity using Diamond and Blast+ against the NR and Swiss-Prot databases. EggNOG was used to obtain COG classifications and orthologous group (OG) information via Diamond, while GO terms were assigned and categorized into biological process (BP), cellular component (CC), and molecular function (MF). KEGG pathways were identified through ID mapping, and conserved protein domains were detected by aligning sequences to the Pfam database using HMMER.
2.5. Identification and Functional Characterization of Olfaction-Related Candidate Genes
To identify candidate functional genes related to olfaction, two annotation strategies were employed in this study: (1) unigenes annotated to the KEGG olfactory transduction pathway (ko04740), and (2) unigenes containing the typical olfactory receptor domain (PF13853, 7tm_4) as identified in the Pfam database. For each species, gene sets derived from both annotations were merged and subjected to clustering and de-redundancy using CD-HIT with a similarity threshold of 80%, resulting in a non-redundant set of olfaction-related candidate genes.
To further understand the functional composition of olfactory-related candidate genes, multi-gene set enrichment analysis was performed based on the union of the two gene sets before redundancy removal. KEGG enrichment analysis was performed using an R script (Fisher’s exact test, Benjamini–Hochberg correction, p-adjust < 0.05), and the results were visualized using a bubble chart. This analysis aimed to elucidate the functional composition characteristics and biological background of olfactory receptor-related genes. The enrichment statistics only reflect functional distribution trends and are not used for statistical inference.
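The enrichment statistics described above (Fisher's exact test followed by Benjamini–Hochberg correction) can be sketched as follows. This is not the R script used in the study. The function names are hypothetical, and the one-sided over-representation form of the test is an assumption, because the sidedness is not stated in the text.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric) p-value, P(X >= k).

    k: candidate genes annotated to the pathway
    n: total candidate genes
    K: background genes annotated to the pathway
    N: total background genes
    Assumption: over-representation only (one-sided).
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Pathways whose adjusted value falls below 0.05 would then be reported as enriched, matching the p-adjust < 0.05 threshold stated above.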
2.6. Clustering and Phylogenetic Tree Analysis Based on Single-Copy Orthologous Gene Expression Profiles
To compare the expression regulation characteristics of conserved genes among the Xenocyprididae species, we conducted an analysis based on single-copy orthologous genes. First, we used OrthoFinder (
https://github.com/davidemms/OrthoFinder (accessed on 11 September 2025)) to align the protein sequences of the four species and identify single-copy orthologous gene groups (Orthogroups). Next, we used RSEM (
http://deweylab.biostat.wisc.edu/rsem (accessed on 11 September 2025)) in combination with bowtie2 to align the clean reads of each species to its own assembled transcripts, obtained gene-level read counts, and converted them to TPM values to represent expression levels.
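The conversion from read counts to TPM can be illustrated with a short sketch. RSEM performs this step internally using its own estimated counts and effective lengths, so the function below is a hypothetical stand-in that only shows the arithmetic.

```python
def counts_to_tpm(counts, effective_lengths):
    """Convert gene-level read counts to TPM.

    counts: dict mapping gene id -> read count
    effective_lengths: dict mapping gene id -> effective length in bp
    Each count is first normalised by length, then the rates are
    rescaled so that they sum to one million.
    """
    rates = {gene: counts[gene] / effective_lengths[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical input (illustrative only).
example_tpm = counts_to_tpm({"a": 10, "b": 20}, {"a": 1000, "b": 1000})
```

Because TPM values always sum to one million within a sample, they describe relative abundance within each species rather than absolute expression.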
For each single-copy orthologous gene, the TPM values of the corresponding annotated CDS unigenes in each species were extracted to construct a species × gene expression matrix. To remove differences in absolute expression level and focus on relative expression trends, the matrix was standardized using Z-scores. The standardized expression matrix was then used for (1) bidirectional clustering analysis, in which hierarchical clustering was performed on both genes (rows) and species (columns) with the R package pheatmap (v1.0.13) using Euclidean distance and average linkage, and the overall distribution of expression patterns was visualized as a heatmap; and (2) construction of an expression profile phylogenetic tree, in which the Euclidean distances between species in the standardized expression matrix were clustered with the UPGMA method using the SciPy library (v1.16.0, linkage and dendrogram functions) in Python (v3.11.9).
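The standardization and distance steps can be sketched in plain Python as follows. The function names and toy matrix are hypothetical. Standardizing each gene across species with the population standard deviation, and dropping genes with no variation, are both assumptions, since the text does not specify these details.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardize each gene (row) across species.

    matrix: dict mapping gene id -> list of TPM values, one per species.
    Assumption: population SD is used; invariant genes are skipped.
    """
    standardized = {}
    for gene, values in matrix.items():
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue
        standardized[gene] = [(v - mu) / sd for v in values]
    return standardized


def species_distances(standardized, species):
    """Pairwise Euclidean distances between species (columns)."""
    distances = {}
    for i in range(len(species)):
        for j in range(i + 1, len(species)):
            squared = sum((row[i] - row[j]) ** 2 for row in standardized.values())
            distances[(species[i], species[j])] = sqrt(squared)
    return distances


# Hypothetical toy matrix with four placeholder species.
species = ["sp1", "sp2", "sp3", "sp4"]
tpm = {"g1": [1, 1, 5, 5], "g2": [2, 2, 8, 8], "g3": [3, 3, 3, 3]}
dist = species_distances(zscore_rows(tpm), species)
```

These pairwise distances are the input that an average-linkage (UPGMA) routine, such as the SciPy linkage function named above with method="average", would cluster into a tree.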
2.7. Analysis of Differential Expression Trends and Functional Enrichment Based on Single-Copy Orthologous Genes
Since this study employed a pooled-sample design (six individuals per species pooled) for cross-species expression comparison, traditional differential expression statistical tests are not applicable. To investigate trends in expression regulation between Xenocyprididae species with different feeding habits, we used the MARS method (MA-plot-based method with random sampling) from DEGseq to analyze four species pair combinations (control vs. experimental: C. idella vs. C. alburnus, C. idella vs. E. bambusa, M. amblycephala vs. C. alburnus, M. amblycephala vs. E. bambusa).
The screening of differentially expressed trend genes (DETGs) followed three criteria: (1) the gene set was limited to single-copy orthologous genes identified by OrthoFinder (excluding paralogous interference); (2) an empirical threshold of |log2(Fold Change)| ≥ 4 was used to define strong expression change trends; and (3) p-values and false discovery rates (FDR) generated by MARS were used solely as references for gene ranking and not as criteria for statistical significance. To explore the potential biological functions of DETGs, functional enrichment analysis was conducted using the full set of annotated single-copy orthologous genes as the background. GO enrichment analysis was performed using Goatools (Fisher’s exact test, Benjamini–Hochberg correction, p-adjust < 0.05), while KEGG pathway enrichment was conducted following the same method as described in Section 2.5.
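The fold-change threshold in criterion (2) can be illustrated with a minimal sketch. This is not the DEGseq MARS procedure, which computes its own values in R. The function names, the example orthogroup identifiers, and in particular the pseudocount of 1 are assumptions introduced only so that zero expression values can be handled.

```python
from math import log2


def log2_fold_change(control, experimental, pseudocount=1.0):
    """log2(experimental / control); the pseudocount is an assumption."""
    return log2((experimental + pseudocount) / (control + pseudocount))


def select_detgs(expression, threshold=4.0):
    """Keep orthogroups whose |log2FC| meets the threshold.

    expression: dict mapping orthogroup id -> (control, experimental).
    Returns a dict mapping orthogroup id -> (direction, log2FC).
    """
    selected = {}
    for orthogroup, (control, experimental) in expression.items():
        lfc = log2_fold_change(control, experimental)
        if abs(lfc) >= threshold:
            selected[orthogroup] = ("up" if lfc > 0 else "down", lfc)
    return selected


# Hypothetical expression values (illustrative only).
example = {"OG_a": (0, 15), "OG_b": (31, 0), "OG_c": (10, 12)}
detgs = select_detgs(example)
```

A threshold of 4 on the log2 scale corresponds to a 16-fold difference, so only orthogroups with very pronounced interspecific differences pass the filter.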
2.8. Analysis of Positive Selection Detection and Adaptive Gene Functional Enrichment
To detect selection pressures associated with dietary differentiation, this study employed the same herbivorous-carnivorous species pairing design as in the expression trend analysis (experimental vs. control: C. alburnus vs. C. idella, C. alburnus vs. M. amblycephala, E. bambusa vs. C. idella, E. bambusa vs. M. amblycephala), and comparative analysis was conducted based on the single-copy orthologous gene CDS sequences of each paired combination.
First, multiple sequence alignments were performed using PRANK, followed by calculation of the non-synonymous substitution rate (Ka) and synonymous substitution rate (Ks) using KaKs Calculator, resulting in the Ka/Ks ratio for each gene. Significance testing was conducted using Fisher’s exact test to control for false positives. Genes with Ka/Ks > 1 were classified as positively selected genes, those with Ka/Ks < 0.1 as highly conserved genes under purifying selection, and the remaining genes (neutral/weakly selected) were excluded from further analysis. GO/KEGG enrichment analysis was performed separately on the positively selected genes and the conserved genes (methods as in Section 2.7). The enrichment results of the four paired combinations were analyzed independently (taking the top 20 entries sorted by p-adjust), and functional terms that occurred in ≥3 combinations were selected as the core functional set related to adaptive evolution.
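The classification thresholds and the recurrence rule described above can be sketched as follows. The function names and labels are hypothetical. The sketch covers only the Ka/Ks cut-offs and the "≥3 combinations" rule. It omits the Fisher's exact test applied to each gene, and treating Ks = 0 as undefined is an assumption.

```python
from collections import Counter


def classify_kaks(ka, ks):
    """Assign a selection category from Ka and Ks.

    Assumption: genes with Ks == 0 are labelled 'undefined'.
    """
    if ks == 0:
        return "undefined"
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 0.1:
        return "conserved"
    return "excluded"


def recurrent_terms(top_terms_by_pair, min_pairs=3):
    """Return terms found in the top list of at least min_pairs comparisons.

    top_terms_by_pair: dict mapping comparison name -> list of terms.
    """
    counts = Counter(
        term for terms in top_terms_by_pair.values() for term in set(terms)
    )
    return sorted(term for term, n in counts.items() if n >= min_pairs)


# Hypothetical top-term lists for four comparisons (illustrative only).
example_terms = {
    "pair1": ["a", "b"],
    "pair2": ["a", "c"],
    "pair3": ["a", "b"],
    "pair4": ["b"],
}
core_terms = recurrent_terms(example_terms)
```

Requiring recurrence across at least three of the four comparisons reduces the influence of any single species pair on the reported core functional set.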
2.9. qRT-PCR Validation
To validate the reliability of RNA-seq data, qRT-PCR was performed to verify gene expression levels. Seven representative unigenes were selected based on the olfactory transduction pathway.
β-actin was used as the internal reference gene, and the relative expression levels of target genes were calculated using the 2^−ΔΔCt method [32]. Primer design was performed using Primer Premier 6 software (Table S1), and primers were synthesized by Meiji Biotechnology Co., Ltd. (Shanghai, China). qRT-PCR reactions were conducted on an ABI 7300 quantitative real-time PCR instrument (ABI, New York, NY, USA), with three biological replicates per sample. For downstream analyses, both qRT-PCR relative expression values (2^−ΔΔCt) and RNA-seq expression values (TPM) were transformed to log2(Expression + 1). For bar chart visualization, mean ± SD was calculated from the log2-transformed qRT-PCR replicates. To account for gene-specific differences in expression scales, both RNA-seq and qRT-PCR values were further subjected to z-score transformation within genes before correlation analysis between the two platforms.
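The quantification and comparison steps described above can be sketched as follows. All function names and Ct values are hypothetical. The choice of calibrator sample, the use of the population standard deviation for z-scores, and the use of Pearson correlation are assumptions, since the text does not specify them.

```python
from math import log2, sqrt
from statistics import mean, pstdev


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt method.

    The *_cal arguments are the Ct values of the calibrator sample.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2 ** -(delta_sample - delta_calibrator)


def log2_plus_one(value):
    """log2(Expression + 1) transform applied to both platforms."""
    return log2(value + 1)


def zscores(values):
    """Within-gene z-scores; assumes the values are not all identical."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return covariance / spread


# Hypothetical Ct values: the target is 2 cycles earlier than in the calibrator.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
```

Standardizing within each gene before correlating the two platforms puts genes with very different expression ranges on the same scale, so that highly expressed genes do not dominate the correlation.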
Unless otherwise specified, all analyses in this study were performed using the default parameters of the respective tools.
3. Results
3.1. Transcriptome Data Statistics
RNA quality was assessed prior to sequencing. Except for
M. amblycephala, which exhibited a slightly lower RNA quality number (RQN) of 7.4, all other samples had RQN values above 8.5, indicating high RNA integrity. Subsequent transcriptomic sequencing using the Illumina NovaSeq 6000 platform yielded high-quality clean data, with output ranging from 6.35 to 7.17 Gb and Q30 base percentages exceeding 95.5% across all samples (
Table S2).
All clean data were de novo assembled using Trinity, and the assembly results were optimized and summarized (
Table S3). The number of unigenes obtained per species ranged from 40,229 to 42,405, with N50 values ranging from 1805 bp to 2217 bp. TransRate scores exceeded 0.46 for all assemblies. BUSCO analysis indicated high completeness, with scores above 92.7% for all species except
M. amblycephala (89.2%). Statistical analysis of unigene lengths revealed that 99% of unigenes in all samples were longer than 200 bp, with approximately 90% concentrated between 200 and 3000 bp (
Figure 1A). When clean reads were aligned to the Trinity-assembled transcript reference sequences, the alignment rates were all above 79.74%. Finally, based on Salmon quantitative analysis, the number of expressed unigenes (>0 TPM) was 41,289 (
C. idella), 39,385 (
M. amblycephala), 41,492 (
C. alburnus), and 40,646 (
E. bambusa), respectively.
3.2. Functional Annotation of Unigenes
Functional annotation of unigenes from the four fish species (
C. idella,
M. amblycephala,
C. alburnus, and
E. bambusa) was conducted using six major databases, including NR, Swiss-Prot, Pfam, EggNOG, GO, and KEGG (
Table S4,
Figure 1B). The number of unigenes annotated in at least one database was 26,672 for
C. idella, 25,576 for
M. amblycephala, 26,653 for
C. alburnus, and 25,785 for
E. bambusa. The distribution patterns of functional categories in EggNOG, GO, and KEGG were highly consistent across species.
Most unigenes of each species were annotated to the NR database and showed the highest homology with protein sequences of fish species such as
C. idella,
M. amblycephala,
Anabarilius grahami,
Labeo rohita, and
Cyprinus carpio (
Figure 1C). Functional classification of orthologous groups (OGs) based on EggNOG revealed conserved functional profiles among species (
Figure 1D). The five most prevalent functional categories in all species were: Intracellular trafficking, secretion, and vesicular transport [U]; Posttranslational modification, protein turnover, chaperones [O]; Signal transduction mechanisms [T]; Transcription [K]; and Cytoskeleton [Z].
The annotation patterns of the four species are highly consistent across the three major categories of the Gene Ontology (GO) (
Figure S1A). In the Molecular Function (MF) category, the items with the most annotated genes are binding (GO:0005488) and catalytic activity (GO:0003824); in Cellular Component (CC), they are cell part (GO:0044464) and membrane (GO:0016020); in Biological Process (BP), they are cellular process (GO:0009987) and metabolic process (GO:0008152). Unigenes were annotated to KEGG pathways to elucidate their involvement in biological processes (
Figure S1B). All species had unigenes annotated to six major KEGG categories: environmental information processing (EIP), human diseases (HD), organismal systems (OS), cellular processes (CP), genetic information processing (GIP), and metabolism (M). The six pathway subcategories with the highest number of annotated unigenes in each species were signal transduction (EIP), cancer: overview (HD), infectious disease: viral (HD), immune system (OS), transport and catabolism (CP), and endocrine system (OS). Notably, pathways related to chemosensory and neural regulation, such as olfactory transduction (ko04740), neuroactive ligand-receptor interaction (ko04080), and taste transduction (ko04742), as well as key metabolic and signaling pathways influencing feeding behavior, such as the AMPK signaling pathway (ko04152) and insulin signaling pathway (ko04910), were each represented by varying numbers of unigenes. The widespread annotation of these key pathways suggests their potential roles in environmental sensing (particularly food recognition), feeding behavior regulation, and energy metabolism in Xenocyprididae species.
3.3. Identification and Functional Characterization of Olfactory-Related Candidate Genes
To identify candidate genes associated with olfactory function, a dual-annotation strategy was employed, combining Pfam domain (7tm_4, PF13853) and KEGG pathway (ko04740) information to screen unigenes from the four Xenocyprididae species. Based on Pfam annotation, 64 (C. idella), 74 (M. amblycephala), 76 (C. alburnus), and 88 (E. bambusa) candidate unigenes were identified; based on KEGG annotation, 101 (C. idella), 122 (M. amblycephala), 118 (C. alburnus), and 122 (E. bambusa) candidate unigenes were identified.
The two types of gene sets identified for each species were merged, and CD-HIT clustering with an 80% similarity threshold was applied to remove redundancy, yielding a non-redundant combined set of domain- and pathway-based candidate genes. To account for clustering units containing multiple unigenes with identical annotations but varying expression levels, the unigene with the highest expression level in each cluster was retained to construct the final functionally non-redundant olfactory candidate gene set (
Table S5). The number of genes was 39 (
C. idella), 36 (
M. amblycephala), 46 (
C. alburnus), and 48 (
E. bambusa), respectively. Among these, the number of genes annotated as olfactory receptor genes was 8 (
C. idella), 15 (
M. amblycephala), 17 (
C. alburnus), and 19 (
E. bambusa), respectively.
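The representative-selection step used to build the non-redundant candidate set (retaining the most highly expressed unigene in each CD-HIT cluster) can be sketched as follows. The function name, the cluster identifiers, and the TPM values are hypothetical. Treating a unigene without an expression value as having a TPM of 0 is an assumption.

```python
def pick_representatives(clusters, tpm):
    """Keep the unigene with the highest TPM in each cluster.

    clusters: dict mapping cluster id -> list of unigene ids
    tpm: dict mapping unigene id -> TPM value
    Assumption: unigenes missing from tpm are treated as TPM 0.
    """
    return {
        cluster_id: max(members, key=lambda unigene: tpm.get(unigene, 0.0))
        for cluster_id, members in clusters.items()
    }


# Hypothetical clusters and expression values (illustrative only).
example_clusters = {"c0": ["u1", "u2"], "c1": ["u3"]}
example_tpm = {"u1": 2.0, "u2": 9.5, "u3": 0.1}
representatives = pick_representatives(example_clusters, example_tpm)
```

Selecting by expression rather than by sequence length favours the transcript that is most active in the olfactory rosette, which suits a set intended for expression comparisons.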
To explore the potential functions of these candidate genes, we further conducted KEGG pathway enrichment analysis using the union of domain-type and pathway-type unigenes before redundancy removal (
Figure S2). The top 20 enrichment results showed that the candidate genes of the four species were enriched in the olfactory transduction pathway, supporting their functional relevance to olfaction. Additionally, they were widely enriched in a series of pathways related to neural signal transduction and behavioral regulation, including neuroactive ligand-receptor interaction, circadian entrainment, cAMP signaling pathway, calcium signaling pathway, dopaminergic synapse, adrenergic signaling in cardiomyocytes, gastric acid secretion, GnRH signaling pathway, and oxytocin signaling pathway.
Furthermore, the top 20 enriched pathways differed somewhat among species. For example, the phototransduction pathway was enriched in C. alburnus, E. bambusa, and M. amblycephala but was not detected in C. idella, suggesting that some species may exhibit synergistic regulation of olfactory and visual signals. The salivary secretion and long-term potentiation pathways co-occurred in C. idella and E. bambusa, which may be associated with olfactory learning and odor memory. The glutamatergic synapse and insulin secretion pathways were observed only in M. amblycephala, suggesting that olfactory candidate genes may participate in neuro-metabolic coupling processes. The renin secretion pathway was present only in C. alburnus, potentially reflecting a specialized function in regulating homeostasis.
3.4. Identification and Expression Analysis of Orthologous Genes
A total of 3681 single-copy orthologous genes were identified across the transcriptomes of the four species (
Table S6). Based on their TPM values, a Z-score-normalized expression matrix was constructed and subsequently used for bidirectional hierarchical clustering and expression profile phylogenetic tree analysis to investigate the similarities and differences in transcriptomic regulatory patterns among species.
The bidirectional clustering heatmap (
Figure 2A) revealed pronounced expression differences among orthogroups, reflecting a certain degree of species-specific expression trends. Among them,
C. alburnus and
M. amblycephala exhibit low expression trends in most orthogroups and cluster into the same branch. In contrast,
C. idella and
E. bambusa show relatively high expression levels in most orthogroups, but their expression distributions differ from each other, and thus they do not show the same degree of topological consistency in the heatmap as the former pair. To further analyze the overall similarity among different species at the level of orthologous gene expression profiles, Euclidean distances between species were calculated based on Z-score standardized expression values, and a phylogenetic tree of expression profiles was constructed (
Figure 2B). In the expression profile phylogenetic tree, the four fish species showed a differentiation structure consistent with the heatmap.
C. alburnus and
M. amblycephala first clustered together as a branch with high similarity, and then formed a subclade together with
E. bambusa.
C. idella was positioned at a greater clustering distance from this subclade, exhibiting distinct expression patterns that highlight its uniqueness in overall transcriptomic trends.
3.5. Analysis of Differential Expression Trends in Single-Copy Orthologous Genes
To explore potential differences in gene expression regulation between herbivorous and carnivorous Xenocyprididae species, we compared expression trends based on DEGseq analysis for four species pair combinations (
C. idella vs.
C. alburnus,
C. idella vs.
E. bambusa,
M. amblycephala vs.
C. alburnus, and
M. amblycephala vs.
E. bambusa). Single-copy orthologous genes showing strong expression differences, with |log2(Fold Change)| ≥ 4, were defined as DETGs. The results showed that the
C. idella vs.
E. bambusa pair had the highest number of DETGs (856 in total, with 337 up-regulated and 519 down-regulated), followed by
M. amblycephala vs.
E. bambusa (836 in total, with 572 up-regulated and 264 down-regulated),
C. idella vs.
C. alburnus (800 in total, with 277 up-regulated and 523 down-regulated) and
M. amblycephala vs.
C. alburnus (525 in total, with 357 up-regulated and 168 down-regulated). The distribution of DETGs is presented in a volcano plot (
Figure 2C).
3.6. Functional Enrichment Analysis of Differentially Expressed Trend Genes
GO and KEGG functional enrichment analyses were performed for the DETGs identified from the four species pair comparisons. For each comparison, the top 60 enriched terms/pathways were extracted for trend assessment.
GO enrichment analysis (
Figure S3) revealed that DETGs from multiple species pair comparisons were enriched in functions related to post-transcriptional regulation, predominantly including RNA processing, RNA metabolic process, ribonucleoprotein complex, and RNA binding. These terms were ranked within the top 20 across all four comparisons, indicating a common trend of expression differentiation. These enrichment patterns suggest potential interspecific differences in RNA-level regulatory mechanisms in the olfactory rosette. In addition, terms such as nucleic acid metabolic process and nuclear protein-containing complex were highly enriched in ≥3 comparisons, further supporting the possible core role of nucleic acid metabolism and transcription-associated complexes in gene regulation. By contrast, broader metabolic categories, including cellular macromolecule metabolic process and nucleobase-containing compound metabolic process, were also enriched in multiple groups but ranked relatively lower, and thus can be considered as supplementary trends.
KEGG pathway enrichment analysis (
Figure S4) showed that DETGs from the different species pair comparisons were broadly enriched in pathways related to primary metabolism and xenobiotic processing. Metabolic pathways consistently ranked within the top 20 across all four comparisons, indicating a dominant trend of co-enrichment. In addition, pathways such as drug metabolism, metabolism of xenobiotics by cytochrome P450, and focal adhesion were consistently enriched in ≥ 3 comparisons, suggesting interspecific divergence in the metabolic processing of drugs, odorants, and other exogenous compounds in the olfactory rosette, as well as potential divergence in cellular structural organization and signal integration. Notably, chemical carcinogenesis and pathways in cancer also ranked highly in multiple comparisons; in this context, these terms are more likely to reflect the involvement of metabolic enzyme systems or signal transduction modules rather than pathological processes. The biological relevance of these pathways should be further interpreted in light of the specific physiological functions of the olfactory rosette and the ecological niches of the studied species.
3.7. Ka/Ks Analysis
Compared with differences in gene expression regulation, selection signals reflect slower and more stable evolutionary processes, thereby providing complementary insights into species divergence. To investigate these evolutionary patterns, Ka/Ks analysis was performed for the four species pairs. The results (Table S7) showed that the majority of genes in all pairings were under purifying selection (Ka/Ks < 1, p < 0.05), indicating a high degree of functional conservation in core biological processes. The proportion of genes under positive selection (Ka/Ks > 1, p < 0.05) was low across all comparisons (0.87–2.07%), which nevertheless suggests that a subset of genes may have experienced stronger adaptive pressures during species evolution. Notably, the E. bambusa vs. C. idella comparison exhibited the highest proportion of positively selected genes (2.07%), implying the presence of stronger adaptive evolutionary signals in this pairing, which merits further investigation.
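The classification described above can be illustrated with a minimal Python sketch. It assumes each orthologous gene pair is summarized by a Ka/Ks ratio and a p-value, and that the reported percentage uses all tested gene pairs as the denominator; the function names and the example values are hypothetical and are not taken from Table S7.

```python
def classify_selection(ka_ks, p_value, alpha=0.05):
    """Label one orthologous gene pair using the thresholds stated in the text."""
    if p_value >= alpha:
        return "not significant"
    if ka_ks > 1:
        return "positive"
    if ka_ks < 1:
        return "purifying"
    return "neutral"


def positive_fraction(records, alpha=0.05):
    """Percentage of gene pairs labelled as positively selected.

    Assumption: the denominator is every tested gene pair.
    """
    labels = [classify_selection(k, p, alpha) for k, p in records]
    return 100.0 * labels.count("positive") / len(labels)


# Hypothetical (Ka/Ks, p-value) records for illustration only.
example = [(0.05, 0.001), (0.32, 0.01), (1.80, 0.03), (1.20, 0.40), (0.90, 0.20)]
print(positive_fraction(example))
```

In this toy input, only the third record satisfies both Ka/Ks > 1 and p < 0.05; the fourth has a ratio above 1 but is not significant, so it is not counted as positively selected.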
Building on this, we further focused on the evolutionary patterns of olfactory-related candidate genes. By comparing the non-redundant olfactory-related candidate genes from each species with the single-copy orthologous gene dataset, a total of 70 orthogroup (OG) identifiers were matched. Further analysis revealed that the oard1 gene (OG0009235) exhibited a Ka/Ks ratio greater than 1 in the E. bambusa vs. C. idella comparison, suggesting that it may be involved in adaptive evolution related to olfactory neural homeostasis or oxidative stress in carnivorous fish species.
3.8. Functional Enrichment Analysis of Adaptive Genes
To explore the conservation and adaptive evolutionary characteristics of genes in the olfactory rosette of fish with different feeding habits, Ka/Ks analysis was performed for the four species pair combinations. Genes under conservative evolution (Ka/Ks < 0.1) and those under positive selection (Ka/Ks > 1) were screened separately, followed by GO and KEGG functional enrichment analyses. For each combination, the top 20 enriched terms were retained, and functional terms recurring in at least three combinations were summarized.
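The summarization step (keep the top 20 terms per combination, then report terms recurring in at least three combinations) can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary keys, and the short ranked lists are hypothetical stand-ins for the actual enrichment output.

```python
from collections import Counter


def recurring_terms(ranked_by_comparison, top_n=20, min_comparisons=3):
    """Count in how many comparisons each term appears within the top_n ranks."""
    counts = Counter()
    for ranked_terms in ranked_by_comparison.values():
        # A set ensures each term is counted at most once per comparison.
        counts.update(set(ranked_terms[:top_n]))
    return {term: n for term, n in counts.items() if n >= min_comparisons}


# Toy ranked lists (best term first); hypothetical, not the study's results.
toy = {
    "Eb_vs_Ci": ["spliceosome", "ribosome", "cell cycle"],
    "Ca_vs_Ci": ["spliceosome", "ribosome", "DNA replication"],
    "Eb_vs_Ma": ["spliceosome", "ribosome", "DNA replication", "cell cycle"],
    "Ca_vs_Ma": ["spliceosome", "ribosome", "DNA replication", "cell cycle"],
}
print(recurring_terms(toy))
```

Terms with a count of four are shared by all combinations, while a count of three marks a term that is missing from the top ranks of exactly one combination.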
For conserved genes, GO functional enrichment analysis identified 10 terms that were consistently enriched in ≥3 groups (Figure 3). Among them, membrane, nuclear protein-containing complex, protein-containing complex, and cellular component were enriched across all four groups, mainly related to membrane structure, protein complexes, and fundamental cellular architecture. The other 6 terms, such as catalytic activity and cellular anatomical entity, appeared in three groups. In KEGG pathway analysis, conserved genes were enriched in 17 pathways that were repeatedly enriched in ≥3 comparisons. Among these, 10 pathways were enriched in all four group combinations, including spliceosome, proteasome, ubiquitin-mediated proteolysis, mTOR signaling pathway, basal transcription factors, RNA transport, mRNA surveillance pathway, bacterial invasion of epithelial cells, ribosome, and endocytosis. The remaining 7 pathways, such as DNA replication, mismatch repair, and cell cycle, appeared in 3 groups, with functions primarily related to genetic information processing and cell cycle regulation.
For positively selected genes, GO functional enrichment analysis identified 11 terms that were repeatedly enriched in ≥3 combinations (Figure 4), encompassing functions related to organelle structure, signal mediation, and molecular binding. Among these, intracellular membrane-bound organelle and molecular function were enriched across all four combinations, while the remaining nine terms, including cytokine-mediated signaling pathway, heterocyclic compound binding, binding, and organelle, were enriched in three combinations. In the KEGG pathway analysis, five pathways ranked within the top 20 in ≥3 combinations. Of these, the Pertussis pathway was enriched in all four combinations, whereas cytokine–cytokine receptor interaction, hematopoietic cell lineage, inflammatory bowel disease (IBD), and Chagas disease (American trypanosomiasis) were enriched in three combinations. These pathways are mainly associated with cytokine regulation, immune cell development, and inflammatory signaling processes. Notably, the enrichment of IBD specifically in carnivorous fish comparisons may reflect selective pressures related to intestinal immune challenges.
In addition, species-pair-specific patterns were also observed. Some functional terms and pathways appeared in the top 20 of only specific combinations, reflecting inter-combination differences. Among conserved genes, the DNA replication pathway was absent from the top 20 in the E. bambusa vs. C. idella combination, whereas the cell cycle pathway did not appear in the C. alburnus vs. C. idella combination. For positively selected genes, the IBD pathway ranked in the top 20 only in combinations involving M. amblycephala (E. bambusa vs. M. amblycephala, C. alburnus vs. M. amblycephala).
3.9. qRT-PCR Validation of Differentially Expressed Trend Genes
To verify the reliability of the RNA-seq results, seven representative unigenes from the olfactory transduction pathway were selected for qRT-PCR analysis. Both qRT-PCR relative expression values (2^(−ΔΔCt), mean ± SD, n = 3) and RNA-seq expression levels (TPM) were transformed to log2(Expression + 1) prior to comparison. As shown in Figure 5A, the qRT-PCR results exhibited expression trends that were generally consistent with those obtained from RNA-seq across the four Xenocyprididae species.
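For readers unfamiliar with the 2^(−ΔΔCt) calculation, the following Python sketch shows the standard arithmetic followed by the log2(Expression + 1) transform. The reference gene, the calibrator sample, and all Ct values here are hypothetical; the text above does not specify them.

```python
import math


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per sample
    ddCt = dCt(sample) - dCt(calibrator)
    """
    d_ct_sample = ct_target - ct_reference
    d_ct_calibrator = ct_target_cal - ct_reference_cal
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


def log2_plus_one(value):
    """log2(Expression + 1), the transform applied before comparison."""
    return math.log2(value + 1)


# Hypothetical Ct values: sample dCt = 4, calibrator dCt = 6, so ddCt = -2.
fold = relative_expression(22.0, 18.0, 25.0, 19.0)
print(fold, log2_plus_one(fold))
```

A negative ΔΔCt means the target amplifies earlier in the sample than in the calibrator, which corresponds to higher relative expression.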
When directly comparing log2-transformed values between RNA-seq and qRT-PCR, only a weak, non-significant correlation was observed (Pearson r = 0.304, p = 0.116; Figure S5). This may reflect inherent differences in quantification scales between the two methods. After z-score transformation within genes, however, a strong positive correlation was detected (Pearson r = 0.888, p < 0.001; Figure 5B). These findings indicate that, despite differences in absolute values, qRT-PCR confirmed the relative expression patterns revealed by RNA-seq, thereby supporting the robustness and reproducibility of the transcriptomic data.
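The effect of within-gene z-scoring on the correlation can be illustrated with a small Python sketch using only the standard library. It assumes the z-score is applied to the log2-transformed values of each gene across species before pooling all genes; the gene names and expression values are hypothetical and chosen only to show how gene-specific scale differences can mask agreement in relative patterns.

```python
import math
from statistics import mean, stdev


def log2p1(values):
    """log2(x + 1) transform for a list of expression values."""
    return [math.log2(v + 1) for v in values]


def zscore(values):
    """Centre and scale one gene's values across species (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)


def pooled(data, per_gene=None):
    """Concatenate log2-transformed values gene by gene, optionally rescaled."""
    out = []
    for gene in sorted(data):
        values = log2p1(data[gene])
        out.extend(per_gene(values) if per_gene else values)
    return out


# Hypothetical values for two genes across four species; not study data.
rnaseq_tpm = {"geneA": [10.0, 20.0, 40.0, 80.0],
              "geneB": [1000.0, 2000.0, 4000.0, 8000.0]}
qpcr_rel = {"geneA": [1.0, 2.1, 3.9, 8.2],
            "geneB": [0.5, 1.1, 1.9, 4.2]}

r_raw = pearson(pooled(rnaseq_tpm), pooled(qpcr_rel))
r_z = pearson(pooled(rnaseq_tpm, zscore), pooled(qpcr_rel, zscore))
print(round(r_raw, 3), round(r_z, 3))
```

In this toy case the two genes differ by about two orders of magnitude in TPM but not in qRT-PCR relative expression, so the pooled correlation is low even though each gene follows the same trend across species in both methods; z-scoring within genes removes that offset.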
5. Conclusions
This study systematically compared the olfactory rosette transcriptomes of four Xenocyprididae species representing herbivorous and carnivorous feeding types. Through transcriptome assembly and functional annotation, olfactory-related candidate genes were identified, and expression trend analyses of single-copy orthologous genes revealed both conserved expression patterns and divergent regulatory features between feeding types. Functional enrichment analyses indicated that these differences were mainly associated with olfactory transduction, metabolism, and neural regulation pathways.
Further Ka/Ks analysis identified several positively selected genes related to sensory perception, immune processes, and metabolic functions, suggesting adaptive divergence of olfactory systems between herbivorous and carnivorous species during evolution. Overall, these findings provide new insights into the molecular mechanisms underlying olfactory adaptation in Xenocyprididae and lay a foundation for future studies in freshwater fish sensory ecology and functional genomics.