1. Introduction
Hermaphroditic flowers, which contain both stamens and pistils within the same floral structure, facilitate self-pollination and consequently increase the risk of inbreeding depression and reduced genetic diversity. To counteract these negative effects, flowering plants have evolved a variety of reproductive strategies, among which self-incompatibility (SI) is one of the most widespread and effective mechanisms. SI enables plants to recognise and reject self-pollen, thereby promoting outcrossing and maintaining genetic variation within populations [1]. Although SI reduces inbreeding and enhances population-level genetic diversity, it can impose constraints on plant breeding and reproductive assurance. Nevertheless, SI systems are maintained across diverse lineages due to their long-term evolutionary advantages, despite the immediate reproductive benefits associated with self-fertilisation [2].
SI systems have been identified in more than 100 angiosperm families, with the gametophytic self-incompatibility (GSI) system being particularly well characterised in the Solanaceae, Rosaceae, and Scrophulariaceae [3,4]. In Rosaceae crop species, including almond (Prunus dulcis), GSI is controlled by a highly polymorphic multi-allelic S-locus that contains at least three major gene classes: the pistil-expressed S-ribonuclease (S-RNase), pollen-expressed S-locus F-box (SLF) or S-haplotype-specific F-box (SFB) genes, and associated long terminal repeat (LTR) elements [5,6]. Among these, S-RNase and SFB are recognised as the primary determinants of S-locus specificity [5,6,7,8,9].
S-RNases are cytotoxic glycoproteins characterised by five conserved regions (C1–C5), one or more hypervariable regions, and several variable regions, and they function as the female specificity determinants in GSI [6,7]. According to prevailing GSI models, S-RNases are secreted into the stylar transmitting tissue and are taken up non-selectively into the cytoplasm of growing pollen tubes [10,11]. However, the mechanisms regulating S-RNase cytotoxicity differ among plant lineages. In the Solanaceae and the Rosaceae subtribe Malinae, compatible pollen tubes detoxify internalised S-RNases, allowing continued growth, whereas incompatible pollen tubes fail to neutralise S-RNase activity, leading to growth arrest [10,11,12,13].
In contrast, species within the genus Prunus exhibit a distinct self-recognition mechanism. In this system, the pollen-expressed S-haplotype-specific F-box protein (SFB) is proposed to protect self S-RNases from degradation by general inhibitor proteins, including SLF-like or SFB-like proteins, thereby triggering pollen tube rejection [14,15,16]. The Prunus SFB gene encodes an F-box domain along with multiple hypervariable and variable regions [6,7]. Notably, Prunus species, including almond, possess only a single SFB gene per S-locus, in contrast to Malinae species, which harbour multiple SFB-brother (SFBB) genes [14]. Genes encoding SLF/SFB proteins are typically arranged in tandem arrays within the S-locus. In Solanaceae, individual haplotypes carry approximately 17–18 SLF genes, whereas Malinae haplotypes contain 17–19 SLF/SFBB genes [17,18,19]. However, the organisation, number, and evolutionary dynamics of SLF/SFB-related genes in Prunus, particularly in almond, remain poorly understood.
Early models proposed that SLF and SFB proteins function as part of an SCF (SKP1–Cullin1-F-box) E3 ubiquitin ligase complex that targets non-self S-RNases for ubiquitin-mediated degradation [20,21,22]. Recent studies in sweet cherry (Prunus avium), however, suggest a more complex recognition system in which S-RNases interact with both S-locus F-box-like (PavSLF1–3) and SFB-like (PavSFBL2) proteins [16,23,24]. Despite these advances, it remains unclear whether SLF-like proteins, SFB-like proteins, or both are required for S-RNase recognition and how these interactions collectively mediate pollen compatibility or rejection. Distinct SCF complex configurations have been reported across Rosaceae species, including canonical SSK1-containing SCF complexes in sweet cherry, non-canonical SBP1-containing complexes in wild dwarf almond, and both forms in apple [23,25,26]. Nevertheless, the precise molecular interactions among SCF components and their roles in determining compatibility versus incompatibility in Prunus remain unresolved.
Evolutionary analyses indicate that the GSI system evolved only once prior to the divergence of the Asteridae and Rosidae [5,27]. In almond, hypervariable regions of both S-RNase and SFB exhibit signatures of strong positive selection and surface exposure, suggesting adaptive coevolution between male and female determinants [6]. Comprehensive analyses of S-gene distribution, lineage diversification, and selective pressures are therefore essential for understanding the molecular basis of S-locus specificity and sexual diversity in Prunus.
In this study, we conducted an integrative evolutionary–genomic analysis of S-locus architecture and SCF-mediated recognition mechanisms in three Prunus species: P. persica (self-compatible), P. dulcis (self-incompatible), and P. avium (self-incompatible). Our aims were to characterise the genomic distribution of S-RNase, SFB, SLF, and their related paralogues; to investigate their molecular interactions with SCF components in determining self-compatibility versus self-incompatibility; and to infer the evolutionary forces shaping S-locus diversity and specificity in Prunus.
2. Materials and Methods
2.1. Genomic Structures and Chromosomal Localisation of S-Locus Genes in Prunus
Genomic structures of the S-locus in Prunus persica, P. dulcis, and P. avium were identified and compared by performing BLASTN (BLASTN v2.15.0) searches using the S7-haplotype sequence from ‘Keanes’ (GenBank accession MH029539) as the query (word size = 64; E-value ≤ 1 × 10−8) against the P. persica v2.0.a1 genome [28], the P. dulcis ‘Texas’ v2.0 genome [29], and the P. avium v1.0.a1 genome [30], all of which were obtained from the Genome Database for Rosaceae (GDR) (https://www.rosaceae.org/).
Chromosomal locations of S-RNase, S-RNase-like, SLF, SLF-like, SFB, and SFB-like genes in Prunus were identified by protein homology searches using BLASTP (BLASTP v2.15.0) (word size = 6; E-value ≤ 1 × 10−5). The query protein sequences were derived from P. dulcis S7 haplotype proteins, including S7-RNase (QDB64815.1), SLF7 (QDB64799.1), and SFB7 (QDB64839.1), and were searched against the same Prunus genome assemblies used for S-locus genomic structure analyses.
BLASTP hits encoding protein sequences longer than 100 amino acids were retained for further analysis. The presence of characteristic domains in candidate S-RNase and SFB proteins was examined using PfamScan v1.6 with the Pfam-A database (http://pfam.xfam.org/) [31]. To visualise the genomic distribution of S-RNase, SLF, SFB, and their related paralogues, the physical positions of the putative genes were scaled by dividing genomic coordinates by 4 × 10^5, generating approximate genetic distances comparable to those used in published genetic maps of P. dulcis [32], P. persica [28], and P. avium [30], following the approach described in [33].
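The coordinate scaling can be illustrated with a short Python sketch. It divides each gene's start coordinate by 4 × 10^5, as stated above; using the start coordinate (rather than, for example, the gene midpoint) is our assumption, and the gene identifiers and positions are hypothetical.

```python
# Minimal sketch: convert physical gene coordinates (bp) into approximate
# map units by dividing by 4 x 10^5, as described in the Methods.
# Gene names and coordinates below are hypothetical placeholders.

BP_PER_MAP_UNIT = 4e5  # scaling factor stated in the text


def approximate_map_position(position_bp: int) -> float:
    """Return the physical position divided by the scaling factor."""
    if position_bp < 0:
        raise ValueError("genomic coordinates must be non-negative")
    return position_bp / BP_PER_MAP_UNIT


def scale_gene_positions(genes: dict[str, tuple[str, int]]) -> dict[str, tuple[str, float]]:
    """Map {gene_id: (chromosome, start_bp)} to {gene_id: (chromosome, map_position)}."""
    return {
        gene_id: (chrom, approximate_map_position(start))
        for gene_id, (chrom, start) in genes.items()
    }


if __name__ == "__main__":
    hypothetical_genes = {
        "candidate_S-RNase": ("chr6", 27_400_000),
        "candidate_SFB-like": ("chr1", 10_000_000),
    }
    for gene_id, (chrom, pos) in scale_gene_positions(hypothetical_genes).items():
        print(f"{gene_id}\t{chrom}\t{pos:.2f}")
```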
Homologous cDNA sequences corresponding to S-RNase, SLF, and SFB genes in Prunus were further identified by querying the S7-RNase (MH316075), SLF7 (MH316060), and SFB7 (MH316104) sequences using BLASTX (v2.15.0) against the NCBI RefSeq protein database, with an E-value threshold of < 1 × 10−25. The numbers of introns, conserved regions, and variable regions in S-RNase-like, SLF-like, and SFB-like alleles were determined by sequence alignment with previously characterised S-RNase, SLF, and SFB alleles.
S-RNase and S-RNase-like proteins were further characterised by analysing conserved and diagnostic amino acid residue patterns as described in [28,34]. The isoelectric points (pI) of these proteins were estimated using IPC 2.0 (https://ipc2.mimuw.edu.pl/).
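To show what a pI estimate involves, the following Python sketch finds the pH at which the Henderson–Hasselbalch net charge of a sequence is zero. It is not the IPC 2.0 method, which relies on its own fitted models; the pKa table is an assumed, approximately EMBOSS-like set, and the peptide in the example is hypothetical.

```python
# Illustrative sketch only: estimates pI by bisection on the
# Henderson-Hasselbalch net charge. The pKa values are an assumed,
# approximately EMBOSS-like set; IPC 2.0 uses its own predictive models.

PKA_BASIC = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of a protein at a given pH."""
    seq = sequence.upper()
    basic = {"N_term": 1, **{aa: seq.count(aa) for aa in "KRH"}}
    acidic = {"C_term": 1, **{aa: seq.count(aa) for aa in "DECY"}}
    positive = sum(n / (1 + 10 ** (ph - PKA_BASIC[g])) for g, n in basic.items())
    negative = sum(n / (1 + 10 ** (PKA_ACIDIC[g] - ph)) for g, n in acidic.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge is ~0."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positively charged, so the pI lies above mid
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    # Hypothetical peptide, not an S-RNase from this study.
    print(round(isoelectric_point("MKTAYIAKQRQISFVKSHFSRQ"), 2))
```

Bisection is sufficient here because the net charge decreases monotonically with pH.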
2.2. Phylogenetic Analysis of S-Locus Genes in Prunus
DNA sequences of S-RNase, S-RNase-like, SFB, SFB-like, SLF, and SLF-like genes from Prunus, Malus, Nicotiana, and Antirrhinum were used for subsequent analyses. The following datasets were used:
- (i) A total of 154 S-RNase and S-RNase-like nucleotide sequences were used to infer phylogenetic relationships among S-RNases in the GSI system. These include 95 S-RNase sequences from cultivated and wild Prunus species (P. dulcis, P. persica, P. avium, P. pseudocerasus, P. cerasus, P. armeniaca, P. salicina, P. mume, P. domestica, P. webbii, P. africana, P. bucharica, P. fenzliana, P. scoparia, P. argentea, P. davidiana, and P. tangutica), 38 from Maloideae (Malus and Pyrus), four from Solanaceae (three from Petunia and two from Nicotiana), and one from Antirrhinum. For the S-RNase-like genes, a total of 20 S-RNase-like (extracellular) nucleotide sequences were used. These comprised five from Prunus dulcis, three from P. persica, three from P. avium, one from P. mume, one from Pyrus × bretschneideri, one from Malus × domestica, one from Pyrus communis, one from Lactuca sativa, one from Glycine max, and one from Helianthus annuus. Hereafter, this dataset is referred to as the “S-RNase dataset” (Table S1).
- (ii) A total of 113 sequences corresponding to SLF, SFB, and related genes were used for subsequent analysis: 16 SLF and 43 SFB nucleotide sequences from Prunoideae (P. dulcis, P. persica, P. avium, P. pseudocerasus, P. cerasus, P. armeniaca, P. salicina, P. mume, P. domestica, P. webbii, P. davidiana, and P. tangutica), 14 SFB sequences from Malus and Pyrus (Maloideae), four from Petunia and Nicotiana (Solanaceae), and 36 SLF-like and SFB-like sequences from P. dulcis, P. persica, and P. avium. Hereafter, this dataset is referred to as the “SFB dataset” (Table S2).
Sequences in each dataset were aligned using the MUSCLE alignment algorithm in Geneious v10.3 [35], and conserved and variable regions within the Prunoideae and Maloideae of the Rosaceae were inspected using Jalview v2.10.5 [36]. Raw sequence alignments were subsequently examined and refined manually to minimise gaps using SeaView v4.7 [37].
Prior to phylogenetic reconstruction, sequence datasets were evaluated for their suitability for evolutionary analyses using the following approaches.
- (i) Recombination analysis was performed because recombination can generate mosaic sequences in which different regions may have distinct phylogenetic histories. The presence of recombination in the S-RNase and SFB datasets was assessed using NeighborNet network analysis and the pairwise homoplasy index (Phi) test implemented in SplitsTree v5.1.4 [38].
- (ii) Substitution pattern homogeneity was examined for the S-RNase and SFB datasets using the disparity index test implemented in MEGA X v10 [39]. Statistical significance was evaluated using 1000 Monte Carlo replications, and p-values < 0.05 were considered indicative of significant heterogeneity.
- (iii) The best-fitting nucleotide substitution model for each dataset was identified using jModelTest v2.1.10 [40], enabling estimation of nucleotide substitution probabilities along the branches of the inferred phylogenetic trees.
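The AIC/BIC comparison underlying step (iii) can be sketched as follows. The candidate models, log-likelihoods, parameter counts and sample size are hypothetical; in practice jModelTest may also count branch lengths as free parameters, which this toy example ignores.

```python
import math

# Minimal sketch of information-criterion model selection of the kind
# jModelTest reports. Log-likelihoods, parameter counts (k) and the sample
# size (n, e.g. alignment length) used below are hypothetical.


def aic(log_likelihood: float, k: int) -> float:
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * log_likelihood


def select_models(fits: dict[str, tuple[float, int]], n: int) -> tuple[str, str]:
    """Return (best model by AIC, best model by BIC); lower scores are better."""
    best_aic = min(fits, key=lambda m: aic(fits[m][0], fits[m][1]))
    best_bic = min(fits, key=lambda m: bic(fits[m][0], fits[m][1], n))
    return best_aic, best_bic


if __name__ == "__main__":
    hypothetical_fits = {"HKY+G": (-5230.4, 5), "GTR+I+G": (-5198.7, 10)}
    print(select_models(hypothetical_fits, n=900))
```

Because BIC penalises each parameter by ln(n) rather than 2, the two criteria can disagree; the study reports that both selected GTR + I + G.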
2.3. Phylogenetic Tree Reconstruction
cDNA sequence alignments of the combined S-RNase and S-RNase-like dataset and the combined SFB, SFB-like, and SLF dataset were used to reconstruct phylogenetic trees using both Maximum Likelihood (ML) and Bayesian Inference (BI) approaches.
ML analyses were performed using RAxML v8.2.6 [41], implementing the GTRCAT evolutionary model. Trees were sampled every 1000 generations, and the optimal ML tree was inferred using default search parameters. Bootstrap support values ≥ 80% were considered to indicate well-supported clades.
BI analyses were conducted using MrBayes v3.2.6 [42], implementing the best-fitting nucleotide substitution model identified by jModelTest [40]. For all three datasets, jModelTest selected the general time-reversible model with a proportion of invariable sites and gamma-distributed rate variation (GTR + I + G) as the optimal model based on both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) (Table S3). Sequence data were partitioned by codon position, and the number of substitution types was set to six.
Bayesian phylogenetic inference was performed using a Markov chain Monte Carlo (MCMC) approach with 50 million generations, employing four Markov chains (one cold and three heated). Trees were sampled every 1000 generations, with a burn-in of 25% applied to all datasets. During tree estimation, diagnostic parameters including the Potential Scale Reduction Factor (PSRF), gamma shape parameter, and stationary nucleotide frequencies were monitored to assess model convergence and tree reliability.
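The PSRF monitored during the MrBayes runs compares between-run and within-run variance of a sampled parameter. The Python sketch below implements the basic Gelman–Rubin form after discarding a 25% burn-in; MrBayes computes a closely related variant, and the input traces are hypothetical.

```python
import math
import statistics

# Minimal sketch of the Gelman-Rubin potential scale reduction factor (PSRF)
# for one parameter sampled by several independent runs. Basic textbook form;
# MrBayes reports a closely related variant. Inputs are hypothetical traces.


def discard_burnin(chain: list[float], fraction: float = 0.25) -> list[float]:
    """Drop the first `fraction` of samples (25% in this study)."""
    return chain[int(len(chain) * fraction):]


def psrf(chains: list[list[float]]) -> float:
    m = len(chains)
    n = min(len(c) for c in chains)
    if m < 2 or n < 2:
        raise ValueError("need at least two chains with two samples each")
    chains = [c[:n] for c in chains]  # use equal chain lengths
    means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(means)
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = statistics.fmean([statistics.variance(c) for c in chains])
    if within == 0:
        raise ValueError("PSRF is undefined when within-chain variance is zero")
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    run1 = discard_burnin([0.1 * (i % 7) for i in range(400)])
    run2 = discard_burnin([0.1 * ((i + 3) % 7) for i in range(400)])
    print(round(psrf([run1, run2]), 3))
```

Values close to 1 indicate that independent runs are sampling the same distribution.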
Chain convergence was further evaluated using Tracer v1.7 [43] by inspecting log-likelihood traces and ensuring that effective sample size (ESS) values exceeded 200. A majority-rule consensus tree was generated using TreeAnnotator v2.3.2 [44], and clades with posterior probability values ≥ 0.95 were considered well supported. Tree topology and node support were visualised and examined using FigTree v1.4.2 [45].
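The ESS > 200 criterion can be illustrated with a simple autocorrelation-based estimator. This is a sketch only: Tracer uses its own estimator, so numbers will not match exactly, and the traces used to exercise it are artificial.

```python
import statistics

# Minimal sketch of an autocorrelation-based effective sample size (ESS):
# ESS = N / (1 + 2 * sum of autocorrelations), truncating the sum at the
# first non-positive lag. Tracer uses its own estimator; this only
# illustrates the idea behind the ESS > 200 rule.


def effective_sample_size(samples: list[float]) -> float:
    n = len(samples)
    mean = statistics.fmean(samples)
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break  # stop at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def passes_ess_threshold(samples: list[float], threshold: float = 200.0) -> bool:
    """True when the ESS strictly exceeds the threshold."""
    return effective_sample_size(samples) > threshold
```

A strongly autocorrelated trace yields an ESS far below its raw length, which is why long runs and thinning are used.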
2.4. Biogeographic Analysis of Prunus S-RNases
For biogeographic analyses, one representative S-RNase sequence per Prunus species was selected from the geographic region corresponding to its reported collection locality, based on metadata obtained from NCBI records and Plants of the World Online (https://www.plantsoftheworldonline.org) (Table S4). Ancestral range estimation was conducted to investigate historical patterns of geographic distribution within Prunus, with particular emphasis on almond.
Ancestral area reconstruction was performed using two likelihood-based frameworks: BioGeoBEARS (BioGeography with Bayesian and likelihood Evolutionary Analysis in R Scripts v.4.5.2) implemented in R, and RASP (Reconstruct Ancestral State in Phylogenies [46]). Analyses were based on the BEAST-derived phylogenetic tree inferred from selected Prunus S-RNase sequences. Three primary biogeographic models were evaluated: DEC, DIVALIKE, and BAYAREALIKE. In addition, models incorporating a founder-event speciation parameter (j) and a dispersal distance parameter (x) were tested.
Geographic distances among regions were estimated using Google Earth (https://earth.google.com/web/, accessed on 12 May 2025). Distances (in kilometres) were rescaled by dividing all values by the smallest observed distance to remove the effect of measurement units on likelihood estimation; oceanic distances were not considered. Model performance was compared using likelihood-ratio tests (LRTs) and the Akaike Information Criterion (AIC).
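The distance rescaling, together with one way a distance-dependent dispersal parameter can act on the rescaled matrix, is sketched below. We assume that the +x parameter multiplies dispersal by d^x; this is our reading of the BioGeoBEARS model rather than something stated above. The 3 × 3 distance matrix is hypothetical, while x = −1.4325 is the estimate reported in the Results.

```python
# Minimal sketch: rescale a pairwise distance matrix (km) by its smallest
# off-diagonal value, then compute distance-dependent dispersal weights of
# the assumed form d**x. The matrix used in the example is hypothetical.


def rescale_distances(dist_km: list[list[float]]) -> list[list[float]]:
    off_diagonal = [d for i, row in enumerate(dist_km) for j, d in enumerate(row) if i != j]
    if min(off_diagonal) <= 0:
        raise ValueError("off-diagonal distances must be positive")
    smallest = min(off_diagonal)
    return [[d / smallest for d in row] for row in dist_km]


def dispersal_weights(rescaled: list[list[float]], x: float) -> list[list[float]]:
    """Weight d**x between areas; x < 0 makes distant areas less reachable."""
    return [
        [1.0 if i == j else d ** x for j, d in enumerate(row)]
        for i, row in enumerate(rescaled)
    ]


if __name__ == "__main__":
    hypothetical_km = [
        [0.0, 1500.0, 3000.0],
        [1500.0, 0.0, 4500.0],
        [3000.0, 4500.0, 0.0],
    ]
    rescaled = rescale_distances(hypothetical_km)
    # x = -1.4325 is the estimate reported for DIVA-like + j + x.
    for row in dispersal_weights(rescaled, x=-1.4325):
        print([round(w, 3) for w in row])
```

Rescaling by the smallest distance makes the closest pair of areas equal to 1, so its weight is unaffected by x and only relatively distant areas are down-weighted.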
Each taxon was assigned to one of 12 geographic regions based on its collection locality: (i) Turkey/Turkmenistan/Iraq, (ii) Afghanistan, (iii) Caucasus region (Turkey/Armenia/Azerbaijan/Iran), (iv) Asia, (v) Europe, (vi) China (Central Plain), (vii) USA, (viii) East Asia/South China, (ix) Europe/Asia, (x) Eastern Europe and Western Siberia, (xi) Albania/Algeria/Corsica/East Aegean, and (xii) Africa. Outgroup taxa were excluded from the analysis, as they did not influence ancestral range inference. Although some species occur in multiple regions, taxa were treated as single-region populations because phylogenetic relationships indicated that these species were not monophyletic across their geographic ranges.
The maximum number of ancestral areas allowed per node was set to two, and competing models were evaluated for statistical fit using AIC and LRTs. Finally, biogeographic stochastic mapping (BSM) was conducted in BioGeoBEARS using 500 stochastic simulations under the best-fitting model to estimate the frequency of biogeographic processes, including anagenetic events (range switching and range expansion dispersal) and cladogenetic events (narrow sympatry, subset sympatry, vicariance, and founder-event speciation).
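The AIC and LRT comparison between a base model and its +j+x extension can be reproduced from reported log-likelihoods. The sketch below uses the DIVA-like and DIVA-like + j + x values given in the Results (lnL = −41.75 and −32.15) with df = 2; assigning two free parameters (d, e) to the base model and four to the extended model is our assumption.

```python
import math

# Minimal sketch of AIC and likelihood-ratio comparison between nested
# biogeographic models. Parameter counts (2 for DIVA-like, 4 for
# DIVA-like + j + x) are an assumption made for illustration.


def chi2_sf_even_df(statistic: float, df: int) -> float:
    """Chi-square upper-tail probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def aic(log_likelihood: float, k: int) -> float:
    return 2 * k - 2 * log_likelihood


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    statistic = max(0.0, 2 * (lnl_alt - lnl_null))
    return statistic, chi2_sf_even_df(statistic, df)


if __name__ == "__main__":
    lnl_diva, lnl_diva_jx = -41.75, -32.15
    stat, p = likelihood_ratio_test(lnl_diva, lnl_diva_jx, df=2)
    print(f"LRT = {stat:.2f}, p = {p:.2e}")
    print(f"AIC: DIVA-like = {aic(lnl_diva, 2):.2f}, DIVA-like+j+x = {aic(lnl_diva_jx, 4):.2f}")
```

With these inputs the LRT statistic is 19.2 and p is roughly 6.8 × 10^−5, consistent with the reported p < 0.001.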
2.5. Divergence Time Estimation
Divergence times were estimated by reconstructing a time-calibrated phylogeny using the Yule speciation model, a random starting tree, and an uncorrelated log-normal relaxed molecular clock, as implemented in BEAST v2.5.2 [47]. Sequence evolution was modelled using the GTR + I + G substitution model, with data partitioned by codon position.
Two independent Markov chain Monte Carlo (MCMC) analyses were run for 30 million generations each, with trees sampled every 1000 generations and a burn-in of 25% discarded prior to downstream analyses. Temporal calibration was applied by specifying divergence-time priors for selected nodes using normal distributions, with a mean of 98.25 million years (Myr) to constrain the split of S-RNase from its most recent common ancestor (MRCA) and the divergence between Rosaceae and Prunoideae, based on published fossil and molecular estimates [48,49].
Chain convergence and adequate sampling were assessed using Tracer v1.7 [43] by examining log-likelihood traces and confirming that effective sample size (ESS) values exceeded 200 for all parameters. A maximum clade credibility (MCC) tree was generated using TreeAnnotator v2.3.2 [44], and tree topology and node support were visualised and examined using FigTree v1.4.3 [45].
2.6. 4DTv Analysis of Prunus-Specific Duplication of S-Associated Genes in Rosaceae
Four-fold degenerate third-codon transversion (4DTv) analysis was performed to assess Prunus-specific duplication events of S-associated genes within the Rosaceae. This analysis compared the identified S-RNase, SLF, and SFB genes in the Prunus dulcis genome with their corresponding homologues in P. persica and P. avium.
Syntenic regions were identified using MCScanX v0.8 [50] for the following genome comparisons: P. dulcis–P. dulcis, P. persica–P. persica, P. avium–P. avium, P. dulcis–P. persica, and P. dulcis–P. avium. Identified syntenic gene pairs were subsequently aligned using MUSCLE, and 4DTv values were calculated using a custom Python script (Python v3.10.0).
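The 4DTv calculation can be sketched for a single aligned coding-sequence pair. This is an illustration under stated assumptions, not the custom script used in this study: it counts fourfold-degenerate third positions where both codons share the same first two bases, skips codons containing gaps or ambiguity codes, and reports the uncorrected proportion of transversions (some pipelines additionally apply a substitution-model correction).

```python
# Minimal sketch of a 4DTv calculation for one aligned, in-frame coding
# sequence pair (standard genetic code). Returns the uncorrected proportion
# of transversions at shared fourfold-degenerate third positions.

FOURFOLD_PREFIXES = {"TC", "CT", "CC", "CG", "AC", "GT", "GC", "GG"}
PURINES = {"A", "G"}
NUCLEOTIDES = set("ACGT")


def is_transversion(a: str, b: str) -> bool:
    """Purine <-> pyrimidine change."""
    return a != b and ((a in PURINES) != (b in PURINES))


def four_dtv(seq1: str, seq2: str) -> float:
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    sites = transversions = 0
    for i in range(0, len(seq1), 3):
        codon1, codon2 = seq1[i:i + 3], seq2[i:i + 3]
        if not (set(codon1) | set(codon2)) <= NUCLEOTIDES:
            continue  # skip codons containing gaps or ambiguity codes
        if codon1[:2] != codon2[:2] or codon1[:2] not in FOURFOLD_PREFIXES:
            continue  # not a shared fourfold-degenerate site
        sites += 1
        if is_transversion(codon1[2], codon2[2]):
            transversions += 1
    return transversions / sites if sites else float("nan")


if __name__ == "__main__":
    # Hypothetical pair: GCA/GCC (Ala) differ by a transversion, GCT/GCT are
    # identical, and ATG/ATG is not fourfold degenerate, so it is ignored.
    print(four_dtv("GCAGCTATG", "GCCGCTATG"))
```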
In addition, 4DTv values were calculated for specific gene-group comparisons to examine divergence patterns among S-associated gene families:
(i) S-RNase genes across Rosaceae, comparing Prunoideae and Maloideae, (ii) Prunoideae S-RNase versus S-RNase-like genes, (iii) SFB genes across Rosaceae, comparing Prunoideae and Maloideae, (iv) Prunoideae SFB versus SFB-like genes, (v) Prunoideae SLF versus SLF-like genes.
2.7. Evolutionary Pressure Associated with Divergence of Prunoideae and Maloideae
Selective pressures acting on S-RNase and SLF/SFB gene families during the divergence of the Prunoideae and Maloideae were assessed using the nonsynonymous-to-synonymous substitution rate ratio (ω = dN/dS), an indicator of protein-level evolutionary constraint. Analyses were performed using branch and site models implemented in PAML v4.8 (Phylogenetic Analysis by Maximum Likelihood [51]), based on Bayesian phylogenetic trees inferred for the respective gene families.
Values of ω, derived from nonsynonymous (dN, Ka) and synonymous (dS, Ks) substitution rates at individual codon sites, were estimated using Codeml from the PAML package [51] and DnaSP v6.10 [52]. Codeml was applied to aligned protein-coding DNA sequences together with their corresponding phylogenetic trees to test models allowing heterogeneous ω ratios among sites, enabling detection of adaptive molecular evolution and identification of codons potentially under positive selection.
Selective regimes were interpreted as follows: ω < 1 indicates purifying selection, ω = 1 indicates neutral evolution, and ω > 1 indicates positive selection. Likelihood ratio tests (LRTs) were used to compare nested neutral and selection models. Specifically, Model M1 (neutral) constrains sites to ω = 0 or ω = 1, whereas Model M2 (selection) introduces an additional site class with ω freely estimated. Model M3 (discrete) assumes three site classes whose ω values and proportions are both estimated from the data. In addition, Model M7 (beta) and Model M8 (beta + ω) were evaluated, with M7 assuming a beta distribution of ω values constrained to the interval 0–1 and M8 adding an extra site class with ω > 1; these models are considered among the most robust for detecting positive selection.
PAML analyses were conducted separately for two datasets derived from Bayesian phylogenies:
- (i)
Prunoideae S-RNase, S-RNase-like, SFB, SLF, SLF-like, and SFB-like genes;
- (ii)
Maloideae S-RNase, S-RNase-like, and SFB genes.
For each dataset, likelihoods obtained under M7 and M8a (null hypotheses, ω ≤ 1) were compared with those under M8 (alternative hypothesis, ω > 1) using LRTs. Additional LRTs were performed by comparing branch-site models Ma1 (null) and Ma2 (selection). For Model M8, a Bayes empirical Bayes (BEB) approach was applied to estimate posterior probabilities for individual codons being subject to positive selection.
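The LRTs between nested site models reduce to simple arithmetic on Codeml log-likelihoods. The sketch below assumes 2 degrees of freedom for M7 vs M8 and, for M8a vs M8, a 50:50 mixture of a point mass at zero and χ²₁, a reference distribution commonly used because ω is fixed at the parameter boundary in M8a. The lnL values in the example are hypothetical.

```python
import math

# Minimal sketch of site-model LRTs computed from Codeml log-likelihoods.
# M7 vs M8: chi-square with 2 df. M8a vs M8: 50:50 mixture of a point mass
# at 0 and chi-square(1). The example lnL values are hypothetical.


def chi2_sf_df1(x: float) -> float:
    """Upper tail of chi-square with 1 df: P(Z**2 > x)."""
    return math.erfc(math.sqrt(x / 2))


def chi2_sf_df2(x: float) -> float:
    """Upper tail of chi-square with 2 df."""
    return math.exp(-x / 2)


def m7_vs_m8(lnl_m7: float, lnl_m8: float) -> tuple[float, float]:
    statistic = max(0.0, 2 * (lnl_m8 - lnl_m7))
    return statistic, chi2_sf_df2(statistic)


def m8a_vs_m8(lnl_m8a: float, lnl_m8: float) -> tuple[float, float]:
    statistic = max(0.0, 2 * (lnl_m8 - lnl_m8a))
    p_value = 0.5 * chi2_sf_df1(statistic) if statistic > 0 else 1.0
    return statistic, p_value


if __name__ == "__main__":
    print(m7_vs_m8(lnl_m7=-4512.3, lnl_m8=-4501.8))
    print(m8a_vs_m8(lnl_m8a=-4505.1, lnl_m8=-4501.8))
```

Using the mixture halves the χ²₁ p-value, so the 5% critical value for M8a vs M8 is about 2.71 rather than 3.84.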
Complementary analyses were performed using DnaSP v6.10 [52], employing a sliding-window approach with a window size of 100 bp and a step size of 25 bp to examine variation in nucleotide diversity and selection signals across gene regions. To further reduce the risk of false-positive inferences, the Sitewise Likelihood Ratio (SLR) test [53] was applied, providing an independent assessment of selection at individual codon sites.
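A sliding-window nucleotide diversity (π) profile with the window and step sizes given above can be sketched as follows. Gap handling (pairwise deletion) and the restriction to full-length windows are assumptions that may differ from DnaSP's behaviour.

```python
from itertools import combinations

# Minimal sketch of a sliding-window nucleotide diversity (pi) profile
# (window = 100 bp, step = 25 bp). pi is the mean pairwise proportion of
# differing sites, skipping gapped positions pair by pair.


def nucleotide_diversity(seqs: list[str]) -> float:
    proportions = []
    for a, b in combinations(seqs, 2):
        compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if compared:
            proportions.append(sum(x != y for x, y in compared) / len(compared))
    return sum(proportions) / len(proportions) if proportions else float("nan")


def sliding_windows(length: int, window: int = 100, step: int = 25):
    """Yield (start, end) for full-length windows only."""
    start = 0
    while start + window <= length:
        yield start, start + window
        start += step


def pi_profile(alignment: list[str], window: int = 100, step: int = 25):
    """Return (start, end, pi) for each window across an aligned set."""
    length = len(alignment[0])
    return [
        (start, end, nucleotide_diversity([seq[start:end] for seq in alignment]))
        for start, end in sliding_windows(length, window, step)
    ]
```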
2.8. Functional Divergence and Site-Specific Evolutionary Rate of S-Locus-Associated Proteins in Prunoideae and Maloideae
Functional divergence among S-RNase, S-RNase-like, SFB, SFB-like, SLF, and SLF-like proteins in the Prunoideae and Maloideae was assessed using deduced amino acid sequence alignments and the program DIVERGE v3.0 [54]. This analysis aimed to identify amino acid residues with a high probability of functional divergence following gene duplication or lineage-specific diversification.
Eight datasets were analysed: two composite datasets, (i) S-RNase and S-RNase-like and (ii) SFB, SFB-like, SLF, and SLF-like, and six individual datasets, one for each gene family. Functional divergence was evaluated across the Rosaceae family and within the Maloideae and Prunoideae lineages by estimating type I (θI) and type II (θII) functional divergence between predefined clusters. Posterior probability values (Qk) were used to identify amino acid residues most likely to contribute to functional divergence between lineages.
Site-specific evolutionary rates potentially associated with functional differentiation among S-locus-related proteins were also inferred from deduced amino acid sequences using McRate v1.0 [55] with the JTT amino acid substitution model, incorporating a 16-category discrete gamma distribution to account for rate heterogeneity among sites. Markov chain Monte Carlo (MCMC) simulations were run for 10,000 generations using default parameter settings to identify site-specific rate variation.
2.9. Co-Evolution Analysis of S-Locus-Associated Proteins in Prunoideae and Maloideae
Co-evolution among amino acid residue sites was assessed using protein sequences from the S-RNase, S-RNase-like, SFB, SFB-like, SLF, and SLF-like datasets in Prunoideae, and the S-RNase and SFB datasets in Maloideae. Analyses were conducted using CAPS (Co-evolution Analysis using Protein Sequences) [56,57], which detects co-evolving residues by evaluating correlated variation in evolutionary rates between pairs of amino acid sites, corrected for divergence time among protein sequences.
BLOSUM-corrected amino acid distance matrices were used to identify statistically significant co-evolving residue pairs. Phylogenetic relationships among sequences were incorporated to remove phylogenetic and stochastic dependencies between sites, thereby reducing false-positive correlations. Divergence times were estimated as the mean number of substitutions per synonymous site, which was used for time correction in the analysis.
CAPS analyses were performed using the following parameters: an alpha significance threshold of 0.001, 1000 simulated alignments, a bootstrap confidence threshold of 0.95, and convergence criteria enabled with time correction activated. To interpret the biological relevance of detected co-evolving residues, three-dimensional protein structures generated in this study were used to assess the spatial proximity and potential functional dependencies among co-evolving amino acid sites in Prunoideae S-locus-associated proteins.
2.10. Protein–Protein Interaction Analysis of the Prunus GSI System
Protein–protein interaction (PPI) networks associated with key components of the Prunus gametophytic self-incompatibility (GSI) system were examined using STRING v11 [58]. Analyses were performed for Prunus persica (self-compatible) and P. dulcis (self-incompatible), focusing on the proteins S-RNase, SFB, SKP1, CUL1 (Cullin-1), and RBX1. In PPI networks, nodes represent proteins and edges represent predicted or known interactions.
To construct the component PPI network, five S-locus-associated proteins (S-RNase, SFB, SLF, SFB-like, and SLF-like) were included. From this component network, S-RNase and SFB were selected to derive a backbone network representing core interaction relationships within the GSI system. Network topology was analysed based on principles of graph theory, as the structural properties of PPI networks provide insights into functional organisation and biological relevance.
To evaluate the importance of individual proteins within the PPI networks and to characterise node connectivity, centrality, and potential functional significance within the GSI-associated interaction networks, several topological parameters were calculated, including degree (k), betweenness centrality (BC), eccentricity, closeness centrality (CC), eigenvector centrality (EC), and the clustering coefficient [59,60].
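Several of the topological parameters listed above can be computed directly from an adjacency list. The Python sketch below covers degree, betweenness (Brandes' algorithm, unnormalised), closeness, eccentricity and the clustering coefficient for an undirected, unweighted network; eigenvector centrality is omitted for brevity, and the example edges are hypothetical rather than STRING output.

```python
from collections import deque
from itertools import combinations

# Minimal sketch of network topology metrics on an undirected, unweighted
# graph stored as {node: set(neighbours)}. Example edges are hypothetical.


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def degree(graph):
    return {n: len(nbs) for n, nbs in graph.items()}


def eccentricity(graph):
    return {n: max(bfs_distances(graph, n).values()) for n in graph}


def closeness(graph):
    result = {}
    for n in graph:
        dist = bfs_distances(graph, n)
        total = sum(dist.values())
        result[n] = (len(dist) - 1) / total if total else 0.0
    return result


def clustering_coefficient(graph):
    result = {}
    for n, nbs in graph.items():
        k = len(nbs)
        if k < 2:
            result[n] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbs, 2) if b in graph[a])
        result[n] = 2 * links / (k * (k - 1))
    return result


def betweenness(graph):
    """Brandes' algorithm for unweighted graphs (unnormalised)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each undirected pair counted twice


if __name__ == "__main__":
    hypothetical = {
        "S-RNase": {"SFB", "SLF"},
        "SFB": {"S-RNase", "SKP1"},
        "SLF": {"S-RNase", "SKP1"},
        "SKP1": {"SFB", "SLF", "CUL1"},
        "CUL1": {"SKP1", "RBX1"},
        "RBX1": {"CUL1"},
    }
    for metric in (degree, betweenness, closeness, eccentricity, clustering_coefficient):
        print(metric.__name__, metric(hypothetical))
```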
3. Results
3.1. Prunus S-Loci
The S-loci of P. dulcis, P. avium, and P. persica are located at comparable positions on chromosome 6, at approximately 27–28 Mb in P. dulcis and P. persica and 21–22 Mb in P. avium. Comparative sequence analysis showed that regions of the P. dulcis S-locus share 79–97% sequence identity with P. persica and 65–92% identity with P. avium (Figure S1). While the boundaries of the S-loci exhibit high sequence similarity among the three species, most internal regions display substantial sequence divergence. Notably, a 19–28 kb region of the S-locus in P. dulcis shows the highest sequence identity (89–97%) with the corresponding region in P. persica. In addition, several regions of the P. dulcis S-locus produced significant hits to chromosomes 1, 3, 5, 7, and 8 across all three examined Prunus genomes (Table S5). Across the three genomes, we identified six S-RNase-like genes, including the S-RNase, in both P. dulcis and P. persica, and nine in P. avium (Table S6). These genes are located on chromosomes 1, 5, 6, and 8 in P. dulcis and P. persica, and on chromosomes 0, 1, 3, 6, and 8 in P. avium (Figure 1).
The number of introns differed between S-RNases and S-RNase-like genes in all three genomes. S-RNases consistently contained two introns, whereas S-RNase-like genes contained either one or three introns (Table S6). Despite this difference, the intron positions in S-RNase-like genes were conserved relative to those in S-RNases.
A total of 24, 31, and 33 SLF-like (SLFL) genes (including SLF) and 28, 32, and 25 SFB-like genes (including SFB) were identified in P. persica, P. dulcis, and P. avium, respectively (Tables S7 and S8). SLFL genes were distributed across all chromosomes except chromosomes 5 and 7 in all three genomes. In contrast, SFB-like genes were located on all chromosomes except 7 and 8 in P. dulcis, except 5, 7, and 8 in P. persica, and except 7 in P. avium (Figure 1).
Both SLF and SFB genes exhibited variation in intron number, ranging from zero to three introns (Tables S7 and S8). Comparison of SFB and SFB-like proteins revealed the presence of all characteristic regions, including the F-box motif, V1–V5, HVa, and HVb domains. Considerable sequence variation was detected in the F-box motif as well as in the V1, HVa, and HVb regions (Figure S3a).
The molecular weight of S-RNases ranged from 24.14 kDa in P. avium to 26.87 kDa in P. dulcis and P. persica, whereas S-RNase-like proteins ranged from 13.39 kDa in P. persica to 27.76 kDa in P. dulcis. The isoelectric point (pI) of S-RNases varied from 5.39 in P. avium to 8.99 in P. dulcis and P. persica, while the pI of S-RNase-like proteins ranged from 4.38 in P. persica to 9.10 in P. avium. Across all three Prunus genomes, S-RNase-like genes located on chromosome 1 exhibited the lowest pI values (<5), except for a single S-RNase-like gene identified in P. avium (Table S6). Among the four conserved T2-RNase amino acid patterns (P2–P4) identified in S-RNases and S-RNase-like proteins, greater sequence variation was observed in the P2 and P4 regions. In addition, differences in the positions of cysteine bridges between S-RNases and S-RNase-like proteins were detected (Figure S2a).
3.2. Phylogenetic Analysis
NeighborNet analyses of both the S-RNase/S-RNase-like and SFB/SFB-like/SLF/SLF-like datasets revealed largely unconnected and weakly reticulated networks, with only a few cross-links (Figures S4 and S5). Consistent with this pattern, the PHI test for recombination implemented in SplitsTree provided no evidence for recombination in either the S-RNase/S-RNase-like or the SFB/SFB-like/SLF/SLF-like gene sets (p = 1; Table S9). The limited number of cross-links observed likely reflects alternative patterns of relatedness among some groups rather than recombination events.
Bayesian phylogenetic analyses conducted in MrBayes showed potential scale reduction factor (PSRF) values ranging from 1.0 to 1.1, and all other convergence diagnostics were within acceptable limits, indicating that the resulting trees were reliable (Table S10). Phylogenetic trees inferred using Bayesian and maximum likelihood (ML) methods for the S-RNase and SFB datasets were largely congruent in topology. The only notable difference was that Maloideae S-like S-RNases clustered with Prunoideae S-RNases in the ML analysis; however, this did not affect inference of relationships between Prunoideae and Maloideae S-RNases. In both analyses, the majority of backbone nodes and major clades were strongly supported (≥85% bootstrap support and ≥0.95 posterior probability), and S-RNases from the two genera were clearly separated into Prunoideae and Maloideae clades. Within each genus, alleles from different species clustered together. Non-S-RNases and S-RNase-like genes formed distinct groups separate from the main S-RNase clades (Figure 2; Figure S6).
Three distinct types of S-RNase-like genes were identified based on phylogenetic placement: type 1 grouped with Prunus S-RNases, type 2 clustered with Maloideae S-RNase-like genes, and type 3 grouped with the Helianthus annuus S-RNase-like sequence.
A Bayesian phylogenetic tree constructed using 20 selected S-RNase alleles from Prunoideae suggested that P. dulcis may have originated through interspecific hybridisation involving P. webbii, P. bucharica, and P. fenzliana (Figure S7). In addition, P. davidiana, P. pseudocerasus, and P. domestica were inferred as close relatives of P. persica, P. avium, and P. mume, respectively. Bayesian analysis of the SFB dataset showed clear separation of SFB genes between Prunoideae and Maloideae (Figure S8). Within Prunoideae, SFB genes formed a distinct clade, separate from SLF, SLF-like, and SFB-like genes. Although SLF and SLF-like genes were closely related, they were phylogenetically distinct (Figures S8 and S9). As observed for S-RNases, alleles from different species clustered together within genus-level clades. Interestingly, Maloideae SFB genes appeared more closely related to Prunoideae SLFs than to Prunoideae SFBs.
Detailed examination of branching patterns across these phylogenies suggests that at least two major split events occurred prior to the establishment of S-locus genes in Rosaceae from their most recent common ancestor (Figure S10).
3.3. Biogeographic Analysis
The ancestral range reconstructions inferred using BioGeoBEARS and RASP for S-RNases were largely congruent. Here, we present the results obtained from BioGeoBEARS. Within the BioGeoBEARS framework, all three standard models (DEC, DIVA-like, and BayArea-like) showed significant improvements in likelihood when both founder-event speciation (+j) and the dispersal distance parameter (+x) were incorporated. Among the twelve models evaluated, three provided acceptable fits to the data: DEC + j + x, DIVA-like + j + x, and BayArea-like + j + x (Table S11).
These three models yielded broadly congruent ancestral range reconstructions, differing only slightly in the estimated probabilities for backbone nodes within the Asian clade. For interpretation, we selected the DIVA-like + j + x model because it provided more conservative estimates for the ancestral ranges of key backbone nodes, particularly those involving Asia/Mediterranean versus Asia or Mediterranean origins (Figure 3). The DIVA-like + j + x model showed a significantly better fit than the null DIVA-like model based on a likelihood ratio test (lnL = −32.15 vs. −41.75; LR = 2ΔlnL = 19.2; df = 2; p < 0.001). The estimated parameters under this model included an anagenetic dispersal rate (d) of 1.523, an extinction rate (e) of 1.1152, a cladogenetic dispersal (founder-event) rate (j) of 1.235, and a negative dispersal distance parameter (x = −1.4325). The negative value of x indicates that dispersal probability declines with geographic distance.
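The likelihood ratio test above can be reproduced from the two reported log-likelihoods. This minimal sketch assumes the standard chi-square approximation with df = 2, corresponding to the two added parameters j and x. For two degrees of freedom, the chi-square survival function has the closed form exp(−x/2), so no statistics library is needed. The helper name `lrt_df2` is hypothetical.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by two free parameters.

    Returns (LR, p), where LR = 2 * (lnL_alt - lnL_null) and p uses the
    chi-square distribution with 2 df, whose survival function is exp(-x / 2).
    """
    lr = 2.0 * (lnl_alt - lnl_null)
    p = math.exp(-lr / 2.0) if lr > 0 else 1.0
    return lr, p


# DIVA-like (null) versus DIVA-like + j + x, using the log-likelihoods reported above.
lr, p = lrt_df2(-41.75, -32.15)
print(f"LR = {lr:.2f}, p = {p:.1e}")
```

With these values the statistic is 2 × 9.60 = 19.2, giving p ≈ 6.8 × 10⁻⁵. This is consistent with the reported p < 0.001.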
Ancestral area estimation based on the DIVA-like + j + x model and biogeographic stochastic mapping (BSM) implemented in BioGeoBEARS indicated that the ancestral ranges of P. argentea and P. scoparia remain largely unresolved (approximately 35% uncertainty) relative to those of other Prunus species (Figure 3). BSM analyses further suggested that cladogenesis in P. dulcis involved a combination of vicariance (16%), narrow sympatry (18.2%), and founder-event speciation (8.2%) (Table S12). At the genus level, Prunus was inferred to have most likely originated in the Asia/Mediterranean region.
Event matrix analyses for the P. dulcis clade identified three major dispersal events: from central regions of China to other parts of China; from the Turkey–Turkmenistan–Iraq region to the Caucasus areas of Turkey, Armenia, Azerbaijan, and Iran; and from Asia or the Mediterranean region to the United States. BioGeoBEARS also inferred four principal speciation areas associated with Prunus, two of which were located in central China (Central Plain) and one in the Caucasus/Turkey/Iran region.
3.4. Divergence Time Estimates
Divergence time analyses indicated that the initial split of S-RNases from their most recent common ancestor occurred approximately 120 million years ago (Myr; 95% HPD: 85–165 Myr). The Rosaceae clade was estimated to have diverged around 85 Myr (95% HPD: 80–90 Myr), with Prunoideae separating from other Rosaceae lineages at approximately 62 Myr (95% HPD: 56–68 Myr). Within Prunoideae, the origins of S-RNase and SFB were inferred at approximately 58 Myr (S-RNase: 48–68 Myr; SFB: 46–70 Myr, 95% HPD; Figures 4 and 5).
S-RNase-like genes were estimated to have originated around 55 Myr (95% HPD: 54–56 Myr), while SLF, SLF-like, and SFB-like genes emerged at approximately 56 Myr (95% HPD: 53–59 Myr; Table 1).
Divergence time estimation based on selected S-RNase sequences from 20 Prunus species (Figure 4) indicated that the genus Prunus originated approximately 62 million years ago (Myr; 95% HPD: 55–67 Myr). Within Prunus, divergence times were estimated at approximately 48 Myr for P. dulcis (95% HPD: 41–57 Myr), 35 Myr for P. persica (95% HPD: 23–49 Myr), and 51 Myr for P. avium (95% HPD: 42–60 Myr). These three economically important species diverged within a relatively narrow temporal window (approximately 35–51 Myr). Within that window, the narrower 95% HPD interval for P. dulcis (16 Myr, compared with 26 Myr for P. persica and 18 Myr for P. avium) suggests that its divergence time is more tightly constrained than those of P. persica and P. avium.
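The comparison of interval widths above relies on highest posterior density (HPD) intervals. The following minimal sketch assumes a unimodal posterior represented by MCMC samples, for example node ages from a BEAST run. Under that assumption, the 95% HPD is the narrowest interval containing 95% of the sorted samples. The function names `hpd_interval` and `hpd_width` are hypothetical, and this is not the exact code used by any particular summary tool.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing a fraction `mass` of the samples.

    Assumes a unimodal posterior; multimodal posteriors need a different method.
    """
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    if n == 0 or not 0 < k <= n:
        raise ValueError("need non-empty samples and 0 < mass <= 1")
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def hpd_width(samples, mass=0.95):
    """Width of the HPD interval, the quantity compared across species above."""
    low, high = hpd_interval(samples, mass)
    return high - low
```

Applied to the reported intervals, the widths are 16 Myr for P. dulcis (41–57), 26 Myr for P. persica (23–49), and 18 Myr for P. avium (42–60).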
3.5. 4DTv Analysis to Determine Prunus-Specific S-Associated Gene Duplication
The fourfold degenerate transversion (4DTv) values of S-associated genes varied across the three Prunus genomes examined. In P. dulcis, 4DTv values ranged from 0.06 ± 0.02 to 0.41 ± 0.03, whereas in P. persica they ranged from 0.02 ± 0.01 to 0.20 ± 0.02. In P. avium, 4DTv values ranged from 0.08 ± 0.02 to 0.37 ± 0.02 (Figure S11a–c). Pairwise comparisons of S-associated genes between species yielded mean 4DTv values of 0.39 ± 0.02 for P. dulcis–P. avium, 0.46 ± 0.02 for P. dulcis–P. persica, and 0.44 ± 0.01 for P. persica–P. avium.
At broader taxonomic scales, mean 4DTv values for S-RNase and S-RNase-like genes were 0.43 ± 0.02 in Rosaceae, 0.39 ± 0.01 in Prunoideae, and 0.29 ± 0.02 in Maloideae (Figure S11d–f). For the complete SFB dataset, corresponding mean 4DTv values were 0.37 ± 0.05 for Rosaceae, 0.42 ± 0.04 for Prunoideae, and 0.28 ± 0.01 for Maloideae. Within Prunoideae, 4DTv values for SFB-like, SLF, and SLF-like genes ranged from 0.06 ± 0.02 to 0.18 ± 0.02. Comparisons between Prunoideae and Maloideae yielded mean 4DTv values of 0.38 ± 0.01 for S-RNase and 2.10 ± 0.07 for SFB. In addition, comparisons between gene families within Prunoideae showed mean 4DTv values of 0.18 ± 0.11 for S-RNase versus S-RNase-like, 0.11 ± 0.03 for SFB versus SFB-like, and 0.08 ± 0.01 for SLF versus SLF-like genes.
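The 4DTv values above measure the proportion of transversions at fourfold degenerate third codon positions. The following sketch assumes a pair of aligned, in-frame coding sequences and the standard genetic code, and it returns only the raw proportion. Published pipelines commonly apply a substitution-model correction (often HKY), so its output is not directly comparable with the values reported here. The function names `raw_4dtv` and `is_transversion` are hypothetical.

```python
# First two codon positions (standard genetic code) whose third position is fourfold degenerate:
# Ala (GC-), Gly (GG-), Pro (CC-), Thr (AC-), Val (GT-), Ser (TC-), Leu (CT-), Arg (CG-).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "TC", "CT", "CG"}
PURINES, PYRIMIDINES = set("AG"), set("CT")


def is_transversion(a, b):
    """True when one base is a purine and the other a pyrimidine."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def raw_4dtv(seq1, seq2):
    """Uncorrected proportion of transversions at shared fourfold degenerate sites.

    seq1, seq2: aligned, in-frame coding sequences of equal length (gaps as '-').
    Returns NaN when the pair shares no fourfold degenerate site.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    s1, s2 = seq1.upper(), seq2.upper()
    sites = transversions = 0
    for i in range(0, len(s1), 3):
        c1, c2 = s1[i:i + 3], s2[i:i + 3]
        # Score the third position only if both codons share a fourfold degenerate prefix.
        if c1[:2] == c2[:2] and c1[:2] in FOURFOLD_PREFIXES:
            a, b = c1[2], c2[2]
            if a in "ACGT" and b in "ACGT":
                sites += 1
                transversions += is_transversion(a, b)
    return transversions / sites if sites else float("nan")
```

Because substitutions at these sites are largely silent, lower values between paralogous copies are commonly interpreted as more recent duplication.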
3.6. Residues Under Selective Pressure in S-Locus-Associated Genes in Prunoideae
Analyses based on the ratio of nonsynonymous to synonymous substitutions (Ka/Ks) combined with phylogenetic methods identified amino acid residues under positive selection in S-locus-associated genes. In S-RNases, the proportion of residues inferred to be under positive selection ranged from 21% in Maloideae to 25% in Prunoideae (Figure S2b). In contrast, SFB proteins showed lower proportions of positively selected residues, with approximately 12% in Prunoideae and 4% in Maloideae (Figure S3b). For S-RNase-like, SFB-like, SLF, and SLF-like proteins, only 3–5% of sites were inferred to be under positive selection. Site-specific evolutionary rate analyses further indicated elevated evolutionary rates within variable and hypervariable regions of both S-RNase and SFB proteins.
For S-RNases, overall Ka/Ks ratios ranged from 0.53 to 0.58 in Prunoideae (Ka = 0.12–0.15; Ks = 0.26–0.28) and from 0.69 to 0.88 in Maloideae (Ka = 0.19–0.21; Ks = 0.24–0.28). Within Prunoideae, Ka/Ks ratios for the variable regions (V1, V2, and RHV) of S-RNases ranged from 0.87 to 1.5, and the values exceeding 1 indicate signatures of positive selection (Figure S2b). In contrast, Ka/Ks ratios for S-RNase-like genes ranged from 0.38 to 0.42 (Ka = 0.04–0.07; Ks = 0.11–0.17). Among the three types of S-RNase-like genes, Ka/Ks ratios were 0.38 for type 1, 0.40 for type 3, and 0.42 for type 2. Notably, the RHV region of Prunoideae S-RNases showed evidence of positive selection (Ka/Ks = 1.5), whereas the corresponding region in Maloideae exhibited a lower Ka/Ks ratio (0.65). In comparison, the RHV region of Prunoideae S-RNase-like genes had a Ka/Ks ratio of 0.39.
For SFB genes, Ka/Ks ratios ranged from 0.48 to 0.57 in both Prunoideae and Maloideae (Ka = 0.10–0.16; Ks = 0.21–0.28; Figure S3b). Lower Ka/Ks ratios were observed for SFB-like and SLF/SLF-like genes, ranging from 0.21 to 0.34 (Ka = 0.01–0.05; Ks = 0.03–0.18) in both Prunoideae and Maloideae (Figures S3b and S12).
3.7. Evolutionary Selective Pressure
Likelihood ratio tests (LRTs) compared codon substitution models that allow positive selection (M2a and M8; ω > 1) with corresponding null models that do not allow ω > 1 (M1a, M7, and M8a). These tests provided strong evidence for positive selection acting on both S-RNase and SFB genes. In Rosaceae, the likelihood ratio statistics for S-RNases and SFB were 39.59 and 29.10, respectively (p ≤ 0.001; Table S13). Similarly, in Prunoideae, likelihood ratio values of 45.56 for S-RNases and 38.17 for SFB indicated highly significant signatures of positive selection (p ≤ 0.001). For S-RNase-like, SLF, SLF-like, and SFB-like genes, likelihood ratio values were 22.14, 18.15, 7.90, and 7.50, respectively (p ≤ 0.01), indicating weaker but still significant evidence of positive selection.
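For the codeml site-model comparisons above, the p-value depends on which pair of models is compared. M1a versus M2a and M7 versus M8 are conventionally tested against a chi-square distribution with 2 degrees of freedom, and M8a versus M8 against 1 degree of freedom. Some studies instead use a 50:50 mixture of 0 and χ²₁ for M8a versus M8, which halves that p-value; that variant is not shown. The following sketch uses the closed-form survival functions for df = 1 and df = 2. The names `chi2_sf`, `SITE_MODEL_DF`, and `site_model_p_value` are hypothetical.

```python
import math

# Degrees of freedom conventionally used for codeml site-model LRTs, keyed by (null, alternative).
SITE_MODEL_DF = {("M1a", "M2a"): 2, ("M7", "M8"): 2, ("M8a", "M8"): 1}


def chi2_sf(x, df):
    """Chi-square survival function P(X > x), closed forms for df = 1 and df = 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        # X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def site_model_p_value(lr_statistic, null_model, alt_model):
    """p-value for a reported LR statistic (2 * delta lnL) between two site models."""
    df = SITE_MODEL_DF[(null_model, alt_model)]
    return chi2_sf(lr_statistic, df)


# Rosaceae S-RNase statistic reported above, evaluated for illustration as M7 versus M8.
print(site_model_p_value(39.59, "M7", "M8"))
```

Because the same statistic yields a smaller p-value under df = 1 than under df = 2, reporting which model pair produced each statistic makes the significance levels reproducible.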
In Maloideae, LRTs also supported positive selection on S-RNases and SFB, with likelihood ratio values of 12.59 and 39.10, respectively (p ≤ 0.001). Across both Prunoideae and Maloideae, the largest numbers of positively selected sites were detected in S-RNases (58 and 45 sites, respectively), followed by SFB (18 and 28 sites, respectively) (Figures S13 and S14). In Prunoideae, fewer positively selected sites were identified in S-RNase-like (nine sites) and SFB-like (one site) genes (Figures S12 and S13). Fewer sites were also identified in SLF and SLF-like genes (12 sites and one site, respectively; Table S14; Figure S15).
Validation using the Site Likelihood Ratio (SLR) test showed that the number of positively selected residues supported at ≥90% confidence was reduced by only two to three sites per gene. This indicates that most of the inferred positively selected sites were robust to alternative statistical testing approaches (Table S14).
3.8. Functional Divergence in S-Locus-Associated Genes
Estimates of evolutionary rates indicated that S-RNases evolve substantially faster than pollen-associated S genes (Tables S15–S17). The rate of evolution of S-RNases was estimated at 2.3 substitutions per site per million years (95% HPD: 0.9–4.4), whereas SLF/SFB genes showed a lower rate of approximately 1.0 substitutions per site per million years (95% HPD: 0.6–1.4). In contrast, S-RNase-like, SFB-like, and SLF-like genes exhibited markedly slower evolutionary rates of 0.08, 0.08, and 0.03 substitutions per site per million years, respectively.
Functional divergence analyses revealed substantial differentiation between Prunoideae and Maloideae for core S-locus genes. The average Type I functional divergence (θI) between Prunoideae and Maloideae was high for S-RNase (0.71 ± 0.08) and SFB (0.45 ± 0.06) (Table S15). Much lower values were observed for comparisons between S-RNase and S-RNase-like (0.04 ± 0.07), SFB and SFB-like (0.03 ± 0.06), and SLF and SLF-like (0.01) (Tables S16 and S17). Similarly, Type II divergence (θII) between Prunoideae and Maloideae was substantial for S-RNase (0.58 ± 0.18) and SFB (0.49 ± 0.06), but minimal for S-RNase versus S-RNase-like (0.03 ± 0.07), SFB versus SFB-like (0.02 ± 0.06), and SLF versus SLF-like (0.01).
Within Prunoideae, Type I divergence values were 0.39 ± 0.08 for S-RNase and 0.24 ± 0.06 for SFB, while lower values were observed for S-RNase versus S-RNase-like (0.06 ± 0.01), SFB versus SFB-like (0.04 ± 0.06), and SLF versus SLF-like (0.01). Corresponding Type II divergence values within Prunoideae were 0.10 ± 0.08 for S-RNase and 0.29 ± 0.06 for SFB, with minimal divergence between S-RNase and S-RNase-like (0.03 ± 0.01), SFB and SFB-like (0.01 ± 0.06), and SLF and SLF-like (0.01). Within Maloideae, Type I divergence for S-RNase and SFB was 0.48 ± 0.08 and 0.30 ± 0.06, respectively, whereas Type II divergence values were 0.11 ± 0.08 for S-RNase and 0.30 ± 0.06 for SFB. In contrast, divergence between S-RNase and S-RNase-like genes in Maloideae remained low for both Type I and Type II (0.02 ± 0.07; Tables S15–S17).
3.9. Site-Specific Evolutionary Rates in S-Locus-Associated Proteins in Prunoideae and Maloideae
The average site-specific evolutionary rate of S-RNases across Rosaceae, based on a combined dataset of Prunoideae and Maloideae, was slightly higher (0.99) than that estimated from the Prunoideae (0.97) and Maloideae (0.87) datasets analysed separately. For SFB, average site-specific evolutionary rates were lower, with values of 0.69, 0.65, and 0.65 for Rosaceae, Prunoideae, and Maloideae, respectively. As expected, S-RNase-like, SLF, SLF-like, SFB, and SFB-like genes in Prunoideae exhibited substantially lower average site-specific evolutionary rates than canonical S-RNase and SFB genes, with mean values of 0.40, 0.30, 0.30, 0.20, and 0.20, respectively.
Within Prunoideae S-RNases, the highest site-specific evolutionary rates were detected in the RHV region, and the variable regions also showed elevated rates relative to the conserved regions (Figure S16a). In SFB genes, the hypervariable regions (HVa and HVb) displayed higher site-specific evolutionary rates than other regions, and the variable regions (V1–V4) and the F-box motif also showed moderately elevated rates (Figure S16c). In contrast, SLF genes contained only a few sites with evolutionary rates exceeding 2 (Figure S16e). In S-RNase-like, SFB-like, and SLF-like genes, certain regions showed locally elevated site-specific evolutionary rates, although overall rates remained lower than those observed in canonical S-locus genes (Figure S16b,d,f).
In Maloideae, S-RNases similarly exhibited elevated site-specific evolutionary rates in the RHV region, while SFB genes showed higher evolutionary rates in the hypervariable regions (Figure S17a,b), consistent with patterns observed in Prunoideae.
3.10. Co-Evolving Amino Acid Residues in S-Locus-Associated Proteins of Prunoideae and Maloideae
Co-evolving amino acid residues were identified using CAPS analysis with a pairwise correlation threshold greater than 0.5. In the Prunus S-RNase dataset that included both S-RNase and S-RNase-like sequences, 105 groups of co-evolving sites were detected, involving a total of 40 amino acid residues clustered into groups of two or more residues (Table S18). In contrast, analysis of the S-RNase-only dataset identified six groups of co-evolving sites comprising 19 amino acid residues. Of these residues, 15 were located in the N-terminal region, and additional residues were distributed across the C1, V3, and V2 regions (one, nine, and three residues, respectively; Table S18). The minimum physical distances between co-evolving residues in Prunus S-RNases ranged from 2.5 Å to 11 Å, with a mean distance of 4.2 Å.
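CAPS scores co-evolution by correlating substitution patterns between pairs of alignment columns across all pairs of sequences. The following is a deliberately simplified illustration of that idea, not the CAPS algorithm itself. It replaces the BLOSUM-based, divergence-corrected scores used by CAPS with a 0/1 mismatch score, and it flags column pairs whose Pearson correlation exceeds the 0.5 threshold used above. The function names and the toy alignment are hypothetical.

```python
import math
import statistics
from itertools import combinations


def change_vector(alignment, col):
    """1/0 per sequence pair: whether the two sequences differ at column `col`."""
    return [int(s[col] != t[col]) for s, t in combinations(alignment, 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coevolving_pairs(alignment, threshold=0.5):
    """Column pairs (i, j, r) whose change vectors correlate above `threshold`."""
    vectors = [change_vector(alignment, c) for c in range(len(alignment[0]))]
    hits = []
    for i, j in combinations(range(len(vectors)), 2):
        # Invariant columns have zero variance, so their correlation is undefined.
        if len(set(vectors[i])) < 2 or len(set(vectors[j])) < 2:
            continue
        r = pearson(vectors[i], vectors[j])
        if r > threshold:
            hits.append((i, j, r))
    return hits


# Toy protein alignment: columns 0 and 2 change together across sequences.
toy = ["ACDE", "ACDE", "KCRE", "KCRF"]
print(coevolving_pairs(toy))
```

The physical distances quoted above come from mapping such column pairs onto a protein structure, which this sketch does not attempt.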
For S-RNase-like proteins, two groups of co-evolving sites were detected, involving a total of two amino acid residues. In Maloideae S-RNases, two groups of co-evolving sites comprising nine amino acid residues were identified. Within Prunoideae S-RNases, co-evolving residues were predominantly located in the C1, C2, and RHV regions. Residue pairs exhibiting higher co-evolutionary potential included Cys-Cys, Arg-Asp, Asp-Asn, Lys-Asp, His-Thr, His-Tyr, His-Glu, His-His, Asp-His, and His-Ser. In Maloideae, co-evolving residues were mainly detected in the RHV region, with Cys-Cys, His-Thr, Asp-Asn, Asp-Lys, and His-Ser showing higher co-evolutionary potential.
For the Prunoideae SFB dataset, which included SFB, SFB-like, SLF, and SLF-like sequences, 15 groups of co-evolving sites were detected, involving a total of 11 amino acid residues (Table S19). When analysed separately, SFB proteins exhibited three groups of co-evolving sites comprising nine residues, with Lys-Ile, Leu-Leu, Phe-Leu, Glu-Arg, and Ser-Arg showing higher co-evolutionary potential. In SFB-like proteins, one group of co-evolving sites involving three amino acid residues was detected, and SLF proteins also exhibited a single co-evolving group comprising three residues (Table S20). In contrast, SLF-like proteins showed five groups of co-evolving sites involving a total of 15 amino acid residues. In Maloideae SFB proteins, 19 groups of co-evolving sites were identified, comprising a total of 49 amino acid residues (Table S19).
3.11. Protein-Protein Interactions in the Prunus Self-Incompatibility System
The giant component of the protein-protein interaction (PPI) network generated using STRING for P. persica comprised 188 nodes (Table S21) connected by 1124 edges (Figure 6). Of these interactions, 324 involved S-RNases, 102 involved SFBs, 318 involved Cullin-1, 53 involved SKP-1, 103 involved ubiquitin, and 224 involved Rbx-1. In all cases, the observed number of edges was significantly higher than expected by chance (p ≤ 10⁻⁹). This indicates that the inferred interactions among these proteins are unlikely to be random and likely represent biologically meaningful associations.
Analysis of the backbone network centred on S-RNase and SFB revealed extensive connectivity with neighbouring proteins, reflected by a high average node degree (33.2 ± 0.78) and substantial betweenness centrality (0.10 ± 0.02). These values indicate that the two proteins occupy central positions within the network. Both the giant component analysis using five seed proteins and the backbone network analysis focusing on S-RNase and SFB (Figure S18) consistently indicated interactions between S-RNase and Rbx-1, ubiquitin, and several additional proteins (Tables S22 and S23). Notably, S-RNase was not inferred to interact directly with SFB or other F-box-like proteins. Instead, these network-based results suggest that interactions between S-RNase and the SCF^SLF/SFB complex are mediated indirectly through Rbx-1. Furthermore, Rbx-1 was not predicted to interact with S-RNase-like proteins.
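The average node degree and betweenness centrality reported above are standard graph statistics. The following sketch assumes that the STRING interactions have been exported as an undirected edge list of protein identifiers; that export step and all names below are hypothetical. Under that assumption, degree and normalised betweenness can be computed without external libraries using Brandes' algorithm for unweighted graphs. Graph libraries such as NetworkX provide equivalent functions (`degree`, `betweenness_centrality`).

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs; self-loops ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def mean_degree(adj):
    """Average number of interaction partners per protein."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def betweenness(adj):
    """Normalised betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    # Each unordered pair was counted from both ends; dividing by (n-1)(n-2)
    # halves that count and normalises by the (n-1)(n-2)/2 possible pairs.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


# Toy chain loosely mirroring the indirect S-RNase route (not the study's network).
toy = build_graph([("S-RNase", "Rbx-1"), ("Rbx-1", "Cullin-1"),
                   ("Cullin-1", "SKP-1"), ("SKP-1", "SFB")])
print(mean_degree(toy), betweenness(toy))
```

In this toy chain, proteins that bridge S-RNase and SFB receive non-zero betweenness while the two ends receive zero. This is the sense in which an indirectly connected intermediary such as Rbx-1 can appear central.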
In the predicted network, the F-box protein binds SKP-1 and other F-box proteins. SKP-1 is predicted to interact with Cullin-1, F-box proteins, Rbx-1, and heat shock proteins. Cullin-1 is predicted to bind Rbx-1, Cullin-associated NEDD8-dissociated protein, Jun kinase activation protein, proteasomal subunits, DNA damage-binding protein, non-ATPase regulatory subunit 6 of the 26S proteasome, and ubiquitin-conjugating enzyme E2. These predictions suggest that, during the SI reaction, each component of the SCF^SLF/SFB complex interacts with one or more of these partner proteins.
3.12. Differences Between Self-Compatible and Self-Incompatible PPI Networks
The primary differences between self-compatible and self-incompatible protein–protein interaction (PPI) networks were observed within the S-RNase, SFB, and SKP-1 subnetworks. Overall, the self-incompatible network involved a greater number of interacting proteins and exhibited higher network complexity than the self-compatible network. In the S-RNase subnetwork of the self-incompatible system, interactions with rRNA methyltransferases, components of the exosome complex, and cyclin-dependent kinases were detected. In the SFB subnetwork, SFB was inferred to interact with the protein SGT-1 and several SKP-1-like proteins. In addition, the SKP-1 subnetwork of the self-incompatible system included interactions with components of the COP9 signalosome complex, auxin response factor 2, and an endoplasmin homolog.
Pathway enrichment analyses revealed that several biological pathways were shared between the self-incompatible and self-compatible systems. These included auxin-activated and ethylene-activated signaling pathways, phenylpropanoid biosynthesis, phenylalanine metabolism, plant–pathogen interaction, ubiquitin-mediated protein catabolic processes, stamen development, pollen coat protein signaling, apoptosis, calcium signaling, and multiple defence-related pathways (Table S24).
4. Discussion
4.1. Genomic Architecture and Evolutionary Origins of the S-Locus
Analysis of the genomic organisation of S-RNase, S-RNase-like, SFB, SFB-like, SLF, and SLF-like genes across three Prunus genomes revealed broadly similar patterns of tandem arrangement, inter- and intra-chromosomal duplication, and high sequence similarity at S-locus boundaries. These features indicate repeated duplication events involving SFB, SLF, and S-RNase alleles during Prunus evolution, consistent with previous observations in P. avium [64]. Notably, some P. dulcis cultivars harbour two independent self-incompatibility loci on chromosomes 6 and 8 [65], and our comparative mapping identified partial synteny primarily across chromosomes 6, 3, and 8, supporting the possibility of ancestral S-locus duplication in Rosaceae. Tandem duplication of ancestral Prunus S-RNase, SFB, and SLF genes may have promoted functional redundancy and the accumulation of non-functional copies, facilitating rapid functional divergence, as reported in Arabidopsis [66] and tetraploid Prunus [67]. The expansion and pronounced structural variation of SFB-like and SLF-like genes may therefore underlie functional diversification of pollen-expressed SI components in Prunus. This diversification could enable lineage-specific mechanisms not observed in other gametophytic self-incompatibility systems, such as protection of self S-RNase from inactivation by general inhibitor proteins in P. avium [12].
In addition, at least two subsequent duplication events likely generated the present-day S-locus architecture in Prunoideae. These events produced functional S-RNase, SFB, and SLF genes alongside non-functional duplicates that facilitated rapid diversification, as reported in P. avium [61,68]. Across Rosaceae, Prunoideae, and Maloideae, 4DTv patterns (0.43, 0.39, and 0.29, respectively) suggest that Rosaceae divergence preceded the establishment of Prunoideae and Maloideae, which share ancestral S-RNase and SFB genes. The lower 4DTv values observed in P. persica (0.02–0.10) compared with P. dulcis (0.08–0.41) and P. avium (0.09–0.37) suggest reduced genetic diversity in the P. persica genome.
Divergence time estimates based on BEAST analyses indicate that S-RNase originated ~122.5 Myr, consistent with the RNase-based gametophytic self-incompatibility (GSI) system predating the rosid–asterid split (~120 Myr) [62]. In contrast, the divergence of Rosaceae (~85 Myr; 95% HPD: 80–90) and Prunoideae (~62 Myr; 95% HPD: 56–68) occurred later, in agreement with recent estimates [49,63]. Discrepancies with earlier studies likely reflect differences in fossil calibration and taxon sampling [69]. Within Prunoideae, S-RNase and SFB diverged at ~58 Myr, whereas S-RNase-like, SLF, SLF-like, and SFB-like genes originated slightly later (~55–56 Myr). This timing suggests a rapid radiation of GSI genes within Rosaceae that may underlie the extensive S-locus diversity.
Higher 4DTv values for S-RNases relative to S-RNase-like genes further indicate greater genetic diversity in S-RNases and support a limited role for S-RNase-like genes in specificity determination. Together, the observed 4DTv patterns and divergence time estimates reinforce the inference that S-associated genes have experienced long and complex evolutionary histories, characterised by both ancient divergence and lineage-specific retention.
4.2. Phylogenetic Lineages and Biogeographic Diversification
Disparity Index tests indicated that all sequences included in the S-RNase and SFB datasets evolved under similar substitution patterns. This suggests that observed sequence differences primarily reflect genuine evolutionary divergence rather than heterogeneous substitution processes. Although intragenic recombination has previously been reported in Rosaceae S-RNases [68], PHI tests implemented in SplitsTree detected no evidence of recombination in either the S-RNase or SFB datasets (p = 1). The presence of only a few cross-links in the resulting networks suggests limited alternative patterns of relatedness. These cross-links most likely reflect shared ancestry among S-RNase and the three S-RNase-like groups rather than recombination-mediated sequence exchange.
Furthermore, maximum likelihood and Bayesian phylogenetic analyses of the S-RNase/S-RNase-like and SLF/SLF-like/SFB/SFB-like datasets consistently resolved well-supported and distinct clusters. This indicates that S-RNase and SFB genes from the two Rosaceae genera Prunus and Malus belong to separate evolutionary lineages derived from a common ancestral gene. Notably, none of the Prunoideae S-RNase or S-RNase-like genes associated with gametophytic self-incompatibility clustered with Maloideae S-RNases. This suggests that Prunoideae and Maloideae S-locus genes represent paralogous lineages that subsequently evolved distinct self-incompatibility mechanisms. Similar patterns have been reported previously [18].
Although Prunus and Malus SFB genes formed separate clades, Malus SFBs showed closer phylogenetic affinity to P. dulcis SLFL genes than to Prunus SFBs. Across Rosaceae, SLF and SLF-like genes consistently formed distinct clusters, supporting their classification as separate gene lineages, analogous to the separation between the SFB and SFB-like groups. Overall, the phylogenetic relationships inferred here are largely consistent with previous studies of Rosaceae S-RNases and SFBs [18,27,64]. They also provide additional resolution among cultivated and wild Prunus species and clarify relationships among S-RNase, S-RNase-like, and extracellular RNases, as well as among SFB, SFB-like, SLF, and SLF-like genes.
The clustering of alleles from different species within the same genus provides evidence for trans-specific polymorphism in both S-RNase and SFB genes in Prunoideae and Maloideae. In addition, phylogenetic incongruences observed within these groups may also have been shaped by relatively recent events (approximately 5 million years ago), such as incomplete lineage sorting, hybridisation, or introgression among taxa. Similar patterns have also been reported in other plant lineages, such as Cyrtandra [69].
In addition, integrated biogeographic, divergence-time, and evolutionary analyses indicate that the diversification of Prunus and its self-incompatibility (SI) system was shaped by both deep-time climatic events and lineage-specific evolutionary dynamics. BioGeoBEARS analyses support an Asian, particularly Central/East Chinese, origin of Prunus and P. dulcis, with founder events, distance-limited dispersal, and vicariance contributing to current distributions. These patterns are consistent with reduced chloroplast diversity in Mediterranean almonds and partially reconcile the Near-Eastern and Southwest Asian origin hypotheses. Divergence time estimates place early Prunus radiation at ~50–65 Myr, coinciding with the Paleocene–Eocene Thermal Maximum and the Early Eocene Climatic Optimum, periods associated with major floristic turnover. These results can be considered together with evidence of S-locus duplication, strong positive selection, and lineage-specific SI gene diversification. Taken together, they suggest that climatic change, geological uplift in Central Asia, and geographic isolation jointly promoted allopatric speciation and the evolution of distinct SI mechanisms in Prunus.
4.3. Selective Dynamics, Functional Divergence, and Structural Co-Evolution
Strong positive selection was detected for S-RNase and SFB in both Prunoideae and Maloideae, whereas S-RNase-like, SLF, SLF-like, and SFB-like genes showed substantially weaker selective pressure. These patterns are consistent with previous Ka/Ks analyses in Prunus S-RNase and SFB genes [6,14,70]. In contrast to earlier reports suggesting weak selection on Maloideae S-RNases [64], our results reveal significant selection, indicating lineage-specific differences in evolutionary dynamics.
S-RNase exhibited higher evolutionary rates than SFB in both lineages, with Prunoideae showing faster rates than Maloideae. This pattern supports divergence from paralogous ancestral genes and the evolution of distinct GSI mechanisms. Differential selection, evolutionary rates, and expression patterns further suggest that S-RNase and SFB may not have strictly co-evolved. Instead, they may have followed partially independent evolutionary trajectories to maintain self/non-self recognition [18].
Notably, many positively selected residues are located on the protein surface of S-RNase and SFB, making them strong candidates for mediating interactions with other proteins. Such sites likely influence physicochemical properties, including charge, hydrophobicity, and conformation, that underpin specific interactions between pistil-S (S-RNase) and pollen-S components (SFB and SLF/SFB-like proteins) during the self-incompatibility response. Similar roles of positive selection have been reported for sex-biased genes in other systems [71]. Together, these results suggest that positive selection on pistil- and pollen-expressed S-genes contributes to SI specificity and reproductive success and may facilitate hybridisation and introgression in Prunus.
However, S-RNases in Prunoideae exhibited fewer positively selected sites (35) than those in Maloideae (40). Within Prunus, P. avium showed a higher number of positively selected sites than P. persica and P. dulcis, whereas SFB proteins across species contained relatively few positively selected sites (<20). These differences highlight lineage- and species-specific evolutionary dynamics at the S-locus. In Prunus, long-term balancing selection driven by negative frequency dependence maintains high polymorphism at the S-locus [19,72].
Furthermore, significant Type I and Type II functional divergence (Qk > 0.7) was detected for both S-RNase and SFB between Prunoideae and Maloideae. This indicates site-specific shifts in selective constraints and pronounced changes in amino acid physicochemical properties following gene duplication and lineage divergence. These high divergence values also reflect substantial differences in evolutionary rates between the two lineages. Within Prunoideae, Qk values of 0.3–0.4 identified multiple residues with altered selective constraints contributing to interspecific variation. Specifically, 48 Type I sites were detected in Prunoideae, of which 21 (44%) showed significant functional divergence, whereas Maloideae exhibited 35 Type I sites, including 19 (54%) with significant divergence. Such lineage-specific functional shifts likely influence protein-protein interaction interfaces, hydrophobicity, and conformational dynamics [73], contributing to mechanistic divergence of gametophytic self-incompatibility systems in Prunoideae and Maloideae.
Co-evolution reflects coordinated evolutionary change between interacting biological entities, whereby mutations in one component impose selective pressure for compensatory changes in another to maintain functional compatibility. In protein systems, such compensatory evolution is often required to preserve complementary structural conformations and interaction interfaces. Consistent with this, we detected substantially more co-evolving sites between S-RNase and S-RNase-like proteins than within S-RNases, SFB/SFB-like, or SLF/SLF-like proteins, suggesting coordinated evolutionary adjustments among these components. Variation in the number of co-evolving sites among S-locus proteins may reflect lineage-specific changes in evolutionary rates or expression patterns driven by differential functional utilisation over time.
The extensive co-evolution detected between S-RNase and S-RNase-like proteins may represent evolutionary constraints required to preserve self-incompatibility (SI) function while maintaining allelic diversity in Prunus. This constraint may be particularly important in Prunus, where loss of SI is often complete and irreversible, as observed in P. persica [2,3]. Given that SI depends on coordinated activity among multiple pistil- and pollen-expressed genes, co-evolving residues likely contribute to maintaining compatibility across interacting proteins throughout the evolution of the gametophytic SI system.
In particular, high co-evolutionary potential was observed among acid–base residue pairs (e.g., Asp-Arg, Glu-Arg, Glu-Lys, Asp-Lys) in S-RNase and SFB, suggesting selection to maintain balanced ionic interactions. The frequent involvement of histidine-containing pairs further highlights the importance of preserving donor–acceptor flexibility in hydrogen-bonding networks, as residues such as His, Ser, Thr, and Tyr can act as both hydrogen-bond donors and acceptors [74]. In addition, strong co-evolution between Cys-Cys pairs underscores the functional importance of disulfide bonds in maintaining protein structure. Together, these patterns suggest that co-evolving residues play key roles in stabilising the ionic interactions, hydrogen bonds, and disulfide linkages that collectively preserve SI specificity during S-locus evolution.
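To make the residue classes discussed above concrete, the following Python sketch sorts co-evolving residue pairs (three-letter codes) into coarse categories. These are acid–base (ionic) pairs, histidine-containing pairs, Cys-Cys pairs (potential disulfides), and other pairs of residues that can act as both hydrogen-bond donors and acceptors. The category names, the precedence order of the checks, and the decision to keep His separate from Lys/Arg are illustrative assumptions, not the classification scheme used in this study. The example pairs are hypothetical.

```python
from collections import Counter

# Coarse residue classes (illustrative assumption, not the study's scheme).
ACIDIC = {"Asp", "Glu"}
BASIC = {"Lys", "Arg"}  # His is handled as its own class below
# Residues able to act as both hydrogen-bond donor and acceptor.
# His pairs are caught by an earlier branch in classify_pair.
DUAL_HBOND = {"His", "Ser", "Thr", "Tyr"}


def classify_pair(res1: str, res2: str) -> str:
    """Assign a co-evolving residue pair (three-letter codes) to one coarse class."""
    a, b = res1.capitalize(), res2.capitalize()  # normalise "ASP" or "asp" to "Asp"
    if a == b == "Cys":
        return "disulfide"  # potential Cys-Cys disulfide bond
    if (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC):
        return "ionic"  # acid-base pair, e.g. Asp-Arg or Glu-Lys
    if "His" in (a, b):
        return "histidine"  # any remaining pair containing His
    if a in DUAL_HBOND and b in DUAL_HBOND:
        return "hydrogen_bond"  # e.g. Ser-Tyr or Thr-Thr
    return "other"


def tally(pairs):
    """Count how many (res1, res2) pairs fall into each class."""
    return Counter(classify_pair(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical pairs for illustration only; not results from this study.
    example = [("Asp", "Arg"), ("Glu", "Lys"), ("His", "Ser"),
               ("Cys", "Cys"), ("Leu", "Val")]
    print(tally(example))
```

Checking the disulfide and ionic cases before the histidine case keeps each pair in exactly one class; a different precedence order would shift some pairs between categories.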
4.4. Molecular Mechanisms and Interaction Networks of GSI
This study provides the first systematic analysis of the protein–protein interaction (PPI) network underlying GSI in Prunus. We constructed the initial network from six SI-associated proteins, treating S-RNase and SFB as backbone nodes. The resulting network suggests that S-RNase interacts with Rbx-1 and ubiquitin, whereas SFB associates with the SCF complex through direct interaction with SKP-1, which in turn interacts with Cullin-1 and Rbx-1. These inferred interactions support a model in which S-RNase is regulated indirectly through the SCF^SLF/SFB ubiquitin–proteasome pathway rather than by direct binding to pollen F-box proteins, and they provide a conceptual framework for interpreting the patterns of co-evolution and functional divergence among S-locus components.
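The indirect route implied by these interactions can be checked with a small graph search. The Python sketch below encodes only the interactions stated in this paragraph as an undirected graph. This is a simplifying assumption, since the full network contains additional nodes and edges. It then uses breadth-first search to show that S-RNase and SFB share no direct edge but are linked through Rbx-1 and SKP-1. The function and variable names are hypothetical.

```python
from collections import deque

# Undirected edges restating only the inferred interactions described in the text.
EDGES = [
    ("S-RNase", "Rbx-1"),
    ("S-RNase", "ubiquitin"),
    ("SFB", "SKP-1"),
    ("SKP-1", "Cullin-1"),
    ("SKP-1", "Rbx-1"),
]


def build_graph(edges):
    """Build an adjacency map {protein: set of interacting proteins}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; return one shortest path as a node list, or None."""
    previous = {start: None}  # records how each visited node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start node
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):  # sorted for reproducibility
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print("Direct S-RNase-SFB edge:", "SFB" in graph["S-RNase"])
    print(" -> ".join(shortest_path(graph, "S-RNase", "SFB")))
```

Running the script reports no direct S-RNase-SFB edge and returns the path S-RNase -> Rbx-1 -> SKP-1 -> SFB. This mirrors, in graph terms, the indirect SCF-mediated link proposed above.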
Pathway enrichment analyses identified hormone signalling (auxin and ethylene), ubiquitin-mediated protein degradation, phenylpropanoid metabolism, pollen and stamen development, calcium signalling, apoptosis, and defence-related pathways as central components of the SI response. Similar pathways have been implicated in SI reactions in Camellia sinensis [75] and Theobroma cacao [76], where auxin and ethylene show contrasting responses following compatible and incompatible pollination, respectively. Calcium signalling components, including calcium-dependent protein kinases (CDPKs) and annexins, are also known to regulate pollen tube growth and polarity during SI responses [77,78].
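The enrichment statistic is not specified in this section. A common choice for over-representation analysis is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. It asks how surprising it is to find k pathway members in a query set of n genes drawn from a background of N genes, K of which belong to the pathway. The Python sketch below illustrates that test under stated assumptions and is not the pipeline used here. The function name is hypothetical, and real analyses would also correct for testing many pathways (for example, with the Benjamini–Hochberg procedure).

```python
from math import comb


def hypergeometric_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background genes
    K: background genes annotated to the pathway
    n: genes in the query set (e.g. network members)
    k: query genes annotated to the pathway
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probability of observing k or more pathway genes in the query set.
    # math.comb returns 0 when asked to choose more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(hypergeometric_enrichment_p(k=3, n=3, K=3, N=10))
```

For instance, with a hypothetical background of N = 10 genes, K = 3 pathway genes, and a query set of n = 3 that contains all three, the tail probability is 1/120 (about 0.0083).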
Together, these results highlight the complexity and dynamic nature of the GSI regulatory network, in which SI components function as part of interconnected macromolecular complexes. Characterising these interaction networks provides new insights into the molecular basis of SI and identifies candidate genes that may be targeted in future functional studies and breeding strategies aimed at manipulating self-incompatibility in Prunus species.
In conclusion, our integrative analyses indicate that the Prunus gametophytic self-incompatibility system arose from an ancient RNase-based framework and subsequently diversified through gene duplication, lineage-specific selection, functional divergence, co-evolution of pistil- and pollen-expressed components, and dynamic protein-interaction networks. Shaped by geological history and climatic change, these processes ultimately generated the remarkable molecular and phenotypic diversity of S-locus mechanisms observed across Rosaceae.