Genome-Wide Comparative Analysis of SRCR Gene Superfamily in Invertebrates Reveals Massive and Independent Gene Expansions in the Sponge and Sea Urchin

Without general adaptive immunity, invertebrates evolved a vast number of heterogeneous non-self recognition strategies. One of those well-known adaptations is the expansion of the immune receptor gene superfamily coding for scavenger receptor cysteine-rich domain containing proteins (SRCR) in a few invertebrates. Here, we investigated the evolutionary history of the SRCR gene superfamily (SRCR-SF) across 29 metazoan species with an emphasis on invertebrates. We analyzed their domain architectures, genome locations and phylogenetic distribution. Our analysis shows extensive genome-wide duplications of the SRCR-SFs in Amphimedon queenslandica and Strongylocentrotus purpuratus. Further molecular evolution study reveals various patterns of conserved cysteines in the sponge and sea urchin SRCR-SFs, indicating independent and convergent evolution of SRCR-SF expansion during invertebrate evolution. In the case of the sponge SRCR-SFs, a novel motif with seven conserved cysteines was identified. Exon–intron structure analysis suggests the rapid evolution of SRCR-SFs during gene duplications in both the sponge and the sea urchin. Our findings across nine representative metazoans also underscore a heightened expression of SRCR-SFs in immune-related tissues, notably the digestive glands. This observation indicates the potential role of SRCR-SFs in reinforcing distinct immune functions in these invertebrates. Collectively, our results reveal that gene duplication, motif structure variation, and exon–intron divergence might lead to the convergent evolution of SRCR-SF expansions in the genomes of the sponge and sea urchin. Our study also suggests that the utilization of SRCR-SF receptor duplication may be a general and basal strategy to increase immune diversity and tissue specificity for the invertebrates.


Introduction
Immunity refers to the function of the body's immune system to recognize self and nonself substances and eliminate antigenic foreign substances through immune responses to maintain physiological balance [1,2]. The immune system consists of innate immunity and acquired immunity [3], which are composed of immune organs, immune cells, and immune active substances, and have various functions such as immune surveillance, defense, and regulation [4]. In the kingdom Animalia, more than 95% of species are invertebrates [5,6]. It is generally believed that invertebrates lack acquired immunity and depend solely on innate immunity for pathogen resistance, encompassing the entire process from pathogen recognition to elimination [7]. Pathogen-associated molecular patterns (PAMPs) are a class of conserved molecular structures found on the surface of pathogens, including bacteria, viruses, and fungi [8]. Pattern recognition receptors (PRRs) are non-clonal recognition molecules distributed on the surface of natural immune cells that can recognize PAMPs [9]. The PRR family includes Toll-like receptors (TLRs), scavenger receptors with cysteine-rich domains (SRCRs), C-type lectin receptors (CLRs), and nucleotide-binding oligomerization domain-like receptors (NLRs), among others [10,11]. As an important receptor family in the innate immune system, PRRs recognize and interact with PAMPs on the surface of pathogens [12], which is critical for initiating innate immune responses [13]. Once these patterns are recognized, PRRs trigger a series of cell signaling events, activate immune responses, and help clear pathogens from the body.
The scavenger receptor cysteine-rich gene superfamily (SRCR gene superfamily, SRCR-SF) is a receptor family that is rich in cysteine residues and was first proposed in the 1990s [14]. The structure of SRCR-SFs is diverse, but most contain an N-terminal signal peptide, one or multiple SRCR domains, a transmembrane domain, and a C-terminal cytoplasmic tail [15]. The number and arrangement of these domains differ among various subfamilies of SRCR-SFs. The broad criteria have led to a lack of detailed classification of SRCR-SF members, with differentiation being based solely on the type of SRCR domain. SRCR domains are divided into Group A and Group B based on the number and position of conserved cysteine residues [16,17], with Group A having six cysteine residues that form three pairs of disulfide bonds, and Group B having eight cysteine residues that form four pairs of disulfide bonds [18]. For instance, CD5 (cluster of differentiation 5) [19], CD6 (cluster of differentiation 6) [20], and DMBT1 (deleted in malignant brain tumor 1) [21] belong to Group B, whereas MARCO (macrophage receptor with collagenous structure) [22], Mac2BP (Mac-2 binding protein) [23], and the LOX (lysyl oxidase) family (LOX-like 2, LOX-like 3, and LOX-like 4) [24] belong to Group A.
Until now, research on SRCR-SFs within Group B has primarily focused on vertebrates, with only one reported instance in the invertebrate Geodia cydonium [25]. Furthermore, no SRCR-SF has been found to have both Group A and Group B SRCR domains [26]. The SRCR-SFs participate in the recognition and binding of various ligands, including lipids, proteins, carbohydrates, and other molecules [27]. The diverse functions of SRCR-SFs as pattern recognition receptors have been extensively studied, including pathogen recognition, modulation of the immune response, maintenance of epithelial homeostasis, involvement in stem cell biology, and contribution to tumor development [28,29]. An expanded superfamily of 218 SRCR-SF gene models was first reported in the sea urchin [30]. Subsequently, the expansion of SRCR-SFs was also identified in the Pacific oyster (Crassostrea gigas) [31] and the amphioxus (Branchiostoma floridae) [32]. Examination of SRCR-SFs in the scallop Chlamys farreri revealed their involvement in immune responses, not only to bacterial invasion but also to fungal invasion [33]. These findings underscore the potential utilization of SRCR-SF receptors as a general strategy in invertebrate immune recognition [18,28] and suggest the possibility of predicting features of an ancestral bilaterian pattern. However, a large-scale overview of the macroevolutionary pattern of SRCR-SFs is still absent.
In this study, we delved into the immune diversity and evolutionary history of SRCR-SFs across metazoans, with a particular focus on invertebrates. Our analysis encompassed the identification of 1747 members of SRCR-SFs in 29 metazoans. Notably, we observed a significant expansion in gene numbers within the genomes of the sponge and the sea urchin. Separately, our investigation extended to analyzing their domain architecture and phylogeny, revealing a remarkably diverse composition of domains within SRCR-SFs. We meticulously examined genomic locations, exon-intron structures, and motif arrangements to elucidate the molecular mechanisms driving this extensive expansion. Our genome-wide comparative analysis of SRCR-SFs highlighted an independent evolution of gene expansion specific to the sponge and sea urchin. This finding underscores the likelihood that the dynamic duplication of SRCR-SF genes represents a widespread strategy for immune recognition in invertebrates, potentially serving as a foundational mechanism for immune processes in metazoans.

Identification of SRCR-SFs in Metazoans
To study the evolutionary history of SRCR-SFs in metazoans, we combined sequence homology-based and signature domain prediction tools to annotate SRCR-SFs in the genomes of 29 representative species with different evolutionary positions. We annotated a total of 1747 SRCR-SF genes (Supplementary Table S7) [34–39] and found that SRCR-SFs are widely present in metazoans (Figure 1). A total of 296 SRCR-SFs were annotated in the genome of the sponge Amphimedon queenslandica of the phylum Porifera, suggesting the massive expansion of SRCR-SFs in the sponge. Besides the sponge, massive expansion of SRCR-SFs was also found in the echinoderm Strongylocentrotus purpuratus, resulting in 357 SRCR-SF gene models. Expansions of SRCR-SFs were also found in the chordate Branchiostoma floridae, the echinoderm Acanthaster planci, and the brachiopod Lingula anatina. The findings above indicate that employing the duplication of SRCR receptors could be a pivotal strategy in augmenting immune diversity among metazoans [40,41]. Additionally, gene family contraction was also a general pattern during metazoan evolution [40,41]. For example, 5 and 13 SRCR-SFs were identified in the basal bilaterians Schistosoma mansoni and Hofstenia miamia, respectively. In the Ecdysozoa clade, no more than four SRCR-SF gene models were predicted across the three studied species. Notably, within the Lophotrochozoa clade, which comprises one of the most species-rich marine fauna [42], the diversity in the number of gene models encoding SRCR-SFs was evident. For instance, the brachiopod Lingula anatina exhibited 113 identified SRCR-SFs, while the bryozoan Bugula neritina had only 12 annotated SRCR-SFs. Furthermore, bivalves, in general, showed a higher count of gene models than gastropods.

Expression Profiles of Metazoan SRCR-SFs
Given the substantial expansions and diverse domain structures observed in metazoan SRCR-SFs, our investigation aimed to ascertain their functionality. To achieve this, we compiled all available lophotrochozoan tissue transcriptome data from the NCBI GEO database (up to October 2023). We conducted an analysis to determine the tissue-specific expression levels of SRCR-SFs in nine evolutionarily representative species (Figure 2, Supplementary Tables S7 and S8), encompassing the poriferan Amphimedon queenslandica; the cnidarian Nematostella vectensis; the phoronid Phoronis australis; the nemertean Notospermus geniculatus; the brachiopod Lingula anatina; the mollusks Octopus bimaculoides and Crassostrea gigas; and the echinoderms Acanthaster planci and Strongylocentrotus purpuratus. Among the 1069 SRCR-SFs studied in the nine species, the graphical representation clearly shows heightened expression of SRCR-SFs in tissues linked to digestive glands and their associated functions across all species, which is notably evident in Amphimedon queenslandica and Strongylocentrotus purpuratus. This observation indirectly indicates that the expanded SRCR-SFs are preferentially expressed in tissues associated with digestive function across various metazoans. Furthermore, it implies that the expansions of SRCR-SFs might confer advantageous effects on specific immune functions, such as mucosal immunity, within the digestive system of these metazoans. Collectively, these findings suggest that numerous SRCR-SFs exhibit high expression levels in immune-related tissues, specifically digestive glands, and potentially play a crucial role in innate immune recognition.

Extensive Expansion of SRCR-SFs in the Sponge
To understand the distinct expansion pattern of SRCR-SFs in metazoan evolution, we focused on investigating duplication mechanisms in the two species exhibiting notable SRCR-SF expansions. Initially, we scrutinized the genomic arrangements and exon-intron structures of SRCR-SFs in the sponge. Our analysis revealed that 158 out of 296 (53.38%) SRCR-SFs were organized in tandem arrays (Figure 3A,B), indicating a potential origin of sponge SRCR-SFs through tandem gene duplication.
Within these tandem clusters, NW_003546273.1 stood out, housing nine SRCR-SFs (Figure 3C) segregated into two subclusters by non-SRCR genes. Our focus narrowed to the longer subcluster encompassing six SRCR-SFs: LOC105316866, LOC109580453, LOC109580452, LOC105316867, LOC105316868, and LOC105316869. The initial four genes encoded similar domain architectures with two SRCR domains, while LOC105316868 and LOC105316869 were annotated with four and six SRCR domains, respectively (Figure 4A). Notably, all SRCR domains within these six SRCR-SFs displayed relative conservation in exon-intron architecture and sequence homology. For instance, LOC105316866 and LOC109580453 shared identical exon and intron structures, with six out of seven exons exhibiting matching sequence lengths. This suggests a potential recent tandem duplication event involving these genes (Figures 4B and S1).
Subsequently, we examined the replication unit within these tandem-linked SRCRs (Figure 5). In the NW_003546273.1 tandem SRCR-SFs, Type 1 and Type 2 SRCR-SFs were identified as duplicates, each containing one and two SRCR domains, respectively. Type 3 and Type 4 SRCR-SFs displayed duplication patterns involving both one and two SRCR domains. This led us to hypothesize that the varied 'unit' duplications might arise from disulfide bonds formed internally or interdomain within the SRCR domain.
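As an illustration of how tandem arrays of the kind described in this section might be called from gene coordinates, the following Python sketch groups family members that lie next to each other on the same scaffold. The criterion used here (at least two SRCR-SF genes separated by no more than `max_spacers` non-family genes), the input tuple layout, and the function names are assumptions made for illustration only; the criterion actually applied in this study is not stated in this section.

```python
from itertools import groupby


def tandem_arrays(genes, max_spacers=0):
    """Group adjacent family members into tandem arrays.

    genes: iterable of (scaffold, start, gene_id, is_family_member) tuples.
    A run of family members is kept as an array when it holds at least two
    genes; it is broken once more than `max_spacers` consecutive non-family
    genes intervene (hypothetical criterion).
    """
    arrays = []
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    for _, group in groupby(ordered, key=lambda g: g[0]):
        run, gap = [], 0
        for _, _, gene_id, is_member in group:
            if is_member:
                run.append(gene_id)
                gap = 0
            else:
                gap += 1
                if gap > max_spacers:
                    if len(run) >= 2:
                        arrays.append(run)
                    run = []
        if len(run) >= 2:
            arrays.append(run)
    return arrays


def percent_in_arrays(arrays, total_family_genes):
    """Percentage of family genes that sit inside any tandem array."""
    return 100.0 * sum(len(a) for a in arrays) / total_family_genes


# Toy scaffold: two family genes, one unrelated gene, then three family genes.
toy = [
    ("s1", 100, "a", True), ("s1", 200, "b", True), ("s1", 300, "x", False),
    ("s1", 400, "c", True), ("s1", 500, "d", True), ("s1", 600, "e", True),
    ("s2", 100, "f", True),
]
strict = tandem_arrays(toy)                 # unrelated gene splits the array
relaxed = tandem_arrays(toy, max_spacers=1)  # one spacer gene is tolerated
```

With `max_spacers=0`, an array interrupted by non-SRCR genes is reported as separate subclusters, analogous to the two subclusters described for NW_003546273.1. The reported proportion follows from the counts given in the text: 158 of 296 genes is 53.38%.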

Extensive Expansion of SRCR-SFs in Sea Urchin
To explore the extensive expansion of SRCR-SFs in the sea urchin, we delved into deciphering the genetic basis of gene expansion within its genome. In contrast to the sponge, a higher proportion of SRCR-SFs in the sea urchin genome, 253 out of 359 (70.47%), were organized into tandem clusters (Figure 6A). For instance, on scaffold NW_022145609.1, a tandem array accommodated 16 SRCR-SFs (Figure 6B), with 15 composed exclusively of SRCR domains (Figure 6C).
In the genomic analysis, sea urchin SRCR-SFs showcased notably diverse exon-intron architectures. Examining these 16 SRCR-SFs revealed the presence of seven distinct exon-intron structure types. Interestingly, within sponge-linked SRCR-SFs, the majority encoded each SRCR domain across three exons. However, on scaffold NW_022145609.1, an exception emerged: all SRCR domains were encoded within a single exon, with the only deviation observed in the gene LOC115928495 (Figure 7A).
We aimed to ascertain the contribution of exon shuffling to SRCR-SF domain complexity [45]. This hypothesis was tested by identifying sibling paralogs with high sequence similarity based on the phylogenetic tree and comparing their exon-intron architectures. Widespread exon-intron structure divergence was evident within SRCR-SFs. In the sibling paralog pair LOC586908 and LOC764936, their corresponding exonic sequences exhibited significant alignment between LOC586908 exon 1 (1S-E1) and LOC764936 exon 2 (5S-E2), displaying 84% sequence similarity at the nucleotide level. An exon loss event was inferred in LOC586908, stemming from a stop codon mutation (TAC to TAG), resulting in the loss of an SRCR domain in the SRCR-SFs. A similar exon loss variation was also observed in the domain grafting of the sibling paralogs 6S-E1/4S-E1 and 7S-E1/4S-E1 (Figure 7B).
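To make the consequence of the TAC to TAG substitution concrete, the short Python sketch below translates a toy coding sequence before and after that single-nucleotide change. The sequence is hypothetical and is not taken from LOC586908; the codon table is the standard genetic code, built from its conventional one-line encoding with bases ordered T, C, A, G.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        residue = CODON_TABLE[cds[pos:pos + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy sequence: the third codon is TAC (tyrosine).
wild_type = "ATGTGCTACGGC"
# Same sequence with the C-to-G change that turns TAC into the stop codon TAG.
mutant = "ATGTGCTAGGGC"

full_length = translate(wild_type)
truncated = translate(mutant)
```

In this toy case the substitution replaces a tyrosine codon with a stop codon, so everything downstream of the mutated position is no longer translated, which is the kind of change that could remove a downstream SRCR domain.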

Comparative Analysis of SRCR-SF Distribution in the Sponge and Sea Urchin
The distribution and genetic basis of SRCR-SF expansion were investigated in sponge and sea urchin genomes to unravel distinct patterns of gene expansion. In the sponge, 53.38% of SRCR-SFs were organized in tandem arrays, suggesting a potential origin through tandem gene duplication. Within specific clusters, such as NW_003546273.1, six SRCR-SFs exhibited conserved exon-intron structures and domain architectures, indicating recent tandem duplication events. These duplications displayed variations in SRCR domain numbers (ranging from two to six), hinting at potential mechanisms influenced by disulfide bond formations.
In contrast, in the sea urchin, 70.47% of SRCR-SFs were organized in tandem clusters, indicating a higher prevalence of tandem gene arrangements compared to the sponge. Analysis of a scaffold (NW_022145609.1) revealed a tandem array of 16 SRCR-SFs, displaying more divergent exon-intron architectures than observed in sponge SRCR-SFs. The sea urchin SRCR-SFs exhibited seven distinct exon-intron structure types, notably differing from sponge exon arrangements. Investigation into exon shuffling highlighted variations in exon-intron architectures among sibling paralogs, indicating instances of exon loss and domain modifications within SRCR-SFs. This comparative analysis underscores the contrasting patterns of SRCR-SF distribution between sponge and sea urchin genomes. While both exhibited tandem gene arrangements as a prevalent mode of SRCR-SF expansion, differences in exon-intron structures, domain architectures, and the extent of variation within sibling paralogs highlight distinct evolutionary mechanisms driving SRCR-SF diversity in these metazoans. These findings shed light on the genetic basis of SRCR-SF expansion, contributing to a deeper understanding of evolutionary processes shaping gene families across diverse species.

Comparative Analysis of SRCR-SF Structures in the Sponge and Sea Urchin
In our investigation, motif analysis of SRCR-SFs in both the sponge and sea urchin revealed intriguing insights. In addition to the established Group A (six cysteines) and Group B (eight cysteines) SRCR domains [46], we identified a novel type within sponge SRCR-SFs, named Group C, with seven conserved cysteines (Figure 8A and Supplementary Table S1). Sequence alignment indicated six shared conserved cysteines across all three groups, with the additional C1 cysteine in the novel Group C SRCR domain being distinct from the cysteine conservation of the Group B SRCR domain. This divergence suggests independent evolution of Group B, with eight conserved cysteines, and the newly identified Group C. Notably, genes encoding Group C SRCR domains proliferated extensively in the sponge genome, while Group A SRCR domain-rich genes expanded significantly in the sea urchin genome.
Our structural investigations, culminating in the construction of 3D models, revealed striking similarities among the predicted structures of the three SRCR domain groups (Figure 8B). Specifically, within the SRCR domain, the B1 and B4 cysteines formed an internal disulfide bond, whereas the novel C1 cysteine in the Group C SRCR domain showed no involvement in internal disulfide bonding. This observation led us to hypothesize that the novel C1 cysteine might engage in polymer formation through external disulfide bonds, signifying potential functional divergence.
To trace the evolutionary paths of these SRCR domains, particularly focusing on the novel Group C SRCR domain, we constructed a phylogenetic tree utilizing SRCR domains from seven representative species spanning different evolutionary stages: A. queenslandica, S. purpuratus, L. anatina, B. floridae, A. planci, C. gigas, and H. sapiens (Supplementary Figure S2). Our phylogenetic analysis suggests that the Group A SRCR domain may represent an ancient architectural form. Furthermore, the prevalence of species encoding Group B SRCR domains within the vertebrate clade hints at lineage-specific duplications of SRCR-SFs harboring the Group B domain. Intriguingly, out of the 29 species examined, genes encoding Group C SRCR domains, totaling 765 genes, were predominantly identified in the sponge. Conversely, sea urchins housed only two genes, and sea stars contained just one gene encoding the Group C domain.
Additionally, to ensure high-confidence sequences, we meticulously refined the three gene models through manual optimization using transcriptomic read mapping. The presence of Group C SRCR domains across diverse clades indicates potential multiple origins or divergences, underscoring the novelty and complexity of this domain's evolutionary history.
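The grouping by number of conserved cysteines described above (six for Group A, eight for Group B, seven for the novel Group C) can be illustrated with a small Python sketch that counts alignment columns in which cysteine is conserved. This is a simplified illustration under stated assumptions, not the motif analysis used in this study: the 90% conservation cutoff, the function names, and the toy alignments are hypothetical, and a real classification would also consider the positions of the cysteines, not only their number.

```python
GROUP_BY_CYSTEINE_COUNT = {6: "A", 8: "B", 7: "C"}


def conserved_cysteine_columns(alignment, min_fraction=0.9):
    """Return alignment columns where at least `min_fraction` of rows are 'C'.

    alignment: list of equal-length aligned domain sequences ('-' for gaps).
    """
    n_rows = len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        n_cys = sum(1 for seq in alignment if seq[col] == "C")
        if n_cys / n_rows >= min_fraction:
            columns.append(col)
    return columns


def srcr_group(alignment, min_fraction=0.9):
    """Assign a group label from the number of conserved cysteine columns."""
    n_conserved = len(conserved_cysteine_columns(alignment, min_fraction))
    return GROUP_BY_CYSTEINE_COUNT.get(n_conserved, "unclassified")


# Toy alignments (not real SRCR sequences).
seven_cys = ["CACACACACACAC", "CGCGCGCGCGCGC", "CTCTCTC-CTCTC"]
six_cys = ["CACACACACAC", "CGCGCGCGCGC", "CTCTC-CTCTC"]

label_seven = srcr_group(seven_cys)
label_six = srcr_group(six_cys)
```

Counting conserved columns rather than raw cysteines per sequence keeps sporadic, non-conserved cysteines from changing the assigned group.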

Discussion
Invertebrates, lacking a canonical adaptive immune system, rely entirely on their innate immune systems. Natural selection and fitness have driven the emergence of diverse survival strategies among invertebrates to thrive in pathogen-rich environments. Previous studies suggest species like the sea urchin [30] and the Pacific oyster [31] have undergone significant immune reorganization and specificity through expanding their repertoire of innate immune receptors. However, the evolutionary history and molecular functions of SRCRs, one of the major immune receptors, remain largely unknown. In this study, we systematically delineate the evolutionary trajectory of the SRCR gene family across 29 metazoans, examining their domain architectures, phylogeny, exon-intron structures, motif patterns, and tissue-level gene expressions. To the best of our knowledge, this study represents the first comprehensive analysis of the molecular evolution of SRCR-SFs across invertebrates.
There are several results we would like to highlight. Firstly, a prevalent pattern of expansion in SRCR-SFs has been observed multiple times across diverse metazoans. This expansion of SRCR-SFs in the sponge and the sea urchin coincides with similar expansions in other PRRs in several species, such as TLRs in the sea urchin [30], ApeC-containing proteins in the amphioxus [47], NLRs in the sponge [48], and C1qDCs in the Pacific oyster [31]. Moreover, the degree of expansion in these PRRs varies among species, which is likely influenced by lineage-specific factors, even in early-branching metazoans such as the sponge. These repeated expansions underline the rapid evolution of the innate immune system in response to diverse pathogens. Our results suggest a convergent evolution of SRCR-SF expansion between sponges and sea urchins. Similar evolutionary patterns have been observed in TLRs, where the sea urchin genome extensively expanded V-type TLRs, while the Pacific oyster genome predominantly expanded short P-type TLRs [31]. It has been shown in the literature that expansions within gene families encoding immune recognition receptors offer a rich source of previously unrecognized immune complexity. This suggests that while invertebrate immune systems rely on similar proteins, the types of these receptors vary widely among species [49].
In our study, SRCR-SFs duplicate at the level of the 'gene' in the sponge, but at the level of the 'domain' in the sea urchin. Gene or domain duplications serve as primary driving forces in the evolutionary innovation of genetic systems, generating novel genes that facilitate functional divergence [50–52]. Tandem duplication involves structural rearrangement by serial replication and insertion of DNA segments, creating adjacent paralogous genes with short interspaces. The exon-intron structure varies among different duplicated genes under positive selection [53]. Analysis of the exon-intron structure revealed that expanded SRCR-SFs in sponges were encoded by three exons per SRCR domain, while in sea urchins, each SRCR domain was encoded by a single exon. This process is frequent among innate immune-related molecules, such as TLRs, RLRs, NLRs, and lectins [54]. Considering our observations regarding the expansion of SRCR-SFs in the sponge and the sea urchin, the expansion patterns of SRCR-SFs may bear similarity to those of other PRR genes. It is also worth noting that the relationships of these expansion patterns among different gene families encoding immune receptors require further investigation. Overall, the expansion mechanisms of immune receptor gene families also follow a lineage-specific pattern.
Secondly, our molecular evolution study reveals various patterns of conserved cysteines in the sponge and sea urchin SRCR-SFs, indicating potentially different immune functions of SRCR-SFs between these species. The SRCR domain, a highly conserved component within the SRCR superfamily, encompasses over 30 proteins with diverse functions, including pathogen recognition and modulation of innate immunity. Despite this diversity, the precise roles of the SRCR domain within these proteins remain unclear [55]. Cellular mechanisms like differential splicing and post-translational modifications enable a single gene to encode numerous distinct protein products, significantly augmenting the repertoire of encoded protein functions from a finite DNA genome [56], thereby enhancing protein diversity. This adaptive process aids invertebrates in recognizing and eliminating a broad spectrum of pathogens, promoting diversity and specificity in innate immunity among invertebrates [42,49]. Similar to immunoglobulin domains, SRCR domains are multifunctional protein elements found in various biological activities and diverse domain contexts. The substantial diversity in the primary sequence of this domain potentially enables proteins to bind to a wide spectrum of ligands. In vertebrates, numerous proteins with SRCR domains play explicit immune roles by directly binding to pathogenic signatures [57]. The diverse structures and binding specificities of SRCR proteins broaden their applicability across different categories of pathogens [56,58–60]. This evidence also supports the idea that gene family expansions through duplication events within the genome can generate multiple protein variants or homologous proteins at the protein level.
Previous studies have highlighted the SRCR domain as the primary ligand-binding domain in MARCO [61]. This domain features conserved cysteines forming internal disulfide bonds, thereby establishing stable structures and diverse recognition regions. Notably, oxidoreductases selectively cleave disulfide bonds with relatively low reduction potentials, sparing structural bonds [62]. Hence, the formation of functional disulfide bonds critically regulates protein molecular mechanisms [63,64], which is ubiquitous across numerous proteins and pivotal in governing both their structure and function [63]. The extracellular regions of SRCR-SF members manifest either as exclusive arrays of tandemly repeated SRCR domains or as mosaic proteins comprising SRCR domains combined with various other protein modules, such as epidermal growth factor, C1r/C1s Uegf Bmp1 (CUB), zona pellucida, collagenous regions, fibronectin, and short consensus repeats. Commonly observed among SRCR-SF members are short Pro, Ser, and Thr (PST)-rich polypeptides interspersed among contiguous SRCR domains. On the other hand, three-dimensional structures obtained from crystallization experiments reveal that both Group A and Group B SRCR domains exhibit a conserved and compact core folding pattern (a curved six-stranded β sheet cradling an α helix) while exhibiting variable outer loop regions, potentially contributing to functional diversity [65–70]. It is plausible to hypothesize that the cysteine content within the SRCR structure of the sponge and sea urchin, contributing to disulfide bond formation, is intricately linked to their function. These bonds likely play a role in pathogen recognition within the innate immune system, enabling these organisms to mount immune responses for self-protection. Moreover, the discovery of the Group C SRCR domain in our study might represent a novel strategy in the innate immune system's arsenal against pathogen-induced stress. The distinct nature of this Group C SRCR domain compared to Group A and Group B SRCR domains suggests potential unique functionalities within the innate immune system. These nuances warrant further investigation for a deeper understanding of their specific roles and contributions to immune responses.
Finally, our findings indicate that numerous SRCR-SFs showed elevated expression levels within immune-related tissues, particularly the digestive glands, across nine representative metazoans with available tissue expression profiles. This suggests that the amplification of SRCR-SFs could potentially benefit specific immune functions, such as mucosal immunity, within the digestive system of these metazoans. This amplification might also play a significant role in innate immune recognition. Moreover, the question of whether proteins with identical domain architecture exhibit similar functions across different species remains unanswered [53]. Examples from the families under discussion indicate that this notion does not universally apply. For instance, while Drosophila Toll-like receptors primarily function in embryonic development, their mammalian counterparts serve as pivotal regulators of immune responses [71,72]. Nevertheless, the abundance of the SRCR gene family in invertebrate species suggests a critical role for SRCR domains in the host defense mechanisms of animals reliant solely on innate immunity.

Data Collection
In accordance with the principles of literature reporting and species representativeness, we obtained genome, protein, and annotation files for 29 metazoans from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/, accessed on 1 July 2022) [73] and the OIST Marine Genomics Unit (https://marinegenomics.oist.jp/gallery, accessed on 1 July 2022), as well as other databases. We assessed the completeness and redundancy of all genomes with the BUSCO v5.2.2 suite (https://busco.ezlab.org/, accessed on 1 July 2022) (Supplementary Table S2). Transcriptome data of different metazoans were retrieved from the Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra/, accessed on 1 July 2022) database on the NCBI website using the search terms "species names" and "related issues" (Supplementary Table S8).

Identification and Domain Annotation of SRCR-SFs
Candidate SRCR-SF genes in the target gene set were first identified with a local installation of HMMER 3.3.2 (http://hmmer.org/, accessed on 1 July 2022) [34]. The SRCR hmm file (PF00530 https://www.ncbi.nlm.nih.gov/Structure/cdd/PF00530, accessed on 1 July 2022) was employed, and a screening threshold of 1 × 10⁻⁵ was applied. Subsequently, a local installation of Blastp 2.12.0+ (https://www.ncbi.nlm.nih.gov/tools/primer-blast/, accessed on 1 July 2022) [35] was used to screen SRCR-SF candidate genes within the target species gene set, using the SRCR seed file as input sequences and the same screening threshold. The results from both steps were integrated, and redundant genes were removed based on information from the genome annotation files. To validate the presence of SRCR-SF genes in each reference genome, the TBLASTN algorithm (https://blast.ncbi.nlm.nih.gov/, accessed on 1 July 2022) was applied using validated SRCR-SF proteins as query sequences. A significance threshold of 1 × 10⁻⁵ was employed for this analysis.
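The integration of the HMMER and Blastp results described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: it assumes the HMMER hits are available in `hmmsearch --tblout` format and the Blastp hits in tabular `-outfmt 6` format, treats the threshold as inclusive, and uses a hypothetical protein-to-gene mapping (standing in for the genome annotation) to collapse redundant entries. All identifiers in the demo data are invented.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # screening threshold stated in the text


def hmmer_hits(tblout_text, cutoff=EVALUE_CUTOFF):
    """Collect target protein IDs from hmmsearch --tblout text that pass the cutoff."""
    hits = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        # tblout columns: target, accession, query, accession, full-sequence E-value, ...
        if float(fields[4]) <= cutoff:
            hits.add(fields[0])
    return hits


def blastp_hits(outfmt6_text, cutoff=EVALUE_CUTOFF):
    """Collect subject protein IDs from BLAST tabular (outfmt 6) text that pass the cutoff."""
    hits = set()
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row:
            continue
        # outfmt 6 columns: qseqid, sseqid, ..., evalue (index 10), bitscore
        if float(row[10]) <= cutoff:
            hits.add(row[1])
    return hits


def merge_candidates(hmm_ids, blast_ids, protein_to_gene):
    """Union both hit sets and collapse proteins of the same gene (assumed mapping)."""
    return sorted({protein_to_gene.get(pid, pid) for pid in hmm_ids | blast_ids})


# Invented demo data, for illustration only.
HMM_DEMO = """# target name accession query name accession E-value score bias
protA.1 - SRCR PF00530.21 1.2e-30 105.3 0.1
protB.1 - SRCR PF00530.21 3.0e-03 12.0 0.0
"""
BLAST_DEMO = (
    "seed1\tprotA.2\t55.0\t100\t45\t0\t1\t100\t1\t100\t2e-20\t90.0\n"
    "seed1\tprotC.1\t40.0\t90\t54\t0\t1\t90\t1\t90\t5e-08\t50.0\n"
    "seed1\tprotD.1\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20.0\n"
)
PROT2GENE = {"protA.1": "geneA", "protA.2": "geneA", "protC.1": "geneC"}

candidates = merge_candidates(hmmer_hits(HMM_DEMO), blastp_hits(BLAST_DEMO), PROT2GENE)
print(candidates)
```

Taking the union of the two searches favors sensitivity; the subsequent TBLASTN validation step described in the text then serves as the check on the merged candidate set.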

Localization and Tandem Repeat Identification of SRCR-SF Genes
We obtained the position information of each SRCR-SF gene from the genome annotation files and visualized it with Mg2c_v2.1 (http://mg2c.iask.in/mg2c_v2.1, accessed on 1 July 2022) [76]. To determine the tandem repeat status of SRCR-SF genes, we used the following criteria: (1) adjacent genes are SRCR-SF genes; (2) the distance between two SRCR-SF genes is no more than five genes. For the tandem repeat SRCR-SF genes, we obtained the position information of exons, introns, and coding regions from the genome annotation files and the results of InterProScan domain prediction, and we visualized them with GSDS 2.0 (http://gsds.gao-lab.org, accessed on 1 July 2022) [77].
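The tandem repeat criteria above can be illustrated with a short sketch. It rests on an assumption about how the criteria are read: two SRCR-SF genes on the same scaffold are placed in the same cluster when at most five other genes lie between them, and only clusters with at least two SRCR-SF genes are reported. The function name, the input tuple layout, and the demo genes are hypothetical.

```python
from collections import defaultdict

MAX_INTERVENING = 5  # assumed reading of "no more than five genes"


def tandem_clusters(genes, max_intervening=MAX_INTERVENING):
    """Group SRCR-SF genes into tandem clusters.

    genes: iterable of (scaffold, start, gene_id, is_srcr) tuples.
    Returns a list of clusters, each a list of SRCR-SF gene IDs in genomic order.
    """
    by_scaffold = defaultdict(list)
    for scaffold, start, gene_id, is_srcr in genes:
        by_scaffold[scaffold].append((start, gene_id, is_srcr))

    clusters = []
    for scaffold in sorted(by_scaffold):
        ordered = sorted(by_scaffold[scaffold])  # order genes by start coordinate
        current, last_rank = [], None
        for rank, (_, gene_id, is_srcr) in enumerate(ordered):
            if not is_srcr:
                continue
            # Number of genes lying between this SRCR-SF gene and the previous one.
            if last_rank is not None and rank - last_rank - 1 > max_intervening:
                if len(current) >= 2:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            last_rank = rank
        if len(current) >= 2:
            clusters.append(current)
    return clusters


# Invented demo: g1 and g3 are one gene apart; g4 follows six non-SRCR genes.
demo = [("s1", 100, "g1", True), ("s1", 200, "g2", False), ("s1", 300, "g3", True)]
demo += [("s1", 400 + 100 * i, f"n{i}", False) for i in range(6)]
demo += [("s1", 1000, "g4", True), ("s2", 50, "g5", True)]

print(tandem_clusters(demo))
```

Working on gene rank rather than base-pair distance keeps the criterion independent of intron length and intergenic spacing, which vary widely between genomes.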

Conclusions
In this study, we systematically explored the evolutionary dynamics of SRCR-SFs in metazoans, employing an integrated framework encompassing comparative genomics, structural domain architecture, exon-intron structure, phylogenetic insights, and tissue expression profiles. Our investigation identified a large number of genes coding for SRCR-SFs in metazoans, notably 296 in the sponge and 357 in the sea urchin. This amplification might play a significant role in the diversity and specificity of innate immune recognition. Phylogenetic analysis revealed the expansion of SRCR-SFs within the sponge and sea urchin lineages, elucidating their species-specific and independent evolutionary paths. In addition, cysteine-rich SRCR domains in the sponge and sea urchin contribute to the formation of disulfide bonds that may play a key role in their innate immune response. The discovery of Group C SRCR domains not only points to a potential strategy for defense against pathogen-induced stress, but also suggests a unique function within the innate immune system. Finally, the amplified presence of SRCR-SFs, especially in the sponge and sea urchin lineages, correlates with heightened expression levels in specific metazoan tissues, notably those associated with digestive functions. These results are expected to shed light on the complex innate immune systems that invertebrates use as alternative strategies to recognize various pathogens.

Figure 1. The distribution of members of the SRCR-SFs in 29 representative animal species across metazoans. The colors of species represent different phyla. The red cladogram on the left illustrates the phylogenetic relationships between species, with dashed lines indicating the disputed phylogenetic positions of ctenophores and sponges. The blue histogram on the right tallies the total number of identified SRCR-SFs for each species.

Figure 3. Genomic distribution analysis of sponge SRCR-SFs. (A) Proportions of clustered and individual members of sponge SRCR-SFs. Blue represents the 138 members of sponge SRCR-SFs scattered on the scaffolds; orange represents the 158 members of sponge SRCR-SFs linked in tandem array. (B) The histogram shows the tandem clusters containing 2, 3, 4, 5, 6, and 7 SRCR-SFs. (C) Gene distribution of the scaffold (NW_003546273.1) with the most SRCR-SFs in the sponge is shown. Pink represents SRCR-SFs, blue represents non-SRCR-SF genes. Numbers represent the quantity of SRCR domains encoded by a given gene.

Figure 4. Gene structure analysis of sponge SRCR-SFs. (A) Gene structure of the tandem gene cluster with the largest number of sponge SRCR-SFs. Blue represents exons, green represents UTRs, orange represents SRCR domains, black horizontal lines represent introns, light orange represents SignalP (signal peptide), light green represents TM, and numbers represent sequence lengths. (B) Sequence similarity of sponge SRCR-SFs. Blue represents exons, black horizontal lines represent introns.

Figure 5. The replication units of SRCR-SFs. The predicted essential units include a single or two SRCR domains. Type 1 signifies that the SRCR gene is replicated from a single SRCR domain, while Type 2 signifies that the SRCR gene is replicated from two SRCR domains as a group. Type 3 and Type 4 signify duplication with both Type 1 and Type 2 units.

Figure 6. Gene distribution analysis of sea urchin SRCR-SFs. (A) The proportion of scattered and tandem linked SRCR-SFs in the sea urchin genome. Blue indicates that 104 SRCR-SFs are scattered on the scaffolds, while orange indicates that 253 SRCR-SFs are linked in tandem array. (B) A histogram showing the number of clusters containing 2, 3, 4, 5, 6, 7, 9, 10, 13, and 16 SRCR genes. (C) The gene distribution of SRCR-SFs on the scaffold (NW_022145609.1). Pink represents SRCR-SFs, blue represents non-SRCR-SF genes. Numbers represent the quantity of SRCR domains encoded by a given gene.

Figure 7. Gene structure analysis of sea urchin SRCR-SFs. (A) The gene structure of the SRCR-SFs in the sea urchin is shown (gene cluster in Figure 6C). Blue represents exons, orange represents SRCR domains, black horizontal lines represent introns, and white numbers represent sequence lengths. (B) Sequence alignment of selected exons. Alignment was performed on the exons marked by dashed boxes of the same color in (A). Red arrows mark the mutations.

Figure 8. Motif and three-dimensional structure analysis of different types of SRCR domains. (A) Comparative analysis of motifs of Group A, Group B, and Group C SRCR domains. (B) Comparative analysis of the three-dimensional structures of Group A, Group B, and Group C SRCR domains. The numbers represent conserved cysteine sites.

Table 1. The top five domain compositions covering more than 50% of the SRCR-SFs of the 29 representative species grouped by clusters.