Genome-Wide Characterization and Analysis of Expression of the Histone Gene Family in Razor Clam, Sinonovacula constricta

The Chinese razor clam (Sinonovacula constricta), a bivalve species widely distributed in estuaries and mudflats, is often exposed to extreme environmental and microbial stresses. Histones are fundamental components of chromatin and play an important role in innate immunity, as demonstrated by its antimicrobial activities in clams. However, little attention has been paid to histones in bivalves. To fill this gap, we investigated the genomic distribution, structural characteristics, conserved motifs, and phylogenetic relationships of histones in S. constricta. A total of 114 histone genes were detected in the S. constricta genome, which were divided into 25 types in phylogenetic analysis. Among them, partial histones exhibited a tissue-dependent expression pattern, indicating that they may be involved in sustaining the homeostasis of organs/tissues in adult S. constricta. Furthermore, mRNA expression of certain histones changed significantly in S. constricta when infected with Vibrio parahaemolyticus, suggesting that histones play a role in the immune defense of S. constricta. All together, this study on histone genes in S. constricta not only greatly expands our knowledge of histone function in the clam, but also histone evolution in molluscs.


Introduction
Histones in eukaryotes are basically structural proteins of nucleosomes, which are responsible for the packing and compaction of DNA in the cell nucleus, thereby forming chromatin fiber [1][2][3]. Based on their structure and function, the histone gene family has been classified into two major groups: core histones comprised of H2A, H2B, H3, and H4, and linker histones comprised of H1 [4][5][6][7]. Histone variants, which have been distinguished from canonical histones, exhibit unique expression, chromosomal localization, and species-distribution patterns, and have acquired divergent properties during evolution [8,9]. Increasing evidence has revealed the significant contribution of histone variants to gene regulation and nucleosome function [10,11].
Generally, histone genes with high copy numbers are structured into clusters. These genes contain no introns and lack polyadenylation in their corresponding mRNAs, but contain a unique 3 end structure. Conversely, the histone variants encoded by polyadenylated mRNAs are usually distributed independently and do contain introns. Canonical histones have a particular stem-loop structure located at the 3 untranslated region, which is generated by cleavage using a machinery involving U7 snRNP and protein factors such as the stem-loop binding protein (SLBP), as identified in both metazoans and protozoans [12]. Additionally, there is also a histone downstream element (HDE) downstream of the stem-loop structure directed by endonucleolytic cleavage. The specific subunits Lsm10 and Lsm11 of the U7 snRNP can interact with the HDE element and be involved in the processing of histone mRNA, while the SLBP probably stabilizes binding of the U7 snRNP to the HDE [13,14]. In contrast, the histone variants exhibit constant but low-level expression through the cell cycle, and are present in chromatin in a replication-independent way, which is accompanied by the specialized chaperone and remodeling proteins [15,16]. Notably, the histone variant genes contain introns that allow for alternative splicing in some cases at least, and are not clustered in the genome [2,17].
Substantial evidence has shown that histones possess antibacterial activity and play a vital role in the innate immunity of animals, although the primary role of histones is as the dominant protein components of chromatin [18]. For example, histone H1 act as an important antimicrobial protein in the liver, intestine, and stomach of Atlantic salmon Salmo salar [19]. The recombinant histones H2A and H4 showed strong antibacterial activity in freshwater prawn Macrobrachium rosenbergii and African clawed frog Xenopus laevis against Escherichia coli, Vibrio anguillarum and Micrococcus luteus at micromolar concentrations [20]. In addition, high expression levels of core histones H2A, H2B, H3, and H4 were found in hemocytes of Pacific white shrimp Litopenaeus vannamei when responding to bacterial stress [21]. In molluscs, the histone H2A homologous to buforin I found in scallop Chlamys farreri and disk abalone Haliotis discus discus, inhibits the growth of Gram-positive and Gram-negative bacteria and yeast [22,23], and the histones H2B and H4 in eastern oyster Crassostrea virginica can defend against a variety of pathogenic bacteria [24].
With the completion of genome sequencing projects, investigations have been carried out to explore the histone gene family in a variety of species, such as C. farreri [25], grooved razor shell Solen marginatus [26], mediterranean mussel Mytilus galloprovincialis [1], the parasite Schistosoma japonicum [27], and the clam Lucina pectinata [28]. Studies on S. japonicum have demonstrated that there are totally 38 histone genes, and they are divided into 5 subfamilies and distributed on 25 scaffolds [27]. Due to their wide distribution and characteristics in eukaryotes, the histones are deemed to represent a model system for studying multigene families with their organization, structure, and expression, which are of interest in phylogenetic and evolutionary studies [29]. Recently, completion of the razor clam genome sequencing project [30] offers a reliable genome. Furthermore, abundant transcriptome datasets for razor clam enable systematic characterization of the crucial histone gene family.
The razor clam (Sinonovacula constricta) is a mariculture bivalve mollusc in China with economic importance because of its delicious taste and high nutritional value [30,31]. This species is widely distributed in the intertidal region and can be vulnerable to bacterial stresses during its lifetime, especially from the pathogenic Vibrios. V. parahaemolyticus is a curved, rod-shaped, and Gram-negative bacterium found in the sea, which has been reported to cause adverse effects such as decreasing growth, weakened immunity, and even outbreaks of mass mortality of shellfish [30,32]. Notably, bivalve species have no adaptive immune defense mechanism and rely on innate immunity [33]. In this regard, it is of great significance to characterize the innate immunity of S. constricta, which can contribute to sustainable development of its production industry. Given the important role of histones in the innate immunity of animals, we identified and characterized the genome-wide histone repertoire in S. constricta. A systematic analysis of histone genes was performed, including their gene copy numbers, gene structure, conserved motifs, genomic organization, and phylogenetic relationships. In addition, we investigated the expression profiles of histone genes in different organs/tissues and under stress of V. parahaemolyticus in S. constricta. Our findings elucidate the potential function of histone genes in S. constricta, which enhance our understanding of innate immunity in bivalves.

Genome Resource
The genome sequence data of S. constricta [30] was used for this study. Other genome sequence data for Homo sapiens (GCA_000001405. 28

Genome-Wide Identification of Histone Genes
To identify histone genes in S. constricta, the known full-length histone amino acid sequences of C. virginica and P. maximus, which were retrieved from NCBI, were first used as initial query sequences to conduct a BLASTP search against the S. constricta genome with a cut-off E-value of 1 × 10 −10 [34]. Then, those retained sequences that significantly matched known histone genes were further classified into corresponding groups (H1, H2A, H2B, H3, H4) based on the best BLAST hits. Subsequently, all candidate histone genes were compared back to the NCBI non-redundant database (BLASTX), and we removed manually spurious sequences. Last, to confirm and filter uncertain histone proteins, the conserved domains of candidate histone genes were predicted by the SMART tool (http://smart. embl-heidelberg.de, accessed on 26 October 2020) and CDSearch (https://www.ncbi.nlm. nih.gov/structure/bwrpsb/bwrpsb.cgi, accessed on 1 July 2004). Simultaneously, open reading frames (ORFs) were predicted using ORF-Finder (https://indra.mullins.microbiol. washington.edu/sms2/orf_find.html, accessed on 13 December 2006). The identified sequences were used for further analysis. Similar procedures were carried out to retrieve histone genes in the genomes of H. sapiens, X. laevis, D. melanogaster, B. belcheri, C. virginica, C. gigas, T. granosa, P. maximus, O. bimaculoides, S. purpuratus, and C. elegans.

Sequence Analysis and Chromosomal Localization of Histone Genes in S. constricta
Physicochemical parameters, including the molecular weight (kDa), isoelectric point (pI) and grand average of hydropathicity (GRAVY) of each gene product, were calculated using ExPASy (http://www.expasy.org/tools/, accessed on 1 July 2020) with default parameters. The gene structures were drawn by Tbtools [35]. The conserved motifs were predicted using the online program MEME (http://meme-suite.org/tools/meme, accessed on 8 February 2021) with motif length of 6-50 bp, and the maximum number of motifs was 15 [36]. The exact location of the stop codon in the histone coding sequence was determined by the gff3 annotation file of the S. constricta genome. Then, 300 nt downstream of the stop codon was extracted and aligned to identify the stem-loop structure and purine-rich element of the histone gene, and finally visualized by Jalview. Additionally, BLASTX was used to search for homologues of SLBP, Lsm10, Lsm11, U7 snRNP, and Symplekin proteins in the S. constricta genome, while SMART was used to identify the protein domains. Each histone gene was mapped to the respective chromosomes using the TBtools software [35], and the percent identity matrix of nucleotide sequences between duplicate genes was calculated by Clustal Omega.

Classification of Histone Genes
To classify all the newly retrieved histone genes into their respective groups, we constructed an unrooted phylogenetic tree using known histone genes by MrBayes 3.2.6 [37]. All histone genes in S. constricta were sequentially named based on the phylogenetic tree, number of copies, and chromosomal position.

Sequence Alignment and Phylogenetic Analysis
The multiple sequence alignment was produced using MAFFT 7.471 [38] with the FFT-NS-i algorithm for the full-length amino acid sequences and Gblocks [39] was used to delete gaps. The result of multiple alignment was converted to Nexus format through MEGA X for further analyses. Bayesian Inference (BI) phylogenetic analysis was conducted by MrBayes 3.2.6 [40]. Four Markov chains were run for 5 × 10 7 generations, with sampling performed every 2000 generations, to yield a posterior probability distribution of 10 4 trees. The first 25% of the trees were discarded as burn-in when compiling summary statistics and consensus trees. Finally, the alignments and Bayesian tree were visualized using Jalview version 2.11.1.0 and iTOL, respectively [41]. In total, the phylogenetic analysis involved 219 amino acids across 6 animals, including S. constricta, C. gigas, P. maximus, H. sapiens, C. elegans, and S. purpuratus.

Expression Profiles Assessed by Transcriptomics and qRT-PCR
To further characterize the histone genes' expression patterns in S. constricta, the RPKM (reads per kilo per million reads) values were acquired from the public RNA-seq datasets of different organs/tissues of adult razor clam. Six organs/tissues, including gill, foot, adductor muscle, hepatopancreas, mantle and siphon, were obtained from our laboratory (accession numbers: gillSorry for the mistakes. We revised all the tables' order, SRR2162883; foot, SRR2162887; adductor muscle, SRR2162892; hepatopancreas, SRR2162895; mantle, SRR2162898; siphon, SRR2162902). To visualize the gene expression patterns, heatmaps were developed using the heatmap package [35] under the R environment. Quantitative real-time PCR (qRT-PCR) was performed to measure histone genes' expression under V. parahaemolyticus challenge with a final concentration of 1 × 10 8 cfu/mL. The 18S rRNA was used as the housekeeping gene to normalize the gene expression level. The primers for qRT-PCR are shown in Table 1. Differences were considered significant at p < 0.05.

Genome-Wide Identification and Genomic Distribution of Histone Genes in S. constricta
A total of 114 histone genes were detected from whole-genome sequences of S. constricta ( Table 2). The details on gene types, genomic distribution, and other characteristics of histone genes are displayed in Table 2, Figure 1, and Supplementary Table S1. In order to exactly describe histone genes, each member was named following three steps. First, each gene name was started with H1, H2A, H2B, H3, or H4 to distinguish five major groups based on their sequence similarity. Second, the members in each group belonging to different types in which the similarities were not 100% were designated ".1", ".2", e.g., Sc H1.2 and H1.5 at ctg4484 and Hic_sam_1, respectively. Last, the members whose sequence identities were 100% were classified into same type and then numbered by "−1", "−2" according to the genomic location, e.g., ScH2A.1-1 and ScH2A.1-2 at ctg6665 ( Figure 1). As shown in Table 2, the razor clam histone genes were divided into 25 types in total. We iden-tified seven histone types in H1 group, which consisted of five histone H1-delta (ScH1.1, ScH1.2, ScH1.3, ScH1.4, and ScH1.5), one oocyte-specific (ScH1oo), and one replicationindependent variants (ScH1.0). In the H2A group, two types (H2A.1 and H2A.2) showed significant similarity to canonical H2A, and the remaining three variant types were matched to ScH2A.V, ScH2A.1macro, and ScH2Asperm, respectively. The H2B group comprised four types and showed a high degree of similarity to the canonical forms, while the H3 group consisted of a maximum of eight types. Notably, S. constricta had only one H4 type, which was highly consistent with most organisms, but did not display variants for H4. It is worth noting that a large portion of histone types were encoded by multiple copies in the genome of S. constricta. Further, we counted all copies of these histone genes. The results showed that H1 (ScH1.1, ScH1.3, ScH1.4), ScH2A.1, ScH2B.1, ScH3.1, and ScH4 included (4, 7, 2), 19,22,19, and 23 copies, respectively ( Table 2). The gene structures of the canonical histone of razor clam proved extremely conserved and contained only one exon, but the exon number of corresponding variants exhibited significant variability (Table 2), which paralleled those of vertebrate histones. The analysis of physical and chemical properties showed that the amino acid lengths of histones ranged from 94 aa (ScH3.   The gene structures of the canonical histone of razor clam proved extremely conserved and contained only one exon, but the exon number of corresponding variants exhibited significant variability (Table 2), which paralleled those of vertebrate histones. The anal-ysis of physical and chemical properties showed that the amino acid lengths of histones ranged from 94 aa (ScH3.2-3) to 381 aa (ScH2A.1macro), while the molecular weights ranged from 10.61 kDa (ScH3.2-3) to 40.62 kDa (ScH2A.1macro). The predicted pI was 9.95 (ScH2A.1macro)~11.45 (ScH3. [2][3]. Additionally, the value of grand average of hydropathy (GRAVY) varied from −0.159 (ScH2A.1macro) to −1.023 (ScH1.0), suggesting that all histone proteins were hydrophilic in S. constricta.
As shown in Figure 1, the 114 histone genes were located at 23 loci in total, among which only 22 histone genes were successfully mapped to 5 chromosomes. One possible explanation for this pattern was due to the inadequate assembly integrity of the S. constricta genome. Moreover, a majority of canonical histones were clustered on ctg4484 (18), Hic_asm_1 (17), ctg6665 (15), and ctg5605 (11), followed by ctg7342 (8), ctg10813 (7), ctg7145 (6), ctg8584 (5), and ctg5749 (4). In addition, the survey of gene duplication events indicated that tandem duplication occurred in five histone gene pairs (marked by red box) in S. constricta. Meanwhile, the comparison of nucleotide sequences showed that 100% sequence identity was observed in all gene pairs for ScH1.4, ScH2A.1 and ScH2B.1, while the sequence identity was higher than 94% for gene pair ScH3.2 (Supplementary Table  S2). Collectively, the results supported the occurrence of tandem duplication rather than segmental duplication (Figure 1).

Phylogenetic Analysis of Histone Genes
To compare the dynamics that occurred in different types of histones in S. constricta, a multiple sequence alignment of putative proteins from molluscs and reference human sequences was implemented. The multiple sequence alignment showed that H3 and H4 were more highly conserved than H1, H2A, and H2B ( Figure 2 and Supplementary  Figures S1-S4). The two major regions, including those downstream of the N-terminal and upstream of the C-terminal, were hypervariable in core H2A and H2B ( Figure 2). Notably, both H1 and H1oo showed great variability (Supplementary Figure S1). In contrast, variants of H2Asperm, H2A.V, and H2A.1macro were highly conserved among homologs rather than paralogs. Additionally, the N-terminal alanine (A), showed conserved amino acid sites among selected species, but was replaced by proline (P) in ScH2A.V, while numerous amino acid substitutions emerged in ScH2A.1macro (Supplementary Figure S2).
As summarized in Table 3, we systematically counted the gene numbers of each histone group in different species including vertebrates and invertebrates. To make the statistics and classification more reliable, we double-checked the data using BI tree built with all counted histone genes (Supplementary Figure S5). Among all the selected species, fruit fly D. melanogaster and S. constricta had the greatest number of histone genes with n = 115 and n = 114, respectively. Intriguingly, with the exception of S. constricta and C. virginica, other molluscs had lower numbers of histone genes than vertebrates, indicating that the histone genes in S. constricta and C. virginica might have undergone species-specific expansions.
To explore the evolutionary relationships among six species, five phylogenetic trees of different histone groups (H1, H2A, H2B, H3, and H4) were constructed separately ( Figure 3). For multi-copy types, only one sequence was selected as the representative due to the completely identical sequences (Supplementary Table S3). The results illustrated good agreement with the evolutionary relationships among the selected species. Compared to those in human, the histone genes in S. constricta had a closer relationship with those of Pacific oyster C. gigas, scallop P. maximus, nematode C. elegans, and urchin S. purpuratus, most of which were clustered into subtypes. These findings suggested that the histone genes were relatively conserved during evolution.

Figure 2.
Multiple sequence alignment of core H2A and H2B amino acid sequences from selected molluscs and human. Percent similarity is shown in blue, while the intensity of the color corresponds to the degree of similarity. The amino (N) and carboxy (C) terminal ends as well as the predicted secondary structure are shown above the alignment. Unique amino acids that varied in S. constricta in comparison to the other molluscs are marked by red ovals below the alignments.
As summarized in Table 3, we systematically counted the gene numbers of each histone group in different species including vertebrates and invertebrates. To make the statistics and classification more reliable, we double-checked the data using BI tree built with all counted histone genes (Supplementary Figure S5). Among all the selected species, fruit fly D. melanogaster and S. constricta had the greatest number of histone genes with n = 115 and n = 114, respectively. Intriguingly, with the exception of S. constricta and C. virginica, other molluscs had lower numbers of histone genes than vertebrates, indicating that the histone genes in S. constricta and C. virginica might have undergone species-specific expansions. Figure 2. Multiple sequence alignment of core H2A and H2B amino acid sequences from selected molluscs and human. Percent similarity is shown in blue, while the intensity of the color corresponds to the degree of similarity. The amino (N) and carboxy (C) terminal ends as well as the predicted secondary structure are shown above the alignment. Unique amino acids that varied in S. constricta in comparison to the other molluscs are marked by red ovals below the alignments.

Gene Structure and Conserved Motif Characteristics of S. constricta Histone Genes
The exon-intron structures of histone genes were generated based on genome sequences to clarify the structural evolution in S. constricta (Figure 4). The tree topology categorized the histone genes into five major groups, which was consistent with BI tree (Figure 4A and Supplementary Figure S5). Similar to other molluscs, the core and linker histone genes contained a single exon, whereas the histone variants had multiple exons. Specifically, the schematic structures clearly revealed that all the core and linker histones  Table S3). The histone genes of S. constricta are labeled with red font and star. Sc: S. constricta, Cg: C. gigas, Pm: P. maximus, Hs: H. sapiens, Ce: C. elegans, Sp: S. purpuratus.

Gene Structure and Conserved Motif Characteristics of S. constricta Histone Genes
The exon-intron structures of histone genes were generated based on genome sequences to clarify the structural evolution in S. constricta (Figure 4). The tree topology categorized the histone genes into five major groups, which was consistent with BI tree (Figures 4A and S5). Similar to other molluscs, the core and linker histone genes contained a single exon, whereas the histone variants had multiple exons. Specifically, the schematic structures clearly revealed that all the core and linker histones genes shared only one exon structure, except that ScH3.2-4, ScH2Asperm, ScH2A.V, ScH2A.1macro, and ScH1oo variants had three, two, five, eight, and two exons, respectively ( Figure 4B). ScH2A.1macro contained the longest introns and also was the longest gene, followed by ScH2A.V, containing four introns. Further, MEME was used to detect conserved motifs in the S. constricta histone family. Obviously, members of the same group shared similar compositions of conserved motifs, but there was a distinct difference of motifs among the five groups (Supplementary Table S4, Figure 4C). In the H3 group, the motif 4, 1, 7 existed in most members, except that motif 7 was lost in ScH3.2-3. Similarly, the motif 6, 2, 9, 3, 12 was present in all members of the H2B group. Interestingly, ScH3.3 had five motifs, of which the motifs 6, 2 located in the N-terminus were consistent with H2B, and the latter three motifs 9, 3, 12 located in the C-terminus were in accord with H3. Therefore, we hypothesized that ScH3.3 might be a recombination product of H2B and H3. Moreover, groups H1 and H2A were also highly conserved. Only one motif was detected in the H4 group. In addition, the ubiquity of two characteristic sequences was discovered at the 3 -UTR of almost all core histone genes except for its variants. One was a 16-bp palindrome sequence forming a stem-loop structure, which was highly conserved and consisted of a stem with six base pairs (bp) and a 4-nt loop. Another was a purine-rich HDE sequence spaced 13 bp apart ( Figure 4D and Supplementary Table S5). A total of two Lsm10, one Lsm11, and one Symplekin protein were identified in the S. constricta genome, but there was no SLBP protein (Supplemental file S1). genes shared only one exon structure, except that ScH3.2-4, ScH2Asperm, ScH2A.V, ScH2A.1macro, and ScH1oo variants had three, two, five, eight, and two exons, respectively ( Figure 4B). ScH2A.1macro contained the longest introns and also was the longest gene, followed by ScH2A.V, containing four introns. Further, MEME was used to detect conserved motifs in the S. constricta histone family. Obviously, members of the same group shared similar compositions of conserved motifs, but there was a distinct difference of motifs among the five groups (Supplementary Table S4, Figure 4C). In the H3 group, the motif 4, 1, 7 existed in most members, except that motif 7 was lost in ScH3.2-3. Similarly, the motif 6, 2, 9, 3, 12 was present in all members of the H2B group. Interestingly, ScH3.3 had five motifs, of which the motifs 6, 2 located in the N-terminus were consistent with H2B, and the latter three motifs 9, 3, 12 located in the C-terminus were in accord with H3. Therefore, we hypothesized that ScH3.3 might be a recombination product of H2B and H3. Moreover, groups H1 and H2A were also highly conserved. Only one motif was detected in the H4 group. In addition, the ubiquity of two characteristic sequences was discovered at the 3'-UTR of almost all core histone genes except for its variants. One was a 16-bp palindrome sequence forming a stem-loop structure, which was highly conserved and consisted of a stem with six base pairs (bp) and a 4-nt loop. Another was a purinerich HDE sequence spaced 13 bp apart ( Figure 4D and Supplementary Table S5). A total of two Lsm10, one Lsm11, and one Symplekin protein were identified in the S. constricta genome, but there was no SLBP protein (Supplemental file S1).

Gene Expression in S. constricta Assessed by Transcriptomics
The expression patterns of histone genes were investigated in seven different organs/tissues from S. constricta. As shown in Figure 5, histone types presented a tissuedependent expression pattern, of which most histone genes were primarily expressed in the gill and hepatopancreas. Consistently high expression of ScH3.2-3, ScH3.2-4, ScH3.4, ScH2A.V, and ScH2A.1macro was found in the gill. A similar pattern was observed for ScH1oo, ScH1.2, ScH2B.1-1, ScH3.2-2, and ScH4-1 in the hepatopancreas, indicating that these genes might be involved in the same biological processes. In contrast, ScH1.1 was highly expressed in the mantle, whereas Sc3.2-1 was not expressed in any of the examined tissues. In contrast, the expression of ScH3.3 was not detected among selected organs/tissues. Considering its different motif components, we suggest that ScH3.3 might be a newly generated member of the histone gene family but does not function as histone H3. The mRNA levels of histones showed highly dynamic changes in the gill of S. constricta during V. parahaemolyticus infection ( Figure 6). Specifically, the highest expression levels of ScH1.3-1, ScH2A.1-1, ScH2B.1-1, ScH3.1-1, and ScH4-1 were observed at 48 h after exposure. Meanwhile, the variants (i.e., ScH1.0, ScH2A.V, and ScH2A.1macro) showed dramatic up-regulation during 24-96 h after V. parahaemolyticus infection.

Discussion
It is commonly recognized that histones are an appropriate model system for the investigation of organization, structure, and expression of multigene families, which provides insight into their evolution and phylogeny. Here, we first reported the genome-wide identification of the histone gene repertoire in the genome of S. constricta. A total of 114 histone genes were identified and classified into 25 histone types. Among core histones,

Discussion
It is commonly recognized that histones are an appropriate model system for the investigation of organization, structure, and expression of multigene families, which provides insight into their evolution and phylogeny. Here, we first reported the genome-wide identification of the histone gene repertoire in the genome of S. constricta. A total of 114 histone genes were identified and classified into 25 histone types. Among core histones, ScH2A.1, ScH2B.1, ScH3.1, and ScH4 showed the highest similarities with their homologous genes. Additionally, the copy numbers of core histones were high, and existed in numbers comparable to the ratio 19:22:19:23 (H2A.1:H2B.1:H3.1:H4), which were of considerable importance in sustaining the stoichiometry of all four core histone classes for new DNA strand packaging during replication [13]. Notably, the numbers of histone genes varied greatly among selected species, as observed in that the histone family expanded significantly in S. constricta compared to other molluscs. In general, the numbers of the histone H1 subtype range from four to seven in animals and plants, whereas the maximum number of the subtype in mammals is eleven [42]. In our study, the numbers of histone H1 genes in S. constricta was significantly higher than those in other invertebrates. The histone H2A had the greatest number of variants, including ScH2A.V, ScH2A.1macro, and ScH2Asperm, which was similar to the findings in mammals [43]. Additionally, the relatively high number of H4 genes may contribute to forming nucleosomes, which are coupled with H2A variants. Previous studies have concluded that the wide range of amino acid substitutions contributes to functional differentiation [44]. Consistent with that notion, these findings indicate that the histone genes may evolve into more functions in razor clam, in addition to the classical functions of DNA packaging, transcription, and recombination. Similarly, the evolved H1.3 in the model plant Arabidopsis is more responsible for adaptive responses to both low light and drought stress than H1.1 and H1.2, although they have the overall same binding properties as the main H1 variants [45]. In addition, comparative genomic analysis of the cytochrome P450 (CYP) gene family has shown that the new gained CYP genes are essential for detoxification and metabolic processes under adverse environmental stresses, particularly in the plant Fragilariopsis cylindrus with good adaptability to cold resistance [46]. In consequence, we hypothesize that the large number and varied types of histone genes may play important roles in adaptation of extreme environmental conditions in S. constricta.
In numerous organisms, canonical histone genes are distributed in clusters containing multi-copies of the five histone genes [47]. Studies have shown the emergence of two fundamental arrangements of histone genes in genomic distribution. In the first case, the histone genes are clustered, and their clusters have been repeated in tandem. As an example, quartets containing four core histone genes are observed in coral [48]. Moreover, similar arrangements have been reported in the nematode C. elegans, and an independently organized single-copy H1 genes was also observed [49]. Most histone genes in the echinoderm S. purpuratus are clustered in tandem repeats, and contain one copy of each of the five histone genes [47]. In another case, the histone genes are clustered but scattered throughout the whole genome, as in human, mouse, and chicken [2]. However, among metazoans, molluscs are considered to be the paradigm of arrangement diversity, which exhibit three different permutations of histone gene clusters, even within a single species. There are three types of tandem repeated units in blue mussel Mytilus edulis: five histone genes cluster, core histone gene class devoid of H1 linker histone genes and H1 cluster [50]. In this study, we found that the histone distribution in S. constricta was more similar to the first arrangement. The major clusters on Hic_asm_1, ctg4484, ctg8584, and ctg10813 contain H1, H2A-H2B, and H3-H4. It is notable that several incomplete histone clusters were observed on short scaffolds, which was likely due to the limited integrity of the S. constricta genome assembly. Furthermore, the histone H1 in S. constricta was scattered throughout the entire gene cluster. In contrast to M. edulis, H1 genes were all located in tandem and not adjacent to core histone clusters [50]. The four variants occupied solitary locations within the genome, but were not located in histone gene clusters. This finding was consistent with previous studies [2,17].
The evolutionary origin of histones can be traced back to Archaebacteria. It is noteworthy that the four core histones gradually display diversification and differentiation due to the mechanism of recurrent gene duplication, ultimately promoting DNA compaction in the time of transition to the eukaryotic cell [51]. Our investigation on evolution of histone proteins was carried out based on a Bayesian tree, which was an essential step for phylogenetic analysis. The long-term evolution of histone genes is directed by a birthand-death process, thereby contributing to genetic diversity [51], which is consistent with the alignment results of each histone group among molluscs and human. Additionally, the phylogenetic relationships of histone genes among vertebrate and invertebrate species are coincident with their species relationships. Here, the histones in S. constricta are more closely related to those of mollusks than mammals and other selected species, even though histone gene sequences are highly conserved in eukaryotes.
The core histone genes are generally intronless, while their variants contain introns. In contrast, prior studies have reported the appearance of introns in core histones H3 and H4 in Volvox, hydra, and soybean [13,52,53]. Similarly, there were introns in core histone ScH3.2-4 here. Interestingly, motif 15 existed only in Sc3.2-4 and variant ScH2A.1macro. We conjecture that the emergence of the newly evolved motif and the different pattern of conserved motifs would facilitate the process of functional differentiation among different histone types. Moreover, the conserved stem-loop structure and purine-rich HDE element existed at 3 UTRs of most S. constricta canonical histones, and both are involved in mRNA processing of replication-dependent genes. These findings imply that the regulatory mechanism of histone expression in S. constricta is similar to those of other eukaryotes [14]. It has been concluded that the regulatory factors, including SLBP, Lsm10, Lsm11, U7 snRNP, and Symplekin proteins, play essential roles in histone mRNA processing [54]. Among them, the 5 terminal portion of the U7 snRNP interacts with the HDE element by forming base-pairing. The SLBP binds to the stem-loop structure and then enables it to stabilize the binding of the U7 snRNP to the HDE, while Lsm10 and Lsm11 are subunits specific to the U7 snRNP. In addition, Symplekin participates in polyadenylation. Intriguingly, with the exception of SLBP, the above-mentioned proteins could be retrieved from the S. constricta genome. Similarly, the Phytophthora, Oxytricha, Paramecium, and Trichomonas histones contain a stem-loop structure but no SLBP [12]. Based on these findings, we hypothesize that there might be another alternative protein that is similar to SLBP. However, further investigation is needed to explore the underlying cause of the absence of some functional elements in a minority of histone genes and the transcriptional regulation of these genes in S. constricta.
The histone genes in S. constricta showed apparent differential expression patterns. Most of the histones were expressed at low levels among all organs/tissues. In contrast, consistently high expression levels of ScH1.0, ScH2A.V and ScH3.2-3 were observed in all organs/tissues, implying that these genes may play similar roles in the same biological processes (Supplementary Table S6). Among six organs/tissues, most histone genes were predominantly expressed in the gill and hepatopancreas compared to other organs/tissues. Notably, the gill and hepatopancreas play critical roles in maintaining homeostasis and immune defense [55,56]. In this regard, the histone genes highly expressed in gill and hepatopancreas may be required not only for the packaging of DNA into chromatin, but also for cell homeostasis and immune response. H1oo and H2Asperm are oocyte-and sperm-specific variants of H1 and H2A, which exist in oocytes and zygotes and are important to meiotic maturation of the sex cells [57,58]. Curiously, ScH1oo and ScH2Asperm both showed high expression in the hepatopancreas. One possible reason for this anomaly is that the specific anatomical structure in that gonads are encapsulated together with hepatopancreas in razor clam, so the tissue of hepatopancreas is easily mixed with gonadal debris when sampling. Studies have demonstrated that the H2A.V protein has multiple functions, such as active transcription [59,60], telomeric stabilization [61], DNA damage repair [1], and maintenance of genomic integrity [1,60,61]. In this regard, the high expression of H2A variants (ScH2A.1macro and ScH2Asperm) might contribute to the response to gill microstructural damage in stressful situations.
Long before the integration of histones as chromatin structural elements in eukaryotes, it has been shown that histones play an important role in the ancient host defense system against pathogenic microorganisms [62]. In recent years, its antibacterial effect has also been widely demonstrated. It has been reported that the expression level of histone genes exhibited sharp increases in epithelial cells of macaque Macuca mulatta kidney after monkeypox virus infection [63]. Similarly, the linker histone H1 has been observed as having a significant up-regulation in the skin of zebrafish Danio rerio when infected with Citrobacter freundii [64]. Besides this, relevant studies on C. virginica have revealed that the histone H2B has strong antimicrobial activity against two Gram-negative bacteria, V. parahaemolyticus and V. vulnificus [24], while similar function has been confirmed in the core histones isolated from hemolymph of Pacific white shrimp L. vannamei [21]. The razor clam lives in mudflats, a habitat that is full of various microbes. Thus, its innate immune system is vulnerable to attack. Recent studies have shown that the expression of partial histones can decrease the load of Vibrio in some marine filter feeders [65]. Consistently, the core histones and variants were significantly upregulated in the gill of S. constricta after V. parahaemolyticus infection in this study, suggesting their potential roles in immune defense. Collectively, these findings indicated that the expanded histone genes might be beneficial for razor clam to protect itself against microbes. However, further work is needed to investigate the differences of antibacterial ability among different types of histone proteins.

Conclusions
To our knowledge, this is the first study to provide comprehensive and genome-wide analysis of the histone repertoire in razor clam S. constricta. A total of 114 histone genes were identified and divided into 25 types. The summary of histone genes provided insight into their genomic organization, genic structures, and evolutionary relationships. The expression profilings of histone genes in adult organs/tissues and following V. parahaemolyticus infection imply their potential functions in bivalves, such as homeostasis maintenance and immune defense. Taken together, our findings facilitate the understanding of important functions and evolution of histone genes in bivalve molluscs.