Genome-Wide Analysis of Snf2 Gene Family Reveals Potential Role in Regulation of Spike Development in Barley

Sucrose nonfermenting 2 (Snf2) family proteins, as the catalytic core of ATP-dependent chromatin remodeling complexes, play important roles in nuclear processes as diverse as DNA replication, transcriptional regulation, and DNA repair and recombination. The Snf2 gene family has been characterized in several plant species; some of its members regulate flower development in Arabidopsis. However, little is known about the members of the family in barley (Hordeum vulgare). Here, 38 Snf2 genes unevenly distributed among seven chromosomes were identified from the barley (cv. Morex) genome. Phylogenetic analysis categorized them into 18 subfamilies. They contained combinations of 21 domains and consisted of 3 to 34 exons. Evolutionary analysis revealed that segmental duplication contributed predominantly to the expansion of the family in barley, and the duplicated gene pairs have undergone purifying selection. About eight hundred Snf2 family genes were identified from 20 barley accessions, with 38 to 41 genes in each. Most of these genes were subjected to purifying selection during barley domestication. Most were expressed abundantly during spike development. This study provides a comprehensive characterization of barley Snf2 family members, which should help to improve our understanding of their potential regulatory roles in barley spike development.


Introduction
In eukaryotes, genetic information is packaged into nucleosomes, which are repetitive structures consisting of 147-bp DNA segments wrapped around a central histone octamer composed of two copies each of the core histones H2A, H2B, H3, and H4 [1]. Nucleosomes are further compacted into a higher-order structure, called chromatin, to fit into the nucleus. Such compaction of nucleosomes in chromatin restricts accessibility of the wrapped DNA to regulatory proteins such as transcription factors [2]. Two classes of enzymes have been implicated in making the packaged DNA accessible. One alters the contact between the histone octamer and the DNA by utilizing the energy derived from ATP hydrolysis [3,4]. The other modifies specific residues of DNA and histones by adding or removing covalent modifications such as methylation, acetylation, phosphorylation, and ubiquitylation [5]. Proteins involved in the two processes are referred to as chromatin remodelers and modifiers, respectively. Both types usually associate with other proteins in distinct multi-subunit complexes, but they are also frequently found together in the same complex. These multi-subunit complexes are also called chromatin remodeling complexes [6].

Barley Snf2 Protein Properties and Domain Organization
The barley Snf2 proteins ranged in length from 681 to 3440 amino acid residues (Table S1). Their molecular weights varied from 77 to 371 kDa, and their theoretical isoelectric point (pI) values ranged from 4.83 to 8.88. All proteins had negative grand average of hydropathicity (GRAVY) values, varying from −0.89 to −0.22, indicating their hydrophilic nature. Prediction of subcellular localization revealed that all barley Snf2 proteins were localized in the nucleus (Table S1). In total, 21 domains were found among the barley Snf2 proteins (Figure 2). The Chromo domain was present in the Chd1 and Mi-2 subfamilies of Snf2-like, and the RING domain was found in all subfamilies of Rad5/16. By contrast, several domains were unique to a subfamily: the HAND, SANT, and SLIDE domains were found only in Iswi, the PHD domain only in Mi-2, and the HIRAN domain only in Rad5/16.
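As an illustration of how a GRAVY value is obtained, the following minimal sketch averages the standard Kyte-Doolittle hydropathy index over all residues of a sequence. It is not the tool used in this study; the function names and the example peptide are hypothetical, and non-standard residues are simply skipped, which is an assumption.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean hydropathy over all standard residues in the sequence."""
    # Assumption: ambiguous or non-standard residues (X, *, etc.) are ignored.
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def is_hydrophilic(sequence: str) -> bool:
    """A negative GRAVY value is read as overall hydrophilic character."""
    return gravy(sequence) < 0


if __name__ == "__main__":
    peptide = "MKRDEESKKR"  # hypothetical peptide, not a barley Snf2 sequence
    print(round(gravy(peptide), 2), is_hydrophilic(peptide))
```

Applied to full-length protein sequences, a check of this kind would reproduce the interpretation given above, in which uniformly negative values indicate hydrophilic proteins.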

Gene Structure of Barley Snf2 Family
The barley Snf2 family genes had 2 to 33 introns (Figure 3). By comparison, the 41 AtCHR genes had 2 to 33 introns, and the 45 tomato CHR genes (SlCHR) had 1 to 37 [11]. A gene in the Snf2 subfamily had the most exons (34), and 3 genes in the DRD1 and Rad5/16 subfamilies had the fewest (3). The exon and intron lengths varied among genes, even within the same subfamily. Seventeen of the 38 barley Snf2 genes spanned >10 kb from start to stop codons, possibly attributable to introns longer than 5 kb (Figure 3). Three genes in the Snf2, Ris1, and SMARCAL1 subfamilies had genomic sequences of >20 kb, with 33, 9, and 23 introns, respectively.

Chromosomal Distribution and Duplication Analysis of Barley Snf2 Gene Family
Chromosome localization results showed that the 38 Snf2 genes were unevenly distributed across the 7 chromosomes of the Morex genome (Figure 4). Chromosome 2H had the most genes (13), followed by chromosome 3H (6). Chromosomes 1H and 7H both had 5 genes, 4H and 6H had 4, and 5H had only 1 gene. Most Snf2 genes were located in the terminal regions of the chromosomes, with few in the central regions. Five pairs composed of 7 genes were predicted to have undergone segmental duplication, but no tandem duplication was found (Figure 4). Likewise, only segmental duplication events were found in Snf2 family genes of Arabidopsis and rice (Table S2). Selection pressure shapes the divergence of genes after duplication. All 5 segmental-duplication gene pairs had ratios of non-synonymous (Ka) to synonymous (Ks) substitutions (Ka/Ks) lower than 1 (Table S3), suggesting that the pairs were under purifying selection after duplication.
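The interpretation of Ka/Ks ratios used here can be written as a small worked example. This is a sketch only: the function name, the numerical tolerance around 1, and the example values are assumptions and are not taken from Table S3.

```python
def classify_selection(ka: float, ks: float, tol: float = 1e-6) -> str:
    """Interpret a Ka/Ks ratio: <1 purifying, =1 neutral, >1 positive."""
    if ks <= 0:
        # The ratio is undefined without synonymous substitutions.
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:  # tolerance is an illustrative assumption
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


if __name__ == "__main__":
    # Hypothetical (Ka, Ks) values for illustration only.
    for ka, ks in [(0.12, 0.60), (0.30, 0.30), (0.90, 0.45)]:
        print(ka, ks, round(ka / ks, 2), classify_selection(ka, ks))
```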

Genetic Variation and Evolutionary Analysis of Snf2 Genes in Barley Populations
We investigated copy-number variations of Snf2 genes within subfamilies among 20 barley accessions (Table 2). About eight hundred Snf2 family genes were identified from cultivated and wild accessions, with 38 to 41 genes in each. All cultivated accessions had 38 to 41 genes, and 2 wild accessions had 40 genes. Multiple alignment analysis of Snf2 family members revealed no reliable variation in copy number of this gene family across barley accessions (Table S4). Members that appear to be missing in some accessions probably reflect differences in genome assembly and gene annotation, because every member lacking a gene ID was confirmed to align to the corresponding genome.

Expression Analysis of Barley Snf2 Genes in Various Tissues and Different Stresses
Expression patterns of barley Snf2 genes were analyzed by using 14 public transcriptomes from diverse tissues at different developmental stages (Figure 5A); 37 of the 38 genes were expressed with an average FPKM > 1 in at least one organ, but 1 had no data because its corresponding gene is absent in the Morex genome [25] (Table S8). Most Snf2 genes were expressed more strongly in developing inflorescences (INF1 and INF2) than in the other organs. The expressed genes were classified into Groups I, II, and III based on their expression patterns among all organs. Expression of the Group I genes was moderate to high and generally exceeded that of the Group II and III genes, most of which were expressed moderately only in INF1, INF2, and developing grain (CAR5) and at low levels in the other organs. Several genes were expressed predominantly in developing floral organs. HORVU.MOREX.r3.3HG0293510 from the Ris1 subfamily was highly expressed in lemma (LEM) and lodicule tissues. Two genes from the Iswi subfamily, HORVU.MOREX.r3.1HG0022440 and HORVU.MOREX.r3.3HG0230070, were abundantly expressed in lodicule.
We also analyzed the expression profiles of barley Snf2 genes in response to biotic and abiotic stresses based on public RNA-seq datasets (Table S9). The vast majority of genes changed expression under at least one of four stress treatments (Figure 5B). Most Snf2 genes were induced in spike in response to Fusarium infection, as well as in young inflorescence under drought stress. In contrast, more genes were suppressed by salt or cold stress. Remarkably, almost all the Snf2 genes were repressed in root under high-salinity treatment. Furthermore, several genes responded to multiple stresses. For example, HORVU.MOREX.r3.3HG0230070 was suppressed by Fusarium infection, drought, and salt stress and induced by cold stress. HORVU.MOREX.r3.4HG0375120 was induced by Fusarium infection and drought stress and repressed by salt and cold stress. HORVU.MOREX.r3.7HG0669610 was induced only by drought stress and repressed by the other three stresses. HORVU.MOREX.r3.2HG0217790 showed decreased expression under both drought and salt stress, and HORVU.MOREX.r3.3HG0271560 was induced in response to Fusarium infection, drought, and cold stress.

Snf2 Gene Family Shows Evolutionary Conservation in Plants
Snf2 proteins, as the core 'motor', have been characterized at the genome-wide level in Arabidopsis [9], tomato [11], rice [10], and sorghum [30]. Here, we identified 38 genes encoding Snf2 proteins in barley, the same as the number identified in sorghum, but fewer than in Arabidopsis (41), tomato (45), and rice (40), indicating that species and genome size differences are not directly related to the number of Snf2 family genes.
Phylogenetic analysis using the Snf2 proteins from barley, Arabidopsis, and rice divided the 38 barley Snf2 family genes into 18 subfamilies, highly consistent with the classification in Arabidopsis [9] and rice [10]. Intriguingly, the Snf2 genes in 14 subfamilies displayed a 1:1:1 orthologous pattern across these species, implying that genes in these subfamilies are more highly conserved than the genes in the other subfamilies. Gene duplication is crucial in genome evolution [31,32]. We identified only five barley gene pairs that were involved in segmental duplication events, similar to the situation in Arabidopsis and rice. Notably, CHR11 and CHR17, in the Iswi subfamily of Arabidopsis, were identified as a segmental duplication pair here. Previous studies showed that these two genes function redundantly [33,34]. Their barley homologs were also subjected to segmental duplication, which implies that they might also function redundantly in barley, as reported in rice [10]. Moreover, the other Arabidopsis segmental duplication pair, CHR12 and CHR23, has also been reported to have functional redundancy [35]. We note that these two genes correspond to a single gene in each of barley and rice. This implies that functional redundancy of the two Arabidopsis genes may be largely due to a segmental duplication event after dicots diverged from monocots. Two rice genes that experienced segmental duplication, CHR741 and CHR746, have three homologs in barley that also experienced segmental duplication, indicating that segmental duplication continued to drive Snf2 family expansion after barley diverged from rice. These results indicate that segmental duplication was the main evolutionary contributor to expansion and functional diversification of the Snf2 gene family after the divergence of dicots and monocots.
Evolution of novel gene functions is a consequence of the interaction between duplication and selection. Our analysis of the barley Snf2 family showed that all segmental duplication pairs underwent purifying selection, which reduces genetic diversity [36], implying that the functions of these duplicated genes have tended to remain conserved rather than diverge. A recent study reported that purifying selection contributed to the functional redundancy of the auxin response factor (ARF) family in Setaria italica [37]. Therefore, we hypothesize that purifying selection restricts differentiation of Snf2 gene functions. Our results are consistent with this hypothesis, since a duplicated barley gene pair that is likely to function redundantly has undergone purifying selection. Furthermore, most Snf2 genes have also experienced purifying selection during barley domestication. This indicates that the Snf2 family is highly conserved in barley evolution. Taken together, these results indicate that the barley Snf2 gene family is evolutionarily conserved.

Characteristics of Barley Snf2 Family Genes
ATP-dependent chromatin remodeling in yeast and humans is involved in nuclear processes as diverse as DNA replication [38], transcriptional regulation [39,40], DNA repair [41], and homologous recombination [42]. Here, all barley Snf2 members were predicted to localize to the nucleus, which corresponds to the reported roles of Snf2 proteins in various DNA events. This implies that several functions of Snf2 proteins are conserved between barley and other species.
The catalytic ATPase is essential for the chromatin-remodeling activity of Snf2 proteins. It is composed of SNF2_N and Helicase_C domains, which are conserved in plants [8,9]. Both domains contain ATP-binding sites: SNF2_N mediates ATP hydrolysis and Helicase_C contributes to ATP-dependent DNA or RNA unwinding. Here, barley Snf2 family members had combinations of 21 domains, in addition to the two typical domains, and the domain composition differed among subfamilies. One Snf2 subfamily member carries a bromodomain, which is suggested to recognize acetyl-lysine residues on histone tails [43]. Two Iswi subfamily members harbor HAND, SANT, and SLIDE domains, which are responsible for regulation of nucleosome spacing via interaction with the linker DNA [44]. Additionally, the Chromo domain (CHROMO) has binding activity for DNA and histones [45,46]. One Chd1 subfamily member and three Mi-2 subfamily members had a Chromo domain. The implication is that functions of barley Snf2 genes are diverse among subfamilies.

Snf2 Family Genes May Play Regulatory Roles in Barley Spike Development
Gene expression profiles provide hints about potential functions. The expression patterns of barley Snf2 family members in different tissues and developmental stages revealed that most genes were abundantly expressed in developing spikes, suggesting that Snf2 family genes may promote cell division. Several Snf2 genes in Arabidopsis regulate flower development processes involved in floral transition [17,18], inflorescence architecture [19,20], and floral organ identity [21]. Seven barley genes were highly expressed during spike development and had diverse expression profiles in spike developmental stages in wild barley OUH602, implying multiple roles in regulating spike development. For instance, a barley Lsh gene (HORVU.MOREX.r3.4HG0338270) had extremely high expression during spike development, indicating that it may contribute to the differentiation of barley spikes. A barley Mi-2 gene (HORVU.MOREX.r3.7HG0658830) was consistently expressed during spike development. Its Arabidopsis ortholog CHR6/PICKLE (PKL) is involved in developmental phase transition and meristem maintenance [47,48], which suggests that the barley gene may be required for floral meristem induction. Expression of a barley ortholog of Arabidopsis BRM (HORVU.MOREX.r3.6HG0543700) increased gradually during spike development. BRM has been implicated in the regulation of flowering time [17,49], inflorescence architecture [19], and floral organ development [21], indicating that its barley ortholog may participate in regulation of floral phase transition and of inflorescence and floral meristem development. Two other barley genes from the Iswi subfamily, which were generated by a segmental duplication identified here, had highly similar expression profiles during spike development. Their Arabidopsis homologs, CHR11 and CHR17, have redundant roles in flowering induction and floral organ identity [33,34,50]. However, one barley Iswi gene (HORVU.MOREX.r3.3HG0230070) had higher expression levels during spike development than the other (HORVU.MOREX.r3.1HG0022440). We suggest therefore that the two barley genes may be sub-functionalized for regulation of spike development. Overall, these results give insight into potential regulatory roles of Snf2 family genes in barley spike development.
Moreover, the present study revealed that a subset of barley Snf2 genes displayed stress-responsive changes in expression. The Arabidopsis PKL mutant, pkl, is hypersensitive to cold stress [51], and its barley homolog HORVU.MOREX.r3.7HG0658830 was induced by cold stress. OsCHR710, a Rad5/16 subfamily gene, was upregulated under drought stress [52], and its barley ortholog (HORVU.MOREX.r3.6HG0559070) was likewise induced by drought stress. Thus, Snf2 genes may also play key roles in stress responses in barley.

Identification of Snf2 Gene Family in Barley
High-confidence protein sequences of the barley Morex genome assembly [26] were downloaded from GrainGenes (https://wheat.pw.usda.gov/GG3/ (accessed on 25 May 2022)) and used as a local protein database. Sequences of SNF2_N (Pfam accession number PF00176) and Helicase_C (PF00271), the typical domains of Snf2 family proteins, were downloaded from the Pfam database (https://pfam.xfam.org/ (accessed on 14 July 2022)). The hidden Markov model (HMM) profiles of these two domains were used as queries against the local barley protein database with the hmmsearch tool provided in HMMER v. 3.3.2 software [53] at an E-value threshold of 1e-5. After removal of short and redundant sequences, the candidate protein sequences were further examined for the SNF2_N (pfam00176; cl37620) and Helicase_C (pfam00271) domains of Pfam v. 33.1 on the NCBI Conserved Domain Database (CDD) platform (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi/ (accessed on 26 July 2022); [54]). Only proteins having both SNF2_N and Helicase_C domains were selected. The Snf2 family members were also identified as above from the barley pan-genome [27] and the wild barley OUH602 genome [29].
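The requirement that a candidate carry both domains can be sketched as a set intersection over two hmmsearch result tables. The sketch assumes the tables were written with the hmmsearch `--tblout` option, in which the first whitespace-separated field is the target name and the fifth is the full-sequence E-value; the function names, the example rows, and the treatment of the threshold as inclusive are illustrative assumptions rather than the pipeline used in this study.

```python
from typing import Iterable, Set


def hits_below_threshold(tblout_lines: Iterable[str],
                         max_evalue: float = 1e-5) -> Set[str]:
    """Collect target names whose full-sequence E-value passes the cutoff."""
    hits = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:  # assumption: threshold treated as inclusive
            hits.add(target)
    return hits


def snf2_candidates(snf2_n_lines: Iterable[str],
                    helicase_c_lines: Iterable[str],
                    max_evalue: float = 1e-5) -> Set[str]:
    """Keep only proteins that hit both domain profiles."""
    return (hits_below_threshold(snf2_n_lines, max_evalue)
            & hits_below_threshold(helicase_c_lines, max_evalue))


# Hypothetical tblout rows (protein IDs and scores are made up).
SNF2_N_ROWS = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "protA - SNF2_N PF00176.1 1.2e-80 270.1 0.0 1.5e-80 269.8 0.0 1.0 1 0 0 1 1 1 1 -",
    "protB - SNF2_N PF00176.1 3.0e-02 12.4 0.1 4.0e-02 12.0 0.1 1.0 1 0 0 1 1 0 0 -",
]
HELICASE_C_ROWS = [
    "protA - Helicase_C PF00271.1 2.0e-20 75.3 0.0 3.1e-20 74.7 0.0 1.0 1 0 0 1 1 1 1 -",
    "protC - Helicase_C PF00271.1 5.0e-18 67.9 0.0 6.0e-18 67.6 0.0 1.0 1 0 0 1 1 1 1 -",
]

if __name__ == "__main__":
    print(sorted(snf2_candidates(SNF2_N_ROWS, HELICASE_C_ROWS)))
```

In this toy input only `protA` passes both searches, so it alone would proceed to the CDD validation step.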

Chromosome Localization, Duplication, and Evolution of Snf2 Genes
The chromosome distribution of all identified Snf2 family genes was determined on the Morex genome sequence from a GFF file (Hv_Morex.pgsb.Jul2020.HC.gff3; [26]). A Circos plot [55] was used to visualize the physical position of each gene. Gene duplication analysis was performed by using local BLAST comparisons with coding sequences. The coding sequences of Arabidopsis and rice Snf2 genes were obtained as above for protein sequences. Gene duplication events were defined by following the criteria of [56]: the alignment should cover >70% of the length of the longer gene; the aligned region should have an identity of >70%; and only 1 duplication event is counted for tightly linked genes. The linked gene pairs were also displayed in a Circos diagram. The ratio of Ka (non-synonymous substitution) to Ks (synonymous substitution) of Snf2 family genes was calculated in the KaKs_Calculator package v. 2.0 by using the YN method [57]; Ka/Ks < 1 indicates purifying selection, Ka/Ks = 1 indicates neutral evolution, and Ka/Ks > 1 indicates positive selection. The protein sequences of the wild barley OUH602 genome assembly were downloaded from the Barley Bioresource Database (http://viewer.shigen.info/barley/download.php (accessed on 12 July 2021)). The pairwise Ka/Ks ratio of each orthologous pair between OUH602 and Morex was also computed in KaKs_Calculator by using the YN method. For these calculations, the coding sequences were aligned in MAFFT v. 7.475 software [58] with the guidance of the corresponding alignment of protein sequences. Multiple alignment files of coding and protein sequences were constructed in PAL2NAL v. 14 software [59] and then imported into KaKs_Calculator.
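The first two duplication criteria above (coverage and identity) can be expressed as a simple filter over pairwise BLAST hits. The sketch below is illustrative: the `Hit` record, its field names, and the example values are hypothetical, the alignment length is assumed to be the BLAST-reported value, and the third criterion (counting one event for tightly linked genes) is deliberately not implemented.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Hit:
    """One pairwise BLAST hit between two coding sequences (hypothetical record)."""
    query: str
    subject: str
    identity_pct: float   # percent identity of the aligned region
    aln_length: int       # alignment length in bp
    query_len: int        # full length of the query CDS in bp
    subject_len: int      # full length of the subject CDS in bp


def is_duplicate_pair(hit: Hit, min_cov: float = 0.70,
                      min_ident: float = 70.0) -> bool:
    """Apply the >70% coverage of the longer gene and >70% identity criteria."""
    if hit.query == hit.subject:
        return False  # ignore self-hits
    longer = max(hit.query_len, hit.subject_len)
    coverage = hit.aln_length / longer
    return coverage > min_cov and hit.identity_pct > min_ident


def duplicated_pairs(hits: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return unordered gene pairs that satisfy both criteria."""
    return {tuple(sorted((h.query, h.subject)))
            for h in hits if is_duplicate_pair(h)}


# Hypothetical hits for illustration only.
EXAMPLE_HITS = [
    Hit("geneA", "geneB", 85.0, 2900, 3000, 3300),  # coverage 0.88, passes
    Hit("geneB", "geneA", 85.0, 2900, 3300, 3000),  # reciprocal hit, same pair
    Hit("geneA", "geneC", 92.0, 1000, 3000, 2800),  # coverage 0.33, fails
    Hit("geneD", "geneE", 65.0, 2500, 2600, 2700),  # identity too low, fails
]

if __name__ == "__main__":
    print(duplicated_pairs(EXAMPLE_HITS))
```

Storing each pair as a sorted tuple means reciprocal hits collapse into a single candidate duplication pair.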

Phylogenetic Reconstruction
The protein sequences of Arabidopsis and rice Snf2 family members used for phylogenetic analysis were collected from The Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org/ (accessed on 6 May 2021)) and the Rice Genome Annotation Project (RGAP; http://rice.uga.edu/ (accessed on 25 September 2021)), respectively. The full-length sequences of Snf2 proteins from barley (Morex), Arabidopsis, and rice were aligned by the ClustalW algorithm with default parameters [60], as implemented in MEGA7 software [61]. The phylogenetic tree was reconstructed from the multiple sequence alignment file by the maximum likelihood algorithm with 1000 bootstrap replicates in IQ-TREE2 v. 2.1.3 software [62]. The best-fit model for tree construction was selected automatically by ModelFinder [63] within IQ-TREE2. To validate the search results among barley accessions, we generated a phylogenetic tree by the neighbor-joining (NJ) method in the Clustal Omega (CLUSTALO) alignment program [64], including the Arabidopsis and rice Snf2 family members to confirm the subfamily to which each barley protein belongs.

Population Genetics Analysis
The coding sequences of each Snf2 gene across twenty-one barley accessions were aligned using ClustalW in CLC sequence viewer v7.8.1 (www.qiagenbioinformatics.com (accessed on 11 December 2022)). The single nucleotide polymorphisms (SNPs) in Snf2 genes among barley accessions were identified from the multiple alignment files and extracted by using SNP-sites software [65]. SnpEff software was employed to predict the effects of nucleotide variations on protein function and to output the annotated VCF file [66]. To determine the evolutionary divergence, SNPs located within barley Snf2 genes were used for principal component analysis (PCA) and population structure analysis. A phylogenetic tree for Snf2 gene-associated SNPs was constructed based on the PCA result, and population structure was evaluated using STRUCTURE software (http://pritch.bsd.uchicago.edu/structure.html (accessed on 12 December 2022); [67]) with predefined K values (the putative number of genetic groups) ranging from 1 to 8. The most likely value of K was indicated by the log probability of the data (LnP(D)) and an ad hoc statistic, ∆K, based on the rate of change of LnP(D) between successive K values [68].
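To make the ∆K statistic concrete, the sketch below computes it from replicate LnP(D) values in one commonly used formulation: the absolute second-order difference of the mean LnP(D) across successive K values, divided by the standard deviation of LnP(D) at K. This is an illustrative reading of the statistic and not necessarily the exact procedure of [68] or of this study; the function name and the replicate values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp_runs: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute delta-K for every K that has both K-1 and K+1 available.

    lnp_runs maps each K to the LnP(D) values of its replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the smallest and largest K
        sd = stdev(lnp_runs[k])  # sample standard deviation; needs >= 2 runs
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical LnP(D) values from two replicate runs per K.
EXAMPLE_RUNS = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}

if __name__ == "__main__":
    scores = delta_k(EXAMPLE_RUNS)
    best_k = max(scores, key=scores.get)
    print(scores, "most likely K:", best_k)
```

In this toy input the improvement in LnP(D) levels off sharply after K = 2, so ∆K peaks there.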

Expression Profile Analysis
The public RNA-seq datasets of 14 tissue samples provided by the IPK barley BARLEX server (http://barlex.barleysequence.org (accessed on 29 October 2021); [25]) were used to explore the expression patterns of barley Snf2 genes in different tissues and at different developmental stages, namely root tissues (ROO), seedling shoots (LEA), etiolated seedlings (ETI), epidermal strips (EPI), developing inflorescence tissues (INF), rachis (RAC), third internode of tillers (NOD), lemma (LEM) and lodicule (LOD) dissected from inflorescences, developing grains at 5 and 15 days after anthesis (CAR), and senescing leaves (SEN). Gene expression levels were estimated as fragments per kilobase of transcript per million mapped reads (FPKM) and were averaged where replicated samples were available. In addition, several biotic and abiotic stress expression datasets were downloaded from the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra (accessed on 12 December 2022)) to investigate the expression profiles of these genes under stress. All reads from the above datasets were mapped to the Morex v3 genome [26] and processed into FPKM values. A heatmap of the gene expression data was constructed in the R package pheatmap v. 1.0.12 from log2-transformed mean FPKM values, and genes were clustered according to their expression patterns in the heatmap.
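The expression measures described above can be illustrated with a short worked example covering the FPKM definition, averaging of replicates, the log2 transformation, and the "average FPKM > 1 in at least one organ" criterion used in the Results. The sketch is not the pipeline used in this study; the function names and example numbers are hypothetical, and the pseudocount of 1 added before the log2 transformation is an assumption, since the text does not state how zero values were handled.

```python
import math
from typing import Dict, List


def fpkm(fragments: int, transcript_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def mean_fpkm(replicates: List[float]) -> float:
    """Average FPKM across replicated samples of one tissue."""
    return sum(replicates) / len(replicates)


def log2_transform(value: float, pseudocount: float = 1.0) -> float:
    """log2(FPKM + pseudocount); the pseudocount is an illustrative assumption."""
    return math.log2(value + pseudocount)


def expressed_in_any_tissue(tissue_means: Dict[str, float],
                            threshold: float = 1.0) -> bool:
    """True if the mean FPKM exceeds the threshold in at least one tissue."""
    return any(value > threshold for value in tissue_means.values())


if __name__ == "__main__":
    # Hypothetical gene: 100 fragments on a 2-kb transcript, 10 million mapped.
    value = fpkm(100, 2000, 10_000_000)
    tissue_means = {"INF1": mean_fpkm([value, 7.0]), "ROO": mean_fpkm([0.2, 0.4])}
    print(value, tissue_means, expressed_in_any_tissue(tissue_means))
    print({t: round(log2_transform(v), 2) for t, v in tissue_means.items()})
```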