Genome-Wide Identiﬁcation and Bioinformatics Analysis of Auxin Response Factor Genes in Highbush Blueberry

: Auxin response factors (ARFs) are a transcription factor family that regulates the expression of auxin phase-responsive genes. Here, we performed a genome-wide investigation of the tetraploid blueberry ( Vaccinium corymbosum cv. ‘Draper’) genome sequence. Physical and chemical properties, phylogenetic evolution, gene structure, conservative motifs, chromosome location, and cis-acting elements of blueberry ARF genes were comprehensively evaluated. A total of 70 blueberry ARF genes ( VcARF ) were found in its genome, which could be divided into six subfamilies. VcARF genes were unevenly distributed on 40 chromosomes and were observed to encode protein sequences ranging in length from 162 to 1117 amino acids. Their exon numbers range from 2 to 22. VcARF promoter regions contain multiple functional domains associated with light signaling, aerobic metabolism, plant hormones, stress, and cell cycle regulation. More family members of VcARF genes were discovered in blueberry than in previously studied plants, likely because of the occurrence of whole-genome duplication and/or tandem duplication. VcARF expression patterns were analyzed at different stages of fruit development, and VcARF3 , VcARF4 , VcARF14 , VcARF37 , and VcARF52 were observed to play important roles. VcARF3 and VcARF4 appeared to function as repressors, while VcARF14 acted as an essential factor in fruit ﬁrmness differences between ﬁrm and soft ﬂesh cultivars.


Introduction
The ubiquitous involvement of auxin in almost all aspects of plant development and in plant responses to the environment, abiotic stress, and growth tropisms underlines its importance [1]. The initiation and regulation of these processes are mostly accomplished by the expression and regulation of auxin-related genes [2], which have received considerable research attention. These genes include the Skp1-Cullin-F-Box protein complex, containing the transport inhibitor response 1 protein (SCF TIR1 ) auxin receptor and its related auxin signaling F-box protein receptor family members, and two families of partially redundant proteins: the auxin response factors (ARFs) and their cognate auxin/indole-3-acetic acid (Aux/IAA) repressors [3][4][5]. ARFs activate or inhibit the expression of auxin response genes by binding to auxin-responsive elements (AuxREs) in the promoter region and recruit Aux/IAA to perform their functions.
As transcription factors, ARFs possess a modular structure and consist of three major domains: an amino-terminal DNA-binding domain (DBD), a middle domain, and a carboxy-terminal PB1 (Phox and Bem1) domain contained within a region previously known as domain III/IV [2,6,7]. The B3-type DBD is flanked by dimerization domains (DDs), at least in ARF1 and ARF5, and also contains a Tudor-like ancillary domain. The DBD recognizes and binds to the TGTCTC element (AuxRE) in the promoter region of auxin-responsive genes [8]. The DD mediates the dimerization of ARF1 and ARF5, which is essential for ARF5 cooperative binding to target DNA [9]. The function of the Tudor-like ancillary domain is unknown, but it might be involved in an interaction with the DD. The middle domain of ARFs either activates or represses transcription [10,11]; repressor domains are enriched in proline, serine, and threonine, and activator domains are enriched in glutamine. This basis has been used to identify and classify ARFs from many plant species through genome-wide analyses.
Most information concerning ARF function derives from both forward genetic analysis and reverse genetic phenotypic analysis in Arabidopsis, which showed that ARF functional redundancy is universal [12,13]. The Arabidopsis genome contains 22 full-length ARF genes and one incomplete ARF23 [14]. This is thought to be a pseudogene because it contains a stop codon in the DBD, leading to a partially encoded protein [15]. AtARF1 mutations do not themselves confer phenotypes but can enhance the phenotypic traits of AtARF2 mutations [16]. AtARF2 mutations result in late flowering leaf senescence, and an increased seed number. ARF3/ARF4 plays an important role in the development of reproductive and vegetative tissues, and AtARF3 also has a function in floral meristem determination, floral organ pattern formation, and pistil development [17,18]. AtARF6 and AtARF8 encode a pair of functionally redundant transcription factors that regulate the pistil and stamen development of immature flowers. The stamen filaments of arf6 and arf8 double mutants are shorter than those of the wildtype and show delayed anther dehiscence, leading to female sterility [19]. Although almost no phenotypic change was detected between AtARF7 or AtARF19 single-mutant and wildtype plants, AtARF7 and AtARF19 double mutants showed inhibited adventitious root formation, a reduced number of lateral roots, and no leaf cell enlargement, indicating that the two genes are complementary in function [20].
ARFs were also reported to be involved in the fruit ripening process, with most fruit development studies being carried out in tomato. Of the 21 ARF tomato genes [21], SlARF2 encodes the main regulatory factor that controls fruit ripening through ethylene signals and biosynthesis [22], and SlARF4 functions in the accumulation of fruit chlorophyll and affects early fruit development by regulating glucose metabolism [23]. SlARF9 regulates cell division at an early stage of fruit development [24], while SlARF10 controls the formation of chlorophyll, starch synthesis, and sugar accumulation in fruits [25]. However, the response of most ARF genes to environmental and hormonal signals has been poorly studied, and there are still unidentified key factors in the ARF expression regulatory network. The temporal and spatial expression of ARF genes is also unclear.
Following advances in sequencing technology, plant genome sequencing is now more affordable. Thus, several ARF plant gene families have been identified at the wholegenome level, including, but not limited, to rice (Oryza sativa) [14], maize (Zea mays) [26], tomato (Solanum lycopersicum) [21], apple (Malus domestica) [27], sweet orange (Citrus sinensis) [28], grape (Vitis vinifera) [29], longan (Dimocarpus longan) [30], and strawberry (Fragaria vesca) [31]. However, the genome-wide identification of ARF genes has not been performed in blueberry (Vaccinium spp.). In comparison with fruit trees, the genome resources of blueberry are limited. The first whole-genome sequence of the blueberry cultivar 'W8520' was released by Bian et al. in 2014 [32], but its assembly quality and usability were limited. Although Gupta et al. [33] released another assembly based on the genome sequencing of the blueberry cultivar 'O'Neal' in 2014, 27.35% of genome sequences contained gaps and mistakes. More recently, Colle and colleagues published a high-quality haplotype-phased blueberry genome [34] that was assembled from the highbush blueberry cultivar 'Draper'. This genome has enabled several molecular and bioinformatics studies of blueberry to be conducted and allows the opportunity to identify transcription factor genes throughout the genome.
In the current study, genome-wide VcARFs were identified from the highbush blueberry 'Draper' genome. By analyzing the number of family members, gene structure, and amino acid sequences, we clarified the structural and evolutionary characteristics of the blueberry ARF gene family. The expression patterns of different VcARF genes were also estimated from transcriptomic data, and pivotal VcARFs that strongly correlate with blueberry fruit development were screened. Moreover, the transcript abundance of key VcARF genes was validated in firm flesh cultivar 'Star' and soft flesh cultivar 'O'Neal' at a range of developmental stages. Our findings provide a reference for the genome-wide identification of transcription factor genes and a study on the regulation of VcARF genes in blueberry fruit firmness.

Plant Materials
The fruits of two southern highbush blueberry cultivars 'Star' (firm flesh) and 'O'Neal' (soft flesh) were collected at four development stages following the sampling strategy of Zifkin et al. [35]. For each cultivar, 50 healthy fruits for every stage were sampled from different fruit setting positions of three plants. A total of 30 fruits of similar sizes at each fruit stage were screened [35]. The fruits were immediately frozen in liquid nitrogen after collection and stored at −80 • C until required.

RNA Isolation and Reverse Transcription
Total RNA was extracted using a modified CTAB method as described by Chang et al. [36]. The quality and concentration of blueberry RNA were determined using agarose gel electrophoresis and spectrophotometric analyses, respectively. cDNA was synthesized from 1 µg RNA using the PrimeScript™ RT reagent Kit with gDNA Eraser (Takara Biotechnology Co., Ltd., Dalian, China) following the manufacturer's instructions. Synthesized 1st-strand cDNAs were diluted 10-fold and stored at −80 • C for further use.

Identification of the ARF Gene Family in Blueberry
A local BLAST database for genome and amino acid sequences of the highbush blueberry 'Draper' was constructed using BLAST-2.9.0+ [37]. A. thaliana ARF peptide sequences were downloaded from the plant transcription factor database at Peking University (Beijing, China) (http://planttfdb.gao-lab.org/, accessed on 1 April 2020). Potential blueberry ARF transcription factors were retrieved from deduced amino acid sequences using BLASTp in the BLAST toolkit [37]. Query hits with an e-value ≤ 1e − 20 were kept for further analyses. Profile hidden Markov models of the ARF conserved domain (PF06507) were downloaded from the Pfam website (http://pfam.xfam.org/, accessed on 1 April 2020) [38]. HMMER3.3 [39] was used to search for homologous ARF proteins in blueberry based on the ARF conserved domain. Other parameters of HMMER were set as default. Candidate ARF genes obtained from BLASTp and HMMER results were merged, and redundant sequences were removed. Non-redundant results were submitted to the NCBI conserved domain database (https://www.ncbi.nlm.nih.gov/cdd, accessed on 3 April 2020) for domain analysis, and the sequences with ARF annotation results were retained for sequential analyses. Prediction and analysis of physicochemical properties of all ARF protein sequences in blueberry were made using the Expasy website (https://web.expasy.org/protparam/, accessed on 4 April 2020).

Analyses of Amino Acid Sequences and VcARF Gene Structure
MAFFT v7.453 [40] and IQ-TREE1.6.12 [41] were used to align protein sequences of Arabidopsis and blueberry and to construct a phylogeny tree based on the maximum likelihood method, respectively. Ultrafast bootstrapping was carried out 1000 times to test the accuracy of the phylogeny tree. Other parameters of IQ-TREE were set as default. Identified VcARF sequences were obtained from the 'Draper' genome using a customized Perl script and aligned with MAFFT v7.453. A phylogenetic tree was built with IQ-TREE1.6.12. Gene structural analysis was performed using the gene structure display server online tool (http://gsds.gao-lab.org/index.php, accessed on 6 April 2020). The structure of coding regions, introns, and non-coding regions of all selected ARF genes in the blueberry genome were plotted, and the phylogenetic relationship of VcARF genes was drawn.

Chromosome Location of VcARFs and Conservative Motif Analysis of VcARF Proteins
Genomic location information of VcARF genes was extracted from annotation general feature format files using Shell script. VcARF genes were plotted onto the chromosomes with the chromPlot R package [42]. MEME (https://meme-suite.org/meme/doc/meme. html, accessed on 5 April 2020) was used to analyze the conserved motifs of all VcARF protein sequences. The number of motifs was set as 15, and other parameters were kept as default.

Identification of Cis Elements on VcARF Promoters
A Perl script was used to retrieve 2 kbp sequences upstream of the transcription start site of VcARF genes. Cis elements in the promoter regions were predicted by PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 6 April 2020). The categories and number of those cis elements were calculated and drawn using Microsoft Excel 2013 (Microsoft, Redmond, WA, USA).

Expression Pattern of VcARF Genes during Fruit Development
The gene expression profile of different plant organs was obtained from a study by Colle et al. (2019) [34]. A heat map was generated by ggplot2 [43] based on the fragments per kilobase of exon per million mapped fragments (FPKM) values. The expression level of VcARF transcripts in fruits of 'Star' and 'O'Neal' cultivars at four development stages was determined by quantitative PCR on an ABI StepOne Plus TM Real-Time fluorescence qPCR system (Applied Biosystems Co., Ltd., Beijing, China). Specific primers (Table 1) were designed using Primer Premier 5 software (PREMIER Biosoft International, Palo Alto, CA, USA). The experiments were conducted on three biological replicates, and the results were normalized by using the VcGAPDH gene. Gene expression data were analyzed with a relative quantification method (2 −∆∆Ct ) [44]. Statistical analyses were carried out using SPSS software v.18.0 (IBM, Armonk, NY, USA). Table 1. VcARF genes and their primers used for quantification PCR in the current study.

Identification of the Blueberry VcARF Gene Family
We predicted a total of 70 VcARF genes in the blueberry genome and named the genes VcARF1-70, according to their physical location on the chromosome. VcARF gene sequence lengths were highly varied, ranging from 1831 bp (VcARF47) to 17,346 bp (VcARF25). VcARF protein sequences ranged from 162 (VcARF18) to 1117 (VcARF50) amino acids, with molecular weights of 18.7-124.7 kDa and isoelectric points of 4.76-9.34 (Table 2).

Phylogenetic Analysis of VcARF Amino Acids and Structure of VcARF Genes
To explore the evolutionary relationships of ARF proteins among Arabidopsis and blueberry, we constructed a phylogenetic tree based on the alignments of 22 Arabidopsis ARFs (AtARFs) and 70 blueberry ARFs (VcARFs). All ARF proteins could be divided into six groups, from clade I to clade VI, according to the classification pattern of Arabidopsis. Both Arabidopsis and blueberry ARFs were present in each group. Clade VI (Figure 1, green branches) contained most ARFs, including 12 AtARF and 17 VcARF members. Clade IV (Figure 1, orange branches) contained 23 ARFs, of which only three were AtARFs. Clade III (Figure 1, blue branches) included the smallest number of ARF transcription factors, with only three VcARFs and one AtARF (AtARF5). Gene structural analysis provided further evidence to support the phylogenetic topology groupings of multigene families. To gain further insights into the structural diversity of blueberry ARF genes, we analyzed the exon/intron organization of full-length cDNAs with corresponding genomic DNA sequences of individual ARF genes (Figure 2). Similar ARF classification patterns to the phylogenetic tree were observed, with VcARFs clustering into six groups according to their gene structure. Most closely related VcARFs within the same groups shared similar gene structures in terms of either intron numbers or exon lengths. Taking clade IV as an example, most genes in this group had two to four exons, with the exception of VcARF18, VcARF25, VcARF42, VcARF56, and VcARF68. These genes contained no untranslated regions or longer introns. The gene structure appeared to be more variable in clades I, II, and VI, which had the largest number of exon/intron structural variants.

Chromosome Location and Conservative Motif Analysis
Next, we linked 70 VcARF genes onto a chromosome-scale genome assembly of the tetraploid highbush blueberry genome. All VcARF genes were mapped onto 40 linkage groups (Figure 3). The distribution of VcARFs on each chromosome was uneven, ranging from one to four. No VcARF was mapped onto chromosomes 4, 7, 16, 31, 32, 36, 45, or 48. Chromosomes 23 and 40 each had four VcARFs. VcARFs clustered in adjacent regions on chromosomes containing more than two VcARFs. Conservative motifs of the blueberry VcARF protein were analyzed using the MEME online database. Identified motifs varied in length from 11-41 amino acid residues. A total of 14 conserved motifs were identified from VcARF members in clade II (Figure 4, brown branches) and clade III (Figure 4, blue branches), with VcARF49 as an exception. The same number of motifs was identified in clades II and III, with just a difference in their order. Neither motif 3 nor motif 13 were identified in VcARF49. Some conservative motifs were absent from clades I (Figure 4, red branches), IV (Figure 4, orange branches), and VI (Figure 4, green branches), especially for VcARF18, VcARF25, VcARF42, VcARF56, and VcARF68, in which only three conserved motifs were identified. The position and number of conserved motifs among VcARF members in clade I were highly varied, showing the most motif differences among all groups. Only five conserved motifs were found in VcARF62, which is less than that of other VcARFs in this group.

Prediction of Cis-Acting Elements in Promoters of Gene Family Members
To identify potential cis-acting elements of VcARFs, we analyzed 2 kbp promoter regions using bioinformatics analysis. This indicated that most cis elements of VcARF promoters belong to the responsive elements of plant hormones and environmental factors. In addition to two core elements (a TATA box and CAAT box), light responsive elements and MYB binding sites were identified in all VcARF genes ( Figure 5). Oxygen-related metabolism and methyl jasmonate-responsive elements accounted for 88.6% and 75.7%, respectively, of VcARF cis-acting elements. Gibberellins and auxin-related responsive elements were also common in VcARF promoter regions, with 52.8% of VcARF genes containing the latter. Only VcARF62 and VcARF67 contained wound-responsive elements. Few cis elements related to development, i.e., cell cycle regulation, circadian control, and endosperm growth, were found in VcARF promoter regions. However, four cell cycle regulation and two circadian control responsive elements were found in the promoters of VcARF15 and VcARF63, respectively.

Expression Pattern of VcARF Genes at Three Development Stages of 'Draper' Fruits
FPKM values of transcripts in different tissues of blueberry 'Draper' were retrieved from Colle et al. [34]. VcARF FPKM values at green mature (grnfrt), pink mature (pinkfrt), and mature (ripe) stages were extracted and used to construct a heat map ( Figure 6). Dramatic differences were found among the expression patterns of those VcARF genes. A total of 58 VcARF genes showed low expression and few changes in expression during the fruit ripening process. Genes that clustered into the same clade in Figures 1 and 2 displayed similar expression patterns. VcARF4 and VcARF52 had the highest expression level in green mature fruit, showing a transcript abundance that decreased with fruit ripening, leading to an obvious downregulation during the loss of fruit firmness. Other genes such as VcARF14 and VcARF37 in clade I (AtARF7/19-like), VcARF16 and VcARF66 in clade II (AtARF6/8-like), and VcARF3 in clade VI (AtARF1/2-like) had higher transcript levels in green mature fruit compared with the other two stages.

Gene Expression during 'Star' and 'O'Neal' Fruit Development and Ripening
Because the expression level of VcARF genes varied according to different maturation stages, we speculated that some genes had an effect on fruit firmness. VcARF genes showing significant expression differences in the ripening process of blueberry 'Draper' were therefore selected to have their expression profiles validated at four stages of development of firm and soft flesh blueberries. Hence, the expression patterns of VcARF3, VcARF4, VcARF14, VcARF37, and VcARF52 were evaluated in the fruit of 'Star' and 'O'Neal' using qPCR (Figure 7). Moderate changes were observed between the expression profiles of these five VcARF genes in 'Draper' transcriptomic data and qPCR results. The expression patterns of VcARF3 and VcARF37 in 'Star' and 'O'Neal' were similar to those in 'Draper'. The expression of VcARF4 and VcARF14 in firm flesh cultivar 'Star' decreased sharply from stage S5, which showed similar FPKM value changes to those observed in 'Draper'. However, the expression levels of these two genes were almost unchanged in the soft flesh cultivar 'O'Neal', particularly for VcARF14. By contrast, a slight change in the expression level of VcARF52 was seen in 'Star' compared with 'O'Neal'.   [14], 22 in S. lycopersicum [21], and 17 in V. vinifera [29]. In the current study, we identified 70 VcARF genes in a tetraploid blueberry genome, suggesting that the blueberry ARF gene family is expanded compared with other genomes. Long evolutionary periods typically lead to the presence of multiple members of a specific gene family [14,45]. Moreover, a recent whole-genome duplication event has been reported in the blueberry genome, in which tandem duplications may have contributed to metabolic diversity or gene functionalization [34]. Gene duplication has been shown to have a non-negligible effect on the formation of gene families [46], so we speculate that whole-genome duplication and tandem duplication are the main contributing events for the expansion of the blueberry ARF gene family.
Phylogenetic analysis indicated that many blueberry VcARF family members share high levels of similarity, while the adjacent chromosome locations of some VcARF genes provide further support for tandem duplication. Duplicated ARF genes exhibited different expression patterns, probably because of the lack of intense evolutionary selection pressure and the need for diversification [47,48]. Previous studies demonstrated that phylogenetically close ARFs may also be genetically linked, while phylogenetically distant ARFs may not be [48]. However, we found no correlation between phylogenetic distance and genetic link in blueberry ARFs. Because ARFs are transcription factors that regulate the expression of auxin response genes, those VcARF genes would be expected to contain AuxREs in their promoter regions. However, only 37 VcARF genes contained AuxREs in their promoter regions, of which 12 had more than two AuxREs. This indicates that the underlying mechanism for auxin inducibility of VcARF genes needs to be further elucidated in blueberry.

The Potential Contribution of VcARF Genes to Firmness Divergence between Firm Flesh and Soft Flesh Blueberries
Previous studies on the ripening process of tomato [21], grape [29], and apple [27] suggested that ARF genes were involved in the regulation of fruit development, especially with respect to floral organ and fruit development. AtARF1 appeared to function as a transcriptional repressor in planta [16], while ARF1 repressed auxin-induced gene expression in transient assays [11], and arf1 mutations increased the transcription of Aux/IAA genes in Arabidopsis flowers [16]. Our phylogenetic analysis results showed that VcARF3 and VcARF4 had the highest amino acid similarity to AtARF1. Moreover, transcript abundances of VcARF3 and VcARF4 decreased greatly at stage S5 and changed slightly during blueberry fruit ripening, suggesting that VcARF3 and VcARF4 function during the early stages of fruit maturation. This indicates a similarity in the roles of VcARF3 and VcARF4 with that of AtARF1. We detected the significant downregulation of VcARF3 expression during ripening in the 'O'Neal' cultivar, and a significant early expression decline of VcARF4 in 'Star'. Therefore, both VcARF3 and VcARF4 have a likely repressor function during blueberry fruit ripening, with VcARF3 having a dominant effect on fruit firmness.
VcARF14 and VcARF37 are both phylogenetically related to Arabidopsis AtARF7 and AtARF19. Because of the reported high level of similarity between ARF7 and ARF19 proteins, the expression of one ARF allows for functional compensation for the loss of the other in arf7 and arf19 single mutants [12]. In tomato, SlARF7 acts as a negative regulator of fruit set until pollination and fertilization have taken place and moderates the auxin and gibberellin response during fruit growth. RNA interference-silenced SlARF7 expression leads to parthenocarpic fruit growth, indicating that SlARF7 acts as a negative regulator of both fruit set and fruit development [24]. In our study, the expression of VcARF14 in 'Star' showed significant differences with that in 'O'Neal', while an overall low transcript abundance and almost no expression changes were observed for VcARF14 during fruit ripening in 'O'Neal'. VcARF37 showed similar expression patterns between 'Star' and 'O'Neal'. Based on the protein sequence similarity between VcARF14 and AtARF7, we speculate that VcARF14 may also be a negative regulator of fruit ripening. The fruit firmness of 'Star' and 'O'Neal' decreased sharply from stage S5, but that of 'Star' was significantly greater than that of 'O'Neal' at all four stages, especially at stage S5 and S6. The higher expression levels of VcARF14 in 'Star' compared with 'O'Neal' may account for this difference in firmness, but further studies should investigate the regulatory mechanisms of VcARF14 during fruit softening.
VcARF52 was identified as an AtARF1/2-like transcription factor in clade VI ( Figure 1) and has a similar sequence to that of Arabidopsis ARF2. ARF2 is most similar to ARF1, with these two proteins having both distinct and overlapping functions in A. thaliana [16]. ARF2 was reported to regulate leaf senescence and floral organ abscission independently of the ethylene and cytokinin response pathways [16]. ARF2A is a recognized auxin signaling component that may interconnect signals of ethylene and additional hormones to co-ordinate and initiate the complex ripening process of tomato [22]. Over-expressing ARF2A in tomato resulted in blotchy ripening in which certain fruit regions reddened and showed accelerated ripening [22]. This suggested that SlARF2A has a positive impact on tomato ripening. However, we found that the expression of VcARF52 in 'O'Neal' decreased with fruit ripening, while its expression in 'Star' did not significantly change. Therefore, VcARF52 probably accelerated the softening process at an early stage of blueberry fruit development but had little effect at later stages. Although ARF2 is thought to regulate fruit firmness by affecting fruit development and ripening, phylogenetic studies indicate that ARF1 and ARF2 diverged prior to the monocot-dicot split [49,50] so would have had ample time to evolve distinct biochemical activities. Thus, VcARF52 may not conform to the canonical auxin response model.
Taken together, our findings suggest that ARF genes play essential roles in the ripening process of blueberry fruit. We propose that future studies should focus on VcARF3, VcARF4, VcARF14, and VcARF52 to elucidate their function in determining differences in fruit firmness between firm and soft flesh cultivars.