Identiﬁcation and Characterization of the Basic Helix-Loop-Helix Transcription Factor Family in Pinus massoniana

: The basic helix-loop-helix (bHLH) protein transcription factor family is the most widely distributed transcription factor family in eukaryotes. Members of this family play important roles in secondary metabolic biosynthesis, signal transduction, and plant resistance. Research on the bHLH family in animals is more extensive than that in plants, and members of the family in plants are classiﬁed according to the classiﬁcation criteria for those in animals. To date, no research on the bHLH gene family in Pinus massoniana (Masson pine) has been reported. In this study, we identiﬁed 88 bHLH genes from four transcriptomes of Masson pine and performed bioinformatics analysis. These genes were divided into 10 groups in total. RT-PCR analysis revealed that the expression levels of the six genes increased under abiotic stress and hormone treatments. These ﬁndings will facilitate further studies on the functions of bHLH transcription factors.


Introduction
During the growth and development of plants in nature, to adapt to and resist various biotic and abiotic stresses, a set of adaptive mechanisms involving various physiological and biochemical reactions are employed [1][2][3]. These mechanisms are usually involved in the transcriptional regulation of gene-specific expression, which plays a very important role in plant responses to environmental stimuli [4]. Transcription factors (TFs) are the most important class of regulatory factors in plants and can maintain normal life activities in these organisms [5]. In recent years, a large number of TFs related to biotic and abiotic stresses have been isolated from higher plants. These TFs are widely involved in regulating the expression of genes related to drought, high salt, high temperature, low temperature, hormone and pathogen responses. In plants, the basic helix-loop-helix (bHLH) family is the second largest TF family after the MYB family [4]. Members of this family not only are generally involved in aspects of plant growth and metabolism, including light morphogenesis, light signal transduction, and secondary metabolism, but also play an important role in plant responses to adversity [6].
bHLH family members contain two highly conserved and different functional domains, namely, the basic region and the helix-loop-helix (HLH) region, with a total of approximately 50-60 amino acids [7]. The basic region consists of approximately 15 amino acids, is located at the N-terminus of the domain, and can recognize and bind the E-box and G-box. The HLH region is composed of 40-50 amino acids and is located at the C-terminus of the domain. It can form dimers of two HLH proteins and regulate the expression of downstream target genes [8][9][10].

Phylogenetic and Protein Domain Analyses of PmbHLH Proteins
The bHLH protein sequences of Masson pine and Arabidopsis were subjected to multiple alignment using the ClustalW algorithm in MEGA7 [28] and used to build a phylogenetic tree based on the Maximum likelihood (ML) method, a bootstrap test performed with 1000 replicates and Jones-Taylor-Thornton (JTT) model in IQ-TREE. Then, the phylogenetic tree was edited for visualization purposes with the online software EvolView (https://www.evolgenius.info/evolview/#login). In order to better determine the distribution of conserved motifs and grouping of bHLH proteins, the online program MEME (http://meme-suite.org/tools/meme) was used to analyze the conserved motifs of bHLH protein sequences of A. thaliana and Masson pine, with the number of motifs set to 20. Finally, TBtools [29] was used to draw the phylogenetic tree and conserved domain combination diagram.

Subcellular Localization Analysis
The subcellular localization of PmbHLH proteins was predicted and analyzed with CELLO (http: //cello.life.nctu.edu.tw/) and PSORT (https://psort.hgc.jp/). To verify the predicted results, we designed primers for the open reading frame (ORF) of PmbHLH44 (MT989428) in group III and obtained coding sequences from cDNA, which was reverse transcribed from RNA extracted from 15 years-old Masson pine twigs. The primer sequences can be found in Supplementary File S1. The obtained sequences were cloned into pBI121 green fluorescent protein (GFP) C-terminal fusion expression vector, transferred into Agrobacterium tumefaciens GV3101 and injected into Nicotiana benthamiana leaves for transient expression. After dark cultivation for 24 h, images were taken with a LSM710 confocal laser-scanning microscope (Zeiss, Jena, Germany).

Transcriptional Profile Analysis
The expression levels of bHLH genes of Masson pine were quantified by transcriptome sequencing data (RNA-seq) collected after pine wood nematode inoculation. The pine wood nematode inoculation transcriptome includes five sets of data: 0, 3, 10, 20, and 35 days after treatment. Through ID conversion and data extraction, the gene expression level was calculated by determining the expected fragments per transcript per million mapped reads (FPKM). TBtools software was used to create bHLH gene expression heat maps of Masson pine.

Abiotic Stress and Hormone Treatment of Plant Materials
Two-year-old Masson pine seedlings obtained from the seed orchard of the Baisha state-owned forest farm, Shanghang, Fujian Province, China, were used in this study. To study the expression level of PmbHLH, we subjected the Masson pine seedlings to six treatments. The six treatments included abiotic stress and hormone treatments, namely, 15% PEG 6000-osmotic stress, mechanical injury, 10 mM H 2 O 2 , 100 µM methyl jasmonate (MeJA), 500 µM ethephon (ETH) and 1 mM salicylic acid (SA). The osmotic stress treatment was an immersion treatment, the mechanical damage treatment involved cutting the upper half of the needle, and the rest of the treatments were spray treatments. Three uniformly growing seedlings were selected for each treatment as three biological replicates. The needles were sampled at 0, 3, 6, 12, 24 and 48 h, immediately frozen in liquid nitrogen and stored at −80 • C. Samples collected at 0 h without any treatment were used as controls.

RNA Extraction and qRT-PCR Analysis
The tissue samples were ground into a fine powder in liquid nitrogen. Total RNA was isolated from all samples using an RNAprep Pure Plant Plus Kit (DP441, Tiangen Biotech, Beijing, China) following the manufacturer's instructions and stored at −80 • C. The 1st strand cDNA synthesis kit (11141, Yeasen Biotech, Shanghai, China) was used to reverse transcribe RNA into cDNA, and the cDNA was stored at −80 • C. Primer 5.0 was used to design primers for qRT-PCR, and the amplicon size was 120-200 bp. The sequences of the internal control gene alpha-tubulin (TUA) (KM496535.1) The conditions for amplification were 10 min of denaturation at 95 • C, followed by 40 cycles of 95 • C for 15 s, 58 • C for 30 s, and 72 • C for 30 s, after which a melting curve was produced at 60-95 • C. Finally, the relative expression levels of genes were calculated using the 2 − Ct method, and t-test was used to analyze the significance of the difference between the values of each time period and 0 h.

Identification of PmbHLH Protein Sequences
The CO 2 stress transcriptome was a transcriptome sampled and sequenced from seedling needles after treatment with a high concentration of CO 2 for 6, 12 and 24 h. The drought stress transcriptome used a weighing method to control the four gradients of normal, mild, moderate and severe soil water content. The needles were collected 60 days later for sequencing. The pine wood nematode inoculation transcriptome was sequenced from needles on the 0th, 3rd, 10th, 20th and 35th days after the inoculation of pine wood nematode, and the young shoots transcriptome was sequenced with tender shoots without any processing. From the four transcriptomes, 198 bHLH protein sequences of Masson pine were screened out. After deleting the repeated sequences and sequences with incomplete domains, 88 PmbHLH protein sequences were retained (Supplementary File S2). The PmbHLH proteins encoded 118-990 amino acids, with the predicted molecular weight ranging from 13.63 to 107.24 kDa. The pI values ranged from 4.93 to 11.77. Detailed parameter information for physical and chemical property analysis of PmbHLH proteins is provided in Supplementary File S3.

Phylogenetic Analysis
The Maximum likelihood (ML) method was used to construct a phylogenetic tree of 88 PmbHLH protein sequences from Masson pine. As shown in Figure 1, the 88 bHLH protein sequences were divided into ten large groups, including I, III, IV, V, VII, VIII, IX, XI, XII and XIV. The group with the largest number of PmbHLH proteins was I, with a total of 24 members, followed by III and XII. The group with the smallest number of PmbHLH proteins was VIII, with only one member. The long branch lengths of PmbHLH41, PmbHLH43, PmbHLH39, PmbHLH60 and PmbHLH62 indicate that they are quite different from other proteins. The high branch support values on the node indicate that the reliability of the phylogenetic tree is high. The bHLH transcription factor family has many members and the evolutionary model is complicated, which may result in low branch support values of the phylogenetic tree root. In addition, a phylogenetic tree constructed by 88 Masson pine bHLH protein sequences and 134 Arabidopsis bHLH protein sequences is provided in Supplementary File S4. distribution of conserved motifs may be related to protein function, and the differences between groups may be related to the diversity of gene functions during evolution. At the same time, we also used MEME to analyze the conservative motifs of 88 Masson pine bHLH proteins and 134 Arabidopsis bHLH proteins (Supplementary File S6). With reference to the distribution of conserved motifs and the grouping of Arabidopsis bHLH genes, we can better determine the group to which each PmbHLH gene belongs.

Analysis of PmbHLH Protein Domains
According to MEME software identification of the conserved motifs of the 88 PmbHLH protein sequences, all of these proteins have highly conserved bHLH domains. The conserved motifs are shown in Supplementary File S5. The bHLH domain contains two different functional regions, the N-terminal basic amino acid region and the C-terminal HLH region, which are the parts composed of motifs 1, 2 and 9 in Figure 2. Motif 1 and 2 are the two main parts of the domain, which are longer than motif 9, and motif 9 is the very short part of the domain between motif 1 and 2. As shown in Figure 2, almost all 88 PmbHLH proteins contained motifs 1 and 2, except for the XIV group. Most of these proteins contain motif 1, 2 and 9 at the same time, indicating that their domains are more complete. Among them, each group generally contained partly the same motifs. For example, part of the proteins in group III contained not only motifs 1, 2, 9 but also motifs 5, 7, 10, and 14; part of proteins in group V and III both contained motif 14. Meanwhile, the proteins in group XI contained motifs 4, and 13, and proteins in group XIV contained only motif 2. Therefore, there were certain differences between these groups, suggesting different functions. Additionally, there were certain similarities between groups. For example, proteins in groups XI and XII had similar motifs. The proteins of group IX and part of group III also had similar conserved motifs. Therefore, the distribution of conserved motifs may be related to protein function, and the differences between groups may be related to the diversity of gene functions during evolution. At the same time, we also used MEME to analyze the conservative motifs of 88 Masson pine bHLH proteins and 134 Arabidopsis bHLH proteins (Supplementary File S6). With reference to the distribution of conserved motifs and the grouping of Arabidopsis bHLH genes, we can better determine the group to which each PmbHLH gene belongs.

Subcellular Localization Analysis
According to the results of CELLO and PSORT prediction of the subcellular localization of PmbHLH1-PmbHLH88 proteins, the proteins were almost always localized in the nucleus. The prediction results for these proteins are shown in Supplementary File S7. To verify the prediction results, we selected the PmbHLH44 protein in group III related to the synthesis of secondary metabolites, fused GFP to the ORF of PmbHLH44 and constructed a 121-PmbHLH44-GFP vector. After transient expression in N. benthamiana, fluorescence signals were observed in the nucleus by confocal laser-scanning microscopy (Figure 3), which was consistent with the predicted results. The 121-GFP no-load produced green fluorescence in the entire cell, which is obviously different from the expression pattern of PmbHLH44. PmbHLH44 only emits green fluorescence in the nucleus, proving that this protein is a nucleus-localized protein.

Subcellular Localization Analysis
According to the results of CELLO and PSORT prediction of the subcellular localization of PmbHLH1-PmbHLH88 proteins, the proteins were almost always localized in the nucleus. The prediction results for these proteins are shown in Supplementary File S7. To verify the prediction results, we selected the PmbHLH44 protein in group III related to the synthesis of secondary metabolites, fused GFP to the ORF of PmbHLH44 and constructed a 121-PmbHLH44-GFP vector. After transient expression in N. benthamiana, fluorescence signals were observed in the nucleus by confocal laser-scanning microscopy (Figure 3), which was consistent with the predicted results. The 121-GFP no-load produced green fluorescence in the entire cell, which is obviously different from the expression pattern of PmbHLH44. PmbHLH44 only emits green fluorescence in the nucleus, proving that this protein is a nucleus-localized protein.

Analysis of the Transcriptional Profile of PmbHLH Genes
According to the pine wood nematode inoculation transcriptome data, heat maps were drawn ( Figure 4). The heat map for the pine wood nematode inoculation transcriptome showed that the expression level of PmbHLH84 at 0-35 days was very high when compared with other genes longitudinally, while the expression levels of PmbHLH20, PmbHLH24, PmbHLH39, PmbHLH50, PmbHLH65 and PmbHLH88 continued to be very low at 0-35 days, and the expression levels of the remaining genes were intermediate. In general, the horizontal expression levels of these genes tended to first increase, then decrease and then increase. The expression levels of PmbHLH genes varied, indicating that these genes each have specific functions in specific environments.

Analysis of the Transcriptional Profile of PmbHLH Genes
According to the pine wood nematode inoculation transcriptome data, heat maps were drawn ( Figure 4). The heat map for the pine wood nematode inoculation transcriptome showed that the expression level of PmbHLH84 at 0-35 days was very high when compared with other genes longitudinally, while the expression levels of PmbHLH20, PmbHLH24, PmbHLH39, PmbHLH50, PmbHLH65 and PmbHLH88 continued to be very low at 0-35 days, and the expression levels of the remaining genes were intermediate. In general, the horizontal expression levels of these genes tended to first increase, then decrease and then increase. The expression levels of PmbHLH genes varied, indicating that these genes each have specific functions in specific environments.

Gene Expression Analysis under Different Treatments
Based on the classification of A. thaliana, we selected group III related to the synthesis of secondary metabolites and randomly selected six PmbHLH genes for RT-PCR expression analysis: PmbHLH9 (MT989425), PmbHL14 (MT989426), PmbHLH17 (MT989427), PmbHLH46 (MT989429), PmbHLH51 (MT989430), and PmbHLH66 (MT989431). The results in Figure 5 show that under MeJA treatment, the expression levels of PmbHLH9, PmbHLH14, PmbHLH46 and PmbHLH66 decreased slightly first and then increased at 24 h, reaching the highest level. The expression levels of PmbHLH17 and PmbHLH51 first increased to their highest levels at 3 h, decreased sharply at 6 h, and then increased steadily. Under ETH treatment, the expression levels of PmbHLH9, PmbHLH51 and PmbHLH66 rose steadily to their highest levels. The expression levels of PmbHLH17 and PmbHLH46 first increased and then decreased at 24 h, while the expression level of PmbHLH14 increased to its maximum at 3 h and then decreased. Under H2O2 treatment, the expression levels of the six genes reached their maxima at 6 h and then decreased, but the expression levels of PmbHLH9, PmbHLH17, PmbHLH51 and PmbHLH66 increased again at 24 h. Under osmotic stress, the expression levels of PmbHLH17 and PmbHLH51 showed an overall increasing trend, and the expression levels of the remaining four genes increased first and then decreased, showing the same patterns as the six genes after SA treatment. Under mechanical injury stress, only the expression level of PmbHLH17 increased

Gene Expression Analysis under Different Treatments
Based on the classification of A. thaliana, we selected group III related to the synthesis of secondary metabolites and randomly selected six PmbHLH genes for RT-PCR expression analysis: PmbHLH9 (MT989425), PmbHL14 (MT989426), PmbHLH17 (MT989427), PmbHLH46 (MT989429), PmbHLH51 (MT989430), and PmbHLH66 (MT989431). The results in Figure 5 show that under MeJA treatment, the expression levels of PmbHLH9, PmbHLH14, PmbHLH46 and PmbHLH66 decreased slightly first and then increased at 24 h, reaching the highest level. The expression levels of PmbHLH17 and PmbHLH51 first increased to their highest levels at 3 h, decreased sharply at 6 h, and then increased steadily. Under ETH treatment, the expression levels of PmbHLH9, PmbHLH51 and PmbHLH66 rose steadily to their highest levels. The expression levels of PmbHLH17 and PmbHLH46 first increased and then decreased at 24 h, while the expression level of PmbHLH14 increased to its maximum at 3 h and then decreased. Under H 2 O 2 treatment, the expression levels of the six genes reached their maxima at 6 h and then decreased, but the expression levels of PmbHLH9, PmbHLH17, PmbHLH51 and PmbHLH66 increased again at 24 h. Under osmotic stress, the expression levels of PmbHLH17 and PmbHLH51 showed an overall increasing trend, and the expression levels of the remaining four genes increased first and then decreased, showing the same patterns as the six genes after SA treatment. Under mechanical Forests 2020, 11, 1292 9 of 14 injury stress, only the expression level of PmbHLH17 increased first and then decreased, and the levels of the remaining genes decreased first, increased and then decreased. The expression levels were highest at 12 h.
Forests 2020, 11, 1292 9 of 14 first and then decreased, and the levels of the remaining genes decreased first, increased and then decreased. The expression levels were highest at 12 h.

Discussion
The bHLH gene family is widespread in animals and plants and is the second largest gene family in plants. Its members play important roles in the metabolism, physiology and development of higher organisms. bHLH TFs were first classified in animals, and the research on these TFs is more extensive in animals than in plants. In plants, bHLH TFs are classified according to animal classification standards and bioinformatics analysis [30]. The bHLH gene family has been extensively studied in many plant species, such as Z. mays [31], O. sativa [13], Solanum lycopersicum [32] and Malus domestica [30], but has not been reported in P. massoniana. Therefore, we identified 88 PmbHLH genes from four transcriptomes of P. massoniana. These genes were divided into 10 groups based on evolutionary relationships. The domains of these genes were analyzed by MEME. All the genes contained bHLH domains, and the domains in the same group were similar. In addition, subcellular localization analysis, RT-PCR, and transcriptional profiling were also performed.
In this study, 88 bHLH-encoding proteins were identified from the transcriptomes of P. massoniana and named PmbHLH1-PmbHLH88. Evolutionary analysis of 88 P. massoniana bHLH genes yielded a phylogenetic tree divided into 10 groups. In a study of peach bHLH TFs, it was found that proteins were lost during evolution, indicating that some nonconserved bHLH subfamilies in certain plants may have evolved specifically to meet the needs of plant growth or stress resistance [33]. In a study of apple bHLH genes, it was found that the conserved domains of genes in the same group were almost the same, and there were large differences between different groups [30]. Maize ZmbHLH TFs in the same group have similar gene structures and similar conserved protein motifs, indicating that members of the same group are closely evolutionarily related [34]. These results are consistent with the results of the present study. However, special patterns were observed in some groups, such as the PmbHLH16 gene in group I, which was different from other members and was more similar to members of group V. This gene may be a transitional form between the two groups in the evolutionary trajectory of a specific regulatory function, consistent with the conclusion of the study on apple bHLH genes. This system may provide a model for studying other groups in the bHLH gene family [30].
The results of subcellular localization prediction and tobacco transient expression showed that PmbHLHs are almost all nucleus-localized proteins. Only PmbHLH57 is located in the extracellular space. This is consistent with the subcellular localization results for rice bHLH TFs [13].

Discussion
The bHLH gene family is widespread in animals and plants and is the second largest gene family in plants. Its members play important roles in the metabolism, physiology and development of higher organisms. bHLH TFs were first classified in animals, and the research on these TFs is more extensive in animals than in plants. In plants, bHLH TFs are classified according to animal classification standards and bioinformatics analysis [30]. The bHLH gene family has been extensively studied in many plant species, such as Z. mays [31], O. sativa [13], Solanum lycopersicum [32] and Malus domestica [30], but has not been reported in P. massoniana. Therefore, we identified 88 PmbHLH genes from four transcriptomes of P. massoniana. These genes were divided into 10 groups based on evolutionary relationships. The domains of these genes were analyzed by MEME. All the genes contained bHLH domains, and the domains in the same group were similar. In addition, subcellular localization analysis, RT-PCR, and transcriptional profiling were also performed.
In this study, 88 bHLH-encoding proteins were identified from the transcriptomes of P. massoniana and named PmbHLH1-PmbHLH88. Evolutionary analysis of 88 P. massoniana bHLH genes yielded a phylogenetic tree divided into 10 groups. In a study of peach bHLH TFs, it was found that proteins were lost during evolution, indicating that some nonconserved bHLH subfamilies in certain plants may have evolved specifically to meet the needs of plant growth or stress resistance [33]. In a study of apple bHLH genes, it was found that the conserved domains of genes in the same group were almost the same, and there were large differences between different groups [30]. Maize ZmbHLH TFs in the same group have similar gene structures and similar conserved protein motifs, indicating that members of the same group are closely evolutionarily related [34]. These results are consistent with the results of the present study. However, special patterns were observed in some groups, such as the PmbHLH16 gene in group I, which was different from other members and was more similar to members of group V. This gene may be a transitional form between the two groups in the evolutionary trajectory of a specific regulatory function, consistent with the conclusion of the study on apple bHLH genes. This system may provide a model for studying other groups in the bHLH gene family [30].
The results of subcellular localization prediction and tobacco transient expression showed that PmbHLHs are almost all nucleus-localized proteins. Only PmbHLH57 is located in the extracellular space. This is consistent with the subcellular localization results for rice bHLH TFs [13].
To further explore the physiological functions of bHLH TFs in plant growth and development, we analyzed the transcription profiles of the PmbHLH family under pine wood nematode inoculation. Within 35 days after inoculation with pine wood nematodes, the expression levels of these genes did not change very much. Through longitudinal comparison, some genes had high expression levels from beginning to end, and the levels of some genes continued to be very low. The overall expression change trends of all the genes were consistent, indicating that these genes are somewhat related to resistance to pine wood nematodes, but they do not play a particularly obvious regulatory role.
We treated P. massoniana with abiotic stress and hormones and then performed RT-PCR analysis. The expression levels of the six selected genes under MeJA, ETH, H 2 O 2 , osmotic, SA and injury stress increased in the later stage. PmbHLH9, PmbHLH51 and PmbHLH66 proteins eventually all increased their expression levels under ETH stress, indicating that these three proteins were significantly expressed under the induction of ETH. PmbHLH17 and PmbHLH51 proteins can be expressed significantly under osmotic stress. The expression levels of bHLH TFs in many plants have been analyzed under these treatments. For example, the OsbHLH6 gene is significantly expressed in rice leaves under induction by MeJA. Studies have also confirmed that OsbHLH TFs are involved in regulating the SA and JA signaling pathways in rice [35]. The expression patterns under abscisic acid (ABA) and ETH treatment indicated that seven genes in strawberry are involved in the biosynthesis of fruit anthocyanins [36]. The expression level of TabHLH49 protein in wheat under osmotic stress was significantly higher than that of the control, and after deleting the C-terminus (amino acids 323-362), TabHLH49 protein exhibited no transcriptional activity in yeast, indicating that TabHLH49 positively regulates the expression of the dehydrin WZY2 gene [37]. These results are consistent with the results of the present study, indicating that bHLH TFs play certain roles in drought tolerance, plant immune response and secondary metabolite synthesis.

Conclusions
In this study, we identified 88 PmbHLH genes from four transcriptomes of P. massoniana and performed bioinformatics analysis. These genes were divided into 10 groups in total, each gene contained a bHLH domain, and the domains in the same group were more similar than those in different groups. Subcellular localization prediction and experiments revealed that these genes encode mostly nucleus-localized proteins. RT-PCR showed that the expression levels of the six selected genes under the six treatments all increased in different time periods. Some proteins can be significantly expressed under specific stress. Therefore, members of the bHLH TF family play crucial roles in plant abiotic stress and hormone responses. This is of great significance to the breeding of the biosynthesis of resin and resistance to pine wood nematode disease in Masson pine. At present, there are very few studies on bHLH transcription factors of Pinaceae. This study will provide a theoretical basis for functional studies of the bHLH gene family, be very useful for studying the secondary metabolism of Pinaceae and provide a potential strategy for further breeding P. massoniana.