A Genome-Wide Identification and Analysis of the Basic Helix-Loop-Helix Transcription Factors in Brown Planthopper, Nilaparvata lugens

The basic helix-loop-helix (bHLH) transcription factors in insects play essential roles in multiple developmental processes including neurogenesis, sterol metabolism, circadian rhythms, organogenesis and formation of olfactory sensory neurons. The identification and function analysis of bHLH family members of the most destructive insect pest of rice, Nilaparvata lugens, may provide novel tools for pest management. Here, a genome-wide survey for bHLH sequences identified 60 bHLH sequences (NlbHLHs) encoded in the draft genome of N. lugens. Phylogenetic analysis of the bHLH domains successfully classified these genes into 40 bHLH families in group A (25), B (14), C (10), D (1), E (8) and F (2). The number of NlbHLHs with introns is higher than many other insect species, and the average intron length is shorter than those of Acyrthosiphon pisum. High number of ortholog families of NlbHLHs was found suggesting functional conversation for these proteins. Compared to other insect species studied, N. lugens has the highest number of bHLH members. Furthermore, gene duplication events of SREBP, Kn(col), Tap, Delilah, Sim, Ato and Crp were found in N. lugens. In addition, a putative full set of NlbHLH genes is defined and compared with another insect species. Thus, our classification of these NlbHLH members provides a platform for further investigations of bHLH protein functions in the regulation of N. lugens, and of insects in general.


Introduction
Basic helix-loop-helix (bHLH) proteins are the largest superfamily of transcription factors characterized by a bHLH signature domain for DNA binding. This domain consists of approximately 60 amino acids of two functionally distinctive regions. The basic region locates at the N-terminal end of the domain including~15 amino acids with a high number of basic residues. The canonical core DNA sequence motif recognized by bHLH is a consensus hexanucleotide sequence known as E-box (5'-CANNTG-3'). E-boxes can be divided into several types based on the identity of two central bases in the sequence. The most common type is the palindromic E-box (5'-CACGTG-3'). Within the basic region of bHLH, certain conserved amino acids serve to identify the core consensus site, whereas other residues in the domain dictate specificity for a given type of E-box [1]. In addition, the nucleotides flanking the hexanucleotide core have been shown to play a role in DNA binding specificity [2,3], and there is evidence that a residue loop outside the core domain plays a critical role in sequence-specific DNA binding through elements that lie outside of the core recognition sequence [4].
The first bHLH gene was reported in human in 1988 [5]. To date, with the sequencing of several insect genomes, a large number of bHLH family members have been identified in

Multiple Sequence Alignments and Phylogenetic Analysis
Multiple sequence alignments of all the potential bHLH proteins were performed using ClustalW v. 2.1 (EMBL-EBI, Hinxton, Cambridgeshire, United Kingdom) [33] with manual inspection. The alignments were used to construct phylogenetic trees by neighbor-joining (NJ), maximum parsimony (MP), maximum-likelihood (ML) and Bayesian phylogenies using MEGA v.6 (Koichiro Tamura, Hachioji, Tokyo, Japan), PAUP v.4.0 Beta 10 (David Swofford, Sunderland, Massachusetts, United States), RAxML v.8 (Alexandros Stamatakis, Heidelberg, Baden-Württemberg, Germany) [34] and MrBayes v.3.2 (Ronquist and Huelsenbeck, Norbyv. 18D, SE-752 36 Uppsala, Sweden) [35], respectively. Default parameter values of the NJ, ML and MP analyses were used, except for the LG amino acid substitution model with a gamma distribution for among-site rate variation in ML analysis. The reliabilities of NJ, MP and ML tree topology were evaluated by bootstrapping a sample of 1000 replicates. For the Bayesian analysis, the alignment was analyzed using both mixed models that model substitutions as a mixture of many empirical amino-acid substitution matrices, and a LG + γ model for amino acid data. All other parameters such as priors, proposal mechanisms and chain settings were defaults. All sets of chains were performed for 4 million generations, sampled every 100 generations, with 2 million generations discarded as burn-ins. Convergence was confirmed by visual comparison of the likelihoods of two chains in each run, and by using the standard deviation of split frequencies and potential scale reduction factors reported by the software. The best available amino acid substitution model (LG) with a gamma distribution for among-site rate variation used in phylogenetic analysis was estimated by ProtTest v.3 (Diego Darriba, Vigo, Galiza, Spain) under the Akaike information criterion [36]. The ingroup phylogenetic analysis was performed using Liu et al. (2012) described methods with sequence alignments of NlbHLH and DmbHLH motifs, and the analysis was used to name each NlbHLH.

Domain Prediction
The predictions of protein domain architectures were performed to further ascertain the reliability of the retrieved motifs and to examine whether the full-length protein sequences contain additional characteristic domains. Tools available online including Simple Modular Architecture Research Tool (SMART, http://smart.embl-heidelberg.de/) [37], Conserved Domain Architecture Retrieval Tool (CDART, https://www.ncbi.nlm.nih.gov) [38] and PROSITE (http://prosite.expasy.org/) [39] were used.

Molecular Cloning
In order to get transcriptional evidence of the genes, reverse transcription polymerase chain reaction (RT-PCR) was performed to authenticate the sequences of genes or fragments. Total RNA was extracted from eggs, first-instar through fifth-instar nymphs, and newly emerged adults (within 24 h after molting) using the Trizol Reagent (Invitrogen, Shanghai, China) according to the manufacturer's instructions. These total RNA samples were pooled. The concentration and purity of the pooled sample were measured with the NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Rockford, IL, USA) and the integrity was checked by agarose gel electrophoresis. One microgram (µg) of the total RNA was reverse transcribed to cDNA using the ReverTra Ace qPCR RT Kit (Toyobo Co. Ltd., Osaka, Japan). The cDNA was used to perform polymerase chain reaction (PCR) to verify the candidate NlbHLHs using primers listed in Table 1. The PCR product was sequenced on the Applied Biosystems 3730 automated sequencer (Foster City, CA, USA) from both directions (Additional file 2). The sequences were aligned with N. lugens genome to show their identities.

Identification of bHLH Members in N. lugens
Initially, annotation of the draft N. lugens genome (version 1, GCA_000757685.1) and transcriptome (SRR1187936) identified 62 domain-containing bHLH genes or gene fragments. These candidate genes were further inspected using blast searches (BLASTX, e-value < 0.00001), intron analysis, manual inspection against the 19 conserved amino acid sites, and sequence alignment. This resulted in 60 unique bHLH candidates (NlbHLHs). Out of these NlbHLHs genes, 48 and 12 were from N. lugens official gene sets and N. lugens transcriptome, respectively. The alignments of the 60 NlbHLH members were shown in Figure 1. Furthermore, the ML phylogenetic tree ( Figure 2) generated with amino acids of the 60 NlbHLH motifs, and 59 DmbHLH motifs were used for their categorization (See Supplementary Figure S1 for NJ, MP and Bayesian tree). This data revealed that 25, 14, 10, 1, 8 and 2 NlbHLH members belong to group A, B, C, D, E and F, respectively. These members possess the basic, helix 1, a loop and helix 2 regions, except for NlPxs, NlEmc, NlH, NlSide, NlSim1, NlDpn and NlE(spl)3 where the basic region or helix 2 was completely or partially missing. The missing regions may reflect the truncated functional roles of these proteins. Additionally, NlFer1 and NlMist1 have one additional amino acid (S or V) in helix 1 or the loop region, respectively. This amino acid creates an additional gap among aligned NlbHLH motifs (Figure 1), indicating certain differences between N. lugens and another insect species. In contrast, sites 21 and 64 of the bHLH motif are highly conserved among all NlbHLH motifs ( Figure 1). Of these conserved sites, the 19 sites were the most conserved ones in the basic, helix 1, loop and helix 2 regions, as the element of the predicted model [2]. Phylogenetic analysis showed that two or three members of each SREBP, Mnt, COE, AP4, Mist, Ngn, Atonal, Delihah, ASCa, Sim and H/E(spl) family formed a monophyletic clade with that from D. melanogaster with high or moderate statistical support ( Figure 2). This may suggest relatively recent duplications that were specific to N. lugens. Functional redundancy due to gene duplications is a common feature of many biological systems. Feedback between redundant copies may serve as an information processing element that facilitates signal transduction and the control of gene expression [40]. Since the functional roles of bHLH members in D. melanogaster have been well studied, we adopted their nomenclature for structural and functional comparison, along with the bootstrap supports provided by the ingroup phylogenetic analyses (Table 2). In the case where one DmbHLH sequence has two or more N. lugens homologs, they were numbered "1", "2", "3", etc.  NlbHLH genes were named according to their D. melanogaster homologs. Bootstrap values were obtained from in-group phylogenetic analyses with D. melanogaster or A. pisum bHLH motif sequences using neighbor-joining (NJ), maximum parsimony (MP), maximum-likelihood (ML) and Bayesian algorithms, respectively. OsRa (the rice bHLH motif sequence of R family) was used as outgroup in each constructed tree. n/m means that a N. lugens bHLH does not form a monophyletic group with any other single bHLH motif sequence. * means that orthology of the gene was defined through in-group phylogenetic analyses with bHLH orthologs from A. pisum. RT-PCR, Reverse Transcription-Polymerase Chain Reaction; EST, Expressed Sequence Tag.

Identification of Orthologous Families
Ingroup phylogenetic analysis of bHLH members has been widely used to define evolutionary conserved groups of orthologs [9]. Previous studies have used monophyletic groups as a standard to define bHLH families of orthologs. A monophyletic group includes members of a known family of different phylogenetic algorithms with statistical support values greater than 50 [9,15,20,41]. Accordingly, we defined evolutionary conserved groups of orthologs according to the ingroup phylogenetic analysis of each NlbHLH member. As an example, Figure 3 shows the NJ, MP, ML and Bayesian inference trees constructed with one NlbHLH member (trachealess, NlTrh) and 10 group C members from D. melanogaster. NlTrh formed monophyletic clade with trh of D. melanogaster with statistical support values of 99, 89, 96 and 79 in NJ, MP, ML and Bayesian inference trees, respectively. NlTrh was therefore considered as an ortholog of D. melanogaster trh. The ingroup phylogenetic analysis was performed to each of the identified NlbHLH members. The statistical support values of the constructed NJ, MP, ML and Bayesian trees were listed in Table 2. The majority of these bHLHs could be clearly assigned to the families according to statistical support values of the ingroup phylogenetic trees. Five NlbHLHs [NlMad, NlH, NlE(spl)1, NlE(spl)2, NlE(spl)3] could not be confidently assigned by our phylogenetic analysis with DmbHLHs. They were analyzed with A. pisum bHLHs (ApbHLH) using the same method mentioned above.   Since these statistical support values were greater than the set criterion (50), the genes are assigned as the corresponding D. melanogaster homologs (Table 3).  NlTrh was therefore considered as an ortholog of D. melanogaster trh. The ingroup phylogenetic analysis was performed to each of the identified NlbHLH members. The statistical support values of the constructed NJ, MP, ML and Bayesian trees were listed in Table 2. The majority of these bHLHs could be clearly assigned to the families according to statistical support values of the ingroup phylogenetic trees. Five NlbHLHs [NlMad, NlH, NlE(spl)1, NlE(spl)2, NlE(spl)3] could not be confidently assigned by our phylogenetic analysis with DmbHLHs. They were analyzed with A. pisum bHLHs (ApbHLH) using the same method mentioned above.  Since these statistical support values were greater than the set criterion (50), the genes are assigned as the corresponding D. melanogaster homologs (Table 3).   Secondly, one bHLH member, NlCato, had statistical support value of 37 in the NJ tree. Nevertheless, it formed a monophyletic clade with the same DmbHLH counterpart in MP, ML and Bayesian trees with statistical support values of 97, 78 and 98, respectively. Consequently, we assigned it to a defined ortholog family according to the three trees with statistical support values of greater than 50.
Thirdly, one bHLH member, NlDpn, formed a monophyletic clade in the NJ tree with a statistical support value of 61. A statistical support value of 21 for a monophyletic clade was found in ML, but formed no monophyletic group in MP and Bayesian trees (marked with n/m in Table 2). NlDpn forms similar monophyletic group with DmDpn and with A. pisum Dpn (with statistical support value of 64 and 29 in NJ and ML, respectively). Albeit with insufficient statistical support, we tentatively defined ortholog for NlDpn to the correspondent D. melanogaster dpn. Obviously, this classification is arbitrary and should be modified if new data becomes available. The phylogenetic divergence of bHLH motif sequences between N. lugens and D. melanogaster or A. pisum probably implies that these insect species evolved in quite different circumstances.
Finally, 5 members named as NlMad, NlH, NlE(spl)1, NlE(spl)2 and NlE(spl)3 did not have sufficient bootstrap support in forming a monophyletic clade with any single D. melanogaster homolog in all four phylogenetic trees. They were categorized through constructing phylogenetic trees with ApbHLH family members. Four members, namely NlMad, NlH, NlE(spl)1 and NlE(spl)3, were identified with sufficient confidence (statistical support values > 50) in all the constructed trees. The remaining one member, NlE(spl)2, did not form a monophyletic clade with that of A. pisum, and was categorized as a N. lugens specific clade.
Besides phylogenetic analyses, structure predictions of these NlbHLH proteins were performed. Through predictions by SMART, CDART and PROSITE using the protein sequences of the identified NlbHLH members (Figure 2), we found that: (a) Among members of group C, 6 sequences (NlSim2, NlTrh, NlTgo, NlClk, NlCyc and NlMet) contain one bHLH, one PAC (Motif C-terminal to PAS motifs) [42], and two PAS (Prt-Arnt-Sim) domains. NlTai and NlSima have one bHLH and two PAS domains, respectively. NlSim1 has one bHLH and one PAS domains. The remaining two (NlDys and NlSs) only have the bHLH domain.  [9,[43][44][45]. It is conceivable that these common domain configurations confer particular protein functions across species [15].

Genomic Distribution of N. lugens bHLH Genes
The positions of the 60 NlbHLHs in chromosome scaffolds are shown in Figure 4. These NlbHLH genes were mapped to 59 N. lugens scaffolds. Among these scaffolds, scaffold527 was mapped by two bHLH genes, NlDpn and NlE(spl)3, whereas each of other scaffolds was mapped by one bHLH gene. The locations of NlbHLH genes on chromosome scaffolds are inconsistent with the hypothetical duplication history of the phylogenetic tree, such as NlSREBP1 and NlSREBP1, NlKn(col)1 and NlKn(col)2, NlTap1 and NlTap2, NlAto1 and NlAto2, etc. This contradiction may be due to the draft genome lacking chromosome-level genome assembly [21].

Intron-Exon Structure of N. lugens bHLH Genes
The length of coding regions and exon-intron length are shown in Figure 4. There are eleven intronless genes, and 49 genes having at least one intron. A total of 195 introns were identified with the average intron number of 4.0 per gene. Among these introns, 152 introns are >1000 bp in length (the longest intron is 1,155,031 bp), and the remaining ones are <1000 bp in length (the shortest intron is 35 bp). Intron analysis shows that 29 NlbHLH members have introns in the coding regions of their bHLH motifs. It should be noted that: (a) coding regions of 26 NlbHLH motifs have one intron, and three motifs have introns in the basic region, five have introns in the helix 1 region, ten have introns in the loop region, and eight have introns in the helix 2 region; and (b) coding regions of three NlbHLH motifs have two introns, of which two have introns in the basic and helix 2 regions, and the remaining one has introns in the basic and loop regions. Thus, coding regions of these 29 NlbHLH motifs have a total of 32 introns. In addition, one NlbHLH (NlTai) locates on three separate scaffolds in the genome (Figure 4). In coding regions of NlbHLH motifs, the longest intron is 1,155,031 bp, the shortest one is only 35 bp, and the average is 2282 bp. In comparison, A. pisum, D. melanogaster, A. aegypti, A. gambiae, C. quinquefasciatus, B. mori, A. mellifera, N. vitripennis and H. saltator have 26,18,24,22,19,12,9,22 and 22 bHLH members having introns in the coding regions of their bHLH motifs, and the total number of introns identified is 34,20,30,26,23,12,9,27 [7][8][9][10]13,14].
In summary, the number of NlbHLHs having introns is higher than that of many other insect species. Moreover, NlbHLHs not only have the shortest length intron, but also have longer length introns compared to most studied species (except for A. aegypti and N. vitripennis). The higher intron-density of NlbHLH genes than those of many other insects indicates that N. lugens either gained introns at a faster rate or lost introns at a slower rate than others [46]. Previously hypothesized mechanisms of intron gains mainly involve intron transposition [47], transposon insertion [48], tandem genomic duplications [49], intron transfer [50], insertion of a Group II intron [47], intron gain during double strand break repair [51] and intronization [52,53]. Hypothesized mechanisms of intron loss include reverse transcriptase-mediated intron loss [54], meiotic recombination [46] and genomic deletions [55]. Notably, N. lugens genome contains a high level of specific transposable element (TE) with larger fraction than that in the A. pisum, contributing to the large genome size of N. lugens [56]. We speculate that there may be a relationship between the formation of introns in NlbHLHs and TEs. Nevertheless, the mechanism of high intron-dense NlbHLHs (growing faster or losing slower) needs further investigations.

Molecular Cloning and Predicted Function of N. lugens bHLHs
Transcription evidence by RT-PCR and/or EST are widely used for understanding gene functions, e.g., in N. vitripennis, A. aegypti, A. gambiae, C. quinquefasciatus, L. decemlineata, and H. saltator. The transcriptional evidence of 47 NlbHLHs (78%) was obtained by both RT-PCR and EST, and the remaining ones were only supported by EST (Table 2). Although RT-PCR as direct evidence is used to support transcription, positive results may not be obtained due to specific temporal and spatial expression patterns or other factors that negatively affect the performance of PCR. Thus, EST as indirect evidence is an additional option to support. We believe that EST supported NlbHLHs could denote their highly specific patterns in N. lugens. Sequence alignments show that each cDNA and EST exhibited perfect identity with the N. lugens genome. As the comparison of cDNA/EST and genome shows, all presumed exon-intron structures are correctly predicted (Addition file 2). Meanwhile, the results support that NlbHLHs play similar functional roles in N. lugens as in other insects. Of these NlbHLHs, there are 25 members in group A. The group A proteins bind the E-box variant CACCTG or CAGCTG [20]. This group include proteins such as 48-related-1/Fer1, 48-related-2/Fer2, PTFa/Fer3, ASCa, ASCb, ASCc, amber, Atonal 2, Beta3, Delilah, E12/E47, Hand, Mesp, Mist, MyoD, MyoRa, MyoRb, Net, NeuroD, Neurogenin, NSCL, Oligo, paraxis, peridot, SCL and Twist families [15,57]. These proteins mainly regulate neurogenesis, myogenesis and mesoderm formation [58][59][60][61][62][63]. Our analysis shows that most of NlbHLH members exhibit 1:1 orthology with D. melanogaster, suggesting functional conservation.
There are 14 members of NlbHLHs in group B. Group B members recognize and bind G-box (CACGTG or CATGTTG). This group is represented by Figα, Myc, Mnt, Mad, Max, USF, MITF, SRC, SREBP, AP4 and TF4 [15]. The members in this group are mainly involved in cell proliferation/ differentiation, sterol metabolism and adipocyte formation, and expression of glucose-responsive genes [9,[64][65][66]. We found that the members of SRC, Myc, Mnt, Max, USF, MITF, MLX and AF4 showed 1:1 orthology with D. melanogaster. Furthermore, NlbHLHs have more members in SREBP and AP4 families than that of D. melanogaster, which could suggest divergent functions of these NlbHLHs.
Ten members of NlbHLHs (NlClk, NlDys, NlSs, NlSim1, NlSim2, NlTrh, NlSima, NlTgo, NlCyc and NlMet) are in group C. Group C is formed by bHLH proteins that have one or two PAS domains in addition to the bHLH motif, and bind to non-E-box (NACGTG or NGCGTG) core sequences [20]. The HLH families of Group C include circadian locomotor output cycles kaput (clock), aryl hydrocarbon receptor (AHR), single-minded (Sim), trachealess (Trh), hypoxia-inducible factor (HIF), aryl hydrocarbon receptor nuclear translocator (ARNT), brain and muscle ARNT-like (Bmal) and methoprene-tolerant (Met). They are responsible for the regulation of multiple biological processes including midline and tracheal development, circadian rhythms, and for the activation of gene transcription in response to environmental toxins [66,68]. More specific, Sim and Trh control development of the central nervous system midline and the trachea, respectively [69][70][71]. Clk/ARNT heterodiner activates a feedback loop control the persistence and period of circadian rhythms [72,73]. It is known that NlMet mediates JH signal pathway and plays a role in the ovariole development and egg maturation of the brown planthopper [24], and it could likely be involved in resistance to insecticides [74,75].
There is only one member of NlbHLHs for group D, namely NlEmc. Group D proteins, which include Id, extra macrochaetae (Emc), Heira, and Hhl462, are unable to bind DNA due to lack of a basic domain. They act as antagonists of group A proteins [19,76].
There are eight members of NlbHLHs (NlHey, NlStich1, NlSide, NlDpn, NlH, NlE(spl)1, NlE(spl)2, NlE(spl)3) in group E. Group E proteins are formed by WRPW-bHLH proteins such as Hairy and Enhancer of Split that preferentially bind to sequences referred as N boxes (CACGGC or CACGAC). They have only low affinity for E-boxes, and possess a Pro instead of an Arg residue at a crucial position in the bHLH domain [77]. These proteins usually contain two characteristic domains named "Orange" and "WRPW" peptide in the carboxyl terminus, and mainly regulate embryonic segmentation, somitogenesis and organogenesis. It is notable that NlE(spl)3 lacks the two characteristic domains, suggesting functional defects of this protein.
Group F proteins have the COE domain, which has an additional domain involved in dimerization and DNA binding, that are divergent in sequence from the other groups described. It has only one family (Knot/Collier), and mainly regulates head development and formation of olfactory sensory neurons [20,78,79]. Two members of NlbHLHs([NlKn(col)1 and NlKn(col)2) are in this group. However, NlKn(col)2 lacks the COE domain, suggesting functional defects of this protein.

The bHLH Repertoire of N. lugens and Other Insect Species
This study characterized the orthologs of the 60 NlbHLHs. Thus far, the bHLH members from 11 insect species are available and listed in Table 3. The total number of identified NlbHLHs (60) is comparable with 54,48,57,51,50,49,52,59,55,55,57 and 55 bHLH members in A. pisum, N. vitripennis, H. saltator, A. mellifera, T. castaneum, L. decemlineata, B. mori, D. melanogaster, A. aegypti, A. gambiae, C. quinquefasciatus and P. humanus corporis, respectively. It can be seen that all of the studied insect species lack genes of families Oligo, MyoRb and Figα, suggesting these hallmark members in other organisms may have no role in insects. All examined insect species such far have only one member in 10 bHLH families including E12/E47, Beta3, Hand, SCL, NSCL, SRC, Myc, ARNT, Trh and HIF, except for C. quinquefasciatus with two Trh members. Members of MyoD, Net, Paraxis, Mad or MLX are missing in some insect species. Nevertheless, the comparable number of bHLH families and similar orthologs found among insects strongly suggest that the set of NlbHLH we retrieved is likely to be almost complete, hence represents an accurate view of the bHLH repertoire of planthoppers. In addition to the total number of genes, another obvious difference is the discrepancy of H/E(spl) family members. D. melanogaster have 11 to 12 while other insects have only 4 to 8. One can also notice that N. lugens has one or two more genes in family Ngn, Delilah, SREBP, Sim and COE than most of the insect species. On the other hand, we failed to discover a N. lugens gene in the ASCb family, in which A. pisum has one. Furthermore, similar to H. saltator bHLHs, thirteen NlbHLH families have more than one member (accounted for about 29% of all the families), while in most other insects, families with more than one member are fewer (range of 13% to 20% with an average of 16%). This suggests that some of the N. lugens bHLH genes were originated through duplications. The divergence of bHLHs in insects suggests that those members may play different roles due to adaptations of specific biological niches.

Conclusions
The bHLH proteins play pivotal roles in a wide variety of biological processes. In this study, 60 bHLHs encoded in N. lugens genome were identified. Through multiple sequence alignment and ingroup phylogenetic analysis using bHLHs identified from D. melanogaster and A. pisum, all 60 NlbHLHs have been successfully classified to bHLH groups A-F. N. lugens has members in all six bHLH groups. The ortholog analysis and domain prediction revealed that NlTrh, NlTgo, NlClk, NlCyc and NlMet are highly conserved implying regulatory functions of many physiological processes as in other insects. In contrast, N. lugens specific gene duplications of SREBP, Kn(col), Tap, Delilah, Sim, Ato and Crp suggest functional divergence. All of the results provide a foundation for further investigations of bHLH protein functions in N. lugens specifically, and in insects in general.

Supplementary Materials:
The following are available online at www.mdpi.com/2073-4425/7/11/100/s1, Supplementary figure 1: The NJ (A), MP (B) and Bayesian (C) trees of 60 NlbHLH members with 59 D. melanogaster bHLH members. These trees summarize the evolutionary relationship between the NlbHLHs and DmbHLHs, which were rooted using OsRa (a rice bHLH motif sequence of R family) as outgroup. The trees are based on a multiple alignment that includes 59 DmbHLH and 60 NlbHLH members. For simplicity, branch lengths of the trees are not proportional to distances between sequences; Supplementary file 2: The nucleotide sequences of 60 NlbHLHs.