HSCVFNT: Inference of Time-Delayed Gene Regulatory Network Based on Complex-Valued Flexible Neural Tree Model

Gene regulatory network (GRN) inference can understand the growth and development of animals and plants, and reveal the mystery of biology. Many computational approaches have been proposed to infer GRN. However, these inference approaches have hardly met the need of modeling, and the reducing redundancy methods based on individual information theory method have bad universality and stability. To overcome the limitations and shortcomings, this thesis proposes a novel algorithm, named HSCVFNT, to infer gene regulatory network with time-delayed regulations by utilizing a hybrid scoring method and complex-valued flexible neural network (CVFNT). The regulations of each target gene can be obtained by iteratively performing HSCVFNT. For each target gene, the HSCVFNT algorithm utilizes a novel scoring method based on time-delayed mutual information (TDMI), time-delayed maximum information coefficient (TDMIC) and time-delayed correlation coefficient (TDCC), to reduce the redundancy of regulatory relationships and obtain the candidate regulatory factor set. Then, the TDCC method is utilized to create time-delayed gene expression time-series matrix. Finally, a complex-valued flexible neural tree model is proposed to infer the time-delayed regulations of each target gene with the time-delayed time-series matrix. Three real time-series expression datasets from (Save Our Soul) SOS DNA repair system in E. coli and Saccharomyces cerevisiae are utilized to evaluate the performance of the HSCVFNT algorithm. As a result, HSCVFNT obtains outstanding F-scores of 0.923, 0.8 and 0.625 for SOS network and (In vivo Reverse-Engineering and Modeling Assessment) IRMA network inference, respectively, which are 5.5%, 14.3% and 72.2% higher than the best performance of other state-of-the-art GRN inference methods and time-delayed methods.


Introduction
Gene regulatory networks (GRN) contain the regulation relationships among genes, mRNA and proteins, which almost control all biological activity in the field of biology [1][2][3]. Mastering the gene regulation mechanism could help understand the growth and development of animals and plants, and master a key of revealing the mystery of biology [4][5][6][7][8]. The gene regulatory network can be regarded as a complex dynamics system, which has some characteristics, including strong coupling, random, time-delayed, and strongly nonlinear [9][10][11][12]. In addition, the number of genes of gene real interval stock price time series data [35]. Saoud et al. presented a fully complex-valued wavelet network (FCWN) to predict the global solar irradiation [36]. Savitha et al. proposed a Circular Complex-valued Extreme Learning Machine (CC-ELM) to classify an acoustic emission signal and mammogram problem [37]. In this paper, a complex-valued flexible neural tree model is proposed to model a time-delayed matrix in order to infer the time-delayed regulations. The final structure of GRN can be obtained by an iteratively performing HSCVFNT algorithm.
Three real time-series expression datasets from an SOS DNA repair system in E. coli and Saccharomyces cerevisiae (IRMA network) are utilized to evaluate the performance of HSCVFNT algorithm. Experimental results show that HSCVFNT performs better than other state-of-the-art GRN inference methods.

Datasets and Evaluation Metrics
Two real-life benchmark gene regulatory networks are utilized to validate our method. The first GRN is derived from the SOS DNA damage repair network of E. coli with six genes [38], and the second benchmark network is from Saccharomyces cerevisiae (IRMA network), which contains five genes [39].
Sensitivity, Precision, Specificity and F-score are utilized to evaluate the performance of HSCVFNT algorithm. Four criterions are defined as follows: where TP, FP, FN, and TN are described in Figure 1. network (FCWN) to predict the global solar irradiation [36]. Savitha et al. proposed a Circular Complex-valued Extreme Learning Machine (CC-ELM) to classify an acoustic emission signal and mammogram problem [37]. In this paper, a complex-valued flexible neural tree model is proposed to model a time-delayed matrix in order to infer the time-delayed regulations. The final structure of GRN can be obtained by an iteratively performing HSCVFNT algorithm. Three real time-series expression datasets from an SOS DNA repair system in E. coli and Saccharomyces cerevisiae (IRMA network) are utilized to evaluate the performance of HSCVFNT algorithm. Experimental results show that HSCVFNT performs better than other state-of-the-art GRN inference methods.

Datasets and Evaluation Metrics
Two real-life benchmark gene regulatory networks are utilized to validate our method. The first GRN is derived from the SOS DNA damage repair network of E. coli with six genes [38], and the second benchmark network is from Saccharomyces cerevisiae (IRMA network), which contains five genes [39].
Sensitivity, Precision, Specificity and F-score are utilized to evaluate the performance of HSCVFNT algorithm. Four criterions are defined as follows:

* 2
Sensitivity Precision F score Sensitivity Precision where TP, FP, FN, and TN are described in Figure 1. To evaluate the performance of the HSCVFNT method, the HSCVFNT algorithm is compared with the algorithms with better performance. For the SOS DNA repair network in E. coli, we select the algorithms of DBN [40], S-system [41] and TDSS (time-delayed S-system) [22]. For the IRMA network, six algorithms are selected for deriving GRNs, such as TDARACNE (the ARACNE algorithm based on time-delayed mutual information) [21], TDLASSO (a high-order LASSO) [42], To evaluate the performance of the HSCVFNT method, the HSCVFNT algorithm is compared with the algorithms with better performance. For the SOS DNA repair network in E. coli, we select the algorithms of DBN [40], S-system [41] and TDSS (time-delayed S-system) [22]. For the IRMA network, six algorithms are selected for deriving GRNs, such as TDARACNE (the ARACNE algorithm based on time-delayed mutual information) [21], TDLASSO (a high-order LASSO) [42], DBN-ZC (a high-order DBN) [43], DBmcmc (a first-order DBN) [44], MMHO-DBN [24] and HRNN [25]. For all the methods in the comparison, the parameters are set by default.

Network Construction of Real Data of E. coli SOSDNA Repair Network
SOS DNA damage repair network of E. coli includes four experiments under different intensities of light (Experiments 1 and 2 at 5 Jm2 , Experiments 3 and 4 at 20 J m2 ) [45]. Each experiment consists of 50 time points and the interval between two time points is six minutes, which involves eight genes: uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and polB. In this paper, six main genes (uvrD, lexA, umuD, recA, uvrA and polB) are selected to test our method. The true gene regulatory network is described in Figure 2. DBN-ZC (a high-order DBN) [43], DBmcmc (a first-order DBN) [44], MMHO-DBN [24] and HRNN [25]. For all the methods in the comparison, the parameters are set by default.

Network Construction of Real Data of E. coli SOSDNA Repair Network
SOS DNA damage repair network of E. coli includes four experiments under different intensities of light (Experiments 1 and 2 at 5 Jm2 , Experiments 3 and 4 at 20 J m2 ) [45]. Each experiment consists of 50 time points and the interval between two time points is six minutes, which involves eight genes: uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and polB. In this paper, six main genes (uvrD, lexA, umuD, recA, uvrA and polB) are selected to test our method. The true gene regulatory network is described in Figure 2. In the HSCVFNT algorithm, the maximum time lag max τ is set as 6. The number of candidate regulatory factors L is set as 2. The inferred GRN is depicted in Figure 3. The solid lines represent the true regulatory relationships and the dotted lines are the false positive regulations inferred.
Comparing Figures 2 and 3, we can see that our inferred six edges are all true positive and no false positive edges occur. Only edge lexA lexA → could not be inferred. This is because the target gene lexA could not select itself into its candidate regulatory factor set and self-regulation relationships are not considered.  In the HSCVFNT algorithm, the maximum time lag τ max is set as 6. The number of candidate regulatory factors L is set as 2. The inferred GRN is depicted in Figure 3. The solid lines represent the true regulatory relationships and the dotted lines are the false positive regulations inferred. Comparing Figures 2 and 3, we can see that our inferred six edges are all true positive and no false positive edges occur. Only edge lexA → lexA could not be inferred. This is because the target gene lexA could not select itself into its candidate regulatory factor set and self-regulation relationships are not considered. DBN-ZC (a high-order DBN) [43], DBmcmc (a first-order DBN) [44], MMHO-DBN [24] and HRNN [25]. For all the methods in the comparison, the parameters are set by default.

Network Construction of Real Data of E. coli SOSDNA Repair Network
SOS DNA damage repair network of E. coli includes four experiments under different intensities of light (Experiments 1 and 2 at 5 Jm2 , Experiments 3 and 4 at 20 J m2 ) [45]. Each experiment consists of 50 time points and the interval between two time points is six minutes, which involves eight genes: uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and polB. In this paper, six main genes (uvrD, lexA, umuD, recA, uvrA and polB) are selected to test our method. The true gene regulatory network is described in Figure 2.   The performance comparison results are shown in Table 1. From Table 1, it can be seen that the Specificity and F-score indicators of the HSCVFNT method are the best of these four methods, which are 1.00 and 0.923, respectively. Sensitivity of our method is 14.3% lower than that of the TDSS method, which reveals that the TDSS method infers more true positive regulations than our method. However, specificity of our method is 11.4% higher than that of the TDSS method, and F-score is 5.5% higher than the TDSS method, which reveal that our method infers less false positive edges. On the whole, the main results of our method are better than other comparison methods. Through 20 runs, Sensitivity, Specificity and F-score obtained from HSCVFNT average about 0.8286 ± 0.0602, 0.9897 ± 0.017 and 0.8856 ± 0.0541, respectively. Compared with other methods in Table 1, HSCVFNT has less Sensitivity than the S-system and TDSS, but it has a convincing Specificity and F-score, which reveals that HSCVFNT could have stable performance for SOS network inference.

Network Construction of Real Data of the IRMA Network
The second benchmark network is from Saccharomyces cerevisiae (IRMA network). Two gene expression time-series datasets are collected with 21 equally distributed time points by being triggered by glucose within the network [46]. In the first dataset, glucose medium is switched to galactose (switched on, named on dataset). In the second dataset, galactose medium is switched to glucose (switched off, named off dataset). The true gene regulatory network is described in Figure 4 containing five genes and eight edges. The performance comparison results are shown in Table 1. From Table 1, it can be seen that the Specificity and F-score indicators of the HSCVFNT method are the best of these four methods, which are 1.00 and 0.923, respectively. Sensitivity of our method is 14.3% lower than that of the TDSS method, which reveals that the TDSS method infers more true positive regulations than our method. However, specificity of our method is 11.4% higher than that of the TDSS method, and F-score is 5.5% higher than the TDSS method, which reveal that our method infers less false positive edges. On the whole, the main results of our method are better than other comparison methods. Through 20 runs, Sensitivity, Specificity and F-score obtained from HSCVFNT average about 0.8286 ± 0.0602, 0.9897 ± 0.017 and 0.8856 ± 0.0541, respectively. Compared with other methods in Table 1, HSCVFNT has less Sensitivity than the S-system and TDSS, but it has a convincing Specificity and F-score, which reveals that HSCVFNT could have stable performance for SOS network inference.

Network Construction of Real Data of the IRMA Network
The second benchmark network is from Saccharomyces cerevisiae (IRMA network). Two gene expression time-series datasets are collected with 21 equally distributed time points by being triggered by glucose within the network [46]. In the first dataset, glucose medium is switched to galactose (switched on, named on dataset). In the second dataset, galactose medium is switched to glucose (switched off, named off dataset). The true gene regulatory network is described in Figure 4 containing five genes and eight edges. In the HSCVFNT method, the maximum time lag max τ is set as 6. The number of candidate regulatory factors L is set as 2 for on dataset and off dataset. The inferred GRN with on dataset is depicted in Figure 5. The inferred GRN contains seven edges, of which six edges are true positive and one edge is false positive. The inferred GRN with off dataset is depicted in Figure 6. The inferred GRN contains eight edges, of which six edges are true positive and two edges are false positive. Compared with the golden network, the HSCVFNT algorithm could identify 75% true positive regulations and less false positive edges. In the HSCVFNT method, the maximum time lag τ max is set as 6. The number of candidate regulatory factors L is set as 2 for on dataset and off dataset. The inferred GRN with on dataset is depicted in Figure 5. The inferred GRN contains seven edges, of which six edges are true positive and one edge is false positive. The inferred GRN with off dataset is depicted in Figure 6. The inferred GRN contains eight edges, of which six edges are true positive and two edges are false positive. Compared with the golden network, the HSCVFNT algorithm could identify 75% true positive regulations and less false positive edges.  The performance comparison results with on dataset are shown in Table 2. From Table 2, it can be seen that the Sensitivity and F-score indicators of our method are the best of these seven methods, which are 0.75 and 0.8, respectively. Precision of our method is 28.5% higher than HRNN, 14.3% lower than MMHO-DBN, 19.99% higher than TDARACNE, 114% higher than TDLASSO, 71.4% higher than DBmcmc and 42.8% higher than DBN-ZC. In terms of Sensitivity, our method is 50% higher than MMHO-DBN, 20% higher than TDARACNE, 200% higher than TDLASSO, 200% higher than DBmcmc and 100% higher than DBN-ZC. In terms of F-score, our method is 14.3% higher than HRNN, 20% higher than MMHO-DBN, 20% higher than TDARACNE, 166.7% higher than TDLASSO, 140.2% higher than DBmcmc and 73.3% higher than DBN-ZC.   The performance comparison results with on dataset are shown in Table 2. From Table 2, it can be seen that the Sensitivity and F-score indicators of our method are the best of these seven methods, which are 0.75 and 0.8, respectively. Precision of our method is 28.5% higher than HRNN, 14.3% lower than MMHO-DBN, 19.99% higher than TDARACNE, 114% higher than TDLASSO, 71.4% higher than DBmcmc and 42.8% higher than DBN-ZC. In terms of Sensitivity, our method is 50% higher than MMHO-DBN, 20% higher than TDARACNE, 200% higher than TDLASSO, 200% higher than DBmcmc and 100% higher than DBN-ZC. In terms of F-score, our method is 14.3% higher than HRNN, 20% higher than MMHO-DBN, 20% higher than TDARACNE, 166.7% higher than TDLASSO, 140.2% higher than DBmcmc and 73.3% higher than DBN-ZC.  The performance comparison results with on dataset are shown in Table 2. From Table 2, it can be seen that the Sensitivity and F-score indicators of our method are the best of these seven methods, which are 0.75 and 0.8, respectively. Precision of our method is 28.5% higher than HRNN, 14.3% lower than MMHO-DBN, 19.99% higher than TDARACNE, 114% higher than TDLASSO, 71.4% higher than DBmcmc and 42.8% higher than DBN-ZC. In terms of Sensitivity, our method is 50% higher than MMHO-DBN, 20% higher than TDARACNE, 200% higher than TDLASSO, 200% higher than DBmcmc and 100% higher than DBN-ZC. In terms of F-score, our method is 14.3% higher than HRNN, 20% higher than MMHO-DBN, 20% higher than TDARACNE, 166.7% higher than TDLASSO, 140.2% higher than DBmcmc and 73.3% higher than DBN-ZC. The performance comparison results with off dataset are shown in Table 3. From Table 3, it can be seen that the Sensitivity and F-score two indicators of HSCVFNT algorithm are all the best of these five methods, which are 0.625 and 0.625, respectively. Precision of our method is 0.625, which is 6.25% lower than MMHO-DBN, 25% higher than TDARACNE, 150% higher than TDLASSO, and 267.6% higher than DBmcmc. Sensitivity of our method is 150% higher than MMHO-DBN, 400% higher than TDARACNE, 400% higher than TDLASSO, and 400% higher than DBmcmc. In terms of F-score, our method is 72.2% higher than MMHO-DBN, 212.5% higher than TDARACNE, 274.9% higher than TDLASSO, and 344.2% higher than DBmcmc. Although the MMHO-DBN algorithm has the highest Precision with on and off datasets, which reveals that the ratio of true positive edges is very high, Sensitivity and F-score are not perfect. MMHO-DBN algorithm only inferred four and one true positive edges with on and off dataset, respectively. F-score of MMHO-DBN is lower than our method. On the whole, our method performs better.
In order to test the stability of HSCVFNT, 20 runs are performed for IRMA network identification with on and off datasets. With on dataset, Precision, Sensitivity and F-score obtained average 0.817 ± 0.064, 0.736 ± 0.042 and 0.774 ± 0.045, respectively. With off dataset, the averaged Precision, Sensitivity and F-score are 0.569 ± 0.066, 0.612 ± 0.04 and 0.589 ± 0.047, respectively. Compared with Tables 2 and 3, HSCVFNT has the highest F-score on average.

Influence of Time-Delayed Factor
In order to verify the effect of time-delayed factor on the performance of gene regulatory network modeling, we also use an HSCVFNT algorithm with no delay (non time-delayed version) for an SOS network and IRMA network identification. In the non time-delayed version, the scoring method is based on mutual information, maximum information coefficient and correlation coefficient, and time series data are also non time-delayed.
Comparison results are depicted in Figures 7-9, respectively. The Sensitivity, Specificity, Precision and F-score four indicators of the HSCVFNT method are all higher than HSCVFNT method with no delay. For the construction of an SOS network, the Sensitivity, Specificity, Precision and F-score of our algorithm are 20%, 20.4%, 3.6%, and 20% higher than those of the HSCVFNT method with no delay, respectively. For the construction of IRMA network with on dataset, the Sensitivity, Precision and F-score of our algorithm are 50%, 7.1%, and 30.1% higher than those of our method with no delay, respectively. For the construction of IRMA network with off dataset, the Sensitivity, Precision and F-score of HSCVFNT algorithm are 66.7%, 25%, and 45.7% higher than those of the HSCVFNT method with no delay, respectively. It could demonstrate that the HSCVFNT method with time-delayed factor could identify a gene regulatory network more accurately.

Influence of the Number of Candidate Regulatory Factors L
In order to investigate the effect of the number of candidate regulatory factors L on the performance of the HSCVFNT method, we make the comparison experiments with different L. For SOS network inference, L is set from 1 to 6. For IRMA network inference, L is set from 1 to 5. The experiment results are described in Figures 10-12, respectively. From Figures 10 and 11, it can be seen that when L is set as 2, our method has the highest Sensitivity, Precision, Specificity and F-score. Figure 12 shows that, when L is set as 2, our method has the highest Sensitivity, Precision and F-score. To choose the proper L is very important for the performance of the HSCVFNT algorithm. The performances of different L show that about 30% of the number of genes is more appropriate. In order to investigate the effect of the number of candidate regulatory factors L on the performance of the HSCVFNT method, we make the comparison experiments with different L . For SOS network inference, L is set from 1 to 6. For IRMA network inference, L is set from 1 to 5. The experiment results are described in Figures 10-12, respectively. From Figures 10 and 11, it can be seen that when L is set as 2, our method has the highest Sensitivity, Precision, Specificity and F-score. Figure 12 shows that, when L is set as 2, our method has the highest Sensitivity, Precision and Fscore. To choose the proper L is very important for the performance of the HSCVFNT algorithm. The performances of different L show that about 30% of the number of genes is more appropriate.   In order to investigate the effect of the number of candidate regulatory factors L on the performance of the HSCVFNT method, we make the comparison experiments with different L . For SOS network inference, L is set from 1 to 6. For IRMA network inference, L is set from 1 to 5. The experiment results are described in Figures 10-12, respectively. From Figures 10 and 11, it can be seen that when L is set as 2, our method has the highest Sensitivity, Precision, Specificity and F-score. Figure 12 shows that, when L is set as 2, our method has the highest Sensitivity, Precision and Fscore. To choose the proper L is very important for the performance of the HSCVFNT algorithm. The performances of different L show that about 30% of the number of genes is more appropriate.

Performance of Our Proposed Scoring Method
In order to validate the performance of our proposed scoring method, we make the comparison experiments with the single time-delayed reducing redundancy methods (TDMIC, TDMI, and TDCC) for inferring an SOS network. F-score results of four reducing redundancy methods are depicted in Figure 13, which reveal that our proposed scoring method has the highest F-score, with a more true positive rate and lower false positive rate.

Performance of Our Selected Stochastic Optimizer
In order to validate the performance of the bat algorithm (BA), we make the comparison experiments with the classical optimization algorithms (genetic algorithm (GA), particle swarm optimization (PSO) and differential evolution (DE)), which are utilized to optimize the CVFNT model. Through 20 runs, performance results of four optimization methods for SOS network and IRMA network inference are listed in Tables 4-6, respectively (SD: standard deviation). Hit ratio is the percentage of the best GRNs obtained over all runs. From the results, it can be seen that our selected optimization method has more accurate and robust performance than PSO, DE and GA.

Performance of Our Proposed Scoring Method
In order to validate the performance of our proposed scoring method, we make the comparison experiments with the single time-delayed reducing redundancy methods (TDMIC, TDMI, and TDCC) for inferring an SOS network. F-score results of four reducing redundancy methods are depicted in Figure 13, which reveal that our proposed scoring method has the highest F-score, with a more true positive rate and lower false positive rate.

Performance of Our Proposed Scoring Method
In order to validate the performance of our proposed scoring method, we make the comparison experiments with the single time-delayed reducing redundancy methods (TDMIC, TDMI, and TDCC) for inferring an SOS network. F-score results of four reducing redundancy methods are depicted in Figure 13, which reveal that our proposed scoring method has the highest F-score, with a more true positive rate and lower false positive rate.

Performance of Our Selected Stochastic Optimizer
In order to validate the performance of the bat algorithm (BA), we make the comparison experiments with the classical optimization algorithms (genetic algorithm (GA), particle swarm optimization (PSO) and differential evolution (DE)), which are utilized to optimize the CVFNT model. Through 20 runs, performance results of four optimization methods for SOS network and IRMA network inference are listed in Tables 4-6, respectively (SD: standard deviation). Hit ratio is the percentage of the best GRNs obtained over all runs. From the results, it can be seen that our selected optimization method has more accurate and robust performance than PSO, DE and GA.

Performance of Our Selected Stochastic Optimizer
In order to validate the performance of the bat algorithm (BA), we make the comparison experiments with the classical optimization algorithms (genetic algorithm (GA), particle swarm optimization (PSO) and differential evolution (DE)), which are utilized to optimize the CVFNT model. Through 20 runs, performance results of four optimization methods for SOS network and IRMA network inference are listed in Tables 4-6, respectively (SD: standard deviation). Hit ratio is the percentage of the best GRNs obtained over all runs. From the results, it can be seen that our selected optimization method has more accurate and robust performance than PSO, DE and GA.

Time-Delayed Mutual Information
Mutual information (MI) between two genes X and Y is defined as [47,48] M(X, Y) = ∑ x∈X ∑ y∈Y p(x, y) log p(x, y) p(x)p(y) , where p(x) and p(y) are the marginal probability distribution functions of gene X and gene Y, respectively. p(x, y) is the joint probability density function of gene X and gene Y. MI is symmetric, so it could not identify the time-delayed dependence between two genes. Time-delayed mutual information (TDMI) is proposed to measure the time-delayed dependence between target gene and its regulatory factors.
The time-delayed mutual information is defined as follows: where M τ (X, Y) represents the mutual information between gene X and gene Y with time lag τ, P(X k , Y k+τ ) is the joint probability density function of gene X(t = k) and gene Y (t = k + τ), and P(X k ) is the marginal probability distribution functions of gene X (t = k). The higher the TDMI value is, the greater the dependence of two corresponding genes.

Time-Delayed Maximum Information Coefficient
A maximum information coefficient (MIC), as an exploratory analysis tool, can be utilized to evaluate the relationships between hundreds of variables [49]. Compared with mutual information, MIC can reflect the relationship between two variables better [50].
A bivariate set is defined as a set where the data elements are the ordered tuples (X, Y). The maximum information gain for all the grids sized of tuple (X, Y) can be computed as [51] I * (D, X, Y) = max I(D| G ), where G is a grid and I(D| G ) represents the mutual information of D| G . M(D) is the characteristics matrix of D, which could be represented as: max(M(D) X,Y ) is the MIC of two genes X and Y. Ideally, if two genes have no regulatory relationship, their MIC value should be about 0.
In order to reflect the time-delayed relationship between two genes, a time-delayed maximum information coefficient (TDMIC) with time lag τ is proposed, which is calculated as follows: ).

Time-Delayed Correlation Coefficient (TDCC)
The correlation coefficients of all genes for each time lag τ are obtained by using the time-delayed correlation coefficient (TDCC). The formula is as follows: where X = 1 T ∑ T k=1 X(k), T is the number of time points, and τ is the time lag between gene X and gene Y.

Our Proposed Scoring Method
In order to measure the time-delayed regulatory relationships accurately, a novel hybrid scoring method based on time-delayed mutual information, time-delayed maximum information coefficient and time-delayed correlation coefficient is proposed. Suppose that a time-delayed gene regulatory network contains n genes and gene expression time series dataset includes m time points. The flowchart and pseudo code of our proposed scoring method are depicted in Figures 14 and 15, respectively.

Complex-Valued Flexible Neural Tree
Flexible neural tree (FNT) model was initially introduced by Prof. Chen in 2005 [52]. Because of its automatic features extraction, cross layer connections and flexible activation functions, the FNT model performed better than many classical neural networks and has been applied widely for solving forecasting and classification problems [53][54][55][56][57]. Due to fact that a complex-valued neural network is more flexible and functional, a complex-valued flexible neural tree (CVFNT) model, as the extension of a real-valued FNT model, is proposed to infer the time-delayed regulations in GRN [58][59][60]. A tree-structural based encoding method with a specific instruction set is selected for representing a CVFNT model. In order to create the CVFNT models randomly, two operator sets (function set F and terminal set T) are defined in advance: where + i represents the summation of i variables. z i is a terminal nodes' instruction and takes no operands (i = 1, 2, . . . , n, z i ∈ C n , z i = x i + jy i and j stands for the value of √ −1).

Complex-Valued Flexible Neural Tree
Flexible neural tree (FNT) model was initially introduced by Prof. Chen in 2005 [52]. Because of its automatic features extraction, cross layer connections and flexible activation functions, the FNT model performed better than many classical neural networks and has been applied widely for solving forecasting and classification problems [53][54][55][56][57]. Due to fact that a complex-valued neural network is more flexible and functional, a complex-valued flexible neural tree (CVFNT) model, as the extension of a real-valued FNT model, is proposed to infer the time-delayed regulations in GRN [58][59][60]. A treestructural based encoding method with a specific instruction set is selected for representing a CVFNT model. In order to create the CVFNT models randomly, two operator sets (function set F and terminal set T) are defined in advance: In the process of generating a CVFNT model randomly, the function and terminal operators are selected from the instruction sets F and T. If a terminal instruction z i is selected, the branch is terminated. If a function instruction + n is selected, n leaf nodes and complex-valued weights (w 1 , w 2 , . . . w n ) are randomly created. The output of a non-terminal node can be computed as a complex-valued flexible neuron (CVFN) model (see Figure 16).  The output of a CVFN + n can be calculated as follows. The total excitation of + n is where w 0 is the threshold value and z j (j = 1, 2, . . . , n) is the complex-valued input of CVFN operator. The output of the node + n is then calculated by a complex-valued activation function (CVAF). Generally, there are three kinds of CVAFs: complex-valued Elliot function, Gaussian function and Sigmoid function, which are described as follows: out n = f (net n ) = 1 1 + e −Re(net n ) + j where f (·) is CVAF and the output is complex-valued. a, r and σ are real variables, c is complex-valued and |net n | is the modulus of complex-valued net n . In this paper, we choose complex-valued Elliot function as a CVAF of the CVFNT model. A typical CVFNT model is described in Figure 17. The total output of CVFNT model can be calculated from left to right by a depth-first search method.

Flowchart of Our Method
The flowchart of our HSCVFNT algorithm is described in Figure 18.

Flowchart of Our Method
The flowchart of our HSCVFNT algorithm is described in Figure 18.

Flowchart of Our Method
The flowchart of our HSCVFNT algorithm is described in Figure 18.  (1) Let gene expression data to be D = {X 1 , X 2 , . . . , X n }, where X i = x 1 i , x 2 i , . . . , x m i . Maximal time lag τ max and the number of candidate regulatory factors L are defined in advance.
(2) For target gene k, our proposed scoring method is utilized to select a candidate regulatory factor set G Rank 1 , G Rank 1 , . . . , G Rank L with data D, which is described in detail in Section 4.1.4. (3) For target gene k, its time-delayed time series matrix is created. The algorithm is described in Figure 19. Time-delayed time series matrix between gene k and its regulatory factors is listed in Equation (16): where τ i is the optimal time lag between gene i and gene k, X (2) For target gene k , our proposed scoring method is utilized to select a candidate regulatory factor set with data D , which is described in detail in Section 4.1.4.
(3) For target gene k , its time-delayed time series matrix is created. The algorithm is described in Figure 19. Time-delayed time series matrix between gene k and its regulatory factors is listed in Equation (16): where i τ is the optimal time lag between gene i and gene k , (4) According to the candidate regulatory set and time-delayed time series matrix of gene k, the CVFNT model is utilized to identify the time-delayed regulatory relationships. The gene expression data of candidate regulatory factors are utilized as input data. The gene expression data of target gene k are utilized as output data. The optimization process of the CVFNT model is described as follows: (a) Initialize the population, containing the structure, real-valued and complex-valued parameters of the CVFNT model. (b) All individuals in the population are evaluated, using root mean squared error (RMSE): where N is the number of data points, y i actual is the actual output of i-th time point and y i predicted is the predicted output of the i-th time point. If the optimal CVFNT model is found or the maximum number of iterations is reached, the optimization process is over; otherwise, go to (c). (c) Structure is optimized using three operators: selection, crossover and mutation, which are introduced in Ref [61]. At some iterations, a certain proportion of the individuals is selected to optimize the parameters using a bat algorithm, which is introduced in detail in Refs [62,63]. Go to (b): (5) According to the structure of the optimized CVFNT model, input genes are seen as the regulatory factors. Obtain the regulations between target gene k and its regulatory factors. (6) k = k + 1. If k ≤ n, go to (2); otherwise, the regulations of all target genes are integrated in order to infer a time-delayed gene regulatory network.

Conclusions
In order to infer a gene regulatory network accurately, a novel algorithm, namely HSCVFNT, is proposed to infer a gene regulatory network with time-delayed regulations. In the HSCVFNT algorithm, the novel scoring method is proposed to reduce the redundant regulatory relationships and obtain the candidate regulatory factor set of each target gene. The CVFNT model is utilized to infer the time-delayed regulations with a time-delayed gene expression matrix. Three real time-series expression datasets from SOS network and IRMA network are utilized to evaluate the performance of the HSCVFNT algorithm.
The results on the SOS network show that our HSCVFNT method significantly outperforms the three other state-of-the-art methods (BN, S-system and TDSS). The results on the IRMA network reveal that the HSCVFNT algorithm performs better than HRNN, MMHO-DBN, TDARACNE, TDLASSO, DBmcmc, and DBN-ZC. We also investigate the effect of a time-delayed factor and the number of candidate regulatory factors on the HSCVFNT algorithm for inferring GRN. The experiment results show that a HSCVFNT algorithm with a time-delayed factor could identify a gene regulatory network more accurately. Different numbers of candidate regulatory factors have a great influence on the performance of the algorithm. The comparison results show that about 30% of the number of genes is more appropriate.
In the future, we will apply the proposed algorithm for inferring a large-scale real gene regulatory network and discovering some meaningful relationships.

Conflicts of Interest:
The authors declare no conflict of interest.