Constructing a Precise Fuzzy Feedforward Neural Network Using an Independent Fuzzification Approach

Abstract: This study discusses how to fuzzify a feedforward neural network (FNN) to generate fuzzy forecasts that contain the actual values, while minimizing the average range of the fuzzy forecasts. This topic has rarely been investigated in past studies, but it is an essential step toward constructing a precise fuzzy FNN (FFNN). Existing methods fuzzify all parameters at the same time, which results in a nonlinear programming (NLP) problem that is not easy to solve. In contrast, in this study, the parameters of an FNN are fuzzified independently. In this way, the optimal values of the fuzzy parameters can be derived theoretically. An illustrative example is used to demonstrate the applicability of the proposed methodology. According to the experimental results, fuzzifying the thresholds on hidden-layer nodes or the connection weights between the input and hidden layers may not guarantee that all fuzzy forecasts contain the corresponding actual values. In contrast, fuzzifying the threshold on the output node or the connection weights between the hidden and output layers is more likely to achieve a 100% hit rate. These results lay a foundation for establishing a precise deep FFNN in the future.


Introduction
Fuzzy feedforward neural networks (FFNNs) combine the advantages of fuzzy logic (in uncertainty modelling) and feedforward neural networks (FNNs) (in nonlinear approximation) [1], and have been widely applied to forecasting in many fields [2][3][4][5]. There are various types of FFNNs with fuzzy or crisp inputs, parameters, and outputs. The numbers of layers and the activation (or transformation) functions in these FFNNs also differ [6]. For a recent review of FFNNs, refer to de Campos Souza [7]. At present, the most commonly applied FFNNs are variants of the adaptive network-based fuzzy inference system (ANFIS) [8][9][10][11][12][13]. Past studies have shown that FFNNs can improve forecasting accuracy, that is, each forecast is close to the actual value [14][15][16]. In contrast, the present study aims to construct an FFNN that improves forecasting precision, that is, every actual value is included in the narrowest possible fuzzy forecast. This topic has rarely been discussed in the past, which constitutes the motivation of this research.
However, even if a sophisticated FFNN is applied, the network output is rarely equal to the actual value, especially when the FFNN is applied to unlearned data. To address this issue, an alternative is to estimate the range of the actual value [17]. In other words, a fuzzy forecast that contains the actual value needs to be generated by an FFNN, at least for the training data. However, this is not easy, since there are no actual values for the lower and upper bounds of the range. In addition, a fuzzy forecast needs to be as narrow as possible to have reference value [18]. This is also a challenging task, because a narrower range is less likely to contain the actual value. Some of the relevant literature is reviewed as follows.
In an ANFIS, the network output before defuzzification cannot guarantee the inclusion of the actual value [19]. The problem is even more complicated for an FFNN in which all parameters are fuzzy and nonlinear transformation functions (such as sigmoid or tansig functions) are applied. For example, Chen and Wang [20] showed that deriving the values of fuzzy parameters in an FFNN is a nonlinear programming (NLP) problem that is difficult to solve. A branch-and-bound algorithm can be applied to find a solution to the NLP problem [21], but the solution may be far from (globally) optimal. Instead, Chen and Wang established goals for the lower and upper bounds of the actual value to simplify the NLP problem to a goal programming (GP) problem. However, a number of goals needed to be tried to improve the solution, which was time-consuming. A similar method was proposed by Chen and Lin [18], in which the membership of an actual value in the fuzzy forecast had to be greater than a specified level. If only the threshold on the output node was fuzzy, the optimal value of the fuzzy threshold could be derived by solving two linear equations [22]. Similar treatments have been taken by Chen and Wu [17] and Chen [23]. Wang et al. [24] randomized the values of fuzzy thresholds on hidden-layer nodes and then optimized the fuzzy threshold on the output node. After a few replications, these fuzzy thresholds could be optimized. However, the connection weights in the FFNN were still crisp.
This study considers an FFNN with a single hidden layer in which all network parameters can be fuzzified and nonlinear transformation functions (i.e., sigmoid functions) are adopted. We aim to optimize the values of fuzzy parameters theoretically without solving an NLP problem, while guaranteeing that all actual values are contained in the corresponding fuzzy forecasts. However, instead of fuzzifying all parameters at the same time, this study follows an independent fuzzification approach in which parameters are fuzzified independently. This study is important because it is a fundamental step towards the construction of a precise FFNN, which lays a foundation for establishing a precise deep FFNN with multiple or recurrent hidden layers.
The contribution of this research is to derive the formula for optimizing the value of each fuzzy parameter in an FFNN, so as to minimize the average range of fuzzy forecasts while ensuring a 100% hit rate. In contrast, existing methods need to solve an NLP problem to achieve the same goal.
The remainder of this study is organized as follows. The independent fuzzification approach is detailed in Section 2. A numerical example is given in Section 3 to illustrate the applicability of the proposed methodology. The effects of fuzzifying various parameters on the average range of fuzzy forecasts are also compared. This study is concluded in Section 4. Some directions for future investigation are also provided.

Independent Fuzzification Approach
All parameters and variables in the proposed methodology are given in or approximated by triangular fuzzy numbers (TFNs).
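For concreteness, a TFN and the fuzzy arithmetic used in the following sections can be sketched as Python tuples (lower, core, upper); the helper names are illustrative, not from the original study. Note that fuzzy subtraction (−) subtracts the opposite bounds, so the result remains a valid TFN:

```python
def tfn(a1, a2, a3):
    # A triangular fuzzy number (TFN) stored as (lower, core, upper).
    assert a1 <= a2 <= a3, "TFN bounds must be ordered"
    return (a1, a2, a3)

def tfn_add(a, b):
    # Fuzzy addition: bounds add component-wise.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def tfn_sub(a, b):
    # Fuzzy subtraction (-): the lower bound subtracts b's upper
    # bound, and the upper bound subtracts b's lower bound.
    return (a[0] - b[2], a[1] - b[1], a[2] - b[0])
```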

FFNN Configuration
The FFNN considered in this study has three layers: an input layer, a single hidden layer, and an output layer. Inputs to the FFNN are indicated with {z_jp | p = 1~P; j = 1~n}, where z_jp is the normalized value of decision variable x_jp:

$z_{jp} = \frac{x_{jp} - \min_j x_{jp}}{\max_j x_{jp} - \min_j x_{jp}}$ (1)

To convert back to the original value,

$x_{jp} = \min_j x_{jp} + z_{jp}(\max_j x_{jp} - \min_j x_{jp})$ (2)

These inputs are propagated through the FFNN as follows. First, from the input layer to the hidden layer, the following operations are performed:

$I^h_{jl} = \sum_{p=1}^{P} w^h_{pl} z_{jp}$ (3)

$n^h_{jl} = I^h_{jl} (-) \theta^h_l$ (4)

$h_{jl} = \frac{1}{1 + e^{-n^h_{jl}}}$ (5)

where w^h_pl is the connection weight between input node p and hidden-layer node l; l = 1~L; θ^h_l is the threshold on hidden-layer node l; h_jl is the output from hidden-layer node l; and (−) denotes fuzzy subtraction. In Equation (5), the activation (or transformation) function is the logistic sigmoid activation function, which returns a value within [0, 1]. Outputs from the hidden layer are aggregated on the output node, and then the network output o_j = (o_j1, o_j2, o_j3) is generated as

$n^o_j = I^o_j (-) \theta^o$, where $I^o_j = \sum_{l=1}^{L} w^o_l h_{jl}$ (6)

$o_j = \frac{1}{1 + e^{-n^o_j}}$ (7)

where w^o_l is the connection weight between hidden-layer node l and the output node, and θ^o is the threshold on the output node. o_j is unnormalized according to Equation (2) and then compared with the actual value y_j.
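As a concreteness check, the crisp forward pass described above (normalized inputs, sigmoid hidden layer, single sigmoid output node) can be sketched as follows; the function and variable names are illustrative, not from the original study:

```python
import math

def sigmoid(x):
    # Logistic sigmoid activation; returns a value within (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def forward(z, Wh, Th, Wo, To):
    """Crisp forward pass of the three-layer FNN.
    z[p]: normalized inputs; Wh[p][l]: input-to-hidden weights;
    Th[l]: hidden-node thresholds; Wo[l]: hidden-to-output weights;
    To: output-node threshold."""
    L = len(Th)
    # Hidden layer: weighted net input minus threshold, then sigmoid.
    h = [sigmoid(sum(Wh[p][l] * z[p] for p in range(len(z))) - Th[l])
         for l in range(L)]
    # Output node: aggregate hidden outputs, subtract the threshold.
    return sigmoid(sum(Wo[l] * h[l] for l in range(L)) - To)
```

With fuzzy (TFN) parameters, each bound of a fuzzy forecast is obtained by propagating the corresponding parameter bounds through the same operations, with fuzzy subtraction pairing opposite bounds.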

Deriving the Cores of Fuzzy Parameters
The training of the FFNN comprises two stages. First, the cores of the fuzzy parameters are derived by training the FFNN as a crisp FNN using the Levenberg-Marquardt (LM) algorithm [25], so as to minimize the mean squared error (MSE). The optimal solution is indicated with {w^h*_pl2, θ^h*_l2, w^o*_l2, θ^o*_2 | p = 1~P; l = 1~L}. Subsequently, the lower and upper bounds of the fuzzy parameters are determined, so as to minimize the average range (AR) of fuzzy forecasts:

$AR = \frac{1}{n} \sum_{j=1}^{n} (o_{j3} - o_{j1})$ (8)

However, deriving the optimal values of all fuzzy parameters at the same time is a computationally intensive task [20]. As an alternative, the optimal values of the fuzzy parameters are derived independently as follows.
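The AR objective is straightforward to compute once the fuzzy forecasts are available. A minimal sketch, assuming each forecast is stored as a (o_j1, o_j2, o_j3) tuple (the helper name is hypothetical):

```python
def average_range(forecasts):
    """AR = mean width of the fuzzy forecasts (o_j3 - o_j1).
    forecasts: list of (o1, o2, o3) tuples."""
    return sum(o3 - o1 for o1, _, o3 in forecasts) / len(forecasts)
```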

Deriving the Optimal Value of θ^o
The optimal value of θ^o is derived first. Substituting Equation (7) into Equation (8) expresses AR as a function of θ^o, which can be decomposed into the lower and upper bounds o_j1 and o_j3 of each fuzzy forecast (Equations (13) and (14)). The other parameters are not fuzzified, so I^o_j1 and I^o_j3 are equal to I^o*_j2, which is a fixed value. To minimize AR, o_j1 and o_j3 should be maximized and minimized, respectively, which corresponds to minimizing θ^o_3 and maximizing θ^o_1. However, o_j should include a_j (the normalized actual value); therefore, o_j1 ≤ a_j ≤ o_j3 (Constraints (17) and (18)). Substituting Equation (13) into Constraint (17) yields a lower bound on θ^o_3; the minimal feasible θ^o_3 is the tightest such bound over all training records. Similarly, substituting Equation (14) into Constraint (18) yields an upper bound on θ^o_1, from which the maximal feasible θ^o_1 follows. Most past studies [17][22][23][24] stop at this step. The following discussion is new to the body of knowledge.
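Under the sigmoid output node assumed above, the tightest bounds of θ^o admit a closed form: inverting the sigmoid (the logit) turns each coverage constraint into a linear bound on θ^o_1 or θ^o_3. A sketch with hypothetical names, where I2[j] stands for the crisp net input I^o*_j2 and a[j] for the normalized actual value in (0, 1):

```python
import math

def logit(p):
    # Inverse of the logistic sigmoid.
    return math.log(p / (1.0 - p))

def optimize_output_threshold(I2, a, theta2):
    """Tightest TFN (theta1, theta2, theta3) for the output threshold
    so that every fuzzy forecast covers its actual value:
      upper bound o_j3 = sigmoid(I2[j] - theta1) >= a[j]
      lower bound o_j1 = sigmoid(I2[j] - theta3) <= a[j]."""
    slack = [i - logit(aj) for i, aj in zip(I2, a)]
    theta1 = min(theta2, min(slack))  # largest feasible lower bound
    theta3 = max(theta2, max(slack))  # smallest feasible upper bound
    return (theta1, theta2, theta3)
```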

Deriving the Optimal Value of w^o_lf
The optimal value of w^o_l is derived when l = l_f. Equation (6) can be decomposed into its lower and upper bounds (Equations (23) and (24)); the decomposition is case-dependent because h_jl is positive, while w^o_l may be negative. Substituting Equations (23) and (24) into Equations (13) and (14) gives the bounds of o_j. Only the connection weights are fuzzified; the other fuzzy parameters are set to their optimized cores (Equations (27) and (28)). Substituting Equations (27) and (28) into Constraints (17) and (18), respectively, and noting that only w^o_lf is fuzzified while the other connection weights equal their optimized cores, yields bounds on w^o_lf1 and w^o_lf3. Increasing the fuzziness (i.e., width) of w^o_lf makes o_j wider. Therefore, it is reasonable to minimize the fuzziness of w^o_lf: w^o_lf1 and w^o_lf3 should be maximized and minimized, respectively, while remaining bounded by w^o*_lf2.

Deriving the Optimal Value of θ^h_lf

Substituting Equation (4) into Equation (5) gives h_jl as a function of θ^h_l, which can be decomposed into its lower and upper bounds (Equations (40) and (41)). Substituting Equations (40) and (41) into Equations (23) and (24), and the results into Equations (13) and (14), gives the bounds of o_j. The requirements o_j1 ≤ a_j ≤ o_j3 should still be met. Only θ^h_lf is fuzzified; the other fuzzy parameters are set to their optimized cores. Solving the resulting inequalities bounds θ^h_lf1 and θ^h_lf3, where the direction of each bound depends on the sign of the corresponding w^o_l. To minimize the fuzziness of θ^h_lf, θ^h_lf1 and θ^h_lf3 should be maximized and minimized, respectively, but they are still bounded by θ^h*_lf2.
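An analogous closed form can be sketched for fuzzifying a single output-layer weight w^o_lf. The sketch below assumes the positive-hidden-output case noted in the section (h_jlf > 0); all names are illustrative, with I2[j] the crisp net input I^o*_j2, h_lf[j] the output of hidden node l_f, and w2, theta2 the trained cores:

```python
import math

def logit(p):
    # Inverse of the logistic sigmoid.
    return math.log(p / (1.0 - p))

def optimize_output_weight(I2, h_lf, a, w2, theta2):
    """Tightest TFN (w1, w2, w3) for one output-layer weight so that
    every fuzzy forecast covers its actual value. With only w^o_lf
    fuzzy, the forecast bounds shift by (w - w2) * h_lf[j]."""
    d = [(logit(aj) - (i - theta2)) / h
         for i, h, aj in zip(I2, h_lf, a)]
    w1 = w2 + min(0.0, min(d))  # keeps every lower bound below a_j
    w3 = w2 + max(0.0, max(d))  # lifts every upper bound above a_j
    return (w1, w2, w3)
```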

Deriving the Optimal Value of w^h_pf,lf

The optimal value of w^h_pl is derived when p = p_f and l = l_f. Equation (3) can be decomposed into its lower and upper bounds (Equations (68) and (69)). Substituting Equations (68) and (69) into Equations (46) and (47) gives the bounds of o_j. The requirements o_j1 ≤ a_j ≤ o_j3 need to be met. Only w^h_pf,lf is fuzzified; the other fuzzy parameters are set to their optimized cores. Solving the resulting inequalities bounds w^h_pf,lf1 and w^h_pf,lf3; to minimize the fuzziness of w^h_pf,lf, they should be maximized and minimized, respectively.

An Illustrative Case Using FFNN(3, 6, 1)

The problem of predicting the time to replace a computer numerical control (CNC) tool based on the monitoring results of three sensors is adopted to illustrate the applicability of the proposed methodology. Therefore, P = 3. A small-scale problem is used so that the constructed FFNN will not be too large, and the effect of fuzzifying each parameter can be more obvious. The collected data include ninety records, as shown in Table 1. The collected data are first normalized (see Table 2). The FFNN has a hidden layer with six nodes; therefore, it is indicated with FFNN(3, 6, 1) afterwards. There is no absolute rule for determining the optimal number of nodes in the hidden layer. Many studies have shown that a hidden layer with twice the number of inputs is sufficient to fit a complex nonlinear relationship [26][27][28]. The first sixty records are used to train the FFNN, while the remaining records are left for evaluating the forecasting performance.
First, the FFNN(3, 6, 1) is regarded as a crisp FNN(3, 6, 1) and trained using the LM algorithm to derive the cores of the fuzzy parameters. Other training algorithms, such as the gradient descent (GD) algorithm, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton algorithm, the GD algorithm with momentum and adaptive learning rate (GDX), and the resilient backpropagation (RP) algorithm [25], are also applicable. However, this research aims to improve the forecasting precision, rather than the forecasting accuracy, so the choice of the training algorithm does not affect the application of the proposed methodology.
The optimal values of the cores are summarized in Table 3. The crisp forecasts for the training data based on the cores of fuzzy parameters are shown in Figure 1. The forecasting accuracy, measured in terms of root mean squared error (RMSE), is 0.084 (normalized value). Although the forecasting accuracy is satisfactory, there are many records with considerable deviations between actual values and crisp forecasts, showing the necessity of estimating the range of the actual value. To this end, the effects of fuzzifying four parameters, θ^o, w^o_1, θ^h_1, and w^h_11, are compared. There are four types of parameters in the FNN(3, 6, 1): the connection weights between the input layer and the hidden layer, the thresholds on the nodes of the hidden layer, the connection weights between the hidden layer and the output layer, and the threshold on the node of the output layer. In this way, the effects of fuzzifying all types of parameters can be observed and compared. The fuzzy forecasts obtained by fuzzifying θ^h_1 are shown in Figure 4. However, it is not possible for all fuzzy forecasts to contain the corresponding actual values solely by fuzzifying θ^h_1: the hit rate is only 67%. With such a low hit rate, the average range is narrowed to only 93.5. From the experimental results, the following discussion is made:

1.
Fuzzifying some network parameters may not guarantee that all actual values are contained in the estimated ranges.

2.
In contrast, fuzzifying a network parameter closer to the output node is more likely to ensure a 100% hit rate.

3.
Both the ranges estimated by fuzzifying θ^o and w^o_1 contain the actual value. Therefore, the fuzzy intersection (FI) of the two ranges also contains the actual value, which further narrows the range of the actual value.

4.
After applying the trained FFNN(3, 6, 1) to the test/unlearned data, the forecasting precision levels achieved by fuzzifying various network parameters are evaluated and compared in Table 4. As expected, the hit rate decreased compared to the results on the training data, but it is still acceptable. Fuzzifying w^o_1 achieves the highest hit rate, while fuzzifying θ^h_1 minimizes the average range of fuzzy forecasts.

5.
The effectiveness (i.e., forecasting precision) and efficiency of the proposed methodology are compared with those of some existing methods in Table 5. All methods are implemented using MATLAB 2017a on a PC with a 3.6 GHz i7-7700 CPU and 8 GB of RAM. The proposed methodology maximizes the hit rate for the test data without considerably widening the average range. In addition, it is also the most efficient method.
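Point 3 above rests on a simple property: intersecting two fuzzy forecasts that both contain the actual value yields a narrower range that still contains it. A minimal sketch at the support level, assuming both forecasts share the crisp core o*_j2 (the helper name is hypothetical):

```python
def fuzzy_intersection(f, g):
    """Intersect two fuzzy forecasts (o1, o2, o3) that share the same
    crisp core: keep the core and take the tighter bound on each side.
    If both ranges contain the actual value, so does the result."""
    return (max(f[0], g[0]), f[1], min(f[2], g[2]))
```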

Conclusions and Future Research Directions
Many complex FFNNs have been constructed to improve the forecasting accuracy. Even so, a forecast is rarely equal to the actual value. In addition, the range of a fuzzy forecast generated by prevalent FFNNs does not necessarily include the actual value. To solve these problems, this research explores how to fuzzify the parameters of an FNN so that every fuzzy forecast generated by the FFNN contains the actual value. To achieve this goal, most previous studies have solved an NLP problem, which is computationally challenging. In contrast, this research proposes an independent fuzzification approach that fuzzifies the parameters of an FNN independently. In this way, the optimal value of each fuzzy parameter can be derived theoretically, thereby enabling the construction of a precise FFNN.
After applying the proposed methodology to an illustrative case FFNN(3, 6, 1), the following conclusions are drawn:

1.
Fuzzifying θ^h_1 or w^h_11 alone cannot guarantee that all fuzzy forecasts contain the corresponding actual values.

2.
Fuzzifying θ^o and w^o_1 has a higher chance of achieving a 100% hit rate.

3.
Parameters closer to the output node have a greater impact on the forecasting precision, and should be fuzzified earlier.

4.
Fuzzifying parameters far away from the output node cannot guarantee a 100% hit rate. Therefore, multiple such parameters should be fuzzified at the same time.
The FFNN discussed in this study has a single hidden layer. The proposed methodology can be extended to deal with a deep FFNN with multiple hidden layers [29][30][31] or recurrent layers [32]. In this case, the parameters of the output layer will be fuzzified first, then the parameters of the hidden layer closest to the output layer, and so on. In addition, FI can also be applied to aggregate the ranges estimated by fuzzifying various parameters (with 100% hit rates) to further enhance the forecasting precision. These constitute some directions for future research.

Conflicts of Interest:
The authors declare no conflict of interest.