Optimization by Context Refinement for Development of Incremental Granular Models

Optimization by refinement of linguistic contexts produced from an output variable in the construction of an incremental granular model (IGM) is presented herein. In contrast to the conventional learning method using the backpropagation algorithm, we use a novel method to learn both the cluster centers of the Gaussian fuzzy sets representing the symmetry in the premise part and the contexts of the consequent part of the if–then fuzzy rules. To this end, we use the fundamental concept of context-based fuzzy clustering and a design that integrates linear regression (LR) and granular fuzzy models (GFMs). The GFM is constructed based on the association between the triangular membership functions produced in both the input and output variables. The contexts can be established by the system user or by an optimization method. Hence, we can obtain superior performance from the combination of simple linear regression and local GFMs optimized by context refinement. Experimental results pertaining to coagulant dosing in a water purification plant and automobile miles per gallon prediction revealed that the presented method performed better than linear regression, multilayer perceptron, radial basis function networks, the linguistic model, and the IGM.


Introduction
Owing to the rapid growth of various application problems, several studies have been conducted on fuzzy models. It is now acknowledged that real-world problems require intelligent methods that integrate the methodology of different models. To solve various application challenges, it is important to design complementary hybrid intelligent systems by integrating multiple computing technologies in a synergistic rather than exclusive manner. A representative method is neuro-fuzzy inference modeling [1,2]. Neural networks adapt to changing environments, whereas fuzzy models deal with fuzzy inference and knowledge-based decision making.
The basic concept of fuzzy clustering has been used in several studies on fuzzy modeling, resulting in the concept of granular fuzzy models (GFMs) [3][4][5][6][7][8]. These models are based on information granules (IGs), which comprise linguistic information produced from both the input and output variables. The IGs in the design of GFMs are obtained using specialized context-based fuzzy c-means clustering. Conventional fuzzy c-means clustering is context-free and produces clusters without regard to any context; once the contexts are generated, clustering instead occurs within each specified context. A system modeling method using a context-based fuzzy clustering algorithm has been successfully applied. Yeom [6] proposed a performance evaluation method based on the concept of coverage and specificity in developing a GFM. The performance is typically measured by the root mean square error (RMSE). Byeon [7] designed a Takagi-Sugeno-Kang (TSK) linguistic model. In that study, an intelligent

$$z_k = a^T x_k + a_0 \qquad (1)$$

where the regression coefficients are expressed by $a$ and $a_0$ and are obtained using the standard least-squares error method. Here, $a$ consists of two coefficients, $a = [a_1, a_2]^T$. The model is then improved by a local part built on the input-error data $\{x_k, e_k\}$, where the error is $e_k = y_k - z_k$. Subsequently, the if-then fuzzy rules and the contexts of triangular membership functions are obtained by the context-based fuzzy clustering approach.
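The global part of the model can be illustrated with a minimal sketch, assuming NumPy and a toy data set that is purely hypothetical. The function names `fit_linear_regression` and `regression_errors` are illustrative and do not come from the paper; the sketch only shows a least-squares fit of $a$ and $a_0$ followed by the errors $e_k = y_k - z_k$ that the local granular model is later built on.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares fit of z = X @ a + a0. Returns (a, a0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so that the intercept a0 is estimated jointly.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def regression_errors(X, y, a, a0):
    """Errors e_k = y_k - z_k left over by the linear regression."""
    z = np.asarray(X, dtype=float) @ a + a0
    return np.asarray(y, dtype=float) - z

# Hypothetical toy data: a linear trend plus a mild nonlinearity.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 + 0.1 * np.sin(5.0 * X[:, 0])
a, a0 = fit_linear_regression(X, y)
e = regression_errors(X, y, a, a0)
```

Because the intercept is fitted jointly, the errors average to zero on the training data, which is consistent with an error distribution concentrated near zero.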

Context-Based Fuzzy C-Means Clustering
The context-based fuzzy clustering algorithm is an approach for estimating clusters that preserve the association characteristics [30][31][32]. The following reiterates the nature of context-based fuzzy clustering. This clustering is realized for each context $W_1, W_2, \ldots, W_p$ through the statistical distribution of the output space. The contexts in the traditional GFM are created by a set of triangular fuzzy sets spaced equally along the output. With such contexts, it can be difficult to estimate the clusters, and hence to produce the if-then fuzzy rules, because some contexts are associated with data scarcity.
To create flexible contexts, we used the characteristics of the probabilistic distribution of the output, as shown in Figure 1. Figure 1 shows the generation of linguistic contexts obtained from the probability density function (b) and the conditional density function (c) based on the statistical distribution of the error (a) obtained by linear regression. Finally, contexts with triangular fuzzy sets were generated.
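One way to obtain contexts that follow the error distribution is sketched below. This is an assumption-laden illustration, not the paper's exact procedure: it places the modal values at equally spaced quantiles of the errors (so each context sees a comparable amount of data) and lets neighbouring triangles overlap at the half-point level. The names `triangular`, `contexts_from_errors`, and `context_memberships` are hypothetical.

```python
import numpy as np

def triangular(e, lo, mode, hi):
    """Membership of e in the triangular fuzzy set <lo, mode, hi>."""
    e = np.atleast_1d(np.asarray(e, dtype=float))
    mu = np.zeros_like(e)
    rising = (e >= lo) & (e < mode)
    falling = (e > mode) & (e <= hi)
    mu[rising] = (e[rising] - lo) / (mode - lo)
    mu[falling] = (hi - e[falling]) / (hi - mode)
    mu[e == mode] = 1.0
    return mu

def contexts_from_errors(errors, p):
    """Return a (p, 3) array of <lower, modal, upper> rows.

    Assumption: modal values sit at equally spaced quantiles, and each
    triangle's feet coincide with the neighbouring modal values.
    """
    modes = np.quantile(errors, np.linspace(0.0, 1.0, p))
    lows = np.concatenate([[modes[0]], modes[:-1]])
    highs = np.concatenate([modes[1:], [modes[-1]]])
    return np.column_stack([lows, modes, highs])

def context_memberships(errors, contexts):
    """Matrix W of shape (p, N) with W[t, k] = w_tk."""
    return np.vstack([triangular(errors, *row) for row in contexts])

# Hypothetical errors concentrated near zero.
rng = np.random.default_rng(1)
errors = rng.normal(0.0, 1.0, size=200)
contexts = contexts_from_errors(errors, p=6)
W = context_memberships(errors, contexts)
```

With this construction the memberships of every error value sum to one across the contexts, and the contexts become narrower where the errors are dense and wider where they are sparse.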

As shown in Figure 1d, the generated contexts are represented as linguistic labels of the error. The partition matrix induced by the t-th context is introduced as follows:

$$U(W_t) = \left\{ u_{ik} \in [0,1] \;\middle|\; \sum_{i=1}^{c} u_{ik} = w_{tk} \ \forall k, \quad 0 < \sum_{k=1}^{N} u_{ik} < N \ \forall i \right\} \qquad (2)$$

where $w_{tk}$ is the membership grade of the k-th data point in the t-th linguistic context, $u_{ik}$ is the membership grade of the k-th data point in the i-th cluster, $c$ is the number of clusters per context, and $N$ is the number of data points. The loss function can be defined as

$$J = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, \|x_k - v_i\|^2 \qquad (3)$$

where $v_i$ is the center of the i-th cluster, and $\|x_k - v_i\|$ is the distance between the k-th input and the i-th center. Here, the fuzzification factor $m$ is 2. The objective function is minimized by iteratively adjusting the partition matrix and the cluster centers. The update of the membership matrix is computed as [30][31][32]:

$$u_{tik} = \frac{w_{tk}}{\sum_{j=1}^{c} \left( \dfrac{\|x_k - v_i\|}{\|x_k - v_j\|} \right)^{2/(m-1)}} \qquad (4)$$

where $u_{tik}$ denotes the membership grade of the k-th data point in the i-th cluster corresponding to the t-th linguistic context. The cluster center $v_i$ is computed as follows:

$$v_i = \frac{\sum_{k=1}^{N} u_{tik}^{m} \, x_k}{\sum_{k=1}^{N} u_{tik}^{m}} \qquad (5)$$

Figure 2 shows the estimation of the cluster centers corresponding to six contexts. After the contexts are generated as shown in Figure 1, the cluster centers are obtained by context-based fuzzy c-means clustering. Figure 2 shows the distribution of the data points (dot marks) included in each context and of the cluster centers (square marks).

Figure 3 shows the architecture of the local GFM and the association between the cluster centers and the contexts. In Figure 3, the circle nodes represent the computation of the membership degrees obtained in Equation (4). The GFM's output is obtained as a triangular fuzzy membership function representing a fuzzy set. In other words, the GFM output $\hat{E}$ is fully represented by three parameters, $\hat{E} = \langle e^-, e, e^+ \rangle$. These parameters denote the lower bound, modal value, and upper bound, respectively.
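The alternating membership and center updates described above can be sketched for a single context as follows. This is a minimal illustration under stated assumptions: the function name `context_fcm`, the random initialization, the stopping tolerance, and the toy data are all hypothetical, and the fuzzification factor defaults to $m = 2$ as in the text.

```python
import numpy as np

def context_fcm(X, w, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Context-based fuzzy c-means for one linguistic context.

    X: (N, d) inputs, w: (N,) context memberships w_tk, c: clusters.
    Returns centers V (c, d) and partition matrix U (c, N) whose
    columns sum to w.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)

    def centers(U):
        Um = U ** m
        return (Um @ X) / np.maximum(Um.sum(axis=1, keepdims=True), 1e-12)

    # Random initial partition, rescaled so each column sums to w_k.
    U = rng.random((c, X.shape[0]))
    U = U / U.sum(axis=0) * w
    V = centers(U)
    for _ in range(n_iter):
        # Distances between every center and every data point: (c, N).
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)
        # ratio[i, j, k] = (d_ik / d_jk) ** (2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = w / ratio.sum(axis=1)
        shift = np.abs(U_new - U).max()
        U = U_new
        V = centers(U)
        if shift < tol:
            break
    return V, U

# Hypothetical toy data: two groups of points and arbitrary context grades.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.2, size=(20, 2)),
               rng.normal(5.0, 0.2, size=(20, 2))])
w = rng.uniform(0.1, 1.0, size=40)
V, U = context_fcm(X, w, c=2)
```

The only difference from standard fuzzy c-means is the numerator of the membership update: the grades of each data point sum to its context membership instead of to one, so points that barely belong to a context have little influence on its centers.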

Local GFM
Assuming triangular fuzzy membership functions for the contexts, the GFM's output is expressed as [4,8]

$$\hat{E} = (W_1 \otimes \xi_1) \oplus (W_2 \otimes \xi_2) \oplus \cdots \oplus (W_p \otimes \xi_p) \qquad (6)$$

In Equation (6), the symbols $\otimes$ and $\oplus$ indicate that the multiplication and addition are performed on fuzzy numbers. $\xi_t$ is the activation level of the t-th linguistic context, that is, the sum of the membership degrees of the clusters belonging to that context. The three points of the triangular fuzzy set representing each context are multiplied by the corresponding activation level. Thus, the output includes the lower, modal, and upper values and therefore provides an interval prediction.

Symmetry 2020, 12, x FOR PEER REVIEW 6 of 16
The activation values were computed using Equation (4). A numeric bias term was used to eliminate possible systematic errors. It is calculated from the modeling errors $e_k$ of the global LR model, simply so that the systematic error is removed. Consequently, the model output $\hat{E}(x_k)$ is characterized by three parameters: the lower bound, the modal value, and the upper bound.
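A possible reading of the granular output and the bias term is sketched below. Several details are assumptions rather than statements from the paper: for a new input, the memberships are computed jointly over all cluster centers and then summed per context to give the activation levels; the bias is taken as the mean difference between the errors and the modal output; and the names `activations`, `gfm_output`, and `bias_term` are hypothetical.

```python
import numpy as np

def activations(x, centers, m=2.0):
    """Activation level of each context for one input x.

    centers: (p, c, d) cluster centers grouped by context.
    Assumption: memberships are computed over all p * c centers and
    summed within each context, so the p activation levels sum to 1.
    """
    p, c, d = centers.shape
    flat = centers.reshape(p * c, d)
    dist = np.maximum(np.linalg.norm(flat - x, axis=1), 1e-12)
    ratio = (dist[:, None] / dist[None, :]) ** (2.0 / (m - 1.0))
    u = 1.0 / ratio.sum(axis=1)
    return u.reshape(p, c).sum(axis=1)

def gfm_output(x, centers, contexts, bias=0.0, m=2.0):
    """Triangular output (e_minus, e, e_plus) for one input x.

    contexts: (p, 3) rows <lower, modal, upper> of each context.
    """
    xi = activations(x, centers, m)
    return xi @ contexts + bias

def bias_term(X, e, centers, contexts, m=2.0):
    """Assumed bias: mean gap between the errors and the modal output."""
    modal = np.array([activations(x, centers, m) @ contexts[:, 1] for x in X])
    return float(np.mean(np.asarray(e, dtype=float) - modal))

# Hypothetical setup: two contexts with two one-dimensional centers each.
centers = np.array([[[0.0], [1.0]], [[3.0], [4.0]]])
contexts = np.array([[-1.0, -0.5, 0.0], [0.0, 0.5, 1.0]])
X_demo = np.array([[0.2], [1.5], [2.5], [3.7], [4.1]])
e_demo = np.array([-0.6, -0.1, 0.2, 0.4, 0.7])
e0 = bias_term(X_demo, e_demo, centers, contexts)
outputs = np.array([gfm_output(x, centers, contexts, bias=e0) for x in X_demo])
```

Because the activation levels are non-negative and each context satisfies lower ≤ modal ≤ upper, the three output values keep the same order, which is what makes the result usable as an interval prediction around the modal value.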

IGM
The flow chart of the IGM design is visualized in Figure 4. Adopting a linear regression structure, we compensated the modeling errors through local rules of the GFM that represent the local nonlinearity of the model to be considered [9,11,28]. Figure 5 shows the architecture of the IGM.
The construction procedures of the IGM shown in Figure 5 are as follows:
[Step 1] Use linear regression on the numerical data points. Subsequently, the errors $e_k = y_k - z_k$ are obtained as the difference between the desired and linear regression outputs.

Refinement of Contexts in IGM Design
We present the optimization of the context refinement in this section. This optimization method comprises two steps, as shown in Figure 6. First, we used the gradient descent method to adjust the contexts based on the error between the desired and GFM outputs. Next, we obtained new cluster centers using the context-based fuzzy clustering approach. The loop was iterated 20 times.

We can consider optimization methods in which the minimization is performed over the centers of the triangular membership functions that serve as the contexts. The maximization of the average match between the granular output of the GFM and the available numerical desired output is straightforward. We assumed that $Y(x_k)$ is the output of the GFM for $x_k$. Because $Y(x_k)$ is described by a triangular fuzzy set, we calculated the degree to which the desired output matches it. Furthermore, we can consider minimizing the average spread of the output for the corresponding inputs as follows:

$$V = \frac{1}{N} \sum_{k=1}^{N} (b_k - a_k)$$

where $a_k$ and $b_k$ are the lower and upper bounds of the triangular membership function produced for $x_k$, respectively. We performed the optimization by assuming that successive contexts overlap at the half-point level. Therefore, the optimization focuses on the modal values of the triangular fuzzy membership functions of the contexts. Once these modal values have been updated, the clustering is performed again, and the optimization loop is repeated. In the gradient descent method, the modal values are moved along the negative gradient of the performance index, scaled by the learning rate; the gradient vector is obtained using the chain rule, which yields the update formula. Figure 7 shows a flowchart pertaining to context refinement by gradient descent and cluster adjustment via clustering.
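The two-step loop can be illustrated with a hedged sketch. It does not reproduce the paper's performance index; as a stand-in, it assumes the mean squared error between the desired errors $e_k$ and the modal output $\sum_t \xi_t(x_k)\, e_t$, for which the gradient with respect to the modal values has a closed form. The function `refine_modal_values`, the optional `recluster` callback, and the toy data are hypothetical.

```python
import numpy as np

def refine_modal_values(Xi, e, modal, lr=0.001, n_iter=20, recluster=None):
    """Gradient-descent refinement of the context modal values.

    Xi: (N, p) activation levels xi_t(x_k); e: (N,) desired errors;
    modal: (p,) initial modal values.
    Assumption: the index is the mean squared error of the modal output.
    recluster: optional callable mapping the updated modal values to a
    new (N, p) activation matrix (the cluster-adjustment step).
    """
    Xi = np.asarray(Xi, dtype=float)
    e = np.asarray(e, dtype=float)
    modal = np.asarray(modal, dtype=float).copy()
    history = []
    for _ in range(n_iter):
        residual = e - Xi @ modal
        history.append(float(np.mean(residual ** 2)))
        # d(MSE)/d(modal_t) = -(2 / N) * sum_k residual_k * xi_t(x_k)
        grad = -2.0 * Xi.T @ residual / len(e)
        modal -= lr * grad
        if recluster is not None:
            Xi = np.asarray(recluster(modal), dtype=float)
    return modal, history

# Hypothetical data: activation rows sum to 1, errors follow known modes.
rng = np.random.default_rng(3)
Xi = rng.random((100, 4))
Xi /= Xi.sum(axis=1, keepdims=True)
true_modal = np.array([-1.0, -0.3, 0.3, 1.0])
e = Xi @ true_modal + rng.normal(0.0, 0.01, size=100)
modal, history = refine_modal_values(Xi, e, np.linspace(-0.5, 0.5, 4),
                                     lr=0.1, n_iter=20)
```

In this sketch the clustering step is left to the caller through `recluster`, which keeps the gradient step separate from the cluster adjustment, mirroring the two stages of the flowchart. Without reclustering, the recorded error cannot increase for a sufficiently small learning rate, because the index is a convex quadratic in the modal values.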

Experimental Results
We performed experiments using the proposed IGM with the two optimization steps on coagulant dosing in a water purification plant, automobile MPG prediction, and Boston housing price prediction. The experimental results were compared with those of previous studies.

Coagulant Dosing in Water Purification Plant
The data were collected at a water purification plant with a water purification capacity of 1,320,000 ton/day [28]. We used 346 consecutive samples. The input comprised the turbidity, temperature, pH, and alkalinity. The output was the polyaluminum chloride (PAC) dose. To evaluate the resulting model, the dataset was partitioned into training and test datasets. In this experiment, 173 data pairs were used to construct the IGM using the optimization method, and the remaining 173 data pairs were used to validate the presented IGM. Table 1 lists the performance comparison for the training (Trn_RMSE) and test (Tst_RMSE) data for the PAC prediction of the coagulant dosing process. The number of iteration loops was 20, and the learning rate was 0.001. Figure 8 shows the prediction performance for the training and test datasets. As shown in Figure 8, the model showed good prediction and generalization capability.
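The evaluation protocol can be sketched as follows. The RMSE definition is standard; the way the 346 samples were divided is not stated in the text, so splitting at the midpoint is an assumption, and `rmse` and `split_half` are hypothetical helper names.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between desired and predicted outputs."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def split_half(X, y):
    """Assumed split: first half for training, second half for testing."""
    n = len(y) // 2
    return (X[:n], y[:n]), (X[n:], y[n:])

# Hypothetical placeholder arrays with the same size as the PAC dataset.
X_all = np.zeros((346, 4))
y_all = np.zeros(346)
(train_X, train_y), (test_X, test_y) = split_half(X_all, y_all)
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Reporting the RMSE separately on the two halves gives the Trn_RMSE and Tst_RMSE values used in the comparison tables.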

Figure 9 shows the error distribution of PAC obtained by linear regression; the error distribution is concentrated near zero. Figure 10 shows the cluster centers corresponding to each context when the number of contexts was 8. As listed in Table 1, the IGM with the new optimization method performed better than linear regression, the multilayer perceptron, the linguistic model, and the IGM. The IGM without the optimization method already showed good performance in comparison with the previous works, and the optimization method further enhanced the performance of the IGM itself. The best prediction performance was obtained by trial and error when p = c = 8, where p is the number of contexts and c is the number of clusters per context. We followed the method used in previous works to select the number of nodes and rules [28].

Automobile MPG Prediction
The automobile MPG prediction data are available from the UCI repository [29]. The dataset consists of 392 data pairs. The input comprised the weight, acceleration, model year, number of cylinders, displacement, and horsepower. The dataset was partitioned into training and test datasets. In this experiment, 196 data pairs were used for training, and the remaining 196 data pairs were used for model evaluation. The number of iteration loops and the learning rate were the same as those used in the previous experiment. Figure 11 shows the performance for the training and test datasets. Figure 12 shows the distribution of the error obtained by linear regression. Figure 13 shows the centers corresponding to each context when the number of contexts was 8. Table 2 lists the performance comparison. As listed in Table 2, the proposed IGM performed better than the linguistic model and the IGM. The best prediction performance was obtained by trial and error when p = c = 8. We followed the method used in previous works to select the number of nodes and rules [28].

Boston Housing Data
Next, we used the Boston housing data set, which deals with the problem of real estate price prediction. These data are available from the UCI repository [29]. In this example, we used the twelve input variables that remain after excluding the one binary attribute. The data set includes 506 data pairs. We divided the data set into training and test data sets of equal size: 253 data pairs were used for training, and the remaining data pairs were used for model evaluation. Table 3 lists the RMSE comparison results for the Boston housing data set. The best prediction performance was obtained by trial and error when p = c = 8. The proposed method outperformed the linguistic model and the basic IGM.
Table 3. Comparison results for RMSE for the Boston housing data set.


Discussion
The experimental results revealed that the proposed optimization method showed good performance in comparison with the linguistic model and the IGM itself on three data sets. The strengths and weaknesses of the proposed method can be summarized as follows:

Strengths:
- The incremental granular model achieves high prediction performance by combining linear regression and a local granular fuzzy model.
- The local granular fuzzy model automatically generates if-then rules from a numerical data set using the context-based fuzzy clustering method.
- The incremental granular model can enhance the prediction performance by combining derivative-based optimization and context-based fuzzy clustering.
- In contrast to the conventional back-propagation method, after the contexts are adjusted by the steepest descent method, the cluster centers in the premise part are estimated using the context-based fuzzy clustering method.

Weaknesses:
- The number of contexts is obtained by trial and error.
- The number of cluster centers per context is obtained by trial and error.
- As the number of data points increases, the number of rules also increases.
- A specific context can include only a few data points when the contexts are distributed uniformly.

Conclusions
We developed an optimization method for an IGM. This method comprises two stages: context refinement and cluster adjustment. In contrast to the conventional gradient descent method, we performed hybrid learning that combines the gradient descent method with clustering. The experimental results indicated that the proposed IGM with the optimization method presented better generalization capability than the conventional methods and the IGM. Therefore, we conclude that the new optimization method for IGM design is effective. Future studies will involve the optimal determination of the number of clusters per context and new performance measures based on the concept of coverage and specificity.