1. Introduction
Various studies have been conducted on complex real-world problems with nonlinear characteristics. The linear regression (LR) method models the linear correlation between a dependent variable and one or more independent variables. Fuzzy inference [1] solves vague and uncertain problems in a way similar to human reasoning, and the adaptive neuro-fuzzy inference system (ANFIS) model [2] combines it with an artificial neural network whose adaptation and learning imitate information processing in the human brain. The autoregressive integrated moving average (ARIMA) model [3], which combines autoregressive, integrated, and moving-average components, is also being studied and applied in various fields.
Zhang [4] studied the LR, ANFIS, and ARIMA models and predicted blood pressure using classification and regression trees. Krueger [5] proposed a model for predicting semiconductor yield using a linear prediction model. Yahia [6] proposed a model for predicting the results of SAR speckle filtering using linear regression analysis. Zhang [7] proposed a model for predicting the on-line damping ratio using locally weighted linear regression. Drouard [8] proposed a model that estimates the head pose of a robot using linear regression. Martin [9] proposed a model that predicts graduate student productivity using support vector regression (SVR). Amikhani [10] proposed a model to predict the performance of solar power plants using an artificial neural network (ANN) and an adaptive neuro-fuzzy inference system. Naderloo [11] proposed a model that uses ANFIS to predict crop yields based on various energy inputs. Umrao [12] proposed a model to predict the strength and elastic modulus of heterogeneous sedimentary rocks using ANFIS. Zare [13] proposed a model for predicting groundwater fluctuations using ANFIS and wavelet-ANFIS. Adigizel [14] proposed a model to predict the effect of dust particles on photovoltaic modules using ANFIS. Ordonez [15] proposed a model that predicts the remaining lifetime of aircraft engines using a hybrid ARIMA-ANFIS. Torbat [16] proposed a model to predict the commodity market consumption rate using a hybrid probabilistic fuzzy ARIMA. Ohyver [17] proposed a model to predict price rises using ARIMA. Barak [18] proposed an energy consumption prediction model using an ensemble ARIMA-ANFIS. Ramos [19] proposed a model to predict consumer retail sales volume using ARIMA. Suhermi [20] proposed a model for predicting roll motion using hybrid deep learning and ARIMA. Musaylh [21] proposed a model that predicts short-term electricity demand in Queensland using SVR and ARIMA.
Clustering methods for data analysis have also been the subject of numerous studies; among them, particular interest has been paid to context-based fuzzy C-means (CFCM) clustering, an extension of fuzzy C-means (FCM) clustering. Studies have been conducted on extracting information granules using CFCM clustering and on designing a granular model (GM) or linguistic model (LM) from the extracted granules [22,23]. Unlike the existing FCM clustering method, CFCM clustering can model the data more precisely because it creates contexts in the output space and builds the clusters for each context considering the characteristics of both the output and the input space [24]. Zhu [25] proposed a hybrid TS fuzzy model that combines the Takagi–Sugeno (TS) fuzzy model with an information granulation method. Hmouz [26] proposed a time series prediction model using granular time series. Froelich [27] proposed a time series modeling method that uses information granules in the time domain. Cimino [28] proposed a genetic interval neural network using the interval values of information granules. Zhao [29] proposed a model that predicts the amount of energy generated in steel production using the GM. Pedrycz [23] proposed the LM to model user-centric systems. In addition, an incremental granular model (IGM) [30], which combines the GM and LR, was suggested. The IGM computes the model error from the LR, which serves as the global part, and the GM, which serves as the local part, estimates the final predicted value by compensating this error.
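The global-plus-local structure of the IGM can be illustrated with a minimal sketch. This is not the authors' implementation: the real local part is a CFCM-based granular model, whereas here a k-nearest-neighbour average of training residuals stands in for it, and the names `IncrementalGranularSketch`, `fit_linear`, and `predict_linear` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def fit_linear(X, y):
    # least-squares fit with an intercept column appended
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    A = np.hstack([X, np.ones((len(X), 1))])
    return A @ w

class IncrementalGranularSketch:
    """Global LR part plus a local corrective part (the IGM idea).

    The paper's local part is a CFCM-based granular model; a
    k-nearest-neighbour average of training residuals stands in here."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.w = fit_linear(X, y)                       # global part: LR
        self.X_train = np.asarray(X, float)
        self.residuals = y - predict_linear(self.w, X)  # model error of the LR
        return self

    def predict(self, X):
        global_part = predict_linear(self.w, X)
        corrections = np.empty(len(X))
        for i, x in enumerate(np.asarray(X, float)):
            d = np.linalg.norm(self.X_train - x, axis=1)
            nearest = np.argsort(d)[: self.k]
            corrections[i] = self.residuals[nearest].mean()
        # final prediction = global estimate + local error compensation
        return global_part + corrections
```

The key design point is that the local part never models the target directly; it only compensates the error left over by the global linear model.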
However, since the traditional GM described above generates the same number of clusters for each context, it is difficult to obtain good prediction performance on problems with strongly nonlinear characteristics. To solve this problem, we studied the optimization of internal parameters using the genetic algorithm (GA), an evolutionary optimization algorithm. Oztekin [31] proposed a model that predicts the quality of life of lung transplant patients by combining a support vector machine (SVM) with GA. Garcia [32] proposed a model that predicts short-term traffic congestion using cross entropy and GA. Sharma [33] proposed a model that predicts the sea surface temperature of the Arabian Sea by applying GA to the empirical orthogonal function (EOF). Sadi [34] proposed a GA-based group method of data handling (GMDH) model to predict asphaltene precipitation. Esfe [35] proposed a model to predict the viscosity of CuO-ethylene glycol nanofluid using GA-based artificial neural networks. Francescomarino [36] proposed a model that optimizes the hyperparameters of predictive business process monitoring. Rotta [37] proposed a model for predicting failure by applying GA to APODIS. Byeon [38] suggested a method to optimize the internal parameters by applying GA to the IGM.
In addition to GA, which imitates biological evolution and solves optimization problems through crossover and mutation operations, there is also particle swarm optimization (PSO) [39]. The advantage of PSO is that it can quickly find a global solution by exchanging information among multiple individuals using a simple algorithm. Huang [40] proposed a short-term load prediction model using PSO for an autoregressive moving average with exogenous variables (ARMAX) model. Chan [41] proposed a short-term traffic forecasting model using intelligent PSO for road sensor systems. Bashir [42] proposed a wavelet method using a PSO-based artificial neural network. Anamika [43] proposed an electricity price prediction and classification model using the wavelet dynamic-weighting PSO-FFNN. Rocha [44] proposed a model that predicts the capacity of a power generation system by applying PSO to an extreme learning machine (ELM). Ma [45] proposed an improved gray model and a railroad gap prediction model using PSO-SVM. Catalao [46] proposed a short-term electricity pricing model using a hybrid wavelet-PSO-ANFIS. Liao [47] proposed a model that predicts the temperature of a reheating furnace by combining a fuzzy artificial neural network with PSO. Alik [48] and Yifei [49] conducted studies comparing PSO and GA and confirmed that PSO achieves superior performance with faster computation and variable optimization than GA.
In this paper, we propose a method that uses particle swarm optimization (PSO), a nature-inspired optimization algorithm, to optimize the number of clusters and the fuzzification coefficient, which are the internal parameters of the CFCM clustering that models the local part of the IGM. To verify the predictive performance of the proposed method, we conduct an experiment on housing prices in the Boston area using the Boston housing dataset. The experimental results show that the proposed PSO-IGM generates a suitable number of clusters for each context, optimizes the fuzzification coefficient to fit the model, and achieves better predictive performance than the existing IGM. The rest of this paper is organized as follows. Section 1 explains the research background. Section 2 describes the existing IGM and the proposed PSO-IGM. Section 3 uses the Boston housing dataset to predict housing prices in Boston and compares the performance of the models. Section 4 discusses the experimental results. Finally, Section 5 concludes and discusses future research.
3. Results
In this section, we use the Boston housing dataset to evaluate the predictive performance of the PSO-based IGM described in Section 2, and conduct an experiment to predict house prices in Boston, USA.
3.1. Boston Housing Dataset
The Boston housing dataset is provided by the StatLib library maintained by Carnegie Mellon University. The data on housing prices in the Greater Boston area consist of 13 input variables and 1 output variable. The input variables are: per capita crime rate by town; proportion of residential land zoned for lots over 25,000 sq. ft.; proportion of non-retail business acres per town; Charles River dummy variable; nitric oxide concentration (parts per 10 million); average number of rooms per dwelling; proportion of owner-occupied units built prior to 1940; weighted distances to five Boston employment centers; index of accessibility to radial highways; full-value property-tax rate per $10,000; proportion of blacks by town; pupil-teacher ratio by town; and % lower status of the population. The output variable is the median value of owner-occupied homes in $1000s. The size of the data is 506 × 14. In our experiment, the data were divided 50:50 into training and test sets and normalized to the range 0 to 1.
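Assuming the 506 × 14 data matrix has already been obtained (e.g., from the StatLib archive), the preprocessing described above can be sketched as follows. The helper `normalize_and_split` is hypothetical, and the random 50:50 split is one possible reading of the text; the paper does not state whether the rows were shuffled or whether normalization statistics were computed on the training set only.

```python
import numpy as np

def normalize_and_split(data, train_ratio=0.5, seed=0):
    """Min-max normalize every column to [0, 1], then split the rows
    50:50 into training and test sets (random split assumed)."""
    data = np.asarray(data, float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    scaled = (data - lo) / (hi - lo)
    idx = np.random.default_rng(seed).permutation(len(scaled))
    n_train = int(len(scaled) * train_ratio)
    train, test = scaled[idx[:n_train]], scaled[idx[n_train:]]
    # first 13 columns are the inputs, the last column is the output (MEDV)
    return train[:, :-1], train[:, -1], test[:, :-1], test[:, -1]
```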
3.2. Experimental Method
The experimental procedure is as follows. We compare the predictive performance of the existing IGM with that of the PSO-IGM proposed in this paper. As described above, the existing IGM uses the LR model as the global part and the GM as the local part. The global part of the PSO-IGM uses the same LR method as the existing IGM, while the internal parameters of the GM in the local part are optimized with the PSO algorithm.
First, in the experiment using the existing IGM, the number of contexts and the number of clusters of the GM, which models the local part, were varied. The fuzzification coefficient (m) was fixed at 1.5, the number of contexts was increased from 5 to 8, the number of clusters was increased from 2 to 20 in steps of 1, and the prediction performance was checked. Next, in the experiment using the proposed PSO-based IGM, the number of contexts of the GM that models the local part was likewise increased from 5 to 8, but the number of clusters and the fuzzification coefficient were determined by the PSO algorithm. The search range of the number of clusters was 2 to 9, and the range of the fuzzification coefficient was set from 1.5 to 2.5. The number of PSO iterations was set to 50, with the inertia weight set to 1, the inertia weight damping to 0.99, the personal learning coefficient to 1.5, and the global learning coefficient to 2; the prediction performance was then checked with the optimized number of clusters and fuzzification coefficient.
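A generic global-best PSO with the settings listed above (inertia weight 1, damping 0.99, personal learning coefficient 1.5, global learning coefficient 2, 50 iterations, and the search ranges for the number of clusters and the fuzzification coefficient) might look like the following sketch. The `pso` function and the toy quadratic objective are illustrative assumptions; in the actual experiment the objective would be the prediction error of the GM built with the candidate parameters.

```python
import numpy as np

def pso(objective, bounds, n_particles=20, n_iter=50,
        w=1.0, w_damp=0.99, c1=1.5, c2=2.0, seed=0):
    """Global-best PSO: w = inertia weight (damped by w_damp each
    iteration), c1 = personal learning coefficient, c2 = global
    learning coefficient."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    g_val = float(pbest_val.min())
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity update: inertia + personal-best pull + global-best pull
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)   # keep particles inside the bounds
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < g_val:
            g_val = float(pbest_val.min())
            g = pbest[pbest_val.argmin()].copy()
        w *= w_damp                        # inertia weight damping
    return g, g_val

# Search space from the experiment: number of clusters in [2, 9],
# fuzzification coefficient in [1.5, 2.5]. A toy quadratic stands in
# for the real objective (the validation error of the GM).
bounds = np.array([[2.0, 9.0], [1.5, 2.5]])
toy = lambda p: (p[0] - 5.0) ** 2 + (p[1] - 2.0) ** 2
best, best_val = pso(toy, bounds)
```

In practice the first coordinate would be rounded to an integer before building the clusters, since the number of clusters is discrete.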
The predictive performance was evaluated using the root mean square error (RMSE). RMSE measures the difference between the predicted and observed values of the model, and can be expressed as Equation (13):

RMSE = sqrt( (1/n) Σᵢ (yᵢ − ŷᵢ)² )   (13)

Here, ŷᵢ represents the predicted value of the model, and yᵢ represents the actual observed value. When the two values are equal, the RMSE becomes 0; therefore, the smaller the RMSE value, the better the prediction performance.
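Equation (13) translates directly into code; a minimal sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, Equation (13):
    RMSE = sqrt( (1/n) * sum_i (y_i - yhat_i)^2 )"""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```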
3.3. Result Analysis
The prediction performance of the existing IGM on the Boston housing dataset is shown in Table 1, Table 2, Table 3 and Table 4. The context was fixed to 5, 6, 7, and 8, and the number of clusters was increased from 2 to 20 in steps of 1. As a result, when the context was fixed to 7 and the number of clusters was set to 9, the verification RMSE was 3.74, the best prediction performance.
Figure 7 summarizes the predictive performance of the IGM with the context fixed to 5, 6, 7, and 8, respectively. As seen in the figure, the best prediction performance is obtained when the context is 7 and the number of clusters is 9.
The proposed PSO-IGM fixes the context to 5, 6, 7, and 8, respectively, and then generates the final model by optimizing the number of clusters created for each context and the fuzzification coefficient. Table 5 shows the prediction performance of the PSO-IGM.
Figure 8 shows the distribution of the predicted performance of the IGMs. The red line in the middle shows the average of the predicted performance, and the blue box shows the range from the 25th to the 75th percentile. The red cross marks indicate outliers.
Figure 9 compares the predictive performance of the IGM that fixes the context to 5, 6, 7, and 8, respectively, and then optimizes the number of clusters created in each context and the fuzzification coefficient. Next, Figure 10, Figure 11, Figure 12 and Figure 13 show how the cost function decreases in the PSO-IGM (context = 5, 6, 7, 8). Before the optimization, the cost function value is about 0.1, and it can be confirmed that the value gradually decreases as the optimization process is repeated. Figure 14 visualizes the number of clusters and the fuzzification coefficient obtained from the PSO-IGM. The values 5, 6, 7, and 8 on the x-axis represent the respective contexts; the black bars represent the number of clusters, and the orange bars represent the fuzzification coefficient. As seen in the figure, the best prediction performance is obtained when the context is 8 (the numbers of clusters are 7, 5, 3, 7, 3, 4, 5, and 2) and the fuzzification coefficient is 1.8734.