Forecasting of Industrial Water Demand Using Case-Based Reasoning: A Case Study in Zhangye City, China

Accurate forecasting of industrial water demand is crucial for sustainable water resource management. This study investigates industrial water demand forecasting by case-based reasoning (CBR) in an arid area, with a case study of Zhangye, China. We constructed a case base with 420 original cases from 28 cities in China, extracted six attributes of the industrial water demand, and employed a back propagation neural network (BPN) to weight each attribute and grey incidence analysis (GIA) to calculate the similarities between the target case and the original cases. The forecast values were calculated by weighted similarities. The results show that the industrial water demand of Zhangye in 2030, which is the target case, will reach 11.9 million tons. Ten original cases have relatively high similarities to the target case. The case of Yinchuan, 2010, has the largest similarity, followed by Yinchuan, 2009, and Urumqi, 2009. A comparison experiment also showed that case-based reasoning is more accurate than the grey forecast model and the BPN in water demand forecasting. It is expected that the results of this study will provide a reference for water resources management and planning.


Introduction
Water scarcity is becoming increasingly severe in arid and semi-arid regions of the world [1][2][3][4], whereby access is limited not only by water resource availability, but also by resource conservation, environmental friendliness, appropriateness of technologies, economic viability, and social acceptability of development issues [5]. Water resources have become a bottleneck for economies all over the world [6]. If no further action is taken, more than 40% of countries will face a water resources crisis in 2030, and arid regions will bear the brunt [7].
Industry plays a fundamental role in the national economy of China. Over the past decades, the forecasts produced by some departments in China have proved to be clearly greater than actual water use [8]. Industrial water demand forecasting is complicated by the variety of departments and the enormous differences between industries, which hinder analysis and calculation [9]. Forecasting the industrial water demand accurately and scientifically is therefore a key point of water resources planning and management in the current period of socio-economic development. Time series methods, such as regression analysis [10], the quota method [11], the constant rate model [12], principal component analysis [13], etc., calculate and forecast water demand by constructing statistical models of data series. However, the industrial water demand has many influencing factors, such as zone, infrastructure, industrial category, production, population, climate, etc., which interact with each other. Past forecasting models cannot completely simulate this non-linear relationship, i.e., there are always large uncertainties in industrial water demand forecasting. To reduce these uncertainties, non-linear theories, such as chaos [14], neural networks [15][16][17], marginal analysis [18], scenario simulation [19,20], and multi-model approaches [21,22], have been introduced into water demand forecasting. To a certain extent, these methods have improved the forecasting accuracy; nevertheless, they still have not adequately considered the internal mechanism of industrial water demand variations. Furthermore, both time series methods and non-linear theories are driven by the historical patterns of water demand. However, the trend of the industrial water demand is not fixed in time. With the variation of natural resources and the environment, and the adjustment of industrial policies, the simulation results of past methods did not always agree with the facts. The patterns by which industrial water consumption evolves differ between periods of economic development. It is hard to define particular models or fixed rules that reflect the relationship between the industrial water demand and other socio-economic factors.
To address this bottleneck, this study proposes an approach to forecast the industrial water demand by case-based reasoning (CBR). CBR is an object-oriented method [23][24][25] of comprehensive analysis, and belongs to the field of artificial intelligence [26]. The most obvious feature of CBR is that it does not need explicitly defined rules, but uses the underlying expression of cases, which reduces the model construction time and effectively handles the fuzziness and uncertainty that arise when acquiring knowledge. CBR can use historical data and choose appropriate cases to analyze and forecast accurately, without requiring a clear internal mechanism of how the objects develop [27]. CBR is advantageous in simplifying knowledge acquisition, improving the efficiency and quality of problem-solving, and accumulating knowledge [27][28][29], and is widely applied in fields that have abundant experience but weak theoretical models, such as fault diagnosis [30,31], business administration [32][33][34], medical applications [35][36][37], emergency management [38,39], land use development [40,41], etc. In the long-term forecasting of the industrial water demand, the training samples and the target object are at different development stages; they may differ in internal resource use level, technical transformation process and institutional evolution. Therefore, methods built on fixed paradigms may have difficulty meeting the needs of planning under uncertainty. Instead, CBR can avoid the uncertainties and rules of complex forecasts without a linear hypothesis [27].
To forecast the industrial water demand of Zhangye City in 2030, the approach was divided into three parts. First, we constructed an industrial water demand case base, which contains 420 original cases of 28 cities in China from 2000 to 2014. Second, six attributes were extracted as case attributes and weighted by a back propagation neural network (BPN). Finally, we applied grey incidence analysis to calculate the similarities between the target case and the original cases, and used the weighted similarity method to forecast the industrial water demand of the target case.

Study Area
Zhangye City is the largest economic zone and the largest water consumption area of the Heihe River Basin and is also a critical hub of "the Silk Road Economic Belt and the 21st-Century Maritime Silk Road" (B&R) (Figure 1). The city covers 40,874 km² and contains six counties. Zhangye is a typical arid area with a continental climate. In terms of added value in 2016, the dominant industries of Zhangye are mining, manufacturing, and electricity, gas and water production and supply. In the future, industry will continue to play an important role in Zhangye's economy. Figure 2 shows the trend of industrial water consumption and industrial added value in Zhangye from 2000 to 2014. In general, the trend of industrial water consumption can be described as fluctuating growth. In the current period of industrial transformation, the industrial water consumption may be greatly affected by the breaking of the water balance in Zhangye City. Therefore, it is necessary to forecast the industrial water demand scientifically and accurately in the context of excessive use of the region's water resources due to rapid economic growth and increasing population pressure. The result will provide a reference for water management in Zhangye City, and even in the Heihe River Basin.

CBR Forecasting Framework
Case-based reasoning is a methodology that uses previous cases to solve new problems [42]. The basic assumption of CBR is that similar problems have similar solutions [29]. The way CBR processes problems is inspired by the human decision-making process [43]. It collects a set of cases, called original cases, to form a case base. Each original case has a problem and a solution. When a new problem appears, CBR retrieves the most similar original case from the case base according to the conditions, and applies its solution to the new problem through analysis and modification. The solved target case and its solution are then kept as a new original case to update the case base. Therefore, the CBR problem-solving architecture, as shown in Figure 3, typically consists of four components: Retrieve, Reuse, Revise and Retain, commonly referred to as the "4Rs" [44]. CBR has the merits that knowledge is easy to obtain, simple to express and fast to reason with, especially in fields such as fault diagnosis and decision support, which have abundant experiential knowledge but lack strong theoretical models and a complete domain knowledge system [27][28][29].
Water 2017, 9, 626
This paper used the core theory of CBR to forecast the industrial water demand in Zhangye City. Specifically, the technical workflow (Figure 4) of this paper contains four steps. Step 1 constructed the case base by integrating the data on industrial water consumption and its attributes for 28 cities evenly distributed over China. Step 2 computed the weight of each influential factor by the BPN. Step 3 calculated the similarities between the target case and the original cases by grey incidence analysis. Step 4 screened out the cases with low similarities and kept the higher ones, then forecasted the industrial water demand by weighted similarity.

Extraction of Case Attributes
The attributes of industrial water can be categorized into environmental factors, economic factors and social factors. Although environmental factors are critical to the industrial development of a city, they are also relatively stable and will not have a noticeable impact on urban industry in the short term. However, socio-economic factors, such as rapid economic development, population growth, industrialization level and technological change, play a decisive role in urban industry. Accordingly, considering previous studies and the availability of data, we chose per capita GDP to characterize regional economic development; industrial population and industrial electricity to characterize the industrial scale; industrial production and industrial fixed assets investment to characterize the regional industrial level; and gross amount of water resources to characterize the constraint of the regional natural environment. Hence, these six indexes were selected as the attributes of the industrial water demand in this paper. Because each factor has a different impact on the industrial water demand, the weights of the factors must be estimated scientifically and reasonably: they affect the forecasting accuracy, and inadequate weights may lead to a large error.
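As an illustration of how one entry of the case base might be represented, the following minimal sketch stores the six attributes together with the observed industrial water consumption. The class name, field names and sample values are hypothetical and are not taken from the paper; only the attribute list and the case counts (28 cities, 2000 to 2014) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Case:
    """One city-year record. Field names are illustrative assumptions."""
    city: str
    year: int
    per_capita_gdp: float
    industrial_population: float
    industrial_electricity: float
    industrial_production: float
    fixed_assets_investment: float
    water_resources: float
    # Observed industrial water consumption; None for a target case.
    water_consumption: Optional[float] = None

    def attributes(self) -> List[float]:
        """Return the six attribute values in a fixed order."""
        return [
            self.per_capita_gdp,
            self.industrial_population,
            self.industrial_electricity,
            self.industrial_production,
            self.fixed_assets_investment,
            self.water_resources,
        ]


# Size check for the case base described in the text:
# 28 cities observed yearly from 2000 to 2014 give 28 * 15 = 420 cases.
N_CITIES = 28
YEARS = range(2000, 2015)
case_keys = [(city, year) for city in range(N_CITIES) for year in YEARS]

# Placeholder values only, to show the shape of a record.
example = Case("ExampleCity", 2010, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0)
```

Keeping the attribute order fixed in one place makes it harder for the weighting and similarity steps to pair a weight with the wrong attribute.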

Attributes Weighting Based on the Back Propagation Neural Network (BPN)
Subjective weighting methods, such as the analytic hierarchy process (AHP) and the Delphi method, may produce unstable and one-sided weights. By contrast, the BPN can self-adapt the weights according to the impact of each influential factor on the industrial water demand [45].
The BPN is one of the most widely used artificial neural networks (ANN) among the learning methods for feed-forward neural networks [46]. It has the abilities of self-learning and self-adapting. The BPN has been widely applied in many fields, such as pattern recognition, expert systems, prediction and signal processing. In the classical structure of the BPN, the outputs of each layer are sent directly to each neuron of the next layer. A three-layered BPN contains an input layer that receives and distributes inputs, a middle (or hidden) layer that captures the nonlinear relationships between inputs and outputs, and an output layer that produces calculated data. During training, if the outputs of the network are not equal to the expected outputs, the network propagates the errors back through the middle layer to the input layer and retrains until the errors are at a very low level. After the training, the weights can be obtained from the weight matrices of the network. In this study, we used 364 cases of 28 cities during 2000-2012, in which the case attributes were used as input data and the observed industrial water consumptions were used as target data (Figure 5). The training algorithm for the BPN was traingdm (gradient descent with momentum). The coefficients of the hidden layer, which were taken as the weights of each attribute, were calculated after self-learning.
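The weighting idea can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the network has one hidden layer of four tanh units, it is trained by batch gradient descent with a classical momentum term (not a reproduction of the traingdm update), the data are synthetic, and the rule that turns the input-to-hidden weight matrix into attribute weights (normalised sum of absolute values) is an assumption, since the text does not give the formula.

```python
import math
import random


def train_bpn(X, y, hidden=4, lr=0.05, momentum=0.9, epochs=200, seed=0):
    """Train a one-hidden-layer network by gradient descent with momentum."""
    rng = random.Random(seed)
    m, n = len(X[0]), len(X)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    # Velocity terms for the momentum update.
    vW1 = [[0.0] * m for _ in range(hidden)]
    vb1 = [0.0] * hidden
    vW2 = [0.0] * hidden
    vb2 = 0.0
    for _ in range(epochs):
        gW1 = [[0.0] * m for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        for x, t in zip(X, y):
            # Forward pass: tanh hidden layer, linear output.
            h = [math.tanh(sum(w * xi for w, xi in zip(W1[k], x)) + b1[k])
                 for k in range(hidden)]
            out = sum(W2[k] * h[k] for k in range(hidden)) + b2
            err = out - t  # gradient of 0.5 * (out - t)**2 w.r.t. out
            # Backward pass: accumulate mean gradients over the batch.
            gb2 += err / n
            for k in range(hidden):
                gW2[k] += err * h[k] / n
                dh = err * W2[k] * (1.0 - h[k] ** 2)
                gb1[k] += dh / n
                for j in range(m):
                    gW1[k][j] += dh * x[j] / n
        # Momentum update of all parameters.
        for k in range(hidden):
            vW2[k] = momentum * vW2[k] - lr * gW2[k]
            W2[k] += vW2[k]
            vb1[k] = momentum * vb1[k] - lr * gb1[k]
            b1[k] += vb1[k]
            for j in range(m):
                vW1[k][j] = momentum * vW1[k][j] - lr * gW1[k][j]
                W1[k][j] += vW1[k][j]
        vb2 = momentum * vb2 - lr * gb2
        b2 += vb2
    return W1, W2


def attribute_weights(W1):
    """Assumed rule: normalised sum of |input-to-hidden weight| per attribute."""
    m = len(W1[0])
    raw = [sum(abs(row[j]) for row in W1) for j in range(m)]
    total = sum(raw)
    return [r / total for r in raw]


# Synthetic stand-in data: 40 cases, six attributes scaled to [0, 1].
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(6)] for _ in range(40)]
y = [0.6 * x[1] + 0.1 * x[3] for x in X]

W1, W2 = train_bpn(X, y)
weights = attribute_weights(W1)
```

The resulting `weights` list has one non-negative entry per attribute and sums to 1, which is the form the similarity calculation expects.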

Similarities Calculation Based on Grey Incidence Analysis
The key to the success of a CBR model is whether it can retrieve reasonable cases and thus obtain an optimized solution. In this paper, similar cases were selected by grey incidence analysis (GIA), which calculates the similarities between the target case and the original cases. In comparison with other methods, GIA was more applicable to this research and could also compute the degrees of correlation accurately.
Assume the case set has n cases, namely the original case set $C = \{C_1, C_2, \ldots, C_n\}$; each case has m attributes, namely the factor set of the original case $F = \{f_{i1}, f_{i2}, \ldots, f_{im}\}$, $i = 1, 2, \ldots, n$; the target case set is $T = \{T_1, T_2, \ldots, T_q\}$, and the factor set of the target case is $A = \{a_{p1}, a_{p2}, \ldots, a_{pm}\}$, $p = 1, 2, \ldots, q$. Thus, the weighted grey similarity $G(T_p, C_i)$ between the target case $T_p$ and each original case $C_i$ can be expressed, in the standard form of grey incidence analysis, as:

$$G(T_p, C_i) = \sum_{j=1}^{m} \omega_j \, \rho(a_{pj}, f_{ij})$$

$$\rho(a_{pj}, f_{ij}) = \frac{\min_i \min_j |a_{pj} - f_{ij}| + \lambda \max_i \max_j |a_{pj} - f_{ij}|}{|a_{pj} - f_{ij}| + \lambda \max_i \max_j |a_{pj} - f_{ij}|}$$

where $\omega_j$ is the weight of the jth influential factor; $\rho(a_{pj}, f_{ij})$ represents the similarity of the jth factor between the target case and the original case; and $\lambda$ is the discrimination coefficient, set to 0.5 by experience.
After the similarities between the target case and each original case have been calculated, a threshold ($\sigma$) is used to judge whether the target case is similar to each original case. The threshold was determined by expert experience, and was constantly adjusted in the process of selection. Specifically, when $G(T_p, C_i) \geq \sigma$, the target case is regarded as similar to the original case: both have a similar socio-economic development level and almost the same industrial water demand. The industrial water demand of the target case can therefore be forecast by analyzing the industrial water consumption of the similar original case or cases. Conversely, when $G(T_p, C_i) < \sigma$, the target case and the original case have little in common in terms of socio-economic situation and industrial water demand.
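The retrieval step can be sketched as follows. This is a minimal sketch that assumes the standard grey relational coefficient with discrimination coefficient 0.5 and attribute values already scaled to a common range; the function names and the toy data are hypothetical, and the scaling step is an assumption because the text does not describe one.

```python
def grey_similarities(target, cases, weights, lam=0.5):
    """Weighted grey similarity between one target case and each original case.

    target  -- list of m attribute values of the target case
    cases   -- list of n lists, each with m attribute values
    weights -- list of m attribute weights (assumed to sum to 1)
    lam     -- discrimination coefficient (0.5 in the text)
    """
    # Absolute differences between target and every case, per attribute.
    deltas = [[abs(a - f) for a, f in zip(target, case)] for case in cases]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:
        # Every original case is identical to the target.
        return [sum(weights)] * len(cases)
    sims = []
    for row in deltas:
        rho = [(d_min + lam * d_max) / (d + lam * d_max) for d in row]
        sims.append(sum(w * r for w, r in zip(weights, rho)))
    return sims


def similar_cases(sims, sigma=0.95):
    """Indices of the original cases whose similarity reaches the threshold."""
    return [i for i, g in enumerate(sims) if g >= sigma]


# Toy data with two attributes (made-up values, for illustration only).
target = [0.5, 0.5]
cases = [[0.5, 0.5], [0.5, 0.9], [0.1, 0.1]]
sims = grey_similarities(target, cases, weights=[0.5, 0.5])
kept = similar_cases(sims, sigma=0.95)
```

In this toy example only the first case, which is identical to the target, passes the threshold of 0.95; the other two score roughly 0.67 and 0.33.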

Industrial Water Demand Forecasting
The retrieval process did not choose the single case with the largest similarity, but a case set in which the similarity of every case was above the threshold. The industrial water demand of the target case could then be obtained by analyzing the features and industrial water consumptions of the similar original cases. At present, stepwise regression analysis, the optimized case method and the weighted similarity method are the widely applied approaches to forecasting the target case. Because of its relatively low error, this paper used the weighted similarity method to forecast the industrial water demand, which takes the form of a similarity-weighted average:

$$R_p = \frac{\sum_{j=1}^{k} G(T_p, C_j) \, S_j}{\sum_{j=1}^{k} G(T_p, C_j)}$$

where $R_p$ is the forecasting result of the pth target case; $S_j$ is the industrial water consumption of the jth original case; and $k$ denotes the number of similar original cases retained after thresholding.
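The weighted similarity method can be sketched as below, assuming the forecast is the similarity-weighted mean of the industrial water consumptions of the retained cases. The function name and the numbers are hypothetical and serve only to show the calculation.

```python
def weighted_similarity_forecast(sims, consumptions):
    """Similarity-weighted mean of the consumptions of the retained cases.

    sims         -- similarities of the retained original cases
    consumptions -- their observed industrial water consumptions
    """
    if len(sims) != len(consumptions) or not sims:
        raise ValueError("need one consumption per retained case")
    total = sum(sims)
    if total == 0:
        raise ValueError("similarities must not all be zero")
    return sum(g * s for g, s in zip(sims, consumptions)) / total


# Made-up example: two retained cases with similarities 0.98 and 0.96
# and consumptions of 60 and 80 (in the same unit as the forecast).
forecast = weighted_similarity_forecast([0.98, 0.96], [60.0, 80.0])
```

Because the result is a weighted mean, it always lies between the smallest and largest consumption among the retained cases, so the choice of threshold directly bounds the forecast.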

Validation of the CBR Model
The validation process used the data between 2000 and 2012 to forecast the industrial water demand of Zhangye City in 2013 and 2014. The case set contained 364 cases, built from the statistics of 28 cities in China from 2000 to 2012. Table 1 shows the weight of each influential factor as calculated by the BPN. According to Table 1, industrial population had the highest weight among the six features, whereas gross industrial production had the lowest. Then, grey incidence analysis was used to calculate the similarities between the target case and each original case. With the adjusted threshold σ (=0.95), the numbers of similar cases screened for the target cases of Zhangye City in 2013 and 2014 were 16 and 14, respectively.
These screened cases were then applied to forecast the target cases. This paper used the relative error to test the accuracy of the CBR model: the smaller the error, the higher the accuracy. Table 2 gives the forecasting results in comparison with the observed values in 2013 and 2014. The forecasting results of the industrial water demand in Zhangye City were 65 million tons in 2013 and 73 million tons in 2014, whereas the observed values were 69 and 72 million tons, respectively. Thus, the relative errors were −5.80% in 2013 and 1.39% in 2014. Apart from the comparison of different years, this paper also compared different forecasting methods to confirm that CBR was more suitable for industrial water demand forecasting. Table 3 lists the forecast results of the Grey Model (GM(1, 1)), the BPN and CBR in Zhangye City for the year 2013. According to Table 3, the forecast industrial water demand was 81 million tons for GM(1, 1) and 55 million tons for the BPN. Accordingly, the relative errors were 17.39% and −20.29%, respectively. By contrast, the forecast result of CBR had relatively high accuracy, which suggests that the CBR forecast agreed well with the real conditions of industrial water consumption in Zhangye City.
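The relative errors quoted above can be reproduced from the forecast and observed values given in the text, taking the relative error as (forecast − observed) / observed. The short check below uses only those reported figures; the function and variable names are illustrative.

```python
def relative_error(forecast, observed):
    """Relative error in percent: positive means over-forecast."""
    return (forecast - observed) / observed * 100.0


# (forecast, observed) in million tons, as reported in the text.
reported = {
    "CBR 2013": (65, 69),
    "CBR 2014": (73, 72),
    "GM(1, 1) 2013": (81, 69),
    "BPN 2013": (55, 69),
}

errors = {name: round(relative_error(f, o), 2) for name, (f, o) in reported.items()}

for name, err in errors.items():
    print(f"{name}: {err:.2f}%")
```

The signs show the direction of each error: CBR and the BPN under-forecast the 2013 demand, while GM(1, 1) over-forecasts it.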

Forecasting of the Target Case
Following the above analysis, CBR can be applied to forecast the industrial water demand of Zhangye City in 2030. After the validation process, the case base was extended to 420 cases of 28 cities from 2000 to 2014. Firstly, we used the autoregressive integrated moving average model (ARIMA(p, d, q)) to predict the attribute values of Zhangye City in 2030 (Table 1). Secondly, we calculated the similarities between the target case in 2030 and each original case (Appendix A). With the threshold σ = 0.95, the number of original cases similar to the target case was 10 (Table 4). Finally, we applied the weighted similarity method to forecast the industrial water demand of Zhangye City in 2030, which was 11.9 million tons.

Discussion
In this study, industrial water demand forecasting by CBR is more accurate than forecasting by GM(1, 1) and the BPN. However, the case base construction in this paper considered mainly socio-economic factors and lacked environmental factors. The incompleteness of the case attributes may limit further improvement in accuracy. In order to construct a case base that is as comprehensive as possible, we collected original cases with a wide coverage. Not only industrially developed cities such as Beijing, Shanghai, and Guangzhou, but also relatively less developed ones were collected as original cases.
The BPN was applied to weight the case attributes, which proved to be feasible. However, the BPN also has some disadvantages, such as a poor rate of convergence and a tendency to get stuck in local minima. Furthermore, as the BPN is based on the gradient information of the error function, it may fail when the problem is complex or the gradient information is hard to obtain. To overcome these disadvantages, many optimization algorithms have been introduced in the study and design of neural networks, such as constructing a neural network based on the particle swarm optimization algorithm [47] and using evolutionary algorithms to optimize neural networks [48][49][50], which have proved feasible and effective.
To validate that CBR has a relatively high accuracy in industrial water demand forecasting, a control experiment was carried out with GM(1, 1) and the BPN. The accuracy of GM(1, 1) is lower than that of CBR, and GM(1, 1) is suitable for short-term forecasting. Forecasting with the BPN requires the training samples to follow more consistent internal patterns. However, each city has a distinctive driving mechanism of development mode and industrial water consumption, which makes it difficult to train the network. Therefore, CBR is more appropriate for industrial water demand forecasting.

Conclusions
This study used case-based reasoning to forecast the industrial water demand in Zhangye City. We extracted six attributes of the industrial water demand as features of the case and constructed a case base containing 420 cases. The BPN was employed to calculate the weights of the features. We also selected grey incidence analysis to compute the similarities between the target case and the original cases. After the threshold had been adjusted repeatedly, the cases with relatively high similarities were screened to forecast the industrial water demand. Our main conclusions are as follows.
(1) The effectiveness, workability and accuracy of CBR in forecasting the industrial water demand have been validated by both longitudinal and crosswise comparisons. The relative errors of the CBR forecasts were −5.80% and 1.39% in 2013 and 2014, respectively. By comparison, when forecasting the industrial water demand of Zhangye in 2013, the Grey Model and the BPN gave relative errors of 17.39% and −20.29%, respectively. Therefore, CBR is better suited to forecasting the industrial water demand of Zhangye in 2030.
(2) In the validation process, 13 and 11 similar original cases were screened for the target case in 2013 and 2014, respectively. Accordingly, the forecasted industrial water demands of Zhangye were 65 and 73 million tons in 2013 and 2014, respectively. In light of the validation results, the forecast value of the industrial water demand in Zhangye in 2030 was 11.9 million tons, with 10 similar original cases.
The development of the industrial water demand is characterized by uncertainty and fluctuation. This study proposed a CBR method to forecast the industrial water demand, which proved to provide relatively high accuracy. With the implementation of strict water resource management policies in China, government departments are responsible for water management by scientifically planning the industrial water demand and use, which is crucial for guiding water consumption control in Zhangye as well as in other cities in China and in other developing countries. Therefore, the results of this paper provide a reference for water resources management and planning, for the consideration of decision-makers.

Figure 1. Location of Zhangye City in Heihe River Basin, China.

Figure 2. Gross industrial production and industrial water consumption in Zhangye during 2000-2014.

Figure 4. CBR-based framework of industrial water demand forecasting.

Figure 5. Structure of the Back Propagation Neural Network (BPN) to train the weights of case attributes.

Table 1. Weights of case attributes.

Table 2. Comparison between forecasting based on CBR and observed values in 2013 and 2014 (×10⁶ tons).

Table 4. Similar cases and similarities to the target case.