A Water Consumption Forecasting Model by Using a Nonlinear Autoregressive Network with Exogenous Inputs Based on Rough Attributes

Abstract: Scientific prediction of water consumption is beneficial for the management of water resources. In practice, many factors affect water consumption, and the various impact mechanisms are complex and uncertain. Meanwhile, the water consumption time series has a nonlinear dynamic feature. Therefore, this paper proposes a nonlinear autoregressive model with an exogenous input (NARX) neural network model based on rough set (RS) theory. First, the RS theory was used to analyze the importance of each attribute in water consumption. Then, the main influencing factors were selected as the inputs of the NARX neural network model, which was applied to predict water consumption. The proposed model is shown to give better results than a single NARX model and a back propagation neural network. The experimental results indicate that the proposed model has higher prediction accuracy in terms of the mean absolute error, mean absolute percentage error and root mean square error.


Introduction
Water consumption prediction plays an important role in the supply of urban water, the reduction in water resources waste and sustainable development of water resources. Therefore, it has attracted many domestic and foreign scholars to conduct research on water resource prediction.
A variety of methods have been developed for water consumption prediction. Many methods are based on time-series models, which focus on past behaviors of water consumption and can be complemented by some exogenous variables, such as the statistical regression model [1,2]. This type of method mainly uses statistical techniques to analyze the data, that is, it relies on the historical data to predict water consumption. However, the general regression model performs poorly for the analysis of non-stationary time series. The autoregressive-integrated moving average (ARIMA) model [3] has a great advantage for the processing of non-stationary time series. However, the ARIMA model supports only univariate prediction, and it is difficult to establish multivariate predictive models with it. As an intelligent prediction method, the artificial neural network (ANN) provides a quick and flexible means of creating models for time-series prediction [4]. In recent years, the ANN has attracted much research interest in various fields due to its strong self-organization, self-learning ability, and good fault tolerance [5]. ANNs can learn from patterns and capture hidden functional relationships in given data, even if the functional relationships are unknown or difficult to identify. This ability makes them applicable to nonlinear time-series prediction with satisfactory prediction results.
Rough set (RS) theory is a mathematical tool for the analysis of incomplete and uncertain information [26]. It identifies partial and full dependencies and facilitates the handling of missing data, non-numeric data and dynamic data. The knowledge expression system (decision system) should be in the form of a set K = (U, B, V, g), where U is a non-empty finite set of all objects, B is a non-empty finite set of attributes, V is a set of attribute values, and g is an information function for determining the attribute value of each object $x_n$ in U.
The rough set theory holds that some uncertain knowledge cannot be represented exactly, so it uses lower and upper approximation sets to represent such concepts. Consider an object subset X⊆U and an attribute subset Q⊆B, and let $[x]_Q$ denote the equivalence class of x under Q. Let $\underline{Q}(X)$ be the set of objects that definitely belong to X according to Q, called the lower approximation of X with respect to Q:
$$\underline{Q}(X) = \{x \in U : [x]_Q \subseteq X\} \quad (1)$$
Let $\overline{Q}(X)$ be the set of objects that may belong to X according to Q, called the upper approximation of X with respect to Q:
$$\overline{Q}(X) = \{x \in U : [x]_Q \cap X \neq \emptyset\} \quad (2)$$
where Ø is the empty set sign. Let E⊆B and $x_i$, $x_j$ ∈U, and then define IND(E) as the equivalence relation:
$$IND(E) = \{(x_i, x_j) \in U \times U : g(x_i, e) = g(x_j, e),\ \forall e \in E\} \quad (3)$$
The equivalence relation means that in each equivalence set the objects are indistinguishable; the family of equivalence classes induced by an attribute subset Q is recorded as U/Q.
In the decision system K = (U, C∪D, V, g), C is a set of condition attributes, and D is a set of decision attributes. The positive region of D with respect to the condition attributes C is the union of the C-lower approximations of the equivalence classes in U/D:
$$POS_C(D) = \bigcup_{X \in U/D} \underline{C}(X) \quad (4)$$
The dependence of D on C is defined as $\gamma_C(D)$:
$$\gamma_C(D) = \frac{|POS_C(D)|}{|U|} \quad (5)$$
where |.| indicates the number of elements in the set. For attribute a∈C, let ε be the importance of the attribute. The calculation formula for the importance of attribute a is as follows:
$$\varepsilon(a) = \gamma_C(D) - \gamma_{C \setminus \{a\}}(D) \quad (6)$$
The rough set theory does not require prior knowledge. It relies on the information provided by the data itself to perform effective data analysis. It can simplify the data while preserving the key information and reduce the dimension of the knowledge expression space.
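The dependence and importance measures described above can be illustrated with a small sketch. It assumes the standard rough set definitions (positive region as the union of lower approximations of the decision classes); the function names and the toy decision table are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def partition(table, attrs):
    """Group object indices into the equivalence classes of IND(attrs)."""
    classes = defaultdict(set)
    for idx, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].add(idx)
    return list(classes.values())


def lower_approximation(table, attrs, target):
    """Union of the equivalence classes that lie entirely inside `target`."""
    lower = set()
    for block in partition(table, attrs):
        if block <= target:
            lower |= block
    return lower


def dependency(table, cond_attrs, dec_attr):
    """gamma_C(D): share of objects in the positive region POS_C(D)."""
    positive = set()
    for decision_class in partition(table, [dec_attr]):
        positive |= lower_approximation(table, cond_attrs, decision_class)
    return len(positive) / len(table)


def importance(table, cond_attrs, dec_attr, attr):
    """Drop in dependency when `attr` is removed from the condition set."""
    reduced = [a for a in cond_attrs if a != attr]
    return dependency(table, cond_attrs, dec_attr) - dependency(table, reduced, dec_attr)


# Toy decision table with made-up values (illustration only).
toy = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 1},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
```

In this toy table both condition attributes are needed to separate the decision classes, so removing either one halves the dependency.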

NARX Neural Network
The water consumption sequence is a dynamic nonlinear sequence. The NARX neural network is a kind of dynamic recurrent neural network (RNN). It introduces the concept of time series, which gives the NARX model good dynamic characteristics and high anti-interference ability. The basic network structure of the NARX neural network is the same as that of the ANN. The ANN is a mathematical model that imitates the structure and function of biological neural networks [27]. In general, an ANN consists of an input layer, one or more hidden layers, and an output layer, through which the results are provided [28]. Each layer has several neurons.
Neural networks that use feedback connections, enabling lateral or backward information flow within the network, are called RNNs. The NARX neural network model is a special type of RNN that uses global feedback connection between the output layer and the input layer. This makes the NARX neural network have good dynamic characteristics and strong anti-interference ability [29]. The NARX neural network is a neural network with the memory function. The output of this network depends on the current input and past output, which greatly improves the generalization ability of the network.
The NARX model not only has the advantages of the traditional time-series model but can also improve the model's adaptability to nonlinear data through training. It introduces the output vector's delay feedback into the network training to form a new input vector [30]. The NARX model (open loop) is defined as follows:
$$y(t) = f\big(y(t-1), y(t-2), \ldots, y(t-d), x(t-1), x(t-2), \ldots, x(t-d)\big) \quad (7)$$
where y(.) refers to water consumption and x(.) refers to an external factor in this paper. The x(t) indicates the value of x at time t, and d is the number of delays. The model structure of the NARX neural network is shown in Figure 1.
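As a minimal sketch of how the open-loop formulation turns a series into training samples, the snippet below builds one input row per time step from the d most recent outputs and exogenous values. The function name `build_narx_regressors` and the column ordering are assumptions made for illustration; the paper does not specify an implementation.

```python
import numpy as np


def build_narx_regressors(y, x, d):
    """Return (inputs, targets) for open-loop NARX training.

    The row for time t holds [y(t-1), ..., y(t-d), x(t-1), ..., x(t-d)],
    and the matching target is y(t). `x` may hold one or several factors.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single exogenous factor as one column
    if len(y) != len(x):
        raise ValueError("y and x must cover the same time steps")
    if not 1 <= d < len(y):
        raise ValueError("d must be at least 1 and shorter than the series")

    rows, targets = [], []
    for t in range(d, len(y)):
        past_y = y[t - d:t][::-1]          # most recent output first
        past_x = x[t - d:t][::-1].ravel()  # most recent factors first
        rows.append(np.concatenate([past_y, past_x]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)
```

With a series of length N and d delays this yields N - d samples, which is why short annual series leave few rows for training when d is large.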
Water 2021, 13, x FOR PEER REVIEW
The activation function g(.) (the sigmoid function [31] is selected in this paper) can amplify the output of the neuron or limit it to a suitable range. Hence Equation (7) can be rewritten as:
$$y(t) = g\left(\sum_{i=1}^{d} w_i\, y(t-i) + \sum_{i=1}^{d} w'_i\, x(t-i) + b\right) \quad (8)$$
where $w_i$ and $w'_i$ are weights, and b represents the bias. The sigmoid function is:
$$g(u) = \frac{1}{1 + e^{-u}} \quad (9)$$
where u is the neuron input.
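A single open-loop prediction step with a sigmoid hidden layer might look like the sketch below. The one-hidden-layer layout with a linear output neuron, and all parameter names (`w_hidden`, `b_hidden`, `w_out`, `b_out`), are assumptions for illustration rather than the architecture reported in the paper.

```python
import numpy as np


def sigmoid(u):
    """Logistic activation: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-u))


def narx_step(past_y, past_x, w_hidden, b_hidden, w_out, b_out):
    """One open-loop prediction from delayed outputs and delayed inputs.

    Assumed layout: sigmoid hidden layer followed by a linear output neuron.
    """
    z = np.concatenate([np.ravel(past_y), np.ravel(past_x)])
    hidden = sigmoid(w_hidden @ z + b_hidden)
    return float(w_out @ hidden + b_out)
```

In open-loop mode the delayed outputs passed in are observed values; feeding back the network's own predictions instead would give the closed-loop variant.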

A Water Consumption Prediction Model Based on the RS-NARX Neural Network
In this section, the prediction model incorporating RS and NARX neural networks is constructed. The main process of the proposed model is illustrated in Figure 2.
The main steps of the process of the RS-NARX neural network are described as follows:
Step 1: Data preparation. Collect relevant data.
Step 2: Data discretization. The continuous data are discretized using the Naive algorithm [32].
Step 3: Attribute reduction. The dynamic reduction algorithm [33] is used to perform attribute reduction, and the importance of each attribute is obtained.
Step 4: Train the NARX neural network.
(1) Establish a NARX network structure.
(2) Determine the parameters (the number of hidden layers and the number of delays) in the NARX neural network.
(3) Train the NARX neural network.
Step 5: Obtain the predicted value.

Data Description
Chongqing is one of the four municipalities under direct control of the central government of China. As the largest heavy industrial and commercial city in the southwest of China, Chongqing is an important link between "the Belt and Road" and the Yangtze River economic belt. Chongqing suffers from serious water shortages. In recent years, water waste, water pollution and other problems have been widespread in Chongqing and have become the main bottleneck restricting the sustainable development of the economy and society. Therefore, Chongqing is used as a case study to provide advice on water resources management in the country. The study collected annual data on total water consumption and on the condition attributes (social and economic factors): the effective irrigation area (10³ hectares), agricultural GDP (10⁸ RMB), precipitation (billion m³), industrial GDP (10⁸ RMB), urbanization rate (%), service industry GDP (10⁸ RMB), residential water price (ton/RMB), population (10⁴ persons), residential consumption level (RMB), agricultural output ratio (%), industrial output ratio (%) and service industrial output ratio (%). The water consumption and socio-economic data of Chongqing from 2001 to 2016 were collected from the Chongqing Water Resources Bulletin [34] and the Statistical Yearbook of Chongqing [35], respectively. Table 1 presents the values of socio-economic indicators of Chongqing in 2001-2016.
Note: X1 represents effective irrigation area (10³ hectares), X2 represents agricultural GDP (10⁸ RMB), X3 represents precipitation (billion m³), X4 represents industrial GDP (10⁸ RMB), X5 represents urbanization rate (%), X6 represents service industry GDP (10⁸ RMB), X7 represents residential water price (ton/RMB), X8 represents population (10⁴ persons), X9 represents residential consumption level (RMB), X10 represents agricultural output ratio (%), X11 represents industrial output ratio (%), and X12 represents service industrial output ratio (%).
The total water consumption is divided into agricultural water consumption, industrial water consumption, service industry water consumption, domestic water consumption and eco-environmental water consumption (all in billion m³). The water consumption in each sector in Chongqing from 2001 to 2016 is illustrated in Figure 3. The total water consumption gradually increased from 2001 to 2011 and has shown a gradual downward trend since 2012. In May 2012, the Ministry of Water Resources convened the national work conference on water resources, assigning the tasks for implementing the strictest water resources management system. According to the instructions of the State Council, Chongqing Municipality also began to implement the strictest water resources management system. As shown in Figure 4, industrial water consumption began to decrease in 2010. During the "Twelfth Five-Year Plan" period, the Chongqing Municipal Government completed the task of industrial restructuring and eliminating outdated production capacity. This is the main reason for the decline of water use in the secondary industry during the period 2010-2015 (the "12th Five-Year Plan" period). Water consumption has decreased while GDP has been increasing year by year, which indicates that various industries have improved their water use efficiency and reduced unnecessary water use. The strictest water resources management system has also been indispensable. It emphasizes strict control of water consumption, optimization of water resources allocation, and overall improvement of water use efficiency. Therefore, under the premise of controlling the total water consumption, it is necessary to coordinate the water resources allocation of each sector. Agriculture and industry are the main sectors in terms of water consumption: together they account for 79% of total water consumption. Residential water consumption ranks third, accounting for 17% of total water consumption.
Water 2022, 14, 329

Evaluation Indexes
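In this paper, the following evaluation indicators are selected: mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE). The MAE, MAPE and RMSE are all common measures of forecasting error in time-series analysis. The formulas are as follows:
$$MAE = \frac{1}{N}\sum_{t=1}^{N} \left| y_{o,(t)} - y_{m,(t)} \right| \quad (10)$$
$$MAPE = \frac{100\%}{N}\sum_{t=1}^{N} \left| \frac{y_{o,(t)} - y_{m,(t)}}{y_{o,(t)}} \right| \quad (11)$$
$$RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N} \left( y_{o,(t)} - y_{m,(t)} \right)^2} \quad (12)$$
where $y_{o,(t)}$ represents the observed value of y at time t, $y_{m,(t)}$ represents the predicted value of y at time t, and N is the number of predicted values.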
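The three indicators can be computed together, as in the following sketch. It uses the standard definitions of MAE, MAPE and RMSE; the function name `forecast_errors` is hypothetical, and MAPE assumes that no observed value is zero.

```python
import numpy as np


def forecast_errors(observed, predicted):
    """Return MAE, MAPE (in percent) and RMSE for paired series."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")
    error = observed - predicted
    return {
        "MAE": float(np.mean(np.abs(error))),
        # Assumes every observed value is non-zero.
        "MAPE": float(100.0 * np.mean(np.abs(error / observed))),
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
    }
```

MAE and RMSE carry the unit of the series (billion m³ here), while MAPE is scale-free, which makes it the easier figure to compare across sectors or cities.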

The Attribute Reduction in Water Consumption Based on the Rough Set
Since the RS lacks direct and efficient processing for continuous data, continuous data need to be discretized before the attribute reduction (discrete data do not need to be discretized). First, the equal width discretization method [36] was used to discretize the decision attribute. The formula for the breaking point interval I is provided below:
$$I = \frac{x_{max} - x_{min}}{k} \quad (13)$$
where $x_{max}$ is the maximum value in the series, $x_{min}$ is the minimum value in the series, and k is the given parameter, which is the number of intervals. The total water consumption fluctuates from approximately 5.5 billion m³ to 9 billion m³, a range of 3.5 billion m³. Hence it is divided into seven equidistant intervals of 0.5 billion m³ each. The discretization method is presented in Table 2. According to the intervals and assignments given in Table 2, the discretization results of the decision attribute (water consumption) were obtained. The results are provided in Table 3. The equal width discretization method divides the value range of the continuous variable and does not need to consider the other variable values of the decision table. Naive Bayes has a solid mathematical foundation and it is a heuristic algorithm that discretizes the continuous condition attributes based on decision attributes. Due to the indistinguishable relationship between condition attributes and decision attributes, the Naive algorithm is used to discretize the continuous condition attributes to obtain a better discretization effect. The results of the discretization are presented in Table 4.
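A minimal sketch of equal width discretization for the decision attribute follows, using the bounds quoted above (5.5 to 9 billion m³, seven intervals). The zero-based interval codes, the treatment of the upper bound and the sample values are assumptions; the actual coding in Table 2 may differ.

```python
import numpy as np


def equal_width_codes(values, x_min, x_max, k):
    """Return the interval width and a zero-based interval code per value."""
    width = (x_max - x_min) / k
    values = np.asarray(values, dtype=float)
    codes = np.floor((values - x_min) / width).astype(int)
    # Values on the upper bound (or outside the range) fall into the end bins.
    return width, np.clip(codes, 0, k - 1)


# Illustrative consumption values in billion m^3 (not taken from Table 1).
width, codes = equal_width_codes([5.5, 7.3, 9.0], x_min=5.5, x_max=9.0, k=7)
```

Here the width evaluates to 0.5 billion m³, matching the seven equidistant intervals described in the text.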
After the data were discretized, the rough set theory was used for attribute reduction. X1 to X12 are condition attributes, and total water consumption is the decision attribute. There are many algorithms for condition attribute reduction, and the dynamic reduction algorithm is regarded as a very stable one. The principle of dynamic reduction is to randomly sample sub-tables from a given decision table and then determine the reduction on each. It adds or removes condition attributes on the sampled sub-tables to correct the reduction result, which effectively enhances the anti-noise ability of the reduction. This article uses the dynamic reduction algorithm for attribute reduction. The number of sampling levels is five. A weighted average is taken based on the frequency of occurrence of each attribute, and the importance of the influencing factors on the water consumption is obtained. The result is illustrated in Figure 4.
As shown in Figure 4, X1 is the most important influencing factor on the decision attribute. X1 indirectly reflects the drought resistance of cultivated land, that is, when X1 expands, the water use efficiency increases and the water consumption decreases. In addition, X1 directly affects the water consumption of the primary industry. As shown in Figure 3, the primary industry and the secondary industry are the main water-using sectors. Furthermore, X2, X10 and X11 are key factors that cannot be omitted. X3 is also highly important among the condition attributes. Rainwater can replenish cultivated land and forest land. People can also recycle water resources through rainwater harvesting systems. In summary, based on the combination of qualitative and quantitative analyses, condition attributes with an importance greater than 8% are selected, that is, X1 (effective irrigation area), X2 (agricultural GDP), X3 (precipitation), X10 (agricultural output ratio) and X11 (industrial output ratio). The selected condition attributes are used as the factors for predicting water consumption and are input into the prediction model.

The RS-NARX Neural Network
For the NARX modeling, the data from 2001 to 2013 were used to train the model, and the data from 2014 to 2016 were used to test the model. A commonly used empirical formula was used to determine the range of the number of hidden layer neurons [37]. The formula is as follows:
$$H = \sqrt{m + n} + a \quad (14)$$
where H represents the number of hidden neurons, m represents the number of input neurons, n represents the number of output neurons, and a is a constant between 1 and 10.
As such, the range of hidden neurons is 4-13. To get the optimal parameters, each value was tested 10 times. Thus, the prediction error range corresponding to each parameter value was obtained. The MAE was used to measure the error. The smaller the MAE value was, the smaller the prediction error was. The more dispersed the distribution of MAE was, the more unstable the prediction results were, and vice versa. Here, a box plot is used to show the results of the experiment, which is shown in Figure 5. The choice of the number of neurons in the hidden layer directly affects the prediction result of water consumption. When the number of neurons in the hidden layer is 4, the experimental error is large, and the result is unstable. As the number of neurons in the hidden layer increases, the prediction results become better. When the number of hidden layer neurons is 9, the prediction result is the best. Therefore, the number of hidden neurons in the NARX neural network is set to nine.
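The quoted 4-13 range can be checked with a short calculation. The sketch assumes the empirical rule H = sqrt(m + n) + a with a between 1 and 10, m = 5 input factors (the five selected attributes), n = 1 output, and rounding up to whole neurons; these settings are assumptions that happen to reproduce the stated range, not values confirmed by the text.

```python
import math


def hidden_neuron_range(m, n, a_min=1, a_max=10):
    """Smallest and largest hidden-layer sizes under H = sqrt(m + n) + a."""
    base = math.sqrt(m + n)
    # Round up because a layer needs a whole number of neurons (assumption).
    return math.ceil(base + a_min), math.ceil(base + a_max)


low, high = hidden_neuron_range(m=5, n=1)
```

Since sqrt(6) is about 2.45, the bounds are roughly 3.45 and 12.45, which round up to 4 and 13.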
The number of delays d is a parameter that determines the input delay and the output feedback delay. A reasonable use of delay parameters can make full use of the inherent law of time series and thus better predict water consumption. The range of d is determined by the length of the training set (length = 13, i.e., values in 2001-2013), so d is set as 1-12 for modeling investigation. Similarly, to select the best number of delays, d was found through experiments. Each value was repeated 10 times, and all the results are illustrated in Figure 6. It can be seen from the box plot that when the delay order is three, the prediction performance is the best and is more stable.
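The repeated-trial procedure used for both the hidden-layer size and the delay order can be sketched as follows. The `score` callable is hypothetical: it stands for training a network with the given parameter value and returning the test MAE. Choosing the candidate with the lowest median MAE, with the spread as a tie-breaker, is one assumed way to formalize what the paper reads off the box plots.

```python
import statistics


def summarise_trials(candidates, score, repeats=10):
    """Map each candidate value to (median MAE, MAE spread) over repeats."""
    summary = {}
    for value in candidates:
        maes = [score(value, trial) for trial in range(repeats)]
        summary[value] = (statistics.median(maes), max(maes) - min(maes))
    return summary


def pick_best(summary):
    """Lowest median MAE wins; a smaller spread breaks ties."""
    return min(summary, key=lambda value: summary[value])
```

A small spread corresponds to a short box in the plot, that is, to a parameter value whose results are stable across random initializations.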
Based on the above tests, it is found that when the number of delays is three (that is, using the data for the previous three years to predict the water consumption in the following year as a cycle), the prediction result is good. Therefore, y(t) is determined by the following variables:
$$y(t) = f\big(y(t-1), y(t-2), y(t-3), x(t-1), x(t-2), x(t-3)\big) \quad (15)$$
After all of the parameters are determined, the trained NARX neural network framework is illustrated in Figure 7. In Figure 7, Y = [y(t − 1), y(t − 2), y(t − 3)] refers to the delayed feedback vector.
After setting up the neural network structure, the RS-NARX neural network model is trained. The Nested Cross-Validation (NCV) method is used to test the model [38]. A method based on Forward-Chaining is used to cross-validate time series data to avoid data leakage. Three years of data are taken as the test set, and all the previous data are assigned to the training set. In this experiment, the delay number d is three. The average results are illustrated in Figure 8. The proposed RS-NARX neural network model predicts the trend of water consumption accurately. However, the prediction of the abrupt nodes in water consumption needs to be improved. At the beginning, the total water consumption increased year by year. With the development of the population and the economy, the demand for water also increased. Since 2010, water consumption has started to decrease, which runs counter to the growth in the population and the economy. This is mainly a result of the guiding policies of the government during the Twelfth Five-Year Plan period, during which the government vigorously promoted the water conservation policy. The most stringent water management system was introduced in 2012, which led to a sharp drop in water consumption in 2012. Overall, the RS-NARX neural network model predicts water consumption well. The following is a detailed analysis of the prediction results based on the error of each year.
The errors of the prediction results are illustrated in Figure 9. There are large errors at several nodes of the training set (i.e., the values in 2009 and 2012). Unpredictable policy impacts occurred in 2012, which led to a major bias in the water consumption forecast. Additionally, the forecasting error for 2013-2016 decreased year by year. Driven by strong policies, the gradual reduction in water consumption has stabilized. This is the main reason for the high prediction accuracy. Overall, the error of all predicted nodes is controlled within 0.2, so the predicted results are acceptable.
Figure 9. Error analysis using RS-NARX neural network.
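The forward-chaining evaluation described above, in which each three-year test block is preceded by all earlier years as training data, can be sketched as a simple split generator. The function name and the `min_train` setting are assumptions; a minimum of four training years is chosen here only so that at least one sample remains after three delays are consumed.

```python
def forward_chaining_splits(years, test_size=3, min_train=4):
    """Yield (train_years, test_years) pairs in chronological order.

    Every test block comes strictly after its training years, so no
    future information leaks into training.
    """
    years = list(years)
    start = min_train
    while start + test_size <= len(years):
        yield years[:start], years[start:start + test_size]
        start += test_size


splits = list(forward_chaining_splits(range(2001, 2017)))
```

With the 2001-2016 series this produces four folds, and the last fold trains on 2001-2013 and tests on 2014-2016, the same partition used for the final model.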

Comparison
To demonstrate the superiority of the RS-NARX neural network model, a single NARX neural network model (without RS) and the BPNN model were chosen as references. Similarly, the parameters of the comparison models were obtained experimentally (the experiment was repeated 10 times for each value to obtain the best parameters). Table 5 shows the parameter settings of the comparison models, where "Hidden layer size" represents the number of neurons in the hidden layer. The comparison models were also tested using the NCV method. The results of the compared models are illustrated in Figure 10. The prediction results of the comparison models are not as accurate as those of the RS-NARX model, and the prediction results of the NARX model are more accurate than those of the BPNN. The single NARX neural network model performed poorly on the prediction of the abrupt nodes. The use of rough set theory streamlines the input data set by removing redundant information and thereby improves the prediction accuracy of the NARX neural network model. This is the reason why the RS-NARX neural network model adapts better to the abrupt nodes of nonlinear dynamic data. As can be seen from the prediction results of the BPNN model, a change at a single node of the original data causes a change in the overall prediction trend. This is disadvantageous for the prediction of nonlinear dynamic data.
Figure 10. Results of the comparison models.
To analyze the error distribution of the comparison models more easily, the error at each node of the two comparison models is provided. The prediction errors of the comparison models are illustrated in Figure 11. The errors of the comparison models are large. At some nodes, the BPNN model has larger errors than the single NARX neural network model.
Figure 11. Error analysis using comparison models.
Table 6 lists the errors of the different models. As shown in Table 6, the RS-NARX neural network model has higher accuracy. The RS theory is used to pre-process the data set, thus reducing the interference of unnecessary data with the model. In addition, the NARX neural network has the memory function of a dynamic neural network, so nonlinear dynamic data can be better fitted. Hence the proposed framework improves the prediction accuracy of the model. In short, the rank of these models is RS-NARX (best), NARX, and BPNN (worst). Therefore, the proposed RS-NARX neural network model is effective in forecasting water consumption.

Conclusions
In this paper, the proposed RS-NARX neural network model is used to predict the water consumption of Chongqing. First, the RS theory is used to reduce the attributes, and the key influencing factors of water consumption are obtained. The reduction results are used as the inputs of the predictive model, and the NARX neural network model is used to predict water consumption. The results indicate that the proposed model is more accurate than a single NARX model and a BPNN model.
The proposed RS-NARX neural network model combines the advantages of the RS theory with those of NARX neural networks. The RS theory removes information redundancy and improves the prediction efficiency and accuracy of NARX neural networks, so that the NARX neural network model can better fit the nonlinear dynamic sequence. The results of predicting water consumption using the RS-NARX model are satisfactory. The results can provide recommendations for the allocation of water resources.