# ARIMA-M: A New Model for Daily Water Consumption Prediction Based on the Autoregressive Integrated Moving Average Model and the Markov Chain Error Correction


## Abstract


## 1. Introduction

## 2. Water Data Pre-Processing

The decomposed components are arranged in order of frequency from high to low; the highest-frequency component is then removed, and the remaining components are summed to obtain the de-noised data.
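The reconstruction step can be sketched as follows. The sketch assumes the components (for example, intrinsic mode functions from an empirical mode decomposition) have already been computed and are stacked from highest to lowest frequency, with the residual last. The decomposition itself is not shown, and `denoise_by_dropping_highest` is a hypothetical helper name.

```python
import numpy as np


def denoise_by_dropping_highest(components):
    """Drop the highest-frequency component and sum the rest.

    `components` has shape (n_components, n_samples) and is assumed to be
    ordered from highest to lowest frequency, with the residual trend last.
    """
    components = np.asarray(components, dtype=float)
    if components.ndim != 2 or components.shape[0] < 2:
        raise ValueError("need at least two components of equal length")
    # Row 0 is treated as noise; the remaining rows form the de-noised series.
    return components[1:].sum(axis=0)


# Toy example: a fast oscillation (noise), a slow cycle, and a constant trend.
t = np.arange(8)
fast = 0.5 * (-1.0) ** t          # highest-frequency component
slow = np.sin(2 * np.pi * t / 8)  # lower-frequency component
trend = np.full(8, 100.0)         # residual
signal = fast + slow + trend
denoised = denoise_by_dropping_highest(np.vstack([fast, slow, trend]))
print(np.allclose(denoised, slow + trend))  # True
```

Dropping only the first component is the simplest reading of the procedure; keeping the residual ensures the long-term level of consumption is preserved.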

## 3. Prediction of Water Consumption Based on Markov Chain Modification

#### 3.1. Prediction Model Based on ARIMA

The original series $x_t$ is differenced d times to obtain a stationary series $y_t$, which is fitted with an ARMA(p, q) model to forecast consumption; the forecasts are then transformed back to the scale of the original data $x_t$ by reversing the d-fold differencing. The ARMA model is expressed as follows:
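In one common textbook convention (assumed here), the ARMA(p, q) model of the stationary series is $y_t = c + \sum_{i=1}^{p}\phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q}\theta_j \varepsilon_{t-j}$, with $\varepsilon_t$ white noise. The differencing and its inversion around that model can be sketched as below. The helpers `difference` and `undifference` are hypothetical names, and the ARMA forecast of the differenced series is assumed to come from some other routine.

```python
import numpy as np


def difference(x, d):
    """Apply first differencing d times.

    Also returns last_values, where last_values[k] is the last observation of
    the k-times differenced series; these are needed to undo the differencing.
    """
    series = np.asarray(x, dtype=float)
    last_values = []
    for _ in range(d):
        last_values.append(series[-1])
        series = np.diff(series)
    return series, last_values


def undifference(forecast_diff, last_values):
    """Integrate forecasts of the d-times differenced series back to the original scale."""
    forecast = np.asarray(forecast_diff, dtype=float)
    # Walk back from the most-differenced level to the original series.
    for last in reversed(last_values):
        forecast = last + np.cumsum(forecast)
    return forecast


x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])  # squares: second difference is constant
y, last = difference(x, d=2)                # y = [2, 2, 2], the stationary series
print(undifference([2.0, 2.0], last))       # [36. 49.]
```

Each undifferencing pass is a cumulative sum anchored at the last observed value of the corresponding level, which is why those values are stored during differencing.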

#### 3.2. Markov Chain Theory

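The k-step transition probability $P_{ij}^{(k)} = P\left(X_{m+k} = S_j \mid X_m = S_i\right)$ is the probability that the variable X is in state $S_j$ after k steps, given that it is in state $S_i$ at time m.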
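For a homogeneous chain (assumed here, consistent with the use of a single one-step matrix $P^{(1)}$), the Chapman–Kolmogorov equation gives the k-step matrix as a matrix power, $P^{(k)} = \left(P^{(1)}\right)^k$. A minimal sketch with a made-up two-state matrix; the numbers are illustrative, not from the paper, and `k_step` is a hypothetical helper:

```python
import numpy as np

# Illustrative one-step transition matrix P^(1); each row sums to 1.
P1 = np.array([[0.9, 0.1],
               [0.5, 0.5]])


def k_step(P, k):
    """Return the k-step transition matrix P^(k) = P^k of a homogeneous chain."""
    return np.linalg.matrix_power(P, k)


P2 = k_step(P1, 2)
# Probability of being in S_2 two steps after being in S_1: 0.9*0.1 + 0.1*0.5
print(P2[0, 1])
```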

#### 3.3. Modifying ARIMA Water Consumption Forecast Based on Markov Chain

The water consumption data series $D_r$ is divided into N states, ${D}_{1},{D}_{2},\cdots ,{D}_{N}$. Because water consumption data are random and their distribution law is unknown, this study uses the k-means clustering algorithm to divide the data sequence into several states.
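A minimal sketch of k-means state division for a one-dimensional series follows. It assumes Lloyd's algorithm with centers initialised at evenly spaced quantiles (the paper does not state its initialisation), and `kmeans_states` is a hypothetical helper; the returned centers play the role of the state means $E_i$.

```python
import numpy as np


def kmeans_states(data, n_states, n_iter=100):
    """Partition a 1-D series into n_states states with Lloyd's k-means.

    Returns (labels, centers): labels[i] is the state of data[i]
    (0 = lowest-consumption state) and centers[j] is the mean of state j.
    """
    x = np.asarray(data, dtype=float)
    # Quantile initialisation keeps the result reproducible.
    centers = np.quantile(x, np.linspace(0, 1, n_states + 2)[1:-1])
    for _ in range(n_iter):
        # Assign every observation to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(n_states)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    # Relabel so that state indices increase with the state mean.
    order = np.argsort(centers)
    rank = np.empty_like(order)
    rank[order] = np.arange(n_states)
    return rank[labels], centers[order]


labels, centers = kmeans_states([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
print(labels)  # [0 0 0 1 1 1 2 2 2]
# centers holds the state means 2, 11 and 21
```

Sorting the states by their means makes state 1 the lowest-consumption state, which keeps the transition matrix readable.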

Let $y_{t+n}$ be the water consumption at time t+n predicted by the ARIMA model, $\overline{{D}_{te}}$ the average predicted value of the Markov chain, and $\overline{{y}_{te}}$ the average predicted value of the ARIMA model. Because the ARIMA prediction error grows with the forecast horizon, a correction coefficient $f_{t+n}$ is applied to the ARIMA forecast at time t+n, and the modified forecast $\widehat{{y}_{t+n}}$ at time t+n is given by Formula (7). Since the ARIMA error accumulates step by step into the future, the correction coefficient is made to increase gradually according to Formula (8), which improves prediction accuracy.
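Formulas (7) and (8) are not reproduced here, so the following is only a hypothetical illustration of a proportional correction: each ARIMA forecast is scaled by the ratio of the Markov-chain mean forecast $\overline{{D}_{te}}$ to the ARIMA mean forecast $\overline{{y}_{te}}$. A constant ratio is consistent with the test-set comparison table, where the ARIMA-M forecasts are about 0.832 times the ARIMA forecasts on every day, but the gradually increasing coefficient $f_{t+n}$ of Formula (8) is not modelled. The function name and the ratio form are assumptions.

```python
import numpy as np


def proportional_correction(arima_forecast, markov_forecast):
    """Hypothetical correction: scale ARIMA forecasts by mean(Markov) / mean(ARIMA).

    Illustrative stand-in only, not the paper's Formula (7)/(8): the
    coefficient here is constant over the horizon rather than growing with n.
    """
    y = np.asarray(arima_forecast, dtype=float)
    d = np.asarray(markov_forecast, dtype=float)
    ratio = d.mean() / y.mean()  # assumed stand-in for the correction coefficient
    return ratio * y


# Toy numbers: ARIMA overshoots a Markov level of 100 on average by 20%.
corrected = proportional_correction([110.0, 120.0, 130.0], [100.0, 100.0, 100.0])
```

After this scaling the corrected forecasts have the same mean as the Markov-chain forecasts while keeping the day-to-day shape of the ARIMA forecasts.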

## 4. Data Analysis

#### 4.1. Data Pre-Processing

#### 4.2. Model Validation

The water consumption series $X_1$ of monitoring point 1 fluctuated over a wide range. To remove the trend from this time series, $X_1$ was differenced to obtain the sequence $DX_1$. As Figure 6 shows, the first-order differenced sequence fluctuates steadily around its mean. Figure 7 displays the autocorrelation plot of the first-order differenced water consumption sequence; the autocorrelation coefficients remain above zero over many lags, indicating strong correlation within the sequence. The stationarity of the differenced sequence was checked with the Augmented Dickey-Fuller (ADF) unit root test (see Table 3). The p-value of the test was below 0.05, indicating that the first-differenced sequence is stationary.
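The two diagnostics above can be sketched with plain NumPy. The sample autocorrelation is standard; the stationarity check is simplified to the plain Dickey-Fuller regression with a constant, $\Delta x_t = \alpha + \gamma x_{t-1} + e_t$, without the lagged differences that the augmented test adds. The helper names are hypothetical, and the −2.87 threshold is the 5% critical value listed in Table 3.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (r_0 = 1)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.dot(dev, dev)
    return np.array([np.dot(dev[k:], dev[:len(x) - k]) / denom
                     for k in range(max_lag + 1)])


def dickey_fuller_stat(x):
    """t-statistic of gamma in dx_t = alpha + gamma * x_{t-1} + e_t.

    Plain (non-augmented) Dickey-Fuller regression with a constant; the ADF
    test used in the paper also includes lagged differences, omitted here.
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    sigma2 = resid @ resid / (len(dx) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(0)
noise = rng.normal(size=500)               # a stationary (white-noise) series
print(dickey_fuller_stat(noise) < -2.87)   # True: unit root rejected at 5%
acf = sample_acf(noise, max_lag=10)
```

A statistic more negative than the critical value rejects the unit-root hypothesis, which is how the −6.99 in Table 3 supports stationarity of the differenced series.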

According to the water consumption data, the initial state vector is $P_0 = [0,0,0,0,1]$, so the state vector of the next day is $P_1 = P_0 \times P^{(1)}$. According to Equation (6), the predicted value is [127114.01 88890.54 121126.54 109786.08 137019.38]. The predictions for the next n days are calculated in the same way; the modified ARIMA model then combines these Markov chain predictions with the ARIMA forecasts, correcting the ARIMA result in proportion.
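The propagation $P_1 = P_0 \times P^{(1)}$ and its repetition over the next n days can be sketched as below, which is one reading of "in the same way". Equation (6) is not reproduced here; the sketch assumes the common choice of taking the Markov prediction as the probability-weighted sum of the state means $E_i$. The transition matrix and state means in the example are made-up numbers, and `markov_forecast` is a hypothetical helper.

```python
import numpy as np


def markov_forecast(p0, P1, state_means, n_days):
    """Propagate the state vector n_days steps and return one forecast per day.

    Assumes the forecast for a day is sum_i x_i * E_i, i.e. the state
    probabilities weighted by the state means (an assumed form of Equation (6)).
    """
    state = np.asarray(p0, dtype=float)
    P1 = np.asarray(P1, dtype=float)
    means = np.asarray(state_means, dtype=float)
    forecasts = []
    for _ in range(n_days):
        state = state @ P1               # X_{t+1} = X_t P^(1)
        forecasts.append(state @ means)  # probability-weighted state mean
    return np.array(forecasts)


# Illustrative 3-state chain, starting with certainty in the highest state.
P1 = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.6, 0.2],
               [0.1, 0.3, 0.6]])
E = np.array([100.0, 120.0, 140.0])
daily = markov_forecast([0, 0, 1], P1, E, n_days=2)  # forecasts close to 130 and 125
```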

The root mean square error (RMSE), the coefficient of determination ($R^2$), and the relative prediction error (RE) were selected as the evaluation indexes. The RMSE reflects the difference between the original values and the estimated values: the smaller it is, the closer the predicted values are to the real values and the better the prediction. $R^2$ represents the overall goodness of fit of the prediction model: the closer $R^2$ is to 1, the better the predicted values fit the observed values and the better the prediction performance of the model. The RE is the ratio of the absolute error to the real value and reflects the reliability of the prediction. If the true value and the predicted value of sample i are $T_i$ and $Y_i$, respectively, N is the number of predicted samples, and $\overline{T_i}$ is the average of all true values, then RMSE is calculated by Equation (12), and $R^2$ and RE by Equations (13) and (14).
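Equations (12)–(14) are not reproduced here; the sketch below uses the standard definitions of RMSE and $R^2$. For the per-sample relative error it uses the signed form $RE_i = (Y_i - T_i)/T_i \times 100$, which reproduces the RE columns of the test-set comparison table; summarising RE as the mean absolute value is an assumption and may differ from Equation (14). The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(actual, predicted):
    """Return (RMSE, R^2, mean |RE| in %, per-sample signed RE in %)."""
    T = np.asarray(actual, dtype=float)
    Y = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((T - Y) ** 2))
    # R^2 = 1 - SS_res / SS_tot; it is negative when the model is worse than the mean.
    r2 = 1.0 - np.sum((T - Y) ** 2) / np.sum((T - T.mean()) ** 2)
    re = (Y - T) / T * 100.0
    return rmse, r2, np.mean(np.abs(re)), re


# First two rows of the test-set comparison table (actual vs. ARIMA forecast).
actual = [136226.0, 132041.7]
arima = [157671.60, 155218.90]
rmse, r2, mean_abs_re, re = evaluate(actual, arima)
print(np.round(re, 2))  # [15.74 17.55]
```

Because $R^2$ compares residual error to the spread of the true values, it can fall below zero on a short, flat test window, as in the test-set results.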

For the training data, the coefficient of determination ($R^2$) of the ARIMA model was close to 1, so this model fits the training dataset well. On the training data, the RMSE and relative error of the Markov model were much larger than those of the ARIMA model, and its coefficient of determination was lower. The relative errors of the Markov model for monitoring points 1 and 2 were about 13 and 18 times those of the ARIMA model, respectively. The ARIMA model therefore fits the training data well, while the relative error of the Markov prediction remained below 2.5%, which meets the requirements of daily water consumption prediction.

For the test set of monitoring point 1, the $R^2$ of the ARIMA forecast was only −0.04, and its relative error reached 8.07%. Compared with the traditional ARIMA, ARIMA-M reduced the RMSE of the test-set prediction by 25%, raised $R^2$ from −0.04 to 0.42 (a more than tenfold change in magnitude), and reduced the relative error by 24.4%. For monitoring point 2, ARIMA-M reduced the test-set RMSE and relative error by 18.4% and 13%, respectively, compared with ARIMA.
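The percentage reductions quoted above follow from the test-set tables as (before − after) / before × 100; a quick check, where `percent_reduction` is a hypothetical helper:

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Values from the test-set tables (ARIMA -> ARIMA-M).
print(round(percent_reduction(14085.60, 10569.32), 1))  # RMSE, point 1: 25.0
print(round(percent_reduction(8.07, 6.10), 1))          # RE, point 1: 24.4
print(round(percent_reduction(18388.74, 15003.34), 1))  # RMSE, point 2: 18.4
print(round(percent_reduction(8.07, 7.02), 1))          # RE, point 2: 13.0
```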

## 5. Discussion and Conclusions

## Author Contributions

## Funding

## Conflicts of Interest


**Figure 6.** The original and first-order difference of total water consumption at monitoring point 1.

| Step | Forecast Method of ARIMA |
|---|---|
| A1.1 | Stationarity treatment: the training portion of the original sequence is tested for stationarity. If it is non-stationary, differencing is applied, and the differencing order d is chosen so that the differenced sequence is stationary. |
| A1.2 | Model selection: the orders p and q of the ARIMA model are chosen as the values that minimize the Bayesian information criterion (BIC). |
| A1.3 | Model test: the residual sequence of the fitted model is checked for white noise; if the residuals are white noise, the model is valid. |
| A1.4 | Forecast: the validated ARIMA(p, d, q) model is used to predict the data for the next few days. |
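Steps A1.1, A1.2 and A1.4 can be sketched without a time-series library by restricting the model to ARIMA(p, 1, 0): pure autoregression on the first-differenced series, fitted by least squares, with p chosen by BIC. This is a simplified stand-in, not the estimator used in the paper: the moving-average part, the ADF test of A1.1 and the white-noise check of A1.3 are omitted, d = 1 is assumed, and all function names are hypothetical.

```python
import numpy as np


def fit_ar(y, p):
    """Least-squares AR(p) with intercept; returns (beta, RSS, n_obs).

    beta[0] is the intercept and beta[1:] are the lag coefficients phi_1..phi_p.
    """
    y = np.asarray(y, dtype=float)
    X = np.array([np.r_[1.0, y[t - p:t][::-1]] for t in range(p, len(y))])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ beta) ** 2))
    return beta, rss, len(target)


def select_order_bic(y, max_p):
    """Choose p in 1..max_p minimising BIC = n ln(RSS/n) + k ln(n).

    Every candidate is fitted on the same targets y[max_p:] so the BIC values
    are comparable.
    """
    y = np.asarray(y, dtype=float)
    best = None
    for p in range(1, max_p + 1):
        beta, rss, n = fit_ar(y[max_p - p:], p)
        bic = n * np.log(rss / n) + (p + 1) * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, p, beta)
    return best[1], best[2]


def forecast_arima_p_1_0(x, max_p=5, horizon=3):
    """Difference once, fit AR(p) chosen by BIC, forecast, and integrate back."""
    x = np.asarray(x, dtype=float)
    y = np.diff(x)                           # A1.1: first difference (d = 1 assumed)
    p, beta = select_order_bic(y, max_p)     # A1.2: order selection by BIC
    history = list(y)
    steps = []
    for _ in range(horizon):                 # A1.4: recursive multi-step forecast
        lags = np.array(history[-p:][::-1])  # most recent value first
        nxt = beta[0] + beta[1:] @ lags
        steps.append(nxt)
        history.append(nxt)
    return p, x[-1] + np.cumsum(steps)       # undo the differencing


# Synthetic series whose differences follow an AR(1) process (not the paper's data).
rng = np.random.default_rng(42)
dy = np.zeros(300)
for t in range(1, 300):
    dy[t] = 0.6 * dy[t - 1] + rng.normal()
x = 1000.0 + np.cumsum(dy)
p, fc = forecast_arima_p_1_0(x, max_p=5, horizon=3)
```

Fitting every candidate order on the same target window matters: otherwise higher orders are scored on fewer observations and their BIC values are not directly comparable.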

| Step | The Proposed Markov Chain-Modified ARIMA Prediction |
|---|---|
| A2.1 | The water consumption data series ${D}_{r}$ is divided into N states. The k-means clustering algorithm clusters the data sequence, yielding the state of each value, the partition into N states, and the mean value $E_i$ of state i. |
| A2.2 | The one-step state transition matrix $P^{(1)}$ is calculated by Formula (4). The state transition frequencies ${f}_{ij}$ are counted from the state changes in the sequence, and the transition probability ${p}_{ij}$ of each state is then obtained by Formula (5). |
| A2.3 | Select time t as the initial time and obtain the initial state vector ${X}_{t}=\left({x}_{t,1},{x}_{t,2},\cdots ,{x}_{t,N}\right)$; the data of the day before the forecast date are taken as the initial state. |
| A2.4 | Calculate the state vector ${X}_{t+1}$ of water consumption at the next time step. With ${x}_{t+1,i}$ the probability of state i at time t+1, the state vector at time t+1 is the product of the state vector at time t and the transition matrix: ${X}_{t+1} = {X}_{t}{P}^{\left(1\right)}$. |
| A2.5 | The Markov chain prediction ${D}_{t+1}$ for the next time step is calculated by Formula (6). |
| A2.6 | Repeat steps A2.3–A2.5 to obtain the Markov chain prediction of water consumption at each time to be predicted. |
| A2.7 | The prediction of water consumption at time t+n is obtained from the Markov chain prediction and the ARIMA prediction by Formula (7). |
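Step A2.2 can be sketched as follows, assuming the state labels come from the k-means step and that Formulas (4) and (5) are the usual transition count followed by row normalisation. The helper name is hypothetical, and giving a never-left state a uniform row is an added assumption.

```python
import numpy as np


def transition_matrix(labels, n_states):
    """One-step transition counts f and probabilities P from a state sequence.

    f[i, j] counts transitions i -> j; P[i, j] = f[i, j] / sum_j f[i, j].
    Rows of states that are never left are set to uniform (an assumption).
    """
    labels = np.asarray(labels, dtype=int)
    f = np.zeros((n_states, n_states))
    np.add.at(f, (labels[:-1], labels[1:]), 1)  # count consecutive pairs
    row_sums = f.sum(axis=1, keepdims=True)
    P = np.full_like(f, 1.0 / n_states)
    np.divide(f, row_sums, out=P, where=row_sums > 0)
    return f, P


f, P = transition_matrix([0, 0, 1, 2, 1, 0, 0, 1], n_states=3)
# Transitions observed: 0->0 twice, 0->1 twice, 1->2, 1->0, 2->1.
```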

| ADF test statistic | Critical value (1%) | Critical value (5%) | Critical value (10%) | p-Value |
|---|---|---|---|---|
| −6.99 | −3.45 | −2.87 | −2.57 | $7.72 \times 10^{-10}$ |

**Table 4.** The white noise test results of the differenced water consumption data at monitoring point 1.

| Stat | p-Value |
|---|---|
| 312.49 | $6.26 \times 10^{-70}$ |

| ADF test statistic | Critical value (1%) | Critical value (5%) | Critical value (10%) | p-Value |
|---|---|---|---|---|
| −8.18 | −3.45 | −2.87 | −2.57 | $8.06 \times 10^{-13}$ |

| Stat | p-Value |
|---|---|
| 316.44 | $8.62 \times 10^{-71}$ |

| Model | RMSE | $R^2$ | RE (%) |
|---|---|---|---|
| ARIMA | 275.17 | 0.9994 | 0.19 |
| Markov | 3919.08 | 0.90 | 2.47 |

| Model | RMSE | $R^2$ | RE (%) |
|---|---|---|---|
| ARIMA | 300.93 | 0.9996 | 0.13 |
| Markov | 5628.25 | 0.89 | 2.34 |

| ID | Actual Water Consumption (m³) | ARIMA Forecast (m³) | ARIMA-M Forecast (m³) | RE of ARIMA Forecast (%) | RE of ARIMA-M Forecast (%) | Decrease in Absolute RE of ARIMA-M vs. ARIMA (percentage points) |
|---|---|---|---|---|---|---|
| 1 | 136,226 | 157,671.60 | 131,251.64 | 15.74 | −3.65 | 12.09 |
| 2 | 132,041.7 | 155,218.90 | 129,209.92 | 17.55 | −2.14 | 15.41 |
| 3 | 130,589.9 | 153,773.05 | 128,006.34 | 17.75 | −1.98 | 15.77 |
| 4 | 131,616.3 | 153,390.78 | 127,688.13 | 16.54 | −2.98 | 13.56 |
| 5 | 134,733.5 | 153,969.18 | 128,169.61 | 14.28 | −4.87 | 9.41 |
| 6 | 138,878.1 | 155,285.24 | 129,265.14 | 11.81 | −6.92 | 4.89 |
| 7 | 142,930.4 | 157,046.25 | 130,731.08 | 9.88 | −8.54 | 1.34 |
| 8 | 145,891.4 | 158,942.22 | 132,309.34 | 8.95 | −9.31 | −0.36 |
| 9 | 147,015.6 | 160,692.44 | 133,766.30 | 9.30 | −9.01 | 0.29 |
| 10 | 146,597.7 | 162,080.79 | 134,922.01 | 10.56 | −7.96 | 2.60 |

| Model | RMSE | $R^2$ | RE (%) |
|---|---|---|---|
| ARIMA | 14,085.60 | −0.04 | 8.07 |
| ARIMA-M | 10,569.32 | 0.42 | 6.10 |

| Model | RMSE | $R^2$ | RE (%) |
|---|---|---|---|
| ARIMA | 18,388.74 | −3.04 | 8.07 |
| ARIMA-M | 15,003.34 | −1.69 | 7.02 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Du, H.; Zhao, Z.; Xue, H.
ARIMA-M: A New Model for Daily Water Consumption Prediction Based on the Autoregressive Integrated Moving Average Model and the Markov Chain Error Correction. *Water* **2020**, *12*, 760.
https://doi.org/10.3390/w12030760
