Drinking Water Tank Level Analysis with ARIMA Models: A Case Study

Abstract: The operational management of tanks for urban water distribution networks is usually a critical element due to the dynamic nature of the water demand and the age of the distribution networks themselves. Today, in a context of water resource scarcity, optimal management is a key point for the sustainable management of urban systems. For this purpose, it is useful to implement predictive tools able to provide short-term forecasts that inform urban water managers on the most suitable procedure to apply in routine or critical events. A possible approach is to use autoregressive integrated moving average (ARIMA) models, which combine the autoregression and the moving average approaches, with the possibility to work on a differenced series of the data. They can further embed a seasonal component (Seasonal ARIMA models) to account for possible periodic patterns in the observed data. In this study, the water levels measured from 5 May 2018 to 10 January 2019 in a water storage tank in the area of Benevento, Campania region (Italy), were considered as a case study. The standard ARIMA techniques were applied to find the best model for this dataset, according to “Deviance Information Criterion” (DIC) and “Bayesian Information Criterion” (BIC) optimization. The results are discussed, shedding light on the behaviour of the time series with reference to the management of the infrastructure and the dataset. The residual analysis, carried out to check if autocorrelation was still present and if the residuals were normally distributed, revealed a narrow distribution. Small values were found throughout the dataset, except for a few periods corresponding to the imputed data. This application represents a preliminary step of more detailed research that will be carried out to detect the best model for forecasting tank levels for the case study, to help manage the urban water supply.


Introduction
The management of water distribution networks (WDNs) relies on water utility operations consisting of usually quick responses to either water demand or source variations as well as the effects of network aging [1,2]. Recently, the development of real-time control (RTC) strategies based on the use of measurement devices, with compact technology at affordable prices, has been facilitated by their straightforward implementation in Internet of Things (IoT) technologies as well as in Supervisory Control and Data Acquisition (SCADA) systems [3]. IoT allows automatic WDN monitoring and control as well as SMS alerting by operating on object components, interconnected through low-cost wired and wireless network sensors [4,5]. SCADA systems consist of distributed control systems that allow devices to be turned on or off remotely while displaying real-time operations in a graphical user interface (GUI) for high-level process supervisory management [6,7].
In this evolving context, storage tanks play a key role, acting as lungs [8]: they balance instantaneous flow variations in the water demand pattern and compensate for abrupt interruptions of the water feed to the storage tank, as in cases of drought periods or electricity shortages in pumping stations delivering water, while the tank level fluctuates within a fixed range. Water tank levels can be modelled through hydraulic models when the water demand and the management rules and operations are known, but in practical applications, these are not always fully known. Water demand and tank level forecasting is therefore a crucial step in supporting decisions on operating actions.
From a modelling point of view, the AutoRegressive Integrated Moving Average (ARIMA) family of models is well established, having long been applied to water demand forecasting [8-11]. This is justified by the fact that the model follows the trend at different time scales. Despite the applications in urban water demand, there is a gap in the literature concerning the use of ARIMA models for tank water levels [12]. In [13], however, the link between the water supply, consumer demand and water level at the tank is discussed, with the aim of providing a practical tool for water utilities to take prompt action based on water level variations [14]. ARIMA models or, more generally, time-series analysis techniques applied to water levels would also allow water leakages at the tank to be detected, helping to save both water and the treatment costs related to chlorination or purification. This is of paramount importance: it is not rare for a tank to be unable to serve because of water scarcity, whereas the resource is wasted when the water inflow is not controlled [14].

Methodology
In this paper, we assess the performance of one of the most conventional linear models, widely used in the literature for the forecasting and management of several datasets: the Box-Jenkins/ARIMA model (see, for instance, [13,15-18]).
The order of an ARIMA model is represented by the notation ARIMA (p, d, q), where p, d and q are, respectively, the order of the autoregressive part, the order of the differencing and the order of the moving-average process. The general formula is:

$$\phi(B)\,(1 - B)^{d}\,Y_t = \vartheta(B)\,e_t$$

in which $Y_t$ is the value of the series observed at the time t, B is the delay operator, $\phi$ and $\vartheta$ are the autoregressive and the moving average polynomials and $e_t$ is the difference between the observed value $Y_t$ and the forecast $\hat{Y}_t$ at the time t. In the case study presented in this paper, the chosen model is ARIMA (2, 1, 2), according to "Deviance Information Criterion" (DIC) and "Bayesian Information Criterion" (BIC) optimization. The selection was performed with the aid of the statistical program "R".
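As an illustration of the "integrated" part of the model, the following minimal Python sketch applies the differencing operator (1 - B)^d to a short series and then inverts one order of differencing. The function names and the sample levels are hypothetical and chosen only for this example; the study itself was carried out in R.

```python
def difference(series, d=1):
    """Apply the operator (1 - B)^d: difference the series d times."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def undifference(diffed, initial):
    """Invert one order of differencing, given the first original value."""
    out = [initial]
    for step in diffed:
        out.append(out[-1] + step)
    return out


# Made-up tank levels in metres (not taken from the dataset)
levels = [4.10, 4.25, 4.40, 4.30, 4.05]
first_diff = difference(levels, d=1)             # hourly changes in level
restored = undifference(first_diff, levels[0])   # back to the level series
```

With d = 1, as in the model adopted here, the autoregressive and moving average terms act on the hourly changes in level rather than on the levels themselves.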

Dataset Analysis
This statistical study was performed on the time series of the levels observed at the Gesuiti water tank, located in the neighbourhood of Pezzapiana, in the water supply system of the town of Benevento, Italy. The data were measured almost continuously, with a time interval never larger than 5 min (minimum of 12 samples per hour), from 10:00 of 5 May 2018 to 9:00 of 10 January 2019. Hourly averages were calculated from the available data, resulting in 6000 periods in total. A plot of the input dataset is shown in Figure 1. It can be noticed that the maximum levels observed are never larger than zmax = 5.72 m. This is due to the presence of an automatic water outlet system, that is, a tank spillway, which is located at an elevation of 5.80 m, consistent with the observed value of zmax.
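The hourly aggregation and the count of 6000 periods can be checked with a short Python sketch. It assumes that each hourly period is labelled by the start of its clock hour and that timestamps are naive local times; both are assumptions of this example, not statements from the study, and the 5-minute readings are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_means(samples):
    """Average (timestamp, level) samples within each clock hour."""
    buckets = defaultdict(list)
    for stamp, level in samples:
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(level)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


def count_hourly_periods(first_hour, last_hour):
    """Number of hourly periods from first_hour to last_hour, inclusive."""
    return (last_hour - first_hour) // timedelta(hours=1) + 1


start = datetime(2018, 5, 5, 10)   # 10:00 of 5 May 2018
end = datetime(2019, 1, 10, 9)     # 9:00 of 10 January 2019
n_periods = count_hourly_periods(start, end)

# Twelve made-up 5-minute readings falling inside the first hour
demo = [(start + timedelta(minutes=5 * k), 4.0 + 0.01 * k) for k in range(12)]
demo_mean = hourly_means(demo)[start]
```

The interval spans 250 days minus one hour, so counting both end points gives 250 × 24 = 6000 hourly periods, in agreement with the figure reported above.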
Two large intervals of data were missing, from 19:00 of 4 August 2018 to 9:00 of 14 August 2018, and from 16:00 of 18 August 2018 to 13:00 of 6 September 2018. Since the dataset needs to be continuous for the Time Series Analysis (TSA) techniques, a preliminary Deterministic Decomposition model (DD-TSA) [15] was calibrated on the first 2193 data points, that is, on all the periods preceding the first large gap, in order to impute the missing data. When the number of consecutive missing measurements was smaller than 10, the missing data were simply imputed with the last available value. For the two large intervals described above, instead, the results of the DD-TSA were used.
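The two-tier imputation rule described above can be sketched in Python as follows. The `long_gap_model` callable is a hypothetical stand-in for the DD-TSA prediction (a constant is used in the demonstration, which is not the authors' model), and applying the threshold of 10 to runs of missing entries in the processed series is an assumption of this example.

```python
def impute_levels(levels, long_gap_model, short_gap_limit=10):
    """Fill None entries: short runs by carry-forward, long runs by a model.

    long_gap_model(index) is a hypothetical stand-in for the DD-TSA
    prediction at a given position; it is not the authors' model.
    """
    filled = list(levels)
    i = 0
    while i < len(filled):
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < len(filled) and filled[j] is None:
            j += 1                           # j is one past the end of the gap
        gap_length = j - i
        for k in range(i, j):
            if gap_length < short_gap_limit and i > 0:
                filled[k] = filled[i - 1]    # last available observation
            else:
                filled[k] = long_gap_model(k)  # model-based imputation
        i = j
    return filled


# Made-up series: a gap of 2 entries and a gap of 12 entries
series = [4.0, None, None, 4.2] + [None] * 12 + [4.1]
imputed = impute_levels(series, long_gap_model=lambda k: 4.5)
```

In the demonstration, the two-entry gap inherits the preceding value of 4.0, while the twelve-entry gap is filled by the placeholder model.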
The summary statistics of the reconstructed calibration dataset are reported in Table 1. Figure 2 and Figure 3 report, respectively, the autocorrelation function and the histogram of the data. The correlogram reported in Figure 2 shows that there is a daily seasonality (lag = 24). In addition, a relative maximum is observed for lag = 168, meaning that a weekly seasonality could be explored as well.
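To show how a correlogram reveals a daily seasonality, the following Python sketch computes the sample autocorrelation function of a synthetic hourly series with a pure 24-hour cycle. The series is artificial and only mimics a daily pattern; it is not the tank dataset, and the function name is chosen for this example.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    variance = sum(c * c for c in centred)
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centred[t] * centred[t - lag] for t in range(lag, n))
        acf.append(cov / variance)
    return acf


# Synthetic hourly series with a purely daily cycle (not the paper's data)
synthetic = [4.5 + 0.5 * math.sin(2 * math.pi * t / 24) for t in range(24 * 30)]
acf = autocorrelation(synthetic, max_lag=48)

# Lag with the largest autocorrelation in a window around one day
peak_lag = max(range(12, 37), key=lambda k: acf[k])
```

For this synthetic series the maximum falls at lag 24, which is the kind of signature read from Figure 2; extending `max_lag` to 168 or beyond would allow a weekly peak to be inspected in the same way.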
The distribution of the data reported in Figure 3 is skewed, due to the typical daily pattern of a water tank. The left tail has a low frequency of occurrence because low storage in the tank is uncommon. A marked drop in frequency can be observed on the right side of the distribution, in the range of water levels between 5 m and 6 m, because of the presence of the spillway mentioned previously. The mode of the distribution is not centred but shifted to the right, as the range 4-4.50 m likely represents the optimal storage level at which the tank operates for water distribution.

ARIMA Model Calibration
As mentioned above, the adopted model is ARIMA (2, 1, 2). This model applies first-order differencing to the data and includes autoregressive and moving average terms, both of order 2. The prediction provided by the model for a generic period t is described by the following equation, written here with the moving average terms entering with a positive sign (the convention used by R):

$$\hat{Y}_t = Y_{t-1} + \phi_1 (Y_{t-1} - Y_{t-2}) + \phi_2 (Y_{t-2} - Y_{t-3}) + \vartheta_1 e_{t-1} + \vartheta_2 e_{t-2}$$

This model provides one-step-ahead simulation. Coefficients were estimated by maximum likelihood on the calibration dataset. Calculations were performed by means of the statistical program "R". Table 2 shows the estimated values of the coefficients of the model. The plot of the estimated hourly water tank levels is reported in Figure 4. It can be noticed that the overall trend of the simulated data is very similar to the one shown in Figure 1. The simulated data present a stationary behaviour in two time ranges, the periods 2194 to 2424 and 2527 to 2980. These are the ranges in which the dataset was reconstructed by imputing the missing data with the DD-TSA model.
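A one-step-ahead ARIMA (2, 1, 2) simulation can be sketched in Python as a simple recursion on the previous levels and residuals. The coefficient values below are hypothetical (the estimates of Table 2 are not reproduced here), the moving average terms are assumed to enter with a positive sign, and the start-up residuals are set to zero, which is a simplification compared with the likelihood-based filtering performed by statistical packages.

```python
def one_step_forecasts(levels, phi, theta):
    """One-step-ahead ARIMA(2,1,2) predictions for a level series.

    phi = (phi1, phi2) and theta = (theta1, theta2); moving average terms
    enter with a positive sign. Start-up residuals are set to zero.
    """
    forecasts = [None] * len(levels)
    residuals = [0.0] * len(levels)
    for t in range(3, len(levels)):
        d1 = levels[t - 1] - levels[t - 2]   # most recent hourly change
        d2 = levels[t - 2] - levels[t - 3]   # change one hour earlier
        forecasts[t] = (levels[t - 1]
                        + phi[0] * d1 + phi[1] * d2
                        + theta[0] * residuals[t - 1]
                        + theta[1] * residuals[t - 2])
        residuals[t] = levels[t] - forecasts[t]
    return forecasts, residuals


# Made-up levels and hypothetical coefficients, for illustration only
demo_levels = [4.0, 4.1, 4.2, 4.3, 4.4]
forecasts, residuals = one_step_forecasts(
    demo_levels, phi=(0.5, 0.2), theta=(0.3, 0.1))
```

Because each prediction uses the observed (or imputed) levels of the three preceding hours, the simulated series tracks the input closely; over imputed stretches it follows the imputed values rather than real measurements.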

Results and Discussion
The ARIMA (2, 1, 2) model exhibits excellent performance when comparing the estimations with the measurements in the calibration dataset. Apart from a few outliers, probably related to sudden spikes in the calibration dataset, the simulations are very close to the measurements. This result can be quantitatively summarized in the residual analysis.
In Figure 5, the residuals of the model, that is, the differences between the observed and simulated data, are plotted. Residuals larger than 0.5 m in absolute value are always related to periods in which the measurements were missing and the estimated levels are compared with the imputations. In Figure 6, a histogram of the residuals of the model is presented. A very narrow distribution of the residuals is obtained, as can be expected from the plot in Figure 5, since most of the data lie within a ±0.5 m interval around zero. In short, the model has very small residuals throughout the dataset, except for a few periods corresponding to the imputed data. The autocorrelation of the residuals is shown in Figure 7. The values are very low, except for two relative maxima at lag = 12 and lag = 24. This result confirms the good performance of the ARIMA model and suggests further applications in which a seasonal model could be tested. The summary statistics of the residuals are reported in Table 3. Besides the interesting result of very small mean and median values, it is valuable to confirm the presence of outliers by looking at the minimum and maximum values.
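The residual checks discussed above can be sketched in Python as follows: the residuals are computed as observed minus simulated levels, summarized by minimum, median, mean and maximum, and the periods exceeding 0.5 m in absolute value are flagged. The input values are made up and the function name is chosen for this example; the output is not a reproduction of Table 3.

```python
import statistics


def residual_summary(observed, simulated, threshold=0.5):
    """Summary statistics of residuals and indices exceeding a threshold."""
    residuals = [o - s for o, s in zip(observed, simulated)]
    flagged = [i for i, r in enumerate(residuals) if abs(r) > threshold]
    return {
        "min": min(residuals),
        "median": statistics.median(residuals),
        "mean": statistics.mean(residuals),
        "max": max(residuals),
        "flagged": flagged,   # positions with |residual| > threshold
    }


# Made-up observed and simulated levels in metres
observed = [4.0, 4.2, 4.1, 5.0, 4.3]
simulated = [4.1, 4.1, 4.1, 4.2, 4.3]
summary = residual_summary(observed, simulated)
```

Comparing the flagged positions with the ranges of imputed periods is one way to verify that the large residuals coincide with the reconstructed parts of the series.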

Conclusions
Today, in a context of water resource scarcity, optimal management is of paramount importance for the sustainable management of urban water networks. The management relies on water utility operations consisting of usually quick responses to either water demand or source variations as well as the effects of network aging.
In this framework, the present work aimed at the simulation of drinking water tank levels by time series analysis to support water distribution managers. The case study referred to the time series of the levels observed at the Gesuiti water tank, belonging to the water supply system of the town of Benevento, Italy. Since two large intervals of data were missing, data imputation was necessary to obtain a continuous series. This was achieved by the use of a preliminary DD-TSA model. ARIMA (2, 1, 2) was chosen as the optimal statistical model for the purpose, according to the BIC and DIC criteria.
The analysis of the model residuals showed good agreement between the observed and simulated data. The residuals had a mean value close to zero and a very moderate correlation at lags 12 and 24. The latter suggests that a seasonal component should be accounted for in the model description, which is foreseen as a way to improve the data simulation in future applications.