1. Introduction
Water is vital for the survival of living beings. It is the basic element for sustaining life in nature and human activities. The fact that about three-quarters of the Earth’s surface is covered with water creates the perception that water scarcity would never be an issue; however, more than 97% of this water is seawater, 2% is ice, and most of the remaining 1% is groundwater that is difficult to reach [1]. Only a tiny fraction of the water on our planet is healthy, drinkable water. Fortunately, this water is renewed by nature’s solar-powered water cycle. Evaporation fueled by the sun’s energy carries water vapor into the atmosphere; 86% of this evaporation occurs from the sea and 14% from land [1]. Although the same total amount falls back to the Earth as rain, sleet, or snow, a larger share of it falls on the continents than evaporated from them. Through this repeated transfer of water from the sea to land, renewable local drinking water resources are formed. However, increasing population and rising air temperatures reduce drinking water availability per capita. The demand for drinking water will grow with the population, and reduced availability combined with increasing demand will increase the stress on water supplies, as they cannot be replenished quickly enough to meet demand. Moreover, agriculture already accounts for approximately 70% of freshwater withdrawals in the world and is commonly seen as one of the main drivers of growing global freshwater scarcity [2].
Although many international agreements and declarations, such as the Convention on the Rights of the Child [3], the Human Rights Council Resolution [4], and the Water Framework Directive [5], state that the right to safe drinking water and sanitation is an internationally recognized human right, a large part of the world’s population cannot benefit from this right [6].
Turkey, a country with land in both Asia and Europe and a population of approximately 81 million people, is exposed to dry seasons. The European Environment Agency reported that Turkey will encounter moderate to high levels of water scarcity in many areas [7]. Turkey is therefore a likely candidate to experience water scarcity. Considering that the population is predicted to be close to 100 million in 2030 [8], it is crucial to take precautions against water shortages and to develop better water policies. Hence, it is important for the country both to monitor and to predict tap water consumer ratios (TWCRs), particularly in order to take measures against the negative effects of a reduction in tap water supply that could occur in the near future.
Registered water use is important to ensure planned and economical water use in a country. Therefore, monitoring the TWCRs of watersheds and predicting their near-future values would help establish a basis for water-related precautions. In the literature, there are studies on water and its future predictions [9,10,11,12,13]. The assessment of sustainable water consumption perception, the evaluation of direct and indirect water consumption through the water footprint indicator, and the link between urban services and water uses are examined elsewhere [14]. The water footprint is described as an indicator of water use in relation to human consumption [15]. To incorporate the advances of life cycle assessment and water footprint analysis, an associated set of indicators has been developed (see Reference [16]). However, ignoring outliers, which are points that differ from the bulk of the data or follow a different distribution, can bias the findings [17].
This work aims to construct growth curve models (GCMs) to predict Turkey’s TWCRs. A TWCR is the ratio of the number of households using tap water in a particular region to the total number of households in that region. In this study, there are 26 groups of cities corresponding to 26 regions, and the cities are grouped according to the watersheds used by their households. To construct the models, the commonly used non-robust ordinary least squares (OLS) estimator and the robust least median of squares (LMS) and M estimators are used. This study demonstrates that the detected outlying points differ according to the estimator employed. Consequently, the estimated models, and hence the short-term predictions they produce, vary with the estimator. Making better TWCR predictions, and thus adopting suitable water policies, therefore depends on employing robust estimators that handle outliers well when estimating the parameters of the GCMs. The presence of outlying points is shown to have an undue impact on the parameter estimates and future predictions of the models.
The paper is organized as follows. In Section 2, we introduce GCMs based on the non-robust OLS estimator and the robust LMS and M estimators. To the best of our knowledge, this is the first time that the robust LMS and M estimators have been studied in this context. In Section 3, we use the data described there to show the differences in the outlying points, the estimated GCMs, and hence the predictions. Here, the TWCRs of Turkey are estimated assuming that a straight-line growth model in time, in other words, a first-order GCM, can be fitted to the data. In the estimation procedure, we use the OLS, LMS, and M estimators mentioned above. In addition to detecting outlying points, we obtain predictions of the TWCRs for selected years. Furthermore, to compare the estimations and predictions obtained from the first-order GCMs estimated separately with the OLS, LMS, and M estimators, the curves are plotted in a single figure. Since the data may be better fitted by a third-order polynomial, the procedure is repeated for third-order GCMs.
  2. GCMs Based on OLS, LMS, and M Estimators
The GCM is usually expressed as

$$
Y = XBZ + E, \qquad (1)
$$

where the $p \times n$ response matrix $Y$ describes the change in growth over time. This model indicates analytically how the parameters $B$ and their standard errors behave in a deterministic procedure for varying points of time [18]. $X$ ($p \times m$) and $Z$ ($r \times n$) are the design matrices: $X$ describes the polynomial in time, and $Z$ is used for grouped repeated measures. Here, the grouping structure in $Z$ is not taken into account, since only the growth of Turkey’s TWCRs in watersheds over different time points is the subject of the research. The matrix of unknown parameters and the error matrix are denoted by $B$ ($m \times r$) and $E$ ($p \times n$), respectively. Each column $y_i$ of $Y$ is assumed to be independently distributed as $p$-variate normal with mean vector $XBz_i$, where $z_i$ is the $i$th column of $Z$, and unknown covariance matrix $\Sigma$. Equivalently, $Y \sim N_{p \times n}(XBZ, \Sigma, I_n)$, where $XBZ$ is the expected value, $\Sigma$ is the covariance matrix of each column of $Y$ (across the $p$ time points), and $I_n$ is the covariance matrix across the $n$ independent columns [18]. The number of time points examined on each of the $n$ observations is denoted by $p$, and $q$ is the degree of the polynomial in time, so that $m = q + 1$.
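The OLS estimator of $B$, denoted by $\hat{B}_{OLS}$, is obtained from

$$
\hat{B}_{OLS} = (X^\top X)^{-1} X^\top Y Z^\top (Z Z^\top)^{-1}.
$$

For the first-order GCM, $\hat{B}_{OLS} = (\hat{\beta}_0, \hat{\beta}_1)^\top$. The intercept $\hat{\beta}_0$ is the estimated expected value of the response at time point 0 and is the estimate of the coefficient $\beta_0$. The slope $\hat{\beta}_1$ is the estimated change in the expected response for a one-unit change in time and is the estimate of the coefficient $\beta_1$. In addition, the OLS estimate of $\Sigma$, denoted by $\hat{\Sigma}_{OLS}$, is based on the residual matrix $\hat{E} = Y - X\hat{B}_{OLS}Z$ and is calculated from [18]

$$
\hat{\Sigma}_{OLS} = \frac{1}{n}\,(Y - X\hat{B}_{OLS}Z)(Y - X\hat{B}_{OLS}Z)^\top. \qquad (2)
$$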
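The OLS step can be sketched in Python with NumPy, assuming, as in this study, that $Z$ is a single row of ones, so that $\hat{B}_{OLS}$ reduces to a least-squares fit of the time design to the mean response profile. The function name `gcm_ols` is hypothetical, the divisor $n$ for the covariance estimate is an assumption, and the data are synthetic rather than the TWCR data.

```python
import numpy as np


def gcm_ols(Y, X):
    """OLS fit of the growth curve model Y = X B Z + E with Z = 1_n^T.

    Y : (p, n) array, one column per observation (e.g. per watershed group).
    X : (p, m) time design matrix (columns 1, t, t^2, ...).
    Returns B_hat with shape (m, 1) and Sigma_hat with shape (p, p).
    """
    p, n = Y.shape
    Z = np.ones((1, n))                       # between-group design: a single row of ones
    B_hat = np.linalg.inv(X.T @ X) @ X.T @ Y @ Z.T @ np.linalg.inv(Z @ Z.T)
    E_hat = Y - X @ B_hat @ Z                 # residual matrix, shape (p, n)
    Sigma_hat = E_hat @ E_hat.T / n           # covariance estimate with divisor n (assumed)
    return B_hat, Sigma_hat


# Illustrative use on synthetic, noise-free data (not the TWCR data of the study).
t = np.array([1, 2, 3, 4, 6, 8, 10, 12, 14, 16], dtype=float)
X = np.column_stack([np.ones_like(t), t])     # first-order (straight-line) design
B_true = np.array([[60.0], [1.5]])
Y = X @ B_true @ np.ones((1, 26))             # 26 identical growth profiles
B_hat, Sigma_hat = gcm_ols(Y, X)              # B_hat recovers B_true up to rounding
```

The explicit inverses mirror the formula; for larger or ill-conditioned problems `np.linalg.solve` or `np.linalg.lstsq` would be numerically preferable.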
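Regarding the detection of outliers, the sum of squares of the residuals of the $i$th observation, standardized by the estimated covariance matrix, is calculated from $d_i^2 = \hat{e}_i^\top \hat{\Sigma}^{-1} \hat{e}_i$, where $\hat{e}_i$ is the $i$th column of $\hat{E}$. Since $d_i^2$ is (approximately) chi-square distributed with $p$ degrees of freedom, its calculated value is compared with the critical value $\chi^2_{p,\,1-\alpha}$, where $\alpha$ denotes the significance level. If the sum of squares of a suspicious observation is larger than the critical value, the observation is regarded as an outlier [19,20]. The definition and explanations for $d_i^2$ given above are also valid for the LMS and M estimators, with $\hat{B}$ and $\hat{\Sigma}$ replaced by the corresponding estimates.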
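A minimal sketch of the outlier check follows, assuming the Mahalanobis-type form of $d_i^2$ given above and a default significance level of $\alpha = 0.05$ (an illustrative choice; the study’s value is reported with Table 2). The helper names are hypothetical; the estimates $\hat{B}$ and $\hat{\Sigma}$ may come from any of the three estimators.

```python
import numpy as np
from scipy.stats import chi2


def squared_distances(Y, X, B_hat, Sigma_hat):
    """d_i^2 = e_i^T Sigma^{-1} e_i for each column e_i of E = Y - X B 1_n^T."""
    n = Y.shape[1]
    E = Y - X @ B_hat @ np.ones((1, n))       # residual matrix, shape (p, n)
    Sigma_inv = np.linalg.inv(Sigma_hat)
    # einsum evaluates the quadratic form column by column
    return np.einsum("ji,jk,ki->i", E, Sigma_inv, E)


def flag_outliers(d2, p, alpha=0.05):
    """Indices whose d_i^2 exceeds the chi-square critical value with p d.f."""
    critical = chi2.ppf(1.0 - alpha, df=p)
    return np.flatnonzero(d2 > critical), critical


# Toy example: 3 time points, 5 observations, the third one shifted by 10.
X = np.column_stack([np.ones(3), np.arange(1.0, 4.0)])
B = np.zeros((2, 1))
Y = np.zeros((3, 5))
Y[:, 2] = 10.0
d2 = squared_distances(Y, X, B, np.eye(3))
outliers, critical = flag_outliers(d2, p=3)
print(outliers)   # [2]
```

Here $d_3^2 = 300$ far exceeds $\chi^2_{3,\,0.95} \approx 7.81$, so only the third observation (index 2) is flagged.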
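The estimation procedure of $B$ and $\Sigma$ with the robust LMS and M estimators depends on the weighted least squares (WLS) estimator. The WLS estimation procedure is based on minimizing $\operatorname{tr}\{(Y - XBZ)\,W\,(Y - XBZ)^\top\}$ [21], where $W = \operatorname{diag}(w_1, \ldots, w_n)$ is a diagonal weight matrix. The WLS estimator of $B$, denoted by $\hat{B}_{W}$, is computed from

$$
\hat{B}_{W} = (X^\top X)^{-1} X^\top Y W Z^\top (Z W Z^\top)^{-1}, \qquad (3)
$$

and the weighted covariance matrix is estimated from

$$
\hat{\Sigma}_{W} = \frac{1}{c}\,(Y - X\hat{B}_{W}Z)\,W\,(Y - X\hat{B}_{W}Z)^\top, \qquad (4)
$$

where $c = \operatorname{tr}(W)$. The notation “tr” denotes the trace. With $W = I_n$, Equations (3) and (4) reduce to the OLS estimates. The elements of the diagonal weight matrix $W$ that is used in Equation (3) and in the calculation of $\hat{\Sigma}_{W}$ vary according to the estimator to be obtained. For instance, when the LMS estimator is employed, the $i$th element $w_i^{(t)}$ of the diagonal weight matrix $W_t$ is defined as

$$
w_i^{(t)} = \begin{cases} 1, & \text{if observation } i \text{ belongs to the } t\text{th } h\text{-subset}, \\ 0, & \text{otherwise}, \end{cases} \qquad (5)
$$

where $t$ ranges from 1 to $\binom{n}{h}$. Here, $\binom{n}{h}$ denotes the number of $h$-combinations from a data set of $n$ elements, and $h$ is calculated from

$$
h = \left\lfloor \frac{n + p + 1}{2} \right\rfloor. \qquad (6)
$$

The notation $\lfloor \cdot \rfloor$ means rounding down to the nearest integer. The LMS estimators of $B$ and $\Sigma$, denoted by $\hat{B}_{LMS}$ and $\hat{\Sigma}_{LMS}$, are then $\hat{B}_{W_t}$ and $\hat{\Sigma}_{W_t}$ for the subset $t$ that solves the minimization problem

$$
\min_{1 \le t \le \binom{n}{h}} \; \underset{1 \le i \le n}{\operatorname{med}} \, d_i^2(t), \qquad (7)
$$

where $d_i^2(t) = \hat{e}_i(t)^\top \hat{\Sigma}_{W_t}^{-1} \hat{e}_i(t)$ and $\hat{e}_i(t)$ is the $i$th column of $Y - X\hat{B}_{W_t}Z$ [19].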
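The WLS step and the subset search for LMS can be sketched together as below. This is a brute-force illustration under stated assumptions: $Z$ is a row of ones, $h = \lfloor (n + p + 1)/2 \rfloor$, and the objective is the median of $d_i^2$ over all $n$ observations. The names `gcm_wls` and `gcm_lms` are hypothetical. Full enumeration is only feasible for small $n$; with 26 groups and 10 time points this choice of $h$ gives $h = 18$ and 1,562,275 subsets, so practical implementations usually sample subsets at random instead.

```python
import itertools

import numpy as np


def gcm_wls(Y, X, w):
    """WLS fit of Y = X B Z + E with Z = 1_n^T and diagonal weights w (length n)."""
    n = Y.shape[1]
    Z = np.ones((1, n))
    W = np.diag(w)
    B = np.linalg.inv(X.T @ X) @ X.T @ Y @ W @ Z.T @ np.linalg.inv(Z @ W @ Z.T)
    E = Y - X @ B @ Z
    Sigma = E @ W @ E.T / np.trace(W)         # weighted covariance, divisor tr(W)
    return B, Sigma


def gcm_lms(Y, X):
    """Brute-force LMS over all h-subsets (feasible only for small n).

    Each subset defines 0/1 weights; the subset whose WLS fit gives the smallest
    median squared distance over all n observations is kept.
    """
    p, n = Y.shape
    h = (n + p + 1) // 2                      # assumed subset size, floor((n+p+1)/2)
    best = None
    for subset in itertools.combinations(range(n), h):
        w = np.zeros(n)
        w[list(subset)] = 1.0
        B, Sigma = gcm_wls(Y, X, w)
        E = Y - X @ B @ np.ones((1, n))
        d2 = np.einsum("ji,jk,ki->i", E, np.linalg.inv(Sigma), E)
        objective = np.median(d2)
        if best is None or objective < best[0]:
            best = (objective, B, Sigma, w)
    return best


# Synthetic example: 3 time points, 7 observations (not the study's data).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(3), np.arange(1.0, 4.0)])
Y = X @ np.array([[2.0], [0.5]]) @ np.ones((1, 7)) + rng.normal(0.0, 0.1, (3, 7))
objective, B_lms, Sigma_lms, w_lms = gcm_lms(Y, X)
```

With unit weights, `gcm_wls` returns the OLS estimate, which is a quick consistency check on the WLS formula.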
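The M estimator, $\hat{B}_{M}$, is obtained by solving the minimization problem

$$
\hat{B}^{(k+1)} = \arg\min_{B} \sum_{i=1}^{n} \rho\big(d_i^{(k)}(B)\big), \qquad (8)
$$

where $d_i^{(k)}(B) = \{(y_i - XBz_i)^\top (\hat{\Sigma}^{(k)})^{-1} (y_i - XBz_i)\}^{1/2}$. Here, a preliminary estimate of $B$ is used as the initial point $\hat{B}^{(0)}$. In Equation (8), $\rho$ is a function that has its minimum at 0, with $\rho(u) \ge \rho(0)$ for all $u$, and $k$ denotes the iteration number. In this instance, Tukey’s biweight function, defined as

$$
\psi(u) = \begin{cases} u\left[1 - \left(\dfrac{u}{c}\right)^2\right]^2, & |u| \le c, \\ 0, & |u| > c, \end{cases} \qquad (9)
$$

is used to compute the weights [20]. In Equation (9), $\psi$ denotes the derivative of $\rho$, and $c$ is a tuning constant. The $i$th diagonal element of the diagonal weight matrix $W$ is obtained from

$$
w_i = \frac{\psi(d_i)}{d_i}, \qquad (10)
$$

and the estimates are updated by WLS through Equations (3) and (4) at each iteration. Here, $c$ is calculated as the value that satisfies a consistency condition based on the expected value of the chi-squared distribution with $p$ degrees of freedom.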
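The M-estimation loop can be sketched as iteratively reweighted least squares with Tukey biweight weights $w_i = (1 - (d_i/c)^2)^2$ for $d_i \le c$ and 0 otherwise. This is a simplified sketch, not the study’s exact procedure: the scatter matrix is held fixed at its initial value during the iterations (a weighted covariance is computed once at the end), the tuning constant `c` is passed in rather than derived from the chi-squared consistency condition, and the coordinate-wise median start is a hypothetical choice of preliminary estimate.

```python
import numpy as np


def tukey_weight(d, c):
    """w(d) = psi(d) / d = (1 - (d/c)^2)^2 for |d| <= c, and 0 otherwise."""
    u = np.abs(d) / c
    return np.where(u <= 1.0, (1.0 - u ** 2) ** 2, 0.0)


def gcm_m_estimator(Y, X, B0, Sigma0, c, max_iter=100, tol=1e-8):
    """IRLS for a Tukey-biweight M estimator of B in Y = X B 1_n^T + E.

    Simplification (assumption): Sigma0 is held fixed during the iterations.
    """
    n = Y.shape[1]
    Z = np.ones((1, n))
    Sigma_inv = np.linalg.inv(Sigma0)
    XtX_inv_Xt = np.linalg.inv(X.T @ X) @ X.T
    B = B0
    for _ in range(max_iter):
        E = Y - X @ B @ Z
        d = np.sqrt(np.einsum("ji,jk,ki->i", E, Sigma_inv, E))
        w = tukey_weight(d, c)
        W = np.diag(w)
        B_new = XtX_inv_Xt @ Y @ W @ Z.T @ np.linalg.inv(Z @ W @ Z.T)  # WLS step
        converged = np.max(np.abs(B_new - B)) < tol
        B = B_new
        if converged:
            break
    E = Y - X @ B @ Z
    Sigma = E @ W @ E.T / np.trace(W)         # weighted covariance, divisor tr(W)
    return B, Sigma, w


# Synthetic example with one planted gross outlier (not the study's data).
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(3), np.arange(1.0, 4.0)])
B_true = np.array([[2.0], [0.5]])
Y = X @ B_true @ np.ones((1, 8)) + rng.normal(0.0, 0.1, (3, 8))
Y[:, 0] += 50.0
B0 = np.linalg.lstsq(X, np.median(Y, axis=1), rcond=None)[0].reshape(-1, 1)
B_m, Sigma_m, w_m = gcm_m_estimator(Y, X, B0, 0.01 * np.eye(3), c=4.0)
```

Because the biweight assigns zero weight beyond $c$, the planted outlier is excluded from the WLS steps, unlike in OLS where every observation keeps unit weight.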
  3. Application to the GCMs on Predictions of TWCRs in Turkey for Deciding Which Estimator Is the Best
  3.1. The Dataset
Turkey consists of eighty-one cities. These cities are categorized into twenty-six groups according to the local watershed they benefit from [8]. Table 1 summarizes the TWCRs for each group from 2001 to 2004 and at two-year intervals from 2006 to 2016.
In this research work, the response variable $Y$ is a $10 \times 26$ matrix, and its observed element $y_{ji}$ is the TWCR of group $i$ in year $j$, with $i = 1, \ldots, 26$ and $j = 1, \ldots, 10$. The design matrix $X$ is employed in two ways, since the data could follow either a linear or a cubic functional form of growth. Firstly, it is a $10 \times 2$ matrix whose first column consists of 1s and whose second column contains the numbers 1 to 4 and 6, 8, 10, 12, 14, 16, used for the chosen years 2001 to 2004 and 2006, 2008, 2010, 2012, 2014, 2016, respectively. Hence, years are preserved as the unit of time. As explained above, $Z$ does not affect the estimations; thus, it is taken as a $1 \times 26$ vector consisting of 1s. With this design, the parameters of the GCMs, denoted by $\beta_0$ and $\beta_1$, are estimated. The three methods, OLS, LMS, and M, are used to construct the GCMs separately. This makes it possible to show that the identified outlier observations vary with the method.
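The first-order design described above can be built directly from the chosen years. Coding each year as `year - 2000` is one convenient way (an assumption of this sketch) to reproduce the time numbers 1 to 4 and 6 to 16 stated in the text.

```python
import numpy as np

# Chosen years and their time codes (years kept as the unit of time).
years = np.array([2001, 2002, 2003, 2004, 2006, 2008, 2010, 2012, 2014, 2016])
t = (years - 2000).astype(float)                    # 1, 2, 3, 4, 6, 8, ..., 16

X_linear = np.column_stack([np.ones_like(t), t])    # 10 x 2 design for the first-order GCM
Z = np.ones((1, 26))                                # 1 x 26 row of ones (26 watershed groups)

print(X_linear.shape, Z.shape)                      # (10, 2) (1, 26)
```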
In the second part of the study, the design matrix

$$
X = \begin{pmatrix}
1 & 1 & 1^2 & 1^3 \\
1 & 2 & 2^2 & 2^3 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 16 & 16^2 & 16^3
\end{pmatrix},
$$

whose first column consists of ones and whose other columns contain the first, second, and third powers of the numbers corresponding to the chosen years [18], is employed. These powers are used in the design matrix in order to build third-order GCMs [17]. Here, $Z$ is employed as defined previously for the first-order GCMs.
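A short sketch of the third-order design: `np.vander` with `increasing=True` produces the columns $1, t, t^2, t^3$ in the order shown in the matrix above.

```python
import numpy as np

t = np.array([1, 2, 3, 4, 6, 8, 10, 12, 14, 16], dtype=float)
# Columns 1, t, t^2, t^3 for the third-order GCM; increasing=True puts t^0 first.
X_cubic = np.vander(t, 4, increasing=True)
```

Since the last column reaches $16^3 = 4096$, the columns differ greatly in scale; centering or scaling the time codes is a common way to improve numerical conditioning, though it changes the interpretation of the coefficients.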
  3.2. Detection of Outliers
The results of detecting outliers in the data, which consist of the TWCRs in Turkey’s watersheds at different time points, and the parameter estimates of the GCMs obtained with the methods mentioned above are summarized in Table 2 for the chosen significance level $\alpha$. This table summarizes the findings for the first- and third-order GCMs. Watershed number 9 is identified as an outlier by both the non-robust and the robust estimators, which is strong evidence that the data contain at least one outlier. However, when the robust LMS and M estimators are applied, watersheds 4, 19, and 25 and watersheds 1, 5, and 22, respectively, are also detected as outlying points, in addition to watershed 9. This suggests that the predictions obtained from the GCM estimated with OLS can be adversely affected by undetected outliers. By construction, the LMS and M estimators are more resistant to outliers than the OLS estimator [18,19,20]. Therefore, it is suggested that the predictions obtained from the GCMs estimated with these robust estimators be considered.
  3.3. Results
To show the differences between the GCMs estimated with the OLS, LMS, and M estimators, Figure 1a,b are plotted. The horizontal axis shows the numbers corresponding to the years, and the vertical axis shows the predicted ratios. Figure 1a shows that the first-order GCMs estimated with the OLS, LMS, and M estimators are different. The predictions from OLS appear to be larger than those from the LMS and M estimators. Since the OLS estimator is influenced by the outlying points, it is better to evaluate the predictions obtained from the LMS and M estimators because of their robustness to outliers [18,19,20]. Moreover, the predictions of TWCRs obtained from the OLS and LMS estimators are generally higher than those obtained from the M estimator. Here, the M estimator is regarded as more resistant to outliers than the OLS and LMS estimators [18,19,20]; therefore, it is recommended to evaluate the results obtained from the M estimator. For instance, the predictions for 2021 are approximately 86%, 89%, and 90% with the M, LMS, and OLS estimators, respectively, which highlights that the outlying observations affect the results. The predictions of the watershed ratios for the years 2017 to 2021 (corresponding to 17 to 21, respectively) can also be read from these graphs. The predictions based on all three methods tend to increase over the years.
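The predictions read from Figure 1 are evaluations of the fitted polynomial at future time codes, $\hat{y}(t) = x(t)^\top \hat{B}$ with $x(t) = (1, t, \ldots, t^q)^\top$. A minimal sketch follows; `predict_gcm` is a hypothetical helper, and the coefficients in the example are illustrative values, not the estimates reported in Table 2.

```python
import numpy as np


def predict_gcm(B_hat, t_future, degree):
    """Predicted mean response x(t)^T B_hat at the given time codes."""
    X_future = np.vander(np.asarray(t_future, dtype=float), degree + 1, increasing=True)
    return (X_future @ B_hat).ravel()


# Hypothetical first-order coefficients; time codes 17-21 stand for 2017-2021.
B_example = np.array([[70.0], [1.0]])
print(predict_gcm(B_example, range(17, 22), degree=1))   # [87. 88. 89. 90. 91.]
```

The same helper applies to the third-order GCMs with `degree=3` and a $4 \times 1$ coefficient vector.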
In addition, the estimation and prediction procedures are repeated, since the data may be better described by a third-order GCM. Figure 1b illustrates the third-order GCMs estimated with the OLS, LMS, and M estimators and the predictions of the TWCRs for watersheds in Turkey from 2017 to 2021 (corresponding to 17 to 21, respectively).
The differences between the estimated third-order polynomials can be seen clearly in this figure. The predicted TWCRs of Turkey’s watersheds rise steadily, particularly after 2015. Moreover, the results for the M estimator are much lower than those for the OLS and LMS estimators, and the predictions based on the M estimator remain below 100. Since the vertical axis in Figure 1b shows ratios in percent, which cannot exceed 100, the predictions obtained from this estimator are acceptable. Consequently, it is proposed to consider the predictions from the M estimator because of its robustness against outlying points.
  4. Conclusions
GCMs, as statistical growth models for short-term predictions, are used in various studies. Based on tap water consumer data in Turkey recorded between 2001 and 2016, this study investigated the TWCRs of Turkey’s watersheds with first- and third-order GCMs for short-term predictions. To estimate the parameters of the models, the OLS, LMS, and M estimators are used. Using both robust and non-robust estimators allowed us to identify differences in the parameter estimates (Table 2) and in the short-term predictions of TWCRs (Figure 1). A plausible explanation for these findings is the existence of outliers in the data. The predictions obtained with the M estimator are considered the best because of its robustness against outlying points [18,19,20]. According to these predictions, the TWCRs of Turkey’s watersheds will increase steadily. Furthermore, the prediction based on the estimated third-order GCM for 2021 is approximately 5% higher than that for 2020. Hence, making short-term predictions with the robust M estimator gives a more realistic picture, which can support the development of better water policies.