Machine Learning Modeling of Water Use Patterns in Small Disadvantaged Communities

Abstract: Water use patterns were explored for three small communities that are located in proximity to agricultural fields and rely on their local wells for potable water supply. High-resolution water use data, collected over a four-year period, revealed significant temporal variability. Monthly, daily, and hourly water use patterns were well described by autoregressive moving average (ARMA) models. Model development was supported by unsupervised clustering analysis via self-organizing maps (SOMs) that revealed similarities of water use patterns and confirmed the time-series water use model attributes. The inclusion of ambient temperature and rainfall as model attributes improved ARMA model performance for daily and hourly water use from R² ~0.86–0.87 to 0.94–0.97 and from R² ~0.85–0.89 to 0.92–0.98, respectively. Water use prediction for an entire year forward in time was feasible, with ARMA model performance of (i) R² ~0.90–0.94 and average absolute relative error (AARE) of ~2.9–4.9% for daily water use, and (ii) R² ~0.81–0.95 and AARE ~1.9–3.8% for hourly water use. The study suggests that ARMA modeling should be useful for analysis of temporally variable water use in support of water source management, as well as for assessing capacity building for small water systems, including water treatment needs and wastewater handling.


Introduction
Freshwater sources such as rivers, lakes, reservoirs, and groundwater are increasingly being utilized worldwide [1,2]. Population growth along with increased water demand by industry, intensive agriculture, and the domestic sector are leading to excessive withdrawals from the various freshwater supplies, thus increasing water stress in various regions of the globe. Moreover, contamination of water supplies has exacerbated the situation as critical water sources are now impaired [3]. For example, in California, groundwater contamination and excessive water salinity are severe in communities in agricultural regions [4,5]. Nearly 95% of the population in the communities of the San Joaquin Valley, California, relies on groundwater for its drinking water needs. In this region, there are communities whose water supplies are contaminated by high nitrate levels, which is attributed, in part, to intensive agricultural activities [4,5] and the impact of septic systems. In the agricultural regions, small and disadvantaged communities (i.e., with a community median household income of less than 80% of the state annual median household income) that rely on groundwater as their only potable water source are the most severely impacted.
Small communities with impaired local well water and lack of feasible (or timely) connection to a centralized water system [6,7] can potentially opt for wellhead water the performance of RMSE~2.34 L/person/day relative to the average and maximum water use of 110 L/person/day and 139 L/person/day, respectively.
Predictive GRNN-based models for daily water consumption, incorporating meteorological data (i.e., average daily temperature, daily humidity, and total daily rainfall), were reported for the city of Al-Khobar (population: 455,500) in Saudi Arabia [17]. These models, based on training and test data interspersed over the same period (February 2009-October 2009), demonstrated predictive performance of R² ~0.9. In another study, BPNN models of daily and hourly water usage were demonstrated for 19 different buildings from eight North American cities [18]. That work, based on a single week of training data and subsequent testing with one week of data, demonstrated predictive performance, for single-building hourly and daily water use, of AARE of 5-11% and 3-5%, respectively. In an earlier study [19], Sugeno fuzzy time series analysis [20] and autoregressive moving average (ARMA) models [21] were developed for monthly water consumption in Istanbul (12 million population), which was reported to be in the range of 10-100 million (m³/year). Model training was based on a dataset spanning a period of 7 years (1995-2002). Model validation was for a period of 18 months (2003-2004), demonstrating RMSE of 1.9 million (m³/year) and 2.0 million (m³/year) for the above two models, respectively. It is also noted that water use in Kuwait was reported in a study [22] that utilized a simple linear ARMA-based model in which forecasting one year forward was based on the previous year's consumption. Forecasting of water use was reported for the period of 2004-2025 based on water consumption data for 1954-2003.
The above study also reported pair-wise correlation of water consumption and various socioeconomic factors (e.g., residence type (villa or apartment), average house size, number of household occupants, number of cars in the household, number of weekly laundry activities, weekly number of showering/bathing per household, and household monthly income). The analysis demonstrated a low level of correlation, which may suggest that water consumption may depend on multiple factors in a non-linear manner.
Relative to large urban centers, analysis and models of water use in small remote communities have been limited owing to the lack of time-series water use data. Here, we note that the estimates of household potable water use in small communities (30-400 households, 100-2400 people) for laundry and personal hygiene have been highly approximate [23-25] given that real-time water metering data are often lacking. It is also noted that the compilation of water use data for small communities has been typically based on questionnaire and telephone surveys [23-25]. Water use data at a high temporal resolution are lacking for small communities that are not part of a centralized water distribution system. Water use is expected to vary temporally, and thus time-series water use data are critical, particularly for communities that rely on well water, to assess needed water storage, water treatment capacity (if needed), and operational protocols. Although various ML techniques are presented in the literature, primarily for modeling water use in large urban regions, the development of robust predictive ML models is challenging when confronted with complex high-resolution time-series patterns. Additionally, models such as GRNN, ANN, BPNN, and PLS entail high complexity with a large hyperparameter space. This poses further challenges to the adaptability of such models, particularly for rapid predictions, model update, and transfer learning. Here, we note that the objective should be to arrive at a model (irrespective of the model parameter space) whereby the existing model can be used for sites of similar characteristics and where model retraining can be accomplished based only on the newly acquired data. In this regard, ARMA models have the advantage of requiring only two hyperparameters (the autoregressive and moving average orders) [21].
Therefore, ARMA-based models can provide rapid prediction (with respect to computational time) with significantly lower training time relative to BPNNs models.
Accordingly, the current study presents a data-driven modeling approach to describe and forecast water use for small communities. The approach was explored for three small, disadvantaged communities of farm laborers and day workers located in the agricultural region in Salinas Valley, California. Extensive multi-year high-resolution time-series water use data were compiled for each community via wireless water meters. Water use patterns were first explored at hourly, daily, and monthly resolutions via self-organizing maps (SOM) and Spearman coefficient of correlation analysis. This was followed by data-driven ARMA models considering the time of day, day of the week and month, and the daily ambient temperature and rainfall as model inputs. The models were then assessed with respect to forecasting small community water use patterns.

Workflow
Water use patterns in small, disadvantaged communities in the agricultural region of Salinas Valley, California were explored along with time-series models developed as per the workflow described in Figure 1. Water use data were obtained via smart water meters from three small communities, over a four-year period. The time-series water use data were initially explored by self-organizing maps (SOM) and also pair-wise correlations (Spearman coefficient). Subsequently, autoregressive moving average (ARMA) predictive models (Section S1, Supplementary Materials) were developed for water use patterns at different temporal scales. The water use patterns were analyzed to (i) assess the similarity of water use patterns among the different communities; (ii) evaluate the relevant attributes for describing water use patterns, including climate metrics, i.e., daily and monthly low/high temperature (°C) and rainfall (inches/day); and (iii) establish predictive time-series models for forecasting water use patterns.

Study Area and Water Use Data Compilation and Preprocessing
Water use data for three small communities in Salinas Valley, California, labeled as Sites A, B, and C (Table 1), were compiled over a five-year period (October 2015-October 2020). The three communities, having 8-11 residential units (18-36 residents) (Table 1), are in the midst of agricultural fields. The communities rely on local wells for domestic potable water supply, and their wastewater is managed in a local septic system [4]. The collection of water use data was achieved via smart (wireless) meters (Spectrum 88DL 1.5", Metron-Farnier, Inc., Boulder, CO, USA) installed at the community main distribution line from their water delivery pressure tanks. Water usage data (i.e., volume used) were transmitted to a centralized data storage server at regular 5 min intervals. The water use dataset, which is available online (see Data Availability Statement), was utilized to determine the cumulative hourly and daily water use. Daily and hourly temperatures, as well as rainfall data for the study region, were obtained from the National Oceanic and Atmospheric Administration's climatic databases [26]. A summary of monthly rainfall and monthly average of daily low and high temperatures is provided in Table 2.
Clustering analysis via SOM of water use was carried out based on data normalized to the 0-1 range (i.e., to reduce data skewness) (Table 1). SOM clustering utilizes competitive learning that preserves the topological structure of the input space while representing the output in a lower dimension (i.e., a 2-D map of cells within SOM clusters). The dimensionality reduction provided by the discretized 2-D SOM representation was utilized for preliminary feature selection to identify attributes of significance for ML model development. SOM was carried out for water use data for each month, where the resulting 2-D SOM map (Section 3.3) represents water use per day of the week as indicated in the SOM cells.
Proximities of cells in the SOM map representing the days of the week are indicative of their similarities in terms of the volume of water used. Cells are also grouped into clusters (indicated in different colors) of similar range of daily water use. Additionally, monthly water use data were aggregated with climate metrics (temperature (°F) and rainfall (inches)) to assess the significance and relevance of climate metrics for ARMA model development. Thus, the SOM clusters provide an indication of the relationship of climate metrics with water use, as visualized by clusters in which the months shown in SOM cells are grouped based on daily water use for the specific month, temperature, and rainfall.
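The normalization and competitive-learning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: min-max scaling is assumed as the 0-1 normalization, and the function names, learning rate, and Gaussian neighbourhood width are hypothetical choices.

```python
import math

def min_max_normalize(values):
    """Scale values linearly into [0, 1] (assumed form of the 0-1 normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]

def best_matching_unit(weights, x):
    """Return (row, col) of the SOM cell whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def som_update(weights, x, lr=0.5, sigma=1.0):
    """One competitive-learning step: pull the winning cell and its grid
    neighbours toward x, weighted by a Gaussian of the grid distance."""
    br, bc = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))
            row[c] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return (br, bc)

# Hypothetical daily totals, normalized before being presented to the map.
normalized = min_max_normalize([10, 20, 30])
grid = [[[0.0], [1.0]]]          # a 1 x 2 map with one-dimensional weights
winner = som_update(grid, [0.9])
```

Because neighbouring cells are updated together with the winner, inputs of similar magnitude end up in adjacent cells, which is the property the SOM maps rely on when days of similar water use appear close together.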
Based on interviews with the residents in the study communities, it was determined that the few identified data outliers were associated with a few instances of an unusual level of car washing and community garden irrigation over brief periods of time. It is stressed, however, that data outliers were retained and included in the ARMA models (Section 2.3) given their capability to handle outliers. In addition, since the ARMA model combines auto regression and moving average as subsequent data fitting steps, variations in the reported data were robustly represented without the need for data preprocessing. Therefore, raw data without pre-normalization were directly used in the ARMA model development.
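As an illustration of how the 5 min meter transmissions mentioned in this section could be accumulated into hourly and daily totals, a minimal sketch follows. It assumes each record holds the volume used during its interval (not a running meter total); the record layout, function name, and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def aggregate_usage(readings):
    """Sum interval volumes into hourly and daily totals.

    readings: iterable of (timestamp, volume) pairs, where volume is assumed
    to be the amount used during that 5 min interval.
    """
    hourly = defaultdict(float)
    daily = defaultdict(float)
    for ts, volume in readings:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        hourly[hour_start] += volume
        daily[ts.date()] += volume
    return dict(hourly), dict(daily)

# Hypothetical readings (volume units are arbitrary here).
sample = [
    (datetime(2019, 6, 1, 7, 0), 1.5),
    (datetime(2019, 6, 1, 7, 5), 2.0),
    (datetime(2019, 6, 1, 8, 10), 0.5),
]
hourly_totals, daily_totals = aggregate_usage(sample)
```

Dividing the totals by the number of residents at a site would give per person values of the kind reported in the results.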

Study Area and Water Use Data Compilation and Preprocessing
Data exploration was carried out for water usage trends from hourly to monthly resolution over the course of the year. In the initial analysis, water use patterns at daily and monthly scales were evaluated based on SOM clustering to identify similarities among the communities throughout the months of the year. Water usage patterns for the study communities revealed temporal irregularity (i.e., the variance was non-stationary for the time-series data). Thus, water use data for the study sites followed non-stationary stochastic patterns with non-uniform variance. Accordingly, for ARMA model development [21,27] for water consumption, non-stationarity was removed via second order differencing [28].
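Second-order differencing, as mentioned above, applies the first difference twice. The sketch below is a generic illustration (the function name and sample series are hypothetical); it shows that a series with a quadratic trend becomes constant after two rounds of differencing.

```python
def difference(series, order=1):
    """Apply first differencing `order` times; each pass shortens the series by one."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

# Hypothetical series with a quadratic trend (t squared for t = 1..5).
trend = [1, 4, 9, 16, 25]
first = difference(trend, order=1)    # still trending
second = difference(trend, order=2)   # trend removed
```

Forecasts made on the differenced scale have to be integrated back (cumulatively summed, twice) to recover values in the original units.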
The ARMA models were based on two polynomials (i.e., the first polynomial as auto-regressive and the second as moving averages). The auto-regressive (AR) polynomial constitutes the autoregressive model at a predefined order p describing the dependence of the variable (e.g., water consumption over a specified time period) on its values at previous times. The moving average (MA) polynomial, of a second predefined order q, describes the linear dependence on the forecast errors resulting from the autoregressive model (Section S1, Supplementary Materials) [21,27]. The AR and MA polynomials were combined considering both the variable linear relationship and linear dependence of the forecast errors [29]. ARMA model parameter tuning was carried out based on a grid search for p and q in the range of  and , respectively, with incremental 2-step increase. The optimal p and q values were 66 and 72, respectively, for the hourly, and correspondingly 18 and 24 for the daily ARMA models. Model development for daily and hourly water use was based on training and test data for October 2015-December 2019 and January 2020-December 2020, respectively. Depending on the model resolution, input attributes included the hour, day, week, and month of the year, in addition to the daily and hourly total rainfall, and the low and high daily and hourly ambient temperatures.
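The combination of the AR and MA terms can be sketched as a one-step-ahead recursion. This is a minimal illustration and not the authors' implementation (their formulation is in Section S1 of the Supplementary Materials): the function name and coefficient values are hypothetical, past values and errors that are not yet available are treated as zero, and exogenous inputs such as temperature and rainfall are omitted.

```python
def arma_one_step_predictions(y, phi, theta, c=0.0):
    """One-step-ahead ARMA(p, q) predictions with p = len(phi), q = len(theta).

    y_hat[t] = c + sum_i phi[i] * y[t-1-i] + sum_j theta[j] * e[t-1-j],
    where e[t] = y[t] - y_hat[t] is the forecast error.
    """
    p, q = len(phi), len(theta)
    preds, errors = [], []
    for t in range(len(y)):
        # AR part: weighted sum of the p most recent observed values.
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # MA part: weighted sum of the q most recent forecast errors.
        ma = sum(theta[j] * errors[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y_hat = c + ar + ma
        preds.append(y_hat)
        errors.append(y[t] - y_hat)
    return preds, errors

# Hypothetical ARMA(1, 1) coefficients applied to a short series.
preds, errors = arma_one_step_predictions([1.0, 2.0, 3.0], phi=[0.5], theta=[0.5])
```

In a grid search over p and q, a routine of this kind would be paired with coefficient estimation for each candidate order, and the pair giving the best fit on the training data would be kept.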
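Performance of the ARMA models was quantified by R-squared (R²) and the average absolute relative error (AARE). R-squared, representing the proportion of the variance of the dependent variable (water consumption) that is predictable from the independent variables, is given as

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

and AARE is given as

$$\mathrm{AARE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

where $N$ is the number of test data samples, and $y_i$, $\bar{y}$, and $\hat{y}_i$ are the observed, average, and predicted values, respectively.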
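The two performance metrics can be computed as in the sketch below, which assumes the standard definitions of R-squared and AARE. The function names and sample values are hypothetical, and skipping zero-valued observations in AARE (where the relative error is undefined) is an assumption made here, not a procedure stated in the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

def aare_percent(observed, predicted):
    """Average absolute relative error in percent.

    Observations equal to zero are skipped (an assumption of this sketch),
    since the relative error is undefined for them.
    """
    terms = [abs(y - yh) / abs(y) for y, yh in zip(observed, predicted) if y != 0]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical observed and predicted daily water use values.
obs = [10.0, 20.0, 30.0]
pred = [11.0, 19.0, 33.0]
r2 = r_squared(obs, pred)
aare = aare_percent(obs, pred)
```

For hourly data, where periods of no use occur, the treatment of zero observations affects AARE noticeably and should be stated alongside the reported value.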

Water Use Data
Daily water consumption, averaged over each month of the year (Figure 2), varied significantly in each community, with the highest water use typically on weekends (Saturday and Sunday). Communities B and C had higher peaks of daily water use relative to community A. Both communities B and C had home gardens, and irrigation water use may partially explain the above differences. Hourly per person water consumption, averaged annually for each weekday (Section 2-Methods), displayed some water use similarities among the three communities but also starkly different patterns (Figure 3). As illustrated for the year 2019, the highest hourly water use was at hours 7:00, 19:00, and 12:00 for sites A, B, and C, respectively. Another seemingly high water use was at hours 20:00 and 17:00 for sites A and C, respectively. Similarities in hourly per person water use were also noticeable during 7:00-10:00 h and 17:00-19:00 h for site A, and 7:00-8:00 h and 17:00-20:00 h (Sunday being the exception) for site B.
Daily water use data revealed trends indicating high water use for Saturday and Sunday (Figure 2), consistent with the highest hourly water consumption periods typically occurring during the same days (Figure 3). For site A, the highest hourly water use was for weekends (Saturday and Sunday) at around 10:00. Site B revealed a similar trend on Sunday, with the exception of Saturday in which water usage peaked at around 18:00. A similar water use peak (up to 2.5 gal/person/day) occurred during the early weekday mornings (4:00-7:00) for site A. Notably, increased water use for sites B and C, for all days, was during 5:00-7:00 with an immediate decline at around 9:00. For sites B and C, an increasing water consumption trend from 1 to 4 gal/person/day was observed (Figure 3) from 9:00 to 3:00 during weekdays.

Similarity Analysis and Exploration of Water Use Patterns
Visualization of similarities of water use patterns via SOM clustering analysis was carried out based on the cumulative water use data over each day for each month of the year (Figure 4). In the SOM map, the collection of contiguous cells shown in the same color represents a cluster, which signifies the level of daily water use such that the days of similar water use volume appear in adjacent cells. From the 2-D SOM representation of data similarity (represented by the proximities of cells within the map), it is seen that {Friday, Saturday, Sunday} were the days of relatively higher water usage for the majority of the months (i.e., appearing in clusters with higher color intensity (red)). However, a subtle difference is evident among certain months, as seen, for example, for February where in Cluster V, which is of the highest water use, {Sunday, Tuesday, Thursday} are the days of highest water usage. Moreover, for the highest water use Cluster II, {Saturday, Sunday, Monday} are the highest water use days, appearing at 75% frequency. In June, the highest water usage was for {Friday, Saturday} with 100% occurrence in Cluster IV. However, the highest water usage in June was for {Monday, Wednesday, Thursday} appearing (at a frequency of 100%) in Cluster VI. There are also other exceptions as observed for July where the highest water usage was Thursday (30% occurrence), while in May, the highest water usage was on Sunday (also at 50% occurrence). Interestingly, Tuesday and Wednesday typically appear in clusters of low water usage for the majority of the months. The high water use during the months of June-August shown in Table 3 (integrated with a heat map for better visualization) is also apparent in the clustering analysis shown in Figures S1-S3 (Supplementary Materials). These latter figures illustrate that {Friday, Saturday, Sunday} frequently appeared in the clusters of highest water use (red clusters) for the months {June, July, August}.
SOM visualizations in Figures 4 and 5 illustrate the higher and lower cumulative water use during the months of {June, July, August} and {December, January}, respectively, as also shown in Table 3 for these two sets for the period of 2015-2020. As noted in Section 3.3, there is a strong correlation of water use with daily and hourly temperature (°F) and rainfall (inch).

[Figure caption fragment] Higher water use (gallons/person/day) is correlated with higher temperature and low rainfall (red clusters at the bottom), as visualized in the clusters of high ranges (red) along with the influencing factor (lower rainfall values). The W, T, and R in the right figure refer to water use, temperature, and rainfall.

Correlation of Water Use Patterns with Meteorological Conditions
Water usage has been shown to correlate with ambient temperature and rainfall [14-16]. The monthly average of the minimum and maximum daily ambient temperatures and total rainfall have a monotonic relationship with each other (i.e., an increase in temperature is accompanied by a decrease in rainfall and vice versa), as shown in Figure 5 (left). Additionally, Figure 5 (right) illustrates that daily water use per person (i.e., hourly data accumulated over each day of the month) correlates with the daily ambient temperature and rainfall for each of the three study communities. The hourly water use also correlates with ambient hourly high temperature, low temperature, and rainfall, with a Spearman correlation coefficient of 0.85-0.91, 0.83-0.86, and 0.69-0.71 for high temperature, low temperature, and rainfall, respectively (Table 4). Monthly water use increased with rising temperature and vice versa, while the converse correlation was observed with rainfall. Daily water use per person was generally higher during the hotter and drier months of June through August for all three communities. Although there were differences in daily average water use per person for each month, in general, water utilization correlated with higher temperature and correspondingly lower rainfall. It is noted that higher water use in communities B (85 gal/person/day) and C (61 gal/person/day) relative to A (35 gal/person/day) was likely due to the greater use of water for garden irrigation in the two former agricultural communities (comprising single-family homes), relative to community A being a small, isolated apartment building complex. The correlation of water usage patterns with the low and high daily temperatures and rainfall events was also quantitatively assessed via the Spearman coefficient [30], as shown in Table 4.
The Spearman coefficient was chosen as the metric for attribute correlation due to the non-linearity and monotonicity (i.e., the relationship between temperature and rainfall and their impact on the water use). As indicated in Table 4, a strong positive correlation of water use with temperature (i.e., higher use at higher temperature) was determined (Spearman coefficient of 0.78-0.88) for the three communities. Water use had a strong but negative correlation (Spearman coefficient in the range of −0.72 to −0.79) with rainfall (i.e., decreased water use with higher rainfall).
Exploratory analysis of the water use data (via SOM and Spearman coefficient analyses) suggested that water use correlates with ambient temperature and rainfall, consistent with other studies on water use in various regions [31]. Accordingly, these meteorological attributes were included in the developed ARMA models (Section 3.4).
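The Spearman coefficient used in this section is the Pearson correlation computed on ranks, which is why it captures monotonic but non-linear relationships. A minimal sketch follows; the function names and the sample temperature, rainfall, and water use values are hypothetical and are not taken from the study data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical monthly values: use rises non-linearly with temperature.
temperature = [50, 60, 70, 80]
rainfall = [3.0, 2.0, 1.0, 0.0]
water_use = [30, 35, 50, 90]
rho_temp = spearman(temperature, water_use)
rho_rain = spearman(rainfall, water_use)
```

In this toy example the relationship is perfectly monotonic, so the coefficients are at the extremes of +1 and -1 even though the dependence on temperature is not linear.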

Data-Driven ARMA Models for Water Use Patterns
The ARMA models for daily and hourly water use were developed for the three communities based on data compiled over a period of four years, following the workflow presented in Figure 1. As an example, model performance for daily and hourly water use, with and without the inclusion of meteorological parameters, is illustrated in Figures 6 and 7, respectively, for 2017 training data traces. Model performance for the entire training dataset (i.e., October 2015-December 2019) is provided in Table 5, and model validation with the test dataset (January 2020-December 2020) is presented in Section 3.5.
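Because the test data lie forward in time relative to the training data, the split is chronological rather than random. A minimal sketch is given below; the record layout, function name, and sample values are hypothetical, with the cutoff set to the start of the 2020 test year.

```python
from datetime import date

def chronological_split(records, test_start):
    """Split (date, value) records so that all test records follow the training records."""
    train = [r for r in records if r[0] < test_start]
    test = [r for r in records if r[0] >= test_start]
    return train, test

# Hypothetical daily totals around the training/test boundary.
records = [
    (date(2019, 12, 30), 41.0),
    (date(2019, 12, 31), 39.5),
    (date(2020, 1, 1), 44.0),
    (date(2020, 1, 2), 40.2),
]
train, test = chronological_split(records, test_start=date(2020, 1, 1))
```

Keeping the test year entirely after the training period avoids leaking future observations into model fitting, which a random split of a time series would do.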
ARMA model performance for the complete training dataset for daily and hourly water use for the three sites was in the range of R² ~0.94-0.97 and 0.92-0.98, respectively, but was correspondingly significantly lower (0.87-0.89 and 0.86-0.92) when daily temperature and rainfall were excluded as input parameters. The AARE levels for the daily and hourly ARMA models were in the range of ~2.86-5.20% and ~2.55-3.88%, respectively. However, ARMA models without meteorological parameters as input attributes resulted in higher AARE of 3.88-7.89% and 2.86-5.2% for daily and hourly water use, respectively. Per person water use in communities A, B, and C varied by up to factors of 2-4 within each month of the year. Hourly per person water use variability was much greater, ranging from no use to as high as 219, 1235, and 499 gallons/hr for Sites A, B, and C, respectively. Finally, the observed and model predicted community minimum, maximum, and average daily water use for each of the study years, as shown in Table S1

Validation of the ARMA Model of Water Use Patterns
SOM analysis of water use patterns (Section 3.3) demonstrated that the study communities exhibited similar water use trends, thus suggesting that ARMA model validation could be carried out with future data (i.e., forward in time relative to the training data). Accordingly, the time-series dataset for the year 2020, which was not utilized for model training, was utilized for model validation, as shown in Figure 8. In the absence of temperature and rainfall as model attributes, model predictions for daily and hourly water use for the year 2020, for the three communities, had R² in the range of 0.90-0.95. Here, we note that true water use forecasting would require meteorological data for the forecast period, which would essentially require a predictive model. As expected, the inclusion of temperature and rainfall as model inputs improved predictive performance in terms of R² by as much as 12% (from R² of 0.82 to 0.94) and 17% (from R² of 0.74 to 0.91) for average daily water use for Site B and hourly data for Site C, respectively (Table 6). The ARMA models of daily water usage (Figure 8) for sites A-C also demonstrated excellent performance (i.e., mean R² for the three sites ~0.92) with climate metrics included in model inputs. ARMA model performance with the meteorological parameters included was 12% higher relative to the models without these climate parameters (R² ~0.8). ARMA models of hourly water use with climate parameters included also demonstrated good performance of mean R² ~0.93, which was 13% higher than when these parameters were not considered (Figure 9).
The ARMA models accompanied by similarity analysis should be particularly useful for guiding the design and management of small community water systems including wellhead water treatment (if required).
The present work also suggests that ARMA models developed in the present work should prove useful for providing estimates of water use in similar communities. Such models, as a starting base, could also be refined via incremental learning as new data become available. Selection of lightweight ARMA models with tuned hyperparameters can accelerate the learning process of water use patterns for newer sites of similar characteristics via a pretrained model (transfer learning). In this regard, only newly acquired data would then be needed for updating the ARMA model via the transfer learning approach whereby model hyperparameters are reused for training the existing ARMA model for the new sites. It is noted that such an approach of transfer learning can be particularly useful when the water use dataset for the target site is limited.

Conclusions
Water use data for multiple small communities, located in Salinas Valley, California (United States), were collected over a four-year period and analyzed to assess and quantitatively describe water use patterns. Self-organizing map (SOM) clustering was used for visual depiction of similarities in water use patterns among the days of the week and months of the year. SOM data exploration of the individual sites collectively showed that {Friday, Saturday, Sunday} are days with the highest water usage. SOM analysis further demonstrated that during the week, {Tuesday, Wednesday} are typically the days of lowest water usage. Among the three study communities, the daily peak water usage was during the periods of about 7:00-9:00 and 18:00-22:00. The highest daily water use during the week was for Saturday and Sunday and highest monthly water use was during the months of June, August, and September. Given that water use represents time-series data, predictive ARMA models were developed for different time scales, for each of the study sites, based on water use training data for the period of October 2015-December 2019 and test data for the period of January 2020-December 2020. The models included input regarding population density, categorical information (hour of the day, day of the week, and associated month) and climate metrics (temperature and rainfall). The performance of the ARMA models (for each community) for daily and hourly water use, based on a year of data forward in time relative to the training data, was R² in the range of 0.91-0.94 and 0.91-0.95, respectively, with corresponding average absolute relative error (AARE) of 2.9-4.95% and 1.91-3.83%. The present study suggests that there is merit in considering ARMA-type models for supporting water source management, and the design and deployment of local water systems, including the needed capacity for water treatment and wastewater handling.
As suggested by the present similarity analysis of water use patterns for the three small study communities, it may be feasible to invoke transfer learning for the ARMA models to accelerate model training for similar sites, particularly when water use data may be limited. Admittedly, the development of water use models that are of a more general applicability would require specific continuous and categorical model parameters that are expanded to include, for example, details of community descriptors such as personal income, occupation, average residents per household, size of residential units and their number per community, as well as the specific source water (i.e., local well or centralized source).

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/w13162312/s1, Section S1: ARMA Modeling, Section S2: SOM Analysis for water usage pattern in three communities. Figure S1: SOM depiction of monthly daily water patterns with respect to each month based on the water consumption dataset of the period October 2015-December 2019 for Site A. Figure S2: SOM depiction of monthly daily water patterns with respect to each month based on the water consumption dataset of the period October 2015-December 2019 for Site B. Figure S3: SOM depiction of monthly daily water patterns with respect to each month based on the water consumption dataset of the period October 2015-December 2019 for Site C. Additionally, the detailed water use data are available online as indicated in the Data Availability Statement.
Author Contributions: Y.Z.: Data analysis, model development, writing-draft preparation. B.M.K.: Workflow development, model review, and writing: draft preparation and review. J.Y.C.: Installation of water meters, monitoring of online data acquisition and management system, and data compilation. Y.C.: Study conceptualization, project supervision, modeling workflow review, and writing: draft preparation and review. All authors have read and agreed to the published version of the manuscript.