Analysis of Groundwater and Total Water Storage Changes in Poland Using GRACE Observations, In-situ Data, and Various Assimilation and Climate Models

: The Gravity Recovery and Climate Experiment (GRACE) observations have provided global observations of total water storage (TWS) changes at monthly intervals for over 15 years, which can be useful for estimating changes in GWS after extracting other water storage components. In this study, we analyzed the TWS and groundwater storage (GWS) variations of the main Polish basins, the Vistula and the Odra, using GRACE observations, in-situ data, GLDAS (Global Land Data Assimilation System) hydrological models, and CMIP5 (the World Climate Research Programme’s Coupled Model Intercomparison Project Phase 5) climate data. The research was conducted for the period between September 2006 and October 2015. The TWS data were taken directly from GRACE measurements and also computed from four GLDAS (VIC, CLM, MOSAIC, and NOAH) and six CMIP5 (FGOALS-g2, GFDL-ESM2G, GISS-E2-H, inmcm4, MIROC5, and MPI-ESM-LR) models. The GWS data were obtained by subtracting the model TWS from the GRACE TWS. The resulting GWS values were compared with in-situ well measurements calibrated using porosity coefficients. For each time series, the trends, spectra, amplitudes, and seasonal components were computed and analyzed. The results suggest that in Poland there has been generally no major TWS or GWS depletion. Our results indicate that when comparing TWS values, better compliance with GRACE data was obtained for GLDAS than for CMIP5 models. However, the GWS analysis showed better consistency of climate models with the well results. The results can contribute toward selection of an appropriate model that, in combination with global GRACE observations, would provide information on groundwater changes in regions with limited or inaccurate ground measurements. the amplitudes of GWS model much lower wells Among the climate models, the highest correlations with terrestrial measurements were obtained from GRACE in combination with FGOALS-g2 (0.70 for Odra and 0.69 for Vistula), GFDL-ESM2G (0.74 and 0.67, respectively), and MPI-ESM-LR (0.73 and 0.72, respectively). These three models explained the largest part of well-based groundwater variations (relative explained variance above 40%). The similar findings can be drawn from the analysis of RMSE; GWS series computed using FGOALS-g2, GFDL-ESM2G, MPI-ESM-LR and CLM models are characterized by the lowest error that ranged between 4.98 and 8.69 cm. The Q-Q plots (Figures A11–A12 in the Appendix) showed that the non-seasonal GWS series obtained from GRACE in combination with CLM and FGOALS-g2 provided the best representation of non-seasonal GWS derived from in-situ groundwater measurements.


Introduction
Water is a shared resource and thus it needs to be managed to enhance common benefits and ensure its conservation. Total water storage (TWS) is an essential element of the hydrological cycle, playing a key role in the Earth's global and regional climate system. As one of the TWS components, groundwater storage (GWS) represents the largest freshwater storage in the hydrological system. GWS is a major source of fresh water for domestic, agricultural, and industrial use in populated regions and for those lacking alternative water resources, and also plays an important role in the global water cycle. It interacts with all hydrosphere components, such as rivers, lakes, soil moisture, snow, ice and biomass, and is very sensitive to climate changes on both regional and global scales [1][2][3][4]. However, the accurate quantification of changes in its storage is difficult because of the limited and irregular coverage of land by ground stations. The amount of water in the ground varies in time and space according to recharge and discharge processes, spatial variation in aquifer capacity, spatial and temporal distribution of the rainfall, climate impacts, and irrigation water requirements [5,6] The Gravity Recovery and Climate Experiment (GRACE) mission, launched in 2002 and decommissioned in 2017, has allowed direct measurement of total water storage (TWS) variations from space. This is performed by observing the Earth's gravity field changes, which are mostly caused by TWS variations. In addition, GRACE allows trends in the consumption of water resources to be identified [7,8]. Groundwater storage (GWS) changes from the GRACE mission can be estimated by subtracting the sum of snow water equivalence (SWE) and soil moisture (SM) estimates from the GRACE TWS. Both SWE and SM can be acquired from assimilation models [9]. The standard method of determining TWS from land surface models is summing SM, SWE, and canopy water storage (CWS). However, although Global Land Data Assimilation System (GLDAS) models provide all three components, the World Climate Research Programme's Coupled Model Intercomparison Project Phase 5 (CMIP5) climate models deliver only the first two of them. Nevertheless, CWS has the lowest impact on changes in TWS, while SM plays major role [10,11].
It is known that seasonal changes, resulting from seasonal variations of precipitation and snowfall rate, are the most powerful oscillations in TWS. The effects of SM are dominant in this seasonal variation [11,12]. Many studies have shown that the role of GWS change in seasonal TWS variation is less crucial than it is for long-term TWS variation [12][13][14][15]. For example, [12] performed an analysis of seasonal TWS change from different water storage components for the Mississippi River Basin. They showed that adding a groundwater component to the sum of SM, SWE, and surface water did not significantly change annual TWS amplitudes or phases. However, it has been shown that, despite its relatively low seasonal variability, groundwater significantly affects inter-annual TWS change [13]. The long-term groundwater loss, observed today in many parts of the globe, is mainly caused by intensive extraction for drinking water, industry, and irrigation [6] Recently, there have been many attempts to assess the time-variable changes in TWS using GRACE and hydrological models both on global [11,13,[16][17][18][19][20], and regional scale [21][22][23][24][25]. There were also some papers that have validated hydrological and climate models using either GRACE data [12,26,27] or meteorological data [28]. The GRACE observations, together with simulations of soil moisture and snow water from hydrological models, have been used to estimate regional groundwater storage variations (e.g. in China and Tibetan Plateau [29][30][31][32], Middle East [33], India [9,29,[34][35][36][37], Mississippi River Basin [8], South-American river basins [14,38,39], Australia [29,40]). The use of in-situ groundwater measurements at well stations is a common method of validation such GWS estimates [41][42][43].
Direct well measurement is the traditional groundwater level (GWL) monitoring method in Poland and such observations are carried out across the country [44]. However, while groundwater storage determination using GRACE observations and hydrological models has been previously performed in regions with large basins, there have been fewer such studies carried out in Poland. Example research results are presented in [10,45,46]. The GWS studies in Poland were conducted using direct gravimetric measurements [47]. GRACE and GLDAS observations for Poland were applied to study height and mass variations in [48], but not for the GWS variations. There has been no such extensive GWS analysis conducted using GRACE data and assimilation or climate models applied in Poland.
In this research, we analysed TWS and GWS in Poland based on GRACE observations and assimilation models in a systematic and detailed way. We examined spatial and temporal variations of the TWS obtained from GRACE. Moreover, several assimilation models from GLDAS and CMIP5 were used to infer how TWS in models is partitioned into SM components, SWE, and GWS content. The GWS variations were then calculated by computing TWS changes and a terrestrial-based water balance approach [49].
By comparing results from such an extensive data set and conducting a wide range of analyses, we aimed to objectively identify models that provide the best TWS agreement with the TWS observed by GRACE satellites. This research also aimed to identify which models (in combination with GRACE measurements) provide information on GWS variations that are the most consistent with observations from the groundwater monitoring wells. Identifying models that match direct observations is very important in the context of future use of these models for TWS and GWS simulation and prediction. This may be useful especially in regions in which well measurements are unavailable or inaccurate.

Study Area
The study area includes two main rivers in Poland, the Vistula and the Odra. The Vistula basin covers about 194,500 km 2 and the area of the Odra basin is about 118,900 km 2 [50]. These two basins cover almost the entire area of Poland ( Figure 1). Polish basins can be characterized by climatic conditions. For most of the year, the weather is affected by the polar air, both sea and continental. The northern and central parts of the Polish basins have a temperate climate with rather snowy winters and a long vegetation period. In the southern parts of the Vistula and Odra basins, due to mountain climate, the mean annual temperature is lower and the rainfall is abundant. According to the Directive 2000/60/EC of the European Parliament and of the Council establishing a framework for Community action in the field of water policy, adopted on October 23, 2000 (called Water Framework Directive, WFD, https://eur-lex.europa.eu/eli/dir/2000/60/oj), the basic hydrological unit adopted for analysis and management of water resources is the river basin. Moreover, in the currently conducted world-wide researches based on the mascon technique, the reference division is also the whole river basin area [40]. Therefore, our analyses are conducted for Vistula and Odra basins separately. Another motivation for such a division of the study area was the fact that due to a spatial resolution of the GRACE observations (about 300 km), the use of smaller areas is erroneous. Moreover, the area of Poland is relatively small (312,679 km 2 ), therefore the division into smaller units would not give an appropriate view on water storage variations in the country. The overwhelming majority of researches focusing on TWS and GWS analyses were also conducted for river basins or lakes [8,[51][52][53][54] and we would like to be consistent with those studies.

The GRACE Mission Data
The GRACE mission provides global remote sensing information on Earth's gravity field changes resulting from variations in global mass redistribution [55]. The aim of GRACE was to provide temporal gravity field measurements with global coverage. Total water storage changes estimated from the mission, which reflect variations in the gravity field, can be used for calibration of hydrological models. Post-processed GRACE products are provided by official GRACE data centres at the GeoForschungsZentrum (GFZ) in Potsdam (Germany), the Center for Space Research (CSR) in Austin (TX, USA), and the Jet Propulsion Laboratory (JPL) in Pasadena (CA, USA). In this study, in order to analyse TWS and GWS changes in Poland and to compare these variations with groundwater monitoring wells and model indications, we used GRACE gridded data developed by CSR, JPL and GFZ, provided by the NASA MEaSUREs Program [56,57] and available from the Jet Propulsion Laboratory website (GRACE Tellus, http://grace.jpl.nasa.gov). These three solutions, named CSR RL05, JPL RL05, and GFZ RL05, respectively, have time resolution of one month, grid resolution of 1°, and contain surface mass change given in centimetres of equivalent water height, processed by [57][58][59]. For the purposes of this research, we used the mean of CSR RL05, JPL RL05, and GFZ RL05 solutions covering the period of November 2007 to October 2015. Recent studies, e.g. [60], have found that the mean of CSR, JPL, and GFZ solutions serves as an effective noise reduction method for GRACE observations.

GLDAS Models
The GLDAS was developed by the National Aeronautics and Space Administration's (NASA) Goddard Space Flight Centre. This is a global database, freely available for the users via the Internet (https://ldas.gsfc.nasa.gov/gldas/GLDASdownload.php). The GLDAS provides information given in the areas comparable with the GRACE cells (spatial resolution 1° × 1° or 0.25° × 0.25°). Variables in the database consist of several parameters e.g. soil moisture and its temperature at different levels, precipitation, air pressure, evapotranspiration, surface run-off, and snow water equivalent. To achieve the best spatial resolution of the information, data is modelled using both in-situ measurements and satellite observations [61,62]. It should be noted that the GLDAS is a very important data source for water cycle and atmospheric balance research [63][64][65]. The goal of the GLDAS is integrating satellite-and ground-based observations, using advanced land surface modelling and data assimilation techniques. Such an approach results in generating optimal fields of land surface states and fluxes, which are very helpful in hydrology, and in climate modelling and predicting. The GLDAS contains four land assimilation models (sub-models): the Community Land Model (CLM) [66,67], the Mosaic (MOS) model [68,69], the Variable Infiltration Capacity (VIC) model [70] and the Noah model [71,72]. GLDAS products include land surface state data given in metres (m) (soil moisture, snow water equivalent, plant canopy surface water) and flux data given in metres per second (m/s) (snow precipitation rate, rain precipitation rate, evapotranspiration, storm surface runoff, baseflow-groundwater runoff, snow melt) [73].

CMIP5 Models
Coupled general circulation models, otherwise known as coupled global climate models, are models that integrate ocean, atmosphere, sea ice, and land ice modules to analyse climate change and predict its future evolution. Because there are no reference solutions to validate such model predictions, one of the methods used to assess prediction uncertainties is the model inter-comparison project. The main objective of CMIP5 was to compare already existing global coupled climate models to analyse the climate with respect to past, present, and future changes based on expected global warming caused by increasing atmospheric CO2 concentrations [74]. The project assumptions, which are connected with climate modelling, contain two types of experiments: century-scale integrations (or long-term integrations) and decade-scale prediction experiments (or near-term integrations). Bringing together about 20 scientific institutions, the project developed around 50 models that contain a set of parameters that describe processes taking place in the atmosphere, oceans, and land hydrosphere. These include simulations of the global water cycle and can be used to determine TWS. It should be noted that CMIP5 models differ from each other in terms of their spatial and temporal resolution, the number and type of variables, assimilation algorithms, and input data. In addition, they have been developed for different purposes and sometimes for specific regions.
In Table 1, we present a summary of the data sources used to determine TWS and GWS. All models and data have monthly temporal resolution and cover the time period from November 2006 to November 2015 (9 hydrological years).  [44,[81][82][83][84][85][86][87][88][89]. The reports contain in situ observation results of well groundwater depths, wellspring efficiency (or spring rate, the amount of water flowing out of the source given in time unit), and groundwater chemical composition tests. The measurements were conducted at wells belonging to the network of stationary groundwater observations of the Polish Geological Institute. Each annual report concerns a certain hydrological year, which runs from November 1 to October 31 of the subsequent year in Poland. The total number of research points that entered or enter the network at different times exceeds 1000. The number of network points covered by the study varies from year to year, for example, in the hydrological year 2007 it amounted to 804 and in 2016 it was 1184. For the purpose of this article, 215 wells were selected, which were continuously measured throughout the whole period of November 2006 to October 2015. When averaging all basins, 129 wells were assigned to the Vistula, 66 to the Odra basin, and the remaining 20 were outside the basin areas (see Figure 1). The measured variation depths were recomputed into appropriate GWL changes by inverting the sign of changes. Available GWL data concern the minimum, maximum, and mean of unconfined and confined conditions. The average GWL in unconfined conditions was selected for comparison. The results of measurements in wells were highly variable by location (see Figure 2a). Averaging the Vistula and Odra basin areas leads to more uniform results, which should reflect the average behaviour of the amount of groundwater in Poland (see Figure 2b).
Changes in GWL depend on climate conditions, the amount of water extraction for drinking water, irrigation, and industry, and on the type of soil. Typically, soil consists of organic elements (approx. 5%), minerals (approx. 45%), water (approx. 25%), and air (approx. 25%). The sum of all free spaces in the soil is called porosity [90][91][92][93]. Porosity is dependent on mineral content, granulometric composition, organic matter content, soil structure, and agrotechnical interventions. Furthermore, in general, the deeper the soil is, the less the porosity it has [92][93][94][95]. The lowest porosity is observed for soils produced from sands and loams, especially sandy loams [96]. The highest porosity is noted for soils rich in organic matter, such as those based on loess [97].

GWS from Groundwater Level Wells
Porosity coefficients are necessary to convert the GWL observed in the well into groundwater storage (GWS), to enable comparison between measurements from wells and the GRACE data [36,41]. Porosity coefficient estimations are based on the following formula: where is the real density of solid phase of the soil and is the density of the intact sample of the solid phase of the soil (both given in g/cm 3 ). Approximate porosity coefficient values are presented in Table 2.
In this study, the porosity coefficient was calculated based on the soil map of Poland (scale 1:600,000; elaborated by Uggla and distributed by the Soil Science and Agricultural Chemistry Committee of the Polish Academy of Science and the National Soil Science Association) and knowledge of the soil origin [98] for the studied basin areas. Because the Vistula and Odra basins cover areas with few types of soils, in both basins the proportion of each soil type in terms of total land coverage was estimated, which then was treated as a weight for the average basin porosity coefficient determination. Source: [98] Each typical soil type was analysed in terms of ground type composition and origin. The weights for basins and final averaged porosity coefficients are presented in Table 3. It was concluded that the porosity coefficients were 0.41 and 0.42 for the Vistula basin and Odra basin, respectively. These values were used to scale between GWL changes and GWS changes in wells. We used the following name and abbreviation convention: GWL for in situ measurements in wells, and GWS for equivalent water height of the groundwater obtained from GRACE after removing SM and SWE from GLDAS or CMIP5 models and also for well GWL scaled by the porosity coefficient (GWS = GWL  ).

TWS and GWS from GRACE and Models
Gravity observations from GRACE satellites provide a complementary method to estimate groundwater storage changes [55]. The mission provides measurements of total changes in terrestrial water storage in all forms of water stored on and underneath the Earth's surface [99]. However, GRACE satellites are not able to reconcile individual TWS components, such as soil moisture, snow, surface reservoirs, or groundwater [100]. However, all land surface models provide changes of water storage only in layers of thickness defined by model. Consequently, they do not deliver total changes in water storage. Therefore, land assimilation models have been widely used to separate the groundwater components from the GRACE TWS variations [8]. The water storage variation contains changes in soil moisture (∆SM) , snow water equivalent (∆SWE) , and groundwater storage (∆GWS) and can be written as [101]: All quantities appearing in Eq. (2) concern equivalent water height and, as such, are expressed in the units of distance. Formally, by removing changes of SM and SWE given in the land assimilation models from the GRACE TWS data, it is possible to compute groundwater changes.
The GRACE, GLDAS, and CMIP5 data were processed as follows. The SM and SWE variables from CMIP5 were interpolated into regular 1° × 1° latitude-longitude grids using 3D linear interpolation. The interpolation should not affect the final results as we focus on the TWS and GWS changes averaged over Vistula and Odra basins, not given in each grid separately. The model TWS from GLDAS and CMIP5 models were assumed to be the sum of SM and SWE. The groundwater data were obtained from GRACE TWS after removing model TWS. However, it should be kept in mind that the total TWS change, which is measured by the GRACE satellites, is the sum of all water storage components, namely soil moisture (SM), snow water equivalent (SWE), groundwater storage (GWS), canopy water storage (CWS) and surface water storage (SWS), and can be written as [40]: Therefore, the model TWS should be computed using all of above components. However, most of land surface models (LSMs) which are e.g. GLDAS models, provide only SM, SWE and CWS, while CMIP5 climate models simulate only SM and SWE. However, of these three components, CWS has the smallest impact on changes in TWS while SM -the greatest [10,11]. The assumption of TWS being a sum of SM, SWE and CWS is recommended by the authors of the GLDAS documentation [73]. Due to LSMs limitations, we are not able to extend this expression with additional variables and in this paper, the model TWS was computed using simplified equation (as the sum of SM and SWE, without considering CWS and SWS). We are aware that omitting CWS and SWS may introduce additional errors and reduce agreement between model TWS and TWS from GRACE, and therefore also between GWS from GRACE˗models and GWS from in-situ data. It is known that there are other than GLDAS/CMIP5 sources of SWS data, for example global hydrological and water resource models (GHWRMs) [102]. However, we decided to keep consistency between model estimations from GLDAS and CMIP5 and not to use CWS and SWS in TWS computation. Moreover, comparing SM, SWE, CWS from GLDAS/CMIP5 models with SWS from GHWRMs can cause inconsistencies. Besides, apart from Vistula and Odra, Poland does not have very big rivers or lakes so we hope that neglecting SWS would not bring big errors. Many researches that focus on computing GWS from GRACE data and hydrological models, have also used only SM and SWE (e.g. [8,37,51,103]). To prove that SM has a dominant impact on TWS variability and CWS has a negligible role, we drew time series of SM, SWE and CWS computed from GLDAS NOAH model, averaged over area of Poland and gave their standard deviation values (std) (Figure 3). It is clear from the plot that SM has major role in TWS variability in Poland while the effect from CWS is insignificant. Notably, SWE had strong peaks around years 2006, 2010, 2011, 2013 but apart those periods, its variability is rather low. It can result from the fact that in recent years, winters in Poland have been very mild, with almost no snowfall and temperatures only slightly below 0°C.

Time Series Processing
The time series of GWS and TWS from GRACE and models as well as GWS from wells measurements were averaged over area of Vistula and Odra basins. By dividing analyses into two river basins, we would like to check if TWS and GWS changes are characterized by similar behaviour, and if so, where we can expect more significant water storage depletion and what may be potential reasons of such differences. To overcome any gaps in the GRACE observations and irregular data time span in well measurements, time series were interpolated into regular 30-day changes in the same period (from November 2006 to November 2015).
Further analyses comprised of comparison of variations in both TWS and GWS from different data sources for the Vistula and Odra basins separately. Our detailed analyses were conducted for three cases: linear trends, seasonal oscillations, and non-seasonal oscillations.
Linear trends and seasonal oscillations of TWS and GWS series were computed together using the least-squares method. The fitted model had a following form: where y is the value of the series for the time t = 1, ..., 108 (number of months), is the intercept, is a trend coefficient, , , , , , are coefficients of the fitted sinusoids, is a reference epoch (here we assumed November 15, 2006), , , are annual, semiannual and terannual frequencies, respectively.
Non-seasonal changes were obtained after removing trends, annual, semiannual, and terannual signals and they contain all other changes, with periods both longer and shorter than seasonal ones.
To compare different estimations of TWS and GWS, we used both visual interpretation of the time series, amplitude spectra and phasor diagrams, and numeric parameters such as correlation coefficients, relative explained variance and root-mean-square error (RMSE).
The relative explained variance (Varexp) describes the variance compatibility between two time series, the first of which is a reference series (r) and the second of which is evaluated (e): Here, in the case of TWS, the reference series are TWS from GRACE and the evaluated series are TWS from the GLDAS and CMIP5 models. In the case of GWS, the reference series is the GWS series from groundwater monitoring wells and the evaluated series are GWS from GRACE˗GLDAS and GRACE˗CMIP5. Var (r) , Var (e) and Var (r˗e) are variance of reference series, variance of evaluated series and variance of a difference between the reference and evaluated series, respectively. The best variance agreement between series occurs when Varexp = 100%. For this case, the differences between reference and evaluated TWS or GWS are the same for all the time series points, which means that the variance of these differences is equal to zero. As the variance of differences (Var (r ˗ e) ) between TWS or GWS from reference and from evaluated data source increases, the Varexp decreases. For Varexp = 0%, Var (r ˗ e) is the same as the variance reference time series. Relative explained variance is negative when variance of differences (Var (r ˗ e) ) is very high -higher than the variance of the reference data. This can occur when the evaluated series contains noise but the variations for reference and evaluated series are similar.

Time Series Comparison
In Figure 4, we present example full time series (with trends, seasonal and non-seasonal changes retained) of TWS (Figure 4 a, b) and GWS (Figure 4 c, d) from all data sources for the Vistula basin. For better visibility, the value of the TWS or GWS anomaly for first epoch of observations (15th October 2006) was removed from each time series and therefore they all start at zero. As shown in Figure 4(a, b), there is a good agreement in amplitude and phase between GRACE-based TWS and model-based TWS, especially for the GLDAS models. However, Figure 4(c, d) reveals a noticeable antiphase between the GWS series, especially for GWS obtained from GRACE-GLDAS and from the wells.

Linear Trends of TWS and GWS
The linear trends of TWS and GWS variations are shown in Tables 4 (for TWS) and 5 (for GWS).
To assess the quality of the fitted trends, we also computed their errors as well as coefficients of determination (R 2 ). The coefficient of determination is a measure of the quality of the model's fit to the data. It ranges between 0 and 1, with 1 being the best value. The TWS trends obtained from the observations provided by GRACE between November 2006 and November 2015 were 0 and −0.21 cm/year for the Odra and Vistula basins, respectively. The trends derived from GLDAS and CMIP covered the range between −2.50 and +0.82 cm/year. Nevertheless, apart from the trend for the MIROC5 model, the differences between GRACE-based and model-based trends did not exceed 0.90 cm/year.
In terms of trends in the GWS, we observed a small increase for both Odra (+0.12 cm/year) and Vistula (+0.47 cm/year) derived from well measurements. Most of the GWS trends obtained from the GRACE-model agreed well with trends from well measurements in terms of trend sign (apart from results based on FGOALS-g2, GISS-E2-H, inmcm4, and MPI-ESM-LR). The discrepancies between trends from groundwater monitoring wells and GRACE-model did not exceed 1.40 cm/year. The only exception is the trend for the Odra basin obtained from the MIROC model.
In general, almost all TWS and GWS trends were small, as their absolute values were generally not higher than ±1.00 cm/year. However, the attention should be paid to the ratio of trend values to their errors, which may be unfavourable for small trends. The general negative trends in TWS and positive trends in GWS obtained for Poland in the current study suggest that SM and SWE decreased over the period 2006-2015. According to data provided by Polish Institute of Meteorology and Water Management IMGW (http://www.imgw.pl/en/) in the period 2011-2014, there was a decrease in the annual sum of rainfall and snowfall, which may have influenced the decreased SM content.

Seasonal Variations of TWS and GWS
To recover the specific frequencies in the TWS and GWS time series, we computed their amplitude spectra, using Fourier transform band pass filter (FTBPF) [104], for seasonal (in this section) and non-seasonal (in Section 4.4.) variations separately. Figure 5a presents the amplitude spectra of TWS seasonal variations computed for the Vistula basin (the Odra basin showed a similar pattern, data not presented). It is not surprising that the annual change dominates in the seasonal frequency range, and its amplitude is several orders greater than semiannual and terannual oscillations. This occurrence is mainly connected with seasonal changes in rainfall. It is noteworthy that almost all models (except CLM and MPI-ESM-LR) had stronger amplitudes of annual change than GRACE observations. Figure 5b replicates Figure 5a but for the GWS changes. The strongest seasonal GWS changes were those based on the well data. Similar to the TWS spectra, the annual GWS oscillations had the biggest amplitudes among the different data sources. However, in contrast to the TWS spectra, the terannual signals in the GWS were almost as strong as semiannual signals for most data sources. We found that seasonal variations were stronger in TWS than for GWS in all considered GLDAS and CMIP5 models. This is because GWS is only one component of TWS and is believed to change in scales longer than annual.
Because of the strong domination of annual signal in seasonal TWS and GWS changes, in the remainder of the paper we only focus on annual changes. The correlations of semiannual and terannual TWS variations between GRACE and GLDAS or CMIP5 models should not affect the correlations of overall seasonal changes. Annual oscillations were computed by fitting a model described by a second row of Eq. (4). It is obvious that such annual series have the form of sinusoids, therefore their visual interpretation may be difficult. An alternative method to analyse annual variations is to present their phasor diagrams showing their amplitudes and phases related to a reference epoch (in our study the reference epoch is November 15, 2006 which is the start date of our data). The annual amplitude (Amp) and phase (Ph) for each time series were computed as using a, b coefficients of a fitted model: The length of each vector on a phasor diagram represents the magnitude of amplitude, while the vector direction shows a phase. The bigger a difference between two vector directions, the more time series are shifted relative to each other, in other words, their phases are different. The similar length of two vectors indicates that these two time series are characterized by similar magnitude of annual amplitudes. It is clear that the phase shift (difference in phases) between two time series affects the size of the correlation, while amplitude difference can affect variance agreement. Figure 6 presents phasor diagrams of the annual TWS changes for each model and GRACE data separately for the Odra and Vistula basins. The results were similar for the two river basins. The TWS data from almost all considered models had a higher annual amplitude than those obtained from GRACE observations (except CLM). As far as phases are concerned, the compatibility between GRACE and the models was quite good. The results for all GLDAS models were congruous, whereas for CMIP5 they were more diverse. The FGOALS-g2 and MPI-ESM-LR models had annual phases that differed from the corresponding phases obtained from GRACE. Figure 7 presents the same analyses as Figure 6 but for the case of GWS. The results obtained were far more mixed than those shown in previous plots in Figure 5, but they remained similar for the Odra and Vistula basins. None of the GWS data obtained from GRACE after removing SM and SWE were fully consistent with the results obtained from groundwater monitoring wells and these discrepancies concerned both amplitudes and phases. The best amplitude agreement with GWS from wells was observed with the GISS-E2-H model; however, the phase correspondence was poor. The GWS series obtained from three out of four GLDAS models were almost in counter-phase with the ground-based GWS observations. The only exceptions were series obtained from the GRACE-CLM. Nevertheless, this solution possessed one of the smallest amplitudes.   In order to quantify the agreement between the TWS series obtained from GRACE and TWS series based on models, we computed correlation coefficients, relative explained variance and root-mean-square errors (RMSE), and showed them in Tables 6 (for Odra) and 7 (for Vistula). These parameters were determined both for annual and full (seasonal plus non-seasonal, shown in Figure  4) time series. The tables also include the information about the month when the annual TWS variation has its maximum as well as the phase of annual signal given in days related to the TWS from GRACE. Additional information about consistency between reference and evaluated series can be obtained from Q-Q (quantile-quantile) plots. Such plots show the quantiles of the sample data against the quantiles of the other sample data [28]. If the samples are characterized by the same distribution, then the Q-Q plot is linear. For the case of this study, we drew Q-Q plots between GRACE-based and model-based TWS for full time series (see Figures A1-A2 in the Appendix) and for annual changes ( Figures A3-A4 in the Appendix) separately. However, in this section we focus in detail on analysis of annual variations without considering the sum of seasonal and non-seasonal variations.
Tables [6][7] show that the agreement between annual series was much better than the agreement of full time series (that also contain non-seasonal long-and short-period signals), because the former had the sinusoid form, and two sinusoids, if not shifted, are always correlated with each other even when they have different amplitudes. Therefore, the correlation values should not be treated as a reliable parameter for evaluation of annual variations. Nevertheless, differences between particular models remain and lower correlation coefficients may be caused, for example, by a phase shift of one series relative to another. In practice, for the case of annual oscillations, analysis of variances or RMSE is a more appropriate method than the study of correlation coefficients. For annual TWS variations, we found similar results for the Odra and Vistula basins. We observed very good correlation agreement of TWS between GRACE and almost all considered models (correlations above 0.9) except FGOALS-g2. For the case of RMSE, the smallest errors for annual oscillations for both river basins were obtained for CLM, MIROC5 and GFD-ESM2G (it ranged between 0.34 and 1.56 cm). The same models proved to be the most compatible with GRACE measurements in terms of variance agreement as they explained the largest part of the GRACE TWS signal (relative explained variances above 80%). The Q-Q plots between annual GRACE-based and model-based TWS (A3-A4) revealed that most of the modeled series are characterized by the same distribution as reference series. We observed that for most of the models, similarly to GRACE data, the maximum of annual TWS signals was reported for March, which may be connected with early spring snow melting. However, some differences in phases occur, which was also reported in Figure 6. Almost all models (with exception of MOSAIC and VIC) have delayed seasonal amplitudes by several and even several dozen days compared to GRACE observations. Tables 8 and 9 present the same results as Tables 6 and 7 but for GWS variations. The corresponding Q-Q plots for GWS were shown in Figure A5-A6 (Appendix) for full time series and in Figures A7-A8 (Appendix) for annual variations. What is notable from the Tables 6-7, is that none of the GWS obtained from GRACE after removing SM and SWE from GLDAS or CMIP5 models were fully consistent with the results obtained from groundwater monitoring wells and these discrepancies concern correlation coefficients, relative explained variances, RMSE, months with maximum annual GWS signal and phases. However, despite this diversity of results, we attempted to identify which model (in combination with GRACE data) could provide any correspondence with terrestrial groundwater measurements. Positive correlations and significant relative explained variance for both Vistula and Odra basins were indicated for CLM, FGOALS-g2 and MPI-ESM-LR. These models are also characterized by the lowest RMSE (which ranged between 1.95 and 4.32 cm). However, the latter two have GWS phases more incompatible with phases of well-based GWS. Taking this fact into consideration, we suggest that GRACE-CLM is the best for annual GWS estimation because of its high correlations, low RMSE and very good phase agreement with well data (see Figure 7 and Tables 8-9). However, it should be kept in mind that CLM model is not perfect for GWS determination as it suffers many weaknesses, e.g. weak signal and visibly smaller amplitudes than for wells and other GRACE-model estimations. It is worth mentioning that other GLDAS models (MOSAIC, NOAH, and VIC) had significant, but negative, correlations with groundwater variations based on terrestrial measurements. When we compared months with maximum of annual GWS signals, we observed substantial differences between GWS from ground-based measurements and from the GRACE-model. For data from wells, this maximum occurred in April and GLDAS CLM was the only model that agreed with this indication. Other models were delayed by several months, and some of them were delayed even by more than half of the year. This corresponds to negative correlations (Tables 8,9) and differences in phases (Figure 7). Table 6. Correlation coefficients between GRACE-based and model-based TWS, percentage of variance in GRACE-based TWS explained by model-based TWS, root-mean square errors (RMSE) for nonfiltered and annual time series, the months of maximum in annual TWS series, and phases of annual TWS variations for Odra basin.

Non-seasonal Variations of TWS and GWS
In this section, we focus on non-seasonal variations of TWS and GWS. To check which oscillations are the strongest, we drew amplitude spectra of non-seasonal TWS and GWS series for the Vistula basin obtained based on the FTBPF method ( Figure 8). For the Odra basin, these graphs look similar (data not presented). Figure 8 shows that non-seasonal variations in groundwater and terrestrial water storage were less powerful than seasonal ones (see Figure 5). For TWS obtained from GRACE, the most powerful non-seasonal oscillation is a change with a period of 2-4 years. The spectral maximum occurring for the period of 2-4 years is also visible for the MIROC5, NOAH, and MOSAIC models. For the VIC, GISS-E2-H, inmcm4, and MPI-ESM-LR models we observed the strongest oscillations for the period of 1.5-3 years. We also observed that for many GLDAS and CMIP5 models, the amplitudes of non-seasonal TWS changes were greater than those derived from the GRACE mission. This supports the findings of our earlier work [11], in which we concluded that GLDAS models, although they agree well with GRACE in terms of global seasonal TWS change, overestimate non-seasonal TWS change. In the current study, this was especially apparent for MOSAIC and VIC (also strongly correlated with each other) and CLM was the exception.
Turning to GWS spectra, we observed the most energetic oscillations for the same periods as TWS (Figure 8b). However, for the observations from the wells, a quite strong quasi-1.5-year oscillation was also apparent. In addition, for the GWS based on well measurements, non-seasonal oscillations were almost the strongest of all considered data sets and only the combination of GRACE observations and MIROC5 model resulted in more powerful oscillations. Opposite to the seasonal spectral band, where we observed more powerful oscillations for TWS than for GWS ( Figure 5), non-seasonal oscillations were generally stronger for GWS than for TWS (Figure 8). It is worth adding that because changes in groundwater storage are more long-lasting than changes in TWS, one can expect strong GWS oscillations with periods longer than 4-5 years. However, due to insufficient data length, these variations could not be analysed in this study.
In Figure 9 we show time series of non-seasonal TWS variations obtained from GRACE and the considered models for the Odra and Vistula basins separately. The GLDAS models agreed in phase with GRACE measurements very well for both basins. In terms of amplitude, CLM was lower than GRACE and other models, but NOAH, MOSAIC, and VIC were higher. We observed two maxima in TWS non-seasonal time series, which could be the effect of floods in Poland in 2010 and 2013 [105]. The most severe and devastating flooding in Poland in the 21st century was in the late spring of 2010, which also affected other countries in Central Europe [105]. These peaks are also visible in full TWS time series (Figure 4a, b). For climate models, the situation was less clear since there were notable differences between models and GRACE, as well as between individual models.
The non-seasonal change of GWS for Odra and Vistula basins computed from all data sources is compared in Figure 10. The comparison of series from groundwater monitoring wells indicates that non-seasonal variability in GWS was quite higher for the Odra than the Vistula, but the general findings are similar for both river basins. The peaks visible in the TWS series in 2011 and 2013 ( Figure 9) are also noticeable for GWS ( Figure 10). These peaks might be the effect of the slow response of groundwater to floods that affected the country. Figure 10 shows quite good phase and amplitude compatibility between well-based and CMIP5-based GWS non-seasonal variations, which contrasts with the poor agreement between TWS from GRACE and TWS from CMIP5 models (see Figure 9). While considering GLDAS models, we detected that GRACE in combination with MOSAIC, NOAH, and VIC provided opposite phases to those obtained from terrestrial observations. Only the results from GRACE-CLM were in phase compatibility with GWS from wells; however, the amplitudes were very small.   To quantify the agreement of non-seasonal TWS changes between GRACE and the models, we computed correlation coefficients, the percentage of variance in GRACE-based TWS explained by model-based TWS, and RMSE (Table 10). We also drew Q-Q plots for Odra and Vistula basins separately ( Figures A9-A10 in the Appendix). The best results for correlations (between 0.75 and 0.87) and RMSE (below 3 cm) were obtained for all GLDAS and this agreement was similar for both river basins. Among the CMIP5 models, however, only GISS-E2-H and inmcm4 provided positive and RMSE below 4 cm for both rivers. For variance agreement, the most TWS variability observed by GRACE was explained by the CLM, NOAH, and VIC models (relative explained variance ranged between 35 % and 64 %). However, all climate models give negative values of relative explained variance. This means that differences between GRACE-based and model-based TWS were more diverse than the water storage variations measured by the GRACE satellites. The Q-Q plots of non-seasonal TWS variation ( Figures A9-A10 in the Appendix) proved that for both Vistula and Odra, the GRACE-based TWS series have a similar distribution as the TWS obtained from CLM and NOAH models.
For GWS agreement between well measurements and GRACE-model combinations, the situation was reversed; each CMIP5 model provided positive correlations, whereas GLDAS was distinguished by negative values (Table 11). The only exception was CLM, which provided significant positive correlation coefficients for both Odra and Vistula basin (0.70 and 0.75, respectively). However, the amplitudes of GWS obtained using this model were much lower than those derived from the wells (Figure 10). Among the climate models, the highest correlations with terrestrial measurements were obtained from GRACE in combination with FGOALS-g2 (0.70 for Odra and 0.69 for Vistula), GFDL-ESM2G (0.74 and 0.67, respectively), and MPI-ESM-LR (0.73 and 0.72, respectively). These three models explained the largest part of well-based groundwater variations (relative explained variance above 40%). The similar findings can be drawn from the analysis of RMSE; GWS series computed using FGOALS-g2, GFDL-ESM2G, MPI-ESM-LR and CLM models are characterized by the lowest error that ranged between 4.98 and 8.69 cm. The Q-Q plots ( Figures A11-A12 in the Appendix) showed that the non-seasonal GWS series obtained from GRACE in combination with CLM and FGOALS-g2 provided the best representation of non-seasonal GWS derived from in-situ groundwater measurements. It can be generally concluded that in terms of non-seasonal GWS variations, CMIP5 models are more useful than GLDAS. For the latter, only CLM can be used, because although it provided weak GWS amplitudes, it gave high correlations and relative explained variances together with low RMSE and satisfactory Q-Q plots.

Discussion
In this section we would like to address a few issues that arose during TWS and GWS analyses. First of all, the TWS analyses proved that GLDAS models are much more consistent with GRACE estimates than CMIP5 simulations are. We also detected visible differences among various CMIP5-based TWS series. It should be emphasized that GLDAS models, apart from terrestrial observations, also contain some satellite data that include, among others, SWE and SM [62]. Moreover, TWS anomalies provided by GLDAS are usually used to determine the scaling factors, which are essential for the proper use of GRACE TWS products [58]. The recommended scaling factors were used in this study during GRACE data processing. The results from CMIP5 models could also be affected by interpolation procedures because climate models differ from each other in terms of grid sizes, and in this paper, the data from climate models were interpolated into regular 1° × 1° latitude-longitude grids. However, due to the spatial resolution of GRACE observations, we averaged the results over area of Odra and Vistula basins. Therefore, we believe that the interpolation procedure did not noticeably affect these averaged results. The main cause of discrepancies between the results from individual CMIP5 models are probably differences in the SM component, which concern both the number of soil layers and their depth from the surface.
Another important finding is that three out of four GLDAS models, namely MOSAIC, NOAH and VIC, despite they provided high consistency of TWS with GRACE, did not give satisfactory results for GWS. The GWS series obtained from these models had significant but negative correlations with groundwater variations based on terrestrial measurements. It should be noted that the TWS series from GLDAS were in very good phase agreement with TWS obtained from GRACE, but the mentioned models could be distinguished by slightly higher amplitudes than those obtained from GRACE observations. Consequently, after subtracting a time series with larger amplitudes from the one with smaller amplitudes, we obtained a GWS series with opposite phases to the original ones. This was not observed for CMIP5 since TWS from these models were not in good phase compatibility with GRACE observations and consequently, after removal, the peaks in GRACE-based TWS remained in the GRACE-CMIP5-based GWS. The problem of GWS phase disagreement between GLDAS and well measurements would be solved if the models could provide real TWS amplitudes. Conversely, CLM, distinguished by definitely smaller TWS amplitudes, proved to be consistent with terrestrial groundwater observations. Similar findings regarding the overestimation of TWS of GRACE by hydrological models can be found in other papers (e.g. globally in [11], and in California in [106]).
It should be also noted that the relationship between GWS from wells and GWS from GRACE-models is not satisfactory and there are many reasons of such discrepancies. The generally low agreement between GWS derived from terrestrial measurements and corresponding differences obtained from GRACE after removing modeled SM and SWE components, may be an effect of measuring errors, sometimes connected with the location of the station, its physical condition or accuracy of the map, on the basis of which soil types and porosity coefficients were determined. Moreover, an adequate and accurate determination of porosity coefficients may also have contributed. The coefficient values depend on porosity and are different for various soils. Based on soil maps, we used a single averaged factor for each of the two basins. However, both rivers cover mountainous, upland, lowland, and coastal areas with different soil types and conditions. Therefore, the best solution would be to determine the porosity coefficient for each well separately. Nevertheless, more proper porosity factors could improve only amplitude agreement, whereas correlations and phases would not change. To obtain more consistent results for GWS, there is also a need to better model surface water components by adding more ground-based and satellite measurements as input data, using better assimilation algorithms, including more variables in the models (such as canopy water storage and surface water). It should be kept in mind that terrestrial well measurements and their processing methods also suffers from some weaknesses, e.g. irregular coverage of the area with wells, uncertainties of porosity coefficients estimated, methods of selection of wells for analyses. Because both GRACE and hydrological models provide data with not satisfactory spatial resolution, there is a need to downsample GRACE and model estimations, using additional data and advance algorithms (such as Kalman filter or machine learning), in order to derive a high-resolution groundwater storage model.
Our results can be to some extend affected by the small size of study area and spatial resolution of the GRACE TWS data. The GRACE gridded TWS estimates were obtained from spherical harmonic coefficients with maximum degree and order (n) equal to 60. The direct dependence of spatial resolution of GRACE model on maximum degree shows that n = 60 corresponds to the spatial resolution of about 333 km [107]. There are additional factors connected with postprocessing methods (e.g. filtering the data in order to remove north-south stripes) and measurement errors (e.g. leakage effect) that, in fact, reduce spatial resolution even more [58].The problem of signal attenuation in GRACE data due to filtering and attenuating the solutions up to degree 60 was solved by applying appropriate scaling factors (see https://grace.jpl.nasa.gov/data/get-data/ monthly-mass-grids-land/). Taking into consideration above findings, and also the small size of the study area, we decided that the best method to face above mentioned problems would be to average the results over the whole Poland, or over Vistula and Odra, without considering individual grids. The overwhelming majority of researches focusing on TWS and GWS analyses were also conducted for river basins, lakes or other regions of similar to Poland areas [8,[35][36][37][51][52][53][54]99,108]. It should be also mentioned that, apart from GRACE spherical harmonics data, there are several local solutions such as mass concentration (mascon) solutions [109,110]. Moreover, there were also some attempts to assimilate GRACE data into hydrological models in order to obtain high-resolution and high-accuracy hydrological products [111][112][113][114]. However, this study did not aim to compare different GRACE outputs, but to compare various hydrological and climate models using one GRACE dataset as a reference.

Summary and Conclusions
In this study, we analysed linear trends of TWS and GWS as well as seasonal and non-seasonal TWS and GWS variations for the two main river basins in Poland (Vistula and Odra). We compared temporal changes in TWS observed by the GRACE twin satellites with corresponding changes representing the sum of SM and SWE derived from four GLDAS hydrological models and six CMIP5 climate models. We also compared the measurements from groundwater monitoring wells with GWS variations obtained by subtracting the sum of modelled SM and SWE from TWS derived from GRACE.
The general results were similar for the Odra and Vistula basins. For both the TWS and GWS, the trends were relatively small and generally did not exceed ±1 cm/year, and the differences between GRACE-based and model-based TWS trends, as well as between wells-based and GRACE-model-based GWS trends, did not exceed 1.5 cm/year. We also found a correspondence between an increase in GWS and TWS with large floods in Poland.
We identified the models that provided the best TWS agreement with TWS observed by GRACE. For annual variations, all considered parameters (correlation coefficients, relative explained variance and RMSE), were the most satisfactory for the CLM, GFD-ESM2G, and MIROC5 models. For non-seasonal TWS variation, however, most of the GLDAS models (apart from CLM) overestimated the amplitudes of water storage observed by GRACE. However, they agreed well with GRACE in terms of phases, correlations, variance explained, and RMSE. For climate models, however, the situation was less clear since there are major discrepancies among models. Moreover, none of the CMIP5 simulations provided satisfyingly high correlation coefficients, RMSE or relative explained variance with GRACE estimates.
We also aimed to identify which models, in combination with GRACE measurements, could provide information on GWS that would be the most consistent with observations from the groundwater monitoring wells. However, we did not find a noticeable correspondence between ground-based measurements and GRACE-model determinations as GWS from monitoring wells produced much stronger amplitudes for both basins. For annual GWS changes, the most satisfactory correlations, relative explained variance and RMSE for both Vistula and Odra basins were found for the CLM, FGOALS-g2, and MPI-ESM-LR models. The same models, along with GFDL-ESM2G, showed that they produced the non-seasonal GWS series that were most compatible with in-situ groundwater measurements.
Our results indicated that when comparing TWS values, better compliance with GRACE data was obtained for GLDAS than for CMIP5 models. However, the GWS analysis showed better consistency of climate models with the well results. The study provided important information that may be used in water management in Poland. The results can be useful for selecting an appropriate model that, in combination with global GRACE observations, would provide information on groundwater storage changes for regions in which well measurements are not available or unsatisfactorily accurate. Identification of models that agree well with direct observations is useful in the context of future use of these models for TWS and GWS simulation and prediction. We showed that for the area of Polish river basins, the most appropriate model for this purpose is GLDAS CLM, because it was the only model that provided significant agreement with 1) GRACE in terms of TWS, and 2) well measurements for GWS. Other GLDAS performed well only for TWS comparisons, while CMIP5 models performed well only for GWS analyses. However, it should be kept in mind that CLM is not perfect as it suffers some drawbacks, resulting from weak signal and visibly smaller amplitudes than for wells. We noted that, both models and terrestrial data should be improved in order to improve the consistency between modeled and ground-based GWS.
Finally, it should be noted that no noticeable groundwater storage gain or depletion was observed in Poland. It may be the case that if GWS changes were investigated in areas affected by this problem (such as the Indian Peninsula, California Central Valley, or the Middle East), strong negative GWS trends may be detected for both terrestrial and GRACE-model data and the results for different data sources might be more consistent.
Future researches should be focused on comparing and validating modeled TWS and GWS with estimates from another remote sensing data types, such as satellite altimetry [38], evapotranspiration data from MODIS (Moderate Resolution Imaging Spectroradiometer) [115], precipitation data from CHOMPS (Cooperative Institute for Climate Studies (CICS) High Resolution Optimally interpolated Microwave Precipitation from Satellites) [116] and TRIMM (Tropical Rainfall Measuring Mission) [117], soil moisture from SMOS (Soil Moisture and Ocean Salinity) [118] and others.