RADOLAN_API: An Hourly Soil Moisture Data Set Based on Weather Radar, Soil Properties and Reanalysis Temperature Data

Abstract: Soil moisture is a key variable in the terrestrial water and energy system. This study presents an hourly index that provides soil moisture estimates at high temporal and spatial resolution (1 km × 1 km). The long-established Antecedent Precipitation Index (API) is extended with soil-characteristic and temperature-dependent loss functions. The Soilgrids and ERA5 data sets provide the controlling variables. Precipitation, the main driver, is provided by the German weather radar data set RADOLAN. Empirical variables in the equations are fitted in an optimization effort using 23 in-situ soil moisture measurement stations from the Terrestrial Environmental Observatories (TERENO) and a separately conducted field campaign. The volumetric soil moisture estimates show a mean ubRMSD of 3.45 Vol% between RADOLAN_API and station data, with high temporal accordance, especially for soil moisture upsurge. Further potential of the improved API algorithm is shown with a per-station calibration of the applied empirical variables. In addition, the RADOLAN_API data set was spatially compared to the ESA CCI soil moisture product, with which it demonstrates good overall agreement. The resulting data set is provided as open access data.


Introduction
Soil moisture plays a key role in the interaction of different land surface processes and energy fluxes [1][2][3]. It controls processes like evaporation, infiltration and runoff, hence the fundamentals of the hydrological cycle [4,5]. Therefore, soil moisture influences hazards of different sorts, e.g., the extent or magnitude of floods [6]. But also temperature or precipitation extremes relate to soil moisture state and memory [7]. Soil moisture further is the main governing resource in relation to ecosystem function and form as it provides water for nutrient uptake and transpiration [8]. With that, net biome productivity and hence CO2 fluxes are strongly linked to soil moisture variability [9]. Consequently, soil moisture has been classified as an essential climate variable (ECV) by the World Meteorological Organization's Global Climate Observing System (WMO, GCOS) [10].
Today, measurement techniques for soil moisture are available across scales. Diverse approaches allow measurement of soil moisture at point scale, e.g., gravimetric measurements, Time Domain Reflectometry (TDR) and Frequency Domain Reflectometry (FDR), but also the attenuation of the Global Positioning System (GPS) signal [11,12]. Ground penetrating radar [13,14] or cosmic-ray neutron measurements, which cover bigger footprints, may also be used in mobile sampling applications and hence bridge the gap to field-scale measurements of the available subsurface soil water storage [11]. Sparse station measurements combined with modeling allow for regional-scale soil moisture estimates [15]. In a great effort, the International Soil Moisture Network (ISMN) collects, harmonizes and provides such sparse soil moisture measurements, consisting of data streams from a multitude of individually operating networks [16].
Besides these on-site measurements, remote sensing offers retrieval opportunities for large-scale, spatially distributed soil moisture estimates. Imagery ranges in spatial coverage and resolution from regional UAV-based sensing [17] to satellite-based global soil moisture products at one single acquisition date [18,19]. Various constellations including thermal infrared, optical and microwave satellite sensor systems can be utilized to retrieve soil moisture [20][21][22][23]. Optical and thermal remote sensing allows for soil moisture estimation via the thermal-optical triangle method (TOTRAM), which interprets the combination of pixel-based values of land surface temperature and a vegetation index, or the optical trapezoid model (OPTRAM), which instead utilizes the physical relationship between shortwave infrared transformed reflectance and soil moisture [21]. However, given the limitations regarding cloud cover, a continuous stream of satellite-sensed soil moisture is only possible with active and passive microwave sensors like the Advanced Scatterometer (ASCAT) [24] or the Soil Moisture Active Passive (SMAP) mission [25,26]. Various algorithms for these platforms have been developed [18,27-32]. The spatial resolution of such data sets and retrieval algorithms is usually rather coarse, at tens of kilometers [26]. Several years after the launch of the Sentinel-1 satellites, operationally provided data sets utilize the high-resolution active microwave data, and soil moisture data sets with a higher spatial resolution of up to 1 km × 1 km are available [33,34]. The temporal resolution, however, depends on the revisit time of the satellites, which allows for daily or half-daily data points only [26]. Yet, soil moisture data at very high resolution is sought after by different scientific communities [26]. Sub-daily data is needed to account for the highly volatile nature of soil moisture [35].
So far mostly point measurements can provide this level of temporal resolution. An hourly, continuously available spatial data set at high resolution is lacking.
In comparison, spatially distributed measurements of precipitation are available in high temporal resolution. Precipitation is the main driver for soil moisture changes in the majority of biomes [36]. There is a multitude of different precipitation measurement options [37]. Ground-based estimates range from the long established procedure of direct point measurements using rain gauges to more sophisticated methods like weather radar estimates which also deliver spatially distributed precipitation amounts [38,39]. From that, gridded precipitation products based on gauge measurements alone are developed to deliver spatial coverage [40], e.g., the global land-surface precipitation products of the Global Precipitation Climatology Centre (GPCC) [41,42]. Furthermore, mostly on national scale, ground-based weather radar precipitation data is further improved via coupling with point measurements in the effort to derive an improved, gauge-adjusted version of the spatial precipitation data set [43][44][45][46]. For several decades also satellite systems have been used for atmospheric observations [37]. Geostationary satellites carrying visible/IR sensors and low earth orbit platforms that utilize active and passive microwave imaging systems are in use [37,47]. These data sets mostly comprise a whole constellation of satellites, e.g., NASA's Global Precipitation Measurement Mission (GPM) [48] or the PERSIANN-Climate Data Record (CDR) program which aggregates different satellite data streams and the Global Precipitation Climatology Project (GPCP) using artificial neural networks [49,50]. Reanalysis data sets like the fifth generation reanalysis data set by the European Center for Medium Range Weather Forecast (ECMWF, ERA5) [51] and sophisticated merging schemes like the Multi-Source Weighted-Ensemble Precipitation (MSWEP) [52] strive to further improve quality of gridded precipitation data sets. Still, there are flaws and weaknesses in all estimation and aggregation methods. 
Direct measurements of rainfall are error-prone concerning wind effects [53] and most regions lack or lose a sufficient amount of rain gauges [54]. Sun et al. [37] also demonstrate big differences and hence uncertainty in satellite and reanalysis data sets.
Still, precipitation values from such data sets correlate with changes in soil moisture [55]. Considering the need for sub-daily soil moisture estimates at relatively high spatial and temporal resolution, this study introduces a precipitation-based soil moisture data set. For that, we employ the German gauge-adjusted weather radar system RADOLAN [44] to derive a modified version of the Antecedent Precipitation Index (API) that can be used directly as a soil moisture data set. This allows exploitation of the high sampling rate of the weather radar and also provides spatially distributed, quality-controlled precipitation estimates.
Kohler and Linsley [56] introduced the concept of the API to link runoff to antecedent soil moisture conditions, which since then has been applied in varying form in several studies: research on natural hazards consults the API for different applications, e.g., in linking antecedent moisture conditions to bush fires in Australia [57] or investigating the effect of antecedent precipitation on landslides [58]. The API is still used to supplement rainfall-runoff transformation modeling [59][60][61], and subsequently it is used to support flash flood warning at ungauged locations using radar precipitation data in France [62] and Morocco [63]. But the API is not only utilized to estimate surface discharge but also to help with soil moisture assessments via exploiting the relationship to precipitation. Crow et al. [64] and Crow [65] show that errors in precipitation estimates can be evaluated with the API and also that soil moisture retrievals can be improved using the dependence of the two variables. Zhao et al. [66] discuss supplementing relative soil moisture estimates with the API. Recent studies suggest using the API directly or in conjunction with geostatistical methods to derive soil moisture estimates [67][68][69]. The API algorithm used in this study expands upon the work of Pellarin et al. [70] who in recent studies used the API in an assimilation scheme to provide a near-realtime precipitation product [71]. The proposed improved API algorithm additionally incorporates temperature data and information on soil texture composition to allow for individual dry-down rates at distinct locations. Together with higher temporal and spatial resolution precipitation input data the authors aspire to better match local traits of the course of soil moisture in terms of drydown rates and volatility after rainfall events. 
The API is calculated in a temperate region in this study, as compared to previous investigations that mostly apply the idea in more arid environments [68,70]. This increases the complexity since modeling the seasonality of soil moisture does not allow for a full dry-down of the soil column in a dry season like it has been demonstrated in these other studies.
In this article we address the question of whether this improved empirical soil moisture index based on the antecedent precipitation index (RADOLAN_API) is capable of reproducing the course of local soil moisture measurements throughout Germany. Of particular interest are the timely detection of soil moisture upsurge, the dry-down rates, and the representation of seasonality, including the summer depletion typical of the temperate region. Specifically, we investigate whether the proposed hourly soil moisture product matches local measurements within the defined but disputed error margin of 4 Vol% soil moisture [10,72]. Furthermore, we test a version of the developed RADOLAN_API data set with locally fitted empirical variables against the soil moisture station data using the same threshold of 4 Vol% defined by GCOS.
The characteristics and capabilities of the RADOLAN_API are assessed statistically on different time scales by evaluation against in-situ measurements and spatial comparisons against the ESA CCI soil moisture product. We calculate performance and error metrics in terms of correlation, bias and differences between RADOLAN_API time series and station data. Moreover, we compare the API with the satellite product on a pixel-by-pixel basis and lay out our findings in the upcoming sections. This article also serves as file descriptor for the RADOLAN_API data set, which is openly available [73]. Appendix A Table A1 provides a summary table on the file characteristics.

Precipitation Data Set
Precipitation data forms the basis for the calculation of the proposed soil moisture data set and heavily impacts the final product. Therefore, this study uses quality-controlled weather radar data of the German Weather Service (DWD, Deutscher Wetterdienst) for the years 2015 to 2019. Specifically, the publicly available RADOLAN RW (Radar Online Adjustment) product is used to meet the requirements of high spatial, temporal and radiometric resolution [74,75].
This weather radar data set holds precipitation estimates that are adjusted with gauge measurements [74]. The quality controlled precipitation sums are available at temporal, spatial, and intensity resolution of 1 h, 1 km, and 0.1 mm [44]. With that, RADOLAN delivers input and reference data for high-resolution hydrological modeling [76], rain type modeling [77], estimation of spatio-temporal variability of soil erosion [78] and (flash) flood modeling [79,80] as well as ground truth data for machine learning applications [81]. The polar-stereographic composite grid with the center point at 9.0° E, 51.0° N covers the whole state territory of Germany [75,82].
All of the included C-Band weather radar stations operate on scanning intervals of 5 min and an approximate coverage of each device of a radius of 150 km. Significant overlap within the dense network ensures accurate retrievals by minimizing problems due to dampening in the signal that occur with increased distance from each sensor [83]. Furthermore, within the automatic calibration procedure, rain intensity-adapted Z-R relationships (empirical formula to estimate rainfall rates from radar reflectivity signal strength) are applied. The correction for radar artifacts contains filtering for statistical clutter and consideration of orographic shadowing effects [74,75].
Nevertheless, for a realistic estimation of the quantity of precipitation, measurements of approximately 1300 conventional stations are used for the operational hourly gauge adjustment routine [84]. These sensors basically work like "Hellmann" ombrometers [85], which comply with the standards of the World Meteorological Organization [86]. The application of a weighing principle and ambient-temperature-dependent heating sets the utilized devices apart from conventional measurement systems and allows capturing solid and liquid precipitation alike [75].
To derive precipitation from radar backscatter values assumptions on the drop size distribution and droplet count are necessary [83]. RADOLAN uses an extended Z-R relationship, that considers the absolute reflectivity and horizontal gradients to distinguish between typical convective and stratiform droplet distributions [75]. In wintertime, the effects of overshooting due to lower cloud heights become more prominent in weather radar systems. These shortcomings are accounted for with a seasonally-dependent correction via a regression analysis. In order to mitigate erroneous adaptation at single extreme precipitation events, e.g., intensive convective cells that occur regularly throughout Germany in summer, DWD applies a multiple polynomial regression to generate the correction factors for every pixel. Respective scanning height class, day of year, and reflectivity are therefore taken into account [83]. The weather radar data shows good agreement with NASA's Integrated Multi-satellitE Retrievals for the Global Precipitation Measurement Mission (IMERG, GPM) satellite precipitation data set for the vegetation period [39]. This makes the data set a very good candidate as input data for the calculation of the soil moisture index with a high potential of transferability. Throughout this study, the data set will be referred to as "RADOLAN".
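To make the Z-R step concrete, the following minimal sketch inverts a power law of the form Z = a·R^b for the rain rate R. The coefficients a = 200 and b = 1.6 are the classic Marshall-Palmer values and are used here purely as an illustrative assumption; they are not the intensity-adapted coefficients of the RADOLAN procedure, and the function name is hypothetical.

```python
def rain_rate_from_dbz(dbz: float, a: float = 200.0, b: float = 1.6) -> float:
    """Invert Z = a * R**b for the rain rate R in mm/h.

    dbz is the radar reflectivity in dBZ. The defaults a and b are the
    Marshall-Palmer values (an assumption for illustration only, not the
    coefficients used operationally for RADOLAN).
    """
    z_linear = 10.0 ** (dbz / 10.0)   # dBZ -> linear reflectivity (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)


for dbz in (20.0, 30.0, 40.0):
    print(f"{dbz:.0f} dBZ -> {rain_rate_from_dbz(dbz):.2f} mm/h")
```

Because the relation is a power law, a fixed error in reflectivity translates into a multiplicative error in rain rate, which is one reason why a gauge adjustment step is valuable.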

Soil Properties
Data from the Soilgrids project [87], released by the International Soil Reference and Information Center (ISRIC), provides the information on soil properties in this study. Local clay and sand content are utilized to shape the dry-down rates of the modeled soil moisture at any given pixel. Soilgrids is a globally complete soil information data set with 250 m spatial resolution and 6 layers. Based on machine-learning-driven algorithms that account for environmental co-variables and soil profile data, the Soilgrids data set predicts soil type and a multitude of physical and bio-chemical soil properties, e.g., distribution of soil compartments, bulk density, pH [87,88]. Soilgrids data is widely adopted by the scientific community, e.g., for generating European and global soil hydraulic databases [89][90][91] and as auxiliary variables in downscaling algorithms [91].

Temperature Data Set
This study uses the ERA5 single-level air temperature data set (t2m) generated by the ECMWF and published by the Copernicus Climate Change Service Climate Data Store (CDS) [51,92]. The ERA5 atmospheric reanalysis data set provides climate variables at hourly resolution on a global scale, currently covering the period from 1979 to present, with a preliminary back extension from 1950 onward. To this end, observation data is combined with model data through data assimilation in a consistent manner that respects the laws of physics [92]. For provision at the CDS, the ERA5 data is interpolated to a regular 0.25° × 0.25° grid, which for this study was further bilinearly interpolated to the RADOLAN grid. Albergel et al. [93] show that using ERA5 data as atmospheric forcing in land surface model simulations significantly improves the representation of land surface variables when compared to the predecessor ERA-Interim. Furthermore, other studies find a systematically reduced bias in ERA5 temperature and precipitation data when compared to the previous version [94,95] and indicate the high usability of the reanalysis data set in high-accuracy and high-resolution modeling scenarios [96,97].
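The regridding step can be illustrated with a small, self-contained sketch of bilinear interpolation on a regular latitude/longitude grid. The function name, the toy temperature field and the target coordinates are hypothetical; the projection of the polar-stereographic RADOLAN grid to geographic coordinates is omitted here, and an operational workflow would more likely rely on an established gridding library.

```python
def bilinear(grid, lat0, lon0, step, lat, lon):
    """Bilinearly interpolate grid[i][j] at the point (lat, lon).

    grid[i][j] holds the value at latitude lat0 + i*step and longitude
    lon0 + j*step (a regular grid, e.g. 0.25 degree spacing).
    The target point must lie inside the grid.
    """
    fi = (lat - lat0) / step
    fj = (lon - lon0) / step
    i = min(int(fi), len(grid) - 2)       # lower-left cell index, clamped
    j = min(int(fj), len(grid[0]) - 2)
    wi, wj = fi - i, fj - j               # fractional position in the cell
    south = grid[i][j] * (1 - wj) + grid[i][j + 1] * wj
    north = grid[i + 1][j] * (1 - wj) + grid[i + 1][j + 1] * wj
    return south * (1 - wi) + north * wi


# Toy 2 m temperature field in kelvin on a 0.25 degree grid (assumed values).
t2m = [[280.0, 281.0],
       [282.0, 283.0]]
print(bilinear(t2m, 51.0, 9.0, 0.25, 51.125, 9.125))
```

Each target value is a distance-weighted average of the four surrounding coarse cells, so the interpolated field is smooth but carries no information finer than the 0.25° source grid.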

Calibration and Validation Data Sets
Different data sets are used to calibrate and validate the RADOLAN_API data set. Soil moisture data from TERENO networks and a self-conducted field campaign at the Wallerfing test site are used for calibration of the necessary variables in the API equation and for validation on point data. The ISMN database provides data from the TERENO Eifel/Lower Rhine Valley (TERENO-Rur) site [11,16,98,99]. To increase the number and diversity of in-situ measurement stations in terms of soil composition, stations from the TERENO Northeast German Lowland Observatory (TERENO-NE) site are incorporated in this study [100]. Furthermore, the self-conducted field campaign holds data for six measurement stations on agricultural fields throughout the growing period of 2017. The data at each site (A2, A4, A6, P2, P4, P6) represents the respective average of five Echo EC-5 probes at 5 cm depth, corrected for diurnal cycle fluctuations.
In total, 23 measurement stations are used for calibration and validation. Figure 1 shows (a) the individual soil composition information derived from Soilgrids data and (b) the location of the test sites. Table 1 gives information on the stations' setup and their assignment to calibration and validation classes. The stations represent the typical range of soil compositions in Germany. For spatial evaluation of the RADOLAN_API data set the study uses the European Space Agency's Climate Change Initiative (ESA CCI SM) combined data set in version 4.7 [101][102][103]. The combined data set is derived through a multi-sensor merging approach that uses both active and passive publicly available Level 2 satellite products [102]. The authors chose this data set as evaluation reference because of its wide usage in the scientific community, which makes it well established, with well-known strengths and weaknesses [104][105][106][107][108][109]. The ESA CCI SM data set is bilinearly interpolated to the RADOLAN grid for interoperability and comparability.

Antecedent Precipitation Index
The basic idea of the antecedent precipitation index (API) is to take a certain number of preceding time steps and include the respective rainfall amounts in the current time step with a time-dependent diminishing factor. Equation (1) shows the API as formulated by Kohler and Linsley [56]:
$$\mathrm{API}_t = \gamma \, \mathrm{API}_{t-1} + P_t \qquad (1)$$
with API_t and API_{t−1} being the index value at time steps t and t − 1 respectively, γ being the diminishing factor and P_t representing the precipitation amount at the current time step t.
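The recursion described above, in which the previous index value is diminished by γ and the current precipitation is added, can be sketched in a few lines. The function name, the initial value and the rainfall series are hypothetical; γ = 0.9 is chosen only for illustration.

```python
def classic_api(precip, gamma, api0=0.0):
    """Return the API series for a precipitation series (mm per time step).

    Each step diminishes the previous index value by gamma
    (0 < gamma < 1) and adds the precipitation of the current step.
    """
    series = []
    api = api0
    for p in precip:
        api = gamma * api + p
        series.append(api)
    return series


rain = [0.0, 5.0, 0.0, 0.0, 2.0]   # hypothetical hourly sums in mm
print(classic_api(rain, gamma=0.9))
```

With a constant γ the index decays exponentially between rain events, regardless of temperature, soil texture or saturation state, which is the limitation the extended formulation in this study addresses.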
Many variants of this formula exist that allow using the API as a soil moisture proxy. Crow et al. [64], for example, apply a simple cosine-based loss function that controls the summer depletion as a replacement for γ. However, several individual processes contribute to the reduction of soil water content represented by this diminishing factor. In this study, the antecedent precipitation index algorithm proposed by Pellarin et al. [70] is extended with additional dampening factors in an attempt to improve the empirical representation of the locally dominant processes that control soil moisture loss. Temperature values from the ERA5 reanalysis data set [51,92] are used as proxy information for the upward water loss from the soil column through evapotranspiration (Equation (3)). With that, sub-daily variations in water loss can be represented instead of, e.g., applying seasonally varying loss factors. Including this extra data might be considered excessive for a simple soil moisture index. However, this procedure provides better guidance on temporal scales of multiple days or weeks and hence allows, e.g., hot dry spells in summer or an earlier onset of winter to be better accounted for. Local saturation state and soil properties control the amount of gravity-driven drainage of soil moisture to the lower soil compartments through percolation (Equation (4)). Together, factors a and b reduce the amount of soil water from time step t − 1 to yield the current-state soil moisture index API at time step t (Equation (2)), where θ_sat = maximum saturation, θ_res = residual saturation, P_t = precipitation [mm], d = depth [mm], T = temperature, α = temperature scaling factor, clay = clay content [%], β = clay scaling factor and γ ≥ 1 regulates the peak outflow. Figure 2 shows a graphical representation of the loss scaling factors a and b for different overall settings. For this study, β is fixed to 0.05 due to computational restrictions.
The saturation state in Equation (4) is calculated from maximum saturation θ_sat and residual saturation θ_res. The respective values are derived from empirical relationships of the Interaction Soil Biosphere Atmosphere (ISBA) model [110,111], incorporating local sand content. Information on soil texture composition is derived from the Soilgrids [87] data set. The intent behind factor a is to allow quick outflow if the soil is near saturation. Loss factor b is responsible for loss due to temperature but also takes clay content and an empirical scaling factor into account. In this study, we differentiate between the API, which uses empirical scaling factors optimized across all station data, and the local API (lAPI), which uses per-station optimized empirical scaling factors α and γ. Both indices take the respective local soil characteristics and temperature data into account.
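As an illustration of a saturation state bounded by residual and maximum saturation, a common way to express it is the relative saturation S = (θ − θ_res)/(θ_sat − θ_res). That this exact form enters Equation (4) is an assumption made here for illustration; the function name and the numeric values are hypothetical and are not derived from the ISBA relationships.

```python
def relative_saturation(theta, theta_res, theta_sat):
    """Scale volumetric soil moisture to 0 (residual) .. 1 (saturated)."""
    if theta_sat <= theta_res:
        raise ValueError("theta_sat must exceed theta_res")
    s = (theta - theta_res) / (theta_sat - theta_res)
    return min(1.0, max(0.0, s))   # clip values outside the physical range


# Hypothetical values in m3/m3 (assumed, not taken from the study).
print(relative_saturation(0.25, theta_res=0.05, theta_sat=0.45))
```

A loss factor that depends on such a normalized state can react strongly close to saturation and weakly in dry soil, independent of the absolute porosity of a site.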

Calibration and Validation Procedure
The coupling of precipitation information from the RADOLAN weather radar with distributed soil information and temperature data makes the retrieval of spatio-temporal API values possible. The empirical approach of the calculation implies that the variables α and γ contributing to the loss factors a and b need to be adjusted for best results. Due to the lack of a valid spatially distributed ground-truth data set and computational limits the respective parameters are set constant throughout space and time for the three dimensional API calculation, which is the data represented in the RADOLAN_API data set. However, to highlight the general capacities of the advanced API algorithm itself, a local optimization of the empirical parameters (lAPI) for the single calibration sites is conducted as well. These lAPI realisations on point scale also include local temperature and soil information data but furthermore individually adjusted α and γ values (Table A2).
Three main measures are used for the evaluation of the proposed data set: bias, unbiased root mean square difference (ubRMSD) and Pearson's R linear correlation coefficient.
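For reference, the three measures can be computed as in the following sketch, which uses the standard definitions: bias as the difference of the means, ubRMSD as the RMSD after removing each series' mean, and Pearson's R as the normalized covariance. The function name and the sample values are hypothetical.

```python
import math


def evaluation_metrics(estimate, reference):
    """Return (bias, rmsd, ubrmsd, pearson_r) for two equal-length series."""
    n = len(estimate)
    if n != len(reference) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    bias = mean_e - mean_r
    rmsd = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / n)
    # Unbiased RMSD: remove each series' mean before differencing.
    ubrmsd = math.sqrt(sum(((e - mean_e) - (r - mean_r)) ** 2
                           for e, r in zip(estimate, reference)) / n)
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimate, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    pearson_r = cov / math.sqrt(var_e * var_r)
    return bias, rmsd, ubrmsd, pearson_r


api = [22.0, 25.0, 30.0, 27.0]       # hypothetical API values in Vol%
station = [20.0, 24.0, 27.0, 25.0]   # hypothetical measurements in Vol%
bias, rmsd, ubrmsd, r = evaluation_metrics(api, station)
print(f"bias={bias:.2f} RMSD={rmsd:.2f} ubRMSD={ubrmsd:.2f} R={r:.2f}")
```

The three error measures are linked by RMSD² = bias² + ubRMSD², so a data set with a large systematic offset can still show a small ubRMSD if it follows the temporal dynamics well.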
Quality flags of the ISMN and TERENO data sets are respected for calibration and cross-validation. ESA CCI SM data also ships with quality flags. Those flags that indicate snow or cold temperatures are applied to all data sets in the point-scale evaluation (API, station measurement data and ESA CCI SM data) and also in the spatial evaluation of the RADOLAN_API against ESA CCI SM data. In addition, a seven-day rolling window is applied to help exclude days after or between freezing periods. The remaining quality flags, which concern the data quality of ESA CCI SM itself, are applied only to the satellite data set.
Soil moisture data of the TERENO network and the conducted field campaign is used to calibrate the empirical parameters of the API formula that determine the effect of loss factors a and b in Equation (2). The respective stations only provide data for a specific time span and hence calibration was done for the particular available period while omitting a 14 day warm-up period.
Calibrating the empirical variables of the API required iteratively calculating the described hourly soil moisture index for each of the 23 reference stations. The measurement depth and local soil composition of each installation are taken into account. The applied optimization procedure evaluates the calculations against measured soil moisture station data. For the API (RADOLAN_API) version, the minimization target is defined as the mean RMSD across all stations against the measured data. The empirical variables α and γ are optimized iteratively and finally selected based on the outcome of the procedure. For the individually optimized lAPI on point scale, the variables are optimized on a per-station basis. In both variants, the whole available time series was used at each point with no masking besides respecting the warm-up period. The Nelder-Mead algorithm [112] is used to minimize the respective target variable, the (mean) RMSD. Gao and Han [113] state that this is the most widely applied direct search method for unconstrained optimization problems and further improved the algorithm for solving problems more efficiently in high dimensions.
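The calibration idea, minimizing the mean RMSD across stations with a Nelder-Mead simplex search, can be sketched with a toy problem. Everything below is an illustrative assumption: the two-parameter toy model (`decay`, `gain`) merely stands in for the API with α and γ, the "stations" are synthetic, and the compact simplex search is a minimal textbook variant rather than the implementation used in the study.

```python
import math


def simulate(rain, decay, gain):
    """Toy index: decay the previous state, then add scaled precipitation."""
    state, series = 0.0, []
    for p in rain:
        state = decay * state + gain * p
        series.append(state)
    return series


def rmsd(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def nelder_mead(f, start, step=0.1, tol=1e-10, max_iter=1000):
    """Minimal Nelder-Mead simplex search; returns the best point found."""
    n = len(start)
    simplex = [list(start)]
    for k in range(n):                      # initial simplex around start
        point = list(start)
        point[k] += step
        simplex.append(point)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = [sum(p[k] for p in simplex[:-1]) / n for k in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):            # try to expand further
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):   # better than the second worst
            simplex[-1] = reflect
        else:                               # contract towards the centroid
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:                           # shrink towards the best point
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)


# Two synthetic "stations": observations generated with known parameters.
rains = [[0, 5, 0, 0, 2, 0, 0, 0, 8, 0, 0, 0],
         [3, 0, 0, 6, 0, 0, 1, 0, 0, 0, 4, 0]]
observed = [simulate(r, 0.9, 0.5) for r in rains]


def mean_rmsd(params):
    """Minimization target: RMSD averaged over all stations."""
    decay, gain = params
    return sum(rmsd(simulate(r, decay, gain), o)
               for r, o in zip(rains, observed)) / len(rains)


start = [0.7, 0.8]
best = nelder_mead(mean_rmsd, start)
print(best, mean_rmsd(best))
```

A per-station calibration in the spirit of the lAPI would run the same search once per station with that station's RMSD as the target, instead of the mean across stations.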
For validation purposes the soil moisture stations are randomly split and assigned to two sets, I and II (Table 1), to carry out a standard cross-validation. Hence, the calibration approach described above is repeated for the respective calibration subset of stations and the resulting empirical variables are used to validate the remaining subset against the appropriate soil moisture station data. Table A2 gives an overview of respective α and γ values for the overall and local individual optimization as well as the calibration sets.
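A two-fold split of this kind can be sketched as follows. The station names, the random seed and the resulting set sizes are placeholders; the actual assignment of the 23 stations is the one documented in Table 1.

```python
import random


def two_fold_split(stations, seed=0):
    """Randomly assign stations to two sets of (nearly) equal size."""
    shuffled = list(stations)
    random.Random(seed).shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]


stations = [f"station_{i:02d}" for i in range(23)]   # placeholder names
set_1, set_2 = two_fold_split(stations)

# Run I calibrates on set I and validates on set II; run II swaps the roles.
runs = {"I": (set_1, set_2), "II": (set_2, set_1)}
for name, (calibration, validation) in runs.items():
    print(name, len(calibration), "calibration /", len(validation), "validation")
```

Because every station is used for validation exactly once, the combined validation metrics cover all stations while remaining independent of the parameters fitted on them.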
This study also evaluates the API spatially against ESA CCI data on a pixel-by-pixel basis. In addition to the overall comparison, a monthly and seasonal summary is compiled to provide insight into the temporal dependence of the performance. Furthermore, the locally calibrated lAPI time series at the measurement sites are compared to the measurements. These results aim to show the adaptability of the API to local circumstances.
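A seasonal summary of this kind amounts to grouping values by meteorological season (DJF, MAM, JJA, SON) before averaging. The sketch below assumes per-date metric values as input; the function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

SEASONS = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def seasonal_mean(samples):
    """Average (date, value) pairs per meteorological season."""
    groups = defaultdict(list)
    for day, value in samples:
        groups[SEASONS[day.month]].append(value)
    return {season: sum(v) / len(v) for season, v in groups.items()}


# Hypothetical per-day ubRMSD values in Vol%.
samples = [(date(2016, 1, 15), 3.0), (date(2016, 4, 2), 1.5),
           (date(2016, 5, 20), 2.5), (date(2016, 7, 9), 4.0)]
print(seasonal_mean(samples))
```

Seasons without any unmasked samples simply do not appear in the result, which mirrors the situation where masking removes many winter values.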

Calibration and Evaluation
The calibration for RADOLAN_API resulted in a mean ubRMSD of 3.37 Vol% with a standard deviation of 1.93 Vol% between the calculated API on point scale and the respective reference measurements. Table 3 summarizes the evaluation metrics. The spatially distributed API data set RADOLAN_API is published under a CC-BY-SA license in the form of a netCDF file [73]. The hourly temporal resolution and 1 km × 1 km spatial resolution add up to a data set with dimensions 692 × 1188 × 43,824 (latitude, longitude, time) and a file size of 20.9 GB that covers the years 2015 to 2019. A summary of the file characteristics is presented in Table A1. In the following, "API" refers to the variant of the index included in this soil moisture data set.
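The length of the time dimension follows directly from the hourly resolution and the covered period, as the short check below shows. The variable names are hypothetical; only the grid dimensions and the period are taken from the text.

```python
from datetime import datetime

start = datetime(2015, 1, 1)
end = datetime(2020, 1, 1)            # exclusive end of the 2015-2019 period
hours = int((end - start).total_seconds() // 3600)

n_lat, n_lon = 692, 1188              # grid dimensions stated in the text
cells = n_lat * n_lon * hours         # total number of grid values
print(hours, cells)
```

The period contains one leap year (2016), so it spans 1826 days; the resulting number of grid values is on the order of 10^10, which suggests that the stated file size relies on compression or compact storage.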
The evaluation of the proposed API against ESA CCI SM data and station data respects masking based on the ESA CCI SM flags as described in Section 2.3. Hence, the metric values in the evaluation differ from those obtained in the calibration process itself. The calibration is left unmasked so that the empirical variables can also account for seasonality in terms of rising soil moisture in fall and higher soil moisture values after winter. However, for a fair comparison, the flagged time spans, highlighted in grey in Figure 3, are excluded from the remaining evaluation. That figure shows the best and worst performing station of each network with respect to RMSD between API and measurement. Further insight into the exact metrics is given in Table 2, which also includes metrics for ESA CCI SM versus measurements and ESA CCI SM versus API.
Table 2. Evaluation of the overall and locally optimized API and ESA CCI SM data against local measurements, and evaluation of ESA CCI SM against the overall optimized API; masks are applied as described in the text; asterisks (*) indicate lower (ub)RMSD and higher R values of the API in comparison to ESA CCI SM data when benchmarked against the local measurements.

The overall optimized API shows a mean RMSD of 4.66 Vol% and a mean ubRMSD of 3.45 Vol%, with respective standard deviations across the 23 point results of 2.47 Vol% and 2.01 Vol%, when compared to in-situ station measurements. With that, the differences to measurement data are smaller than for the ESA CCI SM data set in the same comparison (5.99 Vol% and 4.05 Vol% respectively). However, the API shows wider spread and deviation in both RMSD and ubRMSD. The API data set reaches the highest mismatch in RMSD and ubRMSD at the station Wildenrath, with 8.87 Vol% and 6.96 Vol% respectively. This station shows extremely low soil moisture values during summer (Figure 3b), which can be traced back to the high sand and low clay content at that site (Table 1). The lowest difference is reached at station Neu Tellin with 1.84 Vol% (RMSD) and, for the unbiased comparison, at station Toitz with 1.55 Vol% (ubRMSD). Generally, the sandier stations of the TERENO-NE site perform noticeably better than stations from the other networks.
Overall, the API is as well correlated with the in-situ measurements as the ESA CCI SM data, with a mean correlation coefficient of 0.60. Here again, the API shows a higher standard deviation of 0.18 compared to 0.13 for ESA CCI SM data. As expected, the lAPI shows increased correlation with the soil moisture measurements, with a mean of 0.72 and a maximum of 0.88. This higher accordance is shown clearly in Figure 3, where the lAPI distinctly follows the summer depletion for the TERENO-Rur and TERENO-NE time series. The scatterplots demonstrate the improvements accordingly. For the Wallerfing sites, where there is no seasonality or summer depletion to follow, the lAPI shows higher and prolonged outflow. This results in a difference in RMSD of −1.15 Vol% and −6.69 Vol% and also a lowered ubRMSD, reduced by 0.16 Vol% and 0.07 Vol%, for the stations Wallerfing A2 and Wallerfing P6 respectively. The temporal accuracy and dynamics of the API and lAPI match the measurement data very well, which the Wallerfing plots (Figure 3b,c) also demonstrate clearly. This is the direct effect of the high resolution of the RADOLAN product, which directly propagates into a rise in soil moisture.

Two-Fold Cross-Validation
The stations are randomly assigned to two groups, I and II, for the cross-validation (CV) procedure. Table 1 gives the respective affiliation. Each group serves once as calibration group and once as validation group: run I of the CV uses the stations of set I as calibration data and the stations of set II as validation data, and vice versa for run II. Overall, the cross-validation shows on average very similar evaluation metrics to the overall calibration that uses all stations at once (Table 3). The mean RMSD only increases by 0.07 Vol% to 4.72 Vol%, while the standard deviation in the combined validation data set even declines.
Table 3. Calibration results of API depletion factors and averaged evaluation metrics for overall calibration and calibration in the cross-validation scheme (no masking applied).
In run I the RMSD drops, whereas in run II the RMSD of 4.99 Vol% is higher than in the overall calibration. The correlation coefficient behaves similarly: the R of both validation sets combined does not change compared to the overall calibration; however, the run I validation set outperforms the run II validation set. This difference becomes more prominent for the ubRMSD. For both validation sets combined, the error metric does not change significantly compared to the overall calibration, with 4.72 Vol% and 4.65 Vol%. But validation in cross-validation run I outperforms run II validation with 2.59 Vol% ubRMSD compared to 4.25 Vol% ubRMSD. Standard deviation values increase in run II validation accordingly. Figure 4 clearly presents this fact. Whereas RMSD and R do not deviate much from the average distribution, the ubRMSD values for run I and II distinctly differ, with a wide spread (standard deviation of 2.13 Vol%) of the metric for the validation set in run II and a compact distribution of ubRMSD for run I (standard deviation 1.28 Vol%).
The disparity is to be attributed to the random partition of stations in the two sets: every group got a majority of a distinctive set of soil types (Table 1). Run I with stations of group II as validation stations holds many of the better performing stations, when calibrated separately, mostly of the TERENO-NE site.  Table 3.
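The two-fold scheme described above can be sketched as follows. This is a simplified illustration, not the optimization used for the data set: the station names, the toy per-station optimum and the `calibrate` and `evaluate` helpers are hypothetical placeholders for the actual fitting of the depletion factors and the computation of the error metrics.

```python
import random

def two_fold_cv(stations, calibrate, evaluate, seed=0):
    """Split stations randomly into two groups; calibrate on one, validate on the other."""
    rng = random.Random(seed)
    shuffled = list(stations)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    group_1, group_2 = shuffled[:half], shuffled[half:]
    runs = {"I": (group_1, group_2), "II": (group_2, group_1)}
    results = {}
    for run, (calib_set, valid_set) in runs.items():
        params = calibrate(calib_set)  # fit on the calibration group only
        scores = {s: evaluate(s, params) for s in valid_set}  # score unseen stations
        results[run] = {"params": params, "scores": scores}
    return results

# Toy setup (hypothetical): each station has its own best-fitting depletion factor
toy_optimum = {"A": 0.90, "B": 0.92, "C": 0.95, "D": 0.97}

def calibrate(names):
    # Placeholder for the optimization: average of the per-station optima
    return sum(toy_optimum[n] for n in names) / len(names)

def evaluate(name, factor):
    # Placeholder error: distance between applied and station-specific factor
    return abs(toy_optimum[name] - factor)

cv = two_fold_cv(toy_optimum, calibrate, evaluate, seed=1)
```

Because every station is validated exactly once, the combined validation scores cover the full station set, while the per-run scores depend on which stations the random split placed in each group.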

Comparison with ESA CCI Soil Moisture Data
The API was compared to ESA CCI SM data on a pixel by pixel basis. Figure Figure A1). In the northeastern part of Germany dissimilarity of the API and ESA CCI SM data is prevalent. At the TERENO sites in this region the API outperformed the ESA CCI SM product (Table 2). A clear reflection of soil types is not to be seen in the error values (Figure 1).
Overall, the API is negatively biased compared to the ESA CCI SM product, as was also the case in the per-station comparison. In the spatial comparison, the negative bias of the API is most noticeable along the river Elbe (52.5° N, 11.7° E) and in northwestern Germany (East Frisia, 53.5° N, 7.9° E). In the Harz Mountains the API values are positively biased, which can possibly be attributed to higher precipitation amounts due to orographic rainfall.
Correlation between the ESA CCI SM and the API data set ranges between R values of 0.4 and 0.8, with lower values in the MAM season. Figure 6 depicts the spatially aggregated evaluation between ESA CCI SM and API on a monthly basis. It should be noted that, for masking reasons, only about half of the grid cells are available for the winter period (Figure A2). The monthly evaluation supports the statement of low correlation (with high standard deviation) and higher bias in wintertime, where the overall optimized API does not deliver soil moisture values as high as ESA CCI SM does. The monthly mean ubRMSD in the MAM season ranges from 1.0 to 2.0 Vol% (Figure 6 and Figure A1), which does not reflect the strong deviation from the seasons JJA and SON shown in Figure 5, owing to the discussed shift in bias.
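Why a shifting bias can stay hidden in monthly ubRMSD values may be illustrated with a short sketch. It assumes ubRMSD is computed per month after removing that month's own mean difference; the function name and all numbers are hypothetical and serve only to show the effect.

```python
import numpy as np

def monthly_metrics(months, api, reference):
    """Compute bias and ubRMSD separately for every month label."""
    months = np.asarray(months)
    diff = np.asarray(api, dtype=float) - np.asarray(reference, dtype=float)
    out = {}
    for m in np.unique(months):
        d = diff[months == m]
        bias = d.mean()
        # The bias is removed per month, so a seasonal shift does not enter ubRMSD
        out[int(m)] = {"bias": float(bias),
                       "ubrmsd": float(np.sqrt(np.mean((d - bias) ** 2)))}
    return out

# Hypothetical values: strong negative bias in January, none in July
months = np.array([1, 1, 1, 7, 7, 7])
reference = np.array([30.0, 31.0, 32.0, 20.0, 21.0, 22.0])
api = reference + np.array([-6.0, -5.0, -7.0, 0.0, 1.0, -1.0])

per_month = monthly_metrics(months, api, reference)

# ubRMSD over the pooled series, where the seasonal bias shift is not removed
pooled_diff = api - reference
pooled_ubrmsd = float(np.sqrt(np.mean((pooled_diff - pooled_diff.mean()) ** 2)))
```

In this example both monthly ubRMSD values are small, whereas the pooled ubRMSD is several times larger, because the change in bias between the months is counted as error only in the pooled series.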

Discussion
The introduced RADOLAN_API data set shows very good agreement with local soil moisture measurements in terms of volumetric soil moisture estimation, with a mean ubRMSD of 3.45 Vol% and a mean RMSD of 4.66 Vol% in the evaluation against 23 measurement stations. The unbiased error values of the hourly soil moisture data set fulfill the GCOS criterion, which specifies an error threshold of 4 Vol% for soil moisture estimates. RADOLAN_API thus shows lower mean ubRMSD and lower mean RMSD values than ESA CCI SM (4.05 Vol% and 5.99 Vol%) at the soil moisture measurement stations used (Table 2). A particularly strong argument for the weather-radar-based API is the timely increase of modeled soil moisture, which closely matches the measured upsurge. This is clearly visible in the comparison with station measurements in Figure 3c,d. Such accuracy in the change signal is a valuable characteristic that is often sought after in the modeling community [26]. Consequently, on point scale, the API outperforms ESA CCI SM data at 16 of 23 stations with regard to ubRMSD against measurement data. This might be explained by the higher spatial and temporal resolution of the original data set compared to the merged satellite product. In addition, vegetation influence is reasonably handled in the empirical loss functions of the proposed API.
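The immediate response of the index to precipitation follows from the recursive structure of an antecedent precipitation index. The sketch below shows only the classic textbook form with a constant recession factor, API_t = k · API_{t−1} + P_t, and not the extended RADOLAN_API formulation with soil- and temperature-dependent loss functions; the function name and the value k = 0.95 are illustrative assumptions.

```python
def classic_api(precip, k=0.95, initial=0.0):
    """Classic antecedent precipitation index with a constant recession factor k."""
    if not 0.0 < k < 1.0:
        raise ValueError("k must lie between 0 and 1")
    api = initial
    series = []
    for p in precip:
        # Previous state decays by k; current precipitation is added in full
        api = k * api + p
        series.append(api)
    return series

# Hypothetical hourly precipitation (mm): one event in the third time step
series = classic_api([0.0, 0.0, 10.0, 0.0, 0.0])
```

The index rises in the same time step in which precipitation occurs and then recedes gradually, so the timing of the modeled upsurge is determined by the temporal resolution of the precipitation input.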
Additionally, we introduced a locally optimized API (lAPI) that likewise considers soil information and temperature, but with locally adjusted empirical scaling variables. The seasonality of highly volatile soil moisture time series can be represented even better with the lAPI than with the overall optimized API. This is shown by the exemplary station plots in Figure 3. The correlation between modeled data and measurements increases accordingly from an R value of 0.60 (API) to 0.72 (lAPI, Table 2).
The cross-validation procedure of the API shows only small differences between the validation results of the two respective sets. On the one hand, this means that the API formula is robust. On the other hand, it makes evident the need for a local, more individual adjustment, based, e.g., on relationships between soil properties and the values actually applied for the depletion variables. Accordingly, Table 2 shows improvements in all averaged metrics (RMSD, ubRMSD and R) for the lAPI compared to the API.
Investigating different temporal resolutions by means of monthly and seasonal aggregates of error and correlation metrics is necessary to obtain unambiguous results [72]. This approach allowed the authors to identify a local non-stationary bias in the western part of Germany when conducting the spatial comparison of the API and ESA CCI SM data.
A dominant pattern of negatively biased API values is prevalent in the southern part of Germany. For these regions, the ESA CCI SM data set shows strikingly high mean soil moisture estimates (Figure A3). Missing sensor data in the constellation contributing to ESA CCI SM data can lead to differences in soil moisture estimates [101]. However, an accumulation of acquisitions that use a specific sensor combination and feature the distinct bias pattern could not be confirmed as a possible reason (Figure A3). The high values in the ESA CCI SM data set coincide with occurrences of Leptosols and Cambisols developed from material derived from limestone, marlstone and dolomite weathering [114]. In the same region, the hydrogeology is dominated by karstified or fissured Jurassic calcareous facies in the base rock [115]. The low air capacity of the effective root zone and the low soil moisture at field capacity indicated for this area do not fit the behaviour of the ESA CCI SM data [116,117]. Wagner [118] discusses unexpected backscatter effects in microwave satellite data. Increased surface roughness of dry soils containing rock fragments in the top layer might explain the very high soil moisture values in these regions [118].
Many processes and properties on earth's surface that affect the water and energy fluxes are not directly included in the proposed empirical API model that seeks to avoid the input data overhead. The two most obvious of these factors might be soil organic carbon (SOC) and vegetation cover: predominantly in dry conditions, SOC explains variance better than soil texture [119]. Furthermore, de la Torre et al. [120] show that vegetation strongly influences soil dry-down rates through evapotranspiration. Empirically modeling the manifold effects that vegetation related processes have on soil moisture in the API formula certainly holds error potential. Yet, the results at the single measurement station sites indicate that the diverse interactions were well mimicked with the applied loss functions at least for grasslands and agricultural sites.
Still, uncertainties in the utilized data sets propagate and introduce errors into the soil moisture estimation. Tifafi et al. [121], for example, point out spatial representativeness errors in the Soilgrids data set, although specifically for the modeled soil organic carbon. Inaccuracies in the RADOLAN data set also propagate directly into the API values. Good overall agreement of the weather radar data with the renowned GPM data set has been shown, but seasonal differences in performance may be the reason for the higher negative bias values in the eastern part of Germany [39]. An investigation of the effects of such deficiencies inherent in the input data has not been carried out by the authors and is beyond the scope of this article. The use of further downscaled soil texture information, as shown by Marzahn and Meyer [122], can guide the proposed API towards field-scale soil moisture estimation.

Conclusions
This study introduces the hourly weather radar data, temperature and soil information based soil moisture data set RADOLAN_API. The utilized empirical variables in the API formula are once optimized to be used in the spatial API data set RADOLAN_API, but also on a per-station basis to evaluate the adjustability of the improved API algorithm to given specific circumstances concerning interplay of soil characteristics and natural surroundings. Evaluation of the modeled soil moisture data was conducted on different temporal resolutions, covering daily, monthly and seasonal aggregations.
The API generally shows good agreement with measured data, especially in the timely detection of the onset of soil moisture increase. This characteristic stems from the high temporal resolution of the RADOLAN weather radar input data. The performance of RADOLAN_API in terms of error metrics against in-situ soil moisture data is also very good, with a mean ubRMSD of 3.45 Vol% across 23 stations, and hence complies with the GCOS threshold of 4 Vol%. The locally adjusted API achieves a ubRMSD of 2.84 Vol% against the same stations, mostly because the individual seasonality on different soils can be depicted better with local optimization of the empirical parameters in the API algorithm.
Thus, the API is capable of rendering the soil moisture development both on point scale and spatially distributed, with a focus on the detection of rapid moisture change. It has been shown that the per-station optimized API data set benefits greatly from a local optimization of the empirical variables and allows for a better representation of seasonal variability than the overall optimized API. Hence, the authors suggest establishing a relationship between soil properties and the locally adjusted empirical loss factors, e.g., through cluster analysis, in further research. The use of soil texture data of even higher spatial resolution for downscaling the API needs to be discussed. Independent of the spatial scales eventually investigated, a set of distributed empirical factors regulating the soil moisture dry-down would further improve the empirical modeling of soil moisture with the API, because locally prevailing soil conditions could be considered more individually.
Overall, integrating weather radar data into the soil moisture estimation scheme proved to be very beneficial. A soil moisture data set with high temporal resolution for Germany is now available to the scientific community.

Funding:
The project leading to this application has received funding from the European Union's Horizon 2020 research and innovation program under Grant Agreement No. 687320.

Abbreviations
The following abbreviations are used in this manuscript: