Infilling Monthly Rain Gauge Data Gaps with Satellite Estimates for ASAL of Kenya

Design and operation of water resources management systems in sub-Saharan Africa suffer from inadequate observation data. Long-running, uninterrupted time series of data are often not available for water resource planning. Incomplete datasets with gaps are a challenge for users of the data. Inadequate data compromise the results of analyses, leading to wrong inferences and conclusions in scientific assessments and research. Infilling of missing sections of data is necessary prior to the practical use of hydrometeorological time series. This paper proposes the use of Tropical Rainfall Measuring Mission satellite data as a viable alternate source of infill for missing rain gauge records. The least squares regression method, using satellite-based estimates of rainfall, was tested to fill in the missing data for 153 data points at nine rain gauge stations in the Machakos, Makueni and Kitui region of Kenya. Results suggest that the satellite rainfall estimates can be used as an alternative data source for rainfall series where the missing data gaps are large. The infilled data series were used in the development of monitoring, forecasting and drought early warning for Arid and Semi-Arid Lands (ASAL) in Kenya.


Introduction
As in other Arid and Semi-Arid Lands (ASAL) of Kenya, climatic variations have been experienced over the years in the south-eastern lowlands of Kitui, Machakos and Makueni counties. The typical approach to understanding climate variability starts with the acquisition of historical data. For rainfall, historical data provide necessary information about accumulated amounts in both time and space and form the basis for fitting and testing stochastic data-based distribution models. When historical data are unavailable in a region, or the available data are inaccurate or incomplete in a spatial or temporal sense, geophysical models can be used to 'fill in' the missing values [1]. According to Collischonn et al. [1], areal rainfall estimated from rain gauges exhibits a great deal of uncertainty where the rain gauge network is sparse. This problem is related to the uneven distribution of rain gauges across the region, which also affects the quality of the data. This paper suggests a method of improving rain gauge-based rainfall datasets by infilling gaps using remotely sensed rainfall estimates.
Generally, in the operation and model validation of meteorological data, surface observations are considered to be "the truth" [2]. Analysis of climatic systems requires data that form a complete and homogeneous series to enable generalised deduction and inference from results [3]. This is especially important for approaches that use statistical techniques based on the estimation of covariance matrices, e.g., principal component, cluster, or discriminant analysis, the canonical correlation method, and multiple linear regression [4]. In Africa in general and Kenya in particular, incomplete datasets of climatic variables are frequent, with gaps appearing in the measurement series [5]. Missing values in the data series affect the estimation of variables from the series [6] and the output of multivariate analysis techniques [7].
Hydrometeorological data analysis, such as drought assessment and forecasting, benefits from a complete dataset [8]. A possible way of minimizing the influence of missing data is to rebuild the series, filling in the gaps with estimated values. Various methods for the estimation of missing values in climatological series exist. Bareither et al. [2] evaluated the influence of replacing missing meteorological data with estimates on hydrologic predictions for a water balance model in a semiarid climate. According to Bareither et al. [2], the surrogate data technique yields modest predictions of annual water percolation that are statistically similar to percolation predicted using actual data. Aly et al. [9] evaluated deterministic and stochastic interpolation methods to fill gaps in daily precipitation records.
The simplest and most direct methods of data extension use only the data of the series that is being filled. The arithmetic mean method substitutes missing values with the mean value of the series. Although the average value of the series is not altered, its variance is reduced, which makes the method poorly suited to highly variable climatic quantities such as precipitation [10]. Other methods include the linear interpolation method and the first differences method, both of which are particularly appropriate for small temporal scales and variables with high autocorrelation [10].
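The variance reduction caused by mean substitution can be illustrated with a minimal sketch. The rainfall values below are hypothetical and are not taken from the study data; `mean_fill` is an illustrative helper name.

```python
import statistics

def mean_fill(series):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in series]

# Hypothetical monthly totals (mm); None marks a missing month.
rain = [12.0, 85.0, None, 140.0, 3.0, None, 60.0, 0.0]
observed = [v for v in rain if v is not None]
filled = mean_fill(rain)

# The mean is preserved, but the sample variance shrinks because the
# filled values add no spread while increasing the sample size.
mean_before = statistics.mean(observed)
mean_after = statistics.mean(filled)
var_before = statistics.variance(observed)
var_after = statistics.variance(filled)
```

The same sum of squared deviations is divided by a larger sample size after filling, which is why the variance of the filled series is always smaller when at least one value is substituted.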
Methodologies which use information from sites other than the station with missing data (the target station) have also been developed. These methods take into account the spatial variability of the measured variable, ignoring the temporal information in long time series [11]. Such methods include the closest station method [12], the simple arithmetic averaging method, the inverse distance method, the single best estimator method, and the normal ratio method. These methods generally underestimate the high extremes and overestimate the low extremes [13].
Another important set of approaches for gap filling in climatological series is regression methods. These methods are based on relationships within the temporal series of the variable under consideration [14]. They take into account the station's 'history' and its climatic characteristics without considering the spatial dependence of the variables. Uncertainty in climate parameters, however, originates from their stochastic nature [15], and its magnitude depends on other environmental factors intrinsic to the recorded value [16]. Spatial characteristics of the uncertainty enter the records through the station selection procedure [17] when stations other than the target station are considered. The procedures for selecting neighbour stations in regression methods use relative weighting, which allows the analysis to differ from one station to another. Regression methods have the advantage of robustness when dealing with extreme events or local effects [18]. This paper uses the least squares regression method to estimate missing data in a monthly precipitation dataset, taking the measurement uncertainty into account. It addresses the question of whether remote sensing rainfall estimates over a region can be used to infill missing data in the time series of rain gauge-based data. The Tropical Rainfall Measuring Mission (TRMM) satellite dataset was selected on the basis of its good prior performance in estimating rainfall in East Africa in particular [19,20] and in many parts of the tropics in general [21].
Errors in rain gauge measurements are fairly well understood [22], and so, except for their limited coverage, rain gauges are ideal for checking satellite estimates [22]. The use of satellite estimates to fill rain gauge measurements, on the other hand, introduces errors due to the space-time differences between the two measurement methods. While rain gauge measurements are point estimates (tens of centimetres in diameter), satellite measurements attempt to measure rain amounts over areas many kilometres in diameter around a point (the rain gauge position). Bell and Kundu [22] investigated the "noisiness" in comparisons of satellite and rain gauge estimates given the very different observational characteristics of the two. They observed that satellite measurements catch glimpses of large areas at infrequent intervals, whereas rain gauges record what happens in small areas continuously. Panet et al. [23] noted that the presence of non-negligible errors in satellite rainfall estimation presents a hurdle to fully implementing the product for a wide range of hydrologic applications. Gebregiorgis and Hossain [24], however, indicated that the quantitative picture of satellite precipitation error over ungauged regions can be effectively discerned. This paper takes into consideration the space-time scale difference between rainfall estimates based on point rain gauge measurements and satellite-based estimates.
Rain gauge data series in Machakos, Makueni and Kitui counties of Kenya for the period 2001-2011 have long-running data gaps of over two years. These data gaps, however, form less than 5% of the total length of the existing data for most of the rain gauge stations in the region. The data series are therefore worth considering for infilling in view of their importance in connecting the historical rainfall analysis and the current rainfall situation [25]. The purpose of this paper is to propose the use of TRMM data as a viable alternate source of infill for missing rain gauge records. The method of infill utilizes linear regression relationships and makes use of the records of a reference station which cover the period of interest. The paper demonstrates the use of satellite rainfall estimate data for extending rain gauge records by infilling missing gaps in a rainfall data series. The method adapts the MOVE.2 approach [26] in a variation of linear regression equations [27], which ensures preservation of the statistical parameters (mean, variance and extreme value statistics) of the infilled data series. The gamma distribution with shape parameter α and scale parameter β is often assumed to be suitable for distributions of precipitation events [28]. This distribution has been shown to be effective for the analysis of precipitation data in previous studies [29]. The gamma distribution was used in this study to confirm that the infilled data did not alter the parameters of the original series.
In this study, an attempt was made to infill 153 missing monthly rainfall data points for 9 rain gauge stations in Machakos, Makueni and Kitui counties of Kenya. This paper is organized as follows. This introduction gives the background, the problem, the objectives and the rationale of the study. The materials and methods used to address the research question, the formulation of the proposed solution, and technical details such as approaches for estimating the infilling model are detailed in Section 2. The results of the infilling process, the evaluation of model performance in infilling datasets and the related statistical tests are discussed in Section 3, followed by a summary and concluding remarks in Section 4.

Materials and Methods
This study was carried out in Machakos, Makueni and Kitui Counties of Kenya. The study area is located in the arid and semi-arid regions of the country. The area lies between latitudes 00°03′ and 3°00′ and longitudes 36°45′ and 39°12′ (Figure 1). The area receives rains twice a year, with the main rainy season occurring in October to December and the lesser rainy season occurring in March to May. The annual rainfall ranges from 500 mm in the low moorland areas to 1500 mm in the sub-humid hilltops. The seasonal rainfall is highly variable, erratic and unreliable.

DATA
The data used in this study are secondary data comprising rainfall measured by rain gauges in the study area and satellite-based rainfall estimates. The rain gauge data series comprised monthly records for the period 1961-2011 for the different stations. Only rain gauges with missing data gaps were considered in this study. Table 1 below shows the length of the rain gauge data series used in the study. TRMM is a joint mission of the U.S. National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA), designed to monitor and study tropical rainfall [31]. TRMM has a data coverage area ranging from latitudes 50° S to 50° N and a spatial resolution of 0.25° × 0.25°. The TRMM rainfall estimates are reported to be more reliable than those obtained from other satellites [32]. The TRMM data series used in this study comprised records for the period 1998-2011.

Data Analysis Approach
Two techniques of data analysis were used to achieve the objectives of this study: correlation analysis and the least squares regression method. Data from the closest TRMM grid point were compared against each respective rain gauge.
Correlation analysis was used to compare the rain gauge and TRMM data fields and to confirm the relationship between the two data series.
Table 1 shows the locations of the rain gauges matched with the corresponding grid point at which monthly TRMM data were extracted, the estimated distance between the rain gauge location and the grid point, the number of data points of missing record and the period of missing rain gauge data. The rain gauge datasets had large continuous gaps of missing data for the period 2008-2011.
The least squares regression method was used to translate the rainfall estimates provided by the TRMM data series into rainfall values for infilling the rain gauge series. The viability of the TRMM rainfall data for infilling rain gauge data gaps was first evaluated through comparison with rain gauge data for the periods in which the rain gauge datasets were complete. Descriptive statistics of station rainfall were calculated for all the TRMM cells and corresponding rain gauge stations, and compared at monthly intervals. Scatter plots of the rain gauge and TRMM data were plotted to confirm the suitability of TRMM data for infilling the rain gauge data.
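The correlation check on the complete periods can be sketched as follows. This is a minimal illustration, assuming the gauge series marks missing months with `None`; the function name `pearson_r` and the sample values are hypothetical and do not come from the study.

```python
import math

def pearson_r(gauge, trmm):
    """Pearson correlation over months where both series have a value."""
    pairs = [(g, t) for g, t in zip(gauge, trmm)
             if g is not None and t is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 concurrent months")
    mg = sum(g for g, _ in pairs) / n
    mt = sum(t for _, t in pairs) / n
    sgt = sum((g - mg) * (t - mt) for g, t in pairs)
    sgg = sum((g - mg) ** 2 for g, _ in pairs)
    stt = sum((t - mt) ** 2 for _, t in pairs)
    return sgt / math.sqrt(sgg * stt)

# Hypothetical monthly totals (mm); the second gauge month is missing
# and is therefore excluded from the comparison.
gauge = [10.0, None, 30.0, 50.0, 70.0]
trmm = [12.0, 20.0, 32.0, 52.0, 72.0]
r = pearson_r(gauge, trmm)
```

Restricting the calculation to concurrent months mirrors the comparison described above, in which TRMM is evaluated only where the rain gauge record is complete.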

Methods of Infilling Rain-Gauge Data
A variation of the linear regression method as employed by Krug et al. [33] was used in this study. The method considers that if the records for a normal climatological period of 30 years are incomplete for a desired station, then the records are extended by correlation with a nearby station using Equation (1).
$$y_1 = y_s + b\,(x_1 - x_s) \qquad (1)$$
where:
$y_1$ = the estimated value for the missing rain gauge data for the respective month;
$y_s$ = the mean value for the respective month for the TRMM dataset over its 14-year period of record (1998-2011);
$b$ = the slope of the regression line between the concurrent (1998-2011) mean values at the rain gauge station and TRMM;
$x_1$ = the 30-year (climatological standard 1971-2000) mean value for the monthly rain gauge data;
$x_s$ = the mean value for the rain gauge station for the period concurrent with TRMM.
TRMM datasets were used to estimate monthly values of rain gauge records to fill the gaps. The TRMM data at the respective grid point were used as the reference for the corresponding rain gauge, as indicated in Table 1. In this approach, least squares regression was used in an extension method following a linear form (y = bx + c), except that the coefficient "b" and constant "c" were set not to minimize squared errors but to maintain the sample mean and variance, according to Hirsch [26]. Two such linear equations that preserve the sample mean and variance are given in Hirsch [26]. These equations are labelled "Maintenance of Variance Extension, Type 1" (MOVE.1) and "Type 2" (MOVE.2).
Hirsch [34] evaluated MOVE.1 for streamflow data extension, and Parrett and Johnson [35] utilized MOVE.1 for extending streamflow gaging data in eastern Montana over a fifty-year period of record. Alley and Burns [36] and Hirsch [26] evaluated MOVE.1 and MOVE.2 for streamflow data extension, alongside least squares linear regression and linear regression plus white noise, based on criteria of sample mean and variance maintenance. These works suggest that MOVE.2 is the most effective infilling method for preserving the mean, variance and extreme order statistics of baseline data. As such, MOVE.2 was used in this study to extend the rain gauge data using TRMM monthly rainfall data.

MOVE.2 Method
Missing rain gauge data were infilled using related TRMM data following the MOVE.2 model. The TRMM values are denoted "X(I)", where "I" is an index of time (month), and the rain gauge values are denoted "Y(I)". The two sequences are described by their lengths: "N1" represents the number of concurrent rain gauge/TRMM data values used to build the regression equation. When "N1" is less than 10, there is not enough known TRMM data to build a regression relationship (since TRMM data records commence in 1998). "N2" represents the number of missing data gaps in the series. When "N2" is equal to [...] It is not necessary for the two sequences to begin or end simultaneously, nor for the observations to be consecutive [26]. The MOVE.2 infilling equation, which yields an estimate for the missing rain gauge data denoted "[yN(I)]", is given by the relationships in Equations (6)-(9) as derived by Hirsch [26].
In these equations, "m()" and "s²()" represent the mean and variance, respectively, of the series in the parentheses, and "r" represents the product moment correlation coefficient of "x1" and "y1". Thus, in the MOVE.2 method, the mean and variance estimates for "x" are based on all "N1 + N2" observations, and the mean and variance estimates for "y" (i.e., "mN(y)" and "SN²(y)" respectively) are based on the historical values of "y" and on information transferred from the "x" sequence of data. α is a coefficient.
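The MOVE.2 calculation can be sketched in code as follows. This is an illustrative implementation of the estimators as they are commonly stated after Hirsch [26], not the code used in the study; the function name `move2_infill`, the argument names and the sample values are assumptions. Here `x1` and `y1` are the N1 concurrent TRMM and rain gauge values and `x2` holds the TRMM values for the N2 gap months.

```python
import math

def _mean(v):
    return sum(v) / len(v)

def _var(v):
    """Sample variance; defined as 0 for a single value."""
    if len(v) < 2:
        return 0.0
    m = _mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def move2_infill(x1, y1, x2):
    """Estimate gauge values for the gap months from TRMM values x2."""
    n1, n2 = len(x1), len(x2)
    if n1 != len(y1):
        raise ValueError("x1 and y1 must be concurrent (same length)")
    if n1 < 10:
        raise ValueError("fewer than 10 concurrent values")
    if n2 < 1:
        raise ValueError("no gap months to estimate")
    mx1, my1, mx2 = _mean(x1), _mean(y1), _mean(x2)
    s2x1, s2y1, s2x2 = _var(x1), _var(y1), _var(x2)
    cov = sum((a - mx1) * (b - my1) for a, b in zip(x1, y1)) / (n1 - 1)
    r = cov / math.sqrt(s2x1 * s2y1)   # correlation of x1 and y1
    beta = cov / s2x1                  # equals r * s(y1) / s(x1)

    # Mean and variance of x from all N1 + N2 observations.
    x_all = list(x1) + list(x2)
    m_x, s2_x = _mean(x_all), _var(x_all)

    # Mean and variance of y, with information transferred from x.
    m_y = my1 + n2 / (n1 + n2) * beta * (mx2 - mx1)
    # (N2 - 1) * alpha^2 is kept as one term so that N2 = 1 is defined.
    a2_term = n2 * (n1 - 4) * (n1 - 1) / ((n1 - 3) * (n1 - 2))
    s2_y = ((n1 - 1) * s2y1
            + (n2 - 1) * beta ** 2 * s2x2
            + a2_term * (1 - r * r) * s2y1
            + n1 * n2 / (n1 + n2) * beta ** 2 * (mx2 - mx1) ** 2
            ) / (n1 + n2 - 1)

    # Line through (m_x, m_y) with slope s(y)/s(x), signed like r.
    slope = math.copysign(math.sqrt(s2_y / s2_x), r)
    return [m_y + slope * (x - m_x) for x in x2]

# Hypothetical values (mm): the gauge is an exact linear function of TRMM.
x1 = [5.0, 20.0, 35.0, 50.0, 80.0, 110.0, 15.0, 60.0, 95.0, 40.0]
y1 = [2.0 * x + 1.0 for x in x1]
estimates = move2_infill(x1, y1, [30.0, 70.0])
```

When the two series are perfectly correlated, as in this constructed example, the MOVE.2 line coincides with the exact linear relationship. With real data the slope differs from the ordinary least squares slope, since it is chosen to preserve the variance rather than to minimize squared errors.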

Evaluation of Infilled Data Series
Evaluation was done to examine the extent to which the MOVE.2 method would yield correct estimates of rain gauge data to infill gaps on repeated trials. Given that the MOVE.2 approach was used in the study to infill long-running data gaps, it was necessary to create long-running gaps for the purpose of evaluating the method. The procedure involved removing 12 successive points from the rain gauge data series to create running gaps in the series. The gaps were created for each station for each of the years 2007-2011. Data were removed only from months that had no gap in the original unfilled rain gauge series. If a month had a gap in the original rain gauge data series, that month was not included in this part of the analysis.
For each of the months with removed data, a value was computed following the MOVE.2 approach and used to fill the created gap. The removed months were thus replaced with MOVE.2 values, and the resulting series was used in the test of reliability. The data were removed in steps so that no more than one year (12 months) was removed at a time; each year's data were restored before the next year was removed, so that they could be used in computing the MOVE.2 values for the next removed year.
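The stepwise removal and replacement can be sketched as a leave-one-year-out loop. This is a minimal illustration; the function name `leave_one_year_out`, the dictionary layout keyed by `(year, month)` and the `estimator` callable are assumptions made for the example. The placeholder estimator used below simply returns the TRMM value and stands in for a MOVE.2 estimator.

```python
def leave_one_year_out(gauge, trmm, years, estimator):
    """Return (actual, estimated) pairs for artificially removed months.

    gauge, trmm: dicts mapping (year, month) to rainfall; a gauge value
    of None marks an original gap, which is never used for evaluation.
    estimator(training, trmm, targets) returns {key: estimate}.
    """
    pairs = []
    for year in years:
        # Months of this year that were observed in the original series.
        targets = [k for k in gauge if k[0] == year and gauge[k] is not None]
        # All other observed months remain available; the input dicts are
        # not modified, so each year is restored before the next one.
        training = {k: v for k, v in gauge.items()
                    if k[0] != year and v is not None}
        estimates = estimator(training, trmm, targets)
        pairs.extend((gauge[k], estimates[k]) for k in targets)
    return pairs

# Hypothetical data: two years of monthly values with one original gap.
gauge = {(y, m): float(m) for y in (2007, 2008) for m in range(1, 13)}
gauge[(2008, 3)] = None
trmm = {(y, m): m + 1.0 for y in (2007, 2008) for m in range(1, 13)}

def passthrough(training, trmm, targets):
    """Placeholder estimator: use the TRMM value directly."""
    return {k: trmm[k] for k in targets}

pairs = leave_one_year_out(gauge, trmm, (2007, 2008), passthrough)
```

The resulting pairs of actual and estimated values are the inputs to the error measures used in the evaluation.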

Jackknife Sampling Approach
A jackknife sampling approach was used to evaluate the effectiveness of the MOVE.2 approach in infilling rain gauge data gaps. In this simulation, the actual rain gauge data were compared with the corresponding TRMM-MOVE.2 values for the respective periods, which made it straightforward to compare the performance of imputation methods. The following notation was used in a sums of squares equation.
$Y_{ijk}$ is the rainfall value measured on a rain gauge for the ith month, jth year and kth station, and is an element of the rain gauge dataset of the station within the period 1998-2011. For the purpose of evaluation, $Y_{ijk}$ was removed from the dataset and replaced with an estimated value $Z_{ijk}$.
$\bar{Y}_{.jk}$ is the average of the rainfall values measured on a rain gauge for the ith month, jth year and kth station. It tracks the average change across all the data years as the MOVE.2 model is simulated to estimate the removed rainfall values of the respective rain gauge with subsequent replacement.
$Z_{ijk}$ is the MOVE.2 estimated value of rainfall for the ith month, jth year and kth station which was used to infill the data gap created by removing $Y_{ijk}$. $Z_{ijk}$ is thus an element of the TRMM-MOVE.2 imputed rainfall series.
Likewise, $\bar{Z}_{.jk}$ is the average value of the MOVE.2 estimated rainfall for the ith month, jth year and kth station which was used to infill the data gap created by removing $Y_{ijk}$. It tracks the average change across all the data years as the MOVE.2 values are simulated and added to the series of removed rainfall values of the respective rain gauge with subsequent replacement.

Evaluation of Errors in the MOVE.2 Estimates
An evaluation of the suitability of the MOVE.2 values for infilling the rain gauge data gaps was done. The evaluation compared samples of the original rain gauge values with MOVE.2 values for the respective sample areas. The evaluation followed 3 steps. First, a visual inspection and comparison of the non-parametric characteristics of the infilled series against the original series was done. The non-parametric comparison considered descriptive statistics such as the median, skewness, kurtosis, and minimum and maximum values, with due consideration of the influence of these statistics on the distribution of the datasets.
The second step in the evaluation considered the effect of random errors in the computation of the MOVE.2 values. Systematic errors inherent in the measurement of rainfall, whether in the rain gauge or in the TRMM data series, were not considered. An analysis of errors was used to indicate the difference between the computed MOVE.2 values and the original data. The error analysis considered two types of error: the Mean Absolute Percent Error (MAPE) and regression residuals. The errors were computed for the samples generated following the jackknife sampling approach in Section 2.3.3.
MAPE is the average of the absolute differences between the estimated values of MOVE.2 and actual rain gauge values, expressed as a percent of actual values.
The standard error of the mean (SEM) was estimated using the sample estimate of the population standard deviation. The SEM assumes statistical independence of the values in the sample and was computed by Equation (14).
$$\mathrm{SEM} = \frac{\sigma}{\sqrt{n}} \qquad (14)$$
where σ is the sample standard deviation and n is the sample size.
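The two error measures can be computed as in the following minimal sketch. The function names and sample values are hypothetical. One assumption is made explicit here: months with zero observed rainfall are skipped in the MAPE, because the percent error is undefined for them; the source does not state how such months were handled.

```python
import math
import statistics

def mape(actual, estimated):
    """Mean absolute percent error, skipping months with zero actual rain.

    Skipping zero-rain months is an assumption of this sketch.
    """
    pairs = [(a, e) for a, e in zip(actual, estimated) if a != 0]
    return 100.0 * sum(abs((a - e) / a) for a, e in pairs) / len(pairs)

def sem(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical rain gauge values and MOVE.2 estimates (mm).
actual = [100.0, 50.0, 0.0]
estimated = [90.0, 60.0, 5.0]
error_percent = mape(actual, estimated)
standard_error = sem([2.0, 4.0, 4.0, 6.0])
```

In this example the two non-zero months have errors of 10% and 20%, which gives a MAPE of 15%.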
For each of the data series with data replaced by MOVE.2 values, regression analysis and tests of equality of the means and variances of the series were done. The regression analysis was done to examine how close the replaced data were to the original rain gauge data, and so to further test the capability of the MOVE.2 approach to infill the rain gauge data gaps. The regression residual was used to estimate the difference between the rain gauge value of the samples (the dependent variable, $y$) and the predicted MOVE.2 values ($\hat{y}$). For each of the samples used in the jackknife resampling, each data point was estimated by the computed MOVE.2 value, and the resulting regression residuals were taken as the difference from the rain gauge values. In the regression analysis, the residuals were computed following Equation (15):
$$e = y - \hat{y} \qquad (15)$$
Analysis of the residuals was done to determine the difference between the MOVE.2 values and the rain gauge values.

Test of Preservation of Mean and Variance
The method of moments was used to estimate the mean and variance. Parameter estimation was done for the 2-parameter gamma distribution as:
$$E(x) = \alpha\beta \qquad (16)$$
and
$$\mathrm{Var}(x) = \alpha\beta^{2} \qquad (17)$$
where E(x) denotes the expected value of the variable and Var(x) denotes the variance. This approach was used since data extended from a probability distribution function are known to have the same value distribution as the measurements but, on average, no autocorrelation [37]. The Student t-test and the F-test were used to compare the means and variances of the original datasets and the extended datasets. Statistical significance of the hypothesis tests was determined by the p-value at the 5% level.
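Solving the two moment relations for the gamma parameters gives α = mean²/variance and β = variance/mean. A minimal sketch follows, assuming the sample mean and the sample variance are used as the moment estimates; the function name and the sample values are hypothetical.

```python
import statistics

def gamma_moments(sample):
    """Method-of-moments estimates (shape alpha, scale beta) for a gamma
    distribution, from E(x) = alpha*beta and Var(x) = alpha*beta**2."""
    m = statistics.mean(sample)
    v = statistics.variance(sample)
    alpha = m * m / v
    beta = v / m
    return alpha, beta

# Hypothetical monthly rainfall sample (mm).
alpha, beta = gamma_moments([2.0, 4.0, 4.0, 6.0])
```

Comparing the parameters estimated from the original series with those from the infilled series gives a direct check on whether infilling has altered the distribution.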

Test of Goodness of Fit
A "goodness-of-fit" test is a procedure for determining whether a sample of n observations, $x_1, \ldots, x_n$, can be considered a sample from a given specified distribution. The Pearson correlation coefficient and the coefficient of determination were used to test the closeness of the estimated TRMM-MOVE.2 series and the original rain gauge data series.

Stationarity of Extended Time Series
Given that most of the missing gaps in the data series to be infilled occur consecutively in time, it was necessary to confirm that the time series generated upon infilling remain stationary. A key assumption in regression is that the error terms are independent of each other. It is therefore necessary to confirm that there is no autocorrelation in the series. The Durbin-Watson test was used to test for autocorrelation. The Durbin-Watson statistic was computed following Equation (18).
$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \qquad (18)$$
where $e_i = y_i - \hat{y}_i$, $y_i$ and $\hat{y}_i$ are the observed and predicted values of the response variable for individual i, and n is the number of elements in the sample.
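The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch follows; the function name and the residual patterns are hypothetical and serve only to show the range of the statistic.

```python
def durbin_watson(observed, predicted):
    """Durbin-Watson statistic from observed and predicted values."""
    e = [y - y_hat for y, y_hat in zip(observed, predicted)]
    numerator = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    denominator = sum(v * v for v in e)
    return numerator / denominator

# Residuals that alternate in sign indicate negative autocorrelation
# (statistic above 2); identical residuals give a statistic of 0.
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0], [0.0, 0.0, 0.0, 0.0])
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
```

The statistic lies between 0 and 4, and values close to 2 indicate little first-order autocorrelation in the residuals.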

Comparison of Rainfall Records TRMM vs. Rain Gauge
A comparison of rainfall data from the rain gauges and TRMM was done using the parts of the TRMM period 1998-2011 for which the respective rain gauge datasets had no gaps. Figure 2 shows a time series comparison of the monthly values of the TRMM and rain gauge datasets for Kampi ya Mawe station. From Figure 2 it is observed that the TRMM datasets fit well with the rain gauge datasets for Kampi ya Mawe station. The close association of the rain gauge and TRMM datasets was further confirmed with the scatter plots of the respective stations. Figures 3-6 show the scatter plots for Mutonguini, Kambi ya Mawe, Kitui and Mutomo. The scatter plots were done for monthly rain gauge and TRMM data for selected years in the period 1998-2000. The scatter plots indicate that a strong positive association exists between the TRMM rainfall estimates and rain gauge-based rainfall observations. From the foregoing comparison of TRMM and rain gauge datasets, it is observed that TRMM rainfall datasets fit closely with the rain gauge data series for Machakos, Makueni and Kitui Counties. As such, it is inferred that TRMM rainfall estimates are a viable dataset for use in infilling missing rain gauge data gaps.

Infilling Missing Values of Rain Gauge Data
Missing data gaps in rain gauge datasets were infilled following the MOVE.2 approach. The MOVE.2 infilling model required a stepwise approach to be able to account for special discontinuities. The rain gauge and TRMM datasets were arranged into annual sequences of monthly data such that each month of the year had its own time series of rainfall data. Therefore, for each rain gauge and corresponding TRMM grid point there were 12 annual sequences of monthly data (for example, the sequence of all January data points for the period of interest for each respective station). In this arrangement, for a station with long missing data gaps (for example Mutonguini, with 24 consecutive missing months), the gaps were reduced to at most two missing data points per monthly series for infilling at the furthest point of estimation.
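The rearrangement into one series per calendar month can be sketched as follows. This is a minimal illustration; the function name `split_by_calendar_month`, the record layout `(year, month, value)` and the sample data are assumptions for the example.

```python
def split_by_calendar_month(records):
    """Group (year, month, value) records into 12 per-month series.

    Returns {month: [(year, value), ...]} with each series ordered by year.
    """
    series = {month: [] for month in range(1, 13)}
    for year, month, value in sorted(records, key=lambda r: (r[0], r[1])):
        series[month].append((year, value))
    return series

# Hypothetical record: 2006-2009, with 24 consecutive missing months
# (all of 2008 and 2009) marked as None.
records = [(y, m, None if y in (2008, 2009) else float(m))
           for y in range(2006, 2010) for m in range(1, 13)]
monthly = split_by_calendar_month(records)
gaps_per_series = [sum(v is None for _, v in s) for s in monthly.values()]
```

In this constructed example a run of 24 consecutive missing months becomes two missing points in each of the 12 monthly series, which matches the reduction described above.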
Following the MOVE.2 approach, stations whose missing gaps occurred earlier than January 2007, had only less than 10 data points of TRMM to be used in the regression.This was so because TRMM rainfall estimates commence in January 1998.As such, the infilling method discussed here applied only for missing gaps occurring at January 2008 onwards.Separate MOVE.2 regression   The scatter plots indicate that strong positive association exist between the TRMM rainfall estimates and rain gauge-based rainfall observations.From the foregoing comparison of TRMM and rain gauge datasets, it is observed that TRMM rainfall datasets fit closely with rain gauge data series for Machakos, Makueni and Kitui County.As such it is inferred that TRMM rainfall estimates are a viable dataset for use in infilling missing rain gauge data gaps.

Infilling Missing Values of Rain Gauge Data
Missing data gaps in rain gauge datasets were infilled following the MOVE.2 approach.The MOVE.2 infilling model required stepwise approach to be able to account for special discontinuities.The rain gauge and TRMM datasets were arranged into annular sequences of monthly data series such that each month of the year had its own time series of rainfall data.Therefore, for each rain gauge and corresponding TRMM there was 12 series of annular sequence of month data time series (that is the sequence of all January data points for the period of interest for each respective station).In this arrangement for the station with the long missing data gaps (for example Mutonguini with 24 consecutive missing gaps), the missing gaps were reduced at most to two missing data gaps for infilling at the furthest point of estimation.
Following the MOVE.2 approach, stations whose missing gaps occurred earlier than January 2007, had only less than 10 data points of TRMM to be used in the regression.This was so because TRMM rainfall estimates commence in January 1998.As such, the infilling method discussed here applied only for missing gaps occurring at January 2008 onwards.Separate MOVE.2 regression The scatter plots indicate that strong positive association exist between the TRMM rainfall estimates and rain gauge-based rainfall observations.From the foregoing comparison of TRMM and rain gauge datasets, it is observed that TRMM rainfall datasets fit closely with rain gauge data series for Machakos, Makueni and Kitui County.As such it is inferred that TRMM rainfall estimates are a viable dataset for use in infilling missing rain gauge data gaps.

Infilling Missing Values of Rain Gauge Data
Missing data gaps in the rain gauge datasets were infilled following the MOVE.2 approach. The MOVE.2 infilling model required a stepwise approach to account for special discontinuities. The rain gauge and TRMM datasets were arranged into annual sequences of monthly data such that each month of the year had its own time series of rainfall data. Therefore, for each rain gauge and the corresponding TRMM dataset there were 12 series, one per calendar month (for example, the sequence of all January data points over the period of interest for each station). In this arrangement, for stations with long data gaps (for example Mutonguini, with 24 consecutive missing months), the gaps within any one series were reduced to at most two missing values to be infilled at the furthest point of estimation.
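The rearrangement described above can be illustrated with a minimal Python sketch. The record layout `(year, month, value)`, the function names and the synthetic rainfall values are assumptions made for illustration only; they are not taken from the study data. The sketch shows how a run of 24 consecutive missing months becomes a run of only two missing values once the record is split into one series per calendar month.

```python
from collections import defaultdict


def split_by_calendar_month(records):
    """Group (year, month, value) records into one year-ordered series per
    calendar month. A value of None marks a missing observation."""
    series = defaultdict(list)
    for year, month, value in sorted(records, key=lambda r: (r[0], r[1])):
        series[month].append((year, value))
    return dict(series)


def longest_gap(values):
    """Length of the longest run of consecutive missing (None) values."""
    longest = run = 0
    for v in values:
        run = run + 1 if v is None else 0
        longest = max(longest, run)
    return longest


# Hypothetical record for 1998-2011 with all of 2009 and 2010 missing
# (24 consecutive months); the rainfall values are synthetic.
records = [(y, m, None if y in (2009, 2010) else 10.0 * m)
           for y in range(1998, 2012) for m in range(1, 13)]

chronological = [v for _, _, v in sorted(records, key=lambda r: (r[0], r[1]))]
by_month = split_by_calendar_month(records)
january = [v for _, v in by_month[1]]

print(longest_gap(chronological))  # 24
print(longest_gap(january))        # 2
```

Each of the 12 resulting series can then be handled independently when the regression is fitted.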
Following the MOVE.2 approach, stations whose gaps occurred earlier than January 2007 had fewer than 10 TRMM data points available for the regression, because TRMM rainfall estimates commence in January 1998. As such, the infilling method discussed here was applied only to gaps occurring from January 2008 onwards. Separate MOVE.2 regression relationships were developed for the stations Kambi ya Mawe, Mutonguni, Kitui, Mutomo, Kisasi, Lukenya, Matungulu, Matiliku and Mutito Forest.
For example, the estimated infilled value for Kitui station for the month of January 2009 followed Equation (19).
This equation was used to estimate the infilled value of the rain gauge gap for January 2009 at Kitui station. The subsequent gaps, occurring from February 2009 to December 2009, were filled by using the appropriate TRMM value for the respective months as in Equation (19). Tables 2 and 3 show the MOVE.2 parameters used to compute the infilled values following Equation (16) for Kitui and Mutonguini stations. Similar parameters were computed to infill gaps at the other stations (Kambi ya Mawe, Mutomo, Kisasi, Lukenya, Matungulu, Matiliku and Mutito Forest). In total, 145 data gaps at nine stations were infilled by this method.
For the first month of a long-running data gap, the value of the coefficient α changes with the values of N1 and N2. This change affects the computed value of S'(y), which is the estimated variance of the infilled series. The value of N1 changes because of the overlap of the months for the preceding year. However, the infill equation remains the same for the subsequent year, because the change from N1 to N1 + 1 and the value of N2 (the first gap for the year) form an intrinsic part of S'(y).
The equation was developed with the MOVE.2 approach with the intention of preserving the mean and variance. Subsequent infilling of other data gaps followed the same MOVE.2 process. The MOVE.2 equation was only applied at N1 and N1 + 1, depending on the number of data gaps for each month at each station. For each station, the regression equation was applied to estimate the respective value of the data gap. In this method, each month at which data were estimated was considered independent of the previous estimate. Table 4 shows the number of data gaps infilled in the respective months for each station.
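As a rough illustration of variance-preserving infilling, the sketch below uses the simpler MOVE.1 form (line of organic correlation), in which the slope is the ratio of the standard deviations of the concurrent gauge and TRMM values. This is deliberately not the full MOVE.2 estimator used in the study, which additionally adjusts the mean and variance using the N1 and N2 periods; the function name and the sample numbers are hypothetical, and a positive gauge-TRMM correlation is assumed.

```python
from statistics import mean, stdev


def move1_infill(gauge, trmm):
    """Fill None entries in `gauge` from `trmm` using a MOVE.1-style line.

    Simplified stand-in for MOVE.2: the statistics come only from the
    concurrent (non-missing) period, and a positive correlation between
    the two series is assumed.
    """
    pairs = [(x, y) for x, y in zip(trmm, gauge) if y is not None]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    slope = stdev(ys) / stdev(xs)  # ratio of standard deviations, not OLS slope
    return [y if y is not None else my + slope * (x - mx)
            for x, y in zip(trmm, gauge)]


# Hypothetical January series (mm): the last gauge value is missing.
gauge_jan = [20.0, 40.0, 60.0, None]
trmm_jan = [10.0, 20.0, 30.0, 40.0]
filled = move1_infill(gauge_jan, trmm_jan)
print(filled)
```

Because the slope is a ratio of standard deviations rather than the ordinary least-squares slope, the estimated values are not shrunk towards the mean, which is the property the MOVE family of methods is designed to retain.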

Evaluation of Infilled Data Series
MOVE.2 values were evaluated for precision and accuracy in estimating the rain gauge values. The evaluation compared samples of rain gauge data that were removed from the series with the estimated values that replaced them, following a jackknife approach. Following the approach described in Section 2.3.3, MOVE.2 values were computed for gaps created by removing some of the rain gauge data. Figures 7 and 8 show plots of the infilled data against the rain gauge data in the removed sections for Kambi ya Mawe and Kisasi stations respectively. From the figures it is observed that the MOVE.2 infilled values follow the rain gauge data closely, but they are not a one-to-one match.

Comparison of Descriptive Statistics
The summary statistics mean, median, standard deviation (Std. Dev.), standard error of the mean (Std. Err. Mean), minimum, maximum, skewness and kurtosis were used to compare the MOVE.2 values infilled in the gaps with the rain gauge data that had been removed. In this analysis, the difference between the respective summary statistics of the rain gauge values and the MOVE.2 estimates was evaluated. Altman and Bland [38] recommended the use of the difference approach for comparison of summary statistics. The evaluation was based on a non-parametric approach considering only the arithmetic difference of the statistics. For each station, the arithmetic difference in the summary statistics (median, standard deviation, standard error of the mean, maximum and minimum, skewness and kurtosis) between the samples of monthly rain gauge values and MOVE.2 values was evaluated. Table 5 shows the computed differences of the summary statistics.
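The difference approach can be sketched in a few lines of Python. The function names and the sample values below are hypothetical, and skewness and kurtosis are omitted to keep the example short; the sketch only shows the arithmetic difference of each statistic, taken as rain gauge minus MOVE.2 estimate.

```python
from math import sqrt
from statistics import mean, median, stdev


def summary(values):
    """Summary statistics of one sample of monthly rainfall (mm)."""
    sd = stdev(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "std_dev": sd,
        "std_err_mean": sd / sqrt(len(values)),
        "min": min(values),
        "max": max(values),
    }


def summary_differences(gauge, estimate):
    """Arithmetic difference (gauge minus estimate) of each statistic."""
    g, e = summary(gauge), summary(estimate)
    return {name: g[name] - e[name] for name in g}


# Hypothetical removed rain gauge values and their MOVE.2 replacements (mm).
gauge_sample = [10.0, 20.0, 30.0, 40.0]
move2_sample = [12.0, 18.0, 33.0, 41.0]
diffs = summary_differences(gauge_sample, move2_sample)
for name, value in diffs.items():
    print(f"{name}: {value:.2f}")
```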

Difference in the Standard Error of the Mean
The standard error of the mean (SE of the mean) estimates the variability between the sample means that would be obtained if multiple samples were drawn from the same population. In this study, the difference between the standard error of the mean of the samples of rain gauge values and that of the samples of MOVE.2 estimates was used to compare the variability of the mean of the MOVE.2 estimates, placed at gaps previously created by removing rain gauge values, against that of the true rain gauge values at the same positions. Reading from Table 5, lower values (less than 2 standard deviations) of the difference in the standard error of the mean indicate that the MOVE.2 estimates are close in precision to the rain gauge values.
In this way, the difference in the standard error of the mean as indicated in Table 5 is an indication of the deviation of the MOVE.2 estimates from the actual variability of the mean. The units of the standard error of the mean are rainfall units (millimetres, mm). The standard error of the mean is a good indicator of the precision of the MOVE.2 values estimated to infill the respective rain gauge values. This analysis is in line with the inference made by Altman and Bland [38] that 95% of observations fall within 2 standard deviations. Viewed in this manner, the difference in the standard error of the mean indicates close proximity for all the samples of MOVE.2 values and rain gauge values analysed.
Thus, in this analysis the arithmetic difference between the standard error of the mean of the MOVE.2 infilled values and that of the rain gauge values indicates closeness of the estimates (MOVE.2 values) to the actual data (rain gauge values) for each of the removed data gaps.

Difference in the Median
The median is a measure of location that is useful particularly when a distribution is skewed and the end values are not known, or when reduced importance is to be attached to outliers. This consideration is necessary for the purpose of measuring errors. Given that the median is the 2nd quartile, 5th decile and 50th percentile, the median values in this study were used alongside the minimum and maximum values of the rain gauge data to determine the central location of the data series, which was then compared with that of the MOVE.2 infilled series.
From Table 5, it is observed that the difference in the skewness and kurtosis of the two sample data sets is small (less than 1). The low differences imply that the skewness and kurtosis of the samples of the rain gauge data series and the MOVE.2 values are in close proximity. This is an indication that the infilled datasets do not significantly affect the skewness or the kurtosis of the data series. This is inferred from the fact that the distribution of the differences in skewness and kurtosis was always symmetrical about zero, and of magnitude less than one, in the respective periods for all the stations.
It is also worth noting that the differences in the minimum values were always low (less than 10 mm of rainfall). The minimum rainfall occurs during the non-seasonal months of January, February, June, July, August and September. On the other hand, systematic errors in TRMM estimates have been observed to be larger during the non-rain months, since aggregation of hourly TRMM values always gives values greater than zero [39]. The difference in the maximum value is affected by large outliers associated with the influence of topography on rainfall. TRMM measurements have also been associated with low skill in regions of highly variable topography [39]. Reading from Table 5, the infilled data series was observed to maintain the location of the median value exhibited by the rain gauge series without affecting the skewness or the kurtosis of the distribution of the infilled series for all the stations.
The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e., it is a paired difference test). Given that the median is a measure of the central location in a data series, the Wilcoxon test was used to evaluate the closeness of the median to the position of the mean.
The requirements for the Wilcoxon signed-rank test for paired samples, where $z_i = y_i - x_i$ for all $i = 1, \ldots, n$, are as follows: the $z_i$ are independent; the $x_i$ are difference data; and the distribution of the $z_i$ is symmetric (or at least not very skewed). The null hypothesis was stated as follows: $H_0$: the distribution of the differences between paired values of the median of the samples of rain gauge data and the corresponding MOVE.2 values is symmetric about zero (that is, any differences are due to chance). The test was done at α = 0.05 and n = 14 (i.e., the number of values in the TRMM period of rainfall data). From the statistical table, $T_{crit} = 21$ (two-tailed test). Where the computed statistic exceeds the critical value, for example $T_{crit} = 21 < 35.5 = T$, the null hypothesis is not rejected. The decision to reject or accept the null hypothesis was made at α = 0.05 (i.e., p ≥ 0.05). Table 6 shows the decision, (1) to accept and (0) to reject the null hypothesis of the Wilcoxon test; acceptance leads to the conclusion that there is no significant difference between the two data series.
From Table 6, it is observed that there is a mix of both acceptance and rejection of the null hypothesis for the different samples at different stations. Notable in this analysis are the months of April, October, November and December, in which all the stations accepted the null hypothesis, indicating that the median was close to the mean in these months. The months of February, June, July, August and September exhibited rejection of the null hypothesis, indicating that the median location was not close to the mean. It is worth noting that April, October, November and December are the months with the highest seasonal rainfall amounts in the study area. Mahmud et al. [40] analysed similar characteristics between TRMM estimates and rainfall in Peninsular Malaysia and noted that the correlation between TRMM and monthly rainfall was good during the wettest months in all local climate regions. Thus, borrowing from Mahmud et al. [40], it may be inferred that the differences indicated in the median probably originate from the TRMM data rather than being induced by the MOVE.2 analysis approach.
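The decision rule for the Wilcoxon signed-rank test used above can be sketched as follows. This is a minimal hand-rolled version under stated assumptions: zero differences are dropped, tied absolute differences share their average rank, and the critical value (21 for n = 14 at α = 0.05, two-tailed, as quoted in the text) is supplied by the caller rather than computed. The function names and sample values are hypothetical.

```python
def wilcoxon_t(x, y):
    """Return (T, n): the smaller of the positive and negative rank sums of
    the paired differences y - x, and the number of non-zero differences."""
    diffs = sorted((b - a for a, b in zip(x, y) if b != a), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        # Extend j over a block of tied absolute differences.
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus), len(diffs)


def reject_null(t_statistic, t_critical):
    """Two-tailed rule: reject H0 when T is at or below the critical value."""
    return t_statistic <= t_critical


# Hypothetical paired samples (mm): rain gauge values and MOVE.2 estimates.
gauge = [10.0, 20.0, 30.0, 40.0, 50.0]
move2 = [12.0, 19.0, 33.0, 36.0, 55.0]
t_stat, n = wilcoxon_t(gauge, move2)
print(t_stat, n)

# With the values quoted in the text, T = 35.5 exceeds T_crit = 21,
# so the null hypothesis is not rejected.
print(reject_null(35.5, 21))  # False
```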

Parametric Evaluation of Infilled Data Series
An error of measurement is the difference between an obtained value and its theoretical true score counterpart. Two error measures were used to evaluate the accuracy and precision of the estimated MOVE.2 data series: the Mean Absolute Percent Error (MAPE) and an analysis of regression residuals.

Mean Absolute Percent Error (MAPE)
MAPE was used as a measure of the accuracy with which the sampled MOVE.2 data estimate the respective rain gauge rainfall values. Hyndman and Koehler [41] and Wilson [42] recommended that MAPE be used for the evaluation of cross-sectional estimates such as the MOVE.2 estimates of rain gauge rainfall. MAPE expresses the accuracy of the MOVE.2 infilled data series as a percentage of the rain gauge data series. Figure 9 shows the distribution of MAPE for the nine stations in the study. From Figure 9, it is observed that low values of MAPE (less than 100%) occur in the months of January, March, April, May, October, November and December. These months are the period of seasonal rainfall for the study area. However, a drastic increase to extremely high values of MAPE occurs in the months of February, June, July, August and September, which are also months of low rainfall amounts.
Given that MAPE is a relative measure which expresses errors as a percentage of the actual data, it provides in this analysis an easy and intuitive indication of the distribution of errors in the infilled series of estimated rain gauge values. It also gives a way of judging the extent, or importance, of errors: an error of 10 when the actual value is 100 (a 10% error) is more worrying than an error of 10 when the actual value is 500 (a 2% error). This aspect is clearly reflected in the low values of error for the months of high seasonal rainfall and the high values of error during the months of low seasonal rainfall. Thus, the distribution of MAPE shown in Figure 9 indicates a relatively acceptable distribution of errors for the infilled MOVE.2-derived estimates.
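The relative nature of MAPE can be checked with a short Python sketch. The function name is hypothetical, and months with an actual rainfall of zero are skipped here because the percentage error is undefined for them; this handling is an assumption, not something stated in the study.

```python
def mape(actual, estimate):
    """Mean absolute percent error of `estimate` relative to `actual`.

    Pairs whose actual value is zero are skipped (assumed handling), since
    the percentage error is undefined for them.
    """
    pairs = [(a, e) for a, e in zip(actual, estimate) if a != 0]
    return 100.0 * sum(abs(a - e) / abs(a) for a, e in pairs) / len(pairs)


# The same absolute error of 10 mm gives very different percentages.
print(mape([100.0], [110.0]))  # 10.0
print(mape([500.0], [510.0]))  # 2.0
# In a dry month, a 10 mm error on 2 mm of rain is a 500% error.
print(mape([2.0], [12.0]))     # 500.0
```

This is why the dry months dominate the high end of the MAPE distribution even when the absolute errors there are only a few millimetres.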

Error in the Regression Analysis
Figure 10 shows the regression results of the samples of Kisasi station for the year 2011. The plot shows the MOVE.2 values for the year 2011 against the rain gauge values for the same year at the station. Each point is plotted with the MOVE.2 value on the x-axis and the rain gauge observation on the y-axis. The distance from the solid line (perfect agreement) indicates the magnitude of the error (residual) in the prediction of the value. Values above the solid line mean the prediction was too low, and values below the solid line mean the prediction was too high. In this regression analysis, it is observed that the computed MOVE.2 values are close to the rain gauge values, with relatively small margins of error. This analysis was repeated for all the stations and similar results were observed.
Figures 11-19 show plots of the mean of the regression residuals for each station. The mean of the regression residuals was computed over the number of years sampled for each station. From the plots, it is observed that the mean residuals are not evenly distributed vertically, such that there are both positive and negative residuals. It is also observed that the residuals exhibit high variability, but certain patterns are easily discerned from the plots. For example, during the months of October, November and December, which are the main rainfall season of the study area, the residuals exhibit low values (less than 30 mm) for all the stations. The months of March and April exhibit high variability of the regression residuals across the stations, despite being months of seasonal rainfall in the study area. The high variability of the regression residuals during the March-April-May season may be related to the high unreliability of rainfall during the period [43]. Glover et al. [44] estimated the unreliability of rainfall in the April to May season in the south-eastern parts of Kenya, within which this study was conducted, at 40%. The 40% unreliability of seasonal rainfall depicts a situation of erratic rainfall with high variability.
Figure 20 shows the normal probability plot of the residuals. In Figure 20, the pattern of the residuals is approximately linear, indicating that the assumption of normally distributed residuals holds.

Test of Equality of the Mean and Variance
The t-test and F-test were based on two approaches. In the first, the data of the series generated with the jackknife sampling were arranged in the order of running calendar months (one series for each station for the sample period) containing the generated MOVE.2 values, the respective rain gauge values were arranged in a similar manner, and the mean and variance of the two series were compared for equality in a t-test and an F-test respectively.
In the second, the data of the series generated with the jackknife sampling were arranged by annual month (a series ordered year to year for a given calendar month, representing year-to-year variability), beginning in 1998 up to and including the year in which values were replaced with MOVE.2 values. The sequence of values was, for example: Jan 1998, Jan 1999, Jan 2000, ..., Jan 2011. A similar sequence was developed for each of the 12 calendar months. Two annual month series were developed, one with the surrogate data and the other with rain gauge data within the sections without gaps. The mean and variance of the two series were compared for equality in a t-test and an F-test respectively. This approach was applied only to the years preceding the gaps at the respective stations. The years within the gaps, as indicated in Table 1, and the years after the appearance of gaps were not included in the analysis.

Two-Sample t-Test for Equal Means
The two-sample t-test [45] was used to determine whether the means of the rain gauge data series and the MOVE.2 estimated series are equal. The test was used to determine whether or not a significant difference exists between the two data sets, and whether the means of the two independent samples come from the same population. The formula for calculating "t" is given in [46].
The null and alternative hypotheses were stated as follows: $H_0: \mu_1 = \mu_2$ (the means are equal); $H_1: \mu_1 \neq \mu_2$ (the means are different). This is a two-tailed test because the null hypothesis does not specify a direction, only the condition of equality.
The t-test indicates that there is not enough evidence to reject the null hypothesis that the two means are equal at the 0.05 significance level. It was therefore concluded that the rain gauge datasets and the MOVE.2 infilled datasets have the same means at the 0.05 significance level and that the two datasets may be considered to come from the same population.
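As an illustration of the test of equal means, the sketch below computes a two-sample t statistic with a pooled variance. The pooled (equal-variance) form is an assumption made here; the exact formula used in the study is the one given in [46]. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical samples (mm): rain gauge values and MOVE.2 estimates.
gauge = [1.0, 2.0, 3.0, 4.0, 5.0]
move2 = [2.0, 3.0, 4.0, 5.0, 6.0]
t, df = pooled_t(gauge, move2)
print(round(t, 3), df)  # -1.0 8
```

For 8 degrees of freedom the two-tailed critical value at the 0.05 level is 2.306, so |t| = 1 would not lead to rejection of the null hypothesis of equal means in this example.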

F-Test for Equality of Two Variances
An F-test is a statistical test in which the test statistic has an F-distribution under the null hypothesis. An F-test [47] was used to test whether the variances of the two populations are equal. The F-test used is a two-tailed test. The null hypothesis was stated as $H_0: \sigma_1^2 = \sigma_2^2$. The F statistic was computed as $F = s_1^2 / s_2^2$, where $s_1^2$ and $s_2^2$ are the sample variances. The more this ratio deviates from 1, the stronger the evidence for unequal population variances. The variances are significantly different if F is greater than the appropriate value in the F table. The degrees of freedom for the numerator are $(n_1 - 1)$, where $n_1$ is the sample size of the group with the higher variance. The degrees of freedom for the denominator are $(n_2 - 1)$, where $n_2$ is the sample size of the denominator group. The F-test gave mixed results, with many samples favouring acceptance of the null hypothesis and two stations favouring rejection. For the stations of Kisasi, Kitui, Mutonguini, Mutitu and Lukenya the null hypothesis was accepted for all the samples. The station of Kisasi indicated rejection of the null hypothesis for two samples, 2007-2008 and 2011, while Matiliku indicated rejection of the null hypothesis for one sample, 2010-2011. For these samples, the F-test indicates that there is enough evidence to reject the null hypothesis of equal variances at the 0.05 significance level.
Notable in this analysis is that the months favouring acceptance of the null hypothesis were mainly the months of high seasonal rainfall, including March, April, May, October, November and December, indicating that the variances of the MOVE.2-generated surrogates and of the rain gauge dataset are equal. The months of low rainfall, including January, February, June, July, August and September, indicated rejection of the null hypothesis, indicating that the variances of the MOVE.2-generated surrogates and of the rain gauge dataset are not equal. Details of the computation of the t-test and the F-test may be found in the Appendix of this paper.
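The variance ratio described above can be computed with a few lines of Python. In this sketch the larger sample variance is always placed in the numerator, as the text specifies; the function name and sample values are hypothetical, and the critical value still has to be read from an F table for the returned degrees of freedom.

```python
from statistics import variance


def f_ratio(sample1, sample2):
    """Return (F, df_numerator, df_denominator) with the larger sample
    variance in the numerator, so that F >= 1."""
    v1, v2 = variance(sample1), variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


# Hypothetical samples (mm): rain gauge values and MOVE.2 estimates.
gauge = [1.0, 2.0, 3.0, 4.0, 5.0]    # sample variance 2.5
move2 = [2.0, 4.0, 6.0, 8.0, 10.0]   # sample variance 10.0
f, df_num, df_den = f_ratio(gauge, move2)
print(f, df_num, df_den)  # 4.0 4 4
```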

Confirmation of Preservation of Mean and Variance
A Gamma probability density function (PDF) was used to confirm the preservation of the mean and variance of the infilled rain gauge data series, following Theiler et al. [48]. The extended MOVE.2 datasets were fitted to a gamma distribution and statistical tests of equality of mean and variance were carried out. Figures 21-24 show the Gamma cumulative distribution function for Mutomo station, comparing the plots of the original data and of the infilled data for the month of January. It is observed that the cumulative distribution function of the extended dataset fits well with the distribution of the original data. Given that the gamma function fits well, the two plots have similar parameters α and β, further confirming the assumption of preservation of mean and variance in the MOVE.2 approach.
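The link between the gamma parameters and the mean and variance can be made explicit with a method-of-moments sketch. Two assumptions are made here that the text does not state: β is treated as a scale parameter (so that the mean is αβ and the variance is αβ²), and the parameters are estimated by matching moments rather than by whatever fitting procedure the study used. The function name and sample values are hypothetical.

```python
from statistics import mean, variance


def gamma_moments(values):
    """Method-of-moments gamma parameters: shape alpha and scale beta,
    chosen so that mean = alpha * beta and variance = alpha * beta**2."""
    m, v = mean(values), variance(values)
    return m * m / v, v / m


# Hypothetical January rainfall series (mm).
january = [2.0, 4.0, 6.0, 8.0]
alpha, beta = gamma_moments(january)
print(alpha, beta)
```

Under this parameterisation, two series with the same mean and variance have the same α and β, which is why closely matching gamma curves for the original and infilled series are consistent with preservation of the mean and variance.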

Autocorrelation Test
Given that the infilled values were translated from different datasets, it is prudent to test for autocorrelation among adjacent values. If they are correlated, the least-squares regression underestimates the standard error of the coefficients, and predictors can seem to be significant when they are not [49]. The Durbin-Watson statistic was used to test for autocorrelation between adjacent values in the new series after infilling. Table 7 shows the values of the Durbin-Watson statistic computed for each data series. For the autocorrelation test the critical limits are 2 − DL and 2 − DU. The null hypothesis was stated as $H_0: \rho = 0$ (no serial correlation). Since all the values of the Durbin-Watson statistic are greater than 2, $H_0$ is not rejected, and the conclusion is that there is no serial correlation in the infilled data series. The absence of serial correlation in the infilled data series serves to confirm the stationarity assumption of the extended series.
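For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below uses invented residuals and a hypothetical helper name; it is not the computation behind Table 7. The statistic lies between 0 and 4, with values near 2 indicating little first-order autocorrelation, values towards 0 indicating positive autocorrelation and values towards 4 indicating negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical regression residuals (mm); not data from the study.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every step
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]     # long runs of the same sign

print(durbin_watson(alternating))  # well above 2
print(durbin_watson(trending))     # well below 2
```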

Goodness of Fit
Table 8 shows the computed values of the correlation coefficient and the coefficient of determination for the years 2007-2011 for the respective stations. Medium and high values of the correlation coefficient and the coefficient of determination were obtained between the original rain gauge data series and the MOVE.2 surrogate series for the different years at all the stations.
smoothness of estimator. The sampling methodology used in this study also closely follows the suggestions of Guo Hua et al., [54].
In estimating the median using a jackknife sampling approach, Guo Hua et al., [54] observed a lack of smoothness, seemingly caused by the inconsistency of the jackknife estimate of the standard error. To address this, Guo Hua et al., [54] suggested that instead of removing one value at a time in the jackknife, a number of values d be removed, where n = r·d for some integer r. Guo Hua et al., [54] suggested removing more than d = √n values, but fewer than n, when estimating the median, in order to achieve consistency of the jackknife estimate of the standard error. These suggestions are similar to the recommendation of Shao and Wu [53]. Therefore, since the jackknife approach used in this study (explained in Section 2.3.3) considered the requirement for consistency as suggested by Guo Hua et al., [54] and Shao and Wu [53], it is expected that the MOVE.2 values evaluated in this study give a true picture of the capability of estimating the rain gauge values.
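The idea of removing d values at a time, with n = r·d, can be sketched as a delete-a-group jackknife. This is a simplified illustration and not the sampling scheme of Section 2.3.3: it removes r disjoint blocks of d consecutive values rather than all possible subsets of size d, the function name is hypothetical and the rainfall values are invented.

```python
from math import sqrt
from statistics import mean, median


def grouped_jackknife_se(values, d, estimator):
    """Delete-a-group jackknife standard error, removing d consecutive values at a time.

    Assumes len(values) = r * d for an integer r >= 2.
    """
    n = len(values)
    if n % d != 0 or n // d < 2:
        raise ValueError("need n = r * d with r >= 2")
    r = n // d
    # Recompute the estimator with each block of d values left out in turn.
    replicates = [estimator(values[:g * d] + values[(g + 1) * d:]) for g in range(r)]
    centre = mean(replicates)
    return sqrt((r - 1) / r * sum((t - centre) ** 2 for t in replicates))


# Hypothetical monthly totals (mm); not data from the study.
rain = [12.0, 30.5, 8.2, 45.1, 22.4, 16.9, 27.3, 9.8, 51.0, 18.7, 33.2, 14.5]

print(grouped_jackknife_se(rain, 1, median))  # delete-one jackknife
print(grouped_jackknife_se(rain, 4, median))  # delete-four, with d > sqrt(12)
```

With d = 1 and the sample mean as estimator, this reduces to the familiar standard error s/√n, which is a convenient check on the implementation.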
The use of the MOVE.2 method produced infilled series that retain the statistical characteristics (mean, variance and extreme values) of the rain gauge series. The MOVE.2 methodology has desirable properties that enable appropriate preservation of these parameters. The MOVE.2 method also treated the two distributions as separate and distinct, with different parameters, yet combining into one distribution with the same parameters. A probability density function (PDF) approach was used to confirm the preservation of the mean and variance of the infilled rain gauge data series, following Theiler et al., [48]. Sen and Eljadid, [55] indicated that the gamma distribution is an appropriate probability distribution for describing monthly rainfall in arid and semi-arid regions. The monthly data series of the infilled data sets were fitted to a gamma distribution, and statistical tests of preservation of mean and variance were carried out. The cumulative function of the extended dataset fit the distribution of the original data well. This good fit indicates that the plots have similar parameters α and β, confirming the assumption that the MOVE.2 approach preserves the mean and variance.
No physical quantity can be measured with perfect certainty; there are always errors in any measurement. This means that the MOVE.2 estimates of rain gauge rainfall values, computed repeatedly as more gaps are infilled, will certainly contain errors [56]. The error analysis is an attempt to quantify the uncertainty resulting from the infilled values. Understanding the errors also emphasizes the need for care in measurement and for refinement of the method in order to reduce them. We can thereby gain greater confidence that the computed MOVE.2 values closely approximate the true value [57]. Error analysis in this study therefore expresses the uncertainties inherent in the rainfall values estimated by the MOVE.2 approach for infilling the rain gauge data gaps. As such, it is inferred that the results of the error analysis are an indicator of the high quality of the extended data series, and that the MOVE.2 approach maintains high-quality rainfall data series even after infilling.
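Two error measures referred to in this paper, the root mean square error (RMSE) and the mean absolute percentage error (MAPE), can be computed as in the sketch below. The values are invented for illustration, and skipping months with zero observed rainfall in the MAPE is an assumption made here to avoid division by zero, not a choice documented in the study.

```python
from math import sqrt


def rmse(observed, estimated):
    """Root mean square error between paired observed and estimated values."""
    return sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed))


def mape(observed, estimated):
    """Mean absolute percentage error; pairs with zero observed rainfall are skipped."""
    pairs = [(o, e) for o, e in zip(observed, estimated) if o != 0]
    return 100.0 * sum(abs((o - e) / o) for o, e in pairs) / len(pairs)


# Hypothetical monthly totals (mm); not data from the study.
gauge = [100.0, 50.0, 200.0, 80.0]
move2 = [110.0, 45.0, 190.0, 80.0]

print(rmse(gauge, move2))  # 7.5
print(mape(gauge, move2))  # 6.25
```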
A mean-preserving spread is a change from one probability distribution (donor series) to another probability distribution (recipient series), which is formed by spreading out one or more portions of the donor probability density function while leaving the mean of the recipient series unchanged [58].As such, in this study, TRMM data series have proven to be good at preserving the mean and variance contraction of rain gauge data series following Gentzkow and Kamenica [58].
A statistical test confirmed the significance of the similarity of the mean and variance of the infilled dataset for all the stations. In this study, therefore, it is inferred that the use of TRMM data series to infill rain gauge data following the MOVE.2 approach is a mean- and variance-preserving method. The approach agrees with Khalema [59], who showed that one can mix a baseline distribution with a Gamma distribution and obtain a mixture distribution which has mean and variance preservation capability.

Reliability and Validity of Infilled Data Series
Harvey et al., [50] identified three factors likely to influence the reliability of data infilling: the nature of the donor station (TRMM in this case), the location of the station and duration of the gap, and the infilling procedure. In this study, TRMM rainfall estimates were confirmed as a good fit of the rain gauge data. The MOVE.2 regression relationships were developed between the rain gauge series and the TRMM series for each of the 12 calendar months. In this approach the number of missing values was reduced to a maximum of two data points per month for the longest run of missing data (24 months). Other stations had at most one missing data point for the respective month. Giustarini [60] observed that the best performance in infilling missing data was obtained when the gaps were comparatively short. In this study, applying the MOVE.2 approach along the sequential annual months series reduced the long-running gaps to short gaps in the respective monthly series. As such, this study recommends the use of MOVE.2 in a sequential annual months approach for effective infilling of rainfall data from TRMM estimates. The reduced number of missing values to be infilled also reduces the regression errors, thereby enhancing the reliability of results. This agrees with Henn et al., [61] that shorter gaps are easier to fill for all methods. Generally, in ordinary regression methods of data infilling, the RMSE increases with the proportion of missing values (gap size). The MOVE.2 approach demonstrated in this analysis reduces the gap size, thereby reducing the RMSE. Thus, the MOVE.2 approach, utilizing sequential annual months, enables the infilling to attain high accuracy even with long gaps of missing data.
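The effect of the sequential annual months arrangement on gap length can be checked with a small sketch. The record below is invented (a ten-year series with a 24-month gap), and the data layout and function name are assumptions made for illustration. Splitting the series by calendar month turns the single 24-month gap into at most two missing values in each monthly sub-series.

```python
from collections import defaultdict


def missing_per_calendar_month(series):
    """Count missing values (None) in each calendar-month sub-series.

    `series` maps (year, month) to rainfall in mm, or None where the record is missing.
    """
    counts = defaultdict(int)
    for (year, month), value in series.items():
        if value is None:
            counts[month] += 1
    return dict(counts)


# Hypothetical 10-year record with a 24-month gap from March 2003 to February 2005.
series = {(y, m): 10.0 for y in range(1998, 2008) for m in range(1, 13)}
gap = (
    [(2003, m) for m in range(3, 13)]
    + [(2004, m) for m in range(1, 13)]
    + [(2005, m) for m in (1, 2)]
)
for key in gap:
    series[key] = None

counts = missing_per_calendar_month(series)
print(len(gap), max(counts.values()))  # 24 consecutive months, at most 2 per calendar month
```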

Summary
This study tested a methodology for infilling missing gaps in rain gauge observed data series using least squares regression. The study presented a methodology for infilling the rain gauge data series from satellite-based rainfall estimates. The satellite estimates were extracted from the grid points nearest to the respective stations. These satellite estimates were used as donor stations.
The study tested the use of the MOVE.2 approach using TRMM satellite data as a donor station. The study therefore addressed an important challenge for hydro-meteorological science, that of long consecutive missing data gaps in rain gauge observed data series. This is particularly relevant for the ASAL of Kenya and Africa, where data gaps are rampant in the hydro-meteorological data series, and also for other parts of the tropics where TRMM observations are available.
In the MOVE.2 approach, the coefficient of linear regression was interpreted as a marginal effect. This marginal effect describes how the dependent variable (rain gauge data) changes when the independent variable (TRMM data) changes by one additional unit, holding all other variables in the equation constant. Based on the data used in this regression, adding one additional month of rain gauge record corresponded to an increase in monthly rainfall. The sequential annual month arrangement of rain gauge rainfall records helped to operationalise the capability of the MOVE.2 approach. With this approach the methodology enabled the preservation of the mean, variance and extreme value statistics for the infilled data series. As such, the infilled rain gauge series maintained the same distribution as the observed series. It is, however, worth mentioning that the preservation of the variance was not always upheld, particularly for months of low seasonal rainfall. This observation was also noted for the median.
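The marginal-effect reading of a regression coefficient can be illustrated with an ordinary least squares fit. Note that this sketch uses plain least squares, not the MOVE.2 estimators, and the paired values are invented; the fitted slope is the change in the gauge value associated with a one-unit change in the TRMM value.

```python
from statistics import mean


def least_squares(x, y):
    """Ordinary least squares fit y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Hypothetical paired monthly totals (mm); not data from the study.
trmm = [10.0, 20.0, 30.0, 40.0]
gauge = [8.0, 14.0, 20.0, 26.0]

slope, intercept = least_squares(trmm, gauge)
print(slope, intercept)  # each extra mm in TRMM corresponds to `slope` mm at the gauge
```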

Conclusions
The results reported in this study provide researchers with a methodological framework that can be readily applied for infilling missing values of rainfall in rain gauge data series using TRMM satellite estimates as a donor station. The approach has demonstrated the capability of extending monthly rainfall values which remain similar to those observed, by preserving statistical parameters such as the mean, variance and extreme statistics. The infilled values of rainfall have characteristics like those of the actual records they are intended to represent.
The methodology therefore serves a need as expressed by researchers, for development of generic data infilling methodologies which ensure consistency, auditability and effectiveness in the infilled series.
Infilling of missing rainfall data using least squares regression in the MOVE.2 approach, as used in this study, promises a robust methodology even in situations of large and extensive data gaps with a high proportion of missing values. The approach proposes a way of breaking long running gaps into very short and manageable gaps. The infilling of short gaps as proposed here promises quality of infilled data and hence quality of predictions for models which utilise the infilled data series. The method offers a viable alternative to traditional infilling approaches.
The results suggest that MOVE.2 utilizing TRMM data is effective for infilling rainfall data series in Machakos, Makueni and Kitui counties of Kenya. The TRMM rainfall products coupled with MOVE.2 approaches could therefore be considered a viable alternative data source for large-scale distributed rainfall analysis for the development of hydro-meteorological models such as drought early warning, monitoring and forecasting. The approach ensures consistent and auditable infilling, which could find application in the ASAL of Kenya and in tropical regions in general.
The null and alternative hypotheses were stated as follows:
$H_0: \mu_1 = \mu_2$; the means are equal (20)
$H_1: \mu_1 \neq \mu_2$; the means are different (21)
This is a two-tailed test because the null hypothesis does not specify a direction, only the condition of equality.
For a two-sided t-test, the null hypothesis was rejected if the absolute value of the test statistic was greater than the value of $t_{1-\alpha/2,\nu}$ in the t-table. The mean of the series was computed along the annual month. This meant that the degrees of freedom changed with each removal of rain gauge values and subsequent replacement following the jackknife approach. For the year 2007, 18 degrees of freedom were used, for 2008, 20 degrees of freedom, and for 2009, 22 degrees of freedom. The result is significant if t is greater than the appropriate value in the t-table. The computed values of t for each month, alongside the critical values of the t-test for the respective degrees of freedom, are shown in the table below. If the t value calculated from the data was equal to or larger than the critical value, the null hypothesis $H_0: \mu_1 = \mu_2$ was rejected; otherwise the null hypothesis was accepted. The test was done for the means of all twelve calendar months for the subsequent data removal and replacement as described in the jackknife approach. All the means computed favoured acceptance of the null hypothesis, thereby upholding the hypothesis that $\mu_1 = \mu_2$.
Therefore, the t-test indicates that there is not enough evidence to reject the null hypothesis that the means of the annual month series of the surrogate datasets and the rain gauge datasets are equal at the 0.05 significance level. The t-test therefore concluded that the two datasets, the rain gauge datasets and the MOVE.2 infilled datasets, have the same means at the 0.05 significance level and may be considered to come from the same population.
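The decision rule described above can be sketched as follows. The sketch assumes a pooled-variance two-sample t statistic, which the paper does not explicitly confirm, and uses invented values; the critical values are the two-sided 5% entries of standard t-tables for the degrees of freedom quoted above.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical values from standard t-tables (18, 20 and 22 degrees of freedom).
T_CRITICAL = {18: 2.101, 20: 2.086, 22: 2.074}


def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical monthly totals (mm); not data from the study.
gauge = [12.0, 30.5, 8.2, 45.1, 22.4, 16.9, 27.3, 9.8, 51.0, 18.7]
surrogate = [14.1, 28.0, 10.5, 40.2, 25.0, 15.3, 29.9, 12.4, 47.6, 20.1]

t, df = pooled_t(gauge, surrogate)
reject = abs(t) > T_CRITICAL[df]
print(f"t={t:.3f}, df={df}, reject H0: {reject}")
```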
An F-test is a statistical test in which the test statistic has an F-distribution under the null hypothesis. An F-test [47] was used to test whether the variances of two populations are equal. The F-test used is a two-tailed test. The null hypothesis was stated as $H_0: \sigma_1^2 = \sigma_2^2$. The F statistic was computed as $F = s_1^2 / s_2^2$, where $s_1^2$ and $s_2^2$ are the sample variances. The more this ratio deviates from 1, the stronger the evidence for unequal population variances. The degrees of freedom for the numerator are $(n_1 - 1)$, where $n_1$ is the sample size for the group with the higher variance. The degrees of freedom for the denominator are $(n_2 - 1)$, where $n_2$ is the sample size for the denominator group. The two variances were considered significantly different if the ratio F was greater than the appropriate value in the F-table.
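A minimal sketch of the variance-ratio computation follows, with the larger sample variance placed in the numerator as described above. The samples and the function name are invented for illustration; the critical value would still be read from an F-table for the returned degrees of freedom.

```python
from statistics import variance


def variance_ratio(sample1, sample2):
    """F statistic with the larger sample variance in the numerator.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    v1, v2 = variance(sample1), variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


# Hypothetical samples; not data from the study.
a = [1.0, 2.0, 3.0, 4.0, 5.0]         # sample variance 2.5
b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # sample variance 14.0

F, df_num, df_den = variance_ratio(a, b)
print(F, df_num, df_den)  # 5.6 with (5, 4) degrees of freedom
```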
In this approach, the F-test gave mixed results, with most samples favouring rejection of the null hypothesis and a few favouring acceptance. For the stations of Kisasi, Kitui, Mutonguini, Mutito, Matiliku and Lukenya, the null hypothesis was rejected for all the samples. At Matungulu, the null hypothesis was accepted on six occasions (four in June and two in July). At Mutomo, it was accepted on 12 occasions (four in January, two in February, two in June and four in September). At Kambi ya Mawe, it was accepted on four occasions (two in February and two in September). The F-test indicates that there is enough evidence to reject the null hypothesis that the two variances are equal at the 0.05 significance level.
Notably, the cases favouring rejection of the null hypothesis occurred mainly in the months of high seasonal rainfall (March, April, May, October, November and December), for which the null hypothesis was rejected at all the stations in all the resampled data series. The months of low rainfall (January, February, June, July, August and September) indicated acceptance of the null hypothesis.
Tables A1 and A2 show the computed t-values and F-values of the samples.

Figure 1. Map of Machakos, Makueni and Kitui Counties inset in a Map of Kenya and Africa. Source: [30], republished with permission of the Masinde Muliro University of Science & Technology.

Figure 3. Mutonguini rain gauge data plotted against respective TRMM data set for the period November 2001-December 2002.

Figure 4. Mutomo rain gauge data plotted against respective TRMM data set for the period October 2004-October 2005.

Figure 5. Kambi ya Mawe rain gauge data plotted against respective TRMM data set for the period January 1998-May 1999.

Figure 6. Kitui rain gauge data plotted against respective TRMM data set for the period November 2001-March 2003.

Figure 9. Distribution of Mean Absolute Percentage Error of Samples of the Rain gauge Values and the MOVE.2 Estimates.

Figure 10. Results of Regression Analysis of the samples of Kisasi station for the year 2011.

Figure 12. Mean of Regression Residuals for Mutomo Station.

Figure 13. Mean of Regression Residuals for Lukenya Station.

Figure 14. Mean of Regression Residuals for Matiliku Station.

Figure 15. Mean of Regression Residuals for Mutito Station.

Figure 16. Mean of Regression Residuals for Matungulu Station.

Figure 17. Mean of Regression Residuals for Kisasi Station.

Figure 18. Mean of Regression Residuals for Kitui Station.

Figure 19. Mean of Regression Residuals for Mutonguini Station.

Figure 21. Gamma Cumulative Distribution Plot for Mutomo Station During the month of February.

Figure 22. Gamma Cumulative Distribution Plot for Mutomo Station During the month of April.

Figure 23. Gamma Cumulative Distribution Plot for Mutomo Station During the month of July.

Figure 24. Gamma Cumulative Distribution Plot for Mutomo Station During the month of November.

Table 1. Locations of the rain gauges, corresponding grid point for Tropical Rainfall Measuring Mission (TRMM) data extraction, distance between rain gauge location and grid point data, the number and proportion of data points of missing record and the period of missing data.

Table 2. MOVE.2 parameters used in computation of infilled values for Kitui Station.

Table 3. MOVE.2 parameters used in computation of infilled values for Mutonguini Station.

Table 4. Distribution of number of data points infilled with MOVE.2 estimated values for respective stations.

Table 5. Differences in Descriptive Statistics of rain gauge values and MOVE.2 estimates.

Table 6. Results of Wilcoxon Test Comparing the Difference between the mean of the Median of the samples of MOVE.2 Estimates and the Rain gauge Values.

Table 7. Durbin-Watson Statistic Matching the size of infilled datasets and Highest Number of Points infilled per month (n).

Table 8. Correlation coefficient and the coefficient of determination for the years 2007-2011 for the respective stations.

Table A2. Computed F-values of samples.

Table A4. Results of t-test for the jackknife samples.

Table A5. Results of F-test for the jackknife samples.