Spatial Downscaling of TRMM Precipitation Product Using a Combined Multifractal and Regression Approach: Demonstration for South China

The lack of high spatial resolution precipitation data, which are crucial for the modeling and managing of hydrological systems, has triggered many attempts at spatial downscaling. The essence of downscaling lies in extracting extra information from a dataset through some scale-invariant characteristics related to the process of interest. While most studies utilize only one source of information, here we propose an approach that integrates two independent information sources, which are characterized by self-similarity and by relationships with other geo-referenced factors, respectively. This approach is applied to 16 years (1998–2013) of TRMM 3B43 monthly precipitation data in an orographic and monsoon-influenced region in South China. Elevation, latitude, and longitude are used as predictive variables in the regression model, while self-similarity is characterized by multifractals and modeled by a log-normal multiplicative random cascade. The original 0.25° precipitation field was downscaled to the 0.01° scale. The result was validated with rain gauge data. Good consistency was achieved on coefficient of determination, bias, and root mean square error. This study contributes to the current precipitation downscaling methodology and is helpful for hydrology and water resources management, especially in areas with insufficient ground gauges.


Introduction
Precipitation is a major component of the water cycle and plays a key role in agriculture, ecology, and society [1][2][3][4][5]. Hence, high-quality precipitation data are essential in many applications such as regional hydrological models and water resources analysis and management [6]. Traditionally, precipitation data are acquired from rain gauges, which produce accurate point measurements [7]. However, the high spatial-temporal variability of precipitation [8] and the relatively sparse distribution of rain gauges make it impossible to provide fine-resolution precipitation data [9,10]. Recently, remotely sensed products have provided an alternative source for acquiring precipitation data. For example, the Tropical Rainfall Measuring Mission (TRMM) [11] has been widely used in meteorological and hydrological research [12,13]. However, its spatial resolution (at best 0.25° × 0.25°) is still too coarse for hydrological simulation [2,14] when applied to local basins and regions. Recent studies showed that the spatial variation of precipitation has profound effects on the hydrological behavior of catchments [15]. The spatial resolution can greatly influence model outcomes, and models using raster-based precipitation data outperform models that use precipitation derived from point measurements [16,17]. Therefore, finer-resolution precipitation data are crucial for improving our understanding of basin-scale hydrology [6]. During the past decades, many attempts have been made to downscale satellite-based remote sensing precipitation data; regression and self-similarity approaches are among the most widely used.
The rationale underlying the regression approach is that the spatial pattern of precipitation can be related to other covariates (e.g., altitude and vegetation) for which higher spatial resolution data are available. The spatial resolution of precipitation can then be improved by establishing a functional relationship (e.g., a linear combination of basis functions) between precipitation and these factors [18]. Both linear regression and non-linear methods (e.g., ANN) have been used to obtain the parameters of the assumed functional relationship. Using the same factors at the target scale as input data, the model is then used to predict the precipitation trend at the target scale. However, because of the high variability of precipitation, the trend itself is not enough to represent the actual precipitation at fine scale. Thus, the residual at coarse scale is also utilized (i.e., interpolated to fine scale and added to the trend), which produces the final fine-scale estimate of precipitation. Different time scales, regression factors, and residual interpolation methods are used in the literature. Immerzeel et al. [18] downscaled TRMM-based annual precipitation from 0.25° to 1 km by establishing an exponential relationship between NDVI and TRMM precipitation data, and used a simple spline tension interpolator for residual downscaling. Similarly, Jia et al. [2] downscaled the TRMM 3B43-derived annual precipitation data in the Qaidam Basin of China to a 1 × 1 km field with topography and vegetation as regression factors. Fang et al. [19] used the orographic effect and pre-storm conditions to downscale precipitation events derived from the TRMM 3B42 3-hourly product to 1 × 1 km, also applying spline interpolation for the residual. Park [14] downscaled TRMM 3B43 data in South Korea to a target fine scale of 1 km, using elevation and NDVI for the trend and area-to-point kriging for the residual.
On the other hand, the self-similarity approach is underpinned by a quite different rationale. Precipitation is regarded as a random field with self-similar characteristics (i.e., it is statistically the same when examined at different scales) [20]. Conventionally, this self-similarity is modeled with simple scaling (fractal), which involves only one parameter [21]. However, many studies have shown that rainfall is better characterized by multi-scaling (multifractal), which requires a spectrum to describe the scaling property [22][23][24][25][26][27][28][29][30][31][32][33][34]. The multiplicative random cascade model, which has its origin in the study of fully developed turbulence and is capable of capturing the multi-scaling characteristic, is thus applied to simulate the precipitation process [24,28,30,[35][36][37][38][39][40]. With the assumption that scale invariance holds within the range of scales of interest, fine-scale precipitation fields can be generated with the scaling law derived from a relatively coarse scale. However, because the scale-invariance property holds only in the stochastic sense, the final product is a random field (i.e., a distribution), which can be realized numerically by running the downscaling procedure many times to generate an ensemble [40]. This distribution information can be used for various applications, especially in the analysis of extreme hydrological events [41]. To be able to model precipitation fields influenced by orographic and other non-homogeneous factors, a heterogeneous component should be introduced into the multifractal framework. Jothityangkoon et al. [42] developed a model for analyzing a 400 km × 400 km area in southwestern Australia. Pathirana and Herath [43] proposed a simplified model and analyzed a 128 km × 128 km region centered in the Japanese archipelago. They multiplied a homogeneous spatial random cascade by a deterministic factor, which is the long-term accumulation at each spatial location. With a similar approach, Badas et al. [6] proposed a procedure of modulation of space-time homogeneous cascades by means of a simple modulation function.
In summary, the merit of the regression approach is the ability to predict the exact value of the field at fine scale, but it is not easy to find factors that establish a good relationship with precipitation, and the interpolation of the residual usually does not reflect the highly variable nature of precipitation, especially for short-term accumulation. On the other hand, a multifractal model is able to reproduce the scale invariance, clustering, and intermittency of the precipitation field, but the result is a random field at fine scale. In addition, it is hard to incorporate heterogeneity in downscaling. Although a modulation function can be used to account for coarse-scale heterogeneity [43], fine-scale heterogeneity is still lacking. These two approaches are based on different rationales and use different sources of information, so they can complement each other. Thus, it would be advantageous to combine the merits of both methods.
While the main regression approaches employ a linear method, non-linear methods have also been widely applied, especially artificial neural networks (ANN) [44][45][46][47]. Comparison studies show that either method can outperform the other, depending on the study area, predictands, predictors, and statistical indicators used for judgement [48,49]. Thus, because there is no a priori supposition about the relationship, we try both multi-linear and ANN models for regression in our study. Stochastic precipitation downscaling approaches other than self-similar models are also widely available. For example, the clustered point process model, which assumes that rain events arrive in a Poisson process and that each event consists of a series of cells clustered with Bartlett-Lewis or Neyman-Scott distributions, dates back to the work of Le Cam in 1961 [50] and has had numerous applications over the past decades [51][52][53][54]. However, we adopt the multifractal model in the present study for its parametric parsimony and the straightforward relationship between different scales.
The purpose of this study is to propose an approach combining both multifractal and regression methods to downscale the TRMM 3B43 precipitation data. This study could add to our understanding of precipitation downscaling methodology and is useful for hydrology and water resources management in areas without sufficient ground gauges.

Study Area
The study area is a square within 21°-29° N and 104°-112° E (Figure 1), which mainly includes South China's Guizhou, Guangxi, and parts of five adjacent provinces and municipalities, as well as the northern part of Vietnam. Elevation in this region varies from sea level in the southeast to 2962 m in the west, and the region is characterized by karst landforms and complex terrain. This area is a typical monsoon region, jointly influenced by both the South Asia monsoon and the East Asia monsoon. Moisture is transported mainly from the west in winter and spring, from the Bay of Bengal and the South China Sea in summer, and mainly from the western Pacific Ocean in autumn [55]. Correspondingly, precipitation in this region also shows clear seasonal variations. More than 75% of the precipitation is received from April to September. The precipitation decreases from southeast to northwest and from south to north, varying from more than 2500 mm on the north coast of the Beibu Gulf to about 800 mm in the northwest part [56].
Rapid population growth and extensive management during the past decades have caused the ecosystem in the study area to be overexploited and sensitive to extreme climate events [57,58]. Droughts and floods occur frequently and often cause severe damage [59]. Furthermore, recent studies show that the coupling of extreme events tends to occur more frequently [60,61], which would trigger secondary geological hazards such as landslides and mud flows. Thus, high-resolution precipitation data are critically needed for both public safety and ecological protection.

Tropical Rainfall Measuring Mission (TRMM) Precipitation Data
The Tropical Rainfall Measuring Mission (TRMM) is a joint project by NASA and the Japanese space agency JAXA. Launched in November 1997, it carries a number of precipitation-related instruments on board [11]. A range of data products based on this project have been derived, covering the global region between 50° N and 50° S [62]. These products use different processing algorithms, and play the role of reference standard for other satellite products. The dataset used in this study is TRMM 3B43 [63], which aims at producing the best estimate of precipitation rate and minimal root-mean-square (RMS) precipitation-error estimates from TRMM and other data sources. The temporal resolution is one month and the spatial resolution is 0.25° × 0.25°. Our study area contains 32 × 32 such pixels, which are used in the multifractal analysis.

DEM Data
The Digital Elevation Model (DEM) is from the Shuttle Radar Topography Mission (SRTM), which produced digital topography data for the land area between 60° N and 56° S [64]. The spatial resolution is 1 arc-second (within the United States) or 3 arc-seconds (other areas). The data are aggregated to 0.01° in this study.
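The aggregation of the DEM to 0.01° can be illustrated with a simple block average. This is only a sketch under stated assumptions: the text does not say which aggregation method was used, so the block mean and the helper name `block_mean` are hypothetical choices. With 3 arc-second cells, 0.01° corresponds to 36 arc-seconds, i.e., 12 cells per side.

```python
def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list.

    Assumption: a plain block mean stands in for the (unspecified)
    aggregation method used in the study.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# 0.01 degree = 36 arc-seconds, i.e. 12 cells of 3 arc-seconds per side.
CELLS_PER_SIDE = round(0.01 * 3600 / 3)
```

Calling `block_mean(dem, CELLS_PER_SIDE)` on a 3 arc-second elevation grid would then return one mean elevation per 0.01° cell.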

Station Precipitation Data
In situ point precipitation data collected by the China Meteorological Administration are used to validate the downscaling results [65]. In this study, 16 years (1998-2013) of continuous daily precipitation data from 72 meteorological stations within the area are used. These data are aggregated to the monthly scale.

Downscaling
In order to propose a downscaling framework, we first assume that the heterogeneous portion of the field can be explained by external factors such as elevation and location coordinates, and that this relationship still holds at the target fine scale. Second, we assume that the multifractal characteristics of the homogeneous portion at the coarse scale are the same as those at the fine scale. The proposed framework is as follows. At each scale, the precipitation field is supposed to be a combination of a deterministic heterogeneous portion and a random homogeneous portion. At the coarse scale, the heterogeneous portion is indicated by long-term accumulations of precipitation fields, and the homogeneous portion is a self-similar random field that can be characterized by multifractals. These two portions are downscaled to the fine scale with different methods, and the results are then recombined to generate the final target field. The specific workflow of downscaling is as follows (Figure 2): (1) the original TRMM 3B43 precipitation field is divided into a deterministic heterogeneous portion, which is indicated by the multi-year average of monthly precipitation fields, and a random homogeneous portion; (2) a multifractal analysis is performed on the homogeneous portion, and a random cascade model is fitted to it; (3) simulation of the multiplicative cascade process is performed 100 times, resulting in an ensemble of the downscaled homogeneous portion at the 0.01° scale; (4) stepwise linear and ANN regression models for the heterogeneous portion are established at the 0.25° scale, using elevation, latitude, and longitude as regression factors, and the model that performs best is chosen to predict the heterogeneous portion at the 0.01° scale, with the predictive variables at this scale as inputs; (5) this high-resolution predicted precipitation is multiplied with the high-resolution ensemble obtained with the multifractal model, and the result is normalized (to preserve the 0.25° scale value), giving the final ensemble of downscaled precipitation fields at the 0.01° scale.
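Step (5) of the workflow can be illustrated for a single 0.25° pixel. The sketch below is a minimal illustration rather than the authors' code: the function name `combine_block` is hypothetical, and it assumes that "normalized" means rescaling the product so that its mean over the fine cells equals the original coarse value.

```python
def combine_block(coarse_value, trend, cascade):
    """Combine the fine-scale trend and cascade fields inside one coarse pixel.

    trend, cascade: 2-D lists of equal shape covering the coarse pixel.
    Assumption: normalization rescales the product so that its mean equals
    the original coarse-pixel value.
    """
    product = [[t * m for t, m in zip(t_row, m_row)]
               for t_row, m_row in zip(trend, cascade)]
    n_cells = sum(len(row) for row in product)
    mean = sum(sum(row) for row in product) / n_cells
    if mean == 0:
        # Degenerate case: spread the coarse value uniformly.
        return [[float(coarse_value) for _ in row] for row in product]
    scale = coarse_value / mean
    return [[v * scale for v in row] for row in product]
```

Applying this to each coarse pixel and each ensemble member would yield fine-scale fields whose block means reproduce the original 0.25° values.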

Multifractal Model
The approach we use to divide the precipitation field is described in Pathirana and Herath [43]. Precipitation at coarse scale is considered as the combined effect of a multifractal process that is statistically uniform over the area of interest and a process that represents heterogeneity. The derived statistically uniform field M, which is the result of filtering the spatial heterogeneity out of the original precipitation, is then stochastically downscaled to the target scale with a multiplicative random cascade model. The mathematical equation is the following:

$$R_{i,j} = M_{i,j}\, G_{i,j}$$

where $R_{i,j}$ is the precipitation on the pixel (i, j) and $G_{i,j}$ is the component of that precipitation that is invariant over long-term accumulation. It is assumed that it can be represented adequately by the long-term seasonal average. Then, by definition, $M_{i,j}$ is a component that is randomly distributed in space so that M yields a uniform field at large accumulations, hence a candidate for multifractal modeling.
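The separation into a heterogeneous and a homogeneous component can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes the two components combine multiplicatively (as in the modulation approach of Pathirana and Herath [43]), that G is the plain multi-year mean of the same calendar month, and the name `split_field` is hypothetical.

```python
def split_field(fields):
    """Split monthly fields into a long-term component G and ratios M = R / G.

    fields: list of 2-D lists, one per year, all for the same calendar month.
    Assumption: G is the un-normalized multi-year mean at each pixel.
    """
    n_years = len(fields)
    rows, cols = len(fields[0]), len(fields[0][0])
    g = [[sum(f[i][j] for f in fields) / n_years for j in range(cols)]
         for i in range(rows)]
    m = [[[f[i][j] / g[i][j] if g[i][j] > 0 else 0.0 for j in range(cols)]
          for i in range(rows)]
         for f in fields]
    return g, m
```

By construction, the multi-year mean of M is 1 at every pixel with non-zero G, which is what makes M a candidate for a statistically uniform field.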
A multiplicative random cascade is used to simulate the homogeneous multifractal precipitation field. Here we only briefly describe this process; a thorough description can be found in [66]. A random cascade is a hierarchy generated by repeated multiplication of an initial component r0 by a random variable W (called the generator) that is positive and has unit expectation, E[W] = 1. At each level, each box is subdivided into b (branching number) equal parts, while the expectation of the total mass is conserved.
At level n there are $b^n$ boxes $\Delta_n^i$ ($i = 1, 2, \ldots, b^n$); the mass density of the ith box is:

$$\mu_n(\Delta_n^i) = r_0 \lambda_n^d \prod_{j=1}^{n} W_j(i)$$

where $\lambda_n$ is the scale ratio:

$$\lambda_n = b^{-n/d}$$

where d is the embedding dimension (d = 2 for the spatial case). A schematic illustration is given in Figure 3. The qth order sample moment of the field at level n is:

$$M_n(q) = \sum_{i=1}^{b^n} \mu_n^q(\Delta_n^i)$$

In a random cascade, the ensemble moment follows a power law in scale; the slope is called the Mandelbrot-Kahane-Peyriere (MKP) function:

$$\chi_b(q) = \log_b E[W^q] - (q - 1)$$

which contains the distribution information of the generator W.
The scaling law of the qth order sample moment is represented as:

$$\tau(q) = \lim_{\lambda_n \to 0} \frac{\log M_n(q)}{-\log \lambda_n}$$

For large n (i.e., $\lambda_n \to 0$), and some range of q, the sample moment and the ensemble moment have the same scaling law, which gives:

$$\tau(q) = d\, \chi_b(q)$$

In data analysis, τ(q) is replaced with the scaling law of the sample moment, and the parameters of W can be estimated.
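The empirical scaling exponent can be estimated by aggregating the field level by level and regressing the log of the sample moment on the log of scale. The sketch below is a minimal illustration under stated assumptions: the field is square with a power-of-two side, each side is halved at every level (so a 32 × 32 grid gives six levels), the box mass is the sum of the pixel values, the scale ratio is taken as 2^-n, and the function names are hypothetical.

```python
import math


def aggregate(field):
    """Sum 2 x 2 blocks, halving the side length of a square field."""
    n = len(field) // 2
    return [[field[2 * i][2 * j] + field[2 * i][2 * j + 1]
             + field[2 * i + 1][2 * j] + field[2 * i + 1][2 * j + 1]
             for j in range(n)] for i in range(n)]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def tau(field, q):
    """Slope of log(sample moment) against -log(scale ratio)."""
    levels = int(math.log2(len(field)))
    xs, ys = [], []
    current = field
    for n in range(levels, -1, -1):
        moment = sum(v ** q for row in current for v in row)
        xs.append(n * math.log(2))  # -log(lambda_n) with lambda_n = 2**-n
        ys.append(math.log(moment))
        if n > 0:
            current = aggregate(current)
    return least_squares_slope(xs, ys)
```

For a perfectly uniform field the estimate reduces to -2(q - 1), a straight line in q; curvature of the estimated exponents in q is what signals multi-scaling.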
Many cascade models are available, which differ mainly in the type of generator used [30,36]. A simple cascade model, the log-normal model originally proposed by Mandelbrot [67], was used because of its simplicity and wide application. Over and Gupta [66] integrated it with the β-model to account for rainy-dry intermittency. Because our study focuses on monthly precipitation, where there is rain almost everywhere and the rainy-dry intermittency is therefore very weak, we use the original log-normal model. The distribution of W is:

$$W = b^{\gamma + \sigma X}$$

where γ and σ are free parameters, and X is a standard Gaussian random variable, i.e., with zero mean and unit variance. The MKP function is [38]:

$$\chi_b(q) = \gamma q + \frac{\sigma^2 \ln b}{2} q^2 - (q - 1)$$

The parameter σ² could be estimated as:

$$\sigma^2 = \frac{\tau^{(2)}(q)}{d \ln(b)}$$

where $\tau^{(2)}(q)$ is the second derivative of τ(q). Once the parameter is estimated, simulations can be run to downscale M from the 0.25° scale to 0.01°. This is done by recursively running the process six times (i.e., each pixel side is divided into 2^6 = 64 parts, giving 0.00390625°), and then resampling to the 0.01° scale.
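A single realization of the log-normal cascade can be sketched as below. This is an illustrative sketch, not the authors' code. It assumes a generator of the form W = b^(γ + σX) with X standard Gaussian, with γ fixed to -σ² ln(b) / 2 so that E[W] = 1, and b taken as the number of children per split (4 for a 2 × 2 split); all function names are hypothetical, and the final resampling to 0.01° is not shown.

```python
import math
import random


def generator(sigma2, b, rng):
    """Draw one log-normal weight W = b**(gamma + sigma * X) with E[W] = 1."""
    sigma = math.sqrt(sigma2)
    gamma = -sigma2 * math.log(b) / 2.0  # enforces unit expectation
    return b ** (gamma + sigma * rng.gauss(0.0, 1.0))


def cascade_step(field, sigma2, b, rng):
    """Split every pixel into 2 x 2 children, each multiplied by its own W."""
    out = []
    for row in field:
        top, bottom = [], []
        for v in row:
            top.extend(v * generator(sigma2, b, rng) for _ in range(2))
            bottom.extend(v * generator(sigma2, b, rng) for _ in range(2))
        out.append(top)
        out.append(bottom)
    return out


def downscale(field, sigma2, steps=6, b=4, seed=None):
    """Run `steps` cascade steps; six steps divide each side by 2**6 = 64."""
    rng = random.Random(seed)
    for _ in range(steps):
        field = cascade_step(field, sigma2, b, rng)
    return field
```

Running `downscale` repeatedly with different seeds (e.g., 100 times) would produce an ensemble of fine-scale realizations of the homogeneous component.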

Regression Model
Physically, precipitation is affected by the general pattern of atmospheric circulation, distance from vapor sources (usually the ocean), and land surface variables such as terrain. We assume that the atmospheric circulation pattern is related to season and can be captured at the monthly scale. Thus, the relationship of monthly precipitation with location and terrain can be established. We tried both stepwise linear regression and an artificial neural network (ANN) model to fit the multi-year average TRMM 3B43 precipitation data with elevation, longitude, and latitude. The MATLAB "stepwise" function is used to perform the stepwise multi-linear regression, while for ANN regression a feedforward neural network with a three-node hidden layer is adopted, trained by the Levenberg-Marquardt backpropagation algorithm with a train:validation:test ratio of 70:15:15. The ANN fitting for each month was run for 30 rounds and the one with the best performance was chosen as the final model. The resulting linear and ANN models were compared on three indexes: the correlation coefficient (R), the bias (Bias), and the root mean square error (RMSE). The latter two indexes are calculated over the stations, where n is the total number of stations; i is the index of the station; pdi is the model downscaled precipitation of station i; and psi is the observed precipitation of station i.
The model with better performance is chosen to downscale the trend to the 0.01° scale. This component is finally combined with the multifractal downscaled result to produce fine-scale precipitation estimates.
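The three comparison indexes can be computed as in the sketch below. Note the assumptions: R is taken as the Pearson correlation coefficient, RMSE uses its standard definition, and Bias is written here as the relative bias (sum of downscaled values over sum of observed values, minus one), which is a common convention in TRMM downscaling studies but is not confirmed by the text; a mean-difference definition would be an equally plausible reading.

```python
import math


def pearson_r(downscaled, observed):
    """Pearson correlation coefficient between paired samples."""
    n = len(observed)
    md, mo = sum(downscaled) / n, sum(observed) / n
    cov = sum((d - md) * (o - mo) for d, o in zip(downscaled, observed))
    var_d = sum((d - md) ** 2 for d in downscaled)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov / math.sqrt(var_d * var_o)


def bias(downscaled, observed):
    """Relative bias (assumed form): sum(pd) / sum(ps) - 1."""
    return sum(downscaled) / sum(observed) - 1.0


def rmse(downscaled, observed):
    """Root mean square error over the n stations."""
    n = len(observed)
    return math.sqrt(sum((d - o) ** 2
                         for d, o in zip(downscaled, observed)) / n)
```

Here `downscaled` and `observed` play the roles of pdi and psi, one value per station.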

Validation
Observation data from 72 rain gauges within the study area were used to validate the downscaling results. In addition, the spline interpolation method [18] and the conventional multifractal method [43] were also implemented to provide a cross validation with the proposed approach. We chose the three indexes (R, Bias, and RMSE) as comparison criteria.

Regression Analysis
The R² of the stepwise multi-linear regression models ranges between 0.51 and 0.74, with the exception of October, at 0.15, while it ranges between 0.78 and 0.97 for the ANN model (Figure 4).
The resulting multi-linear and ANN regression models are used to predict precipitation at the 0.25°, 0.5°, and 1.0° scales, and the predictions are compared with the original data resampled to the corresponding scales to check their predictive efficacy. Results show that the ANN model has better performance (lower RMSE and Bias, higher R). Thus, we use the ANN model for the next step of downscaling (Figure 5).

Multifractal Analysis
The study area contains 32 × 32 grids of 0.25°, thus a total of six levels are available (with branch number b = 2) for parameter estimation. The regression between moment and scale was calculated within the first five moments. A linear relationship can clearly be seen between scale and moment on a log-log plot for both fields, indicating self-similar characteristics, while the detrended field has a slightly higher R² value (Figure 6). τ(q), which reflects how the slope of these regression lines varies against q, differs from a straight line, which indicates multiple scaling. However, the non-linearity is not very pronounced (i.e., close to mono-scaling), and it differs between different years and months (Figure 7).

Figure 7. Linear relationship in the log-log plot between empirical moment (τ(q)) and scale (λ) (a,c), and the curvature of the function τ(q) ~ q (b,d).

The parameter σ² showed strong dependency on the large-scale forcing (or the grid-averaged precipitation), which could be fitted with a power law curve (Figure 8).
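A power law of the form σ² = a · P^c, with P the grid-averaged precipitation, can be fitted by ordinary least squares on the log-transformed values. The text does not state how the curve was fitted, so the log-log regression below and the name `fit_power_law` are assumptions made for illustration.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**c by least squares on (log x, log y).

    Assumption: all x and y are strictly positive.
    Returns the tuple (a, c).
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    c = sxy / sxx
    a = math.exp(my - c * mx)
    return a, c
```

The fitted pair (a, c) would then give the cascade parameter to use for any month from its grid-averaged precipitation.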

Downscaling
The ensemble average of the final downscaled monthly precipitation field is shown in Figure 9. It can be seen that more details are revealed, while the total precipitation within each original coarse grid is preserved. We also checked the time series of downscaling results. The mean and the lower and upper confidence limits at the 90% level were compared with the corresponding station data, and good agreement was achieved (Figure 10). The results of cross validation with rain gauge data are shown in Figure 11, while the three indexes are shown in Table 1. It can be seen that all three downscaling approaches achieved acceptable results, and the differences between them are quite small.
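The ensemble summary at a station can be sketched as follows. This is an illustrative sketch only: it takes the limits to be the 10th and 90th percentiles of the ensemble members (as in the caption of Figure 10), uses linear interpolation between order statistics, and the function names are hypothetical.

```python
import math


def percentile(sorted_values, p):
    """Percentile p (0-100) by linear interpolation between order statistics."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return sorted_values[lo]
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def ensemble_summary(members):
    """Return (mean, 10th percentile, 90th percentile) of ensemble values."""
    ordered = sorted(members)
    mean = sum(ordered) / len(ordered)
    return mean, percentile(ordered, 10), percentile(ordered, 90)
```

Here `members` would hold the downscaled values at one station and one month across all ensemble realizations.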

Figure 11. Comparing the three downscaled precipitation fields with all 72 rain gauges in the study area. "ANN" refers to the approach of downscaling the trend with ANN regression and the residual with interpolation; "MF" refers to the multifractal approach alone; "Combined" refers to the combined multifractal and regression approach.

Figure 12 shows that our approach is able to reproduce the cumulative distribution of the downscaled precipitation field.

Discussion
Due to the heterogeneities and high nonlinearities of hydrologic processes, finer-resolution modeling is required, and scaling presents a fundamental challenge [68]. The key concept of downscaling is to utilize extra information based on some kind of scale invariance. In the present work, two such information sources are analyzed and combined. On the one hand, heterogeneous factors like terrain are known to be related to precipitation. Thus, based on the assumption that the same regression relationship holds at both coarse and fine scales, the regression model fitted with coarse-scale data can be used to predict precipitation at the fine scale. On the other hand, behind the multifractal approach is another kind of scale invariance, statistical self-similarity, which is widely used to characterize the statistical properties of precipitation across a wide range of temporal and spatial scales [69]. A good downscaling approach should be able to use as much information as can be provided. The spatial regression method and the multifractal method utilize different sources of information, and it would be an advantage to combine them. In this study, we proposed such a scheme for spatially downscaling satellite precipitation data. The spatial heterogeneity information above the coarse scale is accounted for by the long-term accumulation of precipitation, while the heterogeneity information below the coarse scale is accounted for by the regression model. In some sense, this approach can be seen as a substitution of the residual interpolation portion in Immerzeel et al. [18] with multifractal downscaling. We applied this model to disaggregate the TRMM 3B43 monthly precipitation data in a region of South China that is characterized by complex terrain and frequent monsoon activities. Due to the lack of fine-scale satellite data, the downscaled results were validated with rain gauge data (Table 1, Figures 10-12); the results indicate that the proposed framework is robust, although it does not seem to outperform the original regression approach and the multifractal approach in terms of R, RMSE, and Bias, which may be attributed to the reduced spatial variability of precipitation at the monthly scale. However, compared with the original regression approach and the multifractal approach, this combined approach keeps a similar performance in downscaling the coarse data to a finer scale while being able to produce an ensemble that maintains the statistical distribution of the precipitation field. This information could be important for understanding uncertainties in hydrological cycles [70].
In the regression analysis, we tried both a stepwise linear model and an ANN model. The fitted models were verified at three scales (Figure 5). The ANN model showed better performance and was chosen. As for regression variables, we did not use vegetation data as a factor because of the "saturation" effect [71,72], which means that vegetation is not a good indicator of precipitation in this area. In addition, many ecology studies relate vegetation information (NDVI) to precipitation. Thus, if the precipitation product already includes the NDVI information, that kind of study would contain redundant information. However, the vegetation information could be quite useful in many precipitation downscaling applications [2].
The multifractal analysis of the coarse-scale field showed that a very good linear relationship exists between moment and scale (Figure 7), but the non-linearity of the τ(q) ~ q curve is not very pronounced (i.e., close to mono-scaling), similar to the study of Gebremichael et al. [73]. This should be attributed mainly to the monthly temporal scale, which is the aggregation of a large number of precipitation events. Multifractality is more significant at short time scales (e.g., a single precipitation event, or daily precipitation), while at the monthly scale this multi-scaling characteristic becomes relatively weak and thus the efficacy is weakened. On the other hand, a relatively long temporal accumulation is favored if we are to achieve a clear correlation with third-party factors. Thus, our approach suffers from a mismatch between the time scales of its two underlying approaches. As for the parameter of the random cascade model, it has been pointed out that there is no universal model with fixed parameter values, but parameter values may vary as a function of the large-scale forcing [38]. In our situation, a power law relationship was found between average precipitation and the random cascade model parameter σ² (Figure 8).

Conclusions
The main purpose of this study is to propose a framework for combining the multifractal and regression models for precipitation downscaling. The main merit of this framework is the ability to utilize different sources of information, which is the key to downscaling. Two scale invariance assumptions are made in this model: (1) the random portion at the finer scale is assumed to be subject to the same distribution law as that of the coarse-scale field; (2) the regression model obtained at the coarse scale is assumed to still hold at the fine scale. More verification would be needed to ensure good downscaling results. In fact, for such a compound approach to work well, both included approaches should be robust, and their combination should also be matched at the temporal or spatial scale. Other downscaling choices are also possible: for example, different kinds of statistical regression models or different prediction variables, different multifractal models, or even different combinations. The choice of model depends on the specific issues involved and the information available. Thus, our work is only a first step, and further investigations are needed.
In addition, the proposed framework can be used not only to downscale historical precipitation data as in this study, but also for future prediction, e.g., for downscaling the output of Global Circulation Models (GCM) at regional scales [74]. Moreover, it is also applicable in other areas of hydrology such as soil moisture downscaling [75]. Thus it is widely applicable and would be helpful for hydrology and water resources management.

Figure 2. Schematic overview of the downscaling approach.

Figure 4. Regression results of TRMM 3B43 multi-year average with elevation, latitude, and longitude, showing the coefficient of determination (R²).

Figure 6. R² of the linear fitting between empirical moments and scales, for both original and detrended precipitation fields. The x-axis represents different orders of moment.

Figure 8. Relationship between averaged precipitation and the random cascade model parameter σ².

Figure 9. Original and downscaled precipitation field, for the snapshot of November 1998 (a) and June 2013 (b).

Figure 10. Comparing the downscaled precipitation field with rain gauges at specific locations. Shows the mean, 10th percentile, and 90th percentile of the corresponding downscaled ensemble fields at two gauge stations (106.46° E, 25.26° N (a); 108.15° E, 27.57° N (b)).

Figure 12. Comparison of the cumulative distribution function between station precipitation data and downscaled data. A wet month (a) June 1998 and a dry month (b) November 2013 are shown.

Table 1. The three indexes of downscaling results with the three downscaling approaches.