Spatiotemporal Regression and Autoregression for Fusing Satellite Precipitation Data

Li, Xueming; Qian, Guoqi

doi:10.3390/engproc2025101001

Open Access Proceeding Paper

Spatiotemporal Regression and Autoregression for Fusing Satellite Precipitation Data^†

by

Xueming Li

and

Guoqi Qian

^*

School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia

^*

Author to whom correspondence should be addressed.

^†

Presented at the 11th International Conference on Time Series and Forecasting, Canaria, Spain, 16–18 July 2025.

Eng. Proc. 2025, 101(1), 1; https://doi.org/10.3390/engproc2025101001

Published: 21 July 2025

Abstract

Most existing precipitation data fusion methods rely on reliable precipitation values, such as those observed by ground-based rain gauges, to correct satellite precipitation estimates (SPEs), which often contain systematic biases. However, such reliable data are rarely available in many regions of the world, especially in rugged terrain and hostile regions, rendering the correction suboptimal. To address this limitation, we propose a novel data fusion method, Triple Collocation Spatial Autoregression under Dirichlet distribution (TCSpAR-Dirichlet), which eliminates the need for reliable data while still capturing true precipitation patterns effectively. The key idea is to use the variance of the precipitation estimates from each satellite at each grid location to determine that satellite’s weight in the fusion, then to characterize the weights at all locations by a spatial autoregression model, and finally to use the fitted weights to fuse the multi-source SPEs at all grid locations. We apply this method to SPEs over Nepal, where many mountainous areas lack the ground gauges needed to collect reliable precipitation data, and produce a fused precipitation dataset with uniform spatial coverage and high measurement accuracy.

Keywords:

data fusion; satellite precipitation estimation; spatial autoregression model; triple collocation; Dirichlet distribution

1. Introduction

Precipitation is a key climate feature in building an early warning system for predicting natural hazard events such as floods, landslides, and wildfires. However, finding a reliable precipitation dataset with sufficient spatial coverage and historical completeness remains challenging due to the sparse and irregular distribution of instruments, such as rain gauge stations, for collecting reliable data. Satellite-derived precipitation estimates (SPEs), on the other hand, provide broad spatial coverage and continuous historical records but often lack accuracy. Our approach aims to integrate SPEs from different sources to obtain accurate precipitation data with uniform spatial coverage distribution and complete historical records.

Precipitation data obtained from different satellites exhibit systematic variability across diverse topographies, geographic locations, and seasons, primarily due to differences in the data retrieval algorithms they use. Such data therefore require calibration or fusion before any valid subsequent analysis [1]. In most cases, this calibration or fusion requires reliable precipitation observations to be available. For example, in the method of precipitation profiler-observation fusion and estimation (PPrOFusE) developed in [2], SPEs are fused with reliable gauge observations in two steps: first, a multiple linear regression model is employed to estimate the correlation effects between the SPEs and the reliable precipitation observations at each grid location; second, a spatial autoregressive (SAR) model is applied to these correlation effects so that the correlation estimates fitted from the SAR model can be used to fuse the SPEs with the reliable observations. PPrOFusE successfully blends the strengths of both gauge observations and SPEs, leveraging the accuracy of the former and the broader coverage of the latter.

However, reliable gauge observations are rarely available in many regions of the world, especially in mountainous and remote areas, which makes the aforementioned data fusion methods difficult to apply. The idea of collocation, i.e., fusion by weighted average, is a promising way to tackle this difficulty in the absence of reliable observations. For details, see the triple collocation (TC) method of [3], which fuses less reliable data from three or more independent sources into a more reliable dataset so that the fused data have the smallest variance. Recently, the TC method has been used for data fusion or calibration of satellite or remote sensing data for various physical features, cf. [4,5,6,7].

In this paper, we are interested in building a reliable precipitation dataset for Nepal from the not-so-reliable SPEs that are available there. Nepal has complex topography with highly variable elevations, and thus lacks ground-based gauges in many of its mountainous areas to collect reliable precipitation data. It is therefore natural to use the TC method to create fused data as weighted averages of SPEs, with the weights being empirically optimized. However, using TC alone for data fusion does not take into account the spatial dependence between the grid locations underlying the observed data. To address this issue, we propose to modulate the empirical weights at every grid location so that they follow a Dirichlet distribution and satisfy a (generalized) spatial autoregression (SAR) model. The modulated weights fitted from the SAR model take this spatial dependence into account and replace the preceding weights to update the fused data. The proposed method, detailed in Section 3, is abbreviated as TCSpAR-Dirichlet.

2. Data Sources

In this section, we describe the study area and three sources of satellite precipitation estimations (SPEs) displayed in Table 1.

2.1. Study Area

Our study concerns data fusion of precipitation over Nepal, a landlocked country with a territory covering 147,000 square kilometers. Nepal’s geographical position at the heart of South Asia renders it heavily influenced by the Indian Ocean monsoon, which brings significant rainfall during the summer season. Nepal is characterized by complex topography, with elevations ranging from 8848 m in the Himalayan mountain range in the north to 59 m in the southern lowlands within a short distance of about 160 km. This extreme topography results in unique Köppen–Geiger classifications [8] in Nepal, as shown in Figure 1.

2.2. Global Satellite Mapping of Precipitation from Japan Aerospace Exploration Agency

The Japan Aerospace Exploration Agency (JAXA) is Japan’s national air and space agency. JAXA developed the Global Satellite Mapping of Precipitation (GSMaP) algorithm to derive SPEs from multiple satellites such as the Tropical Rainfall Measuring Mission satellites (TRMMS) [9]. In this paper, we consider SPEs derived by GSMaP version 6, which provides hourly precipitation data at 0.1° × 0.1° spatial resolution, starting from 1 April 2000. However, we use only monthly data over Nepal.

2.3. Unified Precipitation Project by NOAA-CPC

The Climate Prediction Center (CPC) of the National Oceanic and Atmospheric Administration (NOAA), through its Unified Gauge-based Precipitation Project, compiled a dataset of daily precipitation starting in January 1979 with global coverage at 0.5° × 0.5° resolution [10]. The most significant feature of this dataset is its integration of SPEs from all available sources with gauge-based quality control regarding atmospheric and hydrological operations and services. Here, we use only monthly data over Nepal.

2.4. Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) Data

Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) data were developed based on infrared Cold Cloud Duration (CCD) observations. CHIRPS provides global daily and monthly precipitation for latitudes between 50° S and 50° N (and all longitudes) at a spatial resolution of 0.05° × 0.05° [11]. While the dataset spans a long historical period from 1981 to the present, in this study we use only monthly precipitation data over Nepal.

3. Methods

The objective is to find a weighted average of the three SPEs (i.e., NOAA-CPC, NOAA-CHIRPS, and JAXA-GSMaP) to best represent the true precipitation at each location. This is achieved in two steps by our proposed TCSpAR-Dirichlet method. First, triple collocation is used to compute the initial weights to minimize the variance of the initial weighted average at each location. Second, a spatial autoregression model with Dirichlet distributed response [12] and elevation as a secondary covariate is used to calibrate the initial weights for all locations simultaneously to obtain the final weights. Figure 2 presents an overview of the TCSpAR-Dirichlet framework.

3.1. Obtain Initial Weights by Triple Collocation

In this section, we use the TC method to determine a weighted average of the three SPEs that empirically best represents the true precipitation at each location. Suppose there are L spatial grid points and T time periods. Let

Y_{i, t}^{[1]}, Y_{i, t}^{[2]}, Y_{i, t}^{[3]}

be, respectively, the precipitation estimates from NOAA-CPC, NOAA-CHIRPS and JAXA-GSMaP that are assumed to be independently observed at location i and time t, with

i = 1, \dots, L

and

t = 1, \dots, T

. We denote the weighted average at location i and time t as

Y_{i, t}^{[A]} = ω_{i}^{[1]} Y_{i, t}^{[1]} + ω_{i}^{[2]} Y_{i, t}^{[2]} + ω_{i}^{[3]} Y_{i, t}^{[3]}

(1)

which would be used as a fused precipitation value at location i and time t. Here

ω_{i} = {(ω_{i}^{[1]}, ω_{i}^{[2]}, ω_{i}^{[3]})}^{⊤}

is a vector of weights of the SPEs, assumed time invariant and satisfying

ω_{i}^{[1]} + ω_{i}^{[2]} + ω_{i}^{[3]} = 1

. We then determine the weights by minimizing the variance of each

Y_{i, t}^{[A]}

, namely

\min_{ω_{i}} Var (Y_{i, t}^{[A]}) subject to \sum_{r = 1}^{3} ω_{i}^{[r]} = 1

(2)

It is easy to show that

\begin{aligned} Var (Y_{i, t}^{[A]}) & = Var (ω_{i}^{[1]} Y_{i, t}^{[1]} + ω_{i}^{[2]} Y_{i, t}^{[2]} + ω_{i}^{[3]} Y_{i, t}^{[3]}) \\ & = {(ω_{i}^{[1]})}^{2} Var (Y_{i, t}^{[1]}) + {(ω_{i}^{[2]})}^{2} Var (Y_{i, t}^{[2]}) + {(ω_{i}^{[3]})}^{2} Var (Y_{i, t}^{[3]}) \\ & = {(ω_{i}^{[1]})}^{2} Var (Y_{i, t}^{[1]}) + {(ω_{i}^{[2]})}^{2} Var (Y_{i, t}^{[2]}) + {(1 - ω_{i}^{[1]} - ω_{i}^{[2]})}^{2} Var (Y_{i, t}^{[3]}) \end{aligned}

(3)

where

Var (Y_{i, t}^{[r]})

, with

r = 1, 2, 3

, can be estimated from

\hat{Var} (Y_{i}^{[r]})

, the sample variance of the SPEs

Y_{i}^{[r]} = {(Y_{i, 1}^{[r]}, \dots, Y_{i, T}^{[r]})}^{⊤}

at location i from the r-th data source.

Replacing

Var (Y_{i, t}^{[r]})

with

\hat{Var} (Y_{i}^{[r]})

in (3) and solving the constrained minimization problem in (2) by the method of Lagrange multipliers, we obtain the weight estimates as

{\hat{ω}}_{i}^{[1]} = \frac{\hat{Var} {(Y_{i}^{[1]})}^{- 1}}{S_{i}}, {\hat{ω}}_{i}^{[2]} = \frac{\hat{Var} {(Y_{i}^{[2]})}^{- 1}}{S_{i}}, {\hat{ω}}_{i}^{[3]} = \frac{\hat{Var} {(Y_{i}^{[3]})}^{- 1}}{S_{i}}

(4)

where

S_{i} = \hat{Var} {(Y_{i}^{[1]})}^{- 1} + \hat{Var} {(Y_{i}^{[2]})}^{- 1} + \hat{Var} {(Y_{i}^{[3]})}^{- 1}

. From (4) we see each obtained

{\hat{ω}}_{i}^{[r]}

is inversely proportional to the sample variance of the r-th SPE sample at location i, which makes sense because the sample with the smaller variation should contribute more to reducing variation and improving accuracy in the fused data. However, these weights are computed independently across locations without considering potential spatial dependence among the SPEs from different locations. The following section introduces a spatial modeling framework that can characterize the spatial dependence in the weights.
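The inverse-variance weights in (4) and the weighted average in (1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the array layout `(R, T, L)`, the function names `tc_weights` and `fuse`, and the synthetic data are all assumptions made for the example. The formula itself follows from the stationarity condition of the Lagrangian for (2), which makes each weight proportional to the inverse of the corresponding variance.

```python
import numpy as np

def tc_weights(spe):
    """Inverse-variance weights of Equation (4).

    spe: array of shape (R, T, L) with R products, T time points, L grid cells.
    Returns an (L, R) array whose rows sum to 1.
    """
    var = spe.var(axis=1, ddof=1)          # sample variance over time, shape (R, L)
    inv = 1.0 / var
    return (inv / inv.sum(axis=0)).T       # divide by S_i, then transpose to (L, R)

def fuse(spe, weights):
    """Weighted average of Equation (1); returns a (T, L) array."""
    return np.einsum("lr,rtl->tl", weights, spe)

# Synthetic illustration only: these numbers are made up, not the Nepal data.
rng = np.random.default_rng(0)
R, T, L = 3, 285, 12
truth = rng.gamma(shape=2.0, scale=50.0, size=(T, L))
noise_sd = np.array([10.0, 20.0, 40.0])    # hypothetical error level per product
spe = truth[None, :, :] + rng.normal(size=(R, T, L)) * noise_sd[:, None, None]

w_hat = tc_weights(spe)                    # one weight vector per location
fused = fuse(spe, w_hat)                   # fused series at every location
```

Because the weights are time invariant, the same row of `w_hat` is applied to every time point of a given location.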

3.2. Spatial Autoregressive Model with Dirichlet Distributed Data

Since the L weight vectors

{\hat{ω}}_{i} = {({\hat{ω}}_{i}^{[1]}, {\hat{ω}}_{i}^{[2]}, {\hat{ω}}_{i}^{[3]})}^{⊤}

obtained in Section 3.1 satisfy

{\hat{ω}}_{i}^{[1]} + {\hat{ω}}_{i}^{[2]} + {\hat{ω}}_{i}^{[3]} = 1

, we can treat them as compositional data [13,14]. Therefore, we can characterize them by a spatial autoregression (SAR) model designed for compositional data, as given in [12]. Thereafter, we can calibrate

{\hat{ω}}_{i}

’s to improve data fusion (1).

Without loss of generality, let

{\hat{ω}}_{i} = {({\hat{ω}}_{i}^{[1]}, \dots, {\hat{ω}}_{i}^{[d]})}^{⊤}

,

i = 1, \dots, L

, be L realizations of a

d \times 1

compositional random vector each taking values in a standard

(d - 1)

-simplex

L^{d} = \{{\hat{ω}}_{i} : {\hat{ω}}_{i}^{[r]} > 0, r \in \{1, \dots, d\}, and \sum_{r = 1}^{d} {\hat{ω}}_{i}^{[r]} = 1\}, i = 1, \dots, L

(5)

Note that

L^{d} \subset R^{d}

and

d = 3

in this paper. Without prior information, it is reasonable to assume

{\hat{ω}}_{i}

follows a Dirichlet distribution with concentration parameter vector

α_{i} = {(α_{i 1}, \dots, α_{i d})}^{⊤} \in R_{+}^{d}

and density function

f ({\hat{ω}}_{i} | α_{i}) = {(\prod_{j = 1}^{d} Γ (α_{i j}))}^{- 1} Γ (\sum_{j = 1}^{d} α_{i j}) \prod_{j = 1}^{d} {({\hat{ω}}_{i}^{[j]})}^{α_{i j} - 1}

(6)

where

Γ (\cdot)

is the gamma function. For the interpretability of the model, an alternative parameterization [15] can be applied to the concentration parameter vector

α_{i}

:

α_{i} = ϕ_{i} μ_{i}

, where

ϕ_{i} \in R_{+}

is a scalar known as precision parameter and

μ_{i} \in L^{d}

equals

E ({\hat{ω}}_{i})

. Given a fixed

μ_{i}

, the smaller

ϕ_{i}

, the more likely

{\hat{ω}}_{i}

is distributed around the edges of simplex

L^{d}

whereas a larger

ϕ_{i}

increases the likelihood that

{\hat{ω}}_{i}

is concentrated around

μ_{i}

.
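The role of the precision parameter can be checked numerically. Under the parameterization α = ϕμ, a standard property of the Dirichlet distribution is that each component has variance μ_j(1 − μ_j)/(ϕ + 1), so a larger ϕ means tighter concentration around μ. The sketch below compares this formula with simulated draws; the mean vector and the ϕ values are illustrative assumptions, not fitted values from this paper.

```python
import numpy as np

def dirichlet_component_variance(mu, phi):
    """Var(omega_j) = mu_j * (1 - mu_j) / (phi + 1) when alpha = phi * mu."""
    mu = np.asarray(mu, dtype=float)
    return mu * (1.0 - mu) / (phi + 1.0)

rng = np.random.default_rng(1)
mu = np.array([0.5, 0.3, 0.2])             # illustrative mean on the simplex
for phi in (2.0, 20.0, 200.0):             # illustrative precision values
    draws = rng.dirichlet(phi * mu, size=20000)
    empirical = draws.var(axis=0)
    theoretical = dirichlet_component_variance(mu, phi)
    print(f"phi={phi:6.1f}  empirical={empirical.round(5)}  theory={theoretical.round(5)}")
```

The printed variances shrink roughly in proportion to 1/(ϕ + 1), which is the behaviour described in the paragraph above.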

Spatial dependence in

{\hat{ω}}_{i}

’s among all L locations may be described by a SAR model which also can be extended to a spatial autoregression and regression (SAR-X) model to take into account the effects of covariates on the mean parameters

μ_{i}

’s and precision parameters

ϕ_{i}

’s at all L locations. Let

X \in R^{L \times K}

be the data matrix of the K covariates associated with

μ_{i}

’s, and

Z \in R^{L \times K_{z}}

be the data matrix of the

K_{z}

covariates associated with

ϕ_{i}

’s,

i = 1, \dots, L

. For the Nepal precipitation data used in this paper,

X \in R^{L \times 2}

consists of an intercept column and a column of elevation values at each location, and

Z = 1_{L}

is an

L \times 1

vector of 1’s. Then, the SAR-X model for compositional data

{{\hat{ω}}_{i}, i = 1, \dots, L}

has the following system form

μ_{i j} = \frac{\exp \{{[{(I_{L} - ρ W)}^{- 1} X B]}_{i j}\}}{\sum_{j^{'} = 1}^{d} \exp \{{[{(I_{L} - ρ W)}^{- 1} X B]}_{i j^{'}}\}} \quad and \quad ϕ_{i} = \exp \{{[Z γ]}_{i}\}, \quad i = 1, \dots, L; \; j = 1, \dots, d

(7)

where

ρ \in (- 1, 1), B \in R^{K \times d}

and

γ \in R^{K_{z}}

are unknown parameters expressing the spatial dependence effect, the effect of X and the effect of Z, respectively. Here,

I_{L}

is an

L \times L

identity matrix and W is a pre-defined row-sum-equal-to-1 spatial weight matrix. Also

{[V]}_{i j}

(or

{[V]}_{i}

) represents the [row-i, column-j] (or row-i) element in matrix (or vector) V.
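As an illustration of how (7) maps parameters to mean and precision values, the sketch below evaluates it with NumPy. The function name `sarx_mean_precision`, the four-cell ring-shaped weight matrix, and the elevation values are hypothetical choices for this example; the parameter values are those reported in Table 2, with the first component taken as the base (zero coefficients).

```python
import numpy as np

def sarx_mean_precision(X, Z, W, B, gamma, rho):
    """Evaluate Equation (7).

    X: (L, K) covariates for the mean, Z: (L, Kz) covariates for the precision,
    W: (L, L) row-normalised spatial weights, B: (K, d), gamma: (Kz,), |rho| < 1.
    Returns mu with shape (L, d) and phi with shape (L,).
    """
    L = W.shape[0]
    eta = np.linalg.solve(np.eye(L) - rho * W, X @ B)   # (I - rho W)^{-1} X B
    eta = eta - eta.max(axis=1, keepdims=True)          # stabilise the softmax
    expo = np.exp(eta)
    mu = expo / expo.sum(axis=1, keepdims=True)
    phi = np.exp(Z @ gamma)
    return mu, phi

# Hypothetical setup: four cells on a ring, each with two equally weighted neighbours.
L = 4
W = np.zeros((L, L))
for i in range(L):
    W[i, (i - 1) % L] = W[i, (i + 1) % L] = 0.5
elevation = np.array([100.0, 800.0, 2500.0, 5000.0])    # made-up elevations in metres
X = np.column_stack([np.ones(L), elevation])            # intercept and elevation
Z = np.ones((L, 1))

# Parameter values as reported in Table 2 (component 1 is the base).
B = np.array([[0.0, 0.1505, -0.0515],
              [0.0, -5.9997e-5, 5.1662e-5]])
gamma = np.array([3.0542])
rho = 0.8850

mu, phi = sarx_mean_precision(X, Z, W, B, gamma, rho)
```

Each row of `mu` lies on the simplex by construction, and with `Z` a column of ones the precision is the same at every location.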

To estimate the unknown parameters by the maximum likelihood method, we first provide the log-likelihood function for the SAR-X model:

ℓ (ρ, B, γ; {\hat{ω}}_{1}, \dots, {\hat{ω}}_{L}) = \sum_{i = 1}^{L} ℓ_{i} (ρ, B, γ; {\hat{ω}}_{i}) = \sum_{i = 1}^{L} log f ({\hat{ω}}_{i} | ϕ_{i}, μ_{i}) = \sum_{i = 1}^{L} \{log Γ (ϕ_{i}) - \sum_{j = 1}^{d} log Γ (ϕ_{i} μ_{i j}) + \sum_{j = 1}^{d} (ϕ_{i} μ_{i j} - 1) log ({\hat{ω}}_{i}^{[j]})\}

(8)

knowing that

\sum_{j = 1}^{d} μ_{i j} = 1

and

μ_{i j}

’s and

ϕ_{i}

’s are detailed in (7). We then need the first and second derivatives of the log-likelihood function with respect to each parameter. For simplicity of presentation, denote

ψ (x) = \frac{\partial}{\partial x} log Γ (x)

,

β_{m n} = {[B]}_{m n}

,

M = I_{L} - ρ W

,

{\tilde{X}}_{i m} = {[M^{- 1} X]}_{i m}

and

\tilde{M} = \frac{\partial M^{- 1}}{\partial ρ} = M^{- 1} W M^{- 1}

. It is not difficult to show that, for

i \in {1, \dots, L}

,

j \in {1, \dots, d}

,

m \in {1, \dots, K}

and

n \in {1, \dots, d}

\frac{\partial μ_{i j}}{\partial β_{m n}} = \begin{cases} - μ_{i n} μ_{i j} {\tilde{X}}_{i m} & when n \neq j \\ μ_{i n} (1 - μ_{i n}) {\tilde{X}}_{i m} & when n = j \end{cases}

(9)

\frac{\partial μ_{i j}}{\partial ρ} = μ_{i j} {[\tilde{M} X B]}_{i j} - μ_{i j} \sum_{j^{'} = 1}^{d} μ_{i j^{'}} {[\tilde{M} X B]}_{i j^{'}}

(10)
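A quick numerical check of (10) is to compare it with a central finite difference of the mean in (7) with respect to ρ. The sketch below does this on randomly generated inputs; the dimensions, the random weight matrix, and the helper names are assumptions for illustration only.

```python
import numpy as np

def softmax_rows(eta):
    e = np.exp(eta - eta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mu_of_rho(rho, X, W, B):
    """Mean matrix from Equation (7) as a function of rho."""
    L = W.shape[0]
    return softmax_rows(np.linalg.solve(np.eye(L) - rho * W, X @ B))

def dmu_drho(rho, X, W, B):
    """Analytic derivative from Equation (10)."""
    L = W.shape[0]
    M_inv = np.linalg.inv(np.eye(L) - rho * W)
    A = M_inv @ W @ M_inv @ X @ B              # entries [M~ X B]_{ij}
    mu = softmax_rows(M_inv @ X @ B)
    return mu * (A - (mu * A).sum(axis=1, keepdims=True))

# Random test problem (all values are made up).
rng = np.random.default_rng(3)
L, K, d = 6, 2, 3
W = rng.random((L, L))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)              # rows sum to 1
X = np.column_stack([np.ones(L), rng.normal(size=L)])
B = rng.normal(size=(K, d))

rho, h = 0.4, 1e-6
numeric = (mu_of_rho(rho + h, X, W, B) - mu_of_rho(rho - h, X, W, B)) / (2 * h)
max_gap = np.abs(numeric - dmu_drho(rho, X, W, B)).max()
```

A small `max_gap` indicates that the analytic expression and the finite difference agree at the chosen point.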

By (7)–(10) and

\frac{\partial ϕ_{i}}{\partial γ_{k}} = ϕ_{i} {[Z]}_{i k}

, we obtain the first derivatives of the log-likelihood function

ℓ (ρ, B, γ; {\hat{ω}}_{1}, \dots, {\hat{ω}}_{L})

with respect to all parameters: for

m \in {1, \dots, K}

,

n \in {1, \dots, d}

and

k \in {1, \dots, K_{z}}

\frac{\partial ℓ}{\partial β_{m n}} = \sum_{i = 1}^{L} \{ϕ_{i} μ_{i n} {\tilde{X}}_{i m} [\sum_{j = 1}^{d} μ_{i j} [ψ (ϕ_{i} μ_{i j}) - \log ({\hat{ω}}_{i}^{[j]})] - ψ (ϕ_{i} μ_{i n}) + \log ({\hat{ω}}_{i}^{[n]})]\}

(11)

\frac{\partial ℓ}{\partial γ_{k}} = \sum_{i = 1}^{L} \{ϕ_{i} {[Z]}_{i k} [ψ (ϕ_{i}) + \sum_{j = 1}^{d} μ_{i j} [\log ({\hat{ω}}_{i}^{[j]}) - ψ (ϕ_{i} μ_{i j})]]\}

(12)

\frac{\partial ℓ}{\partial ρ} = \sum_{i = 1}^{L} \{ϕ_{i} \sum_{j = 1}^{d} μ_{i j} \{[{[\tilde{M} X B]}_{i j} - \sum_{j^{'} = 1}^{d} μ_{i j^{'}} {[\tilde{M} X B]}_{i j^{'}}] [\log ({\hat{ω}}_{i}^{[j]}) - ψ (ϕ_{i} μ_{i j})]\}\}

(13)

The second derivatives of the log-likelihood function can be obtained in a similar way, but their expressions are more cumbersome. Therefore, we employ the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm with box constraints (L-BFGS-B) to maximize the log-likelihood, which requires only an approximation of the second derivatives. Denote

\hat{B}

,

\hat{γ}

and

\hat{ρ}

as the resultant estimates of

B

,

γ

and

ρ

, respectively. We then substitute these estimates into (7), resulting in

{\hat{μ}}_{i} = {({\hat{μ}}_{i 1}, \dots, {\hat{μ}}_{i d})}^{⊤}

,

{\hat{ϕ}}_{i}

and

{\hat{α}}_{i} = {\hat{ϕ}}_{i} {\hat{μ}}_{i}

for

i = 1, \dots, L

. Given the estimated concentration parameter vector, we can characterize the distribution of

{\hat{ω}}_{i}

. We use the expectation of Dirichlet distribution with parameter vector

{\hat{α}}_{i}

as the new weight vector at location i, i.e.,

{\tilde{ω}}_{i}^{[r]} = E ({\hat{ω}}_{i}^{[r]}) = {\hat{α}}_{i r} {(\sum_{k = 1}^{d} {\hat{α}}_{i k})}^{- 1}

with

r = 1, \dots, d

. To justify this choice, we generate samples from the fitted Dirichlet distribution and find that the sample variances are consistently low, and the sample means closely match the corresponding theoretical means. These results suggest that the Dirichlet distribution is sharply concentrated around its mean, supporting the use of the expectation as a stable and reliable point estimate. We then successfully update the initial empirical weights with spatially informed calibrations. Furthermore, we can also predict the weight vector at an unobserved location h,

{\tilde{ω}}_{h} = {({\tilde{ω}}_{h}^{[1]}, \dots, {\tilde{ω}}_{h}^{[d]})}^{⊤}

, given the explanatory covariates at location h,

x_{h} \in R^{K}

,

z_{h} \in R^{K_{z}}

and spatial weight vector

w_{h}

.
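The estimation step of this section can be sketched end to end with SciPy. The example below is a simplified illustration under several assumptions: it uses synthetic data on a ring of cells, fixes the first column of B at zero as the base component (as in Table 2), bounds γ as a numerical safeguard, and lets `scipy.optimize.minimize` approximate the gradient by finite differences rather than supplying the analytic derivatives (11)–(13). All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def unpack(theta, K, d, Kz):
    """Split the flat vector into B (first column fixed at 0), gamma and rho."""
    nb = K * (d - 1)
    B = np.zeros((K, d))
    B[:, 1:] = theta[:nb].reshape(K, d - 1)
    return B, theta[nb:nb + Kz], theta[-1]

def mean_precision(theta, X, Z, W, d):
    """mu (L x d) and phi (L,) from Equation (7)."""
    B, gamma, rho = unpack(theta, X.shape[1], d, Z.shape[1])
    eta = np.linalg.solve(np.eye(W.shape[0]) - rho * W, X @ B)
    e = np.exp(eta - eta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True), np.exp(Z @ gamma)

def neg_loglik(theta, omega, X, Z, W):
    """Negative of the log-likelihood in Equation (8)."""
    mu, phi = mean_precision(theta, X, Z, W, omega.shape[1])
    alpha = phi[:, None] * mu
    ll = (gammaln(phi) - gammaln(alpha).sum(axis=1)
          + ((alpha - 1.0) * np.log(omega)).sum(axis=1))
    return -ll.sum()

def fit(omega, X, Z, W):
    K, d, Kz = X.shape[1], omega.shape[1], Z.shape[1]
    theta0 = np.zeros(K * (d - 1) + Kz + 1)
    # Box constraints: B free, gamma bounded for stability, rho inside (-1, 1).
    bounds = [(None, None)] * (K * (d - 1)) + [(-10.0, 10.0)] * Kz + [(-0.99, 0.99)]
    res = minimize(neg_loglik, theta0, args=(omega, X, Z, W),
                   method="L-BFGS-B", bounds=bounds)
    mu_hat, phi_hat = mean_precision(res.x, X, Z, W, d)
    return res, mu_hat, phi_hat

# Synthetic illustration on a ring of L cells (all values below are made up).
rng = np.random.default_rng(4)
L, d = 40, 3
W = np.zeros((L, L))
for i in range(L):
    W[i, (i - 1) % L] = W[i, (i + 1) % L] = 0.5
X = np.column_stack([np.ones(L), rng.normal(size=L)])   # intercept, standardised covariate
Z = np.ones((L, 1))
theta_true = np.array([0.2, -0.1, 0.3, 0.4, 3.0, 0.5])  # B[:, 1:] row-major, gamma, rho
mu_true, phi_true = mean_precision(theta_true, X, Z, W, d)
omega = np.vstack([rng.dirichlet(phi_true[i] * mu_true[i]) for i in range(L)])

res, mu_hat, phi_hat = fit(omega, X, Z, W)
omega_tilde = mu_hat   # Dirichlet mean: alpha_i / sum(alpha_i) equals mu_i
```

Since the Dirichlet expectation is the concentration vector divided by its sum, the calibrated weights coincide with the fitted mean vector, which is why `omega_tilde` is simply `mu_hat` here. Standardising the covariate is a choice made for this sketch to keep the optimisation well scaled.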

4. Results

In this study, we apply our method to precipitation data over Nepal. We use the SPEs of NOAA-CPC, NOAA-CHIRPS, and JAXA-GSMaP as described in Section 2 as the input datasets to obtain the initial fusion weights. A spatial resolution of 1° × 1° is selected here, resulting in

L = 12

locations across Nepal. This resolution is coarser than the ones available in the three satellite data sources, but it reduces computational cost and thus is suitable for method demonstration. We focus on monthly total precipitation from April 2000 to December 2023, resulting in

T = 285

time points per location.

In applying the SAR-X model, we construct the spatial weight matrix based on the distance of k-nearest neighbors, where the number of neighbors is

k = 3

. Nepal has a relatively small north-south extent, but exhibits significant elevation variations [16]. The topography is highly diverse, ranging from the low-lying Terai plains at 60 m above sea level in the south to the towering peak of Mount Everest at 8848 m in the north-east. We believe that elevation plays an important role in shaping precipitation in Nepal. Consequently, we select elevation as the covariate in our SAR-X model. Specifically, the covariate data matrix

X \in R^{L \times 2}

includes an intercept column and a column of elevation values for each location. The other covariate matrix Z is chosen as a column of 1’s for simplicity.
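One possible construction of a row-normalised k-nearest-neighbour weight matrix with k = 3 is sketched below. The text does not specify how distances enter the weights, so the inverse-distance weighting, the Euclidean distance in degrees, and the grid of cell centres used here are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

def knn_weight_matrix(coords, k=3):
    """Row-normalised inverse-distance weights over the k nearest neighbours."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)             # a cell is not its own neighbour
    W = np.zeros_like(dist)
    for i in range(len(coords)):
        nearest = np.argsort(dist[i])[:k]      # indices of the k closest cells
        W[i, nearest] = 1.0 / dist[i, nearest]
    return W / W.sum(axis=1, keepdims=True)    # each row sums to 1

# Hypothetical 1-degree cell centres (lon, lat); not the paper's actual grid.
coords = [(lon, lat) for lat in (27.5, 28.5, 29.5) for lon in (81.5, 82.5, 83.5, 84.5)]
W = knn_weight_matrix(coords, k=3)
```

The resulting matrix has zero diagonal and unit row sums, matching the row-sum-equal-to-1 requirement stated for W in Section 3.2.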

The optimal weights obtained by triple collocation in Section 3.1, i.e., Equation (4), are shown in Figure 3. The initial weight vectors across the 12 locations form a highly scattered and irregular pattern, without clear spatial structure or continuity. This is expected, as the weights are computed independently at each location. Such an irregular spatial pattern not only reduces the interpretability of the precipitation distribution but also limits the model’s ability to support subsequent needs, such as predicting precipitation at unobserved locations. To address this limitation, we apply a SAR-X model to capture the spatial dependence structure.

We then estimate the unknown parameters involved in the SAR-X model with Dirichlet distributed response by the L-BFGS-B algorithm and the initial weight vectors from triple collocation, the results of which are displayed in Table 2. We substitute these parameter estimates into (7) to obtain the estimates

{\hat{μ}}_{i}

’s and

{\hat{ϕ}}_{i}

’s, whereafter we compute the concentration parameter estimate

{\hat{α}}_{i}

at each location. Next, we modulate each initial weight vector

{\hat{ω}}_{i}

by

{\tilde{ω}}_{i}

that is the expectation of Dirichlet distribution with concentration parameter

{\hat{α}}_{i}

. We also interpolate the weight vectors to a finer spatial resolution of 0.5° × 0.5°, using a new covariate matrix

X_{n e w}

that contains elevation data for new grid points and a new spatial weight matrix

W_{n e w}

in the estimated SAR-X model. Figure 4 displays the updated weights of the three satellite products at the finer resolution, showing clear spatial dependence patterns. Combining with the elevation map of Nepal in Figure 1, it is evident that NOAA-CHIRPS performs more reliably in lower-elevation areas, while JAXA-GSMaP shows greater accuracy in mountainous regions. Meanwhile, NOAA-CPC exhibits the smallest variability among the three.

Consequently, given the modulated weights and three SPEs as input, we can construct a complete and reliable monthly precipitation dataset over Nepal through (1), with a spatial resolution of 0.5° × 0.5°, covering the period from April 2000 to December 2023. Figure 5 presents the monthly fused precipitation in 2022, which serves as an example to illustrate how our method successfully captures the seasonal precipitation pattern. It highlights the June-October period with the highest precipitation, as well as the November-May period corresponding to the dry winter. This outcome aligns with the fact that the majority of Nepal precipitation occurs during the Asian summer monsoon season [17]. Based on Figure 1, we can conclude that the Himalayan mountains act as a natural barrier, blocking the monsoon precipitation from the Indian Ocean. Additionally, the polar climate in high-elevation areas leads to lower temperatures and reduced precipitation, primarily in the form of snow [16]. Our result also clearly captures the impact of topography on precipitation patterns, with higher precipitation in the lowlands and lower precipitation in mountainous regions, which is consistent with the geographical features of Nepal.

Because complete gauge station observations for Nepal are not publicly available, we instead use a small, publicly accessible dataset, the June and August 2022 Preliminary Precipitation Summary, to assess the efficacy of TCSpAR-Dirichlet. This dataset is kindly provided by the Department of Hydrology and Meteorology of Nepal at https://www.dhm.gov.np/climate-services/climate-reports/seasonal-reports (accessed on 5 May 2024). Figure 6 compares the fused precipitation obtained from TCSpAR-Dirichlet with the observations for June and August 2022 from 87 gauges in Nepal. The comparison shows that our approach captures the relatively high precipitation in the central region of Nepal, which is also supported by the gauge observations. However, gauge stations provide point-level measurements at irregular and sparse locations, while our results are aggregated over grid cells at a 0.5° × 0.5° resolution, i.e., each cell covers a broad area. It is therefore hard to compare the precipitation observed at a gauge station with the average precipitation within a grid cell, especially where rainfall is highly variable over short distances, and this may produce discrepancies between our results and the gauge observations. Despite this difficulty, the broad spatial pattern remains consistent: the southern region receives less precipitation than the central region, and the northern region receives the least overall. Furthermore, during the monsoon season, our method tends to underestimate precipitation, especially in the central area. This agrees with a recent study [16], which found that a striking weakness of SPEs is their severe underestimation of heavy precipitation across the central region of Nepal.

5. Discussion

In this study, we propose TCSpAR-Dirichlet, a novel method for fusing three independent satellite precipitation estimates (SPEs), and apply it to real-world precipitation data in Nepal. Our fused dataset provides comprehensive spatial coverage and a complete monthly historical record from April 2000 to December 2023. TCSpAR-Dirichlet effectively captures both the spatial patterns and seasonality across different topographies, especially during the monsoon season. Furthermore, the method can be easily extended to incorporate more SPEs, and its framework is also applicable to the fusion of other types of meteorological data. Despite these strengths, the fused dataset tends to underestimate precipitation in regions that experience extreme rainfall. One potential reason is that the method relies only on the satellite estimates; if any individual SPE shows a highly non-linear relation with the ground truth, it can significantly reduce the overall accuracy of the final output. In the absence of calibration with ground-based gauge observations, TCSpAR-Dirichlet may accumulate multiple prediction errors. Overall, while TCSpAR-Dirichlet demonstrates promising results, especially in improving spatial coherence, there remains considerable room for further improvement, including integration with ground truth data, advanced error calibration, or the adoption of more flexible fusion architectures.

Author Contributions

Conceptualization, X.L. and G.Q.; methodology, X.L. and G.Q.; simulation, X.L.; validation, X.L. and G.Q.; data curation, X.L. and G.Q.; writing—original draft preparation, X.L.; writing—review and editing, X.L. and G.Q.; visualization, X.L.; supervision, G.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

JAXA precipitation data is provided at https://sharaku.eorc.jaxa.jp/GSMaP/ (accessed on 2 May 2024). NOAA-CPC data is provided at https://psl.noaa.gov/data/gridded/data.cpc.globalprecip.html (accessed on 2 May 2024). NOAA-CHIRPS data is provided at https://coastwatch.pfeg.noaa.gov/erddap/griddap (accessed on 2 May 2024).

Acknowledgments

Xueming Li’s work on this paper was supported by a Graduate Research Training Scholarship from the University of Melbourne.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] McColl, K.A.; Vogelzang, J.; Konings, A.G.; Entekhabi, D.; Piles, M.; Stoffelen, A. Extended triple collocation: Estimating errors and correlation coefficients with respect to an unknown target. Geophys. Res. Lett. 2014, 41, 6229–6236.
[2] Hines, B.; Qian, G.; Tordesillas, A. Mapping Australia’s precipitation: Harnessing the synergies of multi-satellite remote sensing and gauge network data. GISci. Remote Sens. 2022, 59, 2084–2110.
[3] Stoffelen, A. Toward the true near-surface wind speed: Error modeling and calibration using triple collocation. J. Geophys. Res. Ocean. 1998, 103, 7755–7766.
[4] Yilmaz, M.T.; Crow, W.T.; Anderson, M.C.; Hain, C. An objective methodology for merging satellite- and model-based soil moisture products. Water Resour. Res. 2012, 48, W11502.
[5] Roebeling, R.A.; Wolters, E.L.A.; Meirink, J.F.; Leijnse, H. Triple Collocation of Summer Precipitation Retrievals from SEVIRI over Europe with Gridded Rain Gauge and Weather Radar Data. J. Hydrometeorol. 2012, 13, 1552–1566.
[6] Tang, G.; Clark, M.P.; Papalexiou, S.M.; Ma, Z.; Hong, Y. Have satellite precipitation products improved over last two decades? A comprehensive comparison of GPM IMERG with nine satellite and reanalysis datasets. Remote Sens. Environ. 2020, 240, 111697.
[7] Paul, S.; Alemohammad, H. Examining the performance of precipitation products in characterizing the Indian summer monsoon rainfall (ISMR) using triple collocation. J. Hydrol. 2025, 657, 133136.
[8] Beck, H.E.; Zimmermann, N.E.; McVicar, T.R.; Vergopolan, N.; Berg, A.; Wood, E.F. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci. Data 2018, 5, 180214.
[9] Kachi, M.; Kubota, T.; Ushio, T.; Shige, S.; Kida, S.; Aonashi, K.; Okamoto, K.; Oki, R. Development and utilization of “JAXA Global Rainfall Watch” system based on combined microwave and infrared radiometers aboard satellites. IEEJ Trans. Fundam. Mater. 2011, 131, 729–737.
[10] Xie, P.; Chen, M.; Shi, W. CPC unified gauge-based analysis of global daily precipitation. In Proceedings of the 24th Conference on Hydrology, Atlanta, GA, USA, 17–21 January 2009; Volume 2.
[11] Funk, C.; Peterson, P.; Landsfeld, M.; Pedreros, D.; Verdin, J.; Shukla, S.; Husak, G.; Rowland, J.; Harrison, L.; Hoell, A.; et al. The climate hazards infrared precipitation with stations: A new environmental record for monitoring extremes. Sci. Data 2015, 2, 150066.
[12] Nguyen, T.; Moka, S.; Mengersen, K.; Liquet, B. Spatial Autoregressive Model on a Dirichlet Distribution. arXiv 2024, arXiv:2403.13076.
[13] Aitchison, J.; Shen, S.M. Logistic-normal distributions: Some properties and uses. Biometrika 1980, 67, 261–272.
[14] Aitchison, J. The Statistical Analysis of Compositional Data. J. R. Stat. Soc. Ser. B (Methodol.) 1982, 44, 139–177.
[15] Maier, M. DirichletReg: Dirichlet Regression for Compositional Data in R; WU Vienna University of Economics and Business: Vienna, Austria, 2014.
[16] Sharma, S.; Khadka, N.; Hamal, K.; Shrestha, D.; Talchabhadel, R.; Chen, Y. How Accurately Can Satellite Products (TMPA and IMERG) Detect Precipitation Patterns, Extremities, and Drought Across the Nepalese Himalaya? Earth Space Sci. 2020, 7, e2020EA001315.
[17] Müller, M.F.; Thompson, S.E. Bias adjustment of satellite rainfall data through stochastic modeling: Methods development and application to Nepal. Adv. Water Resour. 2013, 60, 121–134.

Figure 1. Elevation map of Nepal (left); Köppen–Geiger Climate Classifications of Nepal (right).

Figure 2. Flow Chart of TCSpAR-Dirichlet Process.

Figure 3. Maps of the merging weights (NOAA-CPC: left, NOAA-CHIRPS: middle, JAXA-GSMaP: right) for 12 locations.

Figure 4. Maps of the interpolated optimal weights (NOAA-CPC: left, NOAA-CHIRPS: middle, JAXA-GSMaP: right) for 55 locations.

Figure 5. Maps of monthly fused precipitation estimates for Nepal in 2022.

Figure 6. Maps of monthly fused precipitation estimates for Nepal (left column) and of monthly gauge observations (right column) for June and August 2022.

Table 1. Satellite Precipitation Estimates Information.

SPE	NOAA-CPC	NOAA-CHIRPS	JAXA-GSMaP
Spatial Resolution	0.5° × 0.5°	0.05° × 0.05°	0.1° × 0.1°
Temporal Resolution	Daily	Hourly	Daily
Start Date	1 January 1979	1 January 1981	1 April 2000

Table 2. Parameter Estimations of Spatial Autoregressive Model under Dirichlet Distribution.

Parameter	Component 1 (base)	Component 2	Component 3
$\hat{\beta}_{1}$	0	0.1505	−0.0515
$\hat{\beta}_{2}$	0	$-5.9997 \times 10^{-5}$	$5.1662 \times 10^{-5}$
$\hat{\gamma}$		3.0542
$\hat{\rho}$		0.8850

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, X.; Qian, G. Spatiotemporal Regression and Autoregression for Fusing Satellite Precipitation Data. Eng. Proc. 2025, 101, 1. https://doi.org/10.3390/engproc2025101001

