# Correction of Fused Rainfall Data Based on Identification and Exclusion of Anomalous Rainfall Station Data


## Abstract


## 1. Introduction

## 2. Study Area and Data

## 3. Research Methods

#### 3.1. Method for Anomalous Station Identification

- Selection of reference stations and exclusion of obviously anomalous stations. In this step, Hampel’s method and an improved Grubbs’ test were used to identify anomalous stations.
- Surrounding-station analysis, in which the hourly rainfall data of a station were compared with those of an adjacent reference station to ascertain whether the data were anomalous.
- Radar-assisted validation, which was used to check the selection of anomalous stations.

#### 3.1.1. Reference Station Determination

- Hampel’s method

$$Z_i = \frac{\left|x_i - \mathrm{Median}\right|}{\mathrm{MAD}},$$

where $x_i$ is a value in the data series $X$; Median is the median of $X$; MAD (median absolute deviation) is the median of the data set $Y$; $X = \{x_1, x_2, \ldots, x_n\}$ is the rainfall-data sequence of the measurement station; and $Y = \{y_1, y_2, \ldots, y_n\} = \{|x_1 - \mathrm{Median}|, |x_2 - \mathrm{Median}|, \ldots, |x_n - \mathrm{Median}|\}$. When the value of $Z_i$ ($i = 1, 2, \ldots, n$) is >2.24, the station is determined to be anomalous, and $i$ is the anomalous time of that station.
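The Hampel check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `hampel_flags` is hypothetical, the input is assumed to be one station's hourly rainfall series, and the handling of a zero MAD (common in series with many dry hours) is an assumption that the text does not specify.

```python
from statistics import median


def hampel_flags(x, threshold=2.24):
    """Return the indices i whose Hampel score Z_i exceeds the threshold.

    x         : rainfall-data sequence of one station (list of floats)
    threshold : cut-off for Z_i; 2.24 is the value quoted in the text
    """
    med = median(x)
    # Absolute deviations from the median; their median is the MAD.
    abs_dev = [abs(v - med) for v in x]
    mad = median(abs_dev)
    if mad == 0:
        # Assumption: Z_i is undefined when MAD is zero, so nothing is
        # flagged and the series is left for the later checks.
        return []
    return [i for i, d in enumerate(abs_dev) if d / mad > threshold]


# Example: one value stands far above the rest of the series.
series = [2.0, 2.5, 3.0, 2.2, 2.8, 30.0]
print(hampel_flags(series))
```

Because both the centre and the scale are medians, a single extreme value barely changes the score of the other values, which is the usual reason for preferring this check over a mean-and-standard-deviation rule.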

- Grubbs’ test

The sample $(x_1, x_2, \ldots, x_n)$ was sorted in ascending order. The value $G_0$ for the critical coefficient $G_{(a, n)}$ was obtained from the critical value table (Table 1). The significance level was denoted by $a$, and its value was adopted as 0.05 in this study. Next, $G_1$ and $G_n$ were calculated as follows:

$$G_1 = \frac{X_m - x_1}{\sigma},$$

$$G_n = \frac{x_n - X_m}{\sigma},$$

where $X_m$ is the median of the sample, and $\sigma$ is the standard deviation. $G_1$ and $G_n$ are statistical quantities.

If $G_1 \ge G_n$ and $G_1 > G_0$, $x_1$ is determined to be an outlier and is rejected; if $G_n \ge G_1$ and $G_n > G_0$, $x_n$ is an outlier and is rejected; if $G_1 < G_0$ and $G_n < G_0$, then there is no outlier. If there is an outlier, it is removed, the statistics are recalculated using the rainfall values of the remaining stations, and the above steps are repeated until there is no outlier.
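The iterative procedure above can be sketched as follows. This is an illustrative reading of the steps rather than the authors' code: the function name `improved_grubbs` and the dictionary `G0_A005` are hypothetical, the critical values are copied from the a = 0.05 row of Table 1 for n = 3 to 8 only, and the use of the sample (n − 1) standard deviation is an assumption, since the text says only "standard deviation".

```python
from statistics import median, stdev

# Critical values G_0 copied from the a = 0.05 row of Table 1 (n = 3..8).
G0_A005 = {3: 1.15, 4: 1.45, 5: 1.67, 6: 1.82, 7: 1.94, 8: 2.03}


def improved_grubbs(values, g0_table=G0_A005):
    """Iteratively reject extreme values using median-based Grubbs statistics.

    Returns (outliers, remaining). Iteration stops when no outlier is found
    or when the sample size has no entry in g0_table.
    """
    remaining = sorted(values)
    outliers = []
    while len(remaining) in g0_table:
        g0 = g0_table[len(remaining)]
        x_m = median(remaining)
        sigma = stdev(remaining)  # assumption: sample standard deviation
        if sigma == 0:
            break  # all values identical, nothing to reject
        g1 = (x_m - remaining[0]) / sigma   # statistic for the smallest value
        gn = (remaining[-1] - x_m) / sigma  # statistic for the largest value
        if g1 >= gn and g1 > g0:
            outliers.append(remaining.pop(0))
        elif gn >= g1 and gn > g0:
            outliers.append(remaining.pop())
        else:
            break  # neither extreme exceeds G_0
    return outliers, remaining


# Example: rainfall values from six stations, one of them implausibly high.
flagged, kept = improved_grubbs([1.0, 1.1, 1.0, 0.9, 1.0, 9.0])
print(flagged, kept)
```

Passing the critical values in as a lookup table keeps the sketch independent of any particular significance level; a different row of Table 1 can be supplied without changing the loop.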

#### 3.1.2. The Surrounding Stations’ Analysis

#### 3.1.3. Radar-Assisted Validation

- Rainfall was considered to occur at a station when the low-elevation radar reflectivity exceeded a threshold of 20 dBZ. Whether the radar detected rainfall at a station was used to verify the determination results for that station.
- The rainfall amount recorded at the station was compared with the radar-estimated rainfall intensity. If the station’s rainfall amount differed significantly from the radar-estimated amount, the station was considered anomalous.
- For a station located at the boundary between rainfall and non-rainfall areas, or in a rainfall area with large variations in rainfall intensity, the determination results were verified using the spatial reflectivity gradient. This involved examining the change in reflectivity over a certain distance and determining whether the station’s rainfall amount was consistent with the observed change.
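The first two checks above can be sketched as a small decision function. Everything numerical here other than the 20 dBZ threshold is an assumption made for illustration: the text does not quantify "significantly different", so `rel_tol` is a hypothetical tolerance, and the 0.1 mm wet/dry limit for the gauge is borrowed from the lower bound of the light-rain grade. The function name `radar_check` is likewise hypothetical, and the reflectivity-gradient check for boundary stations is not covered.

```python
def radar_check(gauge_mm, radar_mm, reflectivity_dbz,
                rain_dbz=20.0, wet_mm=0.1, rel_tol=0.5):
    """Classify a station's hourly value as 'consistent' or 'anomalous'.

    gauge_mm         : rainfall recorded at the station (mm)
    radar_mm         : radar-estimated rainfall at the station (mm)
    reflectivity_dbz : low-elevation radar reflectivity at the station (dBZ)
    rain_dbz         : reflectivity threshold for rainfall (20 dBZ in the text)
    wet_mm, rel_tol  : hypothetical thresholds, not taken from the study
    """
    radar_sees_rain = reflectivity_dbz > rain_dbz
    gauge_sees_rain = gauge_mm >= wet_mm

    # Check 1: radar and gauge should agree on whether it rained at all.
    if radar_sees_rain != gauge_sees_rain:
        return "anomalous"
    if not radar_sees_rain:
        return "consistent"  # both dry

    # Check 2: the amounts should not differ by more than the tolerance.
    rel_diff = abs(gauge_mm - radar_mm) / max(gauge_mm, radar_mm)
    return "anomalous" if rel_diff > rel_tol else "consistent"


print(radar_check(5.0, 4.5, 30.0))   # gauge and radar roughly agree
print(radar_check(5.0, 0.0, 10.0))   # gauge reports rain, radar shows none
```

In practice the tolerance would need to reflect the uncertainty of the radar rainfall estimate, which is why it is exposed as a parameter rather than fixed.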

#### 3.2. Methods for Rainfall-Data Fusion

#### 3.2.1. OI

In the OI scheme, $R_i$ is the rainfall intensity; $a$, $r$, and $g$ denote the analytical value, initial radar-estimated value, and rain gauge-observed value, respectively; $n$ and $k$ are the number of rain gauges and the station’s ordinal number, respectively; and $P_k$ is the weight factor.

The weight factors depend on the correlation function between two points $i$ and $j$ and on $\eta_i$, the relative mean square error of the observed value at the $i$th gauge (the actual calculated value is usually 0). The correlation function is expressed in terms of the distance between $i$ and $j$.
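For orientation, a common textbook statement of optimal interpolation that matches the quantities defined here is given below. It is offered as an assumed form, not as the authors' exact equations: the symbol $\mu$ for the correlation function, the symbol $d$ for distance, and the index layout are choices made for this illustration.

$$R_i^{a} = R_i^{r} + \sum_{k=1}^{n} P_k \left(R_k^{g} - R_k^{r}\right),$$

$$\sum_{j=1}^{n} \mu_{kj} P_j + \eta_k P_k = \mu_{ik}, \quad k = 1, 2, \ldots, n,$$

$$\mu_{kj} = f\left(d_{kj}\right).$$

The first line corrects the radar first guess at point $i$ with a weighted sum of gauge-minus-radar differences; the second is the linear system that yields the weights $P_k$; and the third states only that the correlation is some decreasing function $f$ of the distance $d_{kj}$, whose specific form is not recoverable from the text. With $\eta_k = 0$, the gauges are treated as error-free.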

#### 3.2.2. KED

The estimate at the target location (denoted by the subscript 0) is calculated by a linear estimator, whose weights are obtained by solving the kriging system of equations.
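As a reference point, the standard kriging-with-external-drift system can be written as follows. This is the general textbook form under assumed notation, not necessarily the notation of the study: $\lambda_j$ are the weights, $\gamma$ is the semivariogram, $Y$ is the external drift variable (here taken to be the radar rainfall field), and $\nu_0$ and $\nu_1$ are Lagrange multipliers.

$$\hat{R}(x_0) = \sum_{j=1}^{n} \lambda_j R(x_j),$$

$$\sum_{j=1}^{n} \lambda_j \gamma\left(x_i - x_j\right) + \nu_0 + \nu_1 Y(x_i) = \gamma\left(x_i - x_0\right), \quad i = 1, 2, \ldots, n,$$

$$\sum_{j=1}^{n} \lambda_j = 1, \qquad \sum_{j=1}^{n} \lambda_j Y(x_j) = Y(x_0).$$

The two constraints in the last line make the estimator unbiased when the mean of the rainfall field is a linear function of the drift variable, which is how the radar information enters the gauge interpolation.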

#### 3.2.3. Distance-Weighted Spatial Interpolation Using Coupled Radar–Gauge Rainfall Data

#### 3.2.4. LOOCV-Based Evaluation of Rainfall Data-Fusion Methods’ Performances

- BIAS: mean difference between the ground-truth values $R_i$ and the predicted values $\widehat{R}_i$ over the $n$ values compared, which indicates whether the estimates are systematically too low or too high.$$\mathrm{BIAS}=\frac{1}{n}\sum_{i=1}^{n}\left(R_i-\widehat{R}_i\right).$$
- RMSE: square root of the mean squared deviation between the predicted and ground-truth values. In this study, the ground-truth values were the rainfall values that were obtained after the anomalous data were excluded.$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i-\widehat{R}_i\right)^{2}}.$$
- MRTE: mean root transformed error. Because the values are square-root transformed before they are compared, MRTE gives less weight to the errors associated with high rainfall values.$$\mathrm{MRTE}=\frac{1}{n}\sum_{i=1}^{n}\left(\sqrt{R_i}-\sqrt{\widehat{R}_i}\right)^{2}.$$
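The three indicators translate directly into code. The sketch below follows the formulas as written; the function names and the argument order (ground-truth values first, predicted values second) are choices made for this illustration, and the inputs are assumed to be equal-length sequences of non-negative rainfall values.

```python
from math import sqrt


def bias(truth, pred):
    """Mean of (R_i - R_hat_i)."""
    return sum(r - p for r, p in zip(truth, pred)) / len(truth)


def rmse(truth, pred):
    """Square root of the mean squared difference."""
    return sqrt(sum((r - p) ** 2 for r, p in zip(truth, pred)) / len(truth))


def mrte(truth, pred):
    """Mean squared difference of the square-root-transformed values."""
    return sum((sqrt(r) - sqrt(p)) ** 2 for r, p in zip(truth, pred)) / len(truth)


# Example with three held-out gauges: observed vs. fused estimate (mm).
observed = [1.0, 4.0, 9.0]
estimated = [0.0, 4.0, 16.0]
print(bias(observed, estimated), rmse(observed, estimated), mrte(observed, estimated))
```

In a leave-one-out setting, `pred` would hold the value estimated at each gauge when that gauge is withheld from the fusion, so each indicator summarises the errors over all withheld gauges.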

## 4. Results

#### 4.1. Effects of Anomaly Identification and Exclusion

#### 4.2. Performances of Rainfall Data-Fusion Methods

## 5. Discussion

## 6. Conclusions

- Anomalous station identification using Hampel’s method and Grubbs’ test (i.e., “reference station determination”) was applied to four typical rainfall events. The 08:00–19:00 July 3 event had the highest number of anomalous stations (11.5% of all anomalous stations), while the 01:00–17:00 August 9 event had the lowest number (7.8% of all anomalous stations). Comparing the anomalous stations detected for each rainfall event with the stations known to be anomalous showed that the accuracy of reference station determination was 94.2%.
- Radar-assisted validation increased the average accuracy of anomaly identification for the four typical rainfall events from 89.7% to 93.7%. Hence, this method is suitable for identifying false positives in challenging areas (i.e., the boundary between rainy and non-rainy areas, and areas that contain large variations in rainfall intensity).
- Box plots of the performance indicators of each data-fusion method for the four rainfall events showed that KED was the best-performing method for rainfall-data fusion. FAR was the second-best method and was only slightly less effective than KED.
- The exclusion of anomalous stations had a pronounced impact on the results of rainfall-data fusion, as it improved the quality of the rainfall station data. We found that 95% of the performance indicators were improved by the exclusion of anomalous data. In scatter diagrams comparing rainfall station data with rainfall estimates derived from fused rainfall products, the exclusion of anomalous data had the greatest impact on the OI and KED products, with the scatter points lying much closer to the 1:1 line. In other words, excluding anomalous data is a very effective way to improve the quality of fused rainfall products.
- Hampel’s method and Grubbs’ test were combined to determine the reference stations, which were then used to check the surrounding measuring stations. With radar-assisted validation, the vast majority of anomalous rainfall data could be eliminated, which greatly improved the quality of the rainfall station data. Carrying out rainfall fusion with these high-quality data yields high-resolution and high-precision fused rainfall products, which will provide strong support for flash flood disaster forecasting and early warning.

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest


**Figure 6.** Scatter diagrams comparing rainfall station data (1:1 line) to rainfall estimates derived from rainfall-data fusion (scatter points), without the exclusion of anomalous data.

**Figure 8.** Scatter diagrams comparing rainfall station data (1:1 line) to rainfall estimates derived from rainfall-data fusion (scatter points) after the exclusion of anomalous data.

**Table 1.** Critical values $G_{(a, n)}$ for Grubbs’ test.

| a | n = 3 | n = 4 | n = 5 | n = 6 | n = 7 | n = 8 | n = 9 | n = 10 | n = 11 | n = 12 | n = 13 | n = 14 | n = 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 1.15 | 1.45 | 1.67 | 1.82 | 1.94 | 2.03 | 2.11 | 2.18 | 2.28 | 2.29 | 2.33 | 2.37 | 2.41 |
| 0.025 | 1.15 | 1.48 | 1.71 | 1.89 | 2.02 | 2.13 | 2.21 | 2.29 | 2.36 | 2.41 | 2.46 | 2.51 | 2.55 |
| 0.01 | 1.15 | 1.49 | 1.75 | 1.94 | 2.10 | 2.22 | 2.32 | 2.41 | 2.48 | 2.55 | 2.61 | 2.66 | 2.71 |

| Grade | Rainfall Amount in 1 h (mm) | Rainfall Amount in 3 h (mm) | Rainfall Amount in 6 h (mm) | Rainfall Amount in 12 h (mm) | Rainfall Amount in 24 h (mm) |
|---|---|---|---|---|---|
| Light rain | 0.1–1.5 | 0.1–2.9 | 0.1–3.9 | 0.1–4.9 | 0.1–9.9 |
| Moderate rain | 1.6–6.9 | 3.0–9.9 | 4.0–12.9 | 5.0–14.9 | 10.0–24.9 |
| Heavy rain | 7.0–14.9 | 10.0–19.9 | 13.0–24.9 | 15.0–29.9 | 25.0–49.9 |
| Rainstorm | 15.0–39.9 | 20.0–49.9 | 25.0–59.9 | 30.0–69.9 | 50.0–99.9 |
| Heavy rainstorm | 40.0–49.9 | 50.0–69.9 | 60.0–119.9 | 70.0–139.9 | 100.0–249.9 |
| Torrential rainstorm | ≥50.0 | ≥70.0 | ≥120.0 | ≥140.0 | ≥250.0 |

| Anomalous Stations | 08:00–19:00 July 3 | 15:00 July 5–11:00 July 6 | 14:00 July 27–08:00 July 28 | 01:00–17:00 August 9 |
|---|---|---|---|---|
| Determined | 602 | 530 | 451 | 408 |
| Actual | 639 | 566 | 472 | 426 |

| Rainfall Event/Time | 08:00–19:00 July 3 | 15:00 July 5–11:00 July 6 | 14:00 July 27–08:00 July 28 | 01:00–17:00 August 9 |
|---|---|---|---|---|
| 08:00 | 87 | 91 | 88 | 90 |
| 17:00 | 92 | 96 | 92 | 91 |

**Table 5.** Accuracy (%) of anomalous station identification in Hebei before and after radar-assisted validation.

| Rainfall Event | 08:00–19:00 July 3 | 08:00–19:00 July 3 | 15:00 July 5–11:00 July 6 | 15:00 July 5–11:00 July 6 | 14:00 July 27–08:00 July 28 | 14:00 July 27–08:00 July 28 |
|---|---|---|---|---|---|---|
| Time | 08:00 | 17:00 | 08:00 | 17:00 | 08:00 | 17:00 |
| Accuracy before radar-assisted validation | 88 | 93 | 87 | 89 | 91 | 90 |
| Accuracy after radar-assisted validation | 93 | 92 | 96 | 94 | 93 | 94 |

| Rainfall Event | Indicator | OI | KED | FAR |
|---|---|---|---|---|
| 08:00–19:00 July 3 | BIAS | −0.43 | −0.10 | −0.16 |
| 08:00–19:00 July 3 | RMSE | 4.65 | 2.11 | 3.16 |
| 08:00–19:00 July 3 | MRTE | 0.32 | 0.16 | 0.16 |
| 15:00 July 5–11:00 July 6 | BIAS | −0.50 | 0.16 | −0.21 |
| 15:00 July 5–11:00 July 6 | RMSE | 1.86 | 0.84 | 1.55 |
| 15:00 July 5–11:00 July 6 | MRTE | 0.49 | 0.11 | 0.43 |
| 14:00 July 27–08:00 July 28 | BIAS | −0.76 | 0.55 | 0.69 |
| 14:00 July 27–08:00 July 28 | RMSE | 0.55 | 0.49 | 0.47 |
| 14:00 July 27–08:00 July 28 | MRTE | 0.50 | 0.41 | 0.53 |
| 01:00–17:00 August 9 | BIAS | −0.31 | −0.36 | 0.48 |
| 01:00–17:00 August 9 | RMSE | 1.59 | 1.01 | 1.42 |
| 01:00–17:00 August 9 | MRTE | 0.46 | 0.33 | 0.41 |

| Rainfall Event | Indicator | OI | KED | FAR |
|---|---|---|---|---|
| 08:00–19:00 July 3 | BIAS | −0.23 | −0.08 | −0.10 |
| 08:00–19:00 July 3 | RMSE | 3.32 | 1.56 | 2.96 |
| 08:00–19:00 July 3 | MRTE | 0.16 | 0.12 | 0.08 |
| 15:00 July 5–11:00 July 6 | BIAS | −0.36 | 0.10 | −0.13 |
| 15:00 July 5–11:00 July 6 | RMSE | 1.77 | 0.86 | 1.55 |
| 15:00 July 5–11:00 July 6 | MRTE | 0.44 | 0.06 | 0.32 |
| 14:00 July 27–08:00 July 28 | BIAS | −0.77 | 0.53 | 0.76 |
| 14:00 July 27–08:00 July 28 | RMSE | 0.52 | 0.43 | 0.42 |
| 14:00 July 27–08:00 July 28 | MRTE | 0.50 | 0.32 | 0.49 |
| 01:00–17:00 August 9 | BIAS | −0.23 | −0.15 | 0.33 |
| 01:00–17:00 August 9 | RMSE | 1.32 | 0.88 | 0.85 |
| 01:00–17:00 August 9 | MRTE | 0.38 | 0.21 | 0.42 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Qiu, Q.; Wang, Z.; Tian, J.; Tu, Y.; Cui, X.; Hu, C.; Kang, Y.
Correction of Fused Rainfall Data Based on Identification and Exclusion of Anomalous Rainfall Station Data. *Water* **2023**, *15*, 2541.
https://doi.org/10.3390/w15142541
