# Research on Rain Pattern Classification Based on Machine Learning: A Case Study in Pi River Basin


## Abstract

The accuracy and F₁ score of the LightGBM model were 98.95% and 98.58%, respectively, and its loss function and accuracy converged quickly after only 20 iterations. LSTM and SVM achieved satisfactory accuracy but relatively low training efficiency, while DT classified quickly but with relatively low accuracy. As the sampling size increased, the classification results became more stable and accurate, and the training efficiency of all four methods also improved.

## 1. Introduction

## 2. Study Area and Data Description

#### 2.1. Overview of the Pi River Basin

The region from the above-mentioned two reservoirs to Hengpaitou is the middle reaches, with a catchment area of 1130 km². The area from Hengpaitou to the river mouth is the lower reaches, with a basin area of 1630 km², which consists of hilly and plain depressions [24].

#### 2.2. Data Sources

## 3. Methodology

#### 3.1. General Idea of This Study

#### 3.2. Generation of DTW Rainfall Pattern
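Dynamic time warping (DTW) aligns two rainfall time series of different durations by warping the time axis before measuring their distance, which is what allows rainfall events of unequal length to be compared and clustered into patterns. A minimal pure-Python sketch of the classic dynamic-programming DTW distance, assuming an absolute-difference local cost (an illustrative choice, not necessarily the exact formulation used in the paper):

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW distance.

    a, b: sequences of rainfall depths (e.g., hourly mm values).
    Local cost between two points is |x - y|; each point of one series
    may align with several consecutive points of the other.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # d[i][j]: minimal accumulated cost aligning a[:i] with b[:j]
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Because the warping path may stretch either series, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0: the repeated value in the longer series aligns with the same point of the shorter one at no extra cost.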

#### 3.3. Four Machine Learning Classification Methods Adopted in This Study

#### 3.3.1. Decision Tree

#### 3.3.2. Long Short-Term Memory Neural Network

#### 3.3.3. Support Vector Machine

#### 3.3.4. Light Gradient Boosting Machine

## 4. Construction of Rain Pattern Classification Model for Pi River Basin

#### 4.1. Data Sources and Feature Selection

#### 4.1.1. Data Sources

#### 4.1.2. Feature Selection

#### 4.2. Model Development and Numerical Test

#### 4.2.1. Model Framework

#### 4.2.2. Model Parameters

#### 4.3. Model Evaluation

Accuracy, precision, recall, and the F₁ score are essential evaluation metrics in machine learning. They are defined as follows (taking binary classification as an example; the confusion matrix is shown in Figure 9): TP represents the number of correctly identified targets, TN represents the number of other items correctly identified, FP represents the number of items incorrectly identified as targets, and FN represents the number of missed targets.
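The confusion-matrix bookkeeping described above generalizes directly to several classes: entry (i, j) counts samples of true class i predicted as class j, and TP, FP, and FN for a given class are read off its row and column. A small pure-Python sketch (function and variable names are illustrative, not from the paper):

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] counts samples whose true class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def class_counts(cm, k):
    """TP, FP, FN for class k, matching the definitions in the text."""
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp  # predicted as k but actually another class
    fn = sum(cm[k]) - tp                 # actually k but predicted as another class
    return tp, fp, fn
```

For example, with true labels `[0, 0, 1, 1, 2]` and predictions `[0, 1, 1, 1, 2]`, class 1 has TP = 2, FP = 1, FN = 0.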

- (1) Accuracy (ACC): ACC = (TP + TN)/(TP + TN + FP + FN)
- (2) Precision (P): P = TP/(TP + FP)
- (3) Recall (R): R = TP/(TP + FN)
- (4) F₁ score

The F₁ score is the harmonic mean of precision and recall, F₁ = 2 × P × R/(P + R); the calculation method is shown in Equation (13).

For multi-class classification, this study calculated the F₁ score for each class separately and then took the mean; the calculation method is shown in Equations (14)–(16).
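The per-class calculation and averaging just described (per-class precision, recall, and F₁, then the unweighted mean over classes) can be sketched in plain Python; all names here are illustrative, not from the paper:

```python
def classification_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1 from a confusion matrix.

    cm[i][j]: number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp
        fn = sum(cm[k]) - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of p and r
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    macro = lambda xs: sum(xs) / len(xs)  # unweighted mean over classes
    return accuracy, macro(precisions), macro(recalls), macro(f1s)
```

For the two-class matrix `[[5, 1], [2, 4]]` this yields an accuracy of 9/12 = 0.75 and a macro recall of (5/6 + 4/6)/2 = 0.75.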

## 5. Results and Discussion

#### 5.1. Classification Results of DTW Rainfall Patterns

#### 5.2. Comparison and Analysis of Four Machine Learning Classification Methods

Among the four methods, LightGBM achieved the highest accuracy, precision, recall, and F₁ score values on the rainfall classification dataset, which were 98.95%, 99.25%, 97.96%, and 98.58%, respectively. Its accuracy and F₁ score were improved by 0.18% and 0.27% compared to the LSTM classification method, by 1.32% and 1.26% compared to the SVM method, and by 3.6% and 5.4% compared to the DT method. The LightGBM algorithm therefore improved all indicators of rainfall classification accuracy and is superior to the other three models.

#### 5.3. Analysis of Classification Results with Samples of Different Magnitudes

The accuracy, precision, recall, and F₁ score under different rain pattern sample sizes are shown in Figure 16.

The accuracy, precision, recall, and F₁ score of the four classification models generally increased as the sample size grew from 500 to 5000. The evaluation indicators of the LightGBM and LSTM models did not increase significantly after the sample size reached 1000, while those of the DT and SVM models steadily increased with sample size. This indicates that sample size has a significant impact on the accuracy of the classification models.

#### 5.4. Analysis of Characteristics Significance

## 6. Conclusions

The accuracy and F₁ score of LightGBM were 98.95% and 98.58%, respectively, and its loss function and accuracy converged quickly after only 20 iterations. An imbalanced distribution of sample categories in the dataset affects the classification accuracy of a model, and LightGBM has significant advantages in solving classification problems with imbalanced category distributions. In practical applications, appropriate classification algorithms and data preprocessing methods can be selected based on the actual situation to achieve the classification goals more effectively.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Diederen, D.; Liu, Y. Dynamic Spatio Temporal Generation of Large Scale Synthetic Gridded Precipitation: With Improved Spatial Coherence of Extremes. Stoch. Environ. Res. Risk Assess.
**2020**, 34, 1369–1383. [Google Scholar] [CrossRef] - Yuan, W.L.; Liu, M.Q.; Wan, F. Study on the Impact of Rainfall Pattern in Small Watersheds on Rainfall Warning Index of Flash Flood Event. Nat. Hazards
**2019**, 97, 665–682. [Google Scholar] [CrossRef] - Kan, G.; Hong, Y.; Liang, K. Research on the Flood forecasting based on coupled machine learning model. China Rural. Water Hydropower
**2018**, 10, 165–169, 176. (In Chinese) [Google Scholar] - Kan, G.; Liu, Z.; Li, Z.; Yao, C.; Zhou, S. Coupling Xin’anjiang runoff generation model with improved BP flow concentration model. Adv. Water Sci.
**2012**, 23, 21–28. (In Chinese) [Google Scholar] - Mo, B. The Rain Water and Confluent Channel; Architectural Engineering Press: Beijing, China, 1959. [Google Scholar]
- Keifer, G.J.; Chu, H.H. Synthetic storm pattern for drainage design. J. Hydraul. Div. ASCE
**1957**, 83, 1332-1–1332-25. [Google Scholar] [CrossRef] - Huff, F.A. Time distribution of rainfall in heavy storms. Water Resour. Res.
**1967**, 3, 1007–1010. [Google Scholar] [CrossRef] - Pilgrim, D.H.; Cordery, I. Rainfall temporal patterns for design floods. J. Hydraul. Div. ASCE
**1975**, 101, 81–95. [Google Scholar] [CrossRef] - Yen, B.C.; Chow, V.T. Design hyetographs for small drainage structures. J. Hydraul. Div. ASCE
**1980**, 106, 1055–1076. [Google Scholar] [CrossRef] - Zhao, G. Time history allocation of design rainstorm type. Water Resour. Hydropower Eng.
**1964**, 1, 38–42. (In Chinese) [Google Scholar] - Wang, M.; Tan, X.C. Study on urban rainstorm and rain pattern in Beijing. J. Hydrol.
**1994**, 3, 1–6. (In Chinese) [Google Scholar] - Wu, Z.; Cen, G.; An, Z. Experimental study on slope confluence. J. Hydraul. Eng.
**1995**, 7, 84–89. (In Chinese) [Google Scholar] - Cen, G.; Shen, J.; Fan, R. Study on rainstorm pattern of urban design. Adv. Water Sci.
**1998**, 9, 42–47. (In Chinese) [Google Scholar] - Zhao, K.; Yan, H.; Wang, Y.; Tao, T. Influence of Rainfall Pattern and Intensity on Local Sensitivity of SWMM model parameters. Water Purif. Technol.
**2018**, 37, 95–101. (In Chinese) [Google Scholar] - Zhang, X. Estimation of Hydrological Parameters and Identification of Influencing Factors of SWMM Model by Bayesian Statistics; Chongqing University: Chongqing, China, 2019. (In Chinese) [Google Scholar]
- Tu, X. Study on Mountain Flood Disaster Warning Model Based on Rain Pattern Clustering and Recognition; Zhengzhou University: Zhengzhou, China, 2021. (In Chinese) [Google Scholar]
- Yang, S.X. Research on Optimization of Rainfall Runoff Data-Driven Model Based on Deep Learning and Data Mining; Chongqing University: Chongqing, China, 2021. [Google Scholar]
- Gupta, U.; Jitkajornwanich, K.; Elmasri, R.; Fegaras, L. Adapting K-Means Clustering to Identify Spatial Patterns in Storms. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016. [Google Scholar]
- Gao, C.; Xu, Y.; Zhu, Q.; Bai, Z.; Liu, L. Stochastic generation of daily rainfall events: A single-site rainfall model with Copula-based joint simulation of rainfall characteristics and classification and simulation of rainfall patterns. J. Hydrol.
**2018**, 564, 41–58. [Google Scholar] [CrossRef] - Yin, S.; Wang, Y.; Xie, Y.; Liu, A. Time-history classification of rainfall processes in China. Adv. Water Sci.
**2014**, 25, 617–624. (In Chinese) [Google Scholar] - Xiao, K.L.; Zhao, G.L.; Wang, Y.; Hu, C.J. Spatial and temporal distribution of rainfall in flood season in Beijing city based on dynamic cluster analysis and fuzzy pattern recognition. J. Hydrol.
**2019**, 39, 74–77. (In Chinese) [Google Scholar] - Hu, R.; Wang, S.; Wang, P. Study on short-duration rainstorm pattern based on cluster analysis. Water Resour. Power
**2021**, 39, 8–10. (In Chinese) [Google Scholar] - Li, Y.; Yang, T.; Ma, J. Variation characteristics of precipitation concentration and concentration period during flood season in Pihe River Basin. Resour. Sci.
**2012**, 34, 418–423. (In Chinese) [Google Scholar] - Zhang, Z.; Xue, C.; He, X.; Li, J.; Wang, F. Study on Joint flood control operation in Pihe River Basin, a tributary of Huaihe River. China Flood Drought Manag.
**2020**, 30, 13–18. (In Chinese) [Google Scholar] - Li, Y.; Wang, Y.; Ma, Q.; Liu, T.; Si, L.; Yu, H. Study on the Characteristics of rainfall and rain Pattern Zoning in Hebei Province based on DTW and K-means algorithm. J. Geo-Inf. Sci.
**2021**, 23, 860–868. (In Chinese) [Google Scholar] - Song, X.; Duan, Z.; Jiang, X. Comparison of Artificial Neural Networks and Support Vector Machine Classifiers for Land Cover Classification in Northern China Using a SPOT-5 HRG Image. Int. J. Remote Sens.
**2012**, 33, 3301–3320. [Google Scholar] [CrossRef] - Pan, H.; Li, Z.; Tian, C.; Wang, L.; Fu, Y.; Qin, X.; Liu, F. The LightGBM-based classification algorithm for Chinese characters speech imagery BCI system. Cogn. Neurodyn.
**2023**, 17, 373–384. [Google Scholar] [CrossRef] [PubMed] - Hina, T.; Mutahir, I.M.; Zafar, M.; Maqsooda, P.; Irfan, U. Gender classification from anthropometric measurement by boosting decision tree: A novel machine learning approach. J. Natl. Med. Assoc. 2023; in press, corrected proof. [Google Scholar]
- Nesrine, K.; Ameni, E.; Mohamed, K.; Hbaieb, T.S. New LSTM Deep Learning Algorithm for Driving Behavior Classification. Cybern. Syst.
**2023**, 54, 387–405. [Google Scholar] - Breiman, L. Classification and Regression Trees; Wadsworth: Belmont, CA, USA, 1984. [Google Scholar]
- Han, J.W.; Micheline, K. Data Mining—Concepts and Techniques; Higher Education Press: Beijing, China, 2001. [Google Scholar]
- Hu, X. Research on Semantic Relation Classification Based on LSTM; Harbin Institute of Technology: Harbin, China, 2015. (In Chinese) [Google Scholar]
- Han, Q.; Zhang, X.; Shen, W. Lithology identification based on gradient lifting decision tree (GBDT) algorithm. Bull. Mineral. Petrol. Geochem.
**2018**, 37, 1173–1180. (In Chinese) [Google Scholar] - Wang, S.; Wu, R.; Xie, W.; Lu, Y. Study on Mountain flood disaster risk Zoning based on FloodArea: A case study of Pihe River Basin. Clim. Chang. Res.
**2016**, 12, 432–441. [Google Scholar] - Fan, X. Research and Application of Support Vector Machine Algorithm; Zhejiang University: Hangzhou, China, 2003. [Google Scholar]
- Ding, S.; Qi, B.; Tan, H. Review on Theory and Algorithm of Support Vector Machine. J. Univ. Electron. Sci. Technol. China
**2011**, 40, 2–10. [Google Scholar] - Yang, J.; Qiao, P.; Li, Y.; Wang, N. A review of machine learning classification Problems and Algorithms. Stat. Decis.
**2019**, 35, 36–40. [Google Scholar]

**Figure 8.** Distribution of the seven rain patterns in the overall set, training set, and validation set in the Pi River basin.

**Figure 11.** Classification effect of LSTM models on rainfall patterns: (**a**) training and validation loss; (**b**) training and validation accuracy.

**Figure 12.** Classification effect of LightGBM models on rainfall patterns: (**a**) training and validation loss; (**b**) training and validation accuracy.

**Figure 13.** Classification effect of SVM models on rainfall patterns: (**a**) training and validation loss; (**b**) training and validation accuracy.

**Figure 15.** Confusion matrix of the four classification algorithms: (**a**) LightGBM model; (**b**) LSTM model; (**c**) SVM model; (**d**) DT model.

**Figure 16.** Classification accuracy of the four models with different sample sizes: (**a**) accuracy; (**b**) precision; (**c**) recall; (**d**) F₁ score.

**Figure 17.** Loss function and accuracy of LSTM models with different sample sizes: (**a**) sample size = 500; (**b**) sample size = 1000; (**c**) sample size = 2500; (**d**) sample size = 5000.

**Figure 18.** Loss function and accuracy of LightGBM models with different sample sizes: (**a**) sample size = 500; (**b**) sample size = 1000; (**c**) sample size = 2500; (**d**) sample size = 5000.

**Figure 19.** Classification accuracy of the SVM model under 500 and 5000 sample sizes: (**a**) sample size = 500; (**b**) sample size = 5000.

**Figure 21.** Order of feature importance of the LightGBM model with different sample sizes: (**a**) sample size = 500; (**b**) sample size = 1000; (**c**) sample size = 2500; (**d**) sample size = 5000.

Pattern | I | II | III | IV | V | VI | VII |
---|---|---|---|---|---|---|---|
Number of rainfall events | 761 | 1149 | 1596 | 1142 | 193 | 677 | 192 |
Proportion/% | 13.33 | 20.12 | 27.95 | 20.00 | 3.38 | 11.86 | 3.36 |
Average rainfall duration/h | 22.47 | 21.17 | 22.20 | 9.44 | 26.28 | 17.01 | 27.08 |
Average rainfall/mm | 23.06 | 20.42 | 22.05 | 9.00 | 25.25 | 17.00 | 25.36 |
Mean rainfall intensity/(mm/h) | 1.03 | 0.96 | 0.99 | 0.95 | 0.96 | 1.00 | 0.94 |

**Table 2.** Distribution of rain events in the training and validation sets with different sample quantities.

Number of Sample Sets | Category | I | II | III | IV | V | VI | VII |
---|---|---|---|---|---|---|---|---|
500 | Training | 53 | 80 | 112 | 80 | 14 | 47 | 13 |
 | Validation | 13 | 20 | 28 | 20 | 3 | 12 | 3 |
1000 | Training | 107 | 160 | 224 | 161 | 27 | 95 | 27 |
 | Validation | 27 | 40 | 56 | 40 | 7 | 24 | 7 |
2500 | Training | 267 | 400 | 559 | 402 | 68 | 237 | 67 |
 | Validation | 71 | 103 | 134 | 95 | 14 | 68 | 14 |
5000 | Training | 533 | 800 | 1118 | 805 | 135 | 474 | 135 |
 | Validation | 143 | 207 | 268 | 189 | 29 | 137 | 28 |

Name of Classification Models | Accuracy/% | Precision/% | Recall/% | F₁-Score/% |
---|---|---|---|---|
DT | 95.35 | 92.21 | 94.28 | 93.18 |
LSTM | 98.77 | 99.15 | 97.51 | 98.29 |
LightGBM | 98.95 | 99.25 | 97.96 | 98.58 |
SVM | 97.63 | 98.84 | 95.99 | 97.32 |

Recall/% | I | II | III | IV | V | VI | VII |
---|---|---|---|---|---|---|---|
DT | 93.25 | 97.03 | 93.79 | 97.69 | 93.94 | 96.79 | 87.50 |
LSTM | 95.71 | 100.00 | 97.06 | 99.54 | 93.94 | 100.00 | 96.88 |
LightGBM | 97.55 | 100.00 | 99.67 | 99.07 | 96.97 | 98.72 | 93.75 |
SVM | 92.64 | 100.00 | 99.35 | 99.54 | 93.94 | 97.44 | 90.63 |
Average | 94.79 | 99.26 | 97.47 | 98.96 | 94.70 | 98.24 | 92.19 |

Name of Classification Models | Number of Samples | Accuracy/% | Precision/% | Recall/% | F₁-Score/% |
---|---|---|---|---|---|
LSTM | 500 | 79.00 | 83.70 | 80.08 | 80.36 |
 | 1000 | 94.00 | 92.79 | 95.62 | 94.00 |
 | 2500 | 95.80 | 95.10 | 94.32 | 94.67 |
 | 5000 | 98.60 | 98.05 | 97.17 | 97.59 |
LightGBM | 500 | 92.00 | 93.00 | 92.00 | 92.00 |
 | 1000 | 96.00 | 97.20 | 95.24 | 96.03 |
 | 2500 | 97.20 | 98.13 | 95.20 | 96.56 |
 | 5000 | 98.80 | 99.15 | 97.63 | 98.36 |
DT | 500 | 65.00 | 59.60 | 62.71 | 58.26 |
 | 1000 | 76.50 | 59.21 | 61.36 | 59.91 |
 | 2500 | 83.60 | 76.36 | 78.55 | 76.89 |
 | 5000 | 94.80 | 90.00 | 92.74 | 91.15 |
SVM | 500 | 81.82 | 94.40 | 74.76 | 79.78 |
 | 1000 | 89.45 | 92.62 | 83.22 | 84.76 |
 | 2500 | 96.19 | 97.63 | 93.31 | 95.17 |
 | 5000 | 97.40 | 98.73 | 95.46 | 96.98 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fu, X.; Kan, G.; Liu, R.; Liang, K.; He, X.; Ding, L.
Research on Rain Pattern Classification Based on Machine Learning: A Case Study in Pi River Basin. *Water* **2023**, *15*, 1570.
https://doi.org/10.3390/w15081570


**Chicago/Turabian Style**

Fu, Xiaodi, Guangyuan Kan, Ronghua Liu, Ke Liang, Xiaoyan He, and Liuqian Ding.
2023. "Research on Rain Pattern Classification Based on Machine Learning: A Case Study in Pi River Basin" *Water* 15, no. 8: 1570.
https://doi.org/10.3390/w15081570