Precipitation Forecasting in Northern Bangladesh Using a Hybrid Machine Learning Model

Abstract: Precipitation forecasting is essential for the assessment of several hydrological processes. This study shows that reliable models for precipitation prediction can be developed with a machine learning approach. The northern region of Bangladesh, which has a tropical monsoon climate and includes the Rangpur and Sylhet divisions, was chosen as the case study. Two machine learning algorithms were used: M5P and support vector regression. Moreover, a novel hybrid model based on the two algorithms was developed. The performance of the prediction models was assessed by means of evaluation metrics and graphical representations. A sensitivity analysis was also carried out to assess the prediction accuracy as the number of exogenous inputs decreases and the lag time increases. Overall, the hybrid model M5P-SVR led to the best predictions among the models used in this study, with R² values up to 0.87 and 0.92 for the stations of Rangpur and Sylhet, respectively.


Introduction
Precipitation forecasting plays a key role in the assessment of several hydrological processes. Precipitation variability is a critical input parameter for the management of water resources for urban and agricultural purposes [1,2] and for flood and drought prediction [3]. However, due to the climate changes observed in the past decades, the evaluation of meteorological parameters has become more complex. This makes precipitation prediction a challenging task [4].
Precipitation is usually measured by means of rain gauges, which are relatively inexpensive and easy to use but have the disadvantage of providing data for a limited area only [5]. To avoid the problems related to ungauged basins, for which no data are available, radar-based models were developed, which have the advantage of high spatial-temporal resolution [6,7]. The accuracy of this method has been discussed in different studies [8,9].
Precipitation forecasting is based on two different approaches: dynamic and empirical. The first one considers process-based equations through physical models. However, operational complexity and computational cost have limited the applicability of these models on a large scale [10]. Therefore, an empirical approach based on data-driven models represents a valid alternative, in particular with limited time series [11]. However, using the conventional empirical approach for precipitation prediction is complex, given the chaotic nature of the meteorological variables. From this point of view, an approach based on artificial intelligence (AI) algorithms has proved to be the most reliable technique, allowing high computational speed without the need to define analytical relationships between the input data and the target [12]. This has led AI algorithms to be widely used for the modeling of hydro-meteorological phenomena [13].
In this study, two machine learning (ML) algorithms were considered: M5P and support vector regression (SVR). Furthermore, a hybrid model based on both algorithms (M5P-SVR) was developed. The particle swarm optimization (PSO) algorithm was used to optimize the SVR parameters. To the authors' knowledge, no study in the literature provides a hybrid model based on the M5P and SVR algorithms for the prediction of monthly precipitation. Furthermore, the current literature offers no predictive model based on the hybridization of ML algorithms for the monsoon climate of Northern Bangladesh. The considerable variability in rainfall throughout the year makes accurate precipitation forecasts more difficult, particularly for the monsoon season. In addition, a sensitivity analysis related to the number of exogenous input parameters and to the lag time was performed and discussed.

Study Area and Datasets
The study area consists of two divisions located in the northern region of Bangladesh: Rangpur and Sylhet. Rangpur Division borders the Indian states of West Bengal to the west and north and Assam and Meghalaya to the east, and the Bangladesh Division of Rajshahi to the south.
Sylhet Division borders the Indian states of Meghalaya to the north, Assam to the east, and Tripura to the south, and the Bangladesh Divisions of Mymensingh and Dhaka to the west and Chittagong to the southwest (Figure 1a). Elevations in the Rangpur Division range from less than 10 m in the south to 100 m in the north, while in the Sylhet Division the terrain is flat, with an elevation that never exceeds 10 m (Figure 1b).
Sustainability 2022, 14, 2663
The physiography of the Rangpur Division is mainly characterized by floodplains, with alluvial fan deposits of both young and old gravelly sand, and by the Barind clay deposits, which cover the southern part and extend into the Rajshahi Division. Sylhet Division is also covered by floodplains with alluvial silt and clay deposits, in particular in its central region, while the western region, at the border with the Mymensingh and Dhaka Divisions, and some areas of the central region are covered by paludal deposits consisting of marsh clay and peat.
Overall, the climate in Northern Bangladesh is tropical monsoon with three distinct seasons: winter (between November and February), which is relatively cool with nearly no rainfall; pre-monsoon (between March and May), which is warm and characterized by thunderstorms; and monsoon (between June and October), with heavy rainfall [24]. Furthermore, due to its location just south of the Himalayan foothills, where monsoon winds blow from the west and northwest, northeastern Bangladesh, in particular the Sylhet Division, receives the greatest average annual precipitation, over 4000 mm, while the national average is about 2550 mm [25].
The dataset consisted of time series measured at two monitoring stations, one in each division, both equipped with rain gauges for the precipitation measurement. The Rangpur station is located on a floodplain characterized by gravelly sand deposits, while the Sylhet station is situated on the paludal deposits of central Sylhet. Monthly values of maximum temperature (T_max), minimum temperature (T_min), relative humidity (H), wind speed (V_wind), cloud coverage (C), and the monthly average of daily bright sunshine (S) were used for the precipitation (P) forecasting, from January 1956 to December 2013. Cloud coverage was measured in oktas, ranging from 0 oktas, which indicates a completely clear sky, to 8 oktas, which indicates a completely covered sky.
The data were normalized with respect to the maximum value of each variable in order to improve forecasting efficiency [26], providing a common interval between 0 and 1. Moreover, the datasets were split with a 70-30% ratio for the training and testing stages, respectively [27][28][29].
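The preprocessing described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names and the sample values are hypothetical, the data are assumed to be non-negative, and the split is assumed to keep the chronological order of the series, which the text does not state explicitly.

```python
def normalize_by_max(values):
    """Scale a non-negative series by its maximum so that values fall in [0, 1]."""
    peak = max(values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def split_train_test(series, train_fraction=0.7):
    """Split a series into training and testing parts, keeping time order (assumption)."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Hypothetical monthly precipitation values in mm (not data from the study).
precip_mm = [8.0, 20.0, 29.0, 110.0, 263.0, 400.0, 457.0, 350.0, 300.0, 170.0]
scaled = normalize_by_max(precip_mm)
train, test = split_train_test(scaled)
```

Dividing by the maximum, instead of applying a min-max transformation, leaves zero-precipitation months at exactly zero.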
The precipitation time series for both Rangpur and Sylhet are reported in Figure 2.
Five different models were developed to evaluate the prediction accuracy as the number of exogenous inputs changes (Table 1). Furthermore, four evaluation metrics were computed to assess the accuracy of the ML algorithms [30,31]: the coefficient of determination (R²), which assesses how well the model replicates measured values and predicts future values; the mean absolute error (MAE), equal to the average magnitude of the difference between measured and predicted values; the root mean square error (RMSE), equal to the square root of the average squared difference between measured and predicted values; and the relative absolute error (RAE), equal to the ratio between the total absolute error and the total absolute deviation of the measured values from their mean. These metrics are defined as:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(P_{predicted,i} - P_{measured,i}\right)^2}{\sum_{i=1}^{n} \left(\bar{P} - P_{measured,i}\right)^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left|P_{predicted,i} - P_{measured,i}\right|$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(P_{predicted,i} - P_{measured,i}\right)^2}$$
$$RAE = \frac{\sum_{i=1}^{n} \left|P_{predicted,i} - P_{measured,i}\right|}{\sum_{i=1}^{n} \left|\bar{P} - P_{measured,i}\right|}$$
where $P_{predicted,i}$ is the predicted precipitation for the i-th data point, $P_{measured,i}$ is the measured precipitation for the i-th data point, $n$ is the total number of measured data, and $\bar{P}$ is the mean value of the measured precipitation.
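As a worked illustration of the four metrics, the sketch below follows their verbal definitions. R² is taken in its common "one minus the ratio of squared errors to squared deviations" form, which is an assumption here, and the function name and sample values are hypothetical.

```python
import math


def evaluation_metrics(measured, predicted):
    """Return R2, MAE, RMSE and RAE for paired measured and predicted values."""
    n = len(measured)
    mean_measured = sum(measured) / n
    abs_err = [abs(p - m) for m, p in zip(measured, predicted)]
    sq_err = [(p - m) ** 2 for m, p in zip(measured, predicted)]
    abs_dev = [abs(mean_measured - m) for m in measured]
    sq_dev = [(mean_measured - m) ** 2 for m in measured]
    return {
        "R2": 1.0 - sum(sq_err) / sum(sq_dev),
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RAE": sum(abs_err) / sum(abs_dev),  # multiply by 100 to express it in percent
    }


# Hypothetical values in mm, used only to exercise the formulas.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = evaluation_metrics(measured, predicted)
```

For these sample values, MAE is 2.5 mm and RAE is 0.25, i.e., 25% when expressed as a percentage.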

M5P
The M5P algorithm develops a regression tree, i.e., a decision tree with real numbers as target variables, to obtain predictions [32]. Three different types of nodes are included in a regression tree: the root node, which includes the complete dataset; the internal nodes, which assign conditions on the input variables; and the leaf nodes, consisting of linear regression models of the target values.
In the development process, the input dataset is iteratively divided into sub-domains, in which multivariable linear regression models are built. In particular, the first step consists of a subdivision of the dataset into two subsets, assessing the possible binary splits. In the subsequent steps, each subset is divided into smaller subsets, selecting the pair of subsets that maximizes a least-squared deviation (LSD) function, with:
$$R(t) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - y_m\right)^2$$
where $R(t)$ is the within-node variance in the node $t$, $N$ indicates the number of units in the subset, $y_i$ is the target variable value for the i-th unit, and $y_m$ is the target variable mean. The function $\Phi(s_p, t)$ to be maximized is expressed as:
$$\Phi(s_p, t) = R(t) - p_L R(t_L) - p_R R(t_R)$$
where $p_L$ and $p_R$ are the portions of units allocated to the left node $t_L$ and the right node $t_R$, and $s_p$ indicates the split value [33]. Different stopping rules were considered: minimum impurity level, minimum impurity change in the subdivision, minimum number of elements for each node, and maximum tree depth. Furthermore, the pruning technique was considered to avoid overfitting problems for the fully developed tree. This technique removes branches that contribute little to the prediction ability in order to reduce the tree size. The following parameters were considered: batch size = 100; minimum number of instances to allow at a leaf node = 6.
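The split selection can be illustrated for a single input variable. The sketch below assumes the usual form of the criterion, in which the variance of the parent node is reduced by the weighted variances of the two children. It covers only the choice of one split, not the leaf regression models or the pruning of M5P, and all names and values are hypothetical.

```python
def node_variance(targets):
    """R(t): mean squared deviation of the targets from their mean within a node."""
    n = len(targets)
    mean = sum(targets) / n
    return sum((y - mean) ** 2 for y in targets) / n


def split_gain(xs, ys, threshold):
    """Phi(s_p, t) = R(t) - p_L * R(t_L) - p_R * R(t_R) for the split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty brings no gain
    p_left = len(left) / len(ys)
    p_right = len(right) / len(ys)
    return node_variance(ys) - p_left * node_variance(left) - p_right * node_variance(right)


def best_split(xs, ys):
    """Evaluate midpoints between consecutive distinct x values (needs at least two)."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(((c, split_gain(xs, ys, c)) for c in candidates), key=lambda pair: pair[1])


# Hypothetical normalized humidity and precipitation (mm) with two clear regimes.
humidity = [0.60, 0.65, 0.70, 0.85, 0.90, 0.95]
precip = [10.0, 12.0, 11.0, 300.0, 310.0, 305.0]
threshold, gain = best_split(humidity, precip)
```

With these values the selected threshold falls between the dry and the wet group, since that split removes almost all of the within-node variance.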

Support Vector Regression (SVR)
Support vector machines (SVMs) are supervised learning models with associated learning algorithms. SVMs have proved to be among the most robust prediction algorithms, being particularly efficient for classification and regression analysis [34][35][36]. Furthermore, SVMs are more reliable for noisy data than other local models and algorithms based on traditional chaotic methods. When applied to regression problems, the SVM algorithm is generally called support vector regression (SVR).
The objective of SVR is to find a function $f(x)$ that deviates from the target values $y_i$ by less than a value $\varepsilon$. Given the training dataset $\{(x_i, y_i),\ i = 1, \ldots, l\} \subset X \times \mathbb{R}$, where $X$ indicates the space of the input arrays, a linear function $f(x) = \langle w, x \rangle + b$, with $w \in X$ and $b \in \mathbb{R}$, is found by minimizing the squared Euclidean norm $\|w\|^2$ through a constrained convex optimization problem. In addition, slack variables $\xi_i$ and $\xi_i^*$ are introduced to allow deviations larger than $\varepsilon$.
The optimization can be expressed as:
$$\text{minimize:} \quad \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \left(\xi_i + \xi_i^*\right)$$
$$\text{subject to:} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases}$$
where the trade-off between the tolerated deviations and the function flatness depends on the constant $C$, which is greater than 0 [37]. The effectiveness of SVR also depends on the selection of the kernel function, which defines the feature space, and of its parameters. The Pearson VII universal function kernel (PUK) was considered, whose parameters were optimized through the PSO algorithm. The PUK kernel can be expressed as:
$$K(x_i, x_j) = \frac{1}{\left[1 + \left(\dfrac{2 \sqrt{\|x_i - x_j\|^2}\, \sqrt{2^{1/\omega} - 1}}{\sigma}\right)^2\right]^{\omega}}$$
with $\sigma$ and $\omega$ controlling the half-width and the tailing factor of the peak, respectively.
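To make the role of the two kernel parameters concrete, the sketch below evaluates the Pearson VII function kernel in its commonly published form, which is assumed here to be the one used. The function name is hypothetical, and the default values are the σ and ω reported for the SVR in this study.

```python
import math


def puk_kernel(x, y, sigma=2.0, omega=0.1):
    """Pearson VII universal kernel between two equally long feature vectors."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * distance * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


same = puk_kernel([0.2, 0.5], [0.2, 0.5])   # identical points give the peak value of 1
near = puk_kernel([0.2, 0.5], [0.3, 0.5])
far = puk_kernel([0.2, 0.5], [0.9, 0.1])
half = puk_kernel([0.0], [1.0], sigma=2.0)  # distance equal to sigma / 2
```

The kernel equals 1 for identical points and decays with distance. At a distance of σ/2 it drops to one half whatever the value of ω, which is why σ is read as the half-width of the peak.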

Hybrid Model M5P-SVR
Hybrid models built on the predictions of the M5P and SVR algorithms can improve the modeling performance and lead to better forecasts. A key choice in configuring the hybrid model is how the predictions of the individual algorithms are combined. More details on the rules for the combination of classifiers are reported in Kittler et al. (1998) [38]. In the present study, the average of probabilities was used as the combination rule, which evaluates the mean value of each class among the independent classifiers [39].
The parameters considered for the individual algorithms, M5P and SVR, within the hybrid model were the same as those reported for the individual algorithms.
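For a numeric target such as precipitation, an averaging rule amounts to taking the mean of the predictions of the individual models. The sketch below shows that reading of the combination rule. It is an interpretation, not the authors' implementation, and the function name and the prediction values are hypothetical.

```python
def average_predictions(*prediction_sets):
    """Combine several equally long prediction lists by taking their element-wise mean."""
    count = len(prediction_sets)
    return [sum(values) / count for values in zip(*prediction_sets)]


# Hypothetical monthly predictions in mm from the two individual models.
m5p_pred = [100.0, 250.0, 40.0]
svr_pred = [120.0, 230.0, 60.0]
hybrid_pred = average_predictions(m5p_pred, svr_pred)
```

Averaging tends to cancel errors of opposite sign made by the two models, which is one plausible reason for a hybrid to outperform its components.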

Particle Swarm Optimization (PSO)
Particle swarm optimization (PSO) is a well-known algorithm that is widely applied to optimization problems, including the calibration of the parameters of machine learning algorithms to improve their performance in hydrological applications [40][41][42]. PSO is a population-based technique inspired by the social behavior of fish and birds searching for the shortest route to food [43]. PSO performs an iterative search based on a population, called a swarm, of individuals, called particles. The velocity update equation drives the population as it moves through the search space looking for the optimal state. In each iteration, the algorithm saves the local optimum value and compares it with the global one, and the optimum state is chosen based on the fitness of an objective function [44]. In addition, due to its high learning speed and low memory requirement, the PSO algorithm has been used to solve several non-linear problems in the hydrologic field [45,46]. The PSO algorithm was applied to optimize the following parameters for SVR: batch size = 100; C = 1.0; kernel = PUK with σ = 2.0 and ω = 0.1.
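The search loop described above can be sketched with a basic global-best PSO. The inertia and acceleration coefficients, the swarm size, and the toy objective are assumptions chosen for illustration. In a real calibration the objective would be the validation error of an SVR trained with the candidate parameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iterations=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=lambda i: personal_score[i])
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                                    + c_global * r2 * (global_best[d] - positions[i][d]))
                lo, hi = bounds[d]
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_error(params):
    """Hypothetical stand-in for a validation error, smallest at C = 1 and sigma = 2."""
    c, sigma = params
    return (c - 1.0) ** 2 + (sigma - 2.0) ** 2


best_params, best_error = pso_minimize(toy_error, bounds=[(0.1, 10.0), (0.1, 10.0)])
```

Each particle keeps its own best position while the swarm shares a single global best, which mirrors the comparison between local and global optima described in the text.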

Time Series Analysis
Figure 3 shows a bar plot of the average monthly precipitation for Rangpur and Sylhet from 1956 to 2013. The maximum precipitation was observed during the monsoon season, equal to 457 mm for Rangpur and 799 mm for Sylhet, in July and June, respectively. The minimum precipitation was instead observed in the dry season, equal to 8 mm for both Rangpur and Sylhet, in January and December, respectively.

During the dry season, from November to February, both stations showed an average monthly precipitation lower than 30 mm. However, the pre-monsoon season highlighted a marked difference between the two stations, with the average monthly precipitation increasing from 29 mm in March to 263 mm in May for Rangpur and from 114 mm in March to 571 mm in May for Sylhet.
These differences became even more marked during the monsoon season and faded at its end, in October, when the two stations showed similar average precipitation (170 mm for Rangpur and 218 mm for Sylhet). Overall, the mean annual rainfall estimated in the monitored period was equal to 2149 mm for Rangpur and 4004 mm for Sylhet.
Table 2 provides the statistics of the monthly data for both the Rangpur and Sylhet stations, where σ indicates the standard deviation and CV the coefficient of variation, equal to the ratio between σ and the mean.
For the input selection, different techniques can be used, e.g., average mutual information [47] and the Akaike information criterion (AIC) [48]. In the present study, the cross-correlation function (XCF) and the auto-correlation function (ACF) were used to assess the feedback delay between the exogenous input variables and the precipitation (Figure 4) and the input delay (Figure 5), respectively. This approach is in agreement with different machine-learning-based models developed to solve hydrological problems [49,50]. The XCF is computed between each exogenous input variable I and the precipitation P as a function of the delay τ, over the duration s of the time series [51].
Patterns between the two stations were very similar. In particular, T_min (Figure 4b) showed XCF peaks equal to 0.8, higher than those computed for T_max (Figure 4a), equal to 0.6, highlighting a greater correlation of the minimum temperatures with the precipitation. Both peaks were observed for τ = 12 months. The cross-correlation between relative humidity and precipitation (Figure 4c) exhibited peaks at τ = 11 months for both stations. A higher correlation was computed for Sylhet, with XCF close to 0.8. However, Rangpur also showed a good correlation, with XCF higher than 0.5. For the cross-correlation between wind speed and precipitation, peaks were instead observed for τ = 14 months (Figure 4d), with a lower correlation in comparison with temperature and humidity, with XCF = 0.4 for Rangpur and XCF = 0.3 for Sylhet. The cross-correlation between cloud coverage and precipitation (Figure 4e) showed high peaks at τ = 12 months, with XCF close to 0.8 for both stations.
The cross-correlation between bright sunshine and precipitation (Figure 4f) showed an opposite trend in comparison with the other exogenous inputs, with a positive XCF peak equal to 0.4 for Rangpur and 0.6 for Sylhet at τ = 6 months. However, for τ = 12 months a negative XCF peak that was greater in absolute value was computed, equal to −0.6 for Rangpur and −0.7 for Sylhet. The strong negative correlation is closely linked to the tropical monsoon climate of the region. During the monsoon season, heavy rainfall is followed by a reduced number of hours of sunshine per day, down to a minimum monthly average value of 2 h a day in July and August, while, during the winter and pre-monsoon seasons, sunny days with low rainfall prevail, with a maximum monthly average value of 9 h a day in December and January.
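A lagged correlation of this kind can be sketched as follows. The normalization used here, a Pearson coefficient over the overlapping part of the two series, is an assumption, and the exact expression adopted in the study may differ. The function name and the synthetic series are hypothetical.

```python
import math


def lagged_correlation(inputs, target, lag):
    """Pearson correlation between inputs[t] and target[t + lag], for lag >= 0."""
    x = inputs[:len(inputs) - lag]
    y = target[lag:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic ten-year series with a 12-month cycle (not data from the study).
cloud = [math.cos(2 * math.pi * m / 12) for m in range(120)]
precip = [math.cos(2 * math.pi * m / 12) for m in range(120)]
xcf = {lag: lagged_correlation(cloud, precip, lag) for lag in range(25)}
```

For a purely seasonal signal the correlation returns to its maximum every 12 months and reaches its minimum at 6 months, which is the kind of pattern that motivates choosing a 12-month delay. Passing the same series as both arguments gives the auto-correlation.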
The auto-correlation function (ACF) was also computed for both stations. Also in this case, similar patterns were found between the two stations, with a similar positive ACF peak close to 0.8 for a delay τ = 12 months. This strong autocorrelation can be related to the seasonal nature of the precipitation. The ACF results were in agreement with Chowdhury et al. (2019) [52], who also investigated the monthly precipitation for different stations located in the Sreemangal sub-district of the Sylhet Division.
Overall, based on the XCF and ACF analyses, both delays for exogenous inputs and targets were set to 12 months.
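One way to turn a 12-month delay into training samples is to pair each target value with the previous 12 months of every exogenous input and of the precipitation itself. The sketch below assumes that construction and a forecast lead equal to the lag time (1 or 3 months). The text does not detail how the input vectors were assembled, so the function and its arguments are hypothetical.

```python
def build_lagged_samples(exogenous, target, n_lags=12, lead=1):
    """Build (features, label) pairs from the last n_lags values of each series.

    exogenous: dict mapping a variable name to a list as long as `target`.
    lead: number of months ahead of the last observed month to forecast.
    """
    rows, labels = [], []
    for t in range(n_lags - 1, len(target) - lead):
        window = slice(t - n_lags + 1, t + 1)
        features = []
        for name in sorted(exogenous):
            features.extend(exogenous[name][window])
        features.extend(target[window])  # past precipitation as feedback input
        rows.append(features)
        labels.append(target[t + lead])
    return rows, labels


# Hypothetical 30-month series used only to check the shapes.
humidity = [0.5 + 0.01 * m for m in range(30)]
precip = [float(m) for m in range(30)]
rows, labels = build_lagged_samples({"H": humidity}, precip, n_lags=12, lead=1)
rows3, labels3 = build_lagged_samples({"H": humidity}, precip, n_lags=12, lead=3)
```

Dropping entries from the dictionary of exogenous inputs reproduces the idea behind the reduced-input models, down to a single variable such as relative humidity.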

Rangpur Station
Predictions obtained for the Rangpur station are discussed in this section. The evaluation metrics computed for the training and testing stages are reported in Table 3.

For the testing stage, the best predictions were also achieved with the hybrid algorithm M5P-SVR and Model A, with no relevant difference as the lag time increases (t_a = 1 month: R² = 0.87, MAE = 62 mm, RMSE = 88 mm, RAE = 38.09%, Figure 6e,f; t_a = 3 months: R² = 0.87, MAE = 63 mm, RMSE = 89 mm, RAE = 38.45%). It should be noted that, passing from the training to the testing stage, only a slight performance reduction was observed. The difference in performance between Model B and Model C, observed in the training stage in particular for SVR, was also observed for the hybrid model M5P-SVR in the testing stage (t_a = 1 month, Model B: R² = 0.82, MAE = 73 mm, RMSE = 91 mm, RAE = 43.15%; t_a = 1 month, Model C: R² = 0.86, MAE = 65 mm, RMSE = 89 mm, RAE = 39.65%). The worst predictions were achieved with Model E, for both lag times (t_a = 1 month: R² = 0.79, MAE = 79 mm, RMSE = 94 mm, RAE = 46.56%, Figure 6g,h; t_a = 3 months: R² = 0.78, MAE = 83 mm, RMSE = 96 mm, RAE = 48.94%).

Sylhet Station
This section shows the precipitation forecasts for the Sylhet station, with the performances for both training and testing stages reported in Table 4.
For the training stage, the best performances were observed for the hybrid model M5P-SVR with Model A, for both lag times (t_a = 1 month: R² = 0.94, MAE = 55 mm, RMSE = 76 mm, RAE = 18.64%, Figure 7a,b; t_a = 3 months: R² = 0.94, MAE = 56 mm, RMSE = 78 mm, RAE = 19.12%). As the number of exogenous inputs decreases, a performance decrease was observed. In particular, Models B and C exhibited performances similar to each other and lower than those of Model A, for both lag times. A further slight performance decrease occurs passing to Model D (t_a = 1 month: R² = 0.91, MAE = 64 mm, RMSE = 91 mm, RAE = 22.38%; t_a = 3 months: R² = 0.90, MAE = 69 mm, RMSE = 95 mm, RAE = 24.35%). However, a marked performance decrease occurs passing from Model D to Model E, with the latter considering only the relative humidity as exogenous input (t_a = 1 month: R² = 0.88, MAE = 82 mm, RMSE = 112 mm, RAE = 28.28%, Figure 7c,d; t_a = 3 months: R² = 0.88, MAE = 85 mm, RMSE = 114 mm, RAE = 28.99%). Both the M5P and SVR algorithms led to lower performances in comparison with the hybrid model. However, M5P outperformed SVR for both lag times and for all five models, with higher R² values and lower values of MAE, RMSE, and RAE.
For the testing stage, the hybrid model M5P-SVR was confirmed as the best algorithm, for all models and lag times. In particular, Model A led to the best predictions, with a slight performance decrease passing from t_a = 1 month (R² = 0.92, MAE = 68 mm, RMSE = 91 mm, RAE = 25.26%, Figure 7e,f) to t_a = 3 months (R² = 0.91, MAE = 69 mm, RMSE = 93 mm, RAE = 25.57%). As for the training stage, the performances of Model B and Model C were in line with each other and lower than those computed for Model A. Model D (t_a = 1 month: R² = 0.88, MAE = 77 mm, RMSE = 107 mm, RAE = 29.34%; t_a = 3 months: R² = 0.86, MAE = 79 mm, RMSE = 109 mm, RAE = 30.07%) was slightly outperformed by both Models B and C, proving its reliability even though it included only the relative humidity and the wind speed as exogenous inputs. A lower prediction ability was observed for Model E (t_a = 1 month: R² = 0.85, MAE = 86 mm, RMSE = 119 mm, RAE = 31.84%, Figure 7g,h; t_a = 3 months: R² = 0.83, MAE = 89 mm, RMSE = 122 mm, RAE = 32.87%). However, the hybrid model M5P-SVR with Model E, even though it included only the relative humidity as exogenous input, was still able to properly detect the precipitation trend.
As for the Rangpur station, SVR led to predictions in line with the M5P algorithm. However, both the individual M5P and SVR algorithms were outperformed by the hybrid model M5P-SVR. It should be noted that the individual M5P and SVR algorithms also did not show particularly marked reductions in performance as the number of exogenous inputs decreased, with Model E exhibiting quite good performances for both algorithms and lag times (M5P, t_a = 1 month: R² = 0.85, MAE = 102 mm, RMSE = 133 mm, RAE = 35.96%; SVR, t_a = 1 month: R² = 0.82, MAE = 92 mm, RMSE = 125 mm, RAE = 34.08%).

Performance Comparisons of the Models
Figure 8 shows the box plots of the absolute errors, providing further analysis of the precipitation predictions performed with the different algorithms and models for the two stations of Rangpur and Sylhet. Absolute errors are expressed as the differences between measured and predicted precipitation. Therefore, a positive error denotes an underestimation of the measured value, while a negative error denotes an overestimation.
For the Rangpur station, the M5P box plot (Figure 8a) showed notches, which reflect the 95% confidence interval of the median, between −84 mm for Model D with t_a = 3 months and 55 mm for Model E with t_a = 3 months, while the outliers (indicated with red crosses) ranged between −291 mm and 370 mm. Overall, Model E led to the highest underestimation of heavy rainfall. A narrow box plot was computed for Model A, with a notch between −70 mm and 9 mm, a median equal to −14 mm, and outliers between −240 mm and 190 mm. The SVR box plot (Figure 8b) showed similar results, with the exception of Model E, for which the SVR algorithm led to a narrower box plot in comparison with the M5P one. The narrowest box plots and the lowest outliers were, however, computed for the hybrid model M5P-SVR (Figure 8c), with a notch for Model A between −54 mm and 14 mm and a median equal to −11 mm.
For Sylhet station, the M5P box plot (Figure 8d) exhibited more asymmetrical notches in comparison with Rangpur, with a notch for Model A between −85 mm and 6 mm and a median equal to −18 mm. The SVR (Figure 8e) and M5P-SVR (Figure 8f) notches were instead more symmetrical, following an almost normal distribution of absolute errors. In particular, for M5P-SVR and Model A, the notch was between −40 mm and 60 mm, with the median equal to 10 mm. It should be noted that higher positive outliers were instead computed for both SVR and M5P-SVR, up to values close to 600 mm, while M5P led to outliers lower than 400 mm. On the other hand, the negative outliers reached values close to −320 mm and −300 mm for SVR and M5P-SVR, respectively, which were lower (in absolute value) than those reached with the M5P algorithm, close to −430 mm.
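The sign convention and the notches discussed above can be sketched as follows. The errors are computed as measured minus predicted, so a positive value is an underestimation. The notch is computed as median ± 1.57 × IQR / √n, which is the approximation commonly used by plotting tools for the 95% confidence interval of the median; that the figures in this study were produced with exactly this formula is an assumption. The function name and the sample values are hypothetical.

```python
import math
import statistics


def signed_error_summary(measured, predicted):
    """Summarize errors (measured - predicted) as a notched box plot would."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally sized series with at least 2 values")
    # Positive error: the model underestimated the measured value.
    errors = [m - p for m, p in zip(measured, predicted)]
    # Quartiles with linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    # Assumed notch formula: median +/- 1.57 * IQR / sqrt(n).
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(errors))
    return {
        "errors": errors,
        "median": median,
        "iqr": q3 - q1,
        "notch": (median - half_width, median + half_width),
    }


# Hypothetical monthly values in mm, for illustration only.
measured = [50, 120, 300, 410, 90, 60, 250, 330, 150]
predicted = [90, 90, 320, 400, 90, 70, 230, 360, 110]
summary = signed_error_summary(measured, predicted)
print(summary["median"], summary["notch"])
```

A notch that lies mostly below zero, as reported for M5P with Model A, indicates that the median error is negative, i.e., that the model tends to overestimate the measured precipitation.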
[...] of the hybrid M5P-SVR to provide accurate precipitation predictions in areas with different climates (e.g., semi-arid and Mediterranean regions) and for higher lag times.
In order to improve the model predictions, further developments may concern either the application of data preprocessing algorithms (e.g., principal component analysis or wavelet transform) or a different hybridization approach, in which the machine learning algorithms are combined through a further method such as stacking.

Conclusions
This study developed and compared different precipitation prediction models based on two machine learning algorithms and six meteorological exogenous inputs. Precipitation time series from two stations located in the northern region of Bangladesh, Rangpur and Sylhet, were used for the training and testing of the two individual ML algorithms, M5P and SVR, and of the hybrid one, M5P-SVR. The particle swarm optimization (PSO) algorithm was used to optimize the SVR parameters. In order to evaluate the performance of the three ML algorithms, four evaluation metrics were computed: coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), and relative absolute error (RAE). Box plots were also employed to compare the predictions made with the different combinations of algorithms, exogenous inputs, and lag times. The hybrid model M5P-SVR outperformed both the individual M5P and SVR algorithms.
The M5P-SVR model, in particular with reference to Model A, which took into account all the available exogenous inputs, provided good precipitation predictions for both stations with no marked performance decrease as the lag times increased, up to a lag time of 3 months.
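To make the role of the lag time concrete, the sketch below builds training pairs for a given lag. It assumes that the lag time t_a is the number of months separating the exogenous inputs from the precipitation value being predicted; this reading, the function name, and the sample values are illustrative assumptions rather than the procedure documented in the study.

```python
def make_lagged_pairs(exogenous, precipitation, lag):
    """Pair the exogenous inputs of month t - lag with precipitation of month t."""
    if lag < 1:
        raise ValueError("lag must be a positive number of months")
    if len(exogenous) != len(precipitation):
        raise ValueError("both series must have the same length")
    if lag >= len(precipitation):
        raise ValueError("lag must be shorter than the series")
    inputs = exogenous[:-lag]      # drop the last `lag` months of inputs
    targets = precipitation[lag:]  # drop the first `lag` months of targets
    return list(zip(inputs, targets))


# Hypothetical monthly records: (relative humidity in %, wind speed in m/s).
exogenous = [(80.0, 2.1), (75.0, 2.5), (85.0, 1.8), (90.0, 1.5), (88.0, 1.9)]
precipitation = [10.0, 40.0, 210.0, 380.0, 300.0]  # mm, illustrative only

pairs_1 = make_lagged_pairs(exogenous, precipitation, lag=1)
pairs_3 = make_lagged_pairs(exogenous, precipitation, lag=3)
print(len(pairs_1), len(pairs_3))
```

Under this reading, a longer lag leaves fewer usable pairs and asks the model to predict further ahead, which is why a small loss of accuracy from 1 to 3 months would be expected.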
This research was limited to precipitation modeling in two divisions of northern Bangladesh. In the future, further studies should be performed in other areas characterized both by a tropical monsoon climate and by climates with different features, e.g., Mediterranean and semi-arid.

Figure 1. Location of the stations: with a representation of the Bangladesh divisions (a); with the elevation in meters above sea level (b).

Figure 2. Time series of precipitation for the stations of: Rangpur (a) and Sylhet (b).

Figure 3. Bar plot of the average monthly precipitation.

Figure 5. Auto-correlation function computed on the precipitation time series.

Figure 6. Rangpur station, M5P-SVR, t_a = 1 month. Measured vs. predicted precipitation (on the left): Training stage, Model A (a); Training stage, Model E (c); Testing stage, Model A (e); Testing stage, Model E (g). Time series with the measured and predicted precipitation (on the right): Training stage, Model A (b); Training stage, Model E (d); Testing stage, Model A (f); Testing stage, Model E (h).

Figure 7. Sylhet station, M5P-SVR, t_a = 1 month. Measured vs. predicted precipitation (on the left): Training stage, Model A (a); Training stage, Model E (c); Testing stage, Model A (e); Testing stage, Model E (g). Time series with the measured and predicted precipitation (on the right): Training stage, Model A (b); Training stage, Model E (d); Testing stage, Model A (f); Testing stage, Model E (h).

Table 1. Models developed based on the different exogenous inputs.
Table 2 provides the statistics of the monthly data for [...]

Table 2. Statistics for Rangpur and Sylhet stations.

Table 3. Evaluation metrics computed for the Rangpur station.

Table 4. Evaluation metrics computed for the Sylhet station.