Bias Correction Method Based on Artificial Neural Networks for Quantitative Precipitation Forecast

Abstract: The nowcasting and very short-term prediction system (SisPI, for its acronym in Spanish) is among the tools used by the National Meteorological Service of Cuba for the quantitative precipitation forecast (QPF). SisPI uses the WRF model as the core of its forecasts, and one of the challenges to overcome is improving the precision of the QPF. To this end, we present the results of applying a bias correction method based on artificial neural networks. The method is applied to the highest-resolution domain of SisPI (3 km), and the correction is made using the precipitation estimate of the GPM satellite product. Results show a higher correlation for the artificial neural network model than for the values predicted by SisPI (0.76 and 0.34, respectively). The mean square error when applying the artificial neural network model is 3.69, improving on the performance of SisPI by 6.78. In general, the bias correction shows a good ability to correct the precipitation forecast provided by SisPI, although this is less evident in cases where precipitation is reported and SisPI is not capable of forecasting it. The correction achieves its best results in cases of overestimation by SisPI, which happen quite frequently.


Introduction
Artificial neural networks (ANNs) are a mathematical technique inspired by biological neural networks [1,2]. The tool continues to be developed and is widely used in different branches of science, including the atmospheric sciences and related areas [3][4][5]. Within meteorology, applications include the detection of cloud patterns, weather and climate forecasting, and the correction of forecast errors generated by numerical weather forecast models (NWMs), among others [5].
This research focuses on the last of these applications. The work is a further contribution exploring the use of ANNs to increase the precision of numerical forecasts, in particular the quantitative precipitation forecast (QPF). For this purpose, a multi-layer perceptron is used as the network model, trained with a configuration obtained in previous studies [5]. The training data are the satellite precipitation estimates of the GPM product and the precipitation forecast, at 3 km spatial resolution, of the nowcasting and very short-term forecast system (SisPI) [6][7][8].
The section Model and Configuration of the ANN presents the ANN model and its configuration. Details of the data used and a description of SisPI are given in the section Data Used, and the results for two case studies are discussed in the section Results and Discussion, followed by preliminary conclusions.

Model and Configuration of the ANN
As mentioned above, a multi-layer perceptron (MLP) model was used, extending a previous result in which observations from surface meteorological stations were used for training. An MLP is a feed-forward artificial neural network that learns a relationship between the input data and the target output values. In our case, the input data were the QPF values taken directly from SisPI, and the target output was the GPM precipitation estimate. The MLP was configured with 64 neurons in the hidden layer and a sigmoid activation function for the hidden and output layers. The machine learning platform TensorFlow and the Keras library [9] were used to implement the MLP. The training, validation, and verification steps were developed using the data available for the period 2018-2019.
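The forward pass of such a network can be sketched without any external library. The following is only an illustration of the stated architecture (one input, 64 sigmoid hidden units, one sigmoid output), not the TensorFlow/Keras implementation used in this work: the weights are random and untrained, and the class name `TinyMLP`, the function `correct_qpf`, and the scaling constant `MAX_PRECIP` are hypothetical. Because a sigmoid output is bounded to (0, 1), the sketch assumes that precipitation is scaled before and after the network; the source does not state how this was handled.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One input -> n_hidden sigmoid units -> one sigmoid output."""

    def __init__(self, n_hidden: int = 64, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Untrained, randomly initialised weights (illustration only).
        self.w_hidden = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x: float) -> float:
        """Return the network output for a single scaled input value."""
        hidden = [sigmoid(w * x + b)
                  for w, b in zip(self.w_hidden, self.b_hidden)]
        weighted = sum(w * h for w, h in zip(self.w_out, hidden))
        return sigmoid(weighted + self.b_out)


# Hypothetical scaling constant in mm/3 h (an assumption, not from the paper).
MAX_PRECIP = 100.0


def correct_qpf(model: TinyMLP, qpf_mm: float) -> float:
    """Scale the forecast to [0, 1], apply the network, rescale to mm/3 h."""
    scaled = min(max(qpf_mm / MAX_PRECIP, 0.0), 1.0)
    return model.forward(scaled) * MAX_PRECIP


model = TinyMLP()
corrected = correct_qpf(model, 11.0)  # value is meaningless until trained
```

In a trained network the weights would be fitted so that the output approximates the GPM estimate for each SisPI input value.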

Data Used
The observation data used for training were precipitation estimates from the Global Precipitation Measurement (GPM) mission. In particular, we used GPM_3IMERGHH, the GPM IMERG Final Precipitation L3 product (version 06), with a temporal resolution of 30 min and a spatial resolution of 0.1° × 0.1°. This is a multi-satellite precipitation product with global coverage. It is a Level 3 NASA product that unifies and inter-calibrates data from several constellations and types of satellites operated by several space agencies [10].
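Since the product is half-hourly while the results in this work are expressed in mm/3 h, some temporal aggregation is needed. The source does not describe that step, so the following is only a sketch under the assumption that each half-hourly value is a precipitation rate in mm/h; the function name `accumulate_3h` and the sample values are hypothetical.

```python
def accumulate_3h(rates_mm_per_h: list[float]) -> float:
    """Convert six consecutive half-hourly rates (mm/h) into a 3 h total (mm)."""
    if len(rates_mm_per_h) != 6:
        raise ValueError("expected six half-hourly values for a 3 h window")
    # Each rate is assumed to hold for 0.5 h, so it contributes rate * 0.5 mm.
    return sum(rate * 0.5 for rate in rates_mm_per_h)


# Made-up half-hourly rates for one grid cell.
half_hourly = [0.0, 2.0, 8.0, 12.0, 4.0, 0.0]
total = accumulate_3h(half_hourly)  # 13.0 mm in 3 h
```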
The correction using the MLP was applied to the SisPI forecast over the highest spatial resolution domain (3 km, see Figure 1). SisPI uses the Weather Research and Forecasting (WRF) model V3.8.1 [11] as its numerical forecast core, initialized with the 0.5° spatial resolution output of the Global Forecast System (GFS). The main details of the physical configuration are shown in Table 1. The system generates 24 forecast hours and is updated four times a day, at 0000, 0600, 1200, and 1800 UTC; in this investigation the forecast initialized at 0000 UTC is used.

Table 1. Physical configuration of the WRF used in SisPI (see [11] for the details of each parameterization).

Parameters | Settings
Spatial resolution | Three nested domains of 27, 9, and 3 km of resolution

Once the training step was complete, the validation and verification steps were carried out using the observations from surface meteorological stations. Figure 2 shows the location of the 67 surface weather stations included in this study. A conventional verification process was applied to analyze the ability of the MLP to correct the bias. The following statistical metrics were computed: the mean absolute error (mae), the mean square error (mse), and Pearson's correlation coefficient ($p_{corr}$); the cell-point verification approach was applied [12].
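The three metrics and a cell-point comparison can be sketched as follows. This is a minimal illustration, not the verification code used in the study. It assumes that cell-point verification means comparing each station observation with the forecast value in the nearest grid cell, and that the grid can be described by one-dimensional latitude and longitude axes (the WRF grid is in general not regular in latitude and longitude). The grid, the station coordinates, and all values are made up, and returning 0 when the correlation is undefined is a convention chosen here.

```python
import math


def mae(forecast, observed):
    """Mean absolute error."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def mse(forecast, observed):
    """Mean square error."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)


def pearson(forecast, observed):
    """Pearson's correlation coefficient; 0.0 if either series is constant."""
    n = len(observed)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_f == 0.0 or var_o == 0.0:
        return 0.0  # assumption: report 0 when the coefficient is undefined
    return cov / math.sqrt(var_f * var_o)


def nearest_cell_value(grid_lats, grid_lons, field, lat, lon):
    """Return the field value of the grid cell closest to (lat, lon)."""
    i = min(range(len(grid_lats)), key=lambda k: abs(grid_lats[k] - lat))
    j = min(range(len(grid_lons)), key=lambda k: abs(grid_lons[k] - lon))
    return field[i][j]


# Hypothetical 3 x 3 forecast grid (mm/3 h) and three hypothetical stations.
grid_lats = [22.0, 22.5, 23.0]
grid_lons = [-83.0, -82.5, -82.0]
field = [[0.0, 1.0, 2.0],
         [3.0, 4.0, 5.0],
         [6.0, 7.0, 8.0]]
stations = [(22.1, -82.9, 1.0), (22.6, -82.4, 3.0), (23.0, -82.0, 9.0)]

forecast = [nearest_cell_value(grid_lats, grid_lons, field, lat, lon)
            for lat, lon, _ in stations]
observed = [obs for _, _, obs in stations]

scores = (mae(forecast, observed), mse(forecast, observed),
          pearson(forecast, observed))
```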

Results and Discussion
In this section we discuss some preliminary results that show the improvement in the QPF when an ANN model is used as a bias correction method. The analysis considers two case studies: 5 and 10 July 2019. Rainfall between 4 and 10 July 2019 was favored by evening instability resulting from diurnal warming, sufficient moisture content in the lower troposphere, and mesoscale conditions, together with the transit of active tropical waves through the seas south of Cuba (http://www.insmet.cu/asp/genesis.asp?TB0=PLANTILLAS&TB1=MES&TB2=/Mes/JULIO2019.HTM&TB3=2019, accessed on 12 January 2020). These cases were selected because they represent one of the situations in which SisPI frequently fails [6,7].


Case Study 5 July 2019
The mae, mse, and $p_{corr}$ for all forecast times are shown in Figures 3-5, respectively. For the first 7 forecast hours the correction by the MLP model is larger, with an error reduction of 2.0 mm/3 h for mae and 60 mm/3 h for mse. In the evening hours the error values also decrease, but more modestly. This makes sense given that, for SisPI runs initialized at 0000 UTC, the first 7 h of the WRF forecast fall within the spin-up period, which can extend to the first 12 h of the forecast. Therefore, for this case study, the MLP model is a tool not only for reducing the error but also for enhancing the quality of the forecast during the WRF spin-up period.

The behavior of the correlation curve indicates that the SisPI forecast behaves contrary to what was recorded in the observations. The negative values of $p_{corr}$ suggest that when precipitation was recorded at a surface meteorological station, SisPI did not predict rain, or vice versa. However, a more rigorous and in-depth study is needed to confirm this behavior. In very high-resolution forecasts, errors incur a double penalty due to position error. Therefore, the fact that SisPI does not predict precipitation at a given point could be due to a position error of the precipitation area. With the MLP correction, however, this situation is corrected, and the correlation reaches values of up to 0.6, being positive for all forecast periods. In the spin-up hours, $p_{corr}$ is zero since the mean is zero as well.
Figure 6 shows the SisPI forecast, the corrected SisPI forecast, and the GPM precipitation for 1800 UTC. The SisPI forecast underestimates the precipitation over the province of Pinar del Río, not only quantitatively but also spatially. While the GPM estimate presents values of more than 20 mm/3 h, SisPI barely predicts 11 mm/3 h. Over the province of Matanzas the opposite is observed: SisPI suggests rain over practically the entire province, which is a false alarm. The correction fails to improve SisPI's forecast. Over the province of Pinar del Río, the MLP achieves a better spatial representation of precipitation; however, no improvement is obtained in terms of the quantitative forecast of precipitation. Furthermore, the MLP fails to eliminate false alarms.
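The double penalty and its effect on the correlation can be illustrated with a toy example. The one-dimensional row of grid points and all values below are made up and are not taken from the case study. A rain feature forecast two points away from where it was observed is penalized twice, once as a miss and once as a false alarm, so its mae is twice that of a forecast with no rain at all, and its correlation with the observations is negative. Returning 0 for a constant series is a convention assumed here, consistent with the zero correlation reported for periods with zero mean.

```python
import math


def mae(forecast, observed):
    """Mean absolute error."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def pearson(forecast, observed):
    """Pearson's correlation coefficient; 0.0 if either series is constant."""
    n = len(observed)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_f == 0.0 or var_o == 0.0:
        return 0.0  # assumption: report 0 when the coefficient is undefined
    return cov / math.sqrt(var_f * var_o)


# Made-up precipitation (mm/3 h) along a row of six grid points.
observed = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
displaced = [0.0, 0.0, 0.0, 0.0, 10.0, 0.0]  # right amount, wrong place
no_rain = [0.0] * 6                          # no precipitation forecast

mae_displaced = mae(displaced, observed)  # miss plus false alarm
mae_no_rain = mae(no_rain, observed)      # miss only
corr_displaced = pearson(displaced, observed)  # approximately -0.2
corr_no_rain = pearson(no_rain, observed)      # 0.0 by the convention above
```

The displaced forecast is arguably the more useful of the two, yet point-wise metrics rank it below the forecast with no rain, which is why position errors complicate the interpretation of the scores.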

Case Study 10 July 2019
A similar behavior of mae, mse, and $p_{corr}$ is observed for this case study (Figures 7-9, respectively). As before, during the spin-up time, the MLP correction shows its ability to reduce the WRF error.
According to the GPM estimate (Figure 10, bottom panel), at 2200 UTC, non-significant values between 6 mm/3 h and 12 mm/3 h occurred over the provinces of Pinar del Río, Artemisa, Havana, Camagüey, and Holguín. SisPI, however, predicts more than 30 mm/3 h in the northern region of Matanzas, which is considered a false alarm. Precipitation values are also overestimated in the rest of the country. In this forecast period, the MLP failed to correct any of these errors and increased the spatial overestimation of precipitation. This indicates that much work remains to be done, and that other ANN models with characteristics more appropriate for this type of application, such as convolutional networks, should be explored.
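False alarms such as the one described above are usually quantified with a contingency table. The sketch below is not part of the verification carried out in this work. It counts hits, misses, false alarms, and correct negatives for a rain/no-rain threshold and derives the probability of detection (POD) and the false alarm ratio (FAR). The threshold of 1.0 mm/3 h and all sample values are hypothetical.

```python
def contingency(forecast, observed, threshold):
    """Count hits, misses, false alarms, and correct negatives."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        forecast_yes = f >= threshold
        observed_yes = o >= threshold
        if forecast_yes and observed_yes:
            hits += 1
        elif observed_yes:
            misses += 1
        elif forecast_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def pod(hits, misses):
    """Probability of detection: fraction of observed events that were forecast."""
    total = hits + misses
    return hits / total if total else float("nan")


def far(hits, false_alarms):
    """False alarm ratio: fraction of forecast events that did not occur."""
    total = hits + false_alarms
    return false_alarms / total if total else float("nan")


# Made-up values (mm/3 h) at five points; threshold is an assumption.
forecast = [30.0, 12.0, 0.0, 5.0, 0.0]
observed = [0.0, 8.0, 6.0, 0.0, 0.0]

hits, misses, false_alarms, correct_negatives = contingency(
    forecast, observed, threshold=1.0)
pod_value = pod(hits, misses)        # 0.5
far_value = far(hits, false_alarms)  # 2 false alarms out of 3 forecast events
```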


Conclusions
The research presented applies a multi-layer perceptron artificial neural network to the bias correction of the QPF. Some positive results were obtained in terms of the reduction of the mae and mse metrics, with the correction being better in the spin-up period of the WRF. However, the MLP model was not able to overcome quantitative and positional errors when looking at a given forecast time. It is recommended to extend the experiments to ANN models more appropriate for correcting the quantitative precipitation forecast, taking into account both its value and its spatial location.
