Deep Learning-Based Univariate Prediction of Daily Rainfall: Application to a Flood-Prone, Data-Deficient Country

Abstract: Several attempts to model rainfall time series have been explored by members of the hydrological research community. Rainfall, being one of the defining factors of a flooding event, is rarely modeled on its own in deep learning, as it is usually treated in multivariate analysis. This study explores a time series modeling method in four subcatchments located in Samar, Philippines. The rainfall time series was treated as a signal and was reconstructed into a combination of a ‘smoothened’ or ‘denoised’ signal and a ‘detailed’ or noise signal. The discrete wavelet transform (DWT) was used as the reconstruction technique, in combination with a univariate long short-term memory (LSTM) network. The combination of the two methods showed consistently high values of performance indicators, such as Nash–Sutcliffe efficiency (NSE), correlation coefficient (CC), Kling–Gupta efficiency (KGE), index of agreement (IA), and Legates–McCabe index (LMI), with mean absolute percentage error (MAPE) values at almost zero, and consistently low values for both root mean square error (RMSE) and RMSE-observations standard deviation ratio (RSR). The authors believe that the proposed method can give efficient, time-bound results to flood-prone countries such as the Philippines, where hydrological data are deficient.


Introduction
Mathematical models are key to understanding the dynamics of natural systems. Hence, researchers have used multivariate analysis [1], correlation and principal component analysis (PCA), as well as stream flow reconstruction [2] in hydrological models. The challenging area of disaster and water resources management drives scientists to arrive at more accurate and, most of the time, consolidated approaches.
The LSTM networks of [3], a type of deep learning (DL) model, were developed to retain information for prolonged periods so that training is more successful on datasets composed of consecutive samples. A conventional neuron applies an activation function that produces one output and transmits it to the immediate neurons in the same layer, as well as to those in the layer above. An LSTM neuron, in contrast, produces two distinct values as a consequence of a series of activations and operations. Both outputs are kept in the LSTM layer, and one of them is also passed to the next layer to keep track of what was learned from the previous part of the sequence.
Necesito et al. (2021) [4] used DWT and univariate LSTM to visualize dengue surges. The authors of [5], on the other hand, used DWT and SVM to predict monthly rainfall in China. Choi et al. (2019) [6] used CNN and DWT to forecast rainfall in Malaysia. DWT is a well-tested decomposition technique and has proven effective, as emphasized by [4].
LSTM is an established modeling method in hydrological research. Chong et al. (2020) [7] estimated daily runoff using an LSTM model driven by meteorological data. The results outperformed those of the Sacramento soil moisture accounting model (SAC-SMA) with Snow-17, a well-known physical model. Kratzert et al. (2019) [8] showed that the LSTM model outperforms hydrological models, such as the calibrated SAC-SMA and the national water model, when applied to ungauged watersheds. They achieved this by utilizing k-fold cross-validation to apply the LSTM model to 531 U.S. watersheds.
Many studies have shown that LSTM can surpass other hydrological models. Using climatic data and the current day's stream flow, Damavandi et al. (2019) [9] developed an LSTM model on a Texas watershed to forecast the next day's daily stream flow. Their findings demonstrated that LSTM outperforms the CaMa-Flood model. Zhang et al. (2018) [10] used an LSTM model to forecast reservoir inflow and outflow on an hourly, daily, and monthly basis, and it outperformed backpropagation neural networks and support vector machines in their analysis. Kumar et al. (2019) [11] employed RNN and LSTM models and showed that LSTM gave better results. Qin et al. (2019) [12] projected stream flow using LSTM and compared it to the AR model. An LSTM model was employed by [13] to calculate a reservoir's daily overflow. The use of ENN in a real-time 3 h flood forecast by [14] was equally successful.
Other studies have used LSTM with spatiotemporal attention (STA) for interpretable AI flood forecasting [15]. Song et al. (2019) [16] used T multivariate single-step LSTM networks, which receive inputs related to the geographical and temporal dynamics of actual and modeled precipitation and runoff. However, some research concentrated on the pretreatment of the data, such as the decomposition of runoff and rainfall before using deep learning models. He et al. (2019) [17] developed a DNN model for the prediction of daily stream flow, with the stream flow series inputs broken down into a number of intrinsic mode functions (IMF) using variational mode decomposition (VMD).
Rainfall modeling has been the subject of many studies in the field of hydrology. In fact, Scopus (https://www.scopus.com, accessed on 6 December 2022) recorded a total of 148,859 journal articles related to rainfall models from 1969 to 2022. These studies used different methods in modeling rainfall, which vary from conventional mathematics to the 20th century machine learning techniques.
Table 1 shows how rainfall has been modeled by several scientists in the field of hydrology. The variables used vary from rainfall alone to multiple hydrological variables such as temperature, wind speed, and humidity, among others. Performance indicators also vary between studies, with RMSE or MSE being the most common. The techniques range from conventional mathematical techniques, such as regression and ARIMA, to various machine learning approaches. Some studies have used discrete wavelet transform (DWT) in combination with neural networks. However, unlike in this study, the whole raw data were preprocessed by DWT and the results were all transmitted to the neural network, including the base signal [23,31,34]. Interestingly, Choi et al. (2019) [32] modeled the rain damage and the model residuals separately using several machine learning techniques, but without using DWT. On the other hand, Wu et al. (2021) [33] separated the noise from the raw data and modeled the residuals using LSTM and the remainder using ARIMA.
DWT is often applied in signal processing to transform signals into frequency components [36]. It has been used in the surveillance or tracking of moving objects [37] and in disease classification [38]. Shmueli (2013) [39] and Alimohamadi et al. (2020) [40] investigated the use of DWT to detect outbreaks using biosurveillance systems, with the latter focusing on pertussis aberrations. DWT is thus an application of the theory of spectral decomposition, which states that any time series can be decomposed into multiple statistically independent time series.
Time series analysis for rainfall modeling is rarely performed in a univariate setting in deep learning, as it is usually performed in multivariate analysis. Several attempts to model rainfall have also been explored by members of the hydrological research community. Some researchers used data-driven techniques such as regression methods [41–43], while others used physical-based models; the physical-based models were found to be less accurate than the data-driven techniques [44,45].
Some studies have associated rainfall with several climatological factors. However, many studies have also used univariate modeling techniques in analyzing rainfall [30,46,47]. In fact, Ray and Chattopadhyay (2021) [48] emphasized that the "steady state probabilities indicate that a month with high surface air temperature is most likely to be preceded by another warm month but less likely lead to a low surface air temperature in the subsequent month." Thus, according to Ray and Chattopadhyay (2021) [48], surface air temperature may have an effect on rainfall, but its impact on summer monsoon rainfall cannot be explained through a time-domain approach, only through a frequency-domain approach. Hence, the authors used DWT in this study.
Several studies have investigated the use of DWT in the surveillance or tracking of moving objects [37], damage detection in building structures [49], and disease classification [38]. To the authors' knowledge, only a handful of studies have investigated the use of DWT in series with univariate LSTM to model rainfall, and none have focused on the Philippines. In addition, none have considered noise modeling of rainfall using univariate LSTM.

Discrete Wavelet Transform
The continuous wavelet transform (CWT) of a signal $y(t)$ can be expressed as [50]:

$$W(a, b) = \int_{-\infty}^{\infty} y(t)\, \phi^{*}_{a,b}(t)\, dt \qquad (1)$$

$$\phi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\, \phi\!\left(\frac{t - b}{a}\right) \qquad (2)$$

where $\phi^{*}_{a,b}$ is the complex conjugate of the base signal (wavelet) $\phi$, scaled by $a$ and shifted by $b$. The DWT is the dyadic discretization of the CWT's scale parameter, in which the step size of the shift parameter has a linear connection to the scale:

$$a = 2^{j}, \qquad b = k\, 2^{j} \qquad (3)$$

which makes Equations (1) and (2) become

$$W(j, k) = \int_{-\infty}^{\infty} y(t)\, \phi^{*}_{j,k}(t)\, dt \qquad (4)$$

$$\phi_{j,k}(t) = 2^{-j/2}\, \phi\!\left(2^{-j} t - k\right) \qquad (5)$$

Using the Mallat algorithm, Equations (4) and (5) are further simplified to

$$a_{j+1,k} = \sum_{n} h(n - 2k)\, a_{j,n} \qquad (6)$$

$$d_{j+1,k} = \sum_{n} g(n - 2k)\, a_{j,n} \qquad (7)$$

where $d_{j,k}$ are the high-frequency coefficients, $a_{j,k}$ are the low-frequency coefficients, and $g(n)$ and $h(n)$ are the high- and low-pass filters, respectively.
Denoising makes use of orthogonal wavelets as the scaling function or basis for DWT. Daubechies wavelet four is a common and well-suited scaling function for denoising [51,52]. Figure 1 shows the DWT mechanism. For DWT with Daubechies wavelet four as the scaling function, a transformation matrix applies [53], in which $a$ are the low-frequency coefficients, $y$ is the raw signal, and $s$ and $d$ are the transformed signals; the transformed signals can also be calculated step by step (Equations (16) and (17)). For more details regarding Daubechies wavelet four and other types of wavelets, the reader is referred to [54].
DWT can be calculated either manually or by using the PyWavelets package in Python.
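To make the "manual" route concrete, the following is a minimal pure-Python sketch of one level of the Mallat filtering step and its inverse. It assumes the four-coefficient Daubechies filter and a periodic extension at the boundaries; both are illustrative assumptions, and the function names `dwt_level1` and `idwt_level1` as well as the sample rainfall values are hypothetical, not taken from the study.

```python
import math

# Four-coefficient Daubechies low-pass (scaling) filter h(n); assumed for illustration.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
H = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# High-pass (wavelet) filter g(n), obtained from h(n) by the quadrature-mirror relation.
G = [H[3], -H[2], H[1], -H[0]]


def dwt_level1(signal):
    """One analysis step: filter with h and g, keep every second output."""
    n = len(signal)
    if n < 4 or n % 2:
        raise ValueError("signal length must be even and at least 4")
    approx, detail = [], []
    for k in range(n // 2):
        # Periodic extension: indices wrap around the end of the series.
        window = [signal[(2 * k + m) % n] for m in range(4)]
        approx.append(sum(h * x for h, x in zip(H, window)))  # low-frequency a_k
        detail.append(sum(g * x for g, x in zip(G, window)))  # high-frequency d_k
    return approx, detail


def idwt_level1(approx, detail):
    """Inverse step: the transform is orthogonal, so the transpose undoes it."""
    n = 2 * len(approx)
    signal = [0.0] * n
    for k, (a, d) in enumerate(zip(approx, detail)):
        for m in range(4):
            signal[(2 * k + m) % n] += H[m] * a + G[m] * d
    return signal


rain = [0.0, 1.2, 5.4, 22.4, 3.1, 0.0, 0.0, 7.8]  # made-up daily rainfall (mm)
approx, detail = dwt_level1(rain)
restored = idwt_level1(approx, detail)
```

Because the filter pair is orthonormal, `restored` matches `rain` up to floating-point error, and a constant series yields detail coefficients of zero. A library implementation would normally be preferred in practice; this sketch only shows what the filtering and downsampling do.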

Long Short-Term Memory Network
LSTM neural networks were first introduced by Hochreiter and Schmidhuber (1997) [3]. As emphasized by Qashwai et al. (2021) [55], LSTM, just like other recurrent neural networks (RNN), is a black-box method. It is a special type of RNN that is capable of identifying long-term dependencies. LSTMs are designed specifically to avoid the long-term dependence problem: retaining information for a long time is their default behavior rather than something they struggle to learn. All recurrent neural networks have the form of a chain of repeating neural network modules. Although LSTMs also have this chain-like architecture, their repeating module is organized differently. Instead of a single neural network layer, the module contains four layers, and they interact in a markedly different way.
LSTM has three main layers: the input layer, the recurrent hidden layer, and the output layer. LSTM does not simply have memory blocks that memorize the temporal state; it also has gating units that adapt and control the flow of information. LSTM has shown efficiency in capturing time series dependencies, particularly over longer periods [56]. One can imagine the LSTM gates as controls, and the LSTM network as a series of LSTM units, each having four main controls or gates. The first is the forget control ($f_t$), which allows the network to either keep or forget the memory in transit. Next is the new memory control, which allows new memories to pass through and later merge with the memories that passed through the forget control. This merging happens with the help of the merged control. Meanwhile, a set of new memories is processed by another neural network, and these new memories eventually join the first two through the merged control. The last control is the output control, which determines how much memory is produced as an output to the next LSTM unit (Figure 2).
In the gate equations, $W$ represents the weights, $b$ is the bias, $\sigma$ denotes the nonlinear activation functions, and $X_t$ is the input. The rectified linear activation function (ReLU), $\sigma$, which allows the LSTM network to approximate not just a linear function (if it exists) but also to account for the nonlinearity of the time series, is represented by the formula:

$$\sigma(x) = \max(0, x)$$

which returns 0 if it receives a negative input and retains the value if it is positive. The adaptive moment estimation (Adam) optimizer, which handles bias estimation and weights, is used in this study (the algorithm can be found in [57]).
In this study, the following LSTM structure was used: five units (nodes) for the hidden layer and one unit (node) for the output layer.
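For orientation, the controls described above can be related to the widely used formulation of an LSTM unit. The block below is a sketch of that common textbook form, not necessarily the exact equations used in this study. The recurrent weights $U$, the hidden state $h_t$, the cell state $c_t$, and the split of the activation into $\sigma_g$ (gates) and $\sigma_c$ (candidate and output) are notational assumptions introduced here.

```latex
\begin{aligned}
f_t &= \sigma_g\left(W_f X_t + U_f h_{t-1} + b_f\right) && \text{forget control} \\
i_t &= \sigma_g\left(W_i X_t + U_i h_{t-1} + b_i\right) && \text{new memory control} \\
\tilde{c}_t &= \sigma_c\left(W_c X_t + U_c h_{t-1} + b_c\right) && \text{candidate memories} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{merged control} \\
o_t &= \sigma_g\left(W_o X_t + U_o h_{t-1} + b_o\right) && \text{output control} \\
h_t &= o_t \odot \sigma_c\left(c_t\right) && \text{output to the next unit}
\end{aligned}
```

In the common formulation, $\sigma_g$ is the logistic sigmoid and $\sigma_c$ is the hyperbolic tangent; a model that uses ReLU as its activation, as reported for this study, would presumably substitute it for $\sigma_c$.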

Model Performance Indicators
The authors used several metrics for evaluation, one of which is the root mean square error (RMSE). RMSE measures the discrepancy between the modeled values and the values that were actually observed; here, it is the amount of error between the simulated and the actual rainfall (mm). In addition to comparing the highest and lowest values of the simulated and observed series, we also used the Nash-Sutcliffe efficiency (NSE), correlation coefficient (CC), Kling-Gupta efficiency (KGE), index of agreement (IA), Legates-McCabe index (LMI), mean absolute percentage error (MAPE), percent (%) bias (PBIAS), and RMSE-observations standard deviation ratio (RSR) to assess the performance of the univariate LSTM and DWT rainfall model.
RMSE is commonly used for the verification of experimental results in forecasting and regression analysis. It is essentially the standard deviation of the model residuals, which tells us how concentrated the data are around the line of best fit. Thus, lower values of RMSE mean a better model fit.
The following is the formula for RMSE:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - X_i\right)^{2}}$$

where $Y_i$ is the simulated output, $X_i$ is the observed sample, and $n$ denotes the number of data points. By convention, the lower the value of this metric, the better the forecasted model.
How well the modeling technique reproduced the observations was studied through several further indicators. NSE is a measure of how well the simulated and observed data fit on a 1:1 line in a plot. Its values range from −∞ to 1: values close to 1 indicate simulations that are almost identical to the observations, while negative values indicate a bad simulation. NSE can be evaluated using:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(X_i - Y_i\right)^{2}}{\sum_{i=1}^{n}\left(X_i - \mu_x\right)^{2}}$$

where $X$ and $Y$ are the observed and simulated variables, and $\mu_x$ is the mean of the observed variable.
CC is a performance indicator of the statistical association between the variables. Its values range from negative one to one, indicating either a negative or a positive correlation among the datasets, and a CC value of zero indicates no linear correlation. To calculate CC, the following formula should be used:

$$\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(X_i - \mu_x\right)\left(Y_i - \mu_y\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \mu_x\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \mu_y\right)^{2}}}$$

where $\mu_x$ is the mean of the observed variable and $\mu_y$ is the mean of the simulated variable.
Originally proposed by [58], Kling-Gupta efficiency (KGE) has been used in various fields. KGE assesses the skill of the model simulation by optimizing the bias and the variability of the datasets. The variability ratio, $\alpha$ (see Equation (20)), is calculated by dividing the standard deviation of the simulated variable by the standard deviation of the observed variable. The bias ratio, $\beta$ (see Equation (21)), is calculated by dividing the mean of the simulated variable by the mean of the observed variable. The Pearson correlation coefficient, $\mathrm{CC}_{KGE}$ (see Equation (22)), of the simulated and observed variables, together with the two aforementioned ratios, are the input values of KGE, as shown in Equation (23):

$$\alpha = \frac{\sigma_y}{\sigma_x} \qquad (20)$$

$$\beta = \frac{\mu_y}{\mu_x} \qquad (21)$$

$$\mathrm{CC}_{KGE} = \frac{\sum_{i=1}^{n}\left(X_i - \mu_x\right)\left(Y_i - \mu_y\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \mu_x\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \mu_y\right)^{2}}} \qquad (22)$$

$$\mathrm{KGE} = 1 - \sqrt{\left(\mathrm{CC}_{KGE} - 1\right)^{2} + \left(\alpha - 1\right)^{2} + \left(\beta - 1\right)^{2}} \qquad (23)$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of the observed and simulated variables. KGE values range from −∞ to 1, and as with NSE, a value of 1 is preferred.
To measure the relative magnitude of the residual variance to the variance of errors of the model, the index of agreement (IA) is also calculated. As defined by [59], IA is one minus the ratio of the mean square error (MSE), multiplied by the number of observations, to the potential error (PE):

$$\mathrm{IA} = 1 - \frac{\sum_{i=1}^{n}\left(Y_i - X_i\right)^{2}}{\sum_{i=1}^{n}\left(\left|Y_i - \mu_x\right| + \left|X_i - \mu_x\right|\right)^{2}}$$

The range of IA values is zero to one, with higher index values indicating better agreement between observed and simulated values.
Legates-McCabe index (LMI) is another metric used in this study:

$$\mathrm{LMI} = 1 - \frac{\sum_{i=1}^{n}\left|X_i - Y_i\right|}{\sum_{i=1}^{n}\left|X_i - \mu_x\right|}$$

In the equations stated above, $n$ represents the number of data points, $X_i$ represents the observed values, $Y_i$ represents the simulated values, and $\mu_x$ is the mean of the observed values.
Another metric used is the mean absolute percentage error (MAPE, %), which is calculated as follows:

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{X_i - Y_i}{X_i}\right|$$

The smaller the MAPE value, the better the simulated value. Percent (%) bias (PBIAS) is also used in this study. A PBIAS value closer to zero means a better simulation. If the value is positive, the model has an overestimation bias, and if the value is negative, the model has an underestimation bias. The optimal value of this metric is 0.0.
The last metric is the RMSE-observations standard deviation ratio (RSR). As implied by its name, it utilizes the standard deviation of the observed values to compare the models. This metric is calculated as follows:

$$\mathrm{RSR} = \frac{\sqrt{\sum_{i=1}^{n}\left(X_i - Y_i\right)^{2}}}{\sqrt{\sum_{i=1}^{n}\left(X_i - \mu_x\right)^{2}}}$$

where the RSR optimum value is 0.0; thus, the lower the value obtained, the better the performance of the model.
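The indicators described in this section can be computed together in a few lines. The sketch below uses the common textbook definitions with `obs` as the observed series and `sim` as the simulated series. The function name `evaluate`, the sign convention for PBIAS (simulated minus observed, chosen so that positive values mean overestimation as stated above), and the skipping of zero observations in MAPE are assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev


def evaluate(obs, sim):
    """Return the performance indicators for observed vs. simulated values."""
    n = len(obs)
    mx, my = fmean(obs), fmean(sim)
    err = [y - x for x, y in zip(obs, sim)]          # simulated minus observed
    sse = sum(e * e for e in err)                    # sum of squared errors
    ss_obs = sum((x - mx) ** 2 for x in obs)         # spread of the observations
    ss_sim = sum((y - my) ** 2 for y in sim)
    cov = sum((x - mx) * (y - my) for x, y in zip(obs, sim))
    cc = cov / math.sqrt(ss_obs * ss_sim)            # Pearson correlation
    alpha = pstdev(sim) / pstdev(obs)                # variability ratio
    beta = my / mx                                   # bias ratio
    potential = sum((abs(y - mx) + abs(x - mx)) ** 2 for x, y in zip(obs, sim))
    # Assumption: days with zero observed rainfall are skipped in MAPE,
    # because the percentage error is undefined there.
    pct = [abs((x - y) / x) for x, y in zip(obs, sim) if x != 0]
    return {
        "RMSE": math.sqrt(sse / n),
        "NSE": 1 - sse / ss_obs,
        "CC": cc,
        "KGE": 1 - math.sqrt((cc - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2),
        "IA": 1 - sse / potential,
        "LMI": 1 - sum(abs(e) for e in err) / sum(abs(x - mx) for x in obs),
        "MAPE": 100 * fmean(pct),
        "PBIAS": 100 * sum(err) / sum(obs),
        "RSR": math.sqrt(sse) / math.sqrt(ss_obs),
    }


observed = [1.0, 2.0, 3.0, 4.0]       # made-up values (mm)
simulated = [2.0, 3.0, 4.0, 5.0]      # the same series shifted by +1 mm
scores = evaluate(observed, simulated)
```

For this toy pair, the constant offset leaves CC at 1 while NSE drops to 0.2 and PBIAS rises to 40%, which illustrates why correlation alone is not a sufficient indicator of model fit.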

Study Area
The Philippines, which is regularly battered by typhoons, is prone to excessive flooding that causes a large number of fatalities each year. Samar, which is located in the Visayas region, has always been subject to heavy rainfall, which has taken a heavy toll on the province's facilities and infrastructure. In January 2021, more than a thousand people were displaced in the 13 upstream barangays of Oras, Eastern Samar.
This study focuses on four subcatchments in Samar, Philippines (see Figure 3). The extents of the subcatchments, named S-7, S-8, S-9, and S-10, are shown in Figure 4. The four subcatchments were chosen simply based on data availability. Some stations failed to record rainfall amounts due to broken rain gauges, which were caused by several factors, both man-made and natural. The study area is rugged and steep, and enveloped in thick rainforest. The catchment area is also drained by diverse bodies of water and has several mountain ranges and summits. Table 2 shows the stations and date ranges used in this study. The time scale used for the entire study is daily. Hydrological variable models for rainfall can lead to efficient and accurate modeling of river or subcatchment discharge. Knowledge of this type of information can help prevent floods and fatalities and support the dissemination of relief operations, which can lead to optimum water resource management later on. However, there is a lack of studies regarding the methods and appropriate variables that can affect hydrological models, especially in developing countries such as the Philippines.
There have been various records of devastating floods that have submerged the Philippines. Samar, one of the most common victims of typhoons and massive flooding, recorded at least 3000 affected families in the January 2021 incessant rainfall [60], at least 1800 families affected in the December 2017 flooding [61], 2 dead and 40,000 affected in the January 2011 flooding [62], and 6 dead in the December 2008 flooding [63]. Frequently, the floodwaters are caused by incessant rainfall, which exceeds the ground's ability to absorb water. Floods also occur when dry streams or creeks surpass their banks, which causes floodwaters to rise rapidly. These kinds of events are life-threatening, especially to inhabitants near flood-prone areas.

Data Collection and Characteristics
Data were obtained from the Advanced Science and Technology Institute (ASTI) of the Department of Science and Technology (DOST) in the Philippines. In this study, the authors used the univariate long short-term memory network (LSTM), alongside discrete wavelet transform (DWT), to simulate rainfall quantities in the following rainfall stations in Samar, Philippines (see Table 2).
This study proposed that the superposition of the base 'smoothened' signal and the noise signal modeled by univariate LSTM can be used to model rainfall. A total of four subcatchments were considered in this study, namely the Oras sub-basin (S-7), S-8, S-9, and S-10. The start and end dates in each sub-basin were varying. Table 3 shows the descriptive statistics of the data used in the rainfall model. 'count' represents the number of data points; 'mean' is the arithmetic average of the rainfall data; 'std' is the standard deviation; 'min' and 'max' are the minimum and maximum values, respectively; '25%', '50%', and '75%' are the 25th, 50th, and 75th percentiles of the data.
Atmosphere 2023, 14, x FOR PEER REVIEW
Figure 5 shows the amount of precipitation in each rainfall station over varying time ranges. As mentioned, the time range varies in each station due to the varying availability of data, because some stations were built earlier or later than the others. Another reason is that some stations were damaged by typhoons, which disrupted the functional ability of the gauges. As shown in Figure 5, there are noticeable peaks and high amounts of rainfall in S-8 and S-9 compared to the amount of rainfall in S-7 and S-10. In fact, the maximum amount of rainfall obtained in S-8 is 91.0 mm, while S-9 has 71.5 mm. On the other hand, S-7 and S-10 have 22.4 mm and 14.0 mm, respectively.

Overview of the Process
The overview of the process for the rainfall model is shown in the schematic diagram in Figure 6. The process starts with data collection from a Philippine government entity, the Department of Science and Technology (DOST). This is followed by the application of discrete wavelet transform (DWT) and univariate long short-term memory network (LSTM), which are further discussed in the following sections.

Noise Analysis Using DWT and LSTM
This study applied DWT (see Figure 7) to generate a smoothened rainfall 'base' curve from varying time ranges (dependent on the available data per rainfall station per subcatchment). The time series or signal is the input to the DWT process. The low-pass filter was applied to the time series to separate the 'noise' from the 'non-noise' parts. The level of decomposition used is one, and the threshold was calculated as the average value of the time series, in this case, rainfall. Each value produced in the wavelet transform is thresholded. Soft-thresholding [40], which is defined as replacing the absolute value of a coefficient produced in the wavelet transform with the threshold value when it is less than or equal to the threshold value, is applied.

The scaling function used is the Daubechies wavelet four, and the soft-thresholding method with the low-pass filter (or the average of the time series as the threshold) was applied. Calculations can be performed manually or by using the PyWavelets package in Python.
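The split into a smoothened 'base' signal and a 'noise' signal can be sketched as follows. To keep the example short and self-contained, a one-level Haar transform is swapped in for the Daubechies wavelet four used in the study, and the thresholding follows the standard shrinkage form of soft-thresholding, which may differ in detail from the definition adopted above. Applying the threshold to the detail coefficients only, the function names, and the sample values are likewise illustrative assumptions.

```python
import math
from statistics import fmean

ROOT2 = math.sqrt(2.0)


def haar_forward(signal):
    """One-level Haar DWT (stand-in for Daubechies wavelet four)."""
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / ROOT2 for a, b in pairs]
    detail = [(a - b) / ROOT2 for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / ROOT2, (a - d) / ROOT2])
    return signal


def soft_threshold(coeffs, threshold):
    """Standard soft-thresholding: shrink each coefficient toward zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def split_base_and_noise(signal):
    """Return (base, noise) such that base + noise reproduces the signal."""
    threshold = fmean(signal)                 # average of the series as threshold
    approx, detail = haar_forward(signal)
    base = haar_inverse(approx, soft_threshold(detail, threshold))
    noise = [s - b for s, b in zip(signal, base)]
    return base, noise


rain = [0.0, 4.0, 2.0, 2.0, 10.0, 0.0, 1.0, 1.0]   # made-up daily rainfall (mm)
base, noise = split_base_and_noise(rain)
```

By construction, adding `base` and `noise` element by element returns the original series, which is the superposition property that the proposed method relies on; in the study, the noise part is the series passed on to the univariate LSTM.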

Rainfall Noise Modeling Using LSTM
This study considered one input layer (the rainfall data preprocessed by the inverse distance weighting (IDW) method). The adaptive moment estimation (Adam) optimizer was used since, as emphasized by [46], it is suitable for large data and is efficient, as it requires less memory. The maximum number of epochs the model could undergo was set to 400; when the validation loss stops improving, the system stops the training, even if it has not reached epoch 400. An epoch is an iteration, and in each epoch, each data element undergoes training. If the number of epochs is small, the number of iterations each element undergoes is also small, which can potentially result in underfitting. The activation function used is the rectified linear activation function (ReLU), which returns the input directly if the value is greater than zero and returns zero otherwise. ReLU allows the LSTM network to approximate not just a linear function (if it exists), but also to account for the nonlinearity of the time series. The time series was divided into two datasets: 70% for training and 30% for testing. For prediction modeling, data are typically divided into training, testing, and validation data. However, in some cases, training data and testing data [64–66] are already sufficient for model prediction. The authors utilized the Python library Keras v2.11.0, which serves as an interface for the TensorFlow library.
The authors of [4] have used DWT and LSTM to visualize dengue surges. To the best of the authors' knowledge, the use of DWT in a rainfall model to smoothen the base or observed rainfall curve, followed by univariate LSTM (see Figure 8) for the noisy signal obtained by DWT, has not yet been used in any rainfall-related studies.
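The data handling described in this section (a chronological 70/30 split, supervised input-output pairs built from a single series, and training that stops once the validation loss no longer improves) can be sketched in plain Python. The look-back window length, the patience value, the function names, and the loss values below are hypothetical choices for illustration; the study does not report them.

```python
def chronological_split(series, train_percent=70):
    """Split a time series into training and testing parts without shuffling."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]


def make_windows(series, lookback):
    """Build univariate samples: `lookback` past values predict the next value."""
    inputs = [series[i:i + lookback] for i in range(len(series) - lookback)]
    targets = [series[i + lookback] for i in range(len(series) - lookback)]
    return inputs, targets


def epochs_until_early_stop(val_losses, patience, max_epochs=400):
    """Return the epoch at which training stops for a given loss history."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, waited = loss, 0            # improvement: reset the counter
        else:
            waited += 1                       # no improvement in this epoch
            if waited >= patience:
                return epoch                  # stop before reaching max_epochs
    return min(len(val_losses), max_epochs)


noise_series = list(range(1, 11))             # placeholder for the DWT noise signal
train, test = chronological_split(noise_series)
x_train, y_train = make_windows(train, lookback=3)
stopped_at = epochs_until_early_stop([5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 2.0], patience=3)
```

Keeping the split chronological avoids leaking future rainfall into the training set, which a random shuffle would do for a time series.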

The authors of [4] have subsequently used DWT and LSTM to visualize dengue surges.To the best of the authors' knowledge, the use of DWT in a rainfall model to smoothen the base or observed rainfall curve, and then utilizing univariate LSTM (See Figure 8) for the noisy signal obtained by DWT, has not yet been used in any rainfall-related studies.As shown in Figure 9, the approximate or base signal produced by DWT-LSTM was able to capture both time and frequency of the original rainfall time series or signal.This is better illustrated when the significant peak values of the approximate or base signal corresponded to the peak values of the observed rainfall time series or signal.
As shown in Table 4, in S-7, the highest peak (in mm), which occurs on 13 September 2016, was also captured by the smoothened 'denoised' signal. S-8, on the other hand, has peaks

Discussion
Many studies have made use of noise modeling, especially in hydrology [32,67]. Some studies showed that a model demonstrates high performance indicator values when the noises are modeled separately [32]. In this study, the rainfall time series, which was treated as a signal, was reconstructed into a combination of a base 'smoothened' or 'denoised' signal and a noise signal.
One of the advantages of using DWT is that it can capture both time and frequency information. Thus, DWT remains suitable for filtering noise signals in stationary, and even nonstationary, time series. With DWT, the authors were able to divide the information in the signal into two subsignals: the 'denoised' (approximate) signal or time series, and the noise (detailed) signal or time series.
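The split into an approximate and a detailed subsignal can be illustrated with a single-level Haar transform written directly in NumPy. The Haar wavelet, the single decomposition level, and the function names are assumptions made for this sketch; the wavelet family and level used in the study may differ.

```python
import numpy as np

def haar_dwt(signal):
    """Single-level Haar DWT: return (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        raise ValueError("this sketch requires an even-length signal")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass: local averages
    detail = (even - odd) / np.sqrt(2.0)   # high-pass: local differences
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse single-level Haar DWT."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def split_base_and_noise(signal):
    """Reconstruct the 'denoised' (approximate) and noise (detailed) subsignals.

    Each subsignal is obtained by zeroing the other set of coefficients
    before inverting the transform, so base + noise equals the input.
    """
    approx, detail = haar_dwt(signal)
    base = haar_idwt(approx, np.zeros_like(detail))
    noise = haar_idwt(np.zeros_like(approx), detail)
    return base, noise
```

For a hypothetical daily series `[0, 4, 10, 2, 0, 0, 6, 6]`, the base signal is the pairwise mean `[2, 2, 6, 6, 0, 0, 6, 6]` and the noise signal is `[-2, 2, 4, -4, 0, 0, 0, 0]`. Because the transform is linear, adding the two subsignals recovers the original series exactly, which is what makes the superposition of a base signal and a separately modeled noise signal possible.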
The noise (detailed) signal or time series was then processed by the univariate LSTM (see Figure 8). Figure 9 shows the framework of the DWT and univariate LSTM. The output of the LSTM (Figure 8) is then combined with the output of the DWT to obtain the new rainfall curve (Figure 9). As shown, the new rainfall curves (Figure 9) closely matched the observed rainfall. To measure the performance of the DWT and univariate LSTM model, we used several indicator metrics, namely: RMSE, CC, NSE, KGE, IA, LMI, MAPE, PBIAS, and RSR (see Table S1).
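For reference, the nine indicators listed above can be computed as in the following sketch. It uses commonly cited definitions (for example, the 2009 form of KGE and the sign convention PBIAS = 100 Σ(O − S)/ΣO), which are assumptions here and may differ in detail from the formulas applied in the study. The function name is hypothetical, and skipping zero-rainfall days in MAPE is a choice made for this sketch to avoid division by zero.

```python
import numpy as np

def rainfall_metrics(observed, simulated):
    """Compute common goodness-of-fit indicators for a simulated series."""
    o = np.asarray(observed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    err = o - s
    o_mean = o.mean()

    rmse = np.sqrt(np.mean(err ** 2))
    cc = np.corrcoef(o, s)[0, 1]                      # Pearson correlation
    nse = 1.0 - np.sum(err ** 2) / np.sum((o - o_mean) ** 2)

    # KGE combines correlation, variability ratio, and bias ratio.
    alpha = s.std() / o.std()
    beta = s.mean() / o_mean
    kge = 1.0 - np.sqrt((cc - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    # Index of agreement and Legates-McCabe index.
    ia = 1.0 - np.sum(err ** 2) / np.sum(
        (np.abs(s - o_mean) + np.abs(o - o_mean)) ** 2)
    lmi = 1.0 - np.sum(np.abs(err)) / np.sum(np.abs(o - o_mean))

    # MAPE is undefined where the observation is zero (dry days),
    # so those days are skipped in this sketch.
    nonzero = o != 0
    mape = 100.0 * np.mean(np.abs(err[nonzero] / o[nonzero]))

    pbias = 100.0 * np.sum(err) / np.sum(o)
    rsr = rmse / o.std()                              # RMSE / std of observations

    return {"RMSE": rmse, "CC": cc, "NSE": nse, "KGE": kge, "IA": ia,
            "LMI": lmi, "MAPE": mape, "PBIAS": pbias, "RSR": rsr}
```

Under these definitions RSR² = 1 − NSE, so the two indicators carry the same information on different scales. A model that overestimates every observation by a constant still reaches CC = 1, which is one reason to report bias-sensitive indicators such as KGE and PBIAS alongside CC.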
The reconstructed signals show high performance (see Table S1). As pointed out by [4], LSTM can recognize seasonal behaviour in a time series. Precipitation is a time series signal which shows seasonal characteristics. This property has been shown in several studies [68,69]. Therefore, despite excluding any meteorological variables in modeling the rainfall time series and only reconstructing it, the DWT and univariate LSTM model achieved very high performance indicator values. In fact, the values were consistently high (CC, NSE, KGE, IA, LMI), with MAPE values at almost zero and consistently low values for RMSE and RSR. Some researchers who used few evaluation metrics relied on an ablation experiment to verify the performance of their proposed rainfall model. For instance, the authors of [33] emphasized that wavelet-ARIMA-LSTM (W-AL) is superior to a plain LSTM and a plain ARIMA rainfall model. In this study, the authors provided a supplementary table (Table S1) to show the comparison of performance indicators between DWT-LSTM-generated and LSTM-generated rainfall.
Comparing Table S1 with Table 4, it is clear that the DWT and LSTM model has outperformed the LSTM model in modeling the rainfall of each subcatchment. The only case in which the LSTM rainfall outperformed the DWT-LSTM is the CC value of S-8, where the former (LSTM) obtained a value of 0.98, while the latter (DWT and LSTM) obtained a value of 0.94.
Several studies have discussed the disadvantages of using LSTM alone in the model prediction of rainfall. The authors of [32], in their proposed rainfall model, emphasized that LSTM, like other neural networks, has a tendency to overestimate or underestimate peak rainfalls, which affects the model accuracy. Similarly, [33] pointed out that a hybrid rainfall prediction model has better fitting effects than the standalone ARIMA and LSTM models.
The results of this study show that the proposed method, the superposition of the base 'smoothened' signal and the noise signal modeled by univariate LSTM, is superior to LSTM-generated rainfall.
Ref. [33] combined DWT, ARIMA, and LSTM and called the result the W-AL model. In that study, the authors emphasized the effectiveness of using W-AL in the univariate forecasting of rainfall. However, the evaluation metrics used were few. In this study, we used several metrics of evaluation (RMSE, CC, NSE, KGE, IA, LMI, MAPE, PBIAS, and RSR), with the lowest CC value at 0.94. The difference between this study and other studies, such as that of [33], is that this study emphasized the noise signals by using LSTM to model them. Choi et al. (2019) [32] also modeled the noises separately, but this was performed in the context of heavy rain damage and not in rainfall prediction. Their method involves decision tree, random forest, SVM, DNN, linear regression, and PCA.
To the best of the authors' knowledge, none of the available rainfall model studies have used the approach performed in this paper. However, the works of [33] and [34] showed that superimposed models (especially with the use of wavelets) can outperform nonhybrid, linear, and nonlinear models. Choi et al. (2019) [32], on the other hand, showed in their work that noise modeling deserves special attention. Taken together, this paper emphasizes that superimposed models with noise modeling are an effective rainfall modeling method, especially in a country where data are deficient.

Conclusions
In this day and age where data science is prominent, and where classical mathematical equations have not established a stronger reputation than deep learning (DL), especially for study areas where data are lacking, the use of an effective, time-bound rainfall prediction model is deemed necessary. The authors conclude that the use of DL and DWT to estimate the rainfall parameter of the hydrological model can be effective in modeling some subcatchments in Samar, Philippines. The proposed method of modeling the noise signal using LSTM and reconstructing the rainfall time series using DWT has great potential for the advancement of disaster risk reduction measures, especially in countries where data and resources are scarce, such as the Philippines. This study is also beneficial for governments that need strategic and highly targeted risk assessment studies in order to support proper decision-making and, ultimately, to arrive at better disaster risk policies.

Figure 6. Schematic Diagram of the Rainfall Model Using Univariate LSTM.

Table 1. Some Rainfall Models Used by Other Research Scientists.

Table 2. Stations and Date Range Used in the Study (daily).

Table 3. Descriptive Statistics of the Data Used in the Rainfall Model (in mm). 'count' represents the number of data points; 'mean' is the arithmetic average of the rainfall data; 'std' is the standard deviation; 'min' and 'max' are the minimum and maximum values, respectively; '25%', '50%', and '75%' are the percentiles of the data.

Table 4. Performance Indicators Results for DWT + LSTM Rainfall