Evaporation Duct Height Nowcasting in China’s Yellow Sea Based on Deep Learning

The evaporation duct is a weather phenomenon that often occurs in marine environments and affects the operation of shipborne radar. Its most important evaluation parameter is the evaporation duct height (EDH). Forecasting the EDH and adjusting the working parameters and modes of the radar system in advance can greatly improve radar performance. Traditionally, the EDH has been estimated with short-term forecast methods, which are characterized by low time resolution and poor forecast accuracy. In this study, a novel approach for EDH nowcasting is proposed based on a deep learning network and EDH data measured in the Yellow Sea, China. The factors that affect nowcasting were analyzed. The time resolution and forecast time were 5 min and 0–2 h, respectively. The results show that the proposed method has a higher forecast accuracy than traditional time series forecasting methods and confirm its feasibility and effectiveness.


Introduction
Evaporation ducts are atmospheric layers that form due to large-scale evaporation of seawater and are the most common type of duct in the air-sea boundary layer [1]. According to statistics, the probability of evaporation duct occurrence in the waters surrounding China is 85% [2,3]. An evaporation duct can change the propagation path and energy distribution of electromagnetic waves, which affects systems operating in the microwave frequency band, such as radar and communication systems. As shown in Figure 1, the main effects are as follows:
1. Radar waves propagate with less propagation loss in an evaporation duct environment, which can lead to over-the-horizon detection such that more distant targets can be detected;
2. Radar waves are trapped by the evaporation duct layer, which leads to the formation of a blind area for radar detection.
Therefore, before the radar system is operated, the radar detection performance and detection area must be optimized to exploit beneficial effects, such as over-the-horizon detection, and to avoid adverse effects, such as detection blind areas, thereby enhancing the radar system performance. To predict the detection performance of a radar, it is necessary to first obtain evaporation duct height (EDH) data and process them with a radio wave propagation algorithm to calculate the radio wave propagation loss in different regions. Obtaining accurate evaporation duct forecasts is thus an important task.
At present, evaporation duct forecast data are mainly based on mesoscale numerical weather forecast models, such as the Weather Research and Forecasting model, which produce short-term forecasts. These methods mainly use global background data as their data sources, with a time resolution of 3 h, horizontal resolution of 0.25° × 0.25°, and forecast duration of 72–120 h. Limited by the temporal and spatial resolution of the background data, the temporal and horizontal resolutions of the evaporation duct short-term forecast product are 1 h and 30 km × 30 km, respectively. These spatiotemporal resolutions cannot accurately reflect the short-term non-stationary change of the EDH and do not meet the needs of radar performance evaluation. Nowcasting duration is generally 0–2 h and the temporal resolution is at the minute level or finer; thus, nowcasting can better reflect the short-term change of the EDH, leading to a more accurate evaluation of the radar detection performance. Although the short-term forecasting and nowcasting of the EDH belong to the same category, the utilized models, methods, and data are different. Several studies have investigated short-term forecasting of the EDH and established numerous methods and models; however, there is a lack of research on nowcasting the EDH as well as on effective methods and models.
Although there is no mature nowcast model, several time series forecast methods can be used for nowcasting the EDH. In the 1980s, two types of methods were used for time series forecasting:
1. Linear statistical methods;
2. Nonlinear time series models such as the bilinear model [4] and autoregressive conditional heteroskedastic model [5].
Linear statistical methods yield better results for stable time series, but the results for unstable time series are poor [6,7]. Because the EDH is non-stationary, linear statistical methods are not suitable for its forecast. Decision tree, support vector machine (SVM), and nearest neighbor regression models were subsequently developed [8,9]; however, their forecast accuracy is poor and needs to be further improved [10]. Deep learning was proposed by Hinton et al. [11] and can be used to solve nonlinear forecast problems. Deep learning has been applied to evaporation duct inversion, evaporation duct diagnosis, sea clutter time series prediction, and other fields, and has achieved good results. Guo et al. [12,13] realized the inversion of an evaporation duct using a deep learning network, which had a higher inversion accuracy than traditional methods. Zhu et al. [14] established an evaporation duct model using a deep learning network, which has a higher accuracy than the traditional Paulus-Jeske (P-J) model. Zhao et al. [15] used a long short-term memory (LSTM) model for long-distance sea clutter prediction. Based on these previous research results, an approach to nowcast the EDH is proposed in this study.
The essence of the deep learning approach is to use a large amount of continuous EDH data to establish a deep learning network model with multiple hidden layers, learn the nonlinear mapping relationship between the measured and future EDH, and improve forecast accuracy by optimizing the model parameters. The results of this paper indicate that the EDH nowcasting model based on deep learning has better forecast accuracy than traditional methods.
The content of this paper is arranged as follows: the calculation method of the EDH and its variation characteristics are introduced in Section 2.1. The selection of the deep learning network is described in Section 2.2. The modeling method used for nowcasting the EDH based on the LSTM network is discussed in Section 2.3. In Section 3, the results of a comparative experiment are described, which show that the forecast accuracy of the proposed model is higher than that of traditional methods. In Section 4, different parameters affecting forecast performance are discussed. The conclusions are presented in Section 5.

Materials and Methods
Models that can be used to determine the evaporation duct mainly include the P-J model, the Musson-Genon-Gauthier-Bruth model, and the Babin model [16]. The P-J model is one of the most widely used models and has been integrated into the Integrated Reflection Effect Dissemination System [17]. The P-J model uses the air temperature, relative humidity, wind speed, atmospheric pressure, and sea surface temperature at a certain height from the sea surface as input parameters, introduces the potential refractive index as a similarity parameter, and assumes that it satisfies the similarity theory [1]. The potential refractive index $N_p$ is calculated using the following equation: where $z$, $L$, $k$, $p$, $\theta$, and $e_p$ are the height, the Monin-Obukhov length, the von Kármán constant, the atmospheric pressure, the potential temperature, and the water vapor pressure, respectively, and $N_p$ is the characteristic scale parameter of the pseudo-refractive index. The relationship between the potential refractive index and atmospheric refractive index satisfies the following condition [1]: where $N$ is the scaling parameter of potential refractivity. The evaporation duct appears when $\partial N_p / \partial z < -0.125$ [1]. The EDH is the height at which the vertical gradient of the potential refractive index equals the critical value (−0.125) for the formation of an atmospheric duct.
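The critical-gradient definition of the EDH can be illustrated with a short numerical sketch. The code below is not the P-J model: it assumes a sampled profile of $N_p$ against height is already available, estimates the gradient by finite differences, and interpolates the height at which the gradient rises through −0.125. The function name `edh_from_profile` and the synthetic logarithmic profile are hypothetical and chosen only so that the expected answer is known.

```python
import numpy as np

CRITICAL_GRADIENT = -0.125  # critical vertical gradient quoted in the text


def edh_from_profile(z, n_p):
    """Return the lowest height at which dN_p/dz rises through the critical
    value, or 0.0 if the gradient never falls below it (no duct)."""
    z = np.asarray(z, dtype=float)
    n_p = np.asarray(n_p, dtype=float)
    grad = np.gradient(n_p, z)           # finite-difference estimate of dN_p/dz
    excess = grad - CRITICAL_GRADIENT    # negative where the ducting condition holds
    if not np.any(excess < 0):
        return 0.0
    for j in range(len(z) - 1):
        if excess[j] < 0 <= excess[j + 1]:
            # linear interpolation between the two bracketing heights
            frac = -excess[j] / (excess[j + 1] - excess[j])
            return float(z[j] + frac * (z[j + 1] - z[j]))
    return float(z[-1])  # condition still holds at the top of the sampled profile


# Synthetic profile (assumption): gradient is -0.125 * 12 / z, so it equals
# the critical value at exactly z = 12 m.
heights = np.arange(0.5, 40.0, 0.1)
profile = 330.0 - 0.125 * 12.0 * np.log(heights)
edh = edh_from_profile(heights, profile)
```

For measured data, the profile would come from the duct model rather than from a closed-form expression, and the sampling step limits how precisely the crossing height can be located.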

EDH Data Acquisition
From 2017 to 2019, an EDH data acquisition test was carried out in the Yellow Sea by the China Research Institute of Radiowave Propagation. The sensors used for the test included temperature, humidity, atmospheric pressure, wind speed, and sea surface temperature sensors. The sensor parameters are shown in Table 1.
The above-mentioned sensors were installed on the top of the buoy, ~10 m from the sea surface. The geographical location of the buoy and position of the sensors are shown in Figure 2.
Once the temperature, humidity, atmospheric pressure, wind speed, and sea surface temperature are obtained, the EDH can be calculated. If the wind speed is less than 0.01 knot, the EDH is set to zero. If not, the following four steps are taken [18].

• Step 1: Calculate the bulk Richardson number.
• Step 2: From the Richardson number, determine the Monin-Obukhov length.
• Step 3: Determine the potential refractivity difference between the air and the sea surface.
• Step 4: Examine the stability conditions to determine which form the EDH equation will take.

Variation of the EDH
EDH data from 18:00 LST on 13 September to 18:00 LST on 15 September were selected for the analysis of the EDH variation. The daily variation of the EDH is shown in Figure 3.
Figure 3 shows that the EDH changes regularly during the selected time period. At night (from 18:00 on 13 September to 6:00 on 14 September and from 18:00 on 14 September to 6:00 on 15 September), the EDH is low, with a minimum of ~5 m; in the daytime (from 6:00 to 18:00 on 14 September and from 6:00 to 18:00 on 15 September), the EDH is higher, reaching up to 20 m. However, this pattern is not constant and is sometimes opposite to the change shown in the graph. Nevertheless, the EDH changes regularly, which is mainly controlled by the climate at the observation site. Based on this regular change, the EDH can be forecasted.
The variation of the EDH, from 19:30 LST on 5 September to 21:30 LST on 5 September 2017, is shown in Figure 4.
Remote Sens. 2021, 13, x FOR PEER REVIEW
Figure 4 shows that the average EDH is ~11 m (red line). The minimum and maximum EDH are ~9.5 and ~13 m, respectively, and the variation range is ~3.5 m. The short-term variation of the EDH is mainly caused by the change of hydrometeorological parameters, systematic error of the sensor, and measurement error. Due to the non-stationary characteristics of the EDH, it is difficult to nowcast the EDH.

Deep Learning Network Selection
The nowcasting of the EDH is based on the data of the previous period and can be expressed by the following equation [19]:

$$y(t) = H\big(x(t),\, h(t-1)\big)$$

where $x(t)$ is the EDH measured at the current time, $h(t-1)$ is the historical characteristic EDH, $y(t)$ is the result of nowcasting, and $H$ is the nonlinear network model. By choosing an appropriate deep learning network, the continuous time variation and non-stationary characteristics of the EDH can be captured.
With the continuous development of deep learning technology, a variety of deep learning networks have been established in recent years. At present, deep neural networks (DNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs) are commonly used. The RNN is one of the most effective networks for time series data processing. By introducing a cyclic structure, it can realize time series data analysis and prediction. However, the learning process of the RNN suffers from vanishing and exploding gradients, so it cannot learn long-term nonlinear relationships. To solve this problem, several researchers have proposed the use of an LSTM model. The LSTM, which is a special type of RNN, was proposed by Hochreiter et al. [20]. On the basis of an RNN, the LSTM adds three gates (the input, output, and forget gates), with which the vanishing or exploding gradients of the RNN can be overcome. Therefore, the LSTM is suitable for the classification, processing, and prediction of time series. It is one of the most advanced architectures that can be used for the deep learning of time series [21,22]. Thus, the LSTM was selected to nowcast the EDH in this study. The structure of the LSTM memory unit is shown in Figure 5.
In the LSTM, $x_t$ is the EDH input at time $t$ and $h_t$ is calculated with the following equations [23]: where $i_t$, $f_t$, $c_t$, and $o_t$ are the outputs of the input gate, forget gate, memory cell, and output gate, respectively; $\sigma$ is a sigmoid function; and $g$ and $h$ are the activation functions from input to output of the memory cells, usually hyperbolic tangent functions (tanh).
The EDH nowcasting framework based on the LSTM network is shown in Figure 6. The framework comprises three layers: input, hidden, and output layers. The input layer is the time series of the EDH, which is constructed from the measured EDH data; the hidden layer is composed of the LSTM layer and Dense layer; and the output layer is the EDH nowcasting result.
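For reference, a commonly used form of the LSTM memory-cell update (without peephole connections) is sketched below, with $x_t$ the input and $h_t$ the hidden state at time $t$. The weight matrices $W$, $U$ and bias vectors $b$ are notation assumed here, not taken from the original, and the formulation in [23] may differ in detail (for example, by including peephole terms):

$$
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
h_t &= o_t \odot h\left(c_t\right)
\end{aligned}
$$

Here $\odot$ denotes element-wise multiplication. The additive update of $c_t$ is what allows gradients to propagate over many time steps without vanishing as quickly as in a plain RNN.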

EDH Nowcast Model Based on the LSTM Network
The EDH nowcast model based on the LSTM network is shown in Figure 7. First, a large amount of measured EDH data were used to build a training data set. Second, the nowcast model was established using the LSTM network, and the parameters were set.
Figure 7. EDH nowcast model.

Training Data Construction
The training data are the EDH data measured in Qingdao, Shandong, China, from July 2017 to March 2019, with a time resolution of 1 min and a total of ~800,000 samples. To reduce the influence of systematic and random errors and meet the format requirements for input data of the LSTM network, the original data must be processed. The processing flowchart is shown in Figure 8. The EDH data processing comprises three steps:

• Step 1: Moving average. The EDH data from real-time monitoring fluctuate strongly and cannot accurately reflect the real situation. Therefore, it is necessary to smooth the measured data. In this study, the moving average method was applied to the monitoring data. After averaging, the time resolution of the EDH data was 5 min and the total number of samples was ~160,000.
• Step 2: Data division. After averaging, the EDH data were divided into training and test sets, accounting for 80% and 20% of the total data, respectively.
• Step 3: Data normalization. Data normalization linearly transforms the original data, without affecting the data distribution or the relationship between the data, such that the resulting values are mapped to [0,1] or [−1,1]. Data normalization can accelerate gradient descent; thus, the optimal solution can be identified more quickly and the prediction accuracy might be improved. In this paper, the training and test set data were mapped to [0,1].
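The three processing steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the averaging is taken to be over non-overlapping 5 min blocks (which matches the reported fivefold drop in sample count), the split is taken to be chronological, and the scaling limits are taken from the training set only, so test values may fall slightly outside [0,1]. All function names and the synthetic record are hypothetical.

```python
import numpy as np


def block_average(series, window=5):
    """Average consecutive non-overlapping blocks of `window` samples."""
    series = np.asarray(series, dtype=float)
    usable = (len(series) // window) * window   # drop an incomplete final block
    return series[:usable].reshape(-1, window).mean(axis=1)


def chronological_split(series, train_fraction=0.8):
    """Split a series into leading training and trailing test parts."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def minmax_fit(train):
    """Return the (min, max) scaling limits of the training data."""
    return float(np.min(train)), float(np.max(train))


def minmax_apply(series, lo, hi):
    """Linearly map values so that lo -> 0 and hi -> 1."""
    return (series - lo) / (hi - lo)


# Synthetic 1 min record (hypothetical values, not measured EDH data)
rng = np.random.default_rng(0)
minutes = np.arange(1000)
raw = 11.0 + 3.0 * np.sin(2 * np.pi * minutes / 1440) + rng.normal(0, 0.5, minutes.size)

smoothed = block_average(raw, window=5)          # 1 min -> 5 min resolution
train, test = chronological_split(smoothed)      # 80% / 20%
lo, hi = minmax_fit(train)
train_scaled = minmax_apply(train, lo, hi)
test_scaled = minmax_apply(test, lo, hi)
```

Fitting the limits on the training portion only avoids leaking information from the test period; if both sets must lie strictly in [0,1], the limits would instead have to be taken over the full record.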

Model Parameters
Model parameters, such as the optimization method, dropout rate, number of input and output layer nodes, number of hidden layers, number of hidden layer neurons, activation function, and loss function, play a vital role in the model performance. Some of the parameters were set when the model was built; the remaining parameters were first set to initial values and then optimized based on further analysis of the model performance (see Section 4). The parameters determined in advance are as follows:

1. Number of epochs: The problem solved in this paper and the number of inputs are similar to those in Zhao et al. [15]. Therefore, the number of epochs was set to 200, following Zhao et al. [15].
2. Number of hidden layers: In this study, three hidden layers [12] were used, including two LSTM layers and one fully connected layer (Dense).
3. Number of neurons in the output layer: In this study, the future EDH is predicted; therefore, the number of neurons in the output layer is 1.
4. Activation function: In the neural network, a functional relationship exists between the output of the upper node and the input of the lower node. This function is called the activation function. Common activation functions are the Sigmoid, tanh, Rectified Linear Unit (ReLU), and linear functions. For the EDH nowcasting of this study, the ReLU and linear functions were used as activation functions [24].
5. Loss function: The loss function guides the network parameter learning by calculating the error between the predicted and real samples such that the model reaches a convergence state. In this study, we used the mean_squared_error function, which can be expressed as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(\mathrm{pred}(i) - y(i)\big)^2$$

where pred(i) represents the nowcasting result of the EDH, y(i) represents the expected value of the EDH sample, and n is the number of EDH time series selected for the training.
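To make the layer arrangement concrete, the sketch below runs a forward pass through two stacked LSTM layers, one ReLU fully connected layer, and a single linear output neuron, followed by the mean squared error. It is a NumPy illustration with random weights, not the trained model: the layer sizes, the stacked gate layout, the absence of peephole connections, and all names are assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, n_in, units):
    """Random parameters for one LSTM layer (gates stacked as i, f, g, o)."""
    return {
        "W": rng.normal(0, 0.1, (n_in, 4 * units)),
        "U": rng.normal(0, 0.1, (units, 4 * units)),
        "b": np.zeros(4 * units),
    }


def lstm_forward(x_seq, p):
    """Run one LSTM layer over x_seq of shape (T, n_in); return (T, units)."""
    units = p["U"].shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    outputs = []
    for x_t in x_seq:
        z = x_t @ p["W"] + h @ p["U"] + p["b"]
        i = sigmoid(z[:units])                  # input gate
        f = sigmoid(z[units:2 * units])         # forget gate
        g = np.tanh(z[2 * units:3 * units])     # candidate cell input
        o = sigmoid(z[3 * units:])              # output gate
        c = f * c + i * g                       # memory cell update
        h = o * np.tanh(c)                      # hidden state
        outputs.append(h)
    return np.stack(outputs)


def forward(x_seq, params):
    """Two LSTM layers, one ReLU Dense layer, one linear output neuron."""
    seq = lstm_forward(x_seq, params["lstm1"])       # full sequence to next layer
    last = lstm_forward(seq, params["lstm2"])[-1]    # last hidden state only
    dense = np.maximum(0.0, last @ params["Wd"] + params["bd"])
    return float(dense @ params["Wo"] + params["bo"])


def mean_squared_error(pred, y):
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((pred - y) ** 2))


rng = np.random.default_rng(0)
units, dense_units, steps = 8, 4, 12    # hypothetical small sizes
params = {
    "lstm1": init_lstm(rng, 1, units),
    "lstm2": init_lstm(rng, units, units),
    "Wd": rng.normal(0, 0.1, (units, dense_units)),
    "bd": np.zeros(dense_units),
    "Wo": rng.normal(0, 0.1, dense_units),
    "bo": 0.0,
}
x_seq = rng.random((steps, 1))          # one scaled input window
y_hat = forward(x_seq, params)          # scalar nowcast from untrained weights
```

In practice such a network would be built and trained with a deep learning framework; the point here is only the data flow and the shapes passed between layers.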

Test Results and Analysis
To verify the superiority of the model proposed in this paper, its EDH nowcasting results were compared with those of two representative time series forecasting methods: SVM and artificial neural network (ANN).
The SVM is an important data mining algorithm. It is a binary classification algorithm that supports linear and nonlinear classification [25]. It has been rapidly developed since the 1990s, leading to the derivation of a series of improved and extended algorithms [26,27]. The EDH time series is defined as follows:

$$x = \{x_1, x_2, \ldots, x_N\}$$

where $N$ is the length of the time series of the EDH. Support vector regression can be expressed as follows:

$$f(x) = w^{T}\varphi(x) + b$$

where $\varphi$ is the feature map that maps the input vector $x$ to the high-dimensional space and $w$ and $b$ are the model parameters to be estimated from the EDH data.
The ANN, based on the principles of NNs in biology, is a mathematical model that simulates the processing of complex information by the nervous system of the human brain. Three ANN layers were used in this study and the ReLU and linear functions were utilized as activation functions for the hidden and output layers, respectively. EDH nowcasting can be expressed as follows:

$$y = w_0^{T}\,\mathrm{ReLU}\left(w_1^{T}x + b_0\right) + b_1$$

where $x$ is the time series of the EDH; $w_1^{T}$ and $w_0^{T}$ are the weight matrices of the hidden and output layers, respectively; and $b_0$ and $b_1$ are the offsets of the hidden and output layers, respectively.
In this study, the mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE) were used to evaluate the accuracy of the forecast results. The formulas of MAPE, MAE, and RMSE are as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{h}_i - h_i}{h_i}\right|$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{h}_i - h_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{h}_i - h_i\right)^2}$$

where $\hat{h}_i$ is the nowcast result, $h_i$ is the true value of the EDH, and $n$ is the total number of samples.
Figure 9 shows that the nowcast results of the three methods match the true EDH when the forecast duration is 30 min and the error is small; when the forecast duration is 60 or 120 min, the nowcast result error significantly increases. Among the three nowcasting methods, the LSTM yields the best nowcast results, which may be due to the fact that the LSTM uses more information about the EDH in the model building process.
To quantitatively describe the advantages of this method, ~38,000 sets of EDH data were used for the prediction and the RMSE, MAE, and MAPE of the LSTM, ANN, and SVM methods were calculated. The results are compared in Table 2. Table 2 shows that the RMSE, MAE, and MAPE derived from the proposed model are smaller than those of the other two models. At a forecast time of 30 min, the advantage of the model proposed in this study is the most notable and its MAPE is ~3% smaller than that of the other two models.

Table 2 and Figure 9 show that the proposed model has a higher accuracy and better applicability than the SVM and ANN methods with respect to the nowcasting of the EDH. The results also indicate the advantages of the use of the LSTM network for time series prediction.
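The three error measures used in this section (MAPE, MAE, and RMSE) can be computed with a small helper such as the one below. The function name `nowcast_errors` and the sample values are hypothetical. Note that MAPE is undefined wherever the true EDH is zero, so such samples would need to be excluded or handled separately.

```python
import numpy as np


def nowcast_errors(h_true, h_pred):
    """Return MAPE (in percent), MAE, and RMSE of predictions against truth."""
    h_true = np.asarray(h_true, dtype=float)
    h_pred = np.asarray(h_pred, dtype=float)
    err = h_pred - h_true
    return {
        "MAPE": float(np.mean(np.abs(err / h_true)) * 100.0),
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }


# Illustrative values only (not measured EDH data)
scores = nowcast_errors([10.0, 12.0, 8.0], [11.0, 12.0, 6.0])
```

Because the RMSE squares the errors before averaging, it penalizes occasional large misses more strongly than the MAE, which is why the two are usually reported together.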

Effect of the Input Vector Dimension on Nowcasting Accuracy
The input vector dimension refers to the number of EDH data points received by the input layer neurons. If the amount of input EDH data is small, the change of the EDH cannot be determined; if the amount of input EDH data is large, the data farther away from the forecast time point may affect the nowcasting results and reduce nowcasting accuracy. The change of EDH nowcasting accuracy with the input vector dimension is shown in Figure 10. When the input vector dimension is small, the RMSE is large. As the dimension increases, the RMSE decreases; it is the smallest when the dimension is 12.
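The role of the input vector dimension can be illustrated by the way a series is cut into supervised samples. The sketch below is an assumption about how such windows might be built, not the authors' procedure: each sample holds `n_in` consecutive values and the target is the value `horizon` steps after the last input (at 5 min resolution, `horizon=6` would correspond to 30 min). The function name and parameters are hypothetical.

```python
import numpy as np


def make_windows(series, n_in=12, horizon=6):
    """Build samples X[k] = series[k:k+n_in] and targets
    y[k] = series[k + n_in + horizon - 1]."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_in - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window and horizon")
    X = np.stack([series[k:k + n_in] for k in range(n_samples)])
    start = n_in + horizon - 1
    y = series[start:start + n_samples]
    # trailing axis of size 1: one feature (the EDH) per time step
    return X[..., np.newaxis], y


X, y = make_windows(np.arange(30), n_in=12, horizon=6)
```

A larger `n_in` gives the network more history per sample but also reduces the number of samples and feeds in older values that may carry little information about the forecast time.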

Effect of the Dropout Rate on Nowcasting Accuracy
The dropout method was proposed by Hinton [10] and is one of the main metho used to prevent the overfitting of deep neural network models. During overfitting, t model obtains a small loss function for the training data and a large loss function for t test data. In the dropout method, the model randomly "drops" hidden layer neurons w the probability p during training, preventing the model from relying too heavily on lo features and leading to a more generalized model. The interval of probability p is [0, We selected p = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8 and analyzed the effect of the prob bility p on the model. Under different p-values, the loss function changes with the numb of epochs, as shown in Figure 11.

Effect of the Dropout Rate on Nowcasting Accuracy
The dropout method was proposed by Hinton [10] and is one of the main methods used to prevent the overfitting of deep neural network models. During overfitting, the model obtains a small loss function for the training data and a large loss function for the test data. In the dropout method, the model randomly "drops" hidden layer neurons with the probability p during training, preventing the model from relying too heavily on local features and leading to a more generalized model. The interval of probability p is [0,1]. We selected p = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8 and analyzed the effect of the probability p on the model. Under different p-values, the loss function changes with the number of epochs, as shown in Figure 11. Figure 11 shows that the training set error continues to decrease for p > 0.2. The validation set error first decreases and then increases, indicating that the model is overfitting and the model generalization ability is poor. When p = 0.2, the model training set and test set errors change relatively smoothly, no overfitting occurs, and the difference between the training set and test set errors is small.
The change in EDH nowcasting accuracy depending on different p-values used to build a 1 h nowcasting model is shown in Figure 12. When p = 0.2, nowcasting accuracy is highest. determined; if the amount of input EDH data is large, the data farther away from the forecast time point may affect the nowcasting results, resulting in a lower nowcasting effect. EDH nowcasting accuracy changes with the input vector dimension, as shown in Figure 10. When the input vector dimension is small, the root mean squared error (RMSE) is large. As the dimension increases, the RMSE decreases; it is the smallest when the dimension is 12.

Influence of the Number of Hidden Layer Neurons on Nowcasting Accuracy
The number of neurons affects both nowcasting accuracy and training time. Too few hidden layer neurons lead to insufficient network performance, whereas a large number of neurons yields good network performance but increases the network training time. In this study, the number of neurons in the LSTM layer of the hidden layer was set to 64, 128, 256, or 512, and the number of neurons in the fully connected layer was set to 16 or 64, giving a total of 8 combinations. The effect of the different combinations on the model performance is shown in Figure 13. Figure 13 shows that the error of the test set first decreases and then increases when the number of neurons in the LSTM layer is 64, indicating that the model is overfitting. The training set error, test set error, and time consumption for LSTM layers with more than 64 neurons are shown in Figure 14. Figure 14 shows that the difference in nowcasting accuracy under different neuron combinations is small when the number of neurons in the LSTM layer is greater than 64, indicating that an increase in the number of neurons does not significantly improve the nowcasting accuracy of the model but leads to an increased training time. Considering both the nowcasting accuracy of the model and the time consumption of training, a combination of 128, 128, and 64 neurons was used for the model in this study.
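The hyperparameter search described above can be sketched as a small grid enumeration. The candidate layer sizes are taken from the text; the selection rule in `pick_config` (choose the fastest configuration whose test error lies within a relative tolerance of the best one) is only an assumed formalization of "considering accuracy and time consumption", and the function name, the `results` layout, and the 5% tolerance are hypothetical.

```python
from itertools import product

LSTM_UNITS = (64, 128, 256, 512)  # candidate LSTM layer sizes (from the text)
DENSE_UNITS = (16, 64)            # candidate fully connected layer sizes (from the text)

# Every (LSTM size, fully connected size) pair to be trained and evaluated.
grid = list(product(LSTM_UNITS, DENSE_UNITS))


def pick_config(results, tolerance=0.05):
    """Pick a configuration from measured results.

    `results` maps (lstm_units, dense_units) -> (test_error, training_seconds).
    Assumed rule: among configurations whose test error is within `tolerance`
    (relative) of the lowest error, return the one that trains fastest.
    """
    best_error = min(error for error, _ in results.values())
    threshold = best_error * (1.0 + tolerance)
    candidates = {
        config: seconds
        for config, (error, seconds) in results.items()
        if error <= threshold
    }
    return min(candidates, key=candidates.get)


for lstm_units, dense_units in grid:
    print(f"LSTM neurons = {lstm_units:3d}, fully connected neurons = {dense_units:2d}")
print(f"{len(grid)} combinations")
```

In use, `results` would be filled with the test error and training time measured for each entry of `grid`, and `pick_config(results)` would return the configuration that trades a small loss in accuracy for a shorter training time.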
In later research, more EDH data could be used for model training, and more parameter combination schemes should be compared and analyzed. The method described in this paper can be applied to the nowcasting of hydrometeorological parameters.

Conclusions
In this study, a nowcasting model for the EDH is proposed. Based on this model, the EDH parameters can be obtained 0–2 h in advance, providing environmental information to maximize the operational power of radar. First, an EDH nowcasting model was established based on the LSTM network. Second, the effects of different hyperparameters on the performance of the model were analyzed and a set of optimized parameters was identified. Finally, the proposed model was compared with traditional methods. The results of the comparison indicate that the model proposed in this study has better nowcasting accuracy. In future research, more data can be used for the model training to further optimize the model parameters and improve nowcasting accuracy.