Sea Surface Temperature and High Water Temperature Occurrence Prediction Using a Long Short-Term Memory Model

Recent global warming has been accompanied by high water temperatures (HWTs) in coastal areas of Korea, resulting in huge economic losses in the marine fishery industry due to disease outbreaks in aquaculture. To mitigate these losses, it is necessary to predict such outbreaks to prevent or respond to them as early as possible. In the present study, we propose an HWT prediction method that applies sea surface temperatures (SSTs) and deep-learning technology in a long short-term memory (LSTM) model based on a recurrent neural network (RNN). The LSTM model is used to predict time series data for the target areas, including the coastal area from Goheung to Yeosu, Jeollanam-do, Korea, which has experienced frequent HWT occurrences in recent years. To evaluate the performance of the SST prediction model, we compared and analyzed the results of an existing SST prediction model for the SST data, and additional external meteorological data. The proposed model outperformed the existing model in predicting SSTs and HWTs. Although the performance of the proposed model decreased as the prediction interval increased, it consistently showed better performance than the European Center for Medium-Range Weather Forecast (ECMWF) prediction model. Therefore, the method proposed in this study may be applied to prevent future damage to the aquaculture industry.


Introduction
Due to global warming, high water temperatures (HWTs) are frequently observed along the coast of the Korean Peninsula. This phenomenon has led to mass mortality of farmed fish, resulting in massive economic losses to fishermen. The HWT warning period lasted for a total of 32 days in 2017, but persisted for 43 days in 2018. If this trend continues, the damage resulting from HWTs will likely be further exacerbated. To prevent and mitigate exposure risk, it is necessary to predict HWT occurrence accurately in advance. Therefore, in this study, we present a recurrent neural network (RNN)-based long short-term memory (LSTM) model based on deep-learning technology [1,2], to predict sea surface temperatures (SSTs).
Generally, extreme MHW is defined as the top 10% of all SST values observed over the past 30 years for a particular body of water [3,4]. However, in this study, we define HWTs in the context of typical water temperatures in Korea. We follow the criteria defined by the Korean Ministry of Maritime Affairs and Fisheries, which operates a HWT alert system to prevent damage to the aquaculture sector and respond as needed to these events. The HWT alert system consists of the following three stages: The HWT area and frequency are increasing in coastal areas off the Korean Peninsula due to global warming [32,33]. Figure 2 shows average SSTs for the target area. The maximum SST in the 7-year period between 2009 and 2015 was about 27 °C, and that from 2016 to 2018 (3 years) was about 28.5 °C. Thus, the target area, which contains numerous fish farms, experienced concentrated HWTs during the past 3 years. If HWT events in this area can be forecast in advance, the fishing industry may be able to respond more quickly to moderate the economic damage. The HWT area and frequency are increasing in coastal areas off the Korean Peninsula due to global warming [32,33]. Figure 2 shows average SSTs for the target area. The maximum SST in the 7-year period between 2009 and 2015 was about 27 • C, and that from 2016 to 2018 (3 years) was about 28.5 • C. Thus, the target area, which contains numerous fish farms, experienced concentrated HWTs during the past 3 years. If HWT events in this area can be forecast in advance, the fishing industry may be able to respond more quickly to moderate the economic damage. In general, HWTs in Korea are generated for the following reasons. First, as the heat-dome phenomenon becomes stronger due to the strengthening of the power of the North Pacific high pressure and the influence of Tibetan high pressure, the duration and frequency of the heat wave continued to grow [34,35]. Second, the power of the Tsushima warm current, which supplies heat at low latitudes, has been showing strong in the summer in Korea in recent years [36,37]. Third, since the frequency of typhoons that were concentrated in July and August every year decreased, the external force to mix the high-temperature surface sea water with the low-temperature sea water was weakened. For this reason, SST was continuously heated, leading to HWT.
HWTs in Korea are generally caused by the above 3 points. However, since the target area selected in this study is a very shallow area, it is greatly affected by the ocean depth. Figure 3 shows the ocean depth and current flow patterns around Korea. The Kuroshio warm current flows northeast from Philippines to the east coast of Japan, and these warm currents are separated into the Tsushima warm current and the Jeju warm current [38,39]. The Jeju warm current rotates clockwise on Jeju Island and merges with the Tsushima warm current. However, since the target area selected in this study was located on the coast, the effect of these warm current was small. Studies have shown that ocean depth has a significant effect on SST [40,41], and shallow coasts respond quickly to atmospheric changes due to their low heat storage capacity [42]. The ocean depth of the target area is also very shallow, and its SST tends to rise rapidly in midsummer. Therefore, it can be seen that the HWT phenomena occurred frequently in the target area while the HWT phenomena did not occur much in coasts near Busan and Jeju where the ocean depth was deeper and the current flow was faster than the target area.  In general, HWTs in Korea are generated for the following reasons. First, as the heat-dome phenomenon becomes stronger due to the strengthening of the power of the North Pacific high pressure and the influence of Tibetan high pressure, the duration and frequency of the heat wave continued to grow [34,35]. Second, the power of the Tsushima warm current, which supplies heat at low latitudes, has been showing strong in the summer in Korea in recent years [36,37]. Third, since the frequency of typhoons that were concentrated in July and August every year decreased, the external force to mix the high-temperature surface sea water with the low-temperature sea water was weakened. For this reason, SST was continuously heated, leading to HWT.
HWTs in Korea are generally caused by the above 3 points. However, since the target area selected in this study is a very shallow area, it is greatly affected by the ocean depth. Figure 3 shows the ocean depth and current flow patterns around Korea. The Kuroshio warm current flows northeast from Philippines to the east coast of Japan, and these warm currents are separated into the Tsushima warm current and the Jeju warm current [38,39]. The Jeju warm current rotates clockwise on Jeju Island and merges with the Tsushima warm current. However, since the target area selected in this study was located on the coast, the effect of these warm current was small. Studies have shown that ocean depth has a significant effect on SST [40,41], and shallow coasts respond quickly to atmospheric changes due to their low heat storage capacity [42]. The ocean depth of the target area is also very shallow, and its SST tends to rise rapidly in midsummer. Therefore, it can be seen that the HWT phenomena occurred frequently in the target area while the HWT phenomena did not occur much in coasts near Busan and Jeju where the ocean depth was deeper and the current flow was faster than the target area. In general, HWTs in Korea are generated for the following reasons. First, as the heat-dome phenomenon becomes stronger due to the strengthening of the power of the North Pacific high pressure and the influence of Tibetan high pressure, the duration and frequency of the heat wave continued to grow [34,35]. Second, the power of the Tsushima warm current, which supplies heat at low latitudes, has been showing strong in the summer in Korea in recent years [36,37]. Third, since the frequency of typhoons that were concentrated in July and August every year decreased, the external force to mix the high-temperature surface sea water with the low-temperature sea water was weakened. For this reason, SST was continuously heated, leading to HWT.
HWTs in Korea are generally caused by the above 3 points. However, since the target area selected in this study is a very shallow area, it is greatly affected by the ocean depth. Figure 3 shows the ocean depth and current flow patterns around Korea. The Kuroshio warm current flows northeast from Philippines to the east coast of Japan, and these warm currents are separated into the Tsushima warm current and the Jeju warm current [38,39]. The Jeju warm current rotates clockwise on Jeju Island and merges with the Tsushima warm current. However, since the target area selected in this study was located on the coast, the effect of these warm current was small. Studies have shown that ocean depth has a significant effect on SST [40,41], and shallow coasts respond quickly to atmospheric changes due to their low heat storage capacity [42]. The ocean depth of the target area is also very shallow, and its SST tends to rise rapidly in midsummer. Therefore, it can be seen that the HWT phenomena occurred frequently in the target area while the HWT phenomena did not occur much in coasts near Busan and Jeju where the ocean depth was deeper and the current flow was faster than the target area.

Data
SST data were obtained from the ERA5 product released by ECMWF. ERA5 is the fifth-generation ECMWF atmospheric reanalysis of global climate, and is the successor to FGGE reanalysis of the 1980s, which included ERA-15, ERA-40, and ERA-Interim. Reanalysis combines the model with observations to provide a numerical explanation of recent climate and is used by thousands of researchers worldwide. At present, ECMWF provides data such as SST, wind, dewpoint, and snowfall. We focused on ERA5 reanalysis SST data [43,44].
Hourly SST data were provided by ECMWF; we considered the noon measurement of SST to be representative of daily SST values. The SST data format consisted of 721 × 1440 data points, with a spatial resolution of 0.25 • latitude and 0.25 • longitude. The target coastal area from Goheung to Yeosu selected for SST and HWT prediction included a total of five pixels at a latitude of 34 Figure 4 shows the structure of an LSTM. LSTM solves the long-term dependency problem, a disadvantage of the RNN algorithm, using a forget gate. The forget gate allows the previous data to be forgotten if the current data are important; additionally, the previous data are remembered if the current data are unnecessary. The forget gate is calculated as follows:

Structure of the LSTM
where σ is a sigmoid activation function. In previous studies, σ can use various types of functions, but there are studies using sigmoid functions [45,46], so in this study, the sigmoid function was applied. h t−1 and x t are the output of the previous step and the current input, respectively, and W f and b f represent the weight and bias values, respectively. If f t , which is composed of the sigmoid function, approaches 0, the previous data are forgotten; if it approaches 1, the previous data are remembered. The next step, the input gate, determines which of the new data are stored in the cell state: where the input-gate layer i t determines which value to update, and the tanh layer creates a new candidate value C t . By combining the data from these two stages, the model creates a value to update the cell state. The new cell state C t is available by updating C t−1 in the past state. The last gate, the output gate, determines what to output: where h t represents the new output value.

LSTM Training Concept
To illustrate the LSTM training concept, consider typical 1-year SST values, as shown in Figure  5a. In this figure, the 1-year SST increased to approximately 30 °C in the summer and decreased to approximately 10 °C in the winter. Especially between days 210 and 250, roughly from July to August, HWTs exceeding 28 °C appeared more frequently. Figure 5b shows a schematic diagram of LSTM training based on input/output (I/O) data pairs. The input dataset used for LSTM training can be expressed as [ , , ⋯ , ] with days of SST data, and the output data can be expressed as [ , , ⋯ , ], where refers to the SST, and and represent the number of I/O data and the start day of the input data, respectively. The start day of input data was set to be random, to prevent certain parts from being trained.

LSTM Training Concept
To illustrate the LSTM training concept, consider typical 1-year SST values, as shown in Figure 5a. In this figure, the 1-year SST increased to approximately 30 • C in the summer and decreased to approximately 10 • C in the winter. Especially between days 210 and 250, roughly from July to August, HWTs exceeding 28 • C appeared more frequently. Figure 5b shows a schematic diagram of LSTM training based on input/output (I/O) data pairs. The input dataset used for LSTM training can be expressed as [t k−n+1 , t k−n+2 , · · · , t k ] with n days of SST data, and the output data can be expressed as [t k−n+2 , t k−n+3 , · · · , t k+1 ], where t refers to the SST, and n and k represent the number of I/O data and the start day of the input data, respectively. The start day of input data k was set to be random, to prevent certain parts from being trained.

LSTM Training Concept
To illustrate the LSTM training concept, consider typical 1-year SST values, as shown in Figure  5a. In this figure, the 1-year SST increased to approximately 30 °C in the summer and decreased to approximately 10 °C in the winter. Especially between days 210 and 250, roughly from July to August, HWTs exceeding 28 °C appeared more frequently. Figure 5b shows a schematic diagram of LSTM training based on input/output (I/O) data pairs. The input dataset used for LSTM training can be expressed as [ , , ⋯ , ] with days of SST data, and the output data can be expressed as [ , , ⋯ , ], where refers to the SST, and and represent the number of I/O data and the start day of the input data, respectively. The start day of input data was set to be random, to prevent certain parts from being trained.  Once the I/O data pairs are determined and trained, a trained LSTM model can be generated to predict the SST. The important point here was that unused data, i.e., data not used in the LSTM training, should make up the input data of the trained LSTM model; otherwise, the outcome could only be used as an index to judge whether the training was successful but would not be meaningful in predicting future water temperatures.

SST Prediction Concept using the Trained LSTM Model
Section 3.3 describes a method for predicting the m days SST based on the trained LSTM model. The predicted SST after 1 day was selected as the last component of the output. The SST prediction then proceeded to forecast the SST after m days using the following process. Figure 6 shows schematic diagrams for predicting SST after m days. The key concept is that LSTM's outputs are reused as an element of the input array to predict after m days. Where p is the output of the trained LSTM. To predict the SST after m days the following procedure was used (1) Extract the predicted SST after 1 day from the trained LSTM.
(2) To predict the SST after 2 days, the predicted SST after 1 day was substituted as the last component of the second input. (3) Extract the predicted SST after 2 days as a result of step 2. (4) To predict the SST after 3 days, the predicted SST after 2 days was substituted as the last component of the second input. Here, the predicted SST after 1 day was located before the last component. (5) Repeat this process m times.
By repeating the above procedure, the trained LSTM could generate the predicted SST after m days as an output.
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 22 Once the I/O data pairs are determined and trained, a trained LSTM model can be generated to predict the SST. The important point here was that unused data, i.e., data not used in the LSTM training, should make up the input data of the trained LSTM model; otherwise, the outcome could only be used as an index to judge whether the training was successful but would not be meaningful in predicting future water temperatures.

SST Prediction Concept using the Trained LSTM Model
Section 3.3 describes a method for predicting the days SST based on the trained LSTM model. The predicted SST after 1 day was selected as the last component of the output. The SST prediction then proceeded to forecast the SST after m days using the following process. Figure 6 shows schematic diagrams for predicting SST after days. The key concept is that LSTM's outputs are reused as an element of the input array to predict after m days. Where is the output of the trained LSTM. To predict the SST after m days the following procedure was used 1) Extract the predicted SST after 1 day from the trained LSTM. 2) To predict the SST after 2 days, the predicted SST after 1 day was substituted as the last component of the second input. 3) Extract the predicted SST after 2 days as a result of step 2. 4) To predict the SST after 3 days, the predicted SST after 2 days was substituted as the last component of the second input. Here, the predicted SST after 1 day was located before the last component. 5) Repeat this process times. By repeating the above procedure, the trained LSTM could generate the predicted SST after days as an output.

HWT Determination Algorithm and Performance Evaluation
In this study, we assumed a HWT threshold of 28 °C in Korea. So we designed the state transition diagram of the algorithm used to determine HWT occurrence based on this threshold. If the predicted SST was < 28 °C after m days ( ), then the state was deemed normal; otherwise, the state was changed to HWT. To evaluate the performance of the HWT prediction algorithm, a space analysis of the receiver operating characteristic (ROC) curve was performed. Performance was assessed based on the true positive rate (TPR) or false positive rate (FPR) [47], which were calculated as follows:

HWT Determination Algorithm and Performance Evaluation
In this study, we assumed a HWT threshold of 28 • C in Korea. So we designed the state transition diagram of the algorithm used to determine HWT occurrence based on this threshold. If the predicted SST was < 28 • C after m days (p m ), then the state was deemed normal; otherwise, the state was changed to HWT. To evaluate the performance of the HWT prediction algorithm, a space analysis of the receiver operating characteristic (ROC) curve was performed. Performance was assessed based on the true positive rate (TPR) or false positive rate (FPR) [47], which were calculated as follows: where TP, TN, FP, and FN are the numbers of data points representing true positive, true negative, false positive, and false negative cases, respectively. If the outcome from a prediction is positive, and the real value is also positive, then the outcome is referred to as a TP; however, if the real value is negative, then it is called an FP. Conversely, a TN occurs when both the prediction outcome and real value are negative, and a FN is assigned when the prediction outcome is negative and the real value is positive. When the TPR approaches 1 and FPR approaches 0, the prediction performance is excellent; in the ROC space composed of TPR and FPR, the distribution of values in the upper left shows superior prediction performance. We also performed F1 score analysis to investigate how the HWT prediction algorithm responded to changes in the prediction interval. In statistical analyses, the F1 score represents the harmonic average between precision and sensitivity, as follows [48]:

Method of SST Prediction Performance Evaluation
To evaluate SST prediction performance, the predicted SST of the trained LSTM model was compared with actual SST data using the coefficient of determination (R 2 ), root mean square error (RMSE), and mean absolute percentage error (MAPE). R 2 , RMSE, and MAPE were calculated as follows [49]: where SST predicted and SST real are the predicted SSTs from the trained LSTM model and ERA5 SST data, and n is the number of samples. Figure 7 shows the structure of the actual LSTM network used in the experiments to predict the SST after m days. B, N, and n represent the numbers of batch sizes, neurons, and inputs, respectively. Datasets 1 through m are external meteorological data to be reflected during training. To prevent confusion of symbols, m used for "after m days" is in italics, and m is used to mean the number of datasets. For example, if only SST data are used in training, then dataset 1 becomes SST data, and if SST, wind, and air-temperature data are used, then dataset 1 becomes SST data, dataset 2 becomes wind data, and dataset 3 becomes air-temperature data. As such, the LSTM network was designed as a structure to which desired meteorological data can be continuously added. Each LSTM cell generates an output vector having the form (B, N). However, the output required is a vector having the form (B, 1). To convert the shape of the output, we used the following procedure.

LSTM Network for Experiments
(1) The output vectors of form (B, N) of each LSTM cell with n inputs were stacked in the vector having the form (B × n, N).
Remote Sens. 2020, 12, 3654 9 of 21 (2) A fully connected layer with one unit was applied to reshape the vector having the form (B × n, N) into a vector having the form (B × n, 1; because the fully connected layer is only a layer for dimension reduction, the activation function is not used). (3) The vector is reduced from the form (B × n, 1) into n outputs. Now the output is a vector of form (B × 1).
By performing the steps above, we could obtain the predicted SST value.
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 22 1) The output vectors of form (B, N) of each LSTM cell with n inputs were stacked in the vector having the form (B × , N). 2) A fully connected layer with one unit was applied to reshape the vector having the form (B × , N) into a vector having the form (B × , 1; because the fully connected layer is only a layer for dimension reduction, the activation function is not used).
3) The vector is reduced from the form (B × , 1) into n outputs. Now the output is a vector of form (B × 1).
By performing the steps above, we could obtain the predicted SST value.

Parameter Values for the Simulation
For convenience, the target areas for predicting SSTs from 1 day to days is defined as shown in Figure 8a

Parameter Values for the Simulation
For convenience, the target areas for predicting SSTs from 1 day to m days is defined as shown in Figure 8a-e; the parameter values and training data of the LSTM network used in the simulation are presented in Table 1. 1) The output vectors of form (B, N) of each LSTM cell with n inputs were stacked in the vector having the form (B × , N). 2) A fully connected layer with one unit was applied to reshape the vector having the form (B × , N) into a vector having the form (B × , 1; because the fully connected layer is only a layer for dimension reduction, the activation function is not used).
3) The vector is reduced from the form (B × , 1) into n outputs. Now the output is a vector of form (B × 1).
By performing the steps above, we could obtain the predicted SST value.

Parameter Values for the Simulation
For convenience, the target areas for predicting SSTs from 1 day to days is defined as shown in Figure 8a-e; the parameter values and training data of the LSTM network used in the simulation are presented in Table 1.    Figure 8 shows the target area along the coast from Goheung to Yeosu in Jeollanam-do, which experienced a high occurrence of HWTs. The target area was divided into five areas, designated as a-e in the figure, corresponding to a latitude of 34.45 • and longitude spanning from 127.3 to 128.3 • . The color shown for each area corresponds to the number of HWT occurrences in 2018, as given in Figure 1.
The simulation was conducted using only one type of training data (SST) and three types of data (SST, wind, and air temperature); the latter case is referred to as the multi dataset. In session 2 said that HWTs in Korea are caused by strong Tsushima warm current, typhoons that were not concentrated in July and August, and heat wave caused by the heat-dome. Therefore, air temperature and wind that can be used as indicators of heat wave and typhoon were added to the multi dataset.
We used data collected during the past 10 years (2008-2017) to predict SSTs in 2018. The SST dataset (m = 1) contained 3650 input data and the multi dataset (m = 3) contained 3650 × 3 input data. However, both datasets resulted in 3650 output data because daily SSTs were the only output. Table 1 shows the parameter values and the I/O datasets for training the LSTM network used in the simulation. To perform the simulation, the number of batch sizes B, the number of neurons N, and the number of inputs n were selected as 30, 100, and 30, respectively. The cost function, which is an indicator of the performance of the model, was based on the mean square error. Minimizing the cost function means that the error between the LSTM output and the correct answer is small. The Adam optimizer was used as the optimization function to minimize the cost function.
As a result of experiments as the number of neurons and training data were varied, it was determined that 100 neurons and 3650 training data were appropriate. For better understanding, the sensitivity of the model to various experiments is shown in Appendix A's Table A1.
To predict the SSTs in the five areas shown in Figure 8a-e, the LSTM was trained using 10 years of data for the individual areas, thus effectively creating a trained LSTM model for that particular area. The trained LSTM model was then used to predict the SSTs in 2018. Since the SST prediction year was 2018, 335 data were needed (30 input data were required as the initial input for the LSTM model).

Results Obtained Using only SST Input Data
In this simulation, SSTs were predicted using only the widely used SST dataset for LSMT training. The simulation was conducted for area (a). To calculate R 2 , RMSE, and MAPE, the number of samples n was set to 335; only the data between 31 and 365 in 2018 were compared, in which the head data were eliminated for improved accuracy. The input of the trained LSTM model required SST data from 30 samples; thus, the SST data from these 30 samples were eliminated from the data head (i.e., used data elimination as discussed earlier). Figure 9 shows a comparison of real and predicted SSTs produced using only the SST dataset as input for area (a) in 2018; scatter plots were used to evaluate the performance of the trained LSTM model. Simulation results revealed a similar trend between the real and predicted SST data for SST prediction after 1 day. The results of the scatter plot analysis also showed that the LSTM model using only the SST dataset as input was very accurate. In the SST prediction after 1 day, the R 2 , RMSE, and MAPE values between the predicted SST and real data were 0.9936, 0.5076, and 2.2577, respectively. However, looking at the SST prediction after 7 days, the overall trend showed a lower accuracy than the prediction after 1 day, with R 2 , RMSE, and MAPE values (0.959, 1.3238, and 5.5454, respectively) between the predicted SST and real data. Figure 9 shows a comparison of real and predicted SSTs produced using only the SST dataset as input for area (a) in 2018; scatter plots were used to evaluate the performance of the trained LSTM model. Simulation results revealed a similar trend between the real and predicted SST data for SST prediction after 1 day. The results of the scatter plot analysis also showed that the LSTM model using only the SST dataset as input was very accurate. In the SST prediction after 1 day, the R 2 , RMSE, and MAPE values between the predicted SST and real data were 0.9936, 0.5076, and 2.2577, respectively. However, looking at the SST prediction after 7 days, the overall trend showed a lower accuracy than the prediction after 1 day, with R 2 , RMSE, and MAPE values (0.959, 1.3238, and 5.5454, respectively) between the predicted SST and real data. Figure 9. Comparison of predicted and real SST data, with scatter diagrams, for 1-day and 7-days prediction intervals using the SST dataset as input.

Results Obtained Using the Multi Dataset as Input
When only SST data were used as input, the SST prediction performance of the LSTM network rapidly deteriorated as the prediction interval increased. This result occurred because external meteorological data, which have a large influence on SSTs, were not used as input. To confirm that external meteorological data can improve SST prediction performance, we predicted SSTs using the multi dataset composed of SST, wind and air temperature datasets. The reanalysis dataset of ERA5 on the wind and air temperature were obtained in the same way as in Section 2.2. Figure 10 compares real and predicted SST results produced using the multi dataset as input for area (a) in 2018. Scatter plots were used to evaluate the performance of the trained LSTM model. After 1 day, SST predictions did not differ significantly between the two cases because the prediction interval was very short. However, after 7 days, the overall trend produced using the multi dataset was much more faithful to the observation data, with less variation than shown in the result produced using only the SST input dataset. Thus, the external meteorological data had a great influence on SST prediction performance. The R 2 , RMSE, and MAPE values between the predicted and real SST were 0.9832, 0.8242, and 4.1703, respectively, which indicated that the multi dataset improved our SST predictions. Figure 9. Comparison of predicted and real SST data, with scatter diagrams, for 1-day and 7-days prediction intervals using the SST dataset as input.

Results Obtained Using the Multi Dataset as Input
When only SST data were used as input, the SST prediction performance of the LSTM network rapidly deteriorated as the prediction interval increased. This result occurred because external meteorological data, which have a large influence on SSTs, were not used as input. To confirm that external meteorological data can improve SST prediction performance, we predicted SSTs using the multi dataset composed of SST, wind and air temperature datasets. The reanalysis dataset of ERA5 on the wind and air temperature were obtained in the same way as in Section 2.2. Figure 10 compares real and predicted SST results produced using the multi dataset as input for area (a) in 2018. Scatter plots were used to evaluate the performance of the trained LSTM model. After 1 day, SST predictions did not differ significantly between the two cases because the prediction interval was very short. However, after 7 days, the overall trend produced using the multi dataset was much more faithful to the observation data, with less variation than shown in the result produced using only the SST input dataset. Thus, the external meteorological data had a great influence on SST prediction performance. The R 2 , RMSE, and MAPE values between the predicted and real SST were 0.9832, 0.8242, and 4.1703, respectively, which indicated that the multi dataset improved our SST predictions. Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 22 Figure 10. Comparison of predicted and real SST data, with scatter diagrams, for 1-day and 7-days prediction intervals using the multi dataset as input. Figure 11 shows a comparison of R 2 , RMSE, and MAPE values between LSTM results produced using the SST and multi datasets as input, for different prediction intervals. For both input cases, the prediction performance of the LSTM model decreased as the prediction interval increased. However, the multi dataset led to better performance than the SST dataset in terms of R 2 , RMSE, and MAPE. Thus, to accurately predict SSTs, SST and external meteorological data (wind and air temperature) are important factors.

Comparison of Performance Between SST and Multi Dataset Inputs
Both models had similar performances in predicting SST after 1 day, therefore the SST dataset may be used for predicting 1 day. When using the SST dataset, the number of input data is three times less than that of the multi dataset, which has the advantage of shortening the training time. However, the processing (prediction) time is similar because both trained models were used to predict the SST. Therefore, it would be desirable to use the multi dataset with superior prediction performance as the prediction interval increases. Figure 10. Comparison of predicted and real SST data, with scatter diagrams, for 1-day and 7-days prediction intervals using the multi dataset as input. Figure 11 shows a comparison of R 2 , RMSE, and MAPE values between LSTM results produced using the SST and multi datasets as input, for different prediction intervals. For both input cases, the prediction performance of the LSTM model decreased as the prediction interval increased. However, the multi dataset led to better performance than the SST dataset in terms of R 2 , RMSE, and MAPE. Thus, to accurately predict SSTs, SST and external meteorological data (wind and air temperature) are important factors.

Comparison of Performance Between SST and Multi Dataset Inputs
Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 22 Figure 11. Comparison of coefficient of determination (R 2 ), root mean square error (RMSE), and mean absolute percentage error (MAPE) values between LSTM results produced using the SST and multi datasets as input, for different prediction intervals Figure 12 shows the ROC space and plots of HWT occurrence predictions from 1 to 7 days using the TPR and FPR. For both input cases, the calculated values plotted in the upper left corner, which indicates that the proposed algorithm was performed correctly, with the multi-dataset case showing better performance. Model performance was particularly good in terms of the FPR because Figure 11. Comparison of coefficient of determination (R 2 ), root mean square error (RMSE), and mean absolute percentage error (MAPE) values between LSTM results produced using the SST and multi datasets as input, for different prediction intervals Both models had similar performances in predicting SST after 1 day, therefore the SST dataset may be used for predicting 1 day. When using the SST dataset, the number of input data is three times less than that of the multi dataset, which has the advantage of shortening the training time. However, the processing (prediction) time is similar because both trained models were used to predict the SST. Therefore, it would be desirable to use the multi dataset with superior prediction performance as the prediction interval increases. Figure 12 shows the ROC space and plots of HWT occurrence predictions from 1 to 7 days using the TPR and FPR. For both input cases, the calculated values plotted in the upper left corner, which indicates that the proposed algorithm was performed correctly, with the multi-dataset case showing better performance. Model performance was particularly good in terms of the FPR because SST exceeded 28 • C during the summer months of the study year. TN is the value when the real and predicted SSTs do not exceed 28 • C. Therefore, it is necessary to focus on the TPR with respect to HWT prediction. Figure 11. Comparison of coefficient of determination (R 2 ), root mean square error (RMSE), and mean absolute percentage error (MAPE) values between LSTM results produced using the SST and multi datasets as input, for different prediction intervals Figure 12 shows the ROC space and plots of HWT occurrence predictions from 1 to 7 days using the TPR and FPR. For both input cases, the calculated values plotted in the upper left corner, which indicates that the proposed algorithm was performed correctly, with the multi-dataset case showing better performance. Model performance was particularly good in terms of the FPR because SST exceeded 28 °C during the summer months of the study year. TN is the value when the real and predicted SSTs do not exceed 28 °C. Therefore, it is necessary to focus on the TPR with respect to HWT prediction. F1 score is a statistical analysis method that evaluates the performance of the model when the data has an unbalanced structure. The frequency of occurrence of HWTs assumed in this study corresponds to this structure because it is concentrated in the summer of a year. Therefore, the performance of the proposed model was evaluated through the F1 score using the harmonic average rather the arithmetic average between precision and sensitivity. Figure 13 shows the comparison of F1 scores obtained using the two input datasets. For the HWT occurrence results obtained using only the SST dataset, the F1 score decreased from 0.8 to 0.6 as the prediction interval increased. However, for the multi-dataset case, even when the prediction interval increased, the F1 score remained constant at about 0.8. In terms of F1 score, it can be seen that the model using multi dataset had better performance in predicting HWTs than the case using the SST dataset.
Additionally, in order to confirm that the model proposed in this study could be applied not only to the target area but also to other areas, an additional experiment was conducted on the sea area near Jeju where HWT occurs frequently. The performance evaluation for this area is shown in Appendix A's Figure A2. F1 score is a statistical analysis method that evaluates the performance of the model when the data has an unbalanced structure. The frequency of occurrence of HWTs assumed in this study corresponds to this structure because it is concentrated in the summer of a year. Therefore, the performance of the proposed model was evaluated through the F1 score using the harmonic average rather the arithmetic average between precision and sensitivity. Figure 13 shows the comparison of F1 scores obtained using the two input datasets. For the HWT occurrence results obtained using only the SST dataset, the F1 score decreased from 0.8 to 0.6 as the prediction interval increased. However, for the multi-dataset case, even when the prediction interval increased, the F1 score remained constant at about 0.8. In terms of F1 score, it can be seen that the model using multi dataset had better performance in predicting HWTs than the case using the SST dataset.
Additionally, in order to confirm that the model proposed in this study could be applied not only to the target area but also to other areas, an additional experiment was conducted on the sea area near Jeju where HWT occurs frequently. The performance evaluation for this area is shown in Appendix A's Figure A2.
Remote Sens. 2020, 12, x FOR PEER REVIEW 14 of 22 Figure 13. Comparison of F1 scores obtained using the two input datasets.

Performance Comparison with ECMWF Forecast Data
To verify the performance of the SST prediction model using the multi dataset, the results of the proposed model were compared with those produced using ECMWF forecast data. The ECMWF forecast data provide predicted SSTs every 6 h starting from midnight on the first of every month and spatial resolution is composed of 1°. For accurate comparative analysis, we used noon forecast data (to match our output data) for the region located at latitude of 34° and longitude of 127° closest to area (a). Since the SST prediction model proposed in this study requires 30 days of data as an initial input, January was excluded from the comparison. For area (a), the HWT intensively occurred 18 times in August 2018. Therefore, in Figure 14, the performance of the proposed model and ECMWF forecast data was compared and analyzed by increasing the prediction interval from 7 to 20 days. Additionally, for the remaining months where HWT was not concentrated, the prediction interval was set to 7 days, and its performance was represented in Appendix A's Figure A3.

Performance Comparison with ECMWF Forecast Data
To verify the performance of the SST prediction model using the multi dataset, the results of the proposed model were compared with those produced using ECMWF forecast data. The ECMWF forecast data provide predicted SSTs every 6 h starting from midnight on the first of every month and spatial resolution is composed of 1 • . For accurate comparative analysis, we used noon forecast data (to match our output data) for the region located at latitude of 34 • and longitude of 127 • closest to area (a). Since the SST prediction model proposed in this study requires 30 days of data as an initial input, January was excluded from the comparison. For area (a), the HWT intensively occurred 18 times in August 2018. Therefore, in Figure 14, the performance of the proposed model and ECMWF forecast data was compared and analyzed by increasing the prediction interval from 7 to 20 days. Additionally, for the remaining months where HWT was not concentrated, the prediction interval was set to 7 days, and its performance was represented in Appendix A's Figure A3.  Figure 13. Comparison of F1 scores obtained using the two input datasets.

Performance Comparison with ECMWF Forecast Data
To verify the performance of the SST prediction model using the multi dataset, the results of the proposed model were compared with those produced using ECMWF forecast data. The ECMWF forecast data provide predicted SSTs every 6 h starting from midnight on the first of every month and spatial resolution is composed of 1°. For accurate comparative analysis, we used noon forecast data (to match our output data) for the region located at latitude of 34° and longitude of 127° closest to area (a). Since the SST prediction model proposed in this study requires 30 days of data as an initial input, January was excluded from the comparison. For area (a), the HWT intensively occurred 18 times in August 2018. Therefore, in Figure 14, the performance of the proposed model and ECMWF forecast data was compared and analyzed by increasing the prediction interval from 7 to 20 days. Additionally, for the remaining months where HWT was not concentrated, the prediction interval was set to 7 days, and its performance was represented in Appendix A's Figure A3.    Figure 14 shows the predicted SST results for the first to twentieth days of August 2018 obtained using the proposed model with multi-dataset input and ECMWF forecast data. SST predictions for August 2018 demonstrate that the proposed model showed superior performance to ECMWF forecast data. The ECMWF forecast data failed to predict HWTs, whereas the proposed model predicted HWT in most prediction interval except for blue circle 1 and green circle 2 and 3. Blue circle 1 indicates that the HWT prediction failed, and green circle 2 and 3 indicate that HWT was determined even though HWT did not occur. These results indicate that the proposed model will be helpful for SST or HWT predictions in the field.

Conclusions
In this study, we proposed a deep-learning-based SST and HWT prediction methodology to prevent mass mortality of aquaculture fish caused by HWTs. To predict SSTs and HWTs, we applied an RNN-based LSTM model for the coastal area between Goheung to Yeosu, Jeollanam-do, Korea, which suffered great economic damage due to HWTs in 2018. To train the SST prediction model, we used daily noon SST data provided by the ECMWF.
Previous studies have predicted SSTs by training deep-learning-based prediction models using only large SST datasets. However, these methods have limited accuracy because they do not take external meteorological factors into account. Therefore, in this study, we added external meteorological data (wind and air temperature) as input for the SST prediction model. The SST prediction model was designed to predict the SST after 1 day; to obtain predictions over a period of m days, this result was placed at the end of the next input, and the method was repeated.
First, the LSTM model was trained using a 10-year SST dataset for the study area. The model's 1 day SST prediction performance was excellent, but decreased as the prediction interval increased because predicted SSTs formed the basis of each subsequent day's predictions. Prediction of HWT occurrence was also evaluated based on TPR, FPR, and F1 score values. As the prediction interval increased, the prediction performance decreased overall.
To compensate for the decreases in the SST and HWT prediction performance as the prediction interval increased, we changed the training dataset. In the initial experiment, we used a training dataset composed of SST data only; we subsequently created a multi dataset composed of wind, air-temperature, and SST data. Our results obtained using the multi dataset were superior to those obtained using the SST training dataset, in area (a) as well as in other areas (b-e), according to R 2 , RMSE, and MAPE values. Model performance in each study area (b-e) is described in Appendix A's Figure A1. HWT was also better predicted using the multi dataset than using the SST dataset as input because wind and air temperature play important roles in contributing to SSTs. Further analyses of the impact of wind and air temperature input data on model training would improve the accuracy of SST prediction.
The deep-learning-based SST prediction methodology proposed in this paper shows very accurate performance when the prediction interval is short but deteriorates as the prediction interval increases. For this reason, it is possible to predict SSTs over a short period in real environments; however, the method is limited in its long-term forecasting ability. In this study, we used a multi dataset consisting of a 3D array as input for the LSTM model to predict the SST after m days. In a future study, we intend to train the LSTM model using multi-dimensional arrays (m ≥ 3) that include ocean currents, air pressure, heat flux, and salinity data in addition to wind and air-temperature data, which directly affect SST. We also plan to analyze how each type of meteorological data affects SST prediction performance to create an optimal training dataset for the model. These modifications are expected to improve the accuracy of the generated prediction model to improve prediction performance for m days.
We have a plan to contribute to the prevention of damage to the Korean aquaculture industry by recommending the SST and HWT prediction model using the multi dataset proposed in this study to Korean Ministry of Maritime Affairs and Fisheries and related organizations.   Table A is the result of comparing the SST and HWT prediction performance according to the changes in the number of neurons and the number of training data. Except for the red background with the smallest number of neurons and training data, it can be seen that the similar accuracy in terms of SST prediction performance. However, in this study, not only SST prediction but also HWT prediction was one of the main purposes, therefore we tried to select the numbers of neurons and training data that satisfied both SST and HWT prediction performances.
As a result of experiments for 9 cases, when the number of neurons was small (gray background), the F1 score values indicating HWT prediction performance tended to be low. In addition, it was confirmed that the F1 score value was also low when there were a lot of past data not related to the current pattern (blue box) and when a sufficient amount of training data was not included (green box). Therefore, in this study, considering the above three cases, among red box As a result of experiments for 9 cases, when the number of neurons was small (gray background), the F1 score values indicating HWT prediction performance tended to be low. In addition, it was confirmed that the F1 score value was also low when there were a lot of past data not related to the current pattern (blue box) and when a sufficient amount of training data was not included (green box). Therefore, in this study, considering the above three cases, among red box with similar HWT and SST prediction performance, we selected a blue background with 100 neurons and 10 year's training data. Figure A2 is the result of evaluating HWT and SST prediction performance by applying the proposed model (multi-dataset) for additional test area. Figure A2a shows the sea area near Jeju where HWT is generated frequently, and the area selected for an additional test is marked with a black box. Figure A2c shows the value of R 2 , RMSE, and MAPE according to the prediction interval. It can be seen that the performance was degraded as the prediction interval increases but the proposed model's performance was similar to that of Appendix B. In Figure A2b, it be seen that the F1 score also remained constant at about 0.8 as the prediction interval increased. Through these results, it was confirmed that the model proposed in this study had the ability to predict HWT not only in the target area but also in other areas.  Figure A2 is the result of evaluating HWT and SST prediction performance by applying the proposed model (multi-dataset) for additional test area. Figure A2(a) shows the sea area near Jeju where HWT is generated frequently, and the area selected for an additional test is marked with a black box. Figure A2(c) shows the value of R 2 , RMSE, and MAPE according to the prediction interval. It can be seen that the performance was degraded as the prediction interval increases but the proposed model's performance was similar to that of Appendix B. In Figure A2(b), it be seen that the F1 score also remained constant at about 0.8 as the prediction interval increased. Through these results, it was confirmed that the model proposed in this study had the ability to predict HWT not only in the target area but also in other areas.  Figure A3 shows the predicted SST results for the first to seventh day of each month (excluding January) of 2018 obtained using the proposed model with the multi-dataset input and ECMWF forecast data. SST predictions for February to March, June to October, and December demonstrated that the proposed model showed excellent performance. However, in other months, the performance of the proposed model was similar to or slightly inferior to that of ECMWF forecast data. Even when the performance of the proposed model was poor, the difference was as little as about +0.5 • C. From February to December, the difference between the real SST and ECMWF forecast data was 1.9 • C in average, and the difference from the proposed model (with multi dataset) was 0.45 • C on average. These results show that the accuracy of SST prediction was about 4 times better than the ECMWF forecast data. Figure A3. Comparison of SST prediction performance for August between the proposed model (with multi-dataset input) and European Center for Medium-Range Weather Forecast (ECMWF) forecast data. Figure A3 shows the predicted SST results for the first to seventh day of each month (excluding January) of 2018 obtained using the proposed model with the multi-dataset input and ECMWF forecast data. SST predictions for February to March, June to October, and December demonstrated that the proposed model showed excellent performance. However, in other months, the performance of the proposed model was similar to or slightly inferior to that of ECMWF forecast data. Even when the performance of the proposed model was poor, the difference was as little as about +0.5 °C. From February to December, the difference between the real SST and ECMWF forecast data was 1.9 °C in average, and the difference from the proposed model (with multi dataset) was 0.45 °C on average. These results show that the accuracy of SST prediction was about 4 times better than the ECMWF forecast data. Figure A3. Comparison of SST prediction performance for August between the proposed model (with multi-dataset input) and European Center for Medium-Range Weather Forecast (ECMWF) forecast data.