Prediction of Changeable Eddy Structures around Luzon Strait Using an Artificial Neural Network Model

Mesoscale eddies occur frequently in the Luzon Strait and its adjacent area, and accurate prediction of eddy structure changes is of great significance. In recent years, artificial neural network (ANN) has been widely applied in the study of physical oceanography with the continuous accumulation of satellite remote sensing data. This study adopted an ANN approach to predict the evolution of eddies around the Luzon Strait, based on 25 years of sea level anomaly (SLA) data, 85% of which are used for training and the remaining 15% are reserved for testing. The original SLA data were firstly decomposed into spatial modes (EOFs) and time-dependent principal components (PCs) by the empirical orthogonal function (EOF) analysis. In order to calculate faster and save costs, only the first 35 PCs were selected as predictors, whereas their variance contribution rate reached 96%. The results of predicted reconstruction indicated that the neural network-based model can reliably predict eddy structure evaluations for about 15 days. Importantly, the position and variation of four typical eddy events were reconstructed, and included a cyclone eddy event, an eddy shedding event, an anticyclone eddy event, and an abnormal anticyclone eddy event.


Introduction
The Luzon Strait is located between Taiwan Island and Luzon Island in the Philippines, an important channel connecting the South China Sea (SCS) and the Northwest Pacific, where water exchange, energy and material transport occurs ( Figure 1). The Luzon Strait is a significant area with many mesoscale eddy activities under the influence of complex topography, seasonal monsoon and Kuroshio intrusion, and the circulation around the Luzon Strait has obvious seasonal variation [1][2][3][4]. The general circulation pattern around the Luzon Strait is predominantly anticyclonic in summer, and is mainly cyclonic in winter [5,6]. Circulation variation and the distribution of nutrients and energy around Luzon Strait is significantly impacted by eddies [7,8]. Therefore, prediction of the changeable eddy structure around the Luzon Strait is greatly significant to military activities, production, and environmental protection in the SCS.
Extensive studies have been conducted on the eddy activity around the Luzon Strait based on different research methods, and many mesoscale eddies have been discovered [9][10][11]. In 2015, He et al. found that there are two cyclonic eddies in the north-western part of Luzon Island based on a variable gridded global ocean circulation model [10]. The upper layer cyclonic eddy is mainly caused by wind stress curl, but it is weakened by Kuroshio intrusion [10]. In 2002, Su et al. found an anticyclonic eddy in south-western Taiwan between October and November 1993 based on satellite-tracked drifting buoys [12]. In 1998, Li et al. showed that an anticyclonic eddy appeared in the north-eastern SCS from August to September 1994 by in situ observation data, and believed that the anticyclonic The Kuroshio is a typical western boundary current with high-temperatur salinity in the Northwest Pacific, which originates from the North Equator (NEC). When the Kuroshio intrudes into the SCS, the water mass, momentum are transported to the SCS, affecting the temperature, salinity, and circulation [14][15][16][17]. In the last few decades, extensive research has been conducted on th intrusion in the SCS via the Luzon Strait based on different methods. Among merical simulation [14,18,19], in situ observation [20,21] and satellite remote s are three traditional methods used to study the Kuroshio intrusion in the SCS acteristics of the Kuroshio intrusion in the SCS are depicted based on these me Kuroshio intrusion in the SCS has obvious seasonal variation, the intrusion is winter and weaker in summer [1,23]. In addition to seasonal variation, the K trusion also has inter-annual variability. However, due to the limited observa it is difficult to study the long-term variation of the Kuroshio [24,25].
Many studies have also been conducted on the variation in the spatial pat roshio intrusion in the SCS, though there is no unified final verdict. At presen three main paths of the Kuroshio, i.e., the leaping path, the leaking path, and t path [14,16,17]. The leaping path shows that Kuroshio bends clockwise to the L rather than directly intruding into the SCS [14]. The leaking path shows that the Kuroshio directly intrudes into the SCS and penetrates into the SCS basin 2006, Yuan et al. used satellite sea color, sea surface temperature, and altime The Kuroshio is a typical western boundary current with high-temperature and highsalinity in the Northwest Pacific, which originates from the North Equatorial Current (NEC). When the Kuroshio intrudes into the SCS, the water mass, momentum, and heat are transported to the SCS, affecting the temperature, salinity, and circulation in the SCS [14][15][16][17].
In the last few decades, extensive research has been conducted on the Kuroshio intrusion in the SCS via the Luzon Strait based on different methods. Among them, numerical simulation [14,18,19], in situ observation [20,21] and satellite remote sensing [22] are three traditional methods used to study the Kuroshio intrusion in the SCS. The characteristics of the Kuroshio intrusion in the SCS are depicted based on these methods. The Kuroshio intrusion in the SCS has obvious seasonal variation, the intrusion is stronger in winter and weaker in summer [1,23]. In addition to seasonal variation, the Kuroshio intrusion also has inter-annual variability. However, due to the limited observational data, it is difficult to study the long-term variation of the Kuroshio [24,25].
Many studies have also been conducted on the variation in the spatial pattern of Kuroshio intrusion in the SCS, though there is no unified final verdict. At present, there are three main paths of the Kuroshio, i.e., the leaping path, the leaking path, and the looping path [14,16,17]. The leaping path shows that Kuroshio bends clockwise to the Luzon Strait rather than directly intruding into the SCS [14]. The leaking path shows that a branch of the Kuroshio directly intrudes into the SCS and penetrates into the SCS basin [26,27]. In 2006, Yuan et al. used satellite sea color, sea surface temperature, and altimeter data to show that the direct path (leaking path) is the main form of surface Kuroshio intrusion in the Luzon Strait in winter [13]. In 2008, Liang et al. confirmed the existence of the SCS branch of the Kuroshio via in situ observations [28]. The looping path appears as an anticyclonic current, Remote Sens. 2022, 14, 281 3 of 23 which enters the SCS from the southern part of the Luzon Strait and returns to the Pacific Ocean from its northern part [25]. Generally, the looping path is accompanied by eddy shedding events and produces an anticyclonic eddy [25,29,30]. In 2021, Sun et al. used multiple remote sensing datasets to analyze the Kuroshio intrusion in the SCS, indicating that a strong Kuroshio loop current and accompanying anticyclonic eddy existed in winter 2020-2021 [31].
In recent years, some deep learning models have attracted the attention of scholars with the rise of big data. In 2015, Zeng et al. adopted artificial neural networks (ANN) to predict the loop current variation in the Gulf of Mexico and obtained good prediction results [32]. In 2019, Wang et al. used the divide-and-conquer machine learning method to predict the loop current system SSH evolution in the Gulf of Mexico 9 weeks ahead [33]. These studies have made it possible for us to predict the changeable eddy structure around the Luzon Strait based on deep learning. In this paper, the changeable eddy structure in the Luzon Strait was predicted based on ANN. Evaluation methods, such as correlation coefficient (CC), root-mean-square-error (RMSE) and prediction skill score (SS) were applied to assess the model's ability to predict the changeable eddy structure around the Luzon Strait.
The structure of the paper is as follows. Section 2 applies EOF analysis to process SLA data, and then adopts ANN to predict the processed data. Section 3 presents the results and analysis. The last section presents a discussion and a summary.

Datasets
The gridded altimeter-based SLA data from 1 January 1993 to 31 December 2017 around the Luzon Strait (115 • E-125 • E, 15 • N-25 • N) were used to study the SLA data provided by the French Archiving, Validation and Interpretation of Satellite Oceanographic Data (AVISO; Available online: http://www.aviso.altimetry.fr/en/home.html/ (accessed on 2 August 2021)). The merged data comes from the combination of Topex/Poseidon, Jason-1, OSTM/Jason-2, Jason-3. The initial SLA data are in the form of a time-dependent three-dimensional matrix, which is a time series composed of two-dimensional matrices. The latitude and longitude of the study area are represented by the rows and columns of the two-dimensional matrix, respectively. The time span of SLA data is 9129 days, the spatial span is 10 • × 10 • . The spatial resolution and temporal resolution of the data are 0.25 • and 1 day, respectively. Therefore, the initial data are composed of 9129 two-dimensional matrices of 40 × 40.
In order to quantify control, only the data deeper than 200 m were selected for research. The ETOP01, released by the National Oceanic and Atmospheric Administration (NOAA) National Geophysical Data Center, was interpolated to the SSH grid (Available online: https://www.ngdc.noaa.gov/mgg/global/global.html (accessed on on 6 September 2021)). The resolution of ETOP01 data is 1 . The geostrophic velocities used to show variation of geostrophic current were collected from AVISO. The tracks of the typhoon were obtained from the Tropical Cyclone Data Center of China Meteorological Administration (Available online: http://tcdata.typhoon.org.cn (accessed on 6 September 2021)) [34,35].

EOF Analysis
EOF analysis is a multivariate data processing method, which can convert multiple indicators into a few indicators [36]. It is widely applied in various fields, such as oceanography and meteorology. The wide application of EOF analysis benefits from an important feature: the ability to decompose the original data into a series of orthogonal vectors which are independent form each other, so that the initial data can be reconstructed by part of the vector [37]. That is to say, EOF analysis extracts the main information in the data and reconstructs the initial data with the main information. It can greatly reduce the amount of processed data to realize the compression of the datasets under the premise of considering the maximum correlation. The realization of EOF analysis is mainly through singular value decomposition (SVD) [38].
Let S be an m × n matrix with rows and columns representing the time series and spatial data points, respectively. EOF analysis is used to decompose the matrix S: where matrix P, Q are orthogonal matrices, and matrix Σ is diagonal matrix. Matrix PΣ is time-dependent principal component (PC), and matrix Q T is spatial mode (EOF), T is the transpose operation of the vector. The variance contribution rate of the kth mode (kth PC) can be obtained: where r ≤ min(m, n) is the rank of matrix S, λ i is the singular value of matrix S, and λ i (i = 1, 2, · · · , r) is arranged in descending order. The variance contribution rate of the PC can be used to determine the PC and the number of PCs used for data reconstruction. The larger the variance contribution rate is, the more important this PC is. The PC with the smaller variance contribution rate will be discarded [37]. According to Equation (2), the variance contribution rate is decreasing. Therefore, the first few PCs can be used to reconstruct the data, and the reconstructed data does not lose too much information.

ANN
ANN, a new type of large-scale computing system, can perform multi-factor analysis, which is derived from biological neural networks [39]. However, ANN does not simply model the operation of biological systems, which solves complex problems through known biological network functions [40]. In 1943, the artificial neuron was first introduced, and the earliest computational neural network was developed in 1960. Although neural networks appeared earlier, the development of ANN was relatively slow for a long time. Until the emergence of the back propagation algorithm (BP), ANN developed rapidly [41]. The goal of back propagation is to minimize the error between the output values and the actual values.
ANN is composed of interconnected neurons or nodes (the basic unit of ANN, see Figure 2) that receive input signals from external input or other neurons. Then, the input signals are weighted and passed to the activation function to obtain the output signal [42]. The activation function executed by nodes i is: where, neu i represents the output of nodes i, f i denotes the activation function. Generally, the activation function is nonlinear, such as Sigmoid, Heaviside and Rule functions. w ij is the weight between node i and node j, x j indicates input data, and θ i represents bias. The basic structure of ANN includes input layer, hidden layer and output layer. A fivelayer neural network is shown in Figure 3. Among them, the number of hidden layers needs to be determined according to specific problems. Generally, one hidden layer can make ANN approximate any nonlinear function at any desired precision. The hidden layer is to detect the characteristics of the data through the hidden nodes, so as to perform complex nonlinear mapping [42]. Fewer hidden nodes can avoid some over-fitting problems, but ANN may not be able to effectively learn data. The choice of the number of hidden nodes is a key and complex issue; the widely adopted method to determine the number of hidden nodes is to perform repeated experiments [42]. The detailed introduction to ANN refers to Jain et al. [40]. The basic structure of ANN includes input layer, hidden layer and output layer. A five-layer neural network is shown in Figure 3. Among them, the number of hidden layers needs to be determined according to specific problems. Generally, one hidden layer can make ANN approximate any nonlinear function at any desired precision. The hidden layer is to detect the characteristics of the data through the hidden nodes, so as to perform complex nonlinear mapping [42]. Fewer hidden nodes can avoid some over-fitting problems, but ANN may not be able to effectively learn data. The choice of the number of hidden nodes is a key and complex issue; the widely adopted method to determine the number of hidden nodes is to perform repeated experiments [42]. The detailed introduction to ANN refers to Jain et al. [40]. As a computing tool, ANN has significant advantages in control, forecasting, function approximation, optimization and so on [43]. Among them, forecasting is the most important function of ANN. Additionally, as a prediction method, ANN is widely used in various fields, such as medicine, water resources and oceanography. The back propagation neural network (BPNN) is the most common method. The back propagation neural network with three hidden layers is adopted in this paper and the number of nodes for each hidden layer is 10, 5, 1. The sigmoid and linear functions are chosen as active functions for three hidden layers, and sigmoid active function is defined as: The back propagation neural network is trained and tested with SLA PCs vectors. 85% of the datasets is used for training back propagation neural network, and the remaining 15% is used for testing. The initial learning rate of the model is set to 0.001, the momentum factor is 0.9, the number of iterations is 1000, and the number of intervals is 125.  The basic structure of ANN includes input layer, hidden layer and output layer. A five-layer neural network is shown in Figure 3. Among them, the number of hidden layers needs to be determined according to specific problems. Generally, one hidden layer can make ANN approximate any nonlinear function at any desired precision. The hidden layer is to detect the characteristics of the data through the hidden nodes, so as to perform complex nonlinear mapping [42]. Fewer hidden nodes can avoid some over-fitting problems, but ANN may not be able to effectively learn data. The choice of the number of hidden nodes is a key and complex issue; the widely adopted method to determine the number of hidden nodes is to perform repeated experiments [42]. The detailed introduction to ANN refers to Jain et al. [40]. As a computing tool, ANN has significant advantages in control, forecasting, function approximation, optimization and so on [43]. Among them, forecasting is the most important function of ANN. Additionally, as a prediction method, ANN is widely used in various fields, such as medicine, water resources and oceanography. The back propagation neural network (BPNN) is the most common method. The back propagation neural network with three hidden layers is adopted in this paper and the number of nodes for each hidden layer is 10, 5, 1. The sigmoid and linear functions are chosen as active functions for three hidden layers, and sigmoid active function is defined as: The back propagation neural network is trained and tested with SLA PCs vectors. 85% of the datasets is used for training back propagation neural network, and the remaining 15% is used for testing. The initial learning rate of the model is set to 0.001, the momentum factor is 0.9, the number of iterations is 1000, and the number of intervals is 125. As a computing tool, ANN has significant advantages in control, forecasting, function approximation, optimization and so on [43]. Among them, forecasting is the most important function of ANN. Additionally, as a prediction method, ANN is widely used in various fields, such as medicine, water resources and oceanography. The back propagation neural network (BPNN) is the most common method. The back propagation neural network with three hidden layers is adopted in this paper and the number of nodes for each hidden layer is 10, 5, 1. The sigmoid and linear functions are chosen as active functions for three hidden layers, and sigmoid active function is defined as: The back propagation neural network is trained and tested with SLA PCs vectors. 85% of the datasets is used for training back propagation neural network, and the remaining 15% is used for testing. The initial learning rate of the model is set to 0.001, the momentum factor is 0.9, the number of iterations is 1000, and the number of intervals is 125.

Forecasting Process
The SLA data around the Luzon Strait from 1 January 1993 to 31 December 2017 (25 years) is selected as the research data. Because the spatial span and time span of the SLA data are very large, it is difficult to directly predict SLA by ANN. In order to avoid the difficulty of forecasting caused by too large data dimension, the EOF analysis is used to compress the initial SLA data. The SLA data is decomposed to spatial modal (EOF) and time-dependent principal component (PC). The forecasting process is shown in Figure 4.
where, S is the SLA data, , ( 1, 2, , , 1, 2, , ) ij s i m j n  is the jth spatial SLA data point at time i, m represents the time index, and n represents the number of SLA spatial points.  From Table 1, it is clear that the first few main PCs are used to reconstruct and can make the reconstructed SLA data restore the information well. Therefore, the first k PCs can be used to reconstruct SLA data. The SLA data at time 1 m  can be reconstructed approximately: The initial data is a time-dependent three-dimensional matrix which cannot be decomposed directly, so it needs to be reduced in dimensionality. The initial 9129 two-dimensional 40 × 40 matrices expanded in rows (or columns). Then, the 9129 vectors are reorganized to form a two-dimensional 9129 × 1600 matrix S, in which rows represent the time series, and the columns represent the spatial data point. Using EOF analysis to decompose the matrix S can be expressed as: where, S is the SLA data, s i,j (i = 1, 2, · · · , m, j = 1, 2, · · · , n) is the jth spatial SLA data point at time i, m represents the time index, and n represents the number of SLA spatial points.
(p 1,i , p 2,i , · · · , p m,i ) T means the ith PC, T is the transpose operation of the vector. From Table 1, it is clear that the first few main PCs are used to reconstruct and can make the reconstructed SLA data restore the information well. Therefore, the first k PCs can be used to reconstruct SLA data. The SLA data at time m + 1 can be reconstructed approximately: It can be seen that if we predict the SLA data at time m + 1, we only need to predict the PCs at time m + 1.
The back propagation neural network with three hidden layers is used to predict the PCs. Above, it can be seen that the PCs decomposed by EOF analysis are independent of each other, so the value of the jth PC at time m + 1 can be obtained: where p m,j is the jth PC of the SLA at the mth record and l is time delay. The greater the time delay is, the longer the training time is. The jth PC of the SLA at the time m + 1 obtained by the ANN can be substituted into Equation (7) to predict p m+2,j (see Figure 5).

The Method of Evaluation
First, the prediction system is evaluated by calculating the correlation coefficient (the larger the correlation coefficient is, the higher the accuracy is) and RMSE (the smaller the correlation coefficient is, the higher the accuracy is) between the predicted and observed SLA values. The correlation coefficient is directly obtained through the program. The RMSE of each grid point at day n is defined as:

The Method of Evaluation
First, the prediction system is evaluated by calculating the correlation coefficient (the larger the correlation coefficient is, the higher the accuracy is) and RMSE (the smaller the correlation coefficient is, the higher the accuracy is) between the predicted and observed SLA values. The correlation coefficient is directly obtained through the program. The RMSE of each grid point at day n is defined as: where, m represents the total number of SLA data points in the study area, SLA Pi,n represents the ith predicted SLA data point at day n, and SLA Oi,n represents the ith observed SLA data point at day n.
Due to the slow variation in SLA, we made a persistence forecast [44]. Persistence represents that the observed SLA data at day 0 is selected as the predicted SLA data for each prediction window. In order to verify the accuracy of the prediction model, the prediction system is evaluated by comparing the RMSE of prediction and persistence. The RMSE between persisted and observed SLA of each grid point at day n is defined as: where, SLA O i,0 represents the ith observed SLA data point at week 0 for each prediction window. Finally, the skill score (the larger the skill score is, the higher the accuracy is) [45] is used to verify the model. The predicted skill score of the SLA at each grid point at day n is defined as: where, SLA O i,n represents the arithmetic mean of the observed SLA. The persisted skill score of the SLA at each grid point at day n is defined as: Figure 5 shows the correlation coefficient of the first PCs of the observed and predicted SCS SLA for different prediction days from 18 April 2014 to 31 December 2017. The predicted value is consistent with the observed value for the 1-day prediction, and the correlation coefficient reaches 1. It indicates that ANN can predict the first PCs values of the SLA almost exactly. The correlation coefficient decreases from 1.0000 to 0.9597 from day 1 to day 15, which indicates that the accuracy of the prediction decreases. According to the prediction method, the possible reason is that the error generated in each prediction step will propagate downward. Additionally, as the prediction time increases, error accumulation will occur. Although the correlation coefficient decreases with the increase in prediction time, the correlation coefficient can still reach 0.9597 at day 15. This indicates that the reliability of the prediction model is high.

SLA Prediction
The predicted SLA can be reconstructed by multiplying the predicted PCs and EOFs, as in Equation (6). Table 1 shows the time-averaged correlation coefficient and RMSE between the predicted and observed SCS SLA from 2 April 2014 to 31 December 2017 in different quantities of PCs. It can be obtained from Table 1 that the number of PCs are set to 7, 10, 14, 20, 35 and 40, the corresponding variance contribution rates are 75%, 80%, 85%, 90%, 95% and 96%, respectively. From Table 1, the correlation coefficient decreases and the RMSE increases with the increase in prediction time, which indicates that the accuracy of prediction is decreasing. The correlation coefficient increases and RMSE decreases with the increase in the number of PCs. However, for the prediction of 15 days, the correlation coefficient of 35 PCs is larger than that of 40 PCs, and RMSE of 35 PCs is smaller than that of 40 PCs. This shows that when the number of PCs increases to a certain extent, the performance of the model does not increase significantly or even maybe decreases. This is because PCs with a small variance contribution rate are noisier and more difficult to predict. From the prediction of 16 days to 30 days, even the number of PCs reaches 40, the correlation coefficient does not exceed 0.7. This indicates that the performance of the prediction model is not very good for long-term prediction.

SLA Evaluation
According to Equation (6) and Table 1, using the first 35 PCs to construct predicted SLA will have the largest correlation coefficient and the smallest RMSE at day 15 among the six cases ( Table 1). The large correlation coefficient and the small RMSE represent good prediction performance, so the first 35 PCs are selected to reconstruct SLA. Figure 6 shows the correlation coefficient and RMSE between the observed and predicted SLA in the study area from 18 April 2014 to 31 December 2017 with a 15-day ahead daily sliding prediction window using 35 PCs. From Figure 6, the correlation coefficients are all greater than 0.8 at day 1, day 3 and day 5. There are few time periods when correlation coefficients are lower than 0.8 at day 7 and day 9. The correlation coefficients oscillate around 0.8 at day 11. For the prediction of day 13 and day 15, correlation coefficients are lower than 0.8 at most of the time periods. The results show that the oscillation of correlation coefficient is becoming more and more obvious with the increase in prediction time. This reason may be the error generated in each prediction step will propagate downward. For example, there was obvious oscillation in September 2015, April 2016, and April 2017; the oscillation became more and more obvious with the increase in prediction time. The RMSEs are lower than 0.2 m overall, even they are lower than 0.1 m at day 1, day 3, day 5, and day 7. In summary, the accuracy of the forecast is decreasing with the increase in prediction time. Even if the performance of the model decreases with the increase in prediction time, all correlation coefficients of day 1, day 3, day 5, day 7 are not less than 0.7, which indicates that the eddy variation can be captured at least 7 days in advance. Additionally, some correlation coefficients are higher than 0.7 in some time periods at day 9, day 11, day 13, and day 15, indicating the eddy variation can be captured even 9 days, 11 days, 13 days, and 15 days ahead in some time periods.
In addition to the propagation and accumulation of errors, the oscillation of prediction performance may be caused by the non-linear variation of the eddy, and the temporary sea level variation is caused by a sudden change in atmospheric condition (such as typhoons). For example, the correlation coefficient is small in August 2015 at day 15 when Typhoon Soudelor and Goni pass through the study area.
To further quantify the prediction performance of the model, the SLA RMSEs for prediction and persistence from 18 April 2014 to 31 December 2017 are compared ( Figure 7). As the prediction time increases (from day 1 to day 15), the RMSEs of prediction and persistence are increasing. The growth rate in each prediction step is different, but the increase in prediction is slower than that of persistence at most time periods. When making short-term (no more than 3 days) forecast, the RMSEs of persistence are smaller than those of prediction. The small RMSE represents high accuracy. Therefore, the accuracy of persistence is higher than that of prediction for short-term forecast, which means the performance of persistence is better. However, the prediction model has obvious advantages when making a medium-term (higher than 3 days and no more than 15 days) forecast. The RMSEs of prediction are smaller than those of persistence for a medium-term prediction. It shows that the prediction outperforms the persistence for a medium-term forecast.
In addition, we also compared the averaged RMSE of SLA for prediction and persistence from 18 April 2014 to 31 December 2017 (Figure 8). Whether it is the average RMSEs of 15 days or 3 years, the average RMSEs of prediction are higher than those of persistence for short-term forecast, which means the accuracy of persistence is higher. It shows that the performance of persistence is better than that of prediction for short-term forecast. However, whether it is the average RMSE of 15 days or 3 years, the average RMSEs of persistence are higher than those of prediction for medium-term forecast, which means the accuracy of prediction is higher. This indicates that the performance of prediction is better than that of persistence. From Figure 8, the average RMSE of prediction and persistence from 1 day to 15 days gradually increases. At day 15, the 3-year average RMSE of prediction is 0.06 m, while the persistence one reaches 0.08 m. This indicates that skill of prediction is better than persistence. is higher than that of persistence, indicating that the performance of prediction is better than that of persistence. According to Figure 9, the skill score of persistence is lower than 0 in many time periods, which is caused by the accumulation of errors. Figure 10 shows the averaged skill scores of SLA for prediction and persistence from 18 April 2014 to 31 December 2017. As the time increases (from 1 day to 15 days), averaged skill scores of predictions and persistence gradually decrease. The 3-year average skill score of the prediction is 0.4764 at day 15, while the persistence one is only −0.1767. Whether it is the averaged skill score of 15 days or 3 years, the skill scores of predictions are lower than those of persistence for short-term forecasts, indicating that the accuracy of persistence is higher than prediction. However, the averaged skill scores of 15 days and 3-year of prediction are higher than those of persistence for medium-term forecasts, indicating that the performance of the prediction model is better than that of persistence. From (a-h) represent the predictions at day 1, day 3, day 5, day 7, day 9, day 11, day 13 and day 15, respectively. The red box is the part where the correlation coefficient oscillation greatly. Figure 6. The correlation coefficient and RMSE between the observed and predicted SLA in the study area from 18 April 2014 to 31 December 2017 with a 15-day ahead daily sliding prediction window using 35 PCs. The blue line represents correlation coefficient, and the orange line represents RMSE. From (a-h) represent the predictions at day 1, day 3, day 5, day 7, day 9, day 11, day 13 and day 15, respectively. The red box is the part where the correlation coefficient oscillation greatly.   Similarly, skill score is a way to evaluate prediction performance. Therefore, the skill scores of predictions and persistence are compared to evaluate the prediction model. Contrary to RMSE, the higher the skill score is, the higher the accuracy is and the better the performance is. Figure 9 shows skill scores comparison between the prediction and persistence from 18 April 2014 to 31 December 2017. As the prediction time increases (from 1 day to 15 days), the skill scores of predictions and persistence gradually decrease. While the descent rate is different in each prediction step. Generally, the skill score of prediction is higher than that of persistence, indicating that the performance of prediction is better than that of persistence. According to Figure 9, the skill score of persistence is lower than 0 in many time periods, which is caused by the accumulation of errors.    Figure 10 shows the averaged skill scores of SLA for prediction and persistence from 18 April 2014 to 31 December 2017. As the time increases (from 1 day to 15 days), averaged skill scores of predictions and persistence gradually decrease. The 3-year average skill score of the prediction is 0.4764 at day 15, while the persistence one is only −0.1767. Whether it is the averaged skill score of 15 days or 3 years, the skill scores of predictions are lower than those of persistence for short-term forecasts, indicating that the accuracy of persistence is higher than prediction. However, the averaged skill scores of 15 days and 3-year of prediction are higher than those of persistence for medium-term forecasts, indicating that the performance of the prediction model is better than that of persistence. Remote Sens. 2022, 14, x FOR PEER REVIEW 14 of 26

Four Eddy Event Examples
In order to further verify the accuracy of the prediction model, we made a 15-day prediction of four eddy events that occurred in the study area from 3 April 2014 to 31 December 2017 and compared those with the observed eddy events (see Figures 11-14). The four eddy events are: cyclone eddy event, eddy shedding event, anticyclone eddy event, and abnormal anticyclone eddy event, respectively. In fact, approximately 7 cyclonic eddy events, 7 anticyclonic eddy events and 2 eddy shedding events occurred in this area, but only 4 representative eddy events were selected for detailed introduction. The performance of the prediction model is verified by comparing the observed and predicted SLA for the eddy event. Due to the statistical characteristics of EOF analysis, the intensity of the predicted eddy is weaker than that of the observed eddy, which makes it very difficult to find an optimal contour to represent both the predicted and observed eddy boundaries. After numerous experiments, the contours of −0.13 m and 0.15 m are selected to represent the boundaries of the cyclone eddy and abnormal anticyclone eddy, respectively. The counters of 0.2 m and 0.165 m are chosen as the eddy edges for observation and prediction to demonstrate the ring of Kuroshio in Figure 12, respectively. The counters of 0.19 m and 0.17 m are chosen as the eddy edges for observation and prediction to show the anticyclone eddy in Figure 13, respectively.
To better understand the evolution of eddies, the variation in the geostrophic current around the Luzon Strait provided by long-term satellite observations is shown in Figures 11-14. In the following, the four eddy events and the variation of geostrophic current will be discussed in detail to verify the ability of the prediction model to track the evolution of eddies.     To better understand the evolution of eddies, the variation in the geostrophic current around the Luzon Strait provided by long-term satellite observations is shown in Figures  11-14. In the following, the four eddy events and the variation of geostrophic current will be discussed in detail to verify the ability of the prediction model to track the evolution of eddies. Figure 11 shows the comparison of observed and predicted SLA for a cyclone eddy event occurred in the study area from 27 December 2014 to 10 January 2015. According to the results from the observations, the main axis of the Kuroshio bent westward in the From Figures 11-14, it is clearly found that for the prediction of the four eddy events, the predicted value of SLA is lower than the observed value, which is meant in an absolute sense. That is, predicted SLA underestimates the intensity of the eddies and mesoscale fronts. As the prediction time increases, the difference between predicted and observed SLA becomes larger, and the difference mainly occurs in the area of eddy events. The possible reason is error propagation and the non-linear variation of the eddy. Although there are small differences between the predicted value and the observed value, the position and evolution of the eddy can still be captured. It shows that the performance of prediction model is good.

Discussions about the Influence of Extreme Weather to Prediction Model
From Figure 6, the performance of the prediction model is not very good in several time periods, such as May 2015, August 2015, April 2016, April 2017 and September 2017. The errors potentially come from prediction of the PCs by ANN and the construction process of the predicted SLA. In addition, the temporary sea level variation caused by sudden change of atmospheric condition (such as typhoons) may decrease the accuracy of prediction. Typhoons, a tropical cyclone in the eastern Pacific and Atlantic oceans, can cause strong winds, rainstorms, storm surges and huge wave heights [46,47]. When a typhoon occurs, the sea surface height will change drastically in a short time. This is a challenge for deep learning, because it cannot react quickly to rapid and short-lived changes. For example, when the Typhoon Soudelor and Goni passed the study area in August 2015, correlation coefficients were relatively lower than other periods at day 15 (see Figure 6). It can be seen from Figures 15 and 16 that the difference between the predicted and observed SLA becomes more and more obvious with the increase in prediction time, and the difference mainly occurs near the typhoon trajectory. This indicates that the prediction performance of the model will be reduced under the influence of extreme weather (such as typhoons).

Conclusions
SLA data obtained from satellite altimeter observations around the Luzon Strait from 1 January 1993 to 31 December 2017 were used for this study. A total of 85% of the data is used to train ANN for building a prediction model, and the remaining 15% is used to test the model. Because the dimension of data is too large, it is difficult and costly to directly predict by ANN. Therefore, the initial data was compressed by EOF analysis to reduce the data dimension.
A 15-day daily sliding prediction window was used to predict the SLA data from 3 April 2014 to 31 December 2017. Then, several methods were adopted to evaluate the performance of the model. First, the model was evaluated by calculating the spatial correlation coefficients and RMSEs between the predicted and observed SLA data. The results

Conclusions
SLA data obtained from satellite altimeter observations around the Luzon Strait from 1 January 1993 to 31 December 2017 were used for this study. A total of 85% of the data is used to train ANN for building a prediction model, and the remaining 15% is used to test the model. Because the dimension of data is too large, it is difficult and costly to directly predict by ANN. Therefore, the initial data was compressed by EOF analysis to reduce the data dimension.
A 15-day daily sliding prediction window was used to predict the SLA data from 3 April 2014 to 31 December 2017. Then, several methods were adopted to evaluate the performance of the model. First, the model was evaluated by calculating the spatial Remote Sens. 2022, 14, 281 21 of 23 correlation coefficients and RMSEs between the predicted and observed SLA data. The results showed that even if there are some performance oscillations in the later days of the prediction window, there are still higher correlation coefficients and lower RMSEs in most time periods. This indicated that the accuracy of prediction model is higher. In addition, the prediction system was evaluated by comparing the RMSE and skill score of prediction and persistence. The results show that for short-term forecast, the RMSE of persistence is lower than that of prediction, and the skill score of persistence is higher than that of prediction. This indicated that the performance of the persistence is better than that of prediction for short-term forecasts. On the contrary, the RMSE of prediction is lower than that of persistence and the skill score of prediction is higher than that of persistence for medium-term forecasts. Moreover, the 3-yr mean RMSEs of prediction and persistence are 0.06 m and 0.08 m at day 15, respectively. The 3-yr mean skill scores of predictions and persistence are 0.4764 and −0.1767 at day 15, respectively. It shows that the performance of prediction is better than that of persistence for medium-term forecasts. Finally, the prediction model is evaluated by comparing the SLA of the eddy and eddy shedding events occurred in the study area. It shows that the model can capture the location, propagation and evolution of eddy events and eddy shedding events well.
Although the accuracy of the prediction model is higher most of the time, the accuracy of the prediction decreases when extreme weather occurs near the study area. The possible reason is that air-sea interaction is not built into the prediction model. Therefore, the air-sea interaction will be considered to improve the performance of the prediction model in future research. At the same time, the algorithm will be optimized to reduce errors and improve the accuracy of prediction. In addition, the prediction model can be used to determine other ocean physical quantities, such as sea surface temperature (SST), and sea surface salinity (SSS), and applied in other sea areas.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.