Loop Current SSH Forecasting: A New Domain Partitioning Approach for a Machine Learning Model

Abstract: A divide-and-conquer (DAC) machine learning approach was first proposed by Wang et al. to forecast the sea surface height (SSH) of the Loop Current System (LCS) in the Gulf of Mexico. In this DAC approach, the forecast domain was divided into non-overlapping partitions, each with its own prediction model, and the full-domain SSH prediction was recovered by interpolating the SSH across partition boundaries. The original DAC model predicted the LCS evolution more than two months in advance and eddy shedding more than three months in advance. However, growing errors at the partition boundaries degraded its forecasting skill. In this study, a new partitioning method based on overlapping partitions is presented. The region of interest is divided into partitions that overlap by 50%. At each prediction step, the SSH value at each point is computed from the overlapping partitions, which significantly reduces the occurrence of unrealistic SSH features at partition boundaries. This new approach significantly improves overall model performance in two respects. The first is feature prediction, such as the location of the LC eddy SSH contours. The second is event prediction, such as LC ring separation. We observe an approximately 12% decrease in error over a 10-week prediction and show that this method approximates the location and shedding of eddy Cameron better than the original DAC method.


Introduction
The Gulf of Mexico (GoM) is a semi-enclosed basin whose circulation is dominated by the Loop Current (LC), which sheds large anticyclonic eddies called Loop Current eddies (LCE) at varying time intervals [1]. These eddies drift westward and dissipate near the western boundary of the GoM. Predicting the LC and its eddy shedding is fundamental to many aspects of marine life and human activities in the GoM region, including natural disaster responses, short-term weather anomaly predictions, offshore oil and gas operations, and ecosystem services. The importance of this issue is stressed by a call to action in a report prepared by the National Academies of Sciences, Engineering, and Medicine [2]. The report recommends three tasks for researchers, one of which is to improve predictive skill in forecasting an eddy shedding event from an extended LC out to a forecast period of approximately three months.
In an effort to address this challenge, a deep learning approach to forecast the sea surface height (SSH) of the LC system was proposed in [3,4]. The problem was formulated as time sequence regression and prediction [5] and solved with a Recurrent Neural Network (RNN). RNNs suit this task because they learn recurrent patterns in sequential data and use them to make predictions. Long Short-Term Memory (LSTM) networks, a type of RNN, were chosen because they can handle long-term dependencies in time series [6]. To reduce the computational cost of handling large datasets, the region of interest was partitioned into smaller non-overlapping sub-regions. This partitioning also allows the local features associated with the LC evolution and eddy shedding to be resolved. For each partition, an empirical orthogonal function (EOF) decomposition was applied to the SSH to reduce its dimension. The principal component time series of each partition were then predicted independently by local expert LSTM networks. After each prediction, these principal component time series were transformed back into SSH over the original sub-regions. The partitions were then pieced back together to produce an overall prediction of the region's SSH. With this Divide and Conquer (DAC) method, Wang et al. [4] predicted the LC evolution more than two months in advance and eddy shedding more than three months in advance. At every prediction step, the predicted SSH across neighboring partitions was smoothed using an interpolation function described in [4]. The DAC method was relatively effective in forecasting the eddy frontal positions, but merging the SSH predictions across the partitions contributed to error propagation.
In this article, a new method to reduce partition boundary errors is presented. In digital signal processing, signal time windows are often chosen to overlap one another to improve the signal-to-noise ratio of digital filters. Similarly, in this study, the region of interest is partitioned into sub-regions that overlap by 50%. The SSH value at each prediction point is obtained as a weighted average of the SSH values at the same point in the overlapping partitions. This procedure replaces the progressive smoothing across partition boundaries used in the original DAC method.
The details of the methodology are given in Section 2. Results from the methods are presented in Section 3, and concluding remarks follow in Section 4.

Dataset
The SSH used in this study was obtained from the HYCOM+CFSR 1/25° GoM 54-year Experiment (hereafter GoM-HYCOM), of which 18 years of simulated SSH, from 1992 to 2009, were used [7]. The dataset was split into approximately 15-, 1-, and 2-year periods for training, validation, and testing, respectively. Forecast experiments were conducted for approximately eighty 20-week sliding windows over the testing period.
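The sketch below illustrates one possible data layout. It splits a weekly series chronologically into training, validation, and testing periods, then enumerates the 20-week forecast windows over the testing period. It assumes weekly SSH snapshots, 52 weeks per year, and a one-week stride between windows; none of these is stated above. Its window count of 85 therefore only approximates the "approximately eighty" windows used in the study. The function names are hypothetical.

```python
"""Sketch: chronological data split and 20-week sliding forecast windows.

ASSUMPTIONS (not from the paper): one SSH snapshot per week, 52 weeks per
year, and a stride of one week between consecutive forecast windows.
"""

WEEKS_PER_YEAR = 52  # assumption: weekly sampling


def split_periods(n_weeks, train_years=15, val_years=1):
    """Return (train, val, test) week-index ranges; the test period gets the rest."""
    n_train = train_years * WEEKS_PER_YEAR
    n_val = val_years * WEEKS_PER_YEAR
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_weeks)
    return train, val, test


def forecast_windows(test, horizon=20, stride=1):
    """List (start, stop) week indices of every full `horizon`-week window in `test`."""
    starts = range(test.start, test.stop - horizon + 1, stride)
    return [(s, s + horizon) for s in starts]


if __name__ == "__main__":
    total_weeks = 18 * WEEKS_PER_YEAR  # 1992-2009
    train, val, test = split_periods(total_weeks)
    windows = forecast_windows(test)
    print(len(train), len(val), len(test), len(windows))  # 780 52 104 85
```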

Recurrent Learning
The prediction model used in this study is an RNN, which was developed to represent a time sequence (see Wang et al. [4] for details). An RNN feeds the output of each neuron back into the neuron together with a new input, forming loops within its architecture. The output of a simple RNN neuron, or hidden unit, can be modeled by the recursion in Equation (1). Here $W_{gi}$ represents the input weights, $W_{gs}$ the recurrence weights, and $f$ an activation function; $x_n$ and $s_n$ are the input and state at time $n$, respectively.
$$ s_n = f\left(W_{gi}\, x_n + W_{gs}\, s_{n-1}\right) \qquad (1) $$
Note that the output can be obtained from the state whenever it is needed.
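The recursion of Equation (1) can be sketched in a few lines of NumPy. This is an illustration only. It assumes a tanh activation for $f$, a zero initial state, and no bias term. The function name `rnn_states` and the array shapes are hypothetical choices, not taken from Wang et al. [4].

```python
import numpy as np


def rnn_states(x, W_gi, W_gs, s0=None, f=np.tanh):
    """Run the simple recursion s_n = f(W_gi @ x_n + W_gs @ s_{n-1}).

    x: inputs of shape (T, d_in); W_gi: (d_state, d_in); W_gs: (d_state, d_state).
    Returns the states s_1..s_T as an array of shape (T, d_state).
    ASSUMPTIONS: tanh activation, zero initial state, no bias term.
    """
    s = np.zeros(W_gs.shape[0]) if s0 is None else s0
    states = []
    for x_n in x:
        s = f(W_gi @ x_n + W_gs @ s)  # the same weights are reused at every step
        states.append(s)
    return np.stack(states)


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))          # 10 time steps of a 3-dimensional input
W_gi = 0.5 * rng.normal(size=(4, 3))  # input weights
W_gs = 0.5 * rng.normal(size=(4, 4))  # recurrence weights
S = rnn_states(x, W_gi, W_gs)
print(S.shape)  # (10, 4)
```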
A well-known problem with the RNN of Equation (1) is vanishing gradients. RNNs are typically trained with gradient descent. Because the chain rule contributes one derivative factor per time step, the gradients may vanish in a multi-layer RNN [8]. Long Short-Term Memory (LSTM) neural networks were designed to deal with this issue [6,9,10] by adding a memory unit $m_n$ that keeps the gradients from vanishing. Let $\alpha$ and $\beta$ be constants and let $\odot$ denote element-wise multiplication. The memory unit is then updated by the following rule:
$$ m_n = \alpha \odot m_{n-1} + \beta \odot f\left(W_{gi}\, x_n + W_{gs}\, s_{n-1}\right) \qquad (2) $$
The state is then related to the memory unit through an activation function. Because of the additive relationship in Equation (2), the derivatives do not vanish. The overall prediction scheme is the same as the one used in [4], in which the LSTM network was paired with EOF decomposition to predict the evolution of the LC system from SSH data. A brief overview of the prediction method is given here; a detailed description can be found in Wang et al. [4].
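A scalar toy example illustrates why an additive memory path helps. It compares two derivatives. The first is the derivative of the final state with respect to the initial state for a plain tanh RNN. The second is the derivative along a direct additive memory path of the form m_n = α m_{n-1} + β f(·). The scalar weights, the tanh activation, and the constant α are assumptions made for illustration. The indirect path through the state is ignored, so this is not a full LSTM gradient computation.

```python
"""Toy scalar comparison of gradient flow (illustrative assumptions only).

Plain RNN:     s_n = tanh(w_i * x_n + w_s * s_{n-1})
Memory path:   m_n = alpha * m_{n-1} + beta * f(...)  (direct path only)
"""
import math


def rnn_state_gradient(w_s, w_i, xs, s0=0.0):
    """Return d s_N / d s_0 for the plain tanh RNN."""
    s, grad = s0, 1.0
    for x in xs:
        s = math.tanh(w_i * x + w_s * s)
        grad *= w_s * (1.0 - s * s)  # chain rule: one factor per time step
    return grad


def memory_gradient(alpha, n_steps):
    """Return d m_N / d m_0 along the direct additive path m_n = alpha * m_{n-1} + ..."""
    return alpha ** n_steps  # one factor alpha per step, no tanh derivative


xs = [0.3] * 50
print(rnn_state_gradient(w_s=0.5, w_i=1.0, xs=xs))  # tiny: the gradient vanishes
print(memory_gradient(alpha=1.0, n_steps=50))        # 1.0: the gradient is preserved
```

With α close to 1, the direct memory path carries the gradient across many steps. Each factor of the plain RNN product is bounded by |w_s|, here 0.5.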
In each sub-domain, the EOF+LSTM model predicts the SSH of future weeks using all available data (see Figure 1). Let us denote the PCs of the SSH data for week 1 to week $n$ by $p_1, p_2, \ldots, p_n$. The goal is to predict $\hat{p}_{n+1}$, the PCs of the SSH at week $n+1$. To do so, the sequence $p_1, p_2, \ldots, p_n$ is first used to train the prediction model sequentially so that it "learns" the evolution of the PCs. The model then predicts the PCs for week $n+1$, $\hat{p}_{n+1}$. To predict the PCs for week $n+2$, $\hat{p}_{n+2}$, the sequence $p_1, p_2, \ldots, p_n, \hat{p}_{n+1}$ is used to retrain the neural network. The retrained network then predicts $\hat{p}_{n+2}$. This recursive process continues for as long as the prediction errors remain within the range set for the forecast. In practice, new SSH measurements in the forecasting period can be added progressively to retrain the prediction model.
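The recursive predict-append-retrain loop described above can be sketched as follows. To keep the example self-contained and deterministic, the LSTM is replaced by a least-squares linear one-step model $p_{k+1} \approx p_k A$. This substitution is an assumption; only the loop structure mirrors the text. The names `fit_one_step_model` and `recursive_forecast` are hypothetical.

```python
"""Sketch of the recursive forecast-and-retrain loop on PC time series.

ASSUMPTION: the LSTM is replaced by a least-squares linear one-step model
p_{k+1} ~ p_k @ A; only the loop structure follows the description.
"""
import numpy as np


def fit_one_step_model(seq):
    """Fit A in p_{k+1} ~ p_k @ A by least squares (stand-in for LSTM training)."""
    X, Y = seq[:-1], seq[1:]
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A


def recursive_forecast(pcs, n_weeks):
    """Forecast `n_weeks` of PCs, retraining after every predicted week.

    pcs: array of shape (n, k) holding p_1..p_n (k principal components).
    """
    seq = np.asarray(pcs, dtype=float)
    preds = []
    for _ in range(n_weeks):
        A = fit_one_step_model(seq)    # (re)train on all available PCs
        p_hat = seq[-1] @ A            # predict the next week's PCs
        preds.append(p_hat)
        seq = np.vstack([seq, p_hat])  # the prediction joins the training data
    return np.array(preds)


# Usage on a noise-free toy series: two PCs rotating at a constant rate.
theta = 0.2
R = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
truth = [np.array([1.0, 0.0])]
for _ in range(39):
    truth.append(truth[-1] @ R)
truth = np.array(truth)
forecast = recursive_forecast(truth[:30], n_weeks=10)
print(np.abs(forecast - truth[30:]).max())  # close to zero for this linear toy case
```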

Divide and Conquer Prediction Model of the GoM SSH
A key step in the algorithm is to decompose the SSH data into temporal and spatial parts using the EOF method; only the temporal series is forecast at each time step by the prediction model. The method, however, requires an eigenvalue decomposition of the SSH matrices. A time series of two-dimensional SSH fields can be represented by a matrix $X$ whose rows correspond to time and whose columns correspond to spatial locations. The singular value decomposition of $X$ is
$$ X = U \Sigma V^{T} \qquad (3) $$
Here, the matrix $U\Sigma$ represents the temporal principal components (PCs), and the matrix $V^{T}$ contains the spatial patterns (EOFs).
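The EOF decomposition via the singular value decomposition can be sketched with NumPy. The PCs are the truncated columns of $U\Sigma$, the EOFs are the leading rows of $V^{T}$, and reconstruction maps (possibly predicted) PCs back to SSH. Removing the time mean before the decomposition is an assumption, a common EOF convention not stated in the text. The function names are hypothetical.

```python
import numpy as np


def eof_decompose(X, n_modes):
    """EOF decomposition of an SSH matrix X (rows: time, columns: grid points).

    Returns (pcs, eofs, mean): pcs = (U Sigma)[:, :n_modes] are the temporal
    principal components and eofs = V^T[:n_modes] are the spatial patterns.
    ASSUMPTION: the time mean is removed before the SVD.
    """
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    pcs = U[:, :n_modes] * s[:n_modes]  # U Sigma, truncated to n_modes columns
    eofs = Vt[:n_modes]                 # leading spatial patterns
    return pcs, eofs, mean


def eof_reconstruct(pcs, eofs, mean):
    """Map (possibly predicted) PCs back to SSH fields."""
    return pcs @ eofs + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ssh = rng.normal(size=(104, 400))  # synthetic: 104 weeks x 400 grid points
    pcs, eofs, mean = eof_decompose(ssh, n_modes=10)
    print(pcs.shape, eofs.shape)       # (104, 10) (10, 400)
```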
When the SSH dataset is large, the above matrix decomposition may exceed the available memory. To reduce the computational burden and to exploit the high resolution of the SSH data, a divide-and-conquer strategy was proposed in [4]. The region of interest was first partitioned into a number of non-overlapping sub-regions. At each prediction step and for each partition, a local prediction is obtained from the EOF/LSTM algorithm. To reconstruct the SSH field over the whole prediction domain, a progressively weighted average was applied at the boundaries of each partition to smooth errors due to discontinuities across boundaries (Figure 1). However, as shown below, these partition boundary errors contributed to the growth of model forecast errors over time. For further details on the DAC approach, readers are referred to Wang et al. [4].
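The memory saving comes from running one small SVD per partition, on a $T \times (\text{tile}^2)$ matrix, instead of one large SVD on a $T \times (n_y n_x)$ matrix. The sketch below shows the non-overlapping split and the reassembly of the partitions. It assumes the grid dimensions are multiples of the tile size. It does not reproduce the progressive boundary smoothing of [4], and the names are hypothetical.

```python
import numpy as np


def split_tiles(fields, tile):
    """Cut a (T, ny, nx) SSH stack into non-overlapping tile x tile partitions.

    Returns {(iy, ix): (T, tile*tile) matrix}, one EOF-ready matrix per partition.
    ASSUMPTION: ny and nx are multiples of `tile`.
    """
    T, ny, nx = fields.shape
    parts = {}
    for iy in range(ny // tile):
        for ix in range(nx // tile):
            block = fields[:, iy * tile:(iy + 1) * tile, ix * tile:(ix + 1) * tile]
            parts[(iy, ix)] = block.reshape(T, -1)
    return parts


def stitch_tiles(parts, tile, ny, nx):
    """Reassemble partition matrices into (T, ny, nx) fields (no boundary smoothing)."""
    T = next(iter(parts.values())).shape[0]
    out = np.empty((T, ny, nx))
    for (iy, ix), mat in parts.items():
        out[:, iy * tile:(iy + 1) * tile, ix * tile:(ix + 1) * tile] = mat.reshape(T, tile, tile)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ssh = rng.normal(size=(30, 16, 16))  # synthetic: 30 weeks on a 16 x 16 grid
    local = {}
    for key, M in split_tiles(ssh, tile=8).items():
        U, s, Vt = np.linalg.svd(M, full_matrices=False)  # local EOFs of one partition
        local[key] = (U[:, :3] * s[:3]) @ Vt[:3]          # keep 3 local modes
    approx = stitch_tiles(local, tile=8, ny=16, nx=16)
    print(approx.shape)  # (30, 16, 16)
```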

Sinusoidal-Weighted Overlapped Partitioning
In the original DAC method, a progressively weighted averaging procedure was used to smooth the SSH differences across each partition's boundaries in order to remove discontinuities between partitions. However, this smoothing may interfere with the SSH dynamics across the region by removing bumps or troughs that would otherwise grow across the boundaries. The method proposed herein instead removes discontinuities by making the partitions overlap: the SSH in the overlapping areas is calculated as the combined solution of all overlapping partitions. This new approach is shown to preserve event prediction integrity and to reduce error propagation.
For simplicity, we sort the 50%-overlapping partitions into two groups such that no two partitions in the same group overlap. These groups are referred to as the first- and second-level partitions, respectively. The first-level partitions, denoted by $f$ (blue partitions in Figure 2), have equal areas and cover the entire domain. The second-level partitions, denoted by $g$ (yellow partitions in Figure 2), are superimposed on the joint boundaries of the first-level ones. At each point, the predicted SSH of the overlapping partitions is averaged with weights computed from a sinusoidal function of the normalized distance between the point and the centers of partitions $f$ and $g$, respectively. These centers are labeled $C_{l,x,y}$, where $l$ denotes the layer ($f$ layer or $g$ layer) and $(x, y)$ is the index of the partition. The distances are
$$ r_l = \mathrm{dis}\big((i, j), C_{l,x,y}\big), \quad \forall (l, i, j) \in \mathrm{Partition}_{l,x,y} \qquad (4) $$
Partitions $h$ contain the final SSH state after merging partitions $f$ and $g$ with Equation (5), which depends on the distances $r_f$ and $r_g$ from Equation (4). In Equation (5), $h(i, j)$ is the resulting SSH, and $f(i, j)$ and $g(i, j)$ are the respective SSH values in each overlapping partition. Finally, a median filter with a 3 × 3 window is applied to the $h$ partitions to remove outliers.
One may use either more or less than 50% overlap. The advantage of 50% overlap is that two non-overlapping planes can be constructed, as shown in Figure 2. For any point within the second ($g$) plane, exactly two values are then obtained, one from each of the planes $f$ and $g$, to reconstruct that point in the output plane $h$.
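The overlapped merge can be sketched as follows. The $g$ partitions are shifted by half a partition in both directions, so each $g$ partition straddles the joint boundaries of the $f$ partitions. The two predictions are averaged with weights that decay sinusoidally with normalized distance from each partition center, and a 3 × 3 median filter is then applied. This is a sketch, not the paper's Equation (5). It assumes the weight $w = \cos(\tfrac{\pi}{2} r)$, with $r$ the distance to the center divided by the half-diagonal, and the half-shift geometry. The median filter uses NumPy's `sliding_window_view` (NumPy ≥ 1.20); SciPy's `scipy.ndimage.median_filter` with `size=3` is an alternative. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_level(field, tile, offset):
    """Cut a 2-D field into tile x tile partitions whose origins start at `offset`.

    offset=0 gives first-level (f) partitions; offset=tile//2 gives second-level
    (g) partitions straddling the f boundaries (ASSUMED geometry).
    """
    ny, nx = field.shape
    parts = {}
    for iy in range((ny - offset) // tile):
        for ix in range((nx - offset) // tile):
            y0, x0 = offset + iy * tile, offset + ix * tile
            parts[(iy, ix)] = field[y0:y0 + tile, x0:x0 + tile].copy()
    return parts


def sinusoidal_weight(tile, eps=1e-6):
    """Weight that is ~1 at the partition centre and ~0 at its corners.

    ASSUMPTION: w = cos(pi/2 * r), r = distance to centre / half-diagonal.
    """
    c = (tile - 1) / 2.0
    yy, xx = np.mgrid[0:tile, 0:tile]
    r = np.hypot(yy - c, xx - c) / np.hypot(c, c)  # normalised distance in [0, 1]
    return np.cos(0.5 * np.pi * r) + eps            # eps keeps the denominator > 0


def merge_levels(f_parts, g_parts, tile, shape):
    """Weighted average h of the f and g predictions at every grid point."""
    num, den = np.zeros(shape), np.zeros(shape)
    w = sinusoidal_weight(tile)
    for offset, parts in ((0, f_parts), (tile // 2, g_parts)):
        for (iy, ix), values in parts.items():
            sl = (slice(offset + iy * tile, offset + (iy + 1) * tile),
                  slice(offset + ix * tile, offset + (ix + 1) * tile))
            num[sl] += w * values
            den[sl] += w
    return num / den  # points covered only by f keep the f value


def median_3x3(field):
    """3 x 3 median filter with edge padding, to remove isolated outliers."""
    windows = sliding_window_view(np.pad(field, 1, mode="edge"), (3, 3))
    return np.median(windows, axis=(-2, -1))


if __name__ == "__main__":
    shape, tile = (16, 16), 8
    f_pred = split_level(np.zeros(shape), tile, 0)         # toy level-f prediction
    g_pred = split_level(np.ones(shape), tile, tile // 2)  # disagreeing level-g prediction
    h = median_3x3(merge_levels(f_pred, g_pred, tile, shape))
    print(h.round(2))
```

With this weight choice, each $g$ corner falls on an $f$ center and vice versa. As a result, each prediction dominates near its own partition center and fades out smoothly toward its edges.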

LSTM Forecasting of the GoM SSH with Overlapping Partitions
The LSTM model was applied to forecast the SSH during the 2-year testing period. Eighty 20-week forecasts were analyzed using the model skill measures defined in [4]. The forecasts of the overlapping partition method were compared to those made with non-overlapping partitions. The effect of discontinuities between partitions on the SSH forecast can be seen in Figures 3 and 4.
We present a 3D plot, to stress the effect of the discontinuities on smooth fields, as well as a 2D plot that shows the effect the discontinuities may have on the LCS. Both plots are snapshots of predictions 10 weeks ahead, when the propagated errors are large enough to be visible. Figure 4 clearly shows that the discontinuity alters not only the shape of the LC but also its path. The true LC veers to the west, and the overlapping model predicts the same veering. The non-overlapping partitions, by contrast, predict a pinch in the LC without spin. This suggests that a partitioning method that allows errors to be corrected at the boundaries of the original non-overlapping partitions can aid long-term forecasting of eddy movement and shedding.
The root mean square error of the distances from the eddy front to seven reference points, between the observed (HYCOM) and predicted SSH fields, was calculated as
$$ \mathrm{RMSE}_{dF}(n) = \sqrt{\frac{1}{7} \sum_{i=1}^{7} \left( dM_{i,n} - dO_{i,n} \right)^{2}} $$
where $dM_{i,n}$ and $dO_{i,n}$ denote the distance from the $i$th reference point to the eddy front at week $n$ in the predicted and observed SSH, respectively. For more details on the selection of these reference points and contours and on the calculation of this metric, the reader is referred to Wang et al. [4] and Oey et al. [1]. For consistency with previous studies, the 0.45 m SSH contour was used [1]. Figure 5 shows that, relative to the non-overlapping model, $\mathrm{RMSE}_{dF}$ decreases by approximately 2.5 km over the first 5 weeks and by approximately 5 km in the following weeks. The overlapping partition model significantly improved the SSH prediction over the non-overlapping one throughout the forecast period. Overall, we observe a 12% decrease in $\mathrm{RMSE}_{dF}$ over a 10-week prediction. This improvement is noticeable in the SSH field of the LC shown in Figure 6, particularly for the forecast of eddy detachment and re-attachment. The SSH at weeks 7 and 10 shows the first detachment of eddy Cameron in May 2008.
Based on the SSH alone, the model with overlapping partitions predicts the detachment better than the original DAC model. To show the advantage of this improved partitioning scheme concretely, we revisit one of the original experiments presented in Wang et al. [4], namely the prediction of eddy Cameron eight weeks in advance. On this eight-week horizon, the non-overlapping scheme accurately captures the timing and dynamics of eddy Cameron. On a longer horizon, however, its performance decays significantly. Eddy Darwin could be predicted 10 weeks in advance, but the dynamics of Cameron proved more complex. Figure 7 therefore illustrates both this decay in performance and the strength of the overlapping partition scheme. Wang et al. originally found the 10-week prediction of eddy Cameron to be too inaccurate and presented an eight-week prediction instead. With the new overlapping method, the predictability of eddy Cameron can clearly be extended by about two weeks.
Figure 7. 10-week prediction of eddy Cameron with the non-overlapping DAC algorithm (left), the overlapping DAC algorithm (middle), and the ground truth (right).
These results show that, in a 10-week-ahead prediction, the original DAC algorithm does not capture the correct timing of the shedding of eddy Cameron. Significant discontinuities also arise in the field. Errors in the pinched area may therefore be responsible for the incorrect timing of the shedding. The 0.45 m contour is also shown to better locate the eddy front.
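The front-distance metric $\mathrm{RMSE}_{dF}$ and the relative error reduction can be computed per forecast week as sketched below. The distances are assumed to be computed beforehand: this requires extracting the 0.45 m contour and measuring its distance to the seven reference points, which is not shown. The numbers in the usage lines are made up for illustration and are not results from this study. The function names are hypothetical.

```python
import math


def rmse_front_distance(d_model, d_obs):
    """RMSE between predicted and observed front distances at one forecast week.

    d_model[i], d_obs[i]: distance (km) from reference point i to the eddy front
    (0.45 m SSH contour) in the predicted and observed fields, respectively.
    """
    if len(d_model) != len(d_obs):
        raise ValueError("need one distance per reference point in both fields")
    sq = [(dm - do) ** 2 for dm, do in zip(d_model, d_obs)]
    return math.sqrt(sum(sq) / len(sq))


def percent_reduction(rmse_baseline, rmse_new):
    """Relative error reduction (%) of the new scheme with respect to the baseline."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline


if __name__ == "__main__":
    d_obs = [120.0, 95.0, 60.0, 80.0, 150.0, 40.0, 70.0]  # made-up distances (km)
    d_model = [d + 10.0 for d in d_obs]                   # every front off by 10 km
    print(rmse_front_distance(d_model, d_obs))            # 10.0
    print(percent_reduction(50.0, 44.0))                  # 12.0
```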

Conclusions
In this study, we applied an RNN model with a memory unit and an activation function, known as LSTM, to the prediction of the LC system's SSH. To forecast over the GoM domain, a DAC approach was used. The prediction domain was initially divided into multiple non-overlapping partitions, and a simple smoothing function was used to enforce continuity of the SSH forecast at the partition boundaries. This model, implemented in [4], exhibited significant error growth at the partition boundaries. To mitigate this DAC model error, we proposed a new prediction domain partitioning approach in which the partitions overlap. The SSH in the overlapping regions was calculated as a sinusoidally weighted average of the SSH in the overlapping partitions. This new approach led to a significant improvement of overall model performance in two respects (Figure 6). The first is feature prediction, such as the location of the LC eddy SSH contours. The second is event prediction, such as LC ring separation. Although the overlapping method is simple to implement, it is slightly more computationally expensive than its non-overlapping counterpart. Finally, the sinusoidal weight function could be replaced by other weight functions that may further improve the SSH prediction.