A Simple Statistical Intra-Seasonal Prediction Model for Sea Surface Variables Utilizing Satellite Remote Sensing

: In this paper, a novel and simple statistical prediction model for sea surface multivariate is developed based on extended empirical orthogonal functions (referred to as the MEEOF model). This simple model embeds the temporal evolution information into the empirical orthogonal function spatial patterns, which effectively captures the spatial distribution of the sea surface variables and their evolution process over time, and can be used to improve the accuracy of intra-seasonal ocean forecasts. At the same time, it considers both the correlation between different spatial and temporal points and the dynamic balance between different sea surface variables. The performance of the MEEOF prediction model has been examined in the South China Sea (SCS) based on remote sensing satellite datasets. Experimental results demonstrate that this model has significant forecasting capability within the forecast window of 15–90 days, compared with the traditional persistence forecasts and optimal climatic normal (OCN) results. Furthermore, this model exhibits good forecast performance throughout the entire forecast window; the final prediction model (referred to as the fusion model) is established by obtaining the weighted average for the MEEOF forecasts and persistence forecast results. Numerical experimental results show that this fusion prediction model consistently outperforms the persistence model, the OCN model, and the linear regression model over the entire forecast window. A case study shows that the propagation of ocean waves and the coordination between different sea surface variables can be well predicted by this simple model, indicating that this fusion model has a potential advantage in intra-seasonal predictions of the ocean.


Introduction
Ocean forecasting is based on the understanding of the past and present state of the ocean and its evolutionary laws, using numerical models, observational data, data assimilation schemes, and other statistical methods to analyze, judge, and forecast ocean processes and phenomena [1]. It is well known that accurate predictions of the ocean state at different spatial and temporal scales are necessary [2]. In particular, mesoscale phenomena are widespread throughout the ocean, where mesoscale eddies are the most energetic forms of motion prevailing in the ocean [3].
The South China Sea (SCS) is one of the largest marginal seas, exhibiting complex topography [4]. On the one hand, the SCS has abundant water exchange channels and is vulnerable to the large-scale circulation of the Sulu Sea and the Pacific Ocean, showing strong seasonal signals and inter-annual oscillations [5]; on the other hand, the unique geographical environment of the SCS makes it extremely abundant in oceanic eddies, and mesoscale fluctuations on the temporal-spatial scales of between 20-150 days and 50-500 km dominate the oceanic variability [6]. In view of this, It is essential to predict the intraseasonal variation of these mesoscale signals.
Generally, there are two main types of methods for forecasting ocean variables [7]. One is based on the numerical model, which achieves the forecasting of ocean variables through the use of ocean dynamics models [8][9][10][11]; the other is a data-driven model. Ocean numerical models use various evolution equations to describe ocean variability [12]. With the rapid development of ocean data assimilation schemes and the improvement of model resolution, numerical model forecasting technology has been increasingly developed and has become an important approach for prediction. The Global National Real-Time Ocean Forecasting System (RTOFS), which is based on the Hybrid Coordinate Ocean Model (HY-COM) [10], is the most widely used operational forecasting system, and it can provide daily forecasts for eight days [13]. The National Earth System Prediction Capability (ESPC) forecasting system is also based on the HYCOM, and it is a fully coupled forecasting system with a spatial resolution of 1/25, providing 16-day daily forecasts and 45-day weekly forecasts [14]. Also, the French Mercator Ocean International forecast system uses the Nucleus for European Modeling of the Ocean (NEMO) to predict sea level for 10 days in advance [15]. For some special and long-term oceanic phenomena, such as the western boundary currents, in the case of known external forcing, its predictability can be within a few months [16]. However, the numerical models mentioned above involve a large number of ideal assumptions and also require complex computational platforms [17,18].
With the development of ocean satellite observations and ocean data assimilation technologies, a large amount of marine environmental data has been accumulated, including in situ observation data, satellite remote sensing data, and high-quality reanalysis products. These data provide strong support for us to carry out ocean forecasting based on data-driven methods. Data-driven forecasting methods often use statistical tools or artificial intelligence methods to analyze large amounts of historical data and find the potential connections between the data. Such methods are suitable for the prediction of physical processes that are not clearly defined. Common statistical methods and machine learning methods are data-driven models. Recently, the research topic of the data-driven forecasting of ocean variables has received increasing attention [19][20][21], especially in the context of the limited understanding of ocean processes. Neural network models, such as multilayer perceptron (MLP) and long-and short-term memory (LSTM) models, have been used to predict sea surface temperature (SST) [22,23]; Imani, et al. [24] analyzed and predicted monthly sea level anomalies (SLAs) using an autoregressive integrated moving averaged (ARIMA) model; a linear regression model combining the dynamical ENSO was used to predict the SST in the Indian Ocean [19]; an efficient neural network has been used to predict SST for several sites [25]. However, we learn from the above studies that traditional data-driven methods usually describe the ocean state at a single scale, ignoring the interactions between different variables and the multi-scale effects of the ocean. Therefore, they are only suitable for the short-term forecasting of ocean variables. For traditional linear statistical models, tangent linear models are generally considered, which cannot resolve nonlinear signals and therefore, cannot extend the time validity of predictions; while for nonlinear models such as neural networks, such approaches generally do not consider the relationship between different variables and cannot explain their dependence over longer periods.
Considering the above, we develop a simple statistical prediction model based on the multivariate extended empirical orthogonal function (EEOF) decomposition, which is referred to as the MEEOF prediction model in this study. With the EEOF analysis, the maximum probable and orthogonal evolution patterns of the ocean can be extracted from the various possible evolutionary forms in the past; in addition, multiscale, spatiotemporal information, as well as the correlation information between different variables, are all integrated into the patterns, and those patterns are coherent both in space and time. The orthogonal patterns obtained in this way contain both the temporal evolution of a certain variable and the dynamic correlation between different variables. Then, the future predictions are achieved by projecting the current state onto these patterns to obtain the weight coefficients. This model transforms the spatiotemporal prediction problem into a series of linear combinations of basis functions and optimal weight coefficients, taking into account the fluctuations of the ocean. Our "test block" of the ocean is the SCS, which is a semienclosed basin [26,27].
The remaining sections of this paper are organized as follows: Satellite remote sensing observations data and the study area in this study are described in Section 2; the related methodology is also presented in this section. In Section 3, experiments and the corresponding results are given in detail. Then, in Section 4, we develop a fusion model to enable satisfactory performance throughout the entire forecast window. The discussion is included in Section 5. Finally, we summarize the conclusions in Section 6.

Data Collection
In this study, considering the correlation of different variables and the availability of current ocean satellite remote sensing data, we chose SLA and SST for modeling. The SLA data used here is released by the Copernicus Marine Environment Monitoring Service (CMEMS) with a spatial resolution of 1/4° and a temporal resolution of 1 day. This data is a multi-satellite fused altimetry data that provides daily SLA values from 1 January 1993 to the present. The SST data is produced by the National Oceanic and Atmospheric Administration (NOAA) daily Optimum Interpolation Sea Surface Temperature (daily OISST, version 2) with a spatial resolution of 1/4°. The AVHRR-Only SSTs data are used here, which use only satellite SSTs from the AVHRR and provide daily SST values from 1 September 1981 to the present [28].
The latitude and longitude of the study area ranges from 5° N to 23° N and 105° E to 122° E, respectively, as shown in Figure 1. Moreover, since the following modeling process involves orthogonal decomposition, we hope that the patterns obtained from the decomposition are not only the maximum probable results, but also the optimal spectral decomposition results [29]. Based on the above aims, the Sulu Sea is excluded in this study by temporarily disregarding the interaction between the Sulu Sea and the SCS. In addition, in the shore and shallow areas, the satellite remote sensing data are susceptible to tidal and internal waves that can lead to accuracy degradation [30]. Thus, these data are masked out in this study. Given the availability and time frame of the aforementioned remote sensing dataset, the 26-year data from 1993 to 2018 are selected as the study dataset. To ensure that the experiments are statistically significant, as well as to guarantee the reliability of the experimental results, a round-robin approach is adopted, i.e., any set of 25 years of data is used to establish each prediction model, while another one-year set of data serves as the independent variable to validate the performance of the prediction model. Here, the spatialtemporal grids for the analysis and prediction are set up with the spatial resolution of 0.25° × 0.25° and the temporal resolution of 1 day, respectively. In this study, we can regard them as true fields.

Description of the Proposed Prediction Model
The flowchart of the proposed prediction model is shown in Figure 2. In the model construction process, the multivariate dataset is divided into two parts, one with 25 years of data for constructing the prediction model, and the other with 1 year of data for validating. Then, EEOF analysis is performed on these 25 years of data, and the multivariate extended orthogonal spatiotemporal modes (MEEOFs, Appendix A) and the corresponding principal components (PCs) can be obtained. The details are described in Section 2.3. At the time of prediction, a certain proportion of MEEOFs are retained based on the significant test results [31] and divided into two parts: the fitting spatiotemporal basis and the prediction spatiotemporal basis. The weight coefficients can be obtained by projecting the observational data from the independent test sample onto the fitting spatiotemporal basis; then, this set of optimal weight coefficients can be combined with the prediction spatiotemporal basis to obtain the final prediction field. The details are explained in Section 2.4.

Extended Empirical Orthogonal Function Analysis of Multivariate Observations
Empirical orthogonal function (EOF) analysis is a classical statistical method in the field of the atmosphere and the ocean [32]. It can quickly decompose a large amount of data and extract the main spatial structure, effectively analyzing the spatial and temporal variability of the data field while reducing dimensionality [33][34][35]. However, it cannot describe the evolution of spatial features in the time dimension, which makes it impossible for us to understand its spatial variation using a long-time scale. Therefore, to solve this problem, heterogeneous datasets at different times could be used in EOF analysis, which is called EEOF analysis [36,37]. In this way, the variation of the variable over continuous time can be considered. Therefore, EEOF analysis has been widely used to study the longterm evolution of atmospheric and oceanic processes [38].
In this study, the window length is 1 year, i.e., the column vector consisting of the daily data from 1 January to 31 December of each year represent one subsample for EEOF analysis. EEOF analysis is also performed on the multiple sea surface variables simultaneously, and Z represents the original sample matrix. is the total dimension of one subsample, and M is the number of the subsamples. Apparently, NM here. By arranging the matrix in this way, each column vector contains the signal over a period of time, and also expresses the spatiotemporal multiscale details of the sea surface variables. At the same time, the simultaneous decomposition of multiple variables can take into account the dynamics and thermal matching relationships between different variables.
In this study, EEOF decomposition is applied to the anomaly matrix, which is constructed by removing the climatology. Climatology usually represents the periodic signals of the ocean [39], and here, we assume that    ,, 11

11
, and climatology can be expressed as where anomaly Z is the anomaly matrix.
In addition, the matrix (5) needs to be standardized, since the scales and units differ between different variables: where Z is the matrix for EEOF decomposition, and σ is the standardization matrix, Here, () Cor Z effectively takes into account not only the temporal and spatial correlation between different points, but also the dynamic balance of the different sea surface variables.
In general, we need to perform Jacobian decomposition on () Cor Z . However, as mentioned before, NM in this study, and the dimensions of () Cor Z , are too large to be allocated in the computer memory. Fortunately, T ZZ and T Ζ Z have the same nonzero eigenvalues, although their eigenvectors differ. Thus, the Jacobian decomposition would apply to the matrix T Ζ Z here to reduce the computational cost, Then multiply Z to each side of the Equation Error!  2 T 2 Λ ZZ ZV ZV (12) We found that 2 ZV is the eigenvector of () Cor Z , with Λ being the eigenvalue matrix. Assume that where main 1 V is the dominant part of 1 V . We can find that the relationship between main 1 V and 2 V can be written as follows: Thus,  () So far, the anomaly data of the sea surface multivariate are decomposed into MEEOFs and corresponding PCs in this way. Notably, MEEOFs here contain not only the temporal evolution information of a certain variable, but also the spatial relationship between different sea surface variables. Therefore, this provides effectual information for the prediction of the sea surface multivariate.

Multivariate Prediction
When enough samples are available, we can assume that the ocean has gone through enough possible initial states and has evolved in all possible initial states to provide information for predicting future state. In this sense, if the ocean state at the current moment has some similarity to the state at some point in the past, then the future evolutionary form can be determined based on the past evolutionary form. In this study, MEEOF analysis extracts the most likely orthogonal eigen evolution from the various possible evolution forms in the past.
Therefore, if a sufficiently large number of samples are used, more MEEOFs can be obtained by decomposition, allowing a more complete phase space to explain the sea surface processes represented by SLA and SST. By projecting the observations onto these MEEOFs, the optimal weight coefficients can be obtained; then, the spatiotemporal prediction can be achieved by combining these weight coefficients with the prediction spatiotemporal basis. Here, MEEOFs from the EEOF analysis can be divided into two parts, where fitting V is the spatiotemporal basis used for fitting the observation data and prediction V is the basis for the prediction. Similarly, the standardization matrix σ can also be divided into the following form: Climatology can also be divided into two parts, that is, If the observational data obs Z is known for a certain period of time, we can obtain the corresponding coefficients optimal T using the least squares estimation (LSE) approach, and the cost function J can be constructed as follows: fitting fitting optimal obs fitting fitting optimal obs By making the gradient of the cost function J be zero, that is we can find fitting fitting fitting fitting fitting obs TV σ VV σ ZZ (21) Then, based on the optimal weight coefficients optimal T obtained above and the prediction spatiotemporal basis prediction V , the prediction results can be expressed as:

Performance Evaluation Criteria
In this study, a round-robin experiment is conducted on the original dataset of the 26-year period in the SCS: in each experiment, the datasets with 25 years are used as the samples for EEOF decomposition; then, another independent year is used for prediction validation. For each independent year, we conduct 200 prediction tests, thus a total of 5200 tests are conducted for 26 trails. With a spatial resolution of 0.25° × 0.25°, the total number of spatial grid points of SLA and SST over the entire study area for 1 year is 1,679,730, and the total number of samples is 25. The statistical metrics of the mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (CC) are used to assess the accuracy of the prediction. In this study, MAE and RMSE for th i grid point can be expressed as follows: The spatial-temporal averaged CC can be expressed as:

Code Performance
In this section, experiments are carried out to test the code performance of the EEOF decomposition, since it is the most time-consuming part of this prediction model. We perform the experiments on a workstation of Intel i9 CPU @2.90 GHz, 32 G RAM, with Windows 10 Home 64-bit operating system. The running times for the EEOF decomposition with the different numbers of historical samples are presented in Table 1. In Table 1, m and n represents the number of rows and columns of the sample matrix, respectively. The experimental results show that the running time becomes longer as the number of samples increases. The running time for the 25-year historical sample is 483.330 s, which is much shorter than the running time of the numerical model forecasts. The code performance shows the advantage of this prediction model in operational applications.

Fitting Performance and Parameters Determination
First, we design experiments to explore the prediction performance of this model for different fitting days. Here, we take the average forecast results of 1-60 days as the assessment standard to compare the forecasting results of different fitting days. In the experiments, 7, 15, 30, 60, and 90 fitting days are used, respectively. We use 24 MEEOFs here, the variance proportion of which is 100%. Figure 3 shows the MAE and RMSE results of the SLA and SST forecasts for different fitting days in the 26 experiments. It is easy to see from the figure that the overall forecast errors are relatively stable for both SLA and SST, indicating the reliability and stability of this prediction model. Notably, in this study, we use different sea surface variables for simultaneous modeling, and different variables have different evolutionary processes and different cycles. On this basis, the optimal fitting days need to be considered and selected comprehensively so that the simultaneous prediction performance of SLA and SST can be optimized. From Figure 3, we can see that different fitting days have different effects on the prediction performance of SLA and SST (Appendix B). When the fitting days are 60 days, the comprehensive prediction performance is the best; the MAEs of the temporal-basin averages of 1-60 days for SLA and SST are 0.0490 m and 0.628 °C, and the corresponding RMSEs for SLA and SST are 0.0664 m and 0.795 °C . The above statistical results are the average results over 26 trials.
Then, we investigate the performance of different forecast days. Based on the above results, we adopt 24 MEEOFs, and the number of fitting days is fixed at 60 days. The forecast performance for 7, 15, 30, 60, 90, and 120 days are compared in Figure 4, which shows the MAEs and RMSEs of SLA and SST at different forecast times. As can be seen from Figure 4, the changes in MAEs and RMSEs are relatively gentle as the forecast window is extended. The above conclusions hold for both SSHA and SST. We conclude that this prediction model has a potential advantage for intra-seasonal prediction.

RMSE and CC Evaluation of Forecast Performance
Next, we show the model's prediction performance. Based on the above model, we use 60 days for fitting and 90 days for prediction. Here, persistence prediction and the optimal climatic normal (OCN) method are compared with the MEEOF model. We use the above models to perform round-robin predictions to ensure the reliability of the experiments. Persistence prediction is a widely accepted reference for baseline comparisons in the atmospheric and oceanic sciences [40]. The basic idea is to assume that the ocean state at the initial moment will continue indefinitely throughout the entire forecast window. The OCN model is a common method for mid-and long-term ocean forecasting, and it uses the average of the previous k years as the subsequent forecast value [41]; here, the value of k is taken as 25.  (26) where  26 p here. Figures 5-8 show the basic statistics of the MEEOF model for the RMSEs and CCs of SLA and SST in the SCS. At the same time, as a comparison, the persistence and OCN prediction results for these two sea surface variables are also shown in these figures, by which the prediction capability of the MEEOF prediction model can be demonstrated.
The RMSEs of SLA forecasts validated at the 7th day, 30th day, 60th day, and 90th day are shown in Figure 5. Comparisons of the three models in Figure 5 show that the MEEOF prediction model outperforms the persistence and OCN forecasts over the 90-day forecast horizon. Although the persistence forecast has a small error at the beginning of the forecast window, its error grows rapidly as the forecast time increases, while the MEEOF model and OCN predictions remain largely stable throughout the entire forecast window. In particular, the error of the MEEOF model grows very slowly with the extension of the forecast window and is always smaller than that of the OCN model. This is also true for the SST forecasts (see also Figure 6). Thus, it can be concluded that the predictive capability of the MEEOF prediction model has an obvious advantage over persistence prediction in the mid-and long-term, and it outperforms the OCN prediction model throughout the entire forecast period.
As presented in Figure 5, these three models all show two areas with a large RMSE, one starting from the Luzon Strait, and the other area roughly located in the west-central part of the SCS, near Vietnam. It is well known that the Luzon Strait is the open boundary of the SCS, serving as a bridge for information exchange between the Northwest Pacific and the SCS. When the Kuroshio intrusion occurs, part of the Kuroshio water, with high temperature and high salinity, enters the SCS through the Luzon Strait and extends to the interior. Since the MEEOF model does not consider the influence of the boundary conditions, the error in this area cannot be well handled. Another area is located in the strong current zone, where ocean fronts are likely to be generated, resulting in a large variation in this area, which is obviously not well solved by the statistical model.

Description of the Fusion Model
From what has been discussed above, it is worth noting that the advantage of persistence forecasting is maintained in the short-term forecasts, while the benefit of the MEEOF forecasting is reflected in the mid-and long-term forecasts. Here, to ensure that the MEEOF model has good performance throughout the entire forecast window, we adopt a weighted average form of the persistence forecasts and the MEEOF forecasts in the final predictions. The final forecast can be expressed as follows: In addition, the previously noted error coefficients are calculated based on 26 trials. Figure 9 shows the error fitting results of the persistence and MEEOF predictions for SLA and SST.

Forecast Performance of the Fusion Model
Furthermore, we evaluate the prediction performance of the above fusion model, comparing it with the linear regression model [19], in addition to the traditional persistence forecasting and OCN model described above.
Here, the linear regression model is combined with the EOF analysis for predicting the sea surface multivariate. The specific process is: in each experiment, the multivariate dataset is decomposed by EOF analysis to obtain the orthogonal spatial modes (EOFs) and the corresponding PCs; then, the corresponding linear regression model is constructed for each PC series. Here, regression models are established as follows.
where the predictand and predictor are denoted as q i P and q i pc , respectively, in which q represents the th q PC and i represents the lead time;  is the regression coefficient; q i cons is the constant term. Here, we keep the same experimental setup as in the MEEOF model, i.e., we use the information from the first 60 days to predict the states for the next 90 days. In this process, 4,602 EOFs can be obtained, which is consistent with the number of grid points in one day for the two sea surface variables, but only 29 EOFs passed the significant test, so we use 29 PCs for modeling here. The forecast performance of the fusion prediction model is further demonstrated in Figure 10 and Figure 11; the persistence prediction model, OCN results, MEEOF model, and linear regression model are also shown as comparisons. It is easy to see from the figures that the fusion model outperforms the persistence forecast within the short-term forecast. When the forecast window is continuously extended, the fusion model consistently outperforms the linear regression model and the OCN model.
From the above discussions, we understand that the forecast performance of the fusion model is more effective than the other comparative models. This is because traditional persistence forecasting assumes that the ocean variations occur in a purely linear manner over an infinite period of time, which is inconsistent with the real variations in the ocean environment, so the predictions are only valid for shorter periods of time; the linear regression method describes the processes of ocean change in terms of linear evolution over a finite period of time, which cannot be well described for the nonlinear signal; OCN model results generally represent a multiyear average, which describes the largescale periodic signal of the ocean, and therefore, they are insufficient to describe the intraseasonal variability of oceanic multiscale processes. However, the MEEOF prediction model effectively takes into account the fluctuations of the ocean, and when there is enough data, this model can effectively capture both the fluctuating and multiscale signals of the ocean. The fusion model combines the advantages of persistence forecasting in the short term and the advantages of MEEOF forecasting in the medium and long term, so it performs well on an intra-seasonal scale.

Forecast Example
Here, to further illustrate the effectiveness of the fusion model in forecasting the trends of ocean variables, we show the time-station map for SLA and SST. The red dots in Figure 12 represent the locations of the sampling sites-a total of 52 sampling points, extending from northeast to southwest in the SCS-as shown in this figure. Figure 13 shows the temporal evolution of SLA and SST for the sampling sites, with a forecast window of 90 days, i.e., from 9 May 2018 to 6 August 2018; (a) and (b) indicate the temporal evolution of SLA and SST, respectively, where the horizontal coordinates indicate the sampling sites and the vertical coordinates indicate the forecast time. It can be seen from (a) that the prediction results of the fusion model are more consistent with the propagation trend of the waves in the true field, i.e., the propagation of the ocean waves is obvious. In addition, since the fusion model considers the coordinating relationship between SLA and SST, the coordination variations between these two sea surface variables can also be seen in Figure 13.

Discussion
In the SCS, which is a semi-enclosed basin, the internal dynamic processes play a much greater role than the external drivers for its intra-seasonal time scale evolutionary process, so the evolution of eddies and the propagation of ocean waves in the SCS follow their own evolutionary laws. In addition, the remote sensing data used in this study not only represent the current state of the ocean, but also imply its evolutionary process and the influence of other external driving forces. More importantly, unlike other linear methods, the MEEOF model in this study takes into account the evolution of oceanic fluctuations. Based on the above factors, we can see why the MEEOF model performs well in intra-seasonal scale forecasts.
In addition, through experiments using code performance, we find that this prediction model also has the advantages of simplicity and miniaturization. The time spent for the EEOF decomposition in this study is 483.330 s, which is much less than the running time of the numerical model forecasts. It is worth mentioning that in the practical operational prediction process, we only need to save the MEEOFs obtained from the above decomposition and project the observed data onto them, and the time for projection and prediction is much smaller than the time for the above decomposition process. Moreover, this forecasting model can be installed on any common workstation computer, with high portability. For operational forecasting, we can also directly use the error parameters of the persistence and MEEOF models in this study to construct the fusion model, since these errors are statistically significant. The above illustrates the advantages of this forecasting model for operational applications.
Finally, this prediction model has strong applicability. In addition to the sea surface prediction in this study, it can also be used in for predictions in the three-dimensional ocean, atmosphere, groundwater, etc.
However, there are still some limitations of this prediction model that need to be further discussed and addressed. First, it is worth noting that any data-driven model is directly related to the "amount" of the data, and the "amount" of the data we mentioned here consists of two parts: one is the number of variables and the other is the length of the dataset. Since we use satellite remote sensing data to construct the model in this study, the time length of the data and the number of variables are relatively small, and the system formed by them is not close enough, so the accuracy of the prediction results obtained by this model is not greatly improved. In addition, we do not consider the influence of the external drivers, such as atmosphere; thus, our model unable to make accurate forecasts under the abrupt oceanic conditions. Therefore, in subsequent studies, we will further improve the model by using long-time series ocean and atmosphere reanalysis data, considering the influence of external drivers such as the atmosphere of the ocean, and constructing a fusion model in an ocean-atmosphere coupled frame, so as to devise a more accurate set of spatiotemporal basis functions and improve the prediction accuracy.

Conclusions
In this study, we propose a simple model based on the traditional EEOF analysis to predict multiple sea surface variables simultaneously for intra-seasonal predictions, which is referred to as the MEEOF model. This model not only accounts for the spatial and temporal correlations across grid points, but also considers the dynamic balance between different sea surface variables, consistent with the real variations of the marine environment. Moreover, it embeds the temporal information into the spatial patterns, which effectively captures the spatial distribution and temporal evolution of the sea surface variables.
To evaluate the model's forecast performance in the SCS, we conduct daily forecast experiments for SLA and SST. To ensure the reliability of the experimental results, we perform a round-robin experiment, i.e., the EEOF analysis is performed for any 25 years of data and then validated with another year of data. To better illustrate the superiority of the MEEOF model, persistence forecasting and the OCN model are used as comparisons, and RMSE and CC are used as the evaluation criteria; the comparison results show that the MEEOF model has a stable forecast performance in the forecast range of 15-90 days, and the rising trend of error is much slower. At the end of the forecast window, the RMSE and CC of the MEEOF model for SLA are 0.0703 m and 0.54, respectively; the RMSE and CC for SST are 0.680 °C and 0.66, respectively, which are better results than those using the persistence and OCN models.
Furthermore, to ensure that the MEEOF model has good forecasting performance in the short-to mid-and long-term, we adopt a weighted average form of persistence forecasts and the MEEOF forecasts for the final predictions, called the fusion model. It not only inherits the advantages of the MEEOF model in the mid-and long-term forecasts, but also rivals the persistence prediction model in the short-term forecasts. The fusion model also outperforms the linear regression model throughout the forecast period. Data Availability Statement: SLA data is available in the Copernicus Marine Environment Monitoring Service (CMEMS) (http://marine.copernicus.eu), and SST data is available in the NOAA (https://www.ncdc.noaa.gov/oisst).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Choice of MEEOFs
Here, we design experiments to determine the optimal number of MEEOFs to achieve the best fitting performance to the observational data. Therefore, we conduct the fitting experiments on the independent testing set. Taking the 90-day fitting as an example, different amounts of MEEOFs are used to fit the data on the independent testing set. Figure  A1 in Appendix A shows the statistical results of the fitting mean absolute error of the SSHA and SST, respectively, both of which are the average results of 90 days. As can be seen in Figure A1, 24 MEEOFs are optimal for fitting (the number of MEEOFs that minimizes the error in the cross-validation), and the variance proportion of these 24 MEEOFs is 100%. The above conclusion is held for each experiment. When the number of MEEOFs is 24, the fitting mean absolute errors of SSHA and SST within 90 days minimize the expected error to 0.0451 m and 0.484 °C , respectively.
In fact, the number of MEEOFs has a significant influence on both the fitting performance and the prediction performance. As the number of MEEOFs increases, the total proportion of spatiotemporal modes increases, the interpretable variance becomes larger, and the phase space formed by these MEEOFs becomes more complete. In other words, the more MEEOFs we use, the better the interpretation of the spatial characteristics and the temporal evolution process of the sea surface variables. Notably, in each experiment, all of the 24 MEEOFs passed the significance test [42].