2. Data and Methods
2.1. Data
The sea level data in this paper are obtained from the Zhapo Marine Station in Jiangcheng District, Yangjiang City, Guangdong Province (Figure 1). The Zhapo Marine Station is located in the northern part of the South China Sea and was established in 1957. It is recognized as one of the earliest tide gauge stations in China. The sea level data provided by this station are considered highly reliable for the northern South China Sea region. The sea level data presented in this paper cover the period from 1970 to 2021 and represent the longest instrumental record of sea level variations currently available in the northern South China Sea. As shown in Figure 2, the maximum sea level in the northern South China Sea over the past 52 years is 2238 mm, the minimum is 2069 mm, the average is 2134 mm, and the range (the difference between the maximum and minimum values) is 169 mm. Figure 2 also shows that the sea level in the northern South China Sea exhibits three characteristics, as follows:
- (1)
Trend. The sea level in the northern South China Sea shows a clear upward trend.
- (2)
Fluctuation. A series of interannual and interdecadal fluctuations are superimposed on the upward trend of sea level, which indicates the periodicity of sea level changes.
- (3)
Instability. The sea level fluctuations are unstable. There are particularly small fluctuations in certain periods such as 1979–1998, as well as particularly intense fluctuations in other periods such as 2000–2014.
Figure 1.
Schematic diagram of location of Zhapo Marine Station (marked by red five-pointed star).
Figure 2.
Sea level variation of northern South China Sea from 1970 to 2021.
2.2. Harmonic Analysis
Harmonic analysis is a time-series analysis method; its core idea is to regard a complex time series as the result of the superposition of many harmonic vibrations with different frequencies. Mathematically, the Discrete Fourier Transform involves expressing a complex periodic function as a series of simple trigonometric functions, as shown in Equation (1).
where A0, An, and φn are all constants.
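For reference, one common amplitude-phase form of such a series, consistent with the constants A0, An, and φn named above, is sketched below. The fundamental angular frequency ω and the truncation order N are assumed symbols that do not appear in the text, and the source may use a cosine rather than a sine convention.

```latex
f(t) = A_0 + \sum_{n=1}^{N} A_n \sin\left(n \omega t + \varphi_n\right)
```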
Harmonic analysis can extract periodic signals in a time series, helping us to understand the causes of time-series fluctuations and further analyze the driving mechanisms of the time series. Usually, only the few most prominent spectral peaks are retained when selecting harmonics, and the remaining harmonics are regarded as random phenomena. The goodness of fit can be judged based on the share of variance explained by each harmonic. Prediction can be achieved by extending the fitted harmonics beyond the end of the time series.
The raw sea level data (1970–2021) contain long-term trend components and periodic fluctuations. In this study, a linear fitting method is used for detrending, with the specific steps as follows:
- (1)
Trend component extraction: Linear regression is performed on the raw data using the least squares method to obtain the trend equation, in which the independent variable is the year and the dependent variable is the sea level height of the trend component. The trend component explains approximately 78% of the variation in the data.
- (2)
Residual series generation: The trend component is subtracted from the raw data to generate the detrended residual series, which retains the periodic fluctuation components for subsequent harmonic analysis.
- (3)
Data validation: The effectiveness of trend extraction is validated by calculating the standard deviations of the data before and after detrending (28.5 mm for raw data and 18.2 mm for detrended data), indicating that the detrending process successfully separates the aperiodic components.
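The three detrending steps above can be sketched as follows. This is a minimal illustration, assuming an annual series and using synthetic data in place of the Zhapo record; the function name `linear_detrend` and all numerical values in the synthetic series are hypothetical.

```python
import numpy as np

def linear_detrend(years, levels):
    """Least-squares linear trend; returns slope, fitted trend, residuals, and R^2."""
    years = np.asarray(years, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # Centre the years so the fit is numerically well conditioned.
    t = years - years.mean()
    slope, intercept = np.polyfit(t, levels, deg=1)
    trend = intercept + slope * t
    residual = levels - trend
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((levels - levels.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return slope, trend, residual, r2

# Synthetic stand-in for a 1970-2021 annual series (NOT the observed data).
rng = np.random.default_rng(0)
years = np.arange(1970, 2022)
levels = (2100.0 + 1.4 * (years - 1970)
          + 15.0 * np.sin(2 * np.pi * (years - 1970) / 13.0)
          + rng.normal(0.0, 5.0, years.size))

slope, trend, residual, r2 = linear_detrend(years, levels)
print(f"slope = {slope:.2f} mm/year, R^2 = {r2:.2f}")
print(f"std raw = {levels.std(ddof=1):.1f} mm, "
      f"std detrended = {residual.std(ddof=1):.1f} mm")
```

Comparing the two standard deviations mirrors the validation in step (3): the detrended series should vary less than the raw series when a real trend is present.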
The sea level time series contains periodic variations of multiple time scales, among which the most obvious and significant for research are seasonal variations and multi-year variations. From the variation curve of sea level, it can be seen that its fluctuations are relatively regular and can be approximately fitted and represented using cosine or sine functions.
In this representation, H(sl) represents sea level height, and t represents time.
In practical calculations, harmonic theory provides objective and accurate formulas for calculating the phase and amplitude, as follows:
In these formulas, the sea level term is the annual mean sea level, n represents the data length, and t represents the year.
According to the theory of least squares, the expressions for A and B can be obtained as follows:
After obtaining the values of A and B, the phase can be calculated using Equation (6).
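A least-squares harmonic fit of this kind can be sketched as below. This is an illustration under stated assumptions, not the paper's exact formulas: it fits a single harmonic of known period to a detrended series, the functions `fit_harmonic` and `predict_harmonic` are hypothetical names, the 13-year period and the data are synthetic choices, and the phase convention (cosine with a subtracted phase) may differ from the one used in Equation (6).

```python
import numpy as np

def fit_harmonic(t, y, period):
    """Least-squares fit of y ~ A*cos(w*t) + B*sin(w*t), with w = 2*pi/period."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 2.0 * np.pi / period
    design = np.column_stack([np.cos(w * t), np.sin(w * t)])
    (A, B), *_ = np.linalg.lstsq(design, y, rcond=None)
    amplitude = np.hypot(A, B)
    # Assumed convention: y ~ amplitude * cos(w*t - phase).
    phase = np.arctan2(B, A)
    return A, B, amplitude, phase

def predict_harmonic(t, period, amplitude, phase):
    """Evaluate the fitted harmonic at arbitrary (including future) times."""
    w = 2.0 * np.pi / period
    return amplitude * np.cos(w * np.asarray(t, dtype=float) - phase)

# Synthetic detrended series: one 13-year harmonic, no noise.
t = np.arange(52.0)
y = 12.0 * np.cos(2.0 * np.pi * t / 13.0 - 0.7)

A, B, amplitude, phase = fit_harmonic(t, y, period=13.0)
future = predict_harmonic([60.0, 61.0], 13.0, amplitude, phase)
print(f"A = {A:.3f}, B = {B:.3f}, amplitude = {amplitude:.3f}, phase = {phase:.3f}")
```

Extending the time axis passed to `predict_harmonic` is how a fitted harmonic model produces forecasts; several harmonics can be fitted jointly by adding one cosine and one sine column per period to the design matrix.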
2.3. GA-BP Neural Network
The BP neural network is prone to becoming trapped in local minima when handling nonlinear problems, whereas the global search capability of genetic algorithms (GAs) can optimize initial weights and network structures. In this study, the initial weights and structure of the BP neural network are optimized by GAs to prevent traditional BP neural networks from converging to local minima while improving the model’s convergence speed and generalization ability. Sea level variations in the northern South China Sea are characterized by strong nonlinearity due to complex influencing factors such as monsoons and typhoons. By avoiding local optima, GA-BP enhances the model’s fitting capability for complex nonlinear relationships, and it is therefore considered particularly suitable for simulating the inherent complex nonlinear relationships in sea level variations of the northern South China Sea. A schematic diagram of the GA-BP neural network is illustrated in Figure 3.
Figure 3 is the original diagram, showing the GA-BP neural network process. The process is divided into the BP neural network section (orange area on the left) and the genetic algorithm (GA) section (blue-green area on the right).
The BP section starts from ‘Start’. First, we determine the network topology, which fixes the basic architecture of the neural network, such as the number of neurons in the input layer, hidden layer, and output layer. We randomly generate the initial weights of the BP neural network, which are continuously adjusted during subsequent training. Next, we check whether the optimal weights and biases found by the genetic algorithm have been received. If so, we decode them and apply them to the BP neural network; if not, we proceed to the next step. We then perform the forward calculation of the BP neural network output from the input data and current weights, and obtain the output error by comparing the output with the actual value. If sufficient accuracy has been achieved, the training ends; otherwise, we adjust the connection weights backward, including the weights between the output layer and the hidden layer and those between the input layer and the hidden layer. Then, we check whether the number of training iterations has reached the required level. If so, the training ends; if not, we continue with the forward calculation and the subsequent steps.
The GA section proceeds as follows. First, we encode the initial weights and other parameters of the BP neural network into genes that can be processed by the genetic algorithm, forming the initial population. We determine the fitness function and calculate the fitness value, which is used to evaluate the quality of each individual (i.e., a set of parameters) in the population. We then check whether the optimization criteria are met. If so, we pass the optimal solution to the BP neural network; if not, we perform a selection operation to select the better individuals. We perform a crossover operation to exchange and recombine the genes of the selected individuals, producing new individuals. We perform a mutation operation to make random changes to an individual’s genes with a small probability, increasing the diversity of the population. We generate a new population and repeat the fitness calculation, selection, crossover, and mutation until the optimization criteria are met.
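The two-stage procedure described above (a GA search over encoded weights, followed by backpropagation from the GA's best individual) can be sketched as follows. This is a simplified illustration, not the implementation behind Figure 3: the topology, the truncation selection, the uniform crossover, the mutation scale, and the synthetic data are all assumptions, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N_IN, N_HID = 3, 4                            # hypothetical topology
N_GENES = N_IN * N_HID + N_HID + N_HID + 1    # all weights and biases, flattened

def decode(genes):
    """Unpack a flat gene vector into weights and biases."""
    i = N_IN * N_HID
    W1 = genes[:i].reshape(N_IN, N_HID)
    b1 = genes[i:i + N_HID]
    W2 = genes[i + N_HID:i + 2 * N_HID]
    b2 = genes[i + 2 * N_HID]
    return W1, b1, W2, b2

def forward(genes, X):
    W1, b1, W2, b2 = decode(genes)
    return np.tanh(X @ W1 + b1) @ W2 + b2

def mse(genes, X, y):
    return float(np.mean((forward(genes, X) - y) ** 2))

def ga_search(X, y, pop_size=5, generations=50, mut_rate=0.1):
    """Selection, crossover, and mutation over gene vectors, with elitism."""
    pop = rng.normal(0.0, 1.0, size=(pop_size, N_GENES))
    history = []
    for _ in range(generations):
        errors = np.array([mse(ind, X, y) for ind in pop])
        history.append(float(errors.min()))
        order = np.argsort(errors)
        parents = pop[order[:max(2, pop_size // 2)]]   # truncation selection
        children = []
        while len(children) < pop_size - 1:
            pa, pb = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(N_GENES) < 0.5           # uniform crossover
            child = np.where(mask, pa, pb)
            mutate = rng.random(N_GENES) < mut_rate    # low-probability mutation
            children.append(child + mutate * rng.normal(0.0, 0.3, N_GENES))
        pop = np.vstack([pop[order[0]]] + children)    # keep the best individual
    errors = np.array([mse(ind, X, y) for ind in pop])
    history.append(float(errors.min()))
    return pop[np.argmin(errors)].copy(), history

def bp_refine(genes, X, y, lr=0.01, epochs=1000, tol=1e-6):
    """Plain gradient descent (backpropagation) from the GA-selected weights."""
    g = genes.copy()
    n = len(y)
    for _ in range(epochs):
        W1, b1, W2, b2 = decode(g)
        h = np.tanh(X @ W1 + b1)
        err = h @ W2 + b2 - y
        if np.mean(err ** 2) < tol:
            break
        dh = np.outer(err, W2) * (1.0 - h ** 2)
        grad = np.concatenate([
            (2.0 * X.T @ dh / n).ravel(),   # input-to-hidden weights
            2.0 * dh.mean(axis=0),          # hidden biases
            2.0 * h.T @ err / n,            # hidden-to-output weights
            [2.0 * err.mean()],             # output bias
        ])
        g = g - lr * grad
    return g

# Synthetic data for illustration only.
X = rng.uniform(-1.0, 1.0, size=(40, N_IN))
y = np.sin(X.sum(axis=1))
ga_genes, history = ga_search(X, y)
bp_genes = bp_refine(ga_genes, X, y)
print(f"GA best MSE: {history[0]:.4f} -> {history[-1]:.4f}")
print(f"MSE after BP refinement: {mse(bp_genes, X, y):.4f}")
```

Carrying the best individual unchanged into each new generation (elitism) is a design choice made here so that the GA's best error never worsens; the text does not state whether the original model does this.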
The maximum number of iterations is 1000, the error threshold is 10⁻⁶, the learning rate is 0.01, the number of genetic generations is 50, and the population size is 5. The numbers of iterations and genetic generations are determined through preliminary experiments. In these experiments, beyond 1000 iterations the reduction in model error (RMSE) becomes less than 1%, and continued training tends to cause overfitting. An error threshold of 10⁻⁶ is commonly adopted in machine learning model training to evaluate convergence accuracy. In geophysical prediction fields (such as sea level change), this threshold ensures that the deviation between model predictions and true values remains within the acceptable millimeter range. Combining the 1000-iteration limit with the 10⁻⁶ error threshold helps prevent both underfitting due to insufficient iterations and overfitting caused by excessive iterations. A learning rate of 0.01 is a classical empirical value in neural network training, suitable for most gradient descent algorithms. This value keeps parameter updates large enough to avoid overly slow convergence while preventing the oscillation or divergence that results from excessively large steps. The population size is set to 5 because a smaller population reduces computational load, while the crossover and mutation operations in the GA maintain solution diversity.
2.4. RBF Neural Network
The single-hidden-layer structure and local approximation characteristics of Gaussian functions in RBF neural networks enable their training speed to be significantly accelerated compared with other networks, making them suitable for rapid modeling. In the northern South China Sea where typhoons occur frequently, high-frequency noise is prominently observed in the data. The local learning capability of RBF networks is demonstrated to reduce noise interference, while their single-hidden-layer structure ensures fast training speed, allowing for rapid capture of short-term sea level fluctuations (such as transient changes induced by typhoons). This makes the RBF particularly suitable for real-time prediction scenarios. Additionally, RBF neural networks are characterized by simple structures and minimal parameter adjustment requirements. These features render them highly suitable for predicting sea level variations in the northern South China Sea where available data are limited.
Figure 4 is a diagram of the architecture of the radial basis function (RBF) neural network. The RBF neural network is composed of an input layer, a hidden layer, and an output layer.
The input layer has d input nodes, labeled as x1, x2, …, xd−1, xd. These nodes receive the input data.
The hidden layer contains H radial basis function nodes, labeled as φ1, φj (where j represents any hidden layer node), and φH. The hidden layer nodes process the input data through radial basis functions. The connection weights between the input layer and the hidden layer nodes are represented by Uji and U = XT (where X is the input data matrix). The output of the hidden layer nodes is represented by yj.
The output layer has c output nodes, labeled as z1, zk (where k represents any output layer node), and zc. The connection weights between the hidden layer and the output layer nodes are represented by wjk. The input to each output layer node is calculated through netk, and finally, the output values are produced.
The spread of the RBF is 1000. The spread parameter, which determines the width of the Gaussian functions, is tested with different values (100, 500, 1000, 2000) in preliminary experiments, and a value of 1000 is ultimately selected based on the test set error.
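The structure described above, together with the spread selection by held-out error, can be sketched as follows. This is a minimal illustration under stated assumptions: the Gaussian width convention, the fixed grid of centers, the output weights solved by linear least squares, the candidate spreads, and the synthetic data are all choices made here, and the spread values are not comparable to those in the text because the inputs are scaled differently.

```python
import numpy as np

def rbf_design(X, centers, spread):
    """Hidden-layer outputs: exp(-||x - c||^2 / (2 * spread^2)) for each center c."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * spread ** 2))

def fit_rbf(X, y, centers, spread):
    """Solve the hidden-to-output weights (plus a bias) by linear least squares."""
    Phi = np.column_stack([rbf_design(X, centers, spread), np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbf(X, centers, spread, w):
    Phi = np.column_stack([rbf_design(X, centers, spread), np.ones(len(X))])
    return Phi @ w

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic one-dimensional example (illustration only).
x = np.linspace(0.0, 1.0, 60)[:, None]
y = np.sin(2.0 * np.pi * x[:, 0])
train, test = slice(0, None, 2), slice(1, None, 2)   # interleaved split
centers = np.linspace(0.0, 1.0, 10)[:, None]

# Choose the spread with the lowest held-out error among hypothetical candidates.
candidates = [0.05, 0.1, 0.2, 0.5]
errors = {}
for s in candidates:
    w = fit_rbf(x[train], y[train], centers, s)
    errors[s] = rmse(predict_rbf(x[test], centers, s, w), y[test])
best_spread = min(errors, key=errors.get)
print(f"selected spread: {best_spread}")
```

Because only the output weights are fitted and the fit is linear, training reduces to one least-squares solve, which is the source of the fast training noted for RBF networks.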
Figure 4.
RBF neural network.
2.5. LSTM Neural Network
The LSTM neural network is specifically designed to capture long-term trends (e.g., interannual rising trends driven by global warming) and periodic patterns (e.g., seasonal cycles, ENSO event influences) in South China Sea level variations through its gated mechanisms (forget gate, input gate, output gate). This architecture is naturally suited for time-series prediction, as it can effectively process long-term observational data from the region, including satellite altimeter records and tide gauge measurements. The LSTM learns temporal patterns in the data automatically, without manual feature engineering, and its robustness in handling noise and missing values has been well demonstrated, enabling adaptation to imperfect real-world observational data. Because the northern South China Sea level dataset contains multiple periodic components, this capability makes the LSTM particularly suitable for processing such multi-scale periodic signals. A schematic diagram of the LSTM neural network is shown in Figure 5.
This figure shows the structure of the long short-term memory (LSTM) neural network. LSTM is a special type of recurrent neural network (RNN) designed to address the vanishing gradient problem that traditional RNNs face when dealing with long-sequence data.
The main components in the figure are as follows:
- (1)
Input: xt represents the input data at time step t. It is taken as input together with the hidden state ht−1 from the previous time step.
- (2)
Cell State: ct−1 and ct represent the cell states of the previous time step and the current time step, respectively. The cell state is like a conveyor belt that passes information throughout the sequence. It has only a few linear interactions, which helps in maintaining long-term dependencies of information.
- (3)
Gate Mechanisms
Forget Gate: Represented by ft, it determines which information to discard from the cell state ct−1. ft is calculated through a σ (sigmoid) function, and the output value ranges from 0 to 1, where 0 means completely discarding and 1 means completely retaining.
Input Gate: Represented by it, it determines which new information to add to the cell state. It is also calculated through the σ function. c̃t is a candidate value generated through the tanh function, which is used to update the cell state.
Output Gate: Represented by ot, it determines which information from the cell state will be output as the hidden state ht at the current time step; ot is also calculated through the σ function, and then it is multiplied by the cell state value processed by tanh to obtain ht.
- (4)
Hidden State: ht−1 and ht are the hidden states of the previous time step and the current time step, respectively. The hidden state is not only passed to the next time step but also used to generate the output.
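The components listed above combine into a single time-step update, sketched below in the standard LSTM formulation. This is an illustration rather than the trained model: the stacked weight layout, the dimensions, and the random weights are assumptions, and `lstm_step` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*H, H + D) and maps the concatenation [h_prev, x_t] to the
    pre-activations of the forget gate, input gate, candidate, and output gate.
    """
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])              # forget gate: what to keep from c_prev
    i_t = sigmoid(z[H:2 * H])          # input gate: how much new information to add
    c_tilde = np.tanh(z[2 * H:3 * H])  # candidate cell values
    o_t = sigmoid(z[3 * H:4 * H])      # output gate
    c_t = f_t * c_prev + i_t * c_tilde # cell state update
    h_t = o_t * np.tanh(c_t)           # hidden state
    return h_t, c_t

# Run a short synthetic sequence through the cell with random weights.
rng = np.random.default_rng(1)
D, H = 1, 4                            # hypothetical input and hidden sizes
W = rng.normal(0.0, 0.5, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for value in [0.1, 0.3, -0.2, 0.5]:
    h, c = lstm_step(np.array([value]), h, c, W, b)
print(h.shape, c.shape)
```

The cell state update is additive in `c_prev`, which is the "conveyor belt" behaviour described in item (2) and the reason gradients can survive across many time steps.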
The network is trained with the Adam optimizer for a maximum of 1200 epochs. The initial learning rate is 0.005 and the learning rate reduction factor is 0.1, so that after 800 epochs the learning rate is decayed from 0.005 to 0.0005 to prevent excessively slow convergence in later stages. Combining the Adam optimizer with learning rate decay balances convergence speed and prediction accuracy, an approach that is particularly suitable for LSTM networks processing non-stationary time-series data.
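The piecewise learning rate schedule described here can be written as a small function. This is a sketch assuming zero-based epoch counting with the drop taking effect at epoch 800; the text does not specify the exact boundary, and `learning_rate` is a hypothetical name.

```python
def learning_rate(epoch, initial_lr=0.005, drop_factor=0.1, drop_epoch=800):
    """Piecewise-constant schedule: one multiplicative drop at drop_epoch."""
    if epoch >= drop_epoch:
        return initial_lr * drop_factor
    return initial_lr

# Learning rate at a few epochs of a 1200-epoch run.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 799, 800, 1199)}
print(schedule)
```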
4. Sea Level Height Prediction Model Based on Artificial Neural Network
During the process of establishing prediction models using the GA-BP, RBF, and LSTM neural networks, the dataset is first divided into training and testing sets in an 11:2 ratio. Then, the dataset is normalized: each variable value is mapped to a normalized value by the normalization equation. The final predicted value is obtained by applying the de-normalization equation to the normalized model output.
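The data preparation described above can be sketched as follows. Two assumptions are made that the text does not confirm: the 11:2 split is taken chronologically (which gives 44 training and 8 testing years for a 52-year series), and the normalization is min-max scaling to [0, 1] with bounds taken from the training set. The function names and the synthetic series are hypothetical.

```python
import numpy as np

def chronological_split(series, train_parts=11, test_parts=2):
    """Split a series in order, keeping the earliest values for training."""
    n_train = len(series) * train_parts // (train_parts + test_parts)
    return series[:n_train], series[n_train:]

def normalize(x, lo, hi):
    """Min-max scaling (assumed form of the normalization equation)."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def denormalize(x_norm, lo, hi):
    """Inverse of normalize: map model output back to physical units."""
    return np.asarray(x_norm, dtype=float) * (hi - lo) + lo

# Synthetic 52-value annual series (illustration only, in mm).
rng = np.random.default_rng(3)
series = 2100.0 + 1.4 * np.arange(52) + rng.normal(0.0, 10.0, 52)

train, test = chronological_split(series)
lo, hi = float(train.min()), float(train.max())   # bounds from training data only
train_norm = normalize(train, lo, hi)
test_norm = normalize(test, lo, hi)
restored = denormalize(test_norm, lo, hi)
print(len(train), len(test))
```

Taking the bounds from the training set alone avoids leaking information from the testing period into the model, at the cost of normalized test values that may fall slightly outside [0, 1].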
Figure 11 shows the prediction results for the training and testing sets of the sea level height prediction model based on the GA-BP neural network. The accuracy and robustness of this model are evaluated: the RMSE is 29.1371, the MAE is 24.9411, the MBE is 5.6809, and the R² is 0.4003.
Figure 12 shows the prediction results for the training and testing sets of the sea level height prediction model based on the RBF neural network. The accuracy and robustness of this model are evaluated: the RMSE is 27.1433, the MAE is 22.7533, the MBE is 2.1322, and the R² is 0.4690.
Figure 13 shows the prediction results for the training and testing sets of the sea level height prediction model based on the LSTM neural network. The accuracy and robustness of this model are evaluated: the RMSE is 23.7929, the MAE is 19.7899, the MBE is 1.3700, and the R² is 0.5872.
Figure 14 and Figure 15 show the relationship between residuals and time, and the distribution of residuals in the sea level prediction model based on harmonic analysis, respectively.
Figure 16 and Figure 17 show the relationship between residuals and time, and the distribution of residuals in the sea level prediction model based on the GA-BP neural network, respectively.
Figure 18 and Figure 19 show the relationship between residuals and time, and the distribution of residuals in the sea level prediction model based on the RBF neural network, respectively.
Figure 20 and Figure 21 show the relationship between residuals and time, and the distribution of residuals in the sea level prediction model based on the LSTM neural network, respectively. The temporal variation of residuals can reveal periodic or systematic biases (e.g., long-term drift or transient anomalies). As shown in Figure 14, Figure 16, Figure 18 and Figure 20, the harmonic analysis model exhibits the smallest residual range (−36.8372 to 24.8940 mm), followed by the LSTM neural network model (−59.1870 to 44.7200 mm) and the RBF neural network model (−62.4861 to 60.0700 mm). The largest residual fluctuation is seen in the GA-BP model (−57.1122 to 71.2489 mm), suggesting that overfitting or sensitivity to extreme values may be present in this model. The probability distributions of residuals are presented in Figure 15, Figure 17, Figure 19 and Figure 21. The results indicate that the residuals of harmonic analysis approximately follow a normal distribution (standard deviation: 14.8300 mm), though the slightly negative mean (−0.2889 mm) implies a potential minor negative bias. In contrast, the residual distributions of the neural network models are more dispersed (standard deviation: 23–29 mm), with the GA-BP model showing the largest standard deviation and mean values, indicating that this prediction model has the poorest robustness.
Table 4 presents the residual statistical parameters for the harmonic analysis, GA-BP neural network, RBF neural network, and LSTM neural network models. The residual statistics in Table 4 reveal significant performance differences among the four modeling approaches for sea level prediction in the northern South China Sea. Harmonic analysis demonstrates superior stability with the smallest residual range, lowest standard deviation and RMS residual, and minimal systematic bias. Among the neural networks, the LSTM model shows the best performance with a relatively narrow residual range and lower standard deviation and RMS residual, suggesting its effectiveness in handling temporal patterns. The RBF network exhibits intermediate results with a residual range and standard deviation between those of the LSTM and GA-BP models. In contrast, the GA-BP network displays the weakest performance, characterized by the widest residual range, highest standard deviation and RMS residual, and largest systematic bias. These findings quantitatively demonstrate that while harmonic analysis provides the most reliable predictions, the LSTM neural network offers the best neural network alternative, particularly for applications requiring complex temporal pattern recognition. The substantial performance gaps, especially evident in residual ranges varying from 61.7311 mm to 128.3611 mm, underscore the critical importance of model selection for accurate sea level prediction in this region.
This section establishes sea level height prediction models based on the GA-BP, RBF, and LSTM neural networks. The statistical parameters of the neural network prediction models are shown in Table 5. We use four statistical indicators, RMSE, MAE, MBE, and R², to evaluate the accuracy and robustness of each model; their calculation formulas are given in Formulas (12)–(15). The results indicate that the harmonic analysis model achieves higher accuracy and robustness than the three neural network models.
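The four indicators can be computed as in the sketch below, which uses their standard definitions. Two conventions are assumptions, since Formulas (12)–(15) are not reproduced here: the error is taken as predicted minus observed (so a positive MBE means overprediction), and R² is computed as one minus the ratio of residual to total sum of squares. The function name and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(observed, predicted):
    """Return RMSE, MAE, MBE, and R^2 for paired observations and predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs                       # assumed sign convention
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mbe = float(np.mean(err))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((obs - obs.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "MBE": mbe, "R2": r2}

# Small made-up example (values in mm).
observed = [2100.0, 2110.0, 2120.0, 2130.0]
predicted = [2102.0, 2108.0, 2123.0, 2127.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE measure the size of the errors, MBE measures systematic over- or underprediction, and R² measures the share of observed variance the model reproduces, so the four together separate accuracy from bias.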
Figure 22 shows the predicted sea level heights for the next decade (2023–2033) based on the harmonic analysis model. The results show that the rising trend of sea level will continue, and the sea level height is expected to increase to about 2250–2300 mm by 2033. This prediction is consistent with the upward trend observed in the historical data, highlighting the importance of conducting long-term sea level monitoring and disaster risk assessment.
5. Discussion
The robustness of the harmonic analysis method is validated through three aspects. First, compared with mainstream machine learning models such as GA-BP, RBF, and LSTM, harmonic analysis outperforms them in RMSE and MAE metrics (RMSE = 14.73 vs. LSTM = 23.79), indicating its stability across different modeling frameworks. Second, the long-term prediction results based on 52 years of full-period data show no significant error accumulation over time, verifying its adaptability to long-term trends. Finally, the consistency between model parameters and physical mechanisms such as solar activity cycles (13 years) and ENSO events (2–7 years) avoids the overfitting risk of purely data-driven models, further ensuring the reliability of the prediction results.
The accuracy and robustness of sea level height prediction models established using harmonic analysis and LSTM neural networks are evaluated through statistical metrics, including the RMSE, MAE, MBE, and R². The results indicate that harmonic analysis achieves superior accuracy and robustness. This can be attributed to the fact that sea level variations are primarily driven by strongly periodic factors such as tides and astronomical forces. Harmonic analysis, which models periodic signals directly through Fourier series or harmonic functions, excels at capturing fixed-frequency components. Specifically, when the data exhibit strong periodicity and low noise, the physics-based harmonic analysis outperforms the data-driven LSTM approach.
The superior performance of the harmonic analysis model in sea level prediction compared to artificial neural network (ANN) models can be primarily attributed to the alignment between data characteristics and model principles. The 52-year observational data used in this study exhibit significant periodicity (e.g., interannual and decadal fluctuations), and harmonic analysis directly models periodic signals through Fourier series, effectively separating trend terms and multi-scale periodic components (such as 13-year and 52-year main cycles) with clear physical meanings and strong noise resistance. In contrast, although ANN models (e.g., LSTM) excel at handling nonlinear data, their “black-box” nature limits their ability to resolve strong periodic signals, and they rely on large amounts of training data, which makes them prone to overfitting or degraded generalization when samples are limited.
The identification of dominant periodic components helps reveal the driving mechanisms of regional sea level changes. For example, the 52-year period closely aligns with harmonic cycles (e.g., 55 years) of the solar magnetic cycle (approximately 11 years), suggesting that solar radiation modulation may drive long-term sea level fluctuations by influencing ocean heat budgets. The 13-year period may be related to the superposition effect of the quasi-period (approximately 2–7 years) of ENSO events, reflecting the remote impact of tropical Pacific air–sea interactions on the northern South China Sea. These periodic features provide an interpretable physical basis for prediction, helping to enhance model reliability.
The research results have important practical implications for coastal management and disaster prediction. The multi-scale periodic signals identified by harmonic analysis can be used to establish a hierarchical early warning system: for interannual cycles (such as short-term fluctuations caused by typhoon seasons), short-term forecasts can be optimized by integrating real-time tidal data; for decadal cycles (such as 13-year and 52-year trends), they can provide long-term scientific bases for coastal protection projects (e.g., breakwater design, sea level rise adaptation planning). Additionally, the high robustness of the model indicates that harmonic analysis can be a preferred prediction tool in regions with scarce data or limited real-time updates, assisting developing countries in enhancing their coastal vulnerability assessment capabilities.
Global climate warming directly leads to an increase in the Earth’s surface air temperature and ocean temperatures. Rising temperatures cause glaciers to melt, ultimately increasing the total ocean volume and leading to sea level rise. Additionally, the frequency of extreme weather events is also increasing. Tian et al. (2013) [34] compared the subtropical temperature in southern China with the global average temperature and global average sea surface temperature, finding an overall similar warming trend. The upward trend of sea levels in the northern South China Sea is roughly similar to the upward trends of global and subtropical temperatures and sea surface temperatures. Sea level rise responds positively to climate warming and sea surface temperature increase, and the changes among the three are synchronous. Li et al. (2002) point out that an important cause of sea level rise in the South China Sea is the warming of upper ocean waters, and this trend may be related to decadal-scale changes in the nearby western Pacific warm pool [35]. The South China Sea is located near the equator, and the sea level variations in its northern region are closely linked to the effects of the El Niño–Southern Oscillation (ENSO). During the El Niño events that occurred within the study period, the sea level in the South China Sea was relatively low. Since El Niño and the Southern Oscillation are companion phenomena, El Niño is inevitably accompanied by large-scale oscillations, and El Niño events play a compensatory balancing role in sea level variations.
This study focuses on the comparison between traditional harmonic analysis and classical neural networks, with Transformer and other emerging architectures not being involved. Given the strong periodic characteristics of sea level variations in the northern South China Sea, the advantages of harmonic analysis are verified through physical mechanisms. Future research directions may include the following:
- (1)
The self-attention mechanism of Transformer should be applied to long-term sea level data to capture non-periodic anomalous fluctuations (such as those influenced by typhoons and ENSO events);
- (2)
Multi-source data (e.g., satellite altimetry and GRACE gravity field) should be incorporated to develop hybrid models for improving prediction accuracy in complex scenarios.
To further improve the prediction accuracy of the model, a hybrid model combining harmonic analysis and neural networks could be constructed in the future. Specifically, the periodic components obtained from harmonic decomposition could be used as the basis, while LSTM could be employed to predict the nonlinear residual terms. By integrating the strengths of both approaches, prediction accuracy in complex fluctuating scenarios can be enhanced.
This study investigates sea level prediction using long-term tide gauge data from the Zhapo Station in the northern South China Sea. While achieving important research outcomes, we must acknowledge several limitations. The single-station data cannot fully represent the spatial heterogeneity of sea level changes across the entire northern South China Sea region, and the absence of integrated multi-source data such as satellite altimetry may affect the model’s generalizability to broader areas. The identified periodic characteristics require validation at other stations to confirm their universality, while the single data source presents challenges in resolving coupled signals between global climate change and regional ocean–atmosphere interactions. Future research will improve spatiotemporal representation by establishing multi-station observation networks, incorporating multi-source remote sensing data, and developing regionally calibrated models. These enhancements will help build a more robust regional sea level prediction system, providing more precise scientific support for comprehensive coastal zone management.
6. Conclusions
This comprehensive study of sea level variations in the northern South China Sea, based on 52 years of tidal data from the Zhapo Station (1970–2021), yields several significant scientific and practical insights. Through comparative analysis of harmonic decomposition and neural network approaches, we establish that sea levels have risen at 1.4 mm/year (95% CI: 1.1–1.7 mm/year) over the past 44 years, with the discrepancy from satellite altimetry estimates (3.5 mm/year) likely attributable to differences in observation periods and spatial scales. The superior performance of harmonic analysis, which effectively characterizes sea level variations through four main and six secondary components, challenges the prevailing assumption that complex machine learning models inherently outperform traditional methods in geophysical applications.
The scientific significance of these findings is threefold. First, the physically interpretable components of harmonic analysis provide a more reliable framework for long-term sea level forecasting than artificial neural networks in this context. Second, the identified 1.4 mm/year trend and dominant periodicities (potentially linked to ENSO and PDO) establish a critical baseline for climate change impact assessments. Third, our methodological comparison offers a rigorous benchmark for evaluating physics-based versus data-driven approaches in oceanographic studies.
These advances translate directly into practical applications for coastal management and climate adaptation. The multi-scale periodic signals enable the development of hierarchical protection systems, combining short-term early warning capabilities (e.g., for typhoon seasons) with long-term planning guidance (e.g., breakwater design based on 52-year cycles). The demonstrated accuracy of harmonic analysis supports its integration into risk assessment tools for designing adaptive infrastructure, while fisheries and maritime industries can leverage these forecasts to optimize operations and mitigate economic losses.
At the policy level, our findings provide actionable data for vulnerable coastal communities in monsoon Asia, supporting UN Sustainable Development Goal 13 (Climate Action). The 1.4 mm/year trend projection informs long-term coastal zoning and disaster preparedness planning, while the methodological framework—transferable to other tide-dominated regions like the Bay of Bengal and the Gulf of Mexico—facilitates international collaboration on coastal resilience strategies.
Future research should address the limitations of single-station data by incorporating multi-source observations (e.g., satellite altimetry, GRACE) and expanding to regional monitoring networks. This will enhance the spatial representativeness of predictions and further elucidate the complex interactions between global climate change and regional ocean–atmosphere dynamics in governing sea level variations.