1. Introduction
Marine environments are undergoing rapid transformation owing to climate change and human activities, which directly influence the formation of fishing grounds and the distribution of fishery resources [1,2]. Sea temperature and salinity are key variables that characterize the physical structure of the ocean and changes in marine ecosystems, and they are closely related to the distribution of fish schools, feeding behavior, and the resilience of resources [3]. Sea surface temperature (SST) is a primary indicator of climate change, and an increase in SST of 1 °C can result in a 9% decrease in overall catch volume and a 13% decrease in deep-sea fishery resources [4]. Climate change, particularly global warming, is shifting the distribution of fish species and relocating fishing grounds in the offshore and coastal waters of South Korea [5]. Korean offshore and coastal waters are important fishing grounds for various fisheries, such as large-scale purse seine and trawl fisheries, so analyzing and predicting the environmental factors that govern the formation of these fishing grounds is directly relevant to fishery production. Therefore, accurately determining spatiotemporal variations in environmental factors such as sea temperature and salinity is crucial for marine science and essential for establishing foundational data for fishery resource management and climate change response, and this is an area of active research [6,7].
Marine environmental data are currently generated and provided through various methods, each with distinct advantages and limitations. For example, satellite-based data such as MODIS-Aqua enable long-term monitoring of large water bodies, but their accuracy is affected by cloud cover, diurnal variation, and electromagnetic interference [8,9,10,11,12,13]. Numerical model reanalyses, based on models such as the Hybrid Coordinate Ocean Model (HYCOM), provide global marine data by assimilating satellite and in situ observations; however, their accuracy remains limited in coastal waters and near western boundary currents [14,15]. Moreover, the extent to which specific fixed-station coastal observations are incorporated into global reanalysis products is often not explicitly documented, which complicates their use as fully independent reference data. In situ observational data, such as the National Institute of Fisheries Science (NIFS) Serial Oceanographic Observations (NSO), are not subject to these errors; however, because they are collected discontinuously at fixed intervals, they do not adequately resolve spatiotemporal variation in the marine environment. Consequently, numerical models and satellite data have been used to address these limitations [16].
Previous studies have predominantly relied on satellite or numerical model data or have focused on single variables, such as SST. Studies that construct and validate high-resolution environmental fields by interpolating in situ data remain limited. Advanced reconstruction techniques currently used for marine environmental variables, such as DINEOF, Gaussian processes, and 3D/4D-Var data assimilation, have limited usability, practical applicability, and reproducibility outside specialized marine science laboratories owing to their high computational cost, complex hyperparameter tuning, and need for expert-level implementation [17,18].
Therefore, for the temporal dimension, this study adopted linear interpolation, natural cubic spline interpolation, and STL decomposition combined with PCHIP interpolation, which are transparent, readily applicable methods traditionally used for the stable reconstruction of missing values in time-series data [19,20,21]. For the spatial dimension, inverse distance weighted (IDW), kriging, and natural neighbor interpolation were selected in consideration of the observation density and the heterogeneous characteristics of the coastal–open ocean boundary, and the performance of each method was systematically compared [22,23]. Interpolation performance was evaluated using MAE, RMSE, the coefficient of determination (R2), and Pearson and Spearman correlation coefficients, separately assessing the ability to reproduce monthly spatial patterns and the long-term consistency of time-series variations [22].
In this study, spatiotemporal interpolation was applied to sea temperature and salinity using NSOs, and the resulting interpolated fields were compared with numerical model reanalysis products to examine their relative consistency and limitations. Rather than treating numerical reanalysis as an independent ground truth, this study adopts interpolation as a transparent and standalone framework for utilizing NSO data and assessing the extent to which in situ observations alone can represent marine environmental variability.
At the same time, the results provide a basis for identifying conditions under which interpolation alone may be insufficient, thereby highlighting the necessity of complementary use with physically based numerical modeling approaches. Ultimately, this framework offers empirical grounds for discussing the integrated use of in situ observations and numerical models, and for motivating the development of regionally optimized marine numerical models tailored to the Korean coastal and offshore seas.
2. Materials and Methods
2.1. Data
The NSO provides long-term, in situ observational data from sites in the offshore and coastal waters of South Korea, whereas the GOFS 3.1 reanalysis data are based on a high-resolution numerical model of the global ocean (Figure 1). In this study, NSO sea temperature and salinity data observed at depths of 0, 30, and 50 m between January 2021 and December 2023 were used for interpolation. The GOFS 3.1 reanalysis data were employed as a complementary reference to examine large-scale spatiotemporal consistency with the interpolated NSO-derived fields, rather than as a strictly independent validation dataset. The characteristics of the two datasets are summarized in Table 1.
2.1.1. NIFS Serial Oceanographic Observation
This study used NSO data provided by the NIFS. The NSO is a long-term observational survey that has been conducted six times per year since 1961 in Korean offshore and coastal waters and the East China Sea. Various marine environmental variables, such as sea temperature, salinity, dissolved oxygen, and nutrients, are currently measured at 207 sites. The data used in this study consisted of sea temperature and salinity measurements collected at each site at depths of 0, 30, and 50 m between January 2021 and December 2023. The NSO data are highly reliable because they consist of in situ observations at fixed sites accumulated over a long period. However, they are spatiotemporally discontinuous, with large distances between sites and an observation frequency of only six times per year. Consequently, their usability is lower than that of satellite-based remote sensing data and numerical model reanalyses, which has limited their ability to describe the spatially continuous marine environment. Therefore, interpolation was used to reconstruct this dataset as spatiotemporally continuous marine environmental data.
2.1.2. Global Ocean Forecast System 3.1
Global Ocean Forecast System version 3.1 (GOFS 3.1) reanalysis data were used for comparison and supplementary analysis. GOFS 3.1 is a global ocean numerical model system based on the Hybrid Coordinate Ocean Model (HYCOM). It has a horizontal resolution of 0.08° (longitude) × 0.04° (latitude) and more than 40 vertical levels. This study used sea temperature and salinity data provided at 3 h intervals. GOFS 3.1 assimilates observations from satellites, buoys, and vessels and was therefore considered suitable as spatiotemporally continuous reference data for the marine environment [24]. While GOFS 3.1 provides a physically consistent large-scale background, the assimilation status of the NSO data used in this study is not explicitly documented. Therefore, GOFS 3.1 was employed not as a strictly independent validation dataset, but as a complementary reference to compare large-scale spatial and temporal consistency with the interpolated NSO-derived fields.
2.2. Time Series Interpolation
For time series interpolation, each location was defined by a longitude–latitude pair, and the observations at that location were arranged in ascending time order. Irregular observations were standardized to a daily time step by averaging multiple observations made on the same day, and days without observations were recorded as missing values. Interpolation was conducted independently for each site, with the observed values left unchanged so that they anchor the interpolated curve. For numerical stability, the time axis was converted to the number of days since a reference day, and no extrapolation was performed beyond the first and last observations. To construct daily continuous time series from the irregular observations, three methods were used: linear interpolation, natural cubic spline interpolation, and STL decomposition + PCHIP interpolation. All three methods preserve the observed values exactly at the observation times.
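The daily standardization and time-axis conversion described above can be sketched with pandas. This is a hypothetical illustration, not the code used in the study: the column names (`time`, `lon`, `lat`, `value`), the function name `to_daily`, and the reference day (2021-01-01, the start of the study period) are assumptions.

```python
import pandas as pd


def to_daily(obs: pd.DataFrame, ref: str = "2021-01-01") -> dict:
    """Aggregate irregular observations to a daily grid per site.

    Hypothetical input columns: 'time', 'lon', 'lat', 'value'.
    Returns {(lon, lat): (days_since_ref, daily_values)}. Days without
    observations are NaN, and no days are added before the first or after
    the last observation (no extrapolation).
    """
    obs = obs.assign(date=pd.to_datetime(obs["time"]).dt.floor("D"))
    # Several observations on the same day are averaged into one daily value.
    daily = obs.groupby(["lon", "lat", "date"])["value"].mean()

    series = {}
    for (lon, lat), s in daily.groupby(level=["lon", "lat"]):
        s = s.droplevel(["lon", "lat"])
        # Insert NaN for every unobserved day between first and last observation.
        full = s.reindex(pd.date_range(s.index.min(), s.index.max(), freq="D"))
        # Time axis expressed as the number of days since the reference day.
        days = (full.index - pd.Timestamp(ref)).days.to_numpy()
        series[(lon, lat)] = (days, full.to_numpy())
    return series
```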
2.2.1. Linear Interpolation
Linear interpolation estimates values between two adjacent data points by connecting them with a straight line. It is computationally fast and easy to implement. However, for nonlinear, highly variable data such as marine environmental variables, it may fail to capture actual changes accurately, and its accuracy declines in intervals with abrupt changes [25].
In this study, adjacent observations were connected with a straight line using Equation (1):

$$\hat{y}(t) = y_i + \frac{y_{i+1} - y_i}{t_{i+1} - t_i}\,(t - t_i), \qquad t_i \le t \le t_{i+1}, \tag{1}$$

where $t_i$ and $t_{i+1}$ are consecutive observation times and $y_i$ and $y_{i+1}$ are the corresponding observed values. Only the continuity of the function values is guaranteed across observation points, so the restoration of curved seasonal variation over long gaps is limited. However, the method is computationally simple and effective at restoring short gaps without excessive oscillation.
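Equation (1), together with the no-extrapolation rule, maps directly onto NumPy's `np.interp`, provided values outside the observed range are set to NaN rather than held constant (the `np.interp` default). A minimal sketch, assuming each site's series is a daily array with NaN on unobserved days; `linear_fill` is a hypothetical name.

```python
import numpy as np


def linear_fill(t_days: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Piecewise-linear interpolation between observed days.

    y holds NaN on days without observations. Days outside the observed
    range stay NaN (no extrapolation); observed values are unchanged.
    """
    obs = ~np.isnan(y)
    return np.interp(t_days, t_days[obs], y[obs], left=np.nan, right=np.nan)
```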
2.2.2. Natural Cubic Spline Interpolation
Natural cubic spline interpolation connects data points with piecewise cubic polynomials and maintains continuity of the first and second derivatives, producing a smooth curve. Compared with linear interpolation, it captures gradual, curved changes more realistically; however, it is more computationally demanding and is sensitive to boundary conditions [26].
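The formula used in this study is shown in Equation (2). The total interval was divided into $n$ subintervals $[t_i, t_{i+1}]$ ($i = 0, \dots, n-1$) between consecutive observation times, and a cubic polynomial was constructed for each subinterval:

$$S_i(t) = a_i + b_i (t - t_i) + c_i (t - t_i)^2 + d_i (t - t_i)^3, \qquad t_i \le t \le t_{i+1}. \tag{2}$$

Each polynomial was constrained to pass through the observed values, with continuous first and second derivatives at the interior observation points, and natural boundary conditions ($S''(t_0) = S''(t_n) = 0$) were imposed at both ends of the curve. With $h_i = t_{i+1} - t_i$, the second derivatives $M_i = S''(t_i)$ at the observation points were obtained by solving the tridiagonal system

$$h_{i-1} M_{i-1} + 2\,(h_{i-1} + h_i)\, M_i + h_i M_{i+1} = 6\left(\frac{y_{i+1} - y_i}{h_i} - \frac{y_i - y_{i-1}}{h_{i-1}}\right), \qquad i = 1, \dots, n-1.$$

The interpolation equation for each interval is as follows:

$$S_i(t) = \frac{M_i (t_{i+1} - t)^3 + M_{i+1} (t - t_i)^3}{6 h_i} + \left(\frac{y_i}{h_i} - \frac{M_i h_i}{6}\right)(t_{i+1} - t) + \left(\frac{y_{i+1}}{h_i} - \frac{M_{i+1} h_i}{6}\right)(t - t_i). \tag{3}$$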
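The tridiagonal solve for the second derivatives at the observation points (`m` in the code) and the per-interval spline formula can be written directly with SciPy's banded solver. This is a sketch with hypothetical function names, not the study's implementation.

```python
import numpy as np
from scipy.linalg import solve_banded


def natural_spline_m(t: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Second derivatives M_i = S''(t_i) of the natural cubic spline."""
    n = len(t) - 1
    h = np.diff(t)
    m = np.zeros(n + 1)  # natural boundary conditions: M_0 = M_n = 0
    if n < 2:
        return m
    slope = np.diff(y) / h
    ab = np.zeros((3, n - 1))          # banded storage for solve_banded
    ab[0, 1:] = h[1:-1]                # super-diagonal: h_i
    ab[1, :] = 2.0 * (h[:-1] + h[1:])  # main diagonal: 2(h_{i-1} + h_i)
    ab[2, :-1] = h[1:-1]               # sub-diagonal: h_{i-1}
    rhs = 6.0 * (slope[1:] - slope[:-1])
    m[1:-1] = solve_banded((1, 1), ab, rhs)
    return m


def natural_spline_eval(t, y, m, tq):
    """Evaluate the spline at query times tq; NaN outside [t_0, t_n]."""
    tq = np.asarray(tq, dtype=float)
    # Index i of the subinterval [t_i, t_{i+1}] containing each query time.
    i = np.clip(np.searchsorted(t, tq, side="right") - 1, 0, len(t) - 2)
    h = t[i + 1] - t[i]
    a, b = t[i + 1] - tq, tq - t[i]
    s = ((m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h)
         + (y[i] / h - m[i] * h / 6.0) * a
         + (y[i + 1] / h - m[i + 1] * h / 6.0) * b)
    return np.where((tq < t[0]) | (tq > t[-1]), np.nan, s)
```

In practice, `scipy.interpolate.CubicSpline` with `bc_type="natural"` fits the same natural spline and would normally be preferred; the explicit version only makes the linear system visible.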
2.2.3. STL Decomposition + PCHIP Interpolation
(1) Seasonal-Trend Decomposition using LOESS (STL)
STL decomposes a time series into seasonal, trend, and residual components. By separating long-term changes from seasonal variation, it facilitates a more precise interpretation of complex time series patterns [27].
A time series is decomposed according to Equation (4):

$$y_t = T_t + S_t + R_t, \tag{4}$$

where $T_t$ is the low-frequency trend, $S_t$ is the seasonal component, and $R_t$ is the residual. The STL implementation in the statsmodels package for Python (version 3.11.4) was used, with the annual period set to m = 365 and the robustness option enabled (robust = True). Because STL does not accept missing values, missing values were temporarily filled by linear interpolation before fitting and decomposition.
(2) Trend Interpolation using PCHIP
Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation connects data points smoothly while maintaining continuity of the first derivative (the second derivative is generally not continuous). It provides stable filling of missing values in observed data and preserves the shape of the original data: the interpolant is monotone wherever the data are monotone and does not overshoot local extremes, thereby avoiding excessive oscillation [28].
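PCHIP was applied only to the trend component $T_t$ obtained from the STL decomposition, using the observation times $t_k$ as knots. The interval slope is $\delta_k = (T_{k+1} - T_k)/h_k$ with $h_k = t_{k+1} - t_k$, and the knot slope $m_k$ is set as a weighted harmonic mean of the adjacent interval slopes:

$$m_k = \frac{w_1 + w_2}{w_1/\delta_{k-1} + w_2/\delta_k}, \qquad w_1 = 2h_k + h_{k-1}, \quad w_2 = h_k + 2h_{k-1}.$$

If $\delta_{k-1}$ and $\delta_k$ differ in sign or either is 0, then $m_k = 0$, which prevents spurious extremes. In the interval $[t_k, t_{k+1}]$, with $s = (t - t_k)/h_k$,

$$\hat{T}(t) = (2s^3 - 3s^2 + 1)\,T_k + (s^3 - 2s^2 + s)\,h_k m_k + (-2s^3 + 3s^2)\,T_{k+1} + (s^3 - s^2)\,h_k m_{k+1}. \tag{5}$$

The final restored values combine the interpolated trend and the STL seasonal component as $\hat{y}(t) = \hat{T}(t) + S_t$. At observation times, the values are reset to the observed values, $\hat{y}(t_k) = y(t_k)$, to preserve exact interpolation. This approach prevents overshooting and maintains seasonal patterns even across long gaps.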
To test the reliability of the daily sea temperature and salinity data generated using the three interpolation techniques above, the results were compared with GOFS 3.1 data. Local comparisons based on time series plots at individual sites and quantitative comparisons across all sites were performed, and the reliability of the time series data and the spatial applicability of each interpolation method were assessed.
2.3. Spatial Interpolation
To ensure the spatial continuity of the marine environmental data, three spatial interpolation techniques were applied using the Spatial Analyst tools in ArcGIS Pro 3.3 (Esri, Redlands, CA, USA, 2025). Daily data produced by the time-series interpolation were aggregated into monthly datasets, which were used as input for spatial interpolation for each month between January 2021 and December 2023. The coordinate system was converted to WGS 1984, and the output cell size was set to 0.08°, considering the spacing between observation points. The interpolation results were stored in GeoTIFF format and compared with monthly averages calculated from the numerical-model-based GOFS 3.1.
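Aggregating each site's daily series into calendar-month values, before they are passed to the spatial interpolation, might look like this in pandas. The function name and reference day are hypothetical, and this step is shown outside ArcGIS purely for illustration.

```python
import numpy as np
import pandas as pd


def monthly_means(days: np.ndarray, values: np.ndarray,
                  ref: str = "2021-01-01") -> pd.Series:
    """Average a daily series (days since `ref`) into calendar-month means."""
    index = pd.Timestamp(ref) + pd.to_timedelta(days, unit="D")
    # "MS" labels each month by its first day; NaN days are skipped by mean().
    return pd.Series(values, index=index).resample("MS").mean()
```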
2.3.1. Inverse Distance Weighted (IDW) Interpolation
IDW is a deterministic interpolation method in which each observation is weighted according to its distance from the prediction location, with larger weights assigned to closer observations [23]. The value $\hat{Z}(x_0)$ at the prediction location $x_0$ is defined as expressed in Equation (6):

$$\hat{Z}(x_0) = \frac{\sum_{i=1}^{n} d_i^{-p}\, Z(x_i)}{\sum_{i=1}^{n} d_i^{-p}}, \tag{6}$$

where $Z(x_i)$ is the observed value at point $x_i$, $d_i$ is the distance between the prediction location and observation point $x_i$, and $p$ is the distance exponent (power). In this study, the commonly adopted value of $p$ was used, as it provides a stable balance between emphasizing local influence and avoiding excessive surface roughness, particularly under the heterogeneous observation densities typical of coastal–open ocean transition zones. A global search radius was applied to ensure sufficient neighboring observations across the study area, thereby maintaining interpolation stability. IDW was selected for its computational efficiency and its ability to preserve observed spatial patterns without introducing model-based assumptions.
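Equation (6) with a global search radius (all observations contribute to every prediction) can be sketched in NumPy as below. The power p = 2 in the code is an assumed default, chosen because it is a common choice; distances are plain Euclidean distances in the input coordinates rather than ArcGIS's internal computation; and `idw` is a hypothetical name.

```python
import numpy as np


def idw(xy_obs, z_obs, xy_query, p=2.0):
    """Inverse distance weighted estimate using all observations.

    p = 2.0 is an assumed default. A query that coincides with an
    observation returns that observation's value exactly.
    """
    xy_obs = np.asarray(xy_obs, float)
    xy_query = np.asarray(xy_query, float)
    d = np.linalg.norm(xy_query[:, None, :] - xy_obs[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        w = 1.0 / d**p                 # weight d_i^{-p}; inf where d_i = 0
    hit = np.isinf(w)
    exact = hit.any(axis=1)
    w[exact] = hit[exact]              # keep only the coincident point(s)
    return (w @ np.asarray(z_obs, float)) / w.sum(axis=1)
```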
2.3.2. Kriging Interpolation
Kriging is a geostatistical method in which a variogram is estimated to model the spatial autocorrelation structure of the data, and the best (minimum-variance) linear unbiased estimator is then computed [23]. The general form of kriging is expressed in Equation (7):

$$\hat{Z}(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i), \tag{7}$$

where the weights $\lambda_i$ are determined by the fitted semivariogram model. In this study, a spherical variogram model was adopted, as it is widely applied in marine environmental studies and effectively represents spatial structures characterized by short-range autocorrelation followed by a clear sill. The empirical semivariogram was first computed from the observed data, and initial parameter values were set to a nugget of 0.0, a partial sill of 1.0, and a range equal to 20 times the raster cell size. These parameters were subsequently fitted using the standard variogram fitting procedure implemented in ArcGIS Pro. A variable search radius was employed with a maximum of 12 neighboring points to avoid over-smoothing while maintaining numerical stability.
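A minimal ordinary kriging sketch in NumPy/SciPy shows how the spherical variogram and the 12-neighbor limit enter the linear system. It is not ArcGIS's implementation: the variogram parameters are fixed at the initial values quoted above (nugget 0, partial sill 1, range 20 × 0.08° = 1.6°) instead of being fitted, ordinary kriging with an unknown constant mean is assumed, distances are Euclidean in degrees, and all names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def spherical(h, nugget=0.0, psill=1.0, vrange=1.6):
    """Spherical semivariogram: c0 + c1*(1.5 r - 0.5 r^3), r = min(h/a, 1).

    gamma(0) = 0 by definition; beyond the range it equals c0 + c1.
    """
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / vrange, 1.0)
    return np.where(h > 0, nugget + psill * (1.5 * r - 0.5 * r**3), 0.0)


def ordinary_kriging(xy_obs, z_obs, xy_query, k=12, **vario):
    """Ordinary kriging using at most k nearest observations per query."""
    xy_obs = np.asarray(xy_obs, float)
    z_obs = np.asarray(z_obs, float)
    xy_query = np.asarray(xy_query, float)
    k = min(k, len(z_obs))
    _, idx = cKDTree(xy_obs).query(xy_query, k=k)
    idx = np.reshape(idx, (len(xy_query), -1))

    out = np.empty(len(xy_query))
    for q, nb in enumerate(idx):
        pts = xy_obs[nb]
        n = len(nb)
        # Kriging system: [Gamma 1; 1^T 0] [lambda; mu] = [gamma_0; 1]
        a = np.ones((n + 1, n + 1))
        a[-1, -1] = 0.0
        a[:n, :n] = spherical(np.linalg.norm(pts[:, None] - pts[None], axis=-1), **vario)
        b = np.ones(n + 1)
        b[:n] = spherical(np.linalg.norm(pts - xy_query[q], axis=1), **vario)
        lam = np.linalg.solve(a, b)[:n]  # weights sum to 1 (unbiasedness)
        out[q] = lam @ z_obs[nb]
    return out
```

With a zero nugget the estimator reproduces the observed value at each observation point, and because the weights sum to one a constant field is returned unchanged.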
2.3.3. Natural Neighbor (NN) Interpolation
NN interpolation is a local interpolation method based on Voronoi diagrams. When the prediction location is inserted into the Voronoi diagram of the observations, the weight of each neighboring observation equals the proportion of the new Voronoi cell's area that is taken from that neighbor's original polygon [23]. The predicted values are expressed as follows:

$$\hat{Z}(x_0) = \sum_{i=1}^{n} w_i Z(x_i), \qquad w_i = \frac{A_i}{\sum_{j=1}^{n} A_j}, \tag{8}$$

where $w_i$ represents the area-based weight associated with each neighboring point and $A_i$ is the area taken from the Voronoi polygon of point $x_i$. NN interpolation generates a smooth surface (continuous everywhere and continuously differentiable except at the observation points) within the convex hull of the observations and does not extrapolate beyond the data domain. In this study, the default Natural Neighbor algorithm implemented in ArcGIS Pro was used without modification. This choice was intentional, as NN interpolation involves no user-defined smoothing or distance-related parameters, and the default implementation represents the standard and reproducible form of the method, minimizing subjective parameter tuning.
Natural Neighbor interpolation is sensitive to non-uniform station distributions, particularly under dense coastal and sparse offshore coverage. Accordingly, NN was included in this study as a comparative and diagnostic method to evaluate the influence of station distribution on interpolation performance, rather than as a primary prediction approach.
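SciPy has no built-in natural neighbor interpolator, so the following sketch approximates the area-stealing (Sibson) weights on a fine raster: each cell belongs to its nearest observation, and inserting a query point "steals" the cells closer to it than to their current owner. It is a discrete approximation, not the exact polygon-area computation performed by ArcGIS, and the grid size `n`, padding `pad`, and function name are hypothetical choices. It also makes visible why the weights depend on local station geometry: a query surrounded by dense coastal sites steals small areas from many neighbors, whereas an offshore query may take most of its area from one or two distant sites.

```python
import numpy as np
from scipy.spatial import Delaunay, cKDTree


def discrete_sibson(xy_obs, z_obs, xy_query, n=400, pad=0.5):
    """Approximate natural neighbor (Sibson) interpolation on a raster.

    The Sibson weight of an observation is its share of the raster cells
    stolen by the query point. Queries outside the convex hull are NaN.
    """
    xy_obs = np.asarray(xy_obs, float)
    z_obs = np.asarray(z_obs, float)
    xy_query = np.asarray(xy_query, float)
    lo, hi = xy_obs.min(axis=0), xy_obs.max(axis=0)
    lo, hi = lo - pad * (hi - lo), hi + pad * (hi - lo)
    gx, gy = np.meshgrid(np.linspace(lo[0], hi[0], n), np.linspace(lo[1], hi[1], n))
    grid = np.column_stack([gx.ravel(), gy.ravel()])

    tree = cKDTree(xy_obs)
    d_owner, owner = tree.query(grid)       # current Voronoi owner of each cell
    d_q, nearest = tree.query(xy_query)
    inside = Delaunay(xy_obs).find_simplex(xy_query) >= 0

    out = np.full(len(xy_query), np.nan)
    for q in np.flatnonzero(inside):
        if d_q[q] == 0.0:                   # query coincides with an observation
            out[q] = z_obs[nearest[q]]
            continue
        stolen = np.linalg.norm(grid - xy_query[q], axis=1) < d_owner
        if not stolen.any():                # query closer than one grid cell
            out[q] = z_obs[nearest[q]]
            continue
        counts = np.bincount(owner[stolen], minlength=len(z_obs))
        out[q] = counts @ z_obs / counts.sum()
    return out
```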
2.4. Indices to Assess Interpolation Performance
When comparing the interpolated NSO-derived fields with GOFS 3.1 reanalysis data, direct assessment of absolute accuracy is inherently limited, as the comparison relies on time series at discrete locations rather than fully independent ground truth. Therefore, in this study, a set of statistical indices was employed to characterize relative error magnitude, variability, and consistency between datasets. The magnitudes of absolute and variance-based differences were quantified using MAE and RMSE, explanatory consistency was examined using the coefficient of determination (R2), and linear and monotonic relationships were analyzed using Pearson and Spearman correlation coefficients.
These indices are widely used to examine the performance and consistency of climate and marine datasets [29,30,31]. In this study, they were applied to facilitate a systematic comparison of interpolated NSO-derived fields with satellite-derived and numerical reanalysis data, rather than to imply strict independent validation.
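The indices can be computed for any pair of aligned series, for example a site's interpolated daily values against the co-located GOFS 3.1 values. In this sketch the function name is hypothetical, NaN pairs are dropped, and R2 is taken as 1 - SS_res/SS_tot with the reference series as the target; the text does not specify the R2 definition, and the squared Pearson coefficient is another common choice.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def agreement(interp, ref):
    """MAE, RMSE, R2, Pearson r and Spearman rho between two aligned series.

    NaN pairs are dropped. R2 = 1 - SS_res / SS_tot with `ref` as the
    target (an assumed definition).
    """
    interp = np.asarray(interp, float)
    ref = np.asarray(ref, float)
    ok = ~(np.isnan(interp) | np.isnan(ref))
    a, b = interp[ok], ref[ok]
    err = a - b
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err**2))),
        "R2": float(1.0 - np.sum(err**2) / np.sum((b - b.mean()) ** 2)),
        "Pearson": float(pearsonr(a, b)[0]),
        "Spearman": float(spearmanr(a, b)[0]),
    }
```

For example, interpolated values 1, 2, 3, 4 against reference values 2, 3, 4, 5 give Pearson r = 1 but R2 = 0.2, because a constant bias does not affect correlation; this is why error, variance, and correlation indices are read together.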
4. Conclusions
In this study, spatiotemporal interpolation was applied to sea temperature and salinity data from the NSO, and the results were compared with a numerical model-based reanalysis to examine the applicability and limitations of the interpolated data. Time series and spatial interpolations were applied separately, and quantitative performance indices were computed for each interpolation method both for monthly spatial fields and for per-pixel time series. This approach aimed to improve the usability of spatiotemporally discontinuous in situ observational data by transforming them into continuous environmental data.
The analysis showed that, for surface sea temperature, the linear, cubic spline, and STL + PCHIP interpolation methods all demonstrated relatively high explanatory power, and cubic spline interpolation exhibited the highest overall performance across all depths. Among the spatial interpolation methods, kriging performed best, with the lowest errors and the highest coefficients of determination at all depths. IDW produced stable results despite its simplicity, whereas NN interpolation performed well at certain sites but was generally unstable. In contrast, the interpolated salinity data generally exhibited low explanatory power, with particularly poor performance in the surface layer (0 m). A relatively stable structure was observed in the deepest layer analyzed (50 m), indicating a degree of reliability, and correlation coefficients were also high, showing strong consistency with the overall trends in salinity.
These results demonstrate that interpolation performance varies substantially by environmental variable and depth. Interpolation shows considerable potential for sea temperature, owing to its strong seasonality and relatively coherent spatial structure. In contrast, salinity is characterized by pronounced local variability, discontinuity, and control by multi-scale physical processes, which fundamentally constrain the physical plausibility and practical applicability of pure mathematical interpolation based on sparse observation networks. As a result, interpolated salinity fields derived solely from NSOs exhibited low reliability, indicating a clear limitation in their independent use.
In particular, while interpolated sea temperature data offer substantial potential utility for a wide range of applications, including offshore and coastal fishing ground analysis, fish school distribution prediction, and marine ecosystem dynamics studies, salinity requires a more cautious approach. The results highlight the necessity of integrated use with physically based numerical models or satellite-derived products to compensate for the inherent limitations of interpolation. Such an integrated framework not only mitigates uncertainties in existing satellite- and model-based datasets but also clarifies the complementary value of in situ observation-based interpolation as a supporting component rather than a standalone solution.
Because of the limited observation density and the lack of data in some waters, the assessment of the interpolation methods in this study was necessarily tailored to the characteristics of the NSO data. Thus, the relative differences among methods identified here may not be directly generalizable to all marine environments. In addition, because satellite and reanalysis data were used for comparative analysis, uncertainties in these data likely affected the results.
Satellite-based sea temperature data show long-term stability but exhibit biases owing to cloud cover, diurnal warming, and differences between surface and subsurface temperatures [8,12,13,32]. Recent studies have highlighted the importance of uncertainty assessment and quantification when constructing satellite-based long-term sea temperature datasets [10,11]. Satellite-based salinity data have a limited ability to detect small-scale variations, such as those caused by rainfall or river discharge, in coastal or high-latitude regions, and exhibit high uncertainty owing to electromagnetic interference and low spatial resolution [9]. Meanwhile, numerical model reanalysis data (GOFS 3.1, HYCOM-based) integrate satellite and in situ data to provide global marine data. However, this approach exhibits large errors in coastal waters and western boundary currents [14]. Consequently, the importance of the integrated and complementary use of satellite data, in situ observations, and numerical models has been repeatedly highlighted [15]. Accordingly, the applicability of the proposed interpolation framework to other marine satellite products or regions is expected to depend strongly on variable-specific physical characteristics and regional observation density, rather than being universally transferable.
Future research should prioritize improvements in observation density and long-term monitoring capacity for the NSO, rather than solely refining specific interpolation techniques or improving short-term performance, as the results of this study indicate that interpolation performance is fundamentally constrained by observation density, variable characteristics, and depth-dependent physical processes. Further comprehensive studies will be required to evaluate various strategies for using this in situ observational data, including the identification of conditions under which interpolation is physically meaningful or potentially inappropriate. NSO data play an important role in reducing uncertainty in existing satellite-derived and numerical model-based datasets and can contribute to more detailed assessments of marine environmental variability in both coastal and offshore waters.
Furthermore, beyond the generation of interpolated datasets, in situ observations can provide scientific evidence to support timely responses to changes in the marine environment and to inform policies and practices aimed at sustainable fisheries management. Such efforts are expected to improve the reliability of long-term monitoring and predictions of offshore and coastal fish school characteristics in South Korea and to contribute to the development of adaptive aquatic resource management and fishery strategies under ongoing climate change.