1. Introduction
Approximately 71% of the Earth’s surface is covered by seawater, and about two-thirds of the ocean area lies outside national jurisdiction. High-precision monitoring of this vast and remote environment is one of the key challenges for global ocean science and resource management. In some remote sea areas, Argo floats are deployed too sparsely, and monitoring of the deep-sea environment relies primarily on shipborne observations. Ocean sensor data are acquired through multi-parameter in situ sensors mounted on shipborne profilers. They include core raw measurements such as depth and environmental parameters, as well as auxiliary information such as latitude, longitude, and time [1]. The DeepData database, established by the International Seabed Authority (ISA), is one of the most important deep-sea environmental databases globally. Its core data originate from environmental and resource data obtained through shipborne observations of the International Seabed Area (the Area) [2], providing a crucial foundation for global deep-sea environmental baseline assessment and continuous monitoring. Shipborne observation data are used to calibrate and cross-validate satellite retrievals, Argo profiles, and AUV/ROV observations, thereby improving the accuracy of data assimilation and model-driven reconstruction. Shipborne surveys can efficiently provide cross-regional and cross-seasonal environmental samples, supplying real-world data for deep-sea ecological baseline assessment, mining environmental impact assessment, and post-event monitoring and early warning. However, compared with nearshore observation stations or remote sensing, shipborne surveys must cover a wide area with a limited number of voyages. This results in low spatial coverage density and precludes continuous observation at the same location. Sensor deployment density may be as low as one point sample per 100 square kilometers, so the resulting data are characterized by limited volume, discrete spatial distribution, discontinuous time series, and high vertical sampling density [3,4]. Because of this inherent sparsity and discontinuity, deviations caused by environmental anomalies or sensor malfunctions are difficult to identify and correct. This limits the reliability and consistency verification of deep-sea environmental data and makes efficient anomaly detection and data correction harder. Therefore, reconstructing the station-scale distribution from limited, discrete point observations is the basis for understanding the evolution of the deep-sea environment and the key to effectively detecting and correcting anomalies [5].
In ocean sensor data, anomalies primarily manifest as significant outliers caused by sensor failures or abrupt environmental changes, which traditional rule-based or statistical methods identify easily. More challenging are concealed anomalies that fall within reasonable value ranges but are inconsistent with the spatiotemporal correlations of the ocean and the physical laws of vertical stratification. Combined with the inherent sparsity of the data, such anomalies significantly increase the difficulty of point-based reconstruction and consistency verification, and they also limit the applicability of traditional grid-based methods. Existing anomaly detection and correction techniques for this type of data fall into three groups. (1) Rule-based automated quality control, including threshold, gradient, and consistency checks. These methods are interpretable and easy to deploy. However, because baseline environments differ across ocean regions and characteristics differ between water layers, their ability to identify anomalies in non-stationary environments and concealed anomalies is limited. (2) Unsupervised detection based on statistics and machine learning, including Isolation Forests [6], Support Vector Machines (SVM) [7], and Variational Autoencoders [8]. These approaches score anomalies by isolation path length, distance from a learned boundary, or reconstruction error and are suitable for multivariate data, but they typically ignore site topology and voyage structure. (3) Time-series algorithms such as ARIMA and LSTM, which can detect anomalies but do not model cross-site spatiotemporal relationships and rely on complete, continuous time series [9,10,11]. This makes it difficult for them to capture the complex spatiotemporal correlations between data points.
The shipborne observation network and the sensor data it generates essentially constitute a complex, non-grid, relation-based data structure. The connections between deployment sites depend not only on geographical proximity and temporal periodicity but also on the distinctive vertical stratification of the ocean environment. This topology, composed of heterogeneous spatiotemporal relationships, naturally fits the modeling paradigm of graph neural networks. In recent years, graph neural networks (GNNs) have been widely used for field reconstruction and joint distribution estimation. They learn node embeddings of non-grid, relational data through message passing and identify anomalies from reconstruction errors or probability deviations [12,13]. To explicitly describe the dynamic evolution of environmental features and observation network topology, spatiotemporal graph neural networks (STGNNs) combine graph convolution with temporal encoders and memory mechanisms such as temporal convolutional networks and Transformers, enabling unified modeling of spatiotemporal dependencies [14]. Ye et al. developed a spatiotemporal model based on a graph convolutional network that achieved high-precision forecasts of sea surface temperature and chlorophyll-a [15]. Ou et al. proposed a spatiotemporal graph neural network based on incremental learning that accurately predicted ocean parameters [16]. However, the fragmentation of sparse ocean sensor data in the spatiotemporal dimension makes it difficult for STGNNs to construct an effective spatiotemporal graph in this scenario. Because homogeneous GNNs usually assume a single node type and a single edge type [17], they struggle to represent multiple relationships, such as geographical proximity, temporal correlation, and depth stratification, at the same time. Heterogeneous graph neural networks (HGNNs) [18] model heterogeneous connections through message passing over multiple edge types with relation-specific parameters, and they have achieved significant results in fields such as communication, transportation, and meteorology. For example, Li et al. proposed a heterogeneous temporal graph reinforcement learning algorithm that was successfully used to optimize channel allocation in the maritime Internet of Things [19]. However, applications of HGNNs to marine environmental monitoring remain rare, even though their characteristics closely match the structure of the DeepData observation network and the requirements of deep-sea environmental quality control, which suggests broad application prospects.
In summary, existing methods for sparse ocean sensor data still have significant limitations, which motivates this study. We propose DAHSGNN, a Depth-Aware Heterogeneous Spatiotemporal Graph Neural Network. The main contributions are as follows:
- (1)
For sparse ocean sensor data, whose observation points are discrete and from which continuous time series cannot be constructed, we propose a novel graph construction method based on a sliding window along the depth axis. This method exploits the strong correlation and continuity of ocean environmental parameters in the vertical dimension. It combines fragmented spatiotemporal observations with the vertical physical structure to form a graph topology that captures vertical physical processes.
- (2)
Owing to the ocean’s non-stationary stratification, environmental parameters exhibit different trends and morphological patterns in different water layers. To capture these intra-layer patterns, we devise depth-aware hierarchical node feature engineering that transforms raw sensor data into structured graph nodes. We employ a Gaussian Hidden Markov Model (Gaussian HMM) to partition water layers and propose a trend encoder weighted by water-layer probabilities. A Transformer-based architecture encodes each water layer independently, and a cross-layer fusion mechanism captures inter-layer relationships. A Bidirectional Long Short-Term Memory (BiLSTM)-based depth-sequence encoder provides rich node features.
- (3)
The ocean environment exhibits complex phenomena involving relationships such as geographical proximity, temporal periodicity, water-layer stratification effects, and abrupt physical transitions [20]. Homogeneous graphs struggle to represent these relationships simultaneously. Therefore, we employ an HGNN with explicitly constructed heterogeneous edges to model the complex relationships between different observation stations.
- (4)
This method models various environmental variables on the aforementioned heterogeneous graph structure and uses the reconstruction error of a heterogeneous graph autoencoder for anomaly detection and correction. It further exhibits strong cross-variable transferability, achieving high reconstruction accuracy across diverse environmental variables while remaining highly sensitive to anomalous data.
This method provides a scalable technical approach for modeling sparse ocean sensor data under shipborne survey conditions. Its validation on multiple environmental parameters, such as temperature, salinity, and turbidity, in the ISA DeepData database demonstrates its generalization capability for discrete ocean observation scenarios.
2. Problem Description and Method
A non-grid spatial layout composed of ship trajectories and discrete sites is naturally suited to a graph representation. Traditional spatiotemporal methods rely on fixed grid-based adjacency and regular time series, making it difficult to represent uniformly the heterogeneous coupling between geographic location, temporal evolution, and depth profiles in ocean data. While homogeneous graphs can attempt to model such interactions through elaborately designed relationships, heterogeneous graphs offer a more efficient modeling paradigm. Given the limitations of the data and the advantages of heterogeneous graphs, we reformulate shipborne measurement anomaly detection and correction as a node-level physical quantity reconstruction problem on a heterogeneous graph, converting discrete sensor observations into a structured heterogeneous graph.
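A heterogeneous graph [21], denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, consists of an object set $\mathcal{V}$ and a link set $\mathcal{E}$. It is also associated with a node type mapping function $\tau: \mathcal{V} \rightarrow \mathcal{A}$ and a link type mapping function $\phi: \mathcal{E} \rightarrow \mathcal{R}$, where $\mathcal{A}$ and $\mathcal{R}$ denote the sets of predefined node types and edge types and $|\mathcal{A}| + |\mathcal{R}| > 2$. Here, the node set $\mathcal{V}$ consists of all observation data points in the window. Because shipborne survey data are homogeneous in kind, all nodes are abstracted as a single type ($|\mathcal{A}| = 1$), so the heterogeneity resides entirely in the edges. Each node $v_i \in \mathcal{V}$ corresponds to an observation point, some of which are shown in Table 1. The set $\mathcal{E}$ contains edges of multiple relation types, and $\mathcal{R}$ defines all relation types. The graph topology is stored efficiently in an edge index dictionary, where each key-value pair records the sparse adjacency matrix $\mathbf{A}^{r}$ for a specific edge type $r \in \mathcal{R}$. Each edge represents a particular physical or logical association, and edge weights are represented by weighted, typed adjacency matrices:

$$\mathbf{A}^{r}_{ij} = \begin{cases} w^{r}_{ij}, & (v_i, v_j) \in \mathcal{E}^{r}, \\ 0, & \text{otherwise}, \end{cases}$$

where $w^{r}_{ij}$ is the edge weight corresponding to relation type $r$ and $\mathcal{E}^{r} \subseteq \mathcal{E}$ is the set of edges of type $r$.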
We process a collection of oceanic profiles $\mathcal{P} = \{P_1, P_2, \dots, P_S\}$ obtained at various times and locations, where $S$ is the total number of profiles. As shown in Figure 1, all observation profiles share the same spatial domain and time axis.
To address the temporal discontinuities in shipborne sensor data and exploit local correlations in the vertical dimension, we propose a local graph construction method based on a depth sliding window. A fixed-length window slides along the vertical (depth) axis, and the discrete observation points within each window are aggregated into a local graph sample, thereby capturing coupled spatial, temporal, and depth correlations. We use a sliding window of length $W$ to slice the depth axis. For each window $t$, all observation points within the covered depth range are aggregated into a single graph to form the node set $\mathcal{V}_t$. Under the four relation criteria defined in $\mathcal{R}$, we then generate the corresponding typed edge sets $\mathcal{E}_t^{r}$ and weighted adjacency matrices $\mathbf{A}_t^{r}$. This yields a graph sample $\mathcal{G}_t = (\mathcal{V}_t, \{\mathcal{E}_t^{r}\}_{r \in \mathcal{R}}, \{\mathbf{A}_t^{r}\}_{r \in \mathcal{R}})$. By traversing all windows in sequence, we obtain a sequence of graphs $\{\mathcal{G}_1, \mathcal{G}_2, \dots, \mathcal{G}_T\}$. Windowing partitions large-scale data into a set of size-controlled, relationally complete local heterogeneous graphs. This approach preserves the vertical structure and facilitates efficient batch training and inference, as illustrated in Figure 2. These heterogeneous graph samples are subsequently fed in batches into the heterogeneous graph autoencoder for training and inference.
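The windowing step can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation. It assumes each observation carries a site label and a depth in meters. The stride between consecutive windows is a hypothetical parameter, since the text fixes only the window length $W$. Edge construction is left to later steps.

```python
import numpy as np


def depth_windows(depths, window, stride):
    """Split observation indices into depth windows [start, start + window).

    Each returned index array plays the role of one node set V_t. `stride` is a
    hypothetical parameter; the method description fixes only the window length.
    """
    depths = np.asarray(depths, dtype=float)
    d_min, d_max = depths.min(), depths.max()
    n_windows = int(np.floor((d_max - d_min) / stride)) + 1
    samples = []
    for t in range(n_windows):
        start = d_min + t * stride
        idx = np.flatnonzero((depths >= start) & (depths < start + window))
        if idx.size > 0:  # skip windows that contain no observations
            samples.append(idx)
    return samples


# Two hypothetical sites (A, B) sampled at different depths; points from both
# sites that fall into the same depth range end up in the same graph sample.
site_ids = np.array(["A", "A", "A", "B", "B", "B"])
depths = np.array([0.0, 5.0, 12.0, 3.0, 14.0, 25.0])
for t, idx in enumerate(depth_windows(depths, window=10.0, stride=10.0)):
    print(t, site_ids[idx].tolist(), depths[idx].tolist())
```

With `stride` equal to `window` the windows tile the depth axis. A smaller stride produces overlapping windows, so a point can appear in several graph samples.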
For shipborne sensor data, we construct a heterogeneous graph-based reconstruction model $f_\theta$ that integrates node features and the relationships between nodes in three dimensions: geospatial, temporal evolution, and vertical depth. Let $\mathbf{y}_t$ denote the vector of ground-truth physical quantities for all nodes in graph $\mathcal{G}_t$. The model infers the reconstructed vector $\hat{\mathbf{y}}_t = f_\theta(\mathcal{G}_t)$, where $\theta$ denotes the learnable parameters. Because anomalous samples in the DeepData shipborne observations are exceedingly rare, we adopt an unsupervised learning approach: the model parameters are trained by minimizing a composite reconstruction loss on a preprocessed graph sequence.
After training, the anomaly score of any observation node $i$ is defined by its reconstruction error with respect to the ground truth, $s_i = |\hat{y}_i - y_i|$. If $s_i$ exceeds a threshold $\delta$, the data point is considered to have deviated from its normal pattern and is assigned a higher anomaly probability. The threshold is determined adaptively on the validation set: candidate values are drawn from the reconstruction-error distribution of all samples, and the one that maximizes the F1-score is selected.
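A minimal sketch of the scoring, threshold selection, and correction steps follows. It assumes the absolute reconstruction error as the score and assumes the validation set carries anomaly labels, which F1 requires. Taking candidate thresholds as quantiles of the score distribution is one plausible reading of the text. The helper names are hypothetical.

```python
import numpy as np


def anomaly_scores(y_true, y_rec):
    """Absolute reconstruction error per node (assumed form of the anomaly score)."""
    return np.abs(np.asarray(y_rec, dtype=float) - np.asarray(y_true, dtype=float))


def f1_optimal_threshold(scores, labels, n_candidates=200):
    """Pick the threshold, among quantiles of the score distribution, that maximizes F1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    candidates = np.quantile(scores, np.linspace(0.0, 1.0, n_candidates))
    best_threshold, best_f1 = float(candidates[0]), -1.0
    for threshold in candidates:
        pred = scores > threshold
        tp = int(np.sum(pred & labels))
        fp = int(np.sum(pred & ~labels))
        fn = int(np.sum(~pred & labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = float(threshold), f1
    return best_threshold, best_f1


def correct(y_true, y_rec, threshold):
    """Flag nodes whose score exceeds the threshold and replace them with reconstructions."""
    y_true = np.asarray(y_true, dtype=float)
    y_rec = np.asarray(y_rec, dtype=float)
    flagged = anomaly_scores(y_true, y_rec) > threshold
    return np.where(flagged, y_rec, y_true), flagged
```

The `correct` helper reflects the statement that flagged points are corrected with their reconstructed values. Unflagged measurements are left unchanged.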
3. DAHSGNN
The DAHSGNN method for sparse ocean sensor observations follows a multi-stage processing flow with three core steps. (1) Depth-aware hierarchical node feature engineering: the depth distribution pattern of each station's profile data is extracted to generate node attribute vectors that represent the station's vertical distribution characteristics at the corresponding depth. (2) Heterogeneous edge construction: heterogeneous edges are constructed from domain knowledge and, together with the nodes, form a heterogeneous graph. (3) Heterogeneous graph modeling: the graph samples are fed into an HGNN autoencoder, which leverages the complex topology and node features to learn latent representations, thereby modeling geographical, temporal, and vertical dependencies.
3.1. Depth-Aware Hierarchical Feature Engineering
As shown in Figure 3, each node is associated with a feature vector rich in physical information, derived from the raw sensor observations. The vertical stratification of the ocean is markedly heterogeneous. Different water layers have distinct physical properties, and the boundaries between them are mostly diffuse transition zones. To partition the complex vertical structure of the observation data probabilistically, we adopt a Gaussian HMM, whose hidden states and state transitions explicitly characterize the water-layer constraints [22]. To capture intra-layer trends, we use the sequence modeling capability of the Transformer architecture to generate water-layer-weighted depth trend vectors. In addition, a cross-layer fusion mechanism captures the relationships between water layers, and a BiLSTM encoder extracts the dynamic context of local depth sequences [23].
3.1.1. Water Layer Probability Division Based on Gaussian HMM
According to classical physical oceanography, the ocean’s vertical structure is commonly conceptualized as three layers with distinct physical properties [24]: a shallow layer readily influenced by the atmosphere, a middle layer that acts as an important buffer, and a vast, relatively stable deep layer [25].
Figure 4 shows the water column division in the Mid-Atlantic Ridge area and the changes in key environmental parameters with depth.
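To characterize the physical layering of the ocean, we employ a Gaussian HMM for probabilistic water-layer partitioning, as shown in Figure 5a. The model combines a discrete state-transition process with continuous sensor observations. A sequence of hidden states models the transitions between layers, and multivariate Gaussian densities model the continuous physical quantities. The hidden state sequence $Z = (z_1, z_2, \dots, z_D)$, with $z_d \in \{1, 2, 3\}$, corresponds to the three water layers. The observation sequence $O = (\mathbf{o}_1, \mathbf{o}_2, \dots, \mathbf{o}_D)$ consists of the multivariate feature vectors of the nodes ordered by depth index $d$. The state transition probability matrix $\mathbf{A}$ and the emission densities are

$$\mathbf{A} = [a_{kl}]_{3 \times 3}, \quad a_{kl} = P(z_{d+1} = l \mid z_d = k),$$

$$p(\mathbf{o}_d \mid z_d = k) = \mathcal{N}(\mathbf{o}_d;\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k).$$

The transition matrix, the mean vectors $\boldsymbol{\mu}_k$, and the covariance matrices $\boldsymbol{\Sigma}_k$ are learned from the data via the Baum-Welch algorithm [26]. Finally, using the trained model parameters, the posterior probability of belonging to each water layer is calculated for each observation point at depth index $d$:

$$\mathbf{p}_d = (p_{d,1}, p_{d,2}, p_{d,3}), \quad p_{d,k} = P(z_d = k \mid O) = \frac{\alpha_d(k)\,\beta_d(k)}{\sum_{l=1}^{3} \alpha_d(l)\,\beta_d(l)},$$

where $\alpha_d(k)$ and $\beta_d(k)$ are the forward and backward variables of the forward-backward algorithm. This probability vector serves as the key representation of the vertical structure and is input into the subsequent trend encoding process.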
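The posterior computation can be illustrated with a scaled forward-backward pass. This is a sketch under stated assumptions: the HMM parameters below are hand-set for a hypothetical one-dimensional temperature profile rather than learned, and the Baum-Welch fitting step is omitted. In practice a library such as hmmlearn (`GaussianHMM.fit` followed by `predict_proba`) could cover both steps.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)


def layer_posteriors(obs, start, trans, means, covs):
    """Return p[d, k] = P(z_d = k | O) via the scaled forward-backward algorithm.

    obs: (D, F) observations ordered by depth; start: (K,) initial probabilities;
    trans: (K, K) with trans[k, l] = P(z_{d+1} = l | z_d = k).
    """
    D, K = len(obs), len(start)
    emis = np.array([[gaussian_pdf(obs[d], means[k], covs[k]) for k in range(K)]
                     for d in range(D)])
    alpha, beta, scale = np.zeros((D, K)), np.ones((D, K)), np.zeros(D)
    alpha[0] = start * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for d in range(1, D):  # forward pass, shallow to deep
        alpha[d] = (alpha[d - 1] @ trans) * emis[d]
        scale[d] = alpha[d].sum()
        alpha[d] /= scale[d]
    for d in range(D - 2, -1, -1):  # backward pass, deep to shallow
        beta[d] = trans @ (emis[d + 1] * beta[d + 1]) / scale[d + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


# Hypothetical 1-D temperature profile with three clearly separated regimes.
obs = np.array([[20.0], [19.5], [11.0], [10.0], [2.5], [2.0]])
start = np.array([1.0, 0.0, 0.0])
trans = np.array([[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]])
means = [np.array([20.0]), np.array([10.0]), np.array([2.0])]
covs = [np.eye(1)] * 3
p = layer_posteriors(obs, start, trans, means, covs)
```

Each row of `p` is the vector $\mathbf{p}_d$. The upper-triangular transition matrix encodes the assumption that layers only deepen along the profile.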
3.1.2. Water Layer Probability-Guided Trend Encoder
Different observation stations may exhibit similar environmental evolution trends due to driving factors such as ocean currents and climate patterns. Different water layers are influenced by physical, chemical, and biological processes, resulting in significant differences in their internal trends and morphological patterns. To effectively capture intralayer distribution patterns and correlate similar nodes, we propose a trend encoder.
The trend encoder learns a local dynamic trend vector $\mathbf{h}^{\text{trend}}_i$ for each site $i$. At its core is a self-attention mechanism guided by the water-layer probability $p_{d,k}$ at depth index $d$. We employ a hierarchical Transformer architecture [27] that extracts specialized features for different water layers. It comprises three parallel Transformer encoder layers, each dedicated to one water layer, as illustrated in Figure 5b. When computing self-attention within layer $k$, attention is modulated by the joint probability $p_{m,k}\,p_{n,k}$ that both points belong to the same layer:

$$\tilde{\alpha}^{(k)}_{mn} = \frac{p_{m,k}\,p_{n,k}\exp\big(e^{(k)}_{mn}\big)}{\sum_{n'} p_{m,k}\,p_{n',k}\exp\big(e^{(k)}_{mn'}\big)},$$

where $e^{(k)}_{mn}$ is the standard self-attention score between two depth points $m$ and $n$ within the layer-$k$ sequence. This modulation prioritizes interactions between points that are confidently assigned to layer $k$ while suppressing spurious cross-layer interactions.

We replace standard self-attention with this joint-probability-modulated attention to form the encoded representation $\mathbf{z}^{(k)}_m$ at position $m$. We then obtain the layer-specific trend vector through probability-weighted pooling:

$$\mathbf{h}^{(k)}_i = \frac{\sum_{d} p_{d,k}\,\mathbf{z}^{(k)}_d}{\sum_{d} p_{d,k} + \epsilon},$$

where $p_{d,k}$ are the Gaussian-HMM-derived layer probabilities used as weights and $\epsilon$ is a small constant for numerical stability. This prioritizes high-confidence intra-layer interactions, so that $\mathbf{h}^{(k)}_i$ reflects the true intra-layer distribution pattern at site $i$. Each depth point is tagged with a positional encoding so that depth-ordered trends can be learned.
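The sketch below shows one way to realize the modulated attention and the weighted pooling for a single water layer $k$ in NumPy. It is a single-head simplification under an assumption: the joint probability multiplies the exponentiated scores before row normalization, which is one reading of "modulated". Positional encoding, multiple heads, and the feed-forward sublayer are omitted, and the random projection matrices stand in for learned weights.

```python
import numpy as np


def modulated_attention(x, p_k, w_q, w_k, w_v):
    """Single-head self-attention over one depth sequence for water layer k.

    The exponentiated scores exp(e_mn) are multiplied by p_m,k * p_n,k before
    row normalization (an assumed form of the joint-probability modulation).
    """
    q, key, v = x @ w_q, x @ w_k, x @ w_v
    e = q @ key.T / np.sqrt(key.shape[1])         # standard scaled dot-product scores e_mn
    e = e - e.max(axis=1, keepdims=True)          # per-row shift for stability (cancels in normalization)
    weights = np.outer(p_k, p_k) * np.exp(e)      # joint probability that m and n are both in layer k
    alpha = weights / (weights.sum(axis=1, keepdims=True) + 1e-12)
    return alpha @ v, alpha


def layer_trend_vector(z, p_k, eps=1e-8):
    """Probability-weighted pooling: sum_d p_d,k z_d / (sum_d p_d,k + eps)."""
    return (p_k[:, None] * z).sum(axis=0) / (p_k.sum() + eps)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                        # 5 depth points, 4 input features
p_k = np.array([1.0, 1.0, 1.0, 0.0, 0.0])          # layer-k probability of each depth point
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
z, alpha = modulated_attention(x, p_k, w_q, w_k, w_v)
h_k = layer_trend_vector(z, p_k)
```

With hard probabilities as in this toy input, points outside layer $k$ receive zero attention and zero pooling weight. Soft probabilities from the Gaussian HMM would attenuate them instead.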
The outputs of the layer-specific Transformers are stacked into a hierarchical feature sequence $[\mathbf{h}^{(1)}_i, \dots, \mathbf{h}^{(K)}_i]$, which is then input into a cross-layer Transformer to model the interdependencies between water layers [28]. This module follows the standard Transformer encoder architecture. Multi-head self-attention captures the global relationships between the trend features of different water layers, and a feedforward network enhances the non-linear expressive power of the features. Finally, average pooling over the output sequence $(\mathbf{c}^{(1)}_i, \dots, \mathbf{c}^{(K)}_i)$ of the cross-layer Transformer yields a unified cross-layer trend vector:

$$\mathbf{h}^{\text{cross}}_i = \frac{1}{K}\sum_{k=1}^{K}\mathbf{c}^{(k)}_i,$$

where $K$ is the number of water layers. $\mathbf{h}^{\text{cross}}_i$ models the dependencies between different water layers.
Finally, to form the node-level integrated trend vector, we apply a probability gate to each layer's contribution and concatenate the gated layer-wise vectors with the cross-layer vector:

$$\mathbf{h}^{\text{trend}}_i = \big[\, g_{i,1}\mathbf{h}^{(1)}_i \,\|\, g_{i,2}\mathbf{h}^{(2)}_i \,\|\, g_{i,3}\mathbf{h}^{(3)}_i \,\|\, \mathbf{h}^{\text{cross}}_i \,\big], \quad g_{i,k} = \mathbb{1}\big(p_{i,k} > \theta_i\big),$$

where $\|$ denotes concatenation and $p_{i,k}$ is the probability that node $i$ belongs to water layer $k$. The threshold $\theta_i$ is a dynamic threshold used to filter significant water layers. It is generated adaptively from the dominant confidence of the water-layer probability distribution of node $i$:

$$\theta_i = \theta_0 \cdot \max_{k} p_{i,k},$$

where $\theta_0$ is a base threshold. The physical information of a water layer is considered significant only when its probability exceeds $\theta_i$, which scales with the dominant-layer confidence. This design achieves two goals. (1) Strict noise reduction in regions with clearly defined stratification: when the dominant-layer confidence is high, the threshold is raised so that only the significant water layer is retained. (2) Fusion of multiple physical layers in transition regions: when the dominant-layer confidence is low, the threshold is lowered to admit information from other water layers. This gating mechanism ensures that nodes carry trend information only from the layers to which they may belong, while concatenation preserves both layer-specific structure and cross-layer context.
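The gating and concatenation step can be sketched as follows. This assumes hard (0/1) gates and a hypothetical base threshold `theta0 = 0.5`. The function and parameter names are illustrative only.

```python
import numpy as np


def integrated_trend_vector(h_layers, h_cross, p_i, theta0=0.5):
    """Gate each layer-specific vector by the dynamic threshold, then concatenate.

    theta_i = theta0 * max_k p_i,k scales with the dominant-layer confidence;
    theta0 is a hypothetical base threshold.
    """
    p_i = np.asarray(p_i, dtype=float)
    theta_i = theta0 * p_i.max()
    gates = (p_i > theta_i).astype(float)         # 1 for significant layers, 0 otherwise
    gated = [g * h for g, h in zip(gates, h_layers)]
    return np.concatenate(gated + [h_cross]), gates


h_layers = [np.full(2, 1.0), np.full(2, 2.0), np.full(2, 3.0)]  # layer-specific vectors h^(k)
h_cross = np.zeros(2)                                           # cross-layer vector
vec_clear, gates_clear = integrated_trend_vector(h_layers, h_cross, [0.90, 0.08, 0.02])
vec_mixed, gates_mixed = integrated_trend_vector(h_layers, h_cross, [0.50, 0.40, 0.10])
```

In the clearly stratified case ($\theta_i = 0.45$) only the dominant layer survives. In the transition case ($\theta_i = 0.25$) the second layer is retained as well.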
Through layer-by-layer specialization and cross-layer fusion, an integrated trend vector that incorporates cross-layer relationship information is ultimately generated. This integrated vector provides a high-dimensional representation of the physical state relevant to the water column at the node’s location and encodes information at two levels: (1) Positional Information: Positional encoding tags each depth point with its relative position within its inferred primary water layer. This enables the model to discern orderly depth-dependent trends in physical quantities. (2) Vertical structural information: It captures whether the intra-layer distribution of physical quantities is uniformly mixed, exhibits a gradual linear variation, or contains local extrema. This is used to characterize the inter-layer associations and transitional features among different water layers.
In this way, the model can identify spatially distant nodes that share similar structural characteristics, enabling the establishment of long-range dependencies based on physical properties.
3.1.3. BiLSTM Depth Sequence Encoder
The profiler equipped with multi-parameter sensors performs high-frequency in situ sampling during its vertical descent. The acquired profile data are therefore extremely dense and continuous in the vertical direction and show local dynamic changes. When processing graph samples, the densely sampled nodes of the same site are sorted by depth, forming a depth sequence $(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_D)$. Adjacent nodes in a depth sequence are closely coupled physically and dynamically, and this dependency is bidirectional. The state of the upper water layer affects the lower layer, while processes in the lower layer, such as upwelling and heat conduction, feed back on the upper layer. To capture the continuity and local dynamics inherent in high-resolution vertical sampling, we use a bidirectional long short-term memory (BiLSTM) network as a depth-sequence encoder. It extracts continuity and dynamic trend features from these local, depth-ordered sequences in the graph samples.
Figure 5c shows the basic framework of the BiLSTM, which consists of a forward LSTM and a backward LSTM. The forward LSTM processes the sequence from shallow to deep ($d = 1 \rightarrow D$), recursively updating its hidden state $\overrightarrow{\mathbf{h}}_d$. Conversely, the backward LSTM processes the sequence from deep to shallow ($d = D \rightarrow 1$), producing $\overleftarrow{\mathbf{h}}_d$. Each direction propagates information from the current input and the hidden state at the adjacent depth step. This mechanism enables the model, when generating features for any node at depth $d$, to consider information from both shallower and deeper waters, thereby constructing a more complete local vertical environmental context.
Finally, the hidden states of the forward and backward LSTMs at the current depth are concatenated into a vector that integrates bidirectional context information. We define this as the depth-sequence feature:

$$\mathbf{h}^{\text{seq}}_d = \big[\, \overrightarrow{\mathbf{h}}_d \,\|\, \overleftarrow{\mathbf{h}}_d \,\big].$$
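To make the bidirectional concatenation concrete, here is a small NumPy sketch of a single-layer BiLSTM with untrained random weights. In practice a framework layer such as PyTorch's `nn.LSTM(..., bidirectional=True)` would be used. The helper names, the gate ordering, and the weight scale are assumptions for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(xs, W, U, b):
    """Run a single-layer LSTM over xs (D, F); gate blocks ordered [input, forget, cell, output]."""
    hidden = U.shape[1]
    h, c, outputs = np.zeros(hidden), np.zeros(hidden), []
    for x in xs:
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm_features(xs, fwd_params, bwd_params):
    """Concatenate forward (shallow -> deep) and backward (deep -> shallow) states per depth."""
    h_fwd = lstm_pass(xs, *fwd_params)
    h_bwd = lstm_pass(xs[::-1], *bwd_params)[::-1]  # re-align backward states to depth order
    return np.concatenate([h_fwd, h_bwd], axis=1)


def random_params(rng, n_features, hidden):
    """Untrained weights W (4H, F), U (4H, H), b (4H,) for illustration only."""
    return (rng.normal(scale=0.1, size=(4 * hidden, n_features)),
            rng.normal(scale=0.1, size=(4 * hidden, hidden)),
            np.zeros(4 * hidden))


rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 3))          # 6 depth-ordered nodes of one site, 3 features each
fwd, bwd = random_params(rng, 3, 4), random_params(rng, 3, 4)
h_seq = bilstm_features(xs, fwd, bwd)
```

Row $d$ of `h_seq` is $\mathbf{h}^{\text{seq}}_d$: its first half summarizes the water above depth $d$ and its second half the water below.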
The extracted depth-sequence features are concatenated with the node's original features and the additional features obtained through feature engineering, forming the final node features that are input to the heterogeneous graph neural network.
3.2. Physical Information-Guided Heterogeneous Edge Construction
Considering the complex interactions within marine systems, we construct four types of heterogeneous edges based on geographic-distance, temporal, vertical, and trend similarities between nodes:
- (1)
Geographic Distance Edge: It captures the spatial correlation of the ocean in the quasi-horizontal plane:

$$w^{\text{geo}}_{ij} = \exp\!\left(-\frac{\mathrm{dist}(\mathbf{l}_i, \mathbf{l}_j)^2}{2\sigma^2}\right),$$

where $\mathbf{l}_i$ is the geographical location (latitude and longitude) of sensor node $i$, $\mathrm{dist}(\cdot,\cdot)$ is the Haversine distance between two points [29], and $\sigma$ is the bandwidth parameter of the Gaussian kernel, which controls how fast similarity decays with distance [30].
- (2)
Temporal Edge: Marine environmental variables often exhibit seasonal and diurnal variations, so nodes with similar sampling times are likely to exhibit analogous ocean states. The weight is the cosine similarity between temporal feature vectors:

$$w^{\text{time}}_{ij} = \frac{\mathbf{t}_i^{\top}\mathbf{t}_j}{\|\mathbf{t}_i\|\,\|\mathbf{t}_j\|},$$

where $\mathbf{t}_i$ and $\mathbf{t}_j$ are the temporal feature vectors of nodes $i$ and $j$, respectively.
- (3)
Vertical Edge: Pairs of points in the same profile with adjacent depths, similar measured parameter values, and similar local curve shapes are connected by vertical edges, which reflect the physical continuity and discontinuities of the water body. The edge weight is a linear combination of multiple physical factors:

where $f(\cdot)$ is an exponential decay function, $z_i$ and $x_i$ denote the depth and measured environmental parameter value of node $i$, $g_i$ and $g^{\prime}_i$ denote the first- and second-order gradients at node $i$, $\lambda$ is a hyperparameter tuned on the validation set, and $s$ is a scale parameter.
- (4)
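Trend Edge: The similarity of trend vectors captures similar oceanographic characteristics and generation mechanisms in sensor data from different observation sites:

$$w^{\text{trend}}_{ij} = \mathrm{sim}\big(\mathbf{h}^{\text{trend}}_i, \mathbf{h}^{\text{trend}}_j\big),$$

where $\mathbf{h}^{\text{trend}}_i$ is the integrated trend vector obtained by the Transformer-based trend encoder and $\mathrm{sim}(\cdot,\cdot)$ denotes the similarity measure.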
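The geographic and temporal weights can be sketched as below. The Haversine formula and the cosine similarity are standard. The 2σ² kernel form, the bandwidth value, and the cyclic day-of-year/hour encoding used as the temporal feature vector are assumptions, and `temporal_features` is a hypothetical helper. The vertical-edge weight is not sketched because its exact combination of factors is not reproduced here. The same `cosine_weight` could score trend edges only if cosine is the chosen similarity, which is likewise an assumption.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def geo_weight(lat1, lon1, lat2, lon2, sigma_km):
    """Gaussian kernel on the Haversine distance (assumed 2 * sigma^2 form)."""
    d = haversine_km(lat1, lon1, lat2, lon2)
    return np.exp(-d ** 2 / (2 * sigma_km ** 2))


def cosine_weight(u, v, eps=1e-12):
    """Cosine similarity, used here for temporal feature vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def temporal_features(day_of_year, hour):
    """Hypothetical cyclic encoding of seasonal and diurnal phase."""
    season = 2 * np.pi * day_of_year / 365.25
    diurnal = 2 * np.pi * hour / 24.0
    return np.array([np.sin(season), np.cos(season), np.sin(diurnal), np.cos(diurnal)])
```

With this encoding, two casts at the same hour but half a year apart get a temporal weight of about 0. Casts on the same day and hour get a weight of about 1.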
This study applies the KNN algorithm [31] to all four types of heterogeneous edges to balance similarity capture and computational efficiency. This strategy may retain a small number of weakly weighted edges, but the subsequent neighborhood aggregation automatically down-weights their contribution. KNN may also miss a few highly similar nodes. However, the selected topology covers the main dependencies and is sufficient to support high-precision reconstruction of node physical quantities.
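KNN sparsification of a dense weight matrix for one relation type might look like the following. It is a sketch that assumes a precomputed $N \times N$ weight matrix and a COO edge index in which row 0 holds source (neighbor) nodes and row 1 holds target (central) nodes. That orientation is a convention chosen here, not one stated in the text.

```python
import numpy as np


def knn_edges(weights, k):
    """Keep, for every target node i, the k neighbours j != i with the largest w_ij.

    Returns a COO edge index of shape (2, N * k) with row 0 = source j and
    row 1 = target i, so messages flow from neighbours to the central node.
    """
    w = np.array(weights, dtype=float)              # copy so the caller's matrix is untouched
    np.fill_diagonal(w, -np.inf)                    # exclude self-loops
    nbrs = np.argsort(-w, axis=1, kind="stable")[:, :k]
    dst = np.repeat(np.arange(w.shape[0]), k)
    src = nbrs.ravel()
    return np.stack([src, dst]), w[dst, src]


weights = np.array([[1.0, 0.9, 0.1],
                    [0.9, 1.0, 0.5],
                    [0.1, 0.5, 1.0]])
edge_index, edge_weight = knn_edges(weights, k=1)
```

Each node keeps exactly `k` incoming edges. This bounds graph density regardless of how many points fall into a depth window, at the cost of occasionally keeping a weak edge or dropping a strong one, as noted above.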
When constructing geographic distance edges and temporal edges, different depth nodes at the same site have the same geographical location and sampling time, resulting in identical edge weight calculations. To reduce redundant computations during edge construction, we use the shallowest depth sampling point of the site within the sliding window as the representative node. Only the weights between the representative nodes are calculated, and as shown in
Figure 2, these weights are copied to other depth points of the site to obtain the final edge weights.
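The representative-node shortcut can be sketched as follows. It assumes site labels and depths for the nodes of one window, picks the shallowest node of each site, and copies site-pair weights to every pair of depth nodes. The helper names and example values are hypothetical. The copied matrix contains within-site entries (including the diagonal), which a later KNN step that excludes self-loops would handle.

```python
import numpy as np


def representative_nodes(site_ids, depths):
    """Return site labels, per-node site index, and the shallowest node of each site."""
    site_ids = np.asarray(site_ids)
    depths = np.asarray(depths, dtype=float)
    sites, site_index = np.unique(site_ids, return_inverse=True)
    reps = np.array([np.flatnonzero(site_index == s)[np.argmin(depths[site_index == s])]
                     for s in range(len(sites))])
    return sites, site_index, reps


def expand_site_weights(site_weight, site_index):
    """Copy site-pair weights to every pair of depth nodes: W[n, m] = S[site(n), site(m)]."""
    return site_weight[np.ix_(site_index, site_index)]


site_ids = ["A", "A", "B", "B", "B"]
depths = [120.0, 100.0, 105.0, 110.0, 100.0]
sites, site_index, reps = representative_nodes(site_ids, depths)
site_weight = np.array([[1.0, 0.3], [0.3, 1.0]])    # hypothetical weights between representatives
node_weight = expand_site_weights(site_weight, site_index)
```

Only an $S \times S$ matrix is computed for $S$ sites instead of an $N \times N$ one for $N$ nodes, which is the saving the text describes.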
3.3. Heterogeneous Graph-Based Autoencoder Model
After the heterogeneous graph is constructed, it is input into the heterogeneous graph autoencoder for anomaly detection.
Figure 6 shows its detailed architecture. The core task of the model is to learn the inherent patterns and structures of normal ocean profile data under high-dimensional spatiotemporal relationships. The encoder compresses the complex graph signal into a low-dimensional latent representation, and the decoder reconstructs the original signal from it. Because the model learns the distribution of normal data during training, anomalous data, which deviate from these learned patterns, yield high reconstruction errors. Anomalies are detected from the reconstruction error, and flagged data are corrected using the reconstructed values. The entire process, from graph-sequence input through the processing of node features and edge information to the output of reconstructed physical quantities, forms a single end-to-end pipeline.
Each node's features pass through an input projection module consisting of a linear layer, a batch normalization layer, and a GELU activation. This module maps the input features to a higher-dimensional latent space, providing a more expressive initial representation for the subsequent graph convolutions.
The data then flow through a core network consisting of multiple heterogeneous graph convolution blocks. The core of each block is a HeteroConv layer, which applies a dedicated message-passing operator to each edge type $r \in \mathcal{R}$, reflecting the different physical meanings of the relations in the graph. For each relation type, the model collects information from the central node's neighbors and aggregates it into a single message. As shown in Figure 7, the graph convolution operator for each relation is chosen according to its characteristics. To characterize the complex, non-uniform interactions within and between depth layers, we employ a graph attention network (GATConv) on vertical edges and cross-layer trend edges. Its attention mechanism adaptively learns weights from the features of neighboring nodes. This highlights key interactions and suppresses secondary connections in complex ocean environments, improving the modeling of intra-layer and cross-layer dependencies. Geographic distance edges connect nodes at neighboring spatial locations, while temporal edges connect nodes from different deployments with similar temporal features. These two edge types typically exhibit relatively smooth, region-dependent characteristics. Therefore, the computationally cheaper graph convolutional network (GCNConv) [32] is chosen to capture the relatively uniform and stable association patterns within geographical or temporal neighborhoods.
After aggregating neighbor information for each relation type, the HeteroConv layer merges the messages from different relation types into unified neighborhood information through mean pooling. The forward propagation of layer $l$ can be formalized as:

$$\mathbf{h}^{(l+1)}_i = \frac{1}{|\mathcal{R}|}\sum_{r \in \mathcal{R}} \mathrm{AGG}^{(l)}_{r}\Big(\big\{\, w^{r}_{ij}\,\mathbf{h}^{(l)}_j : j \in \mathcal{N}_r(i) \,\big\}\Big) + \mathbf{W}^{(l)}_{\text{res}}\,\mathbf{h}^{(l)}_i,$$

where $\mathbf{h}^{(l)}_i$ is the feature representation of node $i$ at layer $l$, $\mathcal{R}$ is the set of relation types of all edges, and $\mathrm{AGG}^{(l)}_{r}$ is the aggregation function of layer $l$ for relation $r$. $w^{r}_{ij}$ is the edge weight between nodes $i$ and $j$ under relation $r$, and $w^{r}_{ij}\mathbf{h}^{(l)}_j$ is the weighted neighbor feature. The edge weights serve as physical reliability indicators: features of strongly associated neighbors receive greater weight when passed to the central node $i$. $\mathcal{N}_r(i)$ is the set of neighbors of node $i$ under relation $r$, and $\mathbf{W}^{(l)}_{\text{res}}$ is the learnable weight matrix of the residual connection.
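The layer equation can be checked numerically with the NumPy sketch below. It is a simplification: each $\mathrm{AGG}^{(l)}_{r}$ is replaced by a weighted mean of linearly transformed neighbor features (GCN-like, without attention), and batch normalization, GELU, and dropout are omitted. In PyTorch Geometric, the corresponding construction would be `torch_geometric.nn.HeteroConv` wrapping per-relation `GCNConv`/`GATConv` modules with `aggr="mean"`. The relation names and toy values here are hypothetical.

```python
import numpy as np


def relation_aggregate(h, edge_index, edge_weight, weight):
    """Stand-in for AGG_r: weighted mean of linearly transformed neighbour features.

    edge_index[0] holds source nodes j and edge_index[1] target nodes i.
    Assumes non-negative edge weights.
    """
    src, dst = edge_index
    messages = edge_weight[:, None] * (h[src] @ weight)   # w_ij^r * h_j W_r
    out = np.zeros((h.shape[0], weight.shape[1]))
    np.add.at(out, dst, messages)                        # sum messages per target node
    deg = np.zeros(h.shape[0])
    np.add.at(deg, dst, edge_weight)                     # total incoming weight per node
    return out / np.where(deg > 0, deg, 1.0)[:, None]    # nodes without neighbours get zeros


def hetero_layer(h, relations, w_res):
    """Mean over relation-specific aggregations plus a residual term (row-vector convention)."""
    per_relation = [relation_aggregate(h, ei, ew, w) for ei, ew, w in relations.values()]
    return np.mean(per_relation, axis=0) + h @ w_res


h = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
relations = {
    "geo": (np.array([[1, 2], [0, 0]]), np.array([1.0, 3.0]), np.eye(2)),  # nodes 1, 2 -> node 0
    "vert": (np.array([[0], [1]]), np.array([1.0]), np.eye(2)),           # node 0 -> node 1
}
h_next = hetero_layer(h, relations, w_res=np.zeros((2, 2)))
```

Node 0 receives $(1 \cdot \mathbf{h}_1 + 3 \cdot \mathbf{h}_2)/4$ from the geographic relation and nothing from the vertical one, so mean pooling over the two relations halves that message. This illustrates how relations without neighbors dilute the aggregate under mean pooling.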
To ensure effective training of the deep network and prevent overfitting, batch normalization, a GELU activation, and dropout are applied after each HeteroConv module. Residual connections [33] are also introduced to combine the aggregated neighborhood information with each node's original features when generating its new representation, which alleviates the vanishing-gradient problem. After feature extraction and fusion through multiple graph convolution layers, the last encoder layer maps the high-dimensional node representations into a low-dimensional latent space, producing the final latent representation matrix $Z$, whose $i$-th row $z_i$ is the latent vector of node $i$. This matrix is the encoder's final abstract representation of the input graph.
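The per-module normalization, activation, and dropout steps, followed by the final projection into the latent representation matrix (called `Z` in the code), can be sketched in NumPy as below. This is an illustration under assumptions: batch normalization is shown in training mode without learnable scale and shift, GELU uses the exact erf form, dropout is the usual inverted variant, the residual term is assumed to be already contained in the convolution output, and the names and sizes (`post_conv_block`, `W_latent`, a 3-dimensional latent space) are hypothetical.

```python
import math
import numpy as np

def batch_norm(x, eps=1e-5):
    """Training-mode batch normalization over the node axis (no learnable affine)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero entries with probability p and rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

def post_conv_block(conv_out, p, rng, training=True):
    """BN -> GELU -> dropout applied to the output of one (hypothetical) HeteroConv module."""
    return dropout(gelu(batch_norm(conv_out)), p, rng, training)

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(5, 8))           # 5 nodes, 8 hidden channels (toy values)
h = post_conv_block(conv_out, p=0.1, rng=rng)
W_latent = rng.normal(size=(8, 3))           # hypothetical final encoder projection
Z = h @ W_latent                             # latent matrix; row i is z_i
```

At inference time `training=False` turns dropout into the identity, so the encoder output becomes deterministic.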
The decoder takes the latent representation as input and restores the dimension of the node representations layer by layer through a stack of HeteroConv modules that share the encoder's structure but use its hidden-layer dimensions in reverse order. However, low-dimensional latent vectors alone cannot recover all the details. To address this, skip connections pass the output feature maps of the corresponding encoder layers directly to the decoder, compensating for the details and high-frequency information lost through compression during encoding. Finally, at the end of the decoder, an output projection consisting of a linear layer, batch normalization, and a GELU activation maps the decoded high-dimensional node features back to a single scalar physical quantity.
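The encoder-decoder layout with skip connections can be sketched in NumPy as follows. To keep the example self-contained, each HeteroConv module is replaced by a hypothetical random dense layer followed by batch normalization and GELU, so the graph structure is ignored; the hidden sizes, the latent size, and the additive form of the skip connection are assumptions (concatenation is an equally common choice). The sketch only demonstrates the dimension bookkeeping: decoder widths mirror the encoder's, each decoder output is combined with the encoder output of the same width, and the output projection yields one scalar per node.

```python
import math
import numpy as np

rng = np.random.default_rng(42)

def gelu(x):
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def batch_norm(x, eps=1e-5):
    return (x - x.mean(axis=0, keepdims=True)) / np.sqrt(x.var(axis=0, keepdims=True) + eps)

def dense(n_in, n_out):
    """Hypothetical stand-in for a HeteroConv module: a random linear map."""
    return rng.normal(scale=1.0 / math.sqrt(n_in), size=(n_in, n_out))

in_dim, hidden, latent_dim = 6, [16, 8], 4                          # assumed sizes
enc_W = [dense(a, b) for a, b in zip([in_dim] + hidden, hidden)]     # 6->16, 16->8
to_latent = dense(hidden[-1], latent_dim)                            # 8->4
dec_dims = hidden[::-1]                                              # reversed: [8, 16]
dec_W = [dense(a, b) for a, b in zip([latent_dim] + dec_dims[:-1], dec_dims)]  # 4->8, 8->16
out_W = dense(dec_dims[-1], 1)                                       # 16->1

def forward(x):
    skips, h = [], x
    for W in enc_W:
        h = gelu(batch_norm(h @ W))
        skips.append(h)                          # encoder outputs kept for skip connections
    z = h @ to_latent                            # latent representation
    h = z
    for W, skip in zip(dec_W, reversed(skips)):
        h = gelu(batch_norm(h @ W)) + skip       # additive skip from the same-width encoder layer
    y = gelu(batch_norm(h @ out_W))              # output projection: linear -> BN -> GELU
    return z, y

x = rng.normal(size=(10, in_dim))                # 10 nodes with toy input features
z, y = forward(x)
```

Here `z` has shape (10, 4) and `y` has shape (10, 1), one reconstructed value per node.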
The model is trained in an unsupervised manner on a training set containing only normal samples. The goal is to optimize the parameters by minimizing a composite loss function, which consists of three parts:
- (1)
The mean squared error (MSE) is used as the core reconstruction loss:
$$\mathcal{L}_{\mathrm{rec}} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{x}_i - x_i\right)^2$$
where $\hat{x}_i$ is the reconstructed physical quantity value of node $i$, $x_i$ is the corresponding true value, and $N$ is the total number of nodes in the batch.
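As a worked example of the reconstruction loss, the short sketch below averages the squared differences between reconstructed and true values over the nodes of a batch. The function name `reconstruction_loss` and the three-node batch are hypothetical.

```python
import numpy as np

def reconstruction_loss(x_hat, x):
    """MSE over the N nodes of a batch: (1/N) * sum_i (x_hat_i - x_i)^2."""
    x_hat = np.asarray(x_hat, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.mean((x_hat - x) ** 2))

# Toy batch of three nodes; only the last one is reconstructed 2 units off.
print(reconstruction_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 4/3, about 1.333
```

The squared errors are 0, 0, and 4, so their mean is 4/3.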
- (2)
Regularization loss $\mathcal{L}_{\mathrm{reg}}$, which constrains the latent space through the latent vectors $z_i$, where $z_i$ is the representation vector of node $i$ in the latent space.
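Because the exact form of the regularization term is not given here, the sketch below assumes a common choice: the mean squared L2 norm of the latent vectors, which discourages latent representations from growing without bound. The function name `latent_l2_loss` is hypothetical, and the authors' term may take a different form.

```python
import numpy as np

def latent_l2_loss(Z):
    """Assumed form: (1/N) * sum_i ||z_i||_2^2, the mean squared L2 norm of the latent rows."""
    Z = np.asarray(Z, dtype=float)
    return float(np.mean(np.sum(Z ** 2, axis=1)))

# Two toy latent vectors with squared norms 25 and 0.
print(latent_l2_loss([[3.0, 4.0], [0.0, 0.0]]))  # 12.5
```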
- (3)
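Smoothness loss $\mathcal{L}_{\mathrm{smooth}}$ based on a physical prior, which encourages the reconstructed profile to vary smoothly with depth, where $\hat{x}_k$ is the reconstructed value of the physical quantity at depth $k$ and $K$ is the number of nodes sorted by depth.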
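A sketch of the smoothness term and of one way to combine the three parts, under explicit assumptions: the smoothness loss is taken to be the mean squared first difference between reconstructions at adjacent depths after sorting the nodes of one profile by depth, and the composite objective is assumed to be a weighted sum with hypothetical weights `lam_reg` and `lam_smooth`. Neither the difference order nor the weights are specified in this section, and the function names are illustrative only.

```python
import numpy as np

def depth_smoothness_loss(depths, x_hat):
    """Assumed form: mean squared difference between reconstructions at adjacent depths."""
    order = np.argsort(depths)                       # sort the profile's nodes by depth
    x_sorted = np.asarray(x_hat, dtype=float)[order]
    if x_sorted.size < 2:
        return 0.0
    return float(np.mean(np.diff(x_sorted) ** 2))

def composite_loss(l_rec, l_reg, l_smooth, lam_reg=1e-3, lam_smooth=1e-2):
    """Hypothetical weighting: L = L_rec + lam_reg * L_reg + lam_smooth * L_smooth."""
    return l_rec + lam_reg * l_reg + lam_smooth * l_smooth

depths = [500.0, 10.0, 100.0]   # toy sample depths (m), deliberately unsorted
x_hat = [4.0, 20.0, 12.0]       # toy reconstructed temperatures (degrees C)
l_smooth = depth_smoothness_loss(depths, x_hat)
total = composite_loss(l_rec=1.0, l_reg=10.0, l_smooth=l_smooth)
```

After sorting by depth the values are 20, 12, and 4, so both adjacent differences are -8, `l_smooth` is 64, and `total` is 1 + 0.01 + 0.64 = 1.65. Sorting within a single profile matters: differencing nodes from different casts would penalize genuine horizontal variability rather than vertical roughness.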
6. Conclusions
This paper addresses the challenges posed by the inherent spatial discreteness and temporal discontinuity of sparse ocean sensor data. We propose a depth-aware heterogeneous spatiotemporal graph neural network (DAHSGNN) for anomaly detection and correction in sparse ocean sensor data. Starting from the raw sensor data, the method segments the water column into layers with a Gaussian hidden Markov model (HMM) and extracts multidimensional features through Transformer-based water-layer trend encoding and BiLSTM modeling. By constructing a depth-aware heterogeneous graph, it captures the complex dependencies of ocean sensor data across the vertical profile, geographic distribution, and temporal dynamics, which allows it to handle densely sampled depth sequences. Experiments demonstrate that, for multiple environmental variables (salinity, water temperature, and turbidity), DAHSGNN outperforms traditional homogeneous graph models and baseline heterogeneous graph models, validating its efficacy in sparse ocean sensor data scenarios.
Future research will investigate the coupling effects among multiple environmental variables to achieve multivariate joint modeling and adaptive modeling of water-layer states. It will also explore methods that adaptively delineate water-layer boundaries by fusing physical constraints with data-driven approaches. In addition, we plan to extend the reconstruction from site-scale data to the environmental field of the entire “Area”, and to develop lighter, more scalable models that meet the needs of larger-scale, higher-resolution marine environmental monitoring.
In summary, this study provides an effective solution to anomaly detection and correction in sparse ocean sensor data, validated on data from the ISA DeepData database. With continued refinement and multidisciplinary integration, the method is expected to provide a scalable technical path for optimizing data quality in deep-sea environmental monitoring.