Article

Enhance Graph-Based Intrusion Detection in Optical Networks via Pseudo-Metapaths

by Gang Qu 1,*, Haochun Jin 1, Liang Zhang 1, Minhui Ge 1, Xin Wu 1, Haoran Li 2 and Jian Xu 2

1 State Grid Corporation of China East China Branch, Shanghai 200120, China
2 Software College, Northeastern University, Shenyang 110169, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(21), 3432; https://doi.org/10.3390/math13213432
Submission received: 19 September 2025 / Revised: 21 October 2025 / Accepted: 22 October 2025 / Published: 28 October 2025
(This article belongs to the Special Issue Advances in Computational Methods for Network Security)

Abstract

Deep learning on graphs has emerged as a leading paradigm for intrusion detection, yet its performance in optical networks is often hindered by sparse labeled data and severe class imbalance, leading to an “under-reaching” issue where supervision signals fail to propagate effectively. To address this, we introduce Pseudo-Metapaths: dynamic, semantically aware propagation routes discovered on-the-fly. Our framework first leverages Beta-Wavelet spectral filters for robust, frequency-aware node representations. It then transforms the graph into a dynamic heterogeneous structure using the model’s own pseudo-labels to define transient ‘normal’ or ‘anomaly’ node types. This enables an attention mechanism to learn the importance of different Pseudo-Metapaths (e.g., Anomaly–Normal–Anomaly), guiding supervision signals along the most informative routes. Extensive experiments on four benchmark datasets demonstrate quantitative superiority. Our model achieves state-of-the-art F1-scores, outperforming a strong spectral GNN backbone by up to 3.15%. Ablation studies further confirm that our Pseudo-Metapath module is critical, as its removal causes F1-scores to drop by as much as 7.12%, directly validating its effectiveness against the under-reaching problem.

1. Introduction

Modern global communications fundamentally depend on optical networks, which provide the high-capacity, long-haul infrastructure necessary to sustain today’s exponentially growing data demands [1,2,3,4]. Their continued advancement is not merely a technological imperative but a societal one, as they enable essential services—from broadband internet to the operation of national critical systems [5,6,7]. Yet, as these networks grow more intricate and indispensable, their role in transmitting sensitive, high-stakes information makes them prime targets for advanced cyber threats [8,9,10]. Attack vectors now extend beyond traditional service outages to include insidious physical-layer intrusions, jeopardizing both network reliability and broader security interests [10,11]. Consequently, fortifying the resilience and security of optical infrastructure against such adaptive—and often covert—adversarial tactics has emerged as a paramount challenge [11].
We select optical networks as the focal point of this study precisely because they create a “perfect storm” of challenges for intrusion detection, making them an ideal proving ground for advanced graph-based methods. Firstly, the high-stakes and critical nature of optical infrastructure demands extremely high reliability, meaning that successful intrusions are, by design, extremely rare events. This operational reality directly translates into datasets with severe class imbalance and sparse labels—the very conditions that cripple standard supervised learning models. Secondly, the sheer volume and speed of data transmission make traditional deep packet inspection infeasible, shifting the focus to relational and behavioral patterns among network devices. This naturally motivates modeling the system as a graph, where subtle, coordinated malicious activities can be identified. Therefore, fortifying optical networks necessitates a paradigm that can effectively learn from sparse, imbalanced relational data and propagate supervisory signals across vast topologies, which is the central challenge we address.
Recently, graph-based deep learning has become a compelling paradigm for intrusion detection. Communication systems can be naturally modeled as graphs where nodes represent network devices and edges capture traffic flows, control-plane associations, or physical-layer couplings. Graph Neural Networks (GNNs) excel at learning from such relational data, enabling context-aware detection that surpasses methods based on isolated features [12,13]. However, directly applying GNNs to real-world optical networks reveals critical limitations. The core challenge stems from the intersection of severe label sparsity—anomalies are inherently rare and expensive to annotate—and the need for supervision signals to propagate across vast, complex topologies [14].
This leads to a critical bottleneck we formalize as under-reaching: supervision signals from the few labeled nodes fail to effectively influence the predictions of distant, unlabeled nodes that may be semantically related. This issue is exacerbated by two intertwined phenomena. First, deep message-passing GNNs are hampered by over-smoothing and over-squashing, which respectively dilute discriminative features and create informational bottlenecks that prevent long-range signal transmission [15,16,17,18,19]. Second, intrusion patterns in optical networks frequently violate the homophily assumption [20,21], as malicious nodes often connect to predominantly benign neighborhoods. This heterophilous nature blunts the effectiveness of standard GNNs, which are optimized for homophilous connections [22,23,24].
Existing remedies address parts of this problem but are insufficient in isolation. Post hoc label propagation can improve local consistency but falters under severe label scarcity [25,26]. Self-training with pseudo-labels amplifies supervision but risks propagating confirmation bias, especially under strong class imbalance [27,28,29,30]. On the representation side, spectral methods are adept at capturing high-frequency anomaly signatures but lack a mechanism to guide supervision along meaningful multi-hop routes [31,32]. Finally, while heterogeneous GNNs elegantly handle relational semantics, they require a predefined schema, which is absent in this dynamically evolving problem [33,34]. A unified framework that captures anomaly-specific signals while simultaneously learning where to propagate them remains a critical gap.
We address this gap by introducing Pseudo-Metapaths: dynamic, semantically aware propagation routes discovered on-the-fly. Our framework begins by using Beta-Wavelet spectral filtering to extract high-frequency components indicative of anomalies, producing robust initial node embeddings that are resilient to heterophily and local noise. The core of our innovation lies in then transforming the graph into a transiently heterogeneous structure. At each training epoch, we assign temporary “normal” or “anomaly” types to nodes using the model’s own pseudo-labels. This transformation enables an attention mechanism to automatically learn which Pseudo-Metapaths (e.g., Anomaly–Normal–Anomaly) are most effective for conveying supervisory signals. By coupling high-frequency spectral cues with dynamic metapath attention, our framework delivers label information across semantically promising—not merely topologically adjacent—regions of the network, directly combating the under-reaching problem.
Our contributions are threefold:
  • We introduce a formalization of the under-reaching problem for intrusion detection in optical networks, identifying its roots in the interplay of label sparsity, class imbalance, and network heterophily.
  • We propose the concept of Pseudo-Metapaths—a novel mechanism to learn dynamic, pseudo-label-conditioned propagation channels. We integrate this into a framework with Beta-Wavelet filtering to both capture and intelligently route anomaly signals.
  • We provide extensive empirical validation demonstrating that our method consistently outperforms state-of-the-art GNNs and specialized graph anomaly detection techniques, showcasing superior robustness and long-range supervision propagation.

2. Background

Optical networks inherently exhibit a graph structure, where network devices can be naturally represented as nodes and their interconnections as edges. This structural correspondence makes Graph Neural Networks (GNNs) a compelling choice for intrusion detection, as they leverage message passing to capture relational dependencies that are invisible to traditional, feature-based models [12]. By aggregating information from local neighborhoods, GNNs learn contextualized representations that reflect both node attributes and topological context. This formulation situates the problem within the broader field of graph anomaly detection (GAD), which aims to identify nodes, edges, or subgraphs that deviate significantly from expected behavioral or structural patterns [31]. At its core, GAD relies on the principle that anomalies manifest as statistical or structural outliers in the graph [35].
However, applying this paradigm to intrusion detection in optical networks introduces significant practical challenges. Unlike typical GAD settings, security-related anomalies in optical infrastructures are not only rare and difficult to label but also exhibit complex, non-localized patterns that strain the assumptions and mechanisms of standard GNNs. Two interdependent issues stand out: the heterophilous nature of malicious activity and the limited capacity of GNNs to propagate supervision effectively across sparse and unbalanced graphs.
One major challenge is heterophily, where malicious nodes frequently reside in neighborhoods dominated by benign entities [35]. This contradicts the homophily bias embedded in most conventional GNN architectures, which assume that connected nodes are likely to share similar labels or features. As a result, these models tend to average out anomalous signals during neighborhood aggregation, leading to the suppression or complete loss of critical evidence [22]. In such scenarios, anomalies become harder to distinguish.
To mitigate this limitation, recent research has increasingly turned to spectral graph theory. Wavelet-based frameworks offer a powerful means to isolate specific frequency components of graph signals, thereby amplifying high-frequency patterns that are often indicative of anomalous behavior [24,36]. Several state-of-the-art methods have built upon this principle [37]. For example, BWGNN [38] employs Beta kernels to construct adaptive, localized band-pass filters specifically tuned for high-frequency anomalies. AMNet [39] integrates signals across multiple frequency bands to enrich node representations, while BernNet [40] leverages Bernstein polynomials to achieve more expressive and flexible spectral filtering. Although these approaches excel at capturing the spectral signatures of anomalies, they do not inherently address the challenge of propagating such discriminative information effectively across the graph, especially when supervision is limited to only a few labeled nodes.
Another core bottleneck is rooted in the GNN’s message-passing mechanism itself. Constrained to propagate information strictly along the graph’s explicit edges, standard GNNs are prone to over-smoothing, where node representations become indistinguishable after multiple hops, and over-squashing, where rich signals are compressed through narrow structural bottlenecks [17,18,41]. The consequences of these propagation issues are significantly exacerbated in domains such as intrusion detection, which suffer from sparse labels and severe class imbalance. In such challenging settings, this confluence of factors gives rise to what we term the under-reaching problem: supervision signals from the few labeled anomalies are too attenuated to effectively influence distant yet semantically relevant nodes [42,43]. Given the topological sparsity of malicious entities, critical discriminative cues must often traverse long, multi-hop paths, a task for which conventional message passing is fundamentally ill-suited. This shortcoming is the central concern of this paper.

3. Preliminary

In this section, we introduce some necessary concepts used in this paper.

3.1. Heterogeneous Graph and Metapath

Much real-world data can be effectively represented as Heterogeneous Graphs (HeteGs). In contrast to homogeneous graphs, HeteGs encompass multiple types of nodes and edges, each conveying rich information and forming a complex topological structure. Motivated by the need to analyze such data, various well-performing Heterogeneous Graph Neural Networks (HeteGNNs) have been proposed [44,45]. Unlike traditional GNNs, which directly aggregate messages from adjacent nodes, most HeteGNNs adopt a metapath-centric framework to integrate the diverse attributes inherent in HeteGs: metapaths are used to semantically aggregate nodes that are distant from each other [44]. For example, as shown in Figure 1, under the metapath paper–author–paper, the paper-typed node P0 and the nodes {P4, P5, P6} can be treated directly as neighbors, with the intermediate author-typed node A2 ignored.

3.2. Spectral Graph Processing

By treating the features on nodes as signals, regarding the smoothness of the signals as frequency, and associating it with the eigenvalues of the graph’s Laplacian matrix, spectral graph methods provide an alternative but effective aspect for graph data processing distinct from the topological domain [46,47].
An undirected and unweighted graph comprising $N$ vertices can be formally described as $G = (V, E)$, with $V$ representing the set of vertices and $E$ the collection of edges. When we consider the features associated with each node, we denote the graph as $G = (V, E, X)$, where $X$ is the node feature matrix; this distinction allows us to first discuss the topological properties and subsequently incorporate the node-level signal information required for our spectral analysis and GNN model. The structure is encoded in the adjacency matrix $A \in \mathbb{R}^{N \times N}$. To analyze graph signals spectrally, we utilize the Laplacian matrix defined as $L = D - A$, where the diagonal matrix $D$ has entries $D_{ii} = \sum_j A_{ij}$. Given that $L$ is real, symmetric, and positive semi-definite, it can be factorized as $L = U \Lambda U^\top$. Here, $U$ contains the eigenvectors, while $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ holds the eigenvalues arranged in ascending order ($0 = \lambda_1 \leq \cdots \leq \lambda_N$). The eigenvectors in $U$ constitute the Graph Fourier Basis, with eigenvalues corresponding to graph frequencies [48].
For a signal $x \in \mathbb{R}^N$ defined on the graph, its smoothness is measured by Equation (1):

$$x^\top L x = \sum_{(i,j) \in E} (x_i - x_j)^2 = \sum_{k=1}^{N} \lambda_k \hat{x}_k^2,$$

with $\hat{x} = U^\top x$. This formulation reveals that signal components associated with smaller eigenvalues exhibit smoother (lower-frequency) variations across the graph, whereas those linked to larger eigenvalues manifest higher-frequency fluctuations. The Graph Fourier Transform (GFT) facilitates spectral processing through $\tilde{x} = U \phi(\Lambda) U^\top x$, where $\phi$ functions as a component-wise filter in the spectral domain.
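For concreteness, the following minimal NumPy sketch (illustrative only; variable names are ours, not from the paper) builds the Laplacian of a toy graph, verifies the smoothness identity of Equation (1), and applies a component-wise spectral filter $\phi$:

```python
import numpy as np

# Toy undirected graph: adjacency matrix of a 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))            # degree matrix
L = D - A                             # combinatorial Laplacian L = D - A

# Eigen-decomposition L = U diag(lam) U^T, eigenvalues ascending.
lam, U = np.linalg.eigh(L)

x = np.array([0.1, 0.2, 5.0, 0.3])    # node signal with one sharp deviation
x_hat = U.T @ x                       # Graph Fourier Transform

# Smoothness: x^T L x equals sum_k lam_k * x_hat_k^2 (Equation (1)).
assert np.isclose(x @ L @ x, np.sum(lam * x_hat**2))

# Component-wise spectral filter phi: keep only high frequencies.
phi = (lam > lam.mean()).astype(float)
x_filtered = U @ (phi * x_hat)        # x_tilde = U phi(Lambda) U^T x
print(x_filtered)
```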

3.3. Graph Spectral Wavelet and Its Chebyshev Approximation

The Graph Wavelet Transform is constructed from two fundamental elements: a scaling function and a set of wavelet kernels. For any given scale $s \in \{1, 2, \ldots, S\}$, the wavelet kernel is formulated as an operator $g_s(L) = U g_s(\Lambda) U^\top$, and similarly, the scaling function is defined as the operator $g_\phi(L) = U g_\phi(\Lambda) U^\top$.
Direct computation of these transforms is often impractical for large graphs due to the high computational cost of eigen-decomposition, which scales cubically with the number of nodes ($O(N^3)$). To circumvent this bottleneck, a common strategy is to employ Chebyshev polynomial expansion to approximate the operators $g_s(L)$ and $g_\phi(L)$. A prerequisite for applying Chebyshev approximation to a function $f(x)$ is that its input domain must be confined to the interval $[-1, 1]$. To meet this requirement, it is advantageous to utilize the normalized graph Laplacian, defined as

$$\hat{L} = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}},$$

where $D$ is the diagonal degree matrix ($D_{ii} = \sum_j A_{ij}$) and $A$ is the adjacency matrix. The eigenvalues of this normalized Laplacian $\hat{L}$ are guaranteed to fall within the range $[0, 2]$, a property that facilitates the necessary rescaling for the approximation of $g_s$ and $g_\phi$. For the remainder of this paper, unless otherwise specified, the eigen-decomposition $U \Lambda U^\top$ will refer to that of the normalized Laplacian $\hat{L}$.
Both the wavelet kernels $g_s(\hat{L})$ and the scaling function $g_\phi(\hat{L})$ can be approximated using this polynomial expansion. For notational convenience, we consolidate these operators into a single matrix:

$$W = \big[ g_\phi(\hat{L}); \, g_1(\hat{L}); \, g_2(\hat{L}); \, \ldots; \, g_S(\hat{L}) \big].$$
As established in [36], these approximations, denoted $\tilde{g}_\phi$ and $\tilde{g}_s$, are formulated using a truncated Chebyshev series of order $Z$:

$$\tilde{g}_\phi(\hat{L}) = \frac{1}{2} c_{0,0} + \sum_{z=1}^{Z} c_{0,z} T_z(\hat{L}); \qquad \tilde{g}_s(\hat{L}) = \frac{1}{2} c_{s,0} + \sum_{z=1}^{Z} c_{s,z} T_z(\hat{L}),$$

where $T_z(\hat{L})$ denotes the $z$-th order Chebyshev polynomial evaluated at the normalized Laplacian $\hat{L}$. These polynomials are generated via the recurrence relation

$$T_z(\hat{L}) = 2 \left( \frac{2}{\lambda_{\max}} \hat{L} - I \right) T_{z-1}(\hat{L}) - T_{z-2}(\hat{L}),$$

with initial conditions $T_0(\hat{L}) = I$ and $T_1(\hat{L}) = \frac{2}{\lambda_{\max}} \hat{L} - I$. The corresponding Chebyshev coefficients, $c_{0,z}$ and $c_{s,z}$, are determined by the following integrals:

$$c_{0,z} = \frac{2}{\pi} \int_0^{\pi} \cos(z\theta) \, g_\phi\!\left( \frac{\lambda_{\max} (\cos\theta + 1)}{2} \right) d\theta,$$

$$c_{s,z} = \frac{2}{\pi} \int_0^{\pi} \cos(z\theta) \, g_s\!\left( \frac{\lambda_{\max} (\cos\theta + 1)}{2} \right) d\theta.$$
The resulting approximated transform operator can be written as

$$\tilde{W} = \big[ \tilde{g}_\phi(\hat{L}); \, \tilde{g}_1(\hat{L}); \, \tilde{g}_2(\hat{L}); \, \ldots; \, \tilde{g}_S(\hat{L}) \big].$$

An important property of this transform is the Parseval tight frame condition, which is met if $g_\phi(\lambda)^2 + \sum_{s=1}^{S} g_s(\lambda)^2 = 1$ for all eigenvalues $\lambda \in \Lambda$. When this condition holds, the transform is energy-preserving. This is typically realized by carefully choosing kernel functions, such as those from the Meyer or Mexican hat families. In cases where the tight frame property is not satisfied, signal reconstruction requires the Moore–Penrose pseudoinverse of the operator, $\tilde{W}^+$, calculated as

$$\tilde{W}^+ = (\tilde{W}^\top \tilde{W})^{-1} \tilde{W}^\top.$$

Consequently, for a given signal $x \in \mathbb{R}^N$ on the graph, its wavelet coefficients are $c = \tilde{W} x$, and the signal can be reconstructed via the inverse transform

$$\tilde{x} = \tilde{W}^+ c,$$

where the operator $\tilde{W}$ has dimensions $\mathbb{R}^{(S+1)N \times N}$, its pseudoinverse $\tilde{W}^+$ has dimensions $\mathbb{R}^{N \times (S+1)N}$, and $N$ is the total number of graph nodes.
The primary advantage of the Chebyshev approximation lies in its computational efficiency. For a graph comprising $N$ nodes and $E$ edges, a $Z$-order polynomial approximation requires a series of matrix–vector multiplications involving the sparse Laplacian matrix, at a computational cost of $O(Z \cdot E)$. Given that $Z$ is a small constant (typically between 10 and 50) that does not depend on the graph size $N$, and that for sparse graphs $E$ is often proportional to $N$, the overall complexity is nearly linear. This makes the method significantly more scalable and tractable for large-scale graphs than the computationally prohibitive $O(N^3)$ complexity of direct eigen-decomposition.
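Although our implementation ultimately uses direct eigen-decomposition (see below), the following sketch illustrates how a $Z$-order Chebyshev expansion applies a spectral filter using only sparse matrix–vector products. For simplicity it obtains the coefficients by a least-squares Chebyshev fit rather than the closed-form integrals above; all names are illustrative:

```python
import numpy as np
from numpy.polynomial import chebyshev
from scipy import sparse

def chebyshev_filter(L_hat, x, g, Z=20, lam_max=2.0):
    """Approximate g(L_hat) @ x with a Z-order Chebyshev expansion.

    L_hat : sparse normalized Laplacian (eigenvalues in [0, lam_max]).
    g     : scalar filter function defined on [0, lam_max].
    Only sparse mat-vec products are used, so the cost is O(Z * |E|).
    """
    N = L_hat.shape[0]
    # Sample g at Chebyshev nodes of the rescaled spectrum and fit coefficients.
    theta = np.pi * (np.arange(Z + 1) + 0.5) / (Z + 1)
    pts = np.cos(theta)                        # Chebyshev nodes in [-1, 1]
    vals = g(lam_max * (pts + 1) / 2)          # g sampled on the graph spectrum
    c = chebyshev.chebfit(pts, vals, Z)

    # Recurrence T_z(M) x with M = (2 / lam_max) L_hat - I.
    M = (2.0 / lam_max) * L_hat - sparse.identity(N)
    t_prev, t_curr = x, M @ x                  # T_0 x and T_1 x
    y = c[0] * t_prev + c[1] * t_curr
    for z in range(2, Z + 1):
        t_next = 2 * (M @ t_curr) - t_prev     # T_z = 2 M T_{z-1} - T_{z-2}
        y += c[z] * t_next
        t_prev, t_curr = t_curr, t_next
    return y
```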
It is important to acknowledge that the explicit computation of the graph Laplacian’s eigen-decomposition has a theoretical complexity of $O(N^3)$, which can be prohibitive for very large graphs. However, for the small-to-medium-scale datasets considered in our experiments (where the number of nodes $N < 5000$), this direct spectral approach remains computationally feasible. Thanks to modern hardware acceleration, particularly CUDA-enabled GPUs such as the NVIDIA H800, the practical execution time for this operation is well within an acceptable range for typical research workflows. Therefore, we adopt the direct eigen-decomposition method in our current framework for its precision. The exploration of approximation techniques to enhance scalability for larger graphs, such as the Chebyshev polynomial expansion, is deferred as a direction for future work.

3.4. Spectral Properties of Anomaly Signals: Anomalies as High-Frequency Components

A foundational challenge in applying GNNs to anomaly detection is that many conventional architectures, such as GCN, implicitly function as low-pass filters. By design, they smooth signals across neighborhoods, which is effective under the homophily assumption but counterproductive for anomaly detection. Anomalies, by definition, represent significant deviations from the norm, introducing sharp, localized changes into the graph signal. These abrupt variations are fundamentally high-frequency phenomena. The seminal work by Tang et al. [38] provides a rigorous spectral analysis of this behavior, mathematically proving a key characteristic: the presence of anomalies causes the signal’s spectral energy to concentrate in the high-frequency domain. To understand this, we delve into their theoretical framework. To formalize the analysis, given an unweighted, undirected homogeneous graph, we assume its node features (the signal) $x \in \mathbb{R}^N$ are drawn from a multivariate Gaussian distribution:

$$x \sim \mathcal{N}(\mu e_N, \sigma^2 I_N),$$

where $\mu e_N$ is the mean vector (with $e_N$ the all-ones vector) and $\sigma^2$ is the variance. In this model, the degree of anomaly is quantified by the coefficient of variation $\sigma / |\mu|$: a larger $\sigma$ or a smaller absolute mean $|\mu|$ signifies greater deviation among node features, and thus a more pronounced anomalous state in the overall graph signal.

3.4.1. Signal Analysis in the Spectral Domain

The analysis transitions from the spatial domain to the spectral domain via the Graph Fourier Transform (GFT), $\hat{x} = U^\top x$. Due to the rotational invariance of the Gaussian distribution, the transformed signal $\hat{x}$ is also Gaussian: $\hat{x} \sim \mathcal{N}(\mu U^\top e_N, \sigma^2 I_N)$.
A crucial insight lies in the structure of $U^\top e_N$. For a connected, undirected graph, the all-ones vector $e_N$ (after normalization) is the eigenvector corresponding to the smallest eigenvalue $\lambda_1 = 0$. Consequently, the vector $U^\top e_N$ is non-zero only in its first component, i.e., $U^\top e_N = [\sqrt{N}, 0, \ldots, 0]^\top$. This implies that the distributions of the spectral coefficients are not identical:
  • The first spectral coefficient (the DC component) follows a non-central distribution: $\hat{x}_1 \sim \mathcal{N}(\mu \sqrt{N}, \sigma^2)$.
  • All other coefficients (the AC components) are centered at zero: $\hat{x}_i \sim \mathcal{N}(0, \sigma^2)$ for $i = 2, \ldots, N$.
This decomposition is key: the signal’s mean primarily influences the lowest frequency, while its variance—and thus, its anomalous nature—is captured by the higher frequencies.

3.4.2. Mathematical Proof of Increased High-Frequency Energy

To quantify how energy is distributed across frequencies, Tang et al. [38] introduce the low-frequency energy ratio

$$\eta_k(x, L) = \frac{\sum_{i=1}^{k} \hat{x}_i^2}{\sum_{i=1}^{N} \hat{x}_i^2},$$

which measures the proportion of signal energy within the first $k$ lowest frequencies. The goal is to show that as the anomaly degree $\sigma / |\mu|$ increases, $\eta_k(x, L)$ decreases.
To achieve this, Proposition 2 in [38] demonstrates that the expected inverse of this ratio, $\mathbb{E}_x[1 / \eta_k(x, L)]$, is monotonically increasing in the anomaly degree $\sigma / |\mu|$. The logic is as follows: the term $\mathbb{E}_x[1 / \eta_k(x, L)] - 1$ can be expressed as $\mathbb{E}\big[ \sum_{i=k+1}^{N} \hat{x}_i^2 \,/\, \sum_{i=1}^{k} \hat{x}_i^2 \big]$. Normalizing by $\sigma$ and letting $z_i = \hat{x}_i / \sigma$, we have $z_1 \sim \mathcal{N}(\mu \sqrt{N} / \sigma, 1)$ and $z_i \sim \mathcal{N}(0, 1)$ for $i > 1$. The expectation then depends on the term $\mathbb{E}\big[ \sum_{i=k+1}^{N} z_i^2 \,/\, \big( z_1^2 + \sum_{i=2}^{k} z_i^2 \big) \big]$. The sum $\sum_{i=2}^{k} z_i^2$ follows a central chi-squared distribution, but $z_1^2$ follows a non-central chi-squared distribution whose non-centrality parameter is related to $(\mu \sqrt{N} / \sigma)^2$. The core of the proof is the observation that as the anomaly degree $\sigma / |\mu|$ increases, the non-centrality parameter of $z_1^2$ decreases, which reduces the expected value of $z_1^2$. A smaller denominator $\mathbb{E}[z_1^2 + \sum_{i=2}^{k} z_i^2]$ leads to a larger value for the entire expectation $\mathbb{E}_x[1 / \eta_k(x, L)]$.
This chain of reasoning mathematically confirms that a higher degree of anomaly leads to a smaller proportion of energy in the low-frequency spectrum. Since the low-frequency and high-frequency energy components must add up to the total signal energy, a decrease in the proportion of low-frequency energy directly implies a corresponding increase in the proportion of high-frequency energy. This provides a strong theoretical justification for our approach: to effectively detect intrusions, it is not just beneficial but necessary to employ a mechanism capable of capturing these high-frequency spectral signatures. This motivates our choice of the Beta-Wavelet module, which is explicitly designed as a set of learnable band-pass filters, making it exceptionally well-suited to isolate the discriminative high-frequency components that characterize anomalous activities.
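A quick Monte Carlo check (an illustrative sketch, not part of the original analysis) makes this concrete: sampling Gaussian signals on a random graph and averaging $\eta_k$ shows the low-frequency energy ratio shrinking as $\sigma / |\mu|$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Erdos-Renyi-style graph on N nodes.
N = 200
A = (rng.random((N, N)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric, no self-loops
L = np.diag(A.sum(1)) - A
lam, U = np.linalg.eigh(L)

def low_freq_ratio(x, k=10):
    """Low-frequency energy ratio eta_k from Equation (13)."""
    x_hat = U.T @ x
    return np.sum(x_hat[:k] ** 2) / np.sum(x_hat ** 2)

mu = 1.0
for sigma in [0.1, 0.5, 1.0, 2.0]:              # increasing anomaly degree
    xs = mu + sigma * rng.standard_normal((1000, N))
    eta = np.mean([low_freq_ratio(x) for x in xs])
    print(f"sigma/|mu| = {sigma:4.1f}  ->  mean eta_k = {eta:.3f}")
```

As the theory predicts, the printed mean of $\eta_k$ decreases monotonically as $\sigma / |\mu|$ increases, i.e., energy migrates to the high frequencies.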
Discussion
To further justify our spectral approach, we contrast it with methods that rely on purely topological features. The key distinction lies in the nature of the anomalies we aim to detect: they manifest as high-frequency signals rather than distinctive topological structures. Consequently, a purely structural approach, such as one based on graph domination, would fail in two key scenarios:
  • Camouflaged Anomalies: A node with a highly anomalous feature vector (a large deviation in its signal, leading to a large $x^\top L x$) could be located in a structurally unremarkable part of the graph (e.g., low centrality, not part of a minimal dominating set). A structural method would miss it, whereas spectral filtering is designed to detect such high-frequency variations.
  • Structural Outliers with Normal Behavior: A node could be a structural outlier (e.g., a bridge node, high centrality) yet exhibit perfectly normal features. Its signal $x_v$ would be similar to its neighbors’, resulting in a low $x^\top L x$. A structural parameter might flag it as important or suspicious, while spectral analysis would correctly identify its signal as low-frequency (smooth) and thus non-anomalous.
Therefore, while structural parameters are valuable for understanding graph topology, they are mathematically ill-equipped to detect anomalies defined by signal characteristics. The spectral approach, by its very formulation, is designed to analyze this crucial signal-structure relationship, making it the more appropriate and powerful choice for our task.

4. Methodology

Our proposed framework, PseudoMetapathNet, tackles the under-reaching problem in graph-based intrusion detection by creating and leveraging dynamic, semantically aware propagation routes. In contrast to methods that augment graph structures topologically, our approach learns to route supervision signals along “Pseudo-Metapaths” that are most informative for identifying anomalies. The entire process can be broken down into three main stages: (a) learning robust, frequency-aware node representations using Beta-Wavelet spectral filters; (b) dynamically transforming the graph into a pseudo-heterogeneous structure based on the model’s evolving predictions; and (c) propagating information via an attention mechanism that learns the importance of different Pseudo-Metapaths. Figure 2 provides a schematic overview of our framework.
We detail these three stages in the following three subsections.

4.1. Beta-Wavelet-Based Graph Representation Learning

As spectral graph theory suggests, anomalies often manifest as high-frequency signals relative to their local neighborhoods. To effectively isolate these discriminative spectral signatures, our framework is built upon a spectral GNN backbone. We specifically select a model inspired by the Beta-Wavelet Graph Neural Network (BWGNN) [38], leveraging its proven ability to construct precise, tunable band-pass filters ideal for graph-based anomaly detection (GAD).
The filter’s design is grounded in the scaled Beta kernel function, which provides mathematical control over its spectral response. This function is expressed as
$$\beta^*_{p,q}(\omega) = \frac{1}{B(p+1, q+1)} \, \omega^p (1 - \omega)^q,$$

where $B(p+1, q+1) = \frac{p! \, q!}{(p+q+1)!}$ and $\omega \in (0, 1)$ represents a scaled eigenvalue from the graph spectrum. As detailed in the work of Tang et al. [38], the two parameters, $p$ and $q$, allow for precise control over the filter’s central frequency and bandwidth, analogous to the mean and variance of a Beta distribution.
First, the filter’s central frequency ($\mu_f$), where its response is maximal, is determined by the ratio of $p$ to $q$:

$$\mu_f = \frac{2(p+1)}{p+q+2}.$$

To target high-frequency signals, which correspond to the largest eigenvalues of the graph Laplacian (approaching 2), we can simply set $p \gg q$. This ensures the filter is most sensitive to the spectral bands where anomaly signatures typically reside. Second, the filter’s bandwidth, or precision, is controlled by its variance ($\sigma_f^2$):

$$\sigma_f^2 = \frac{4(p+1)(q+1)}{(p+q+2)^2 (p+q+3)}.$$
As the polynomial order p + q increases, the variance σ f 2 approaches zero, concentrating the filter’s energy around its central frequency μ f . This transforms it into a highly selective, narrow band-pass filter, crucial for distinguishing anomalous patterns from other innocuous high-frequency noise. In essence, by tuning p and q, we can engineer a filter that is both centered on the high-frequency regions and highly selective in its response.
This kernel is used to construct a family of $C + 1$ wavelet base filters, which together form the multi-scale filter of our representation learning module $g_\theta$. Each wavelet filter matrix, representing a specific instance of the general graph wavelet operator $g_s(\hat{L})$ discussed previously, is defined as

$$\Psi_i = U \cdot \mathrm{diag}\!\big( \beta^*_{i, C-i}(\Lambda) \big) \cdot U^\top.$$

The complete filtering operation on a $d$-dimensional input signal $X \in \mathbb{R}^{N \times d}$ then produces an initial set of node embeddings $Z_{\text{initial}}$, as defined in Equation (17):

$$Z_{\text{initial}} = g_\theta(\hat{L}) X = U g_\theta(\Lambda) U^\top X,$$

where $\hat{L} = I - D^{-1/2} A D^{-1/2}$ is the normalized graph Laplacian with eigen-decomposition $\hat{L} = U \Lambda U^\top$.
While these initial embeddings are effective at capturing intrinsic node characteristics, relying solely on this module for prediction suffers from the “under-reaching” issue: the supervision signal from the few labeled nodes cannot effectively propagate to distant unlabeled nodes. This limitation motivates the subsequent steps of our methodology. It is also worth noting that our framework is modular. While we select BWGNN as a powerful spectral backbone, other GNNs adept at capturing high-frequency signals, such as ACM-GNN [49] or FAGCN [50], could serve as alternative feature extractors. As our experiments will demonstrate, the most significant performance gains stem from our novel propagation mechanism, which is agnostic to the specific choice of the initial encoder.
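To make the construction concrete, the sketch below (illustrative names; the learnable transformation layers of BWGNN are omitted) builds the $C + 1$ Beta-wavelet filters via direct eigen-decomposition, matching the small-graph setting described in Section 3.3, and applies them to node features:

```python
import numpy as np
from scipy.special import beta as beta_fn

def beta_wavelet_embeddings(A, X, C=2):
    """Apply a bank of C+1 Beta-wavelet band-pass filters to features X.

    A : (N, N) adjacency matrix; X : (N, d) node features.
    Returns the concatenated filtered features [Psi_0 X, ..., Psi_C X].
    """
    N = A.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(1).clip(min=1))
    L_hat = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L_hat)               # eigenvalues in [0, 2]

    outs = []
    for i in range(C + 1):
        p, q = i, C - i
        w = np.clip(lam / 2.0, 1e-9, 1 - 1e-9)   # rescale spectrum to (0, 1)
        resp = w**p * (1 - w)**q / beta_fn(p + 1, q + 1)
        Psi = U @ np.diag(resp) @ U.T            # filter Psi_i (Equation (16))
        outs.append(Psi @ X)                     # filtered node features
    return np.concatenate(outs, axis=1)
```

Filters with larger $p$ (i.e., $p \gg q$) respond to the high-frequency end of the spectrum, which is where the analysis of Section 3.4 places anomaly energy.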

4.2. Dynamic Graph Heterogenization

To overcome the propagation limits of standard GNNs and enable more sophisticated, semantically driven message passing, we introduce a novel step: dynamic graph heterogenization. The core idea is to temporarily impose a heterogeneous structure onto the natively homogeneous graph, allowing us to leverage powerful semantic aggregation via metapaths.
This transformation is achieved dynamically at each training iteration $k$. First, using the node representations $Z^{(k)}_{\text{initial}}$ generated by the Beta-Wavelet module, the model computes a preliminary set of predictions (i.e., pseudo-labels) for all nodes. A node $v$ is assigned a temporary type $\tau(v)$ based on its predicted probability of being an anomaly:

$$\tau(v) = \begin{cases} \text{Anomaly (A)} & \text{if } p\big(y_v = 1 \mid Z^{(k)}_{\text{initial}}\big) \geq \delta, \\ \text{Normal (N)} & \text{otherwise}, \end{cases}$$

where $\delta$ is a confidence threshold. This process transforms the input homogeneous graph $G$ into a transient, pseudo-heterogeneous graph $G^{(k)} = (V, E, T, X)$, where $T = \{\text{Normal}, \text{Anomaly}\}$ is the set of temporary node types.
Crucially, this heterogenization is not static. The node types are recalculated at every training step, evolving as the model’s understanding of the graph improves. This dynamic nature allows the model to self-correct and progressively refine the semantic structure it uses for information propagation, preventing early-stage prediction errors from causing permanent damage.
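As a minimal sketch (assumed tensor names; PyTorch), the dynamic typing step of Equation (18) reduces to thresholding the model’s current anomaly probabilities at each iteration:

```python
import torch

def assign_pseudo_types(logits: torch.Tensor, delta: float = 0.6) -> torch.Tensor:
    """Assign transient node types from current predictions (Equation (18)).

    logits : (N,) raw anomaly scores from the spectral backbone.
    Returns a boolean mask: True = 'Anomaly' (A), False = 'Normal' (N).
    Recomputed every training step, so the induced heterogeneous
    structure evolves with the model's own pseudo-labels.
    """
    probs = torch.sigmoid(logits)
    return probs >= delta
```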

4.3. Propagating via Learnable Pseudo-Metapaths

With the dynamically induced heterogeneous structure, we propose a mechanism to automatically learn optimal composite Pseudo-Metapaths in an on-the-fly fashion. Specifically, we generate new adjacency matrices representing useful multi-hop relations and then perform propagations on these learned Pseudo-Metapaths.
First, based on the node types in $G^{(k)}$, we define a set of candidate adjacency matrices $\mathcal{A}$. This set includes a matrix for each possible pseudo-relation type, namely $A_{AA}$, $A_{AN}$, $A_{NA}$, and $A_{NN}$, where $A_{\tau_1 \tau_2}(i, j) = 1$ if there is an edge from node $j$ of type $\tau_2$ to node $i$ of type $\tau_1$. To learn variable-length metapaths, we also include the identity matrix $I$ in this set.
A layer inspired by the Graph Transformer Network (GTN) [51] is then used to learn a soft selection of these candidate matrices. Specifically, we introduce a learnable weight vector $\phi \in \mathbb{R}^{|\mathcal{A}|}$, implemented as the kernel of a $1 \times 1$ convolution. Each element of $\phi$ corresponds to a candidate matrix in the set $\mathcal{A} = \{A_{AA}, A_{AN}, \ldots, I\}$. Applying a softmax to this vector yields a normalized attention vector $\alpha = \mathrm{softmax}(\phi)$, where each element $\alpha_t$ represents the learned importance of the corresponding candidate matrix $A_t$. These attention scores are then used to compute a weighted combination of the candidate matrices, forming a new graph structure $F(\mathcal{A}; \phi)$, as expressed in Equation (19):

$$F(\mathcal{A}; \phi) = \sum_{t=1}^{|\mathcal{A}|} \alpha_t A_t$$

The resulting matrix $F(\mathcal{A}; \phi)$ represents a new graph structure defined by a weighted combination of the base pseudo-relations.
To discover longer and more complex Pseudo-Metapaths, we stack $K$ GT layers. The output of the $k$-th layer, $A^{(k)}$, is generated by multiplying the output of the previous layer with a new learned combination of base matrices. This composition allows the model to construct metapaths of up to length $K$, as expressed in Equation (20):

$$A^{(k)} = A^{(k-1)} \cdot F(\mathcal{A}; \phi^{(k)}), \quad \text{with } A^{(0)} = I$$
To learn multiple distinct Pseudo-Metapaths simultaneously, we extend this operation to have $C_{\text{out}}$ channels.
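A condensed sketch of this metapath-learning layer (simplified from the GTN formulation; shapes and names are assumptions) shows the soft selection of Equation (19) and the composition of Equation (20):

```python
import torch
import torch.nn as nn

class PseudoMetapathLayer(nn.Module):
    """Soft selection over candidate pseudo-relation matrices (Equation (19))."""

    def __init__(self, num_candidates: int):
        super().__init__()
        # Learnable 1x1-conv kernel: one weight per candidate matrix.
        self.phi = nn.Parameter(torch.randn(num_candidates))

    def forward(self, candidates: torch.Tensor) -> torch.Tensor:
        # candidates: (|A|, N, N) stacked {A_AA, A_AN, A_NA, A_NN, I}.
        alpha = torch.softmax(self.phi, dim=0)        # attention over relations
        # Weighted combination F(A; phi) = sum_t alpha_t * A_t.
        return (alpha[:, None, None] * candidates).sum(dim=0)

def compose_metapaths(candidates: torch.Tensor, layers) -> torch.Tensor:
    """Stack K layers: A^(k) = A^(k-1) @ F(A; phi^(k)), A^(0) = I (Equation (20))."""
    A_k = torch.eye(candidates.shape[-1])
    for layer in layers:
        A_k = A_k @ layer(candidates)
    return A_k
```

Running $C_{\text{out}}$ independent copies of this stack yields the multi-channel metapath graphs $\{A_c^{(K)}\}$ used below.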
Finally, we perform graph convolution on these newly generated graphs. For each learned metapath graph $A_c^{(K)}$, we apply a GNN layer (e.g., GCN) to propagate information from the concatenated representations $[Z_{\text{initial}} \,\|\, X]$, where $\|$ denotes the feature concatenation operator. This process can be expressed as Equation (21):

$$H_c = \sigma_{\text{act}}\!\left( \tilde{D}_c^{-1/2} \tilde{A}_c^{(K)} \tilde{D}_c^{-1/2} \, [Z_{\text{initial}} \,\|\, X] \, W \right)$$

where $\tilde{A}_c^{(K)} = A_c^{(K)} + I$, $\tilde{D}_c$ is its degree matrix, and $W$ is a shared trainable weight matrix. The final, semantically enriched representation for each node, $H_{\text{final}}$, is obtained by aggregating the outputs from all channels, as shown in Equation (22):

$$H_{\text{final}} = H_1 \,\|\, H_2 \,\|\, \cdots \,\|\, H_{C_{\text{out}}}$$
This final representation is then used for the ultimate intrusion detection prediction. By learning to construct the most salient propagation pathways automatically, our framework establishes powerful, long-range information highways that directly and effectively combat the under-reaching problem.
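A per-channel propagation step might look as follows (an illustrative PyTorch sketch of Equations (21) and (22), with dense matrices for clarity):

```python
import torch
import torch.nn.functional as F

def propagate_channel(A_K, Z_initial, X, W):
    """GCN-style propagation on one learned metapath graph (Equation (21)).

    A_K : (N, N) learned metapath adjacency A_c^(K) for this channel.
    W   : (d_in, d_out) weight matrix shared across channels.
    """
    A_tilde = A_K + torch.eye(A_K.shape[0])               # add self-loops
    d_inv_sqrt = A_tilde.sum(dim=1).clamp(min=1e-9).pow(-0.5)
    A_norm = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    H = torch.cat([Z_initial, X], dim=1)                  # [Z_initial || X]
    return F.relu(A_norm @ H @ W)                         # sigma_act = ReLU

# Channel-wise concatenation (Equation (22)):
# H_final = torch.cat([propagate_channel(A_c, Z, X, W) for A_c in A_channels], dim=1)
```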
In contrast to fixed schemas, our work dynamically induces pseudo-types from model predictions and learns to route supervision along Pseudo-Metapaths, coupling frequency-aware representations with adaptive, semantically guided propagation to mitigate under-reaching in low-label, imbalanced, and heterophilous regimes.
The pseudo-code of the forward process of PseudoMetapathNet is shown in Algorithm 1. The PseudoMetapathNet framework is trained end-to-end by optimizing a composite loss function. This objective is designed to address two distinct but interconnected goals: (1) ensuring high accuracy on the final intrusion detection task and (2) regularizing the intermediate dynamic graph heterogenization stage to produce a coherent and meaningful semantic structure. To this end, our total loss function $\mathcal{L}$ is composed of a primary supervised task loss $\mathcal{L}_{\text{task}}$ and a self-supervised auxiliary contrastive loss $\mathcal{L}_{\text{aux}}$.
Algorithm 1 PseudoMetapathNet (Forward Process)
Require: Input graph $G = (V, E, X)$; confidence threshold $\delta$.
Ensure: Anomaly prediction scores $P$ for all nodes in $V$.
1: // Stage 1: Frequency-Aware Representation Learning
2: Compute initial node embeddings $Z_{\text{initial}}$ (see Equation (17)).
3: // Stage 2: Dynamic Graph Heterogenization
4: Compute anomaly probabilities $p(y_v = 1 \mid Z_{\text{initial}})$ for all nodes (see Equations (17) and (18)).
5: Assign temporary node types $\tau(v)$ using threshold $\delta$ (see Equation (18)).
6: Construct candidate adjacency matrices $\mathcal{A} = \{A_{AA}, A_{AN}, A_{NA}, A_{NN}, I\}$ based on types $\tau$.
7: // Stage 3: Pseudo-Metapath Propagation
8: Generate multi-hop pseudo-metapath graphs $\{A_c^{(K)}\}_{c=1}^{C_{\text{out}}}$ (see Equation (20)).
9: Concatenate features for propagation: $X_{\text{concat}} \leftarrow [Z_{\text{initial}} \,\|\, X]$.
10: for $c = 1$ to $C_{\text{out}}$ do
11:   Propagate information along the $c$-th metapath to get $H_c$ (see Equation (21)).
12: end for
13: Aggregate final representations $H_{\text{final}}$ from all channels (see Equation (22)).
14: // Final Prediction
15: Compute final anomaly scores $P$ from the enriched representations $H_{\text{final}}$.

4.3.1. Primary Task Loss

The primary objective is to correctly classify nodes as either normal or anomalous. This is achieved through a standard supervised learning setup. Given the final semantically enriched node representations $H_{\text{final}}$ from Equation (22), a final linear classifier is applied to produce the prediction logits. For the set of labeled training nodes, denoted as $V_L$, we have

$$\mathcal{L}_{\text{task}} = - \sum_{v \in V_L} \big[ y_v \log(p_v) + (1 - y_v) \log(1 - p_v) \big],$$

where $y_v$ is the true label for node $v$, and $p_v$ is the predicted probability of node $v$ being an anomaly, derived from $H_{\text{final}}$. This loss directly drives the model to learn effective Pseudo-Metapaths for the downstream task. Here, $\mathcal{L}_{\text{task}}$ is a binary cross-entropy (BCE) loss, equivalent to the negative log-likelihood (NLL) of the predictions; the standard cross-entropy (CE) loss can also be adopted depending on the dataset. The negative sign is necessary because the log-likelihood (the term inside the summation) is non-positive, and the training objective is to minimize this loss (i.e., maximize the likelihood). The formulation is numerically stable at the boundary: for a perfect prediction (e.g., $y_v = 1$, $p_v = 1$), the loss correctly evaluates to 0, since the limit of the $0 \cdot \log(0)$ term is 0.
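In practice, numerical stability is typically obtained by computing this loss from logits rather than probabilities; a minimal PyTorch sketch (tensor names are ours) of the masked task loss:

```python
import torch
import torch.nn.functional as F

def task_loss(logits, labels, labeled_mask):
    """BCE over the labeled node set V_L only (Equation (23)).

    logits       : (N,) raw anomaly scores from the final classifier.
    labels       : (N,) ground-truth labels in {0, 1}.
    labeled_mask : (N,) boolean mask selecting the labeled training nodes.
    The logits-based form avoids evaluating log(0) at saturated predictions.
    """
    return F.binary_cross_entropy_with_logits(
        logits[labeled_mask], labels[labeled_mask].float()
    )
```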

4.3.2. Auxiliary Contrastive Loss for Stable Heterogenization

A critical challenge in our framework is that the quality of the learned Pseudo-Metapaths is highly dependent on the stability and semantic coherence of the pseudo-labels generated. Relying solely on the distant supervision from $\mathcal{L}_{\text{task}}$ can lead to unstable training, as the pseudo-labeling process lacks a direct, immediate learning signal.
To address this, we introduce an auxiliary self-supervised loss, $\mathcal{L}_{\text{aux}}$, which acts as a regularization term on the initial node embeddings $Z_{\text{initial}}$. The goal of this loss is to enforce a more structured embedding space, providing a strong inductive bias for the dynamic heterogenization stage. Specifically, it encourages the representations of nodes that are assigned the same pseudo-type to be closer to each other, while pushing apart the representations of nodes with different pseudo-types.
We formulate this as a contrastive loss. For a given pair of nodes $(v_i, v_j)$, we define their relationship based on their pseudo-labels $\tau(v_i)$ and $\tau(v_j)$ from Equation (18). The loss is defined over a batch of randomly sampled node pairs and consists of two components:

$$\mathcal{L}_{\text{aux}} = \mathbb{E}_{(v_i, v_j)} \Big[ \mathbb{I}\big( \tau(v_i) = \tau(v_j) \big) \cdot d(z_i, z_j) + \mathbb{I}\big( \tau(v_i) \neq \tau(v_j) \big) \cdot \max\big( 0, \, m - d(z_i, z_j) \big) \Big],$$

where $z_i = Z_{\text{initial}, i}$ is the initial embedding of node $v_i$, $d(z_i, z_j) = \| z_i - z_j \|_2^2$ is the squared Euclidean distance, $\mathbb{I}(\cdot)$ is the indicator function, and $m$ is a positive margin hyper-parameter. This loss minimizes the distance between positive pairs (nodes with the same pseudo-type) and enforces that negative pairs (nodes with different pseudo-types) are separated by at least the margin $m$. This provides a direct supervisory signal to the Beta-Wavelet module, ensuring it produces embeddings that are not only spectrally discriminative but also well-clustered according to the model’s own evolving semantic understanding.
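A compact sketch of this regularizer over randomly sampled pairs (illustrative names; the uniform sampling strategy is an assumption, as the paper specifies only random pairs):

```python
import torch

def aux_contrastive_loss(z, pseudo_types, margin=1.0, num_pairs=1024):
    """Margin-based contrastive regularizer over random node pairs (Equation (24)).

    z            : (N, d) initial embeddings Z_initial.
    pseudo_types : (N,) boolean pseudo-labels from the heterogenization step.
    """
    n = z.shape[0]
    i = torch.randint(0, n, (num_pairs,))
    j = torch.randint(0, n, (num_pairs,))
    d = (z[i] - z[j]).pow(2).sum(dim=1)          # squared Euclidean distance
    same = (pseudo_types[i] == pseudo_types[j]).float()
    # Pull same-type pairs together; push different-type pairs out to margin m.
    loss = same * d + (1 - same) * torch.clamp(margin - d, min=0)
    return loss.mean()
```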

4.3.3. Overall Objective

The final training objective is a weighted combination of the task loss and the auxiliary loss, controlled by a balancing hyperparameter $\lambda$:

$$\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda \mathcal{L}_{\text{aux}}$$
By optimizing this composite objective, PseudoMetapathNet learns to simultaneously perform the classification task and refine its own internal representation of the graph’s semantic structure. This dual-objective approach ensures that the dynamic heterogenization process is stable and produces meaningful pseudo-types, which in turn allows the model to discover powerful and effective propagation pathways to combat the under-reaching problem.

5. Experiments

5.1. Settings

5.1.1. Environments

In this paper, all experiments are conducted on a server equipped with 8 NVIDIA Tesla A100 GPUs, and all reported results are averaged over five independent runs. The software stack comprises PyTorch 2.4 and PyTorch Geometric 2.6.

5.1.2. Datasets

We utilize four real-world open-source benchmark datasets, including the following:
  • NSL-KDD is a dataset that improves upon the KDD Cup 1999 dataset, containing various optical network traffic features and attack types.
  • UNSW-NB15 is a comprehensive dataset with a wide range of modern attack types and normal optical network traffic.
  • CICIDS2017 is a dataset that includes various optical network traffic data with different types of attacks.
  • KDD Cup 1999 is a classic dataset used for intrusion detection, containing a large amount of optical network traffic data.
To apply our GNN-based framework, we first converted the above tabular datasets into graph structures. In this process, following [52], each unique IP address becomes a node, and an undirected edge is created between two nodes if any network flow is recorded between their corresponding IP addresses. The initial node features for matrix X are constructed by aggregating the statistical attributes from all flow records associated with each IP address. Before final graph generation, the data undergoes a standardized preprocessing pipeline to ensure feature consistency and prevent data leakage from the test set. Specifically, categorical features are first transformed using an unsupervised target encoder that is fitted solely on the training data. Following this, all numerical features are normalized using an L2 normalizer, which is also fitted exclusively on the training set. Any resulting null or infinite values from these steps are imputed with 0. The final output is a graph, represented by an adjacency matrix A and a node feature matrix X, which serves as the direct input for our model.
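A condensed sketch of this conversion (hypothetical column names; pandas/scikit-learn, with scikit-learn’s OrdinalEncoder standing in for the unsupervised target encoder and mean-aggregation standing in for the statistical aggregation described above):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer, OrdinalEncoder

def flows_to_graph(train_df, test_df, cat_cols, num_cols):
    """Build (A, X) from flow records: one node per unique IP address.

    Encoders and normalizers are fitted on the training split only,
    to prevent leakage from the test set (as described above).
    """
    enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    enc.fit(train_df[cat_cols])
    norm = Normalizer(norm="l2").fit(train_df[num_cols])

    df = pd.concat([train_df, test_df], ignore_index=True)
    feats = np.hstack([enc.transform(df[cat_cols]), norm.transform(df[num_cols])])
    feats = np.nan_to_num(feats, nan=0.0, posinf=0.0, neginf=0.0)

    ips = pd.unique(df[["src_ip", "dst_ip"]].values.ravel())
    idx = {ip: i for i, ip in enumerate(ips)}
    N = len(ips)

    # Node features: mean of flow attributes over all flows touching each IP.
    X = np.zeros((N, feats.shape[1])); cnt = np.zeros(N)
    A = np.zeros((N, N))
    for (s, d), f in zip(df[["src_ip", "dst_ip"]].values, feats):
        i, j = idx[s], idx[d]
        A[i, j] = A[j, i] = 1.0                  # undirected edge per flow pair
        X[i] += f; X[j] += f; cnt[i] += 1; cnt[j] += 1
    X /= np.maximum(cnt, 1)[:, None]
    return A, X
```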

5.1.3. Baselines

To provide a thorough analysis, we compare our method against a diverse set of baselines, which can be broadly categorized into two groups. The first group consists of classical machine learning methods, including RandomForest [53], SVM [54], MLPClassifier [55]; tree-based ensemble models like GradientBoosting [56] and XGBoost [57]; as well as DecisionTree [58] and LogisticRegression [59]. The second group comprises representative Graph Neural Network (GNN) models to cover state-of-the-art graph learning techniques. This suite includes SuperGAT [60], GraphSAGE [61], ARMA [62], GIN [63], BWGNN [38], ACM-GCN [49], FSGNN [50], and FAGCN [64]. This selection ensures a comprehensive evaluation against both fundamental and advanced techniques.

5.2. Comparative Study

To evaluate our proposed framework (PseudoMetapathNet), we conducted extensive comparisons against diverse baseline models across four network intrusion detection datasets. The baselines include traditional machine learning methods (RandomForest and SVM), classic GNN architectures (GraphSAGE and GIN), and a state-of-the-art spectral GNN (BWGNN). Results are summarized in Table 1, Table 2, Table 3 and Table 4.
A consistent observation across all datasets is that graph-based methods significantly outperform traditional machine learning approaches. Traditional methods rely solely on node features, ignoring network topology and traffic relationships. In contrast, GNNs leverage structural information through message passing to identify complex intrusion patterns. On CICIDS2017 (Table 1), most traditional models yield F1-scores below 56%, while leading GNNs exceed 80%.
Our PseudoMetapathNet demonstrates consistently superior performance across all datasets. On CICIDS2017, it achieves 93.46% F1-score and 99.56% Precision, outperforming SuperGAT (F1: 87.73%) and BWGNN (F1: 91.85%). On NSL-KDD, it secures the highest metrics with an F1-score of 97.80%. Similarly, it achieves state-of-the-art results on KDD CUP 1999 and UNSW-NB15 (F1-scores of 90.19% and 98.55% respectively).
The performance improvement over BWGNN, which serves as our framework’s spectral backbone, directly validates our core contributions: dynamic graph heterogenization and Pseudo-Metapath propagation. Beyond the gains on CICIDS2017, our model boosts the F1-score on UNSW-NB15 from 95.38% to 98.55%, and on NSL-KDD, from 97.43% to 97.80%. This confirms that relying solely on frequency-aware node features is insufficient; learning to route supervision signals along semantically relevant paths effectively combats the “under-reaching” problem caused by sparse labels.
To deconstruct our architecture’s advantages, we compare it specifically with two advanced GNNs: SuperGAT and BWGNN. Each possesses distinct strengths but also limitations that our framework overcomes:
BWGNN leverages Beta-Wavelet filters to capture high-frequency signals characteristic of anomalies. While it excels as a feature extractor, it remains constrained by standard message-passing, limiting propagation to distant nodes under sparse supervision.
SuperGAT addresses long-range dependencies by aggregating information from different neighborhood ranges. However, its multi-hop pathways are structurally fixed and semantically agnostic, unable to follow specific semantic patterns crucial for intrusion detection.
Our PseudoMetapathNet synthesizes these strengths while overcoming their limitations. It begins with a strong spectral foundation for robust initial representations, then transcends the propagation bottleneck with our dynamic Pseudo-Metapath mechanism. This allows the model to route supervision signals along semantically meaningful paths discovered on-the-fly—a capability both baselines lack.
The empirical results strongly validate this design:
  • PseudoMetapathNet vs. BWGNN: Our consistent performance improvement over BWGNN (e.g., F1-score of 93.46% vs. 91.85% on CICIDS2017) demonstrates the benefit of our dynamic propagation mechanism.
  • PseudoMetapathNet vs. SuperGAT: The larger gap between our model and SuperGAT (93.46% vs. 87.73% F1-score on CICIDS2017) highlights the superiority of adaptive, semantic propagation over fixed multi-scale aggregation.
As illustrated by the loss curves in Figure 3, PseudoMetapathNet exhibits a smooth and consistent convergence trend across all three datasets, comparable to the established baseline models. This demonstrates that, despite the inclusion of the auxiliary loss $\mathcal{L}_{\text{aux}}$, our model maintains excellent training stability and feasibility.
In conclusion, our framework’s remarkable performance stems from synergistically combining a powerful spectral feature extractor with a novel, semantically aware propagation mechanism that directly addresses the limitations of existing GNNs.
We acknowledge that the performance gap on the original intrusion detection datasets could be further substantiated. To more rigorously test the robustness and generalizability of our framework, we carry out a broader evaluation on three widely recognized benchmark datasets from the related domain of Graph Anomaly Detection (GAD): Amazon, T-Finance, and Questions. These datasets are known for their challenging characteristics, such as severe class imbalance and heterophily, making them an ideal testbed for validating the effectiveness of our Pseudo-Metapath mechanism beyond its initial application domain. On these datasets, we follow the setting of [65] and report three metrics: the Area Under the Receiver Operating Characteristic Curve (AUROC), the Area Under the Precision–Recall Curve (AUPRC), computed as average precision, and the Recall within the top-K predictions (Rec@K), where K is set to the number of anomalies in the test set. For all metrics, anomalies are treated as the positive class, with higher scores indicating better model performance.
As shown in Table 5 and Table 6, our framework demonstrates a consistently strong, and often superior, performance against a comprehensive suite of GNN baselines. Specifically, on the Amazon and T-Finance datasets, PseudoMetapathNet achieves state-of-the-art results by securing the top performance across all three evaluation metrics (AUROC, AUPRC, and Rec@K). For instance, on T-Finance, it obtains the highest AUROC of 96.40%, AUPRC of 86.62%, and Rec@K of 81.55%, decisively outperforming all competitors. On the challenging Questions dataset, where performance is highly competitive, our model achieves the highest AUPRC of 18.09%. This result is particularly significant, as AUPRC is a more informative metric than AUROC for evaluating models on severely imbalanced datasets, a key feature of this benchmark. These comprehensive results strongly corroborate our central claim: the dynamic Pseudo-Metapath mechanism is a powerful and generalizable strategy for enhancing node anomaly detection in complex graph structures.

5.3. Ablation Study

To verify the individual contributions of the key components within our proposed framework, we conducted a series of ablation experiments on all four datasets. We investigated the impact of two core modules: (1) the Beta-Wavelet spectral filter module, responsible for learning frequency-aware node representations and (2) our novel Pseudo-Metapath propagation module, which dynamically routes supervision signals. We evaluated two variants of our model: one without the spectral module and another without the metapath module. The performance degradation relative to the full model is presented in Figure 4.
The Pseudo-Metapath module, as illustrated by the results, is unequivocally the most critical component of our framework. Removing this module resulted in a substantial and consistent performance drop across all datasets and nearly all metrics. The impact was particularly dramatic on the UNSW-NB15 dataset, where its removal led to a catastrophic decrease in Recall by 8.92% and in AUC by 11.75%. Similarly, on CICIDS, the F1-score plummeted by 7.12%. These significant degradations strongly validate our central hypothesis: standard message passing is insufficient for this task. The dynamic, semantically aware propagation routes learned by the Pseudo-Metapath module are essential for effectively combating the “under-reaching” problem and ensuring that supervision signals reach relevant nodes throughout the network.
The Beta-Wavelet spectral filter also proved to be a vital component for achieving optimal performance. Disabling this module consistently led to a noticeable drop in performance, confirming its role in generating robust initial node embeddings. For instance, on the CICIDS dataset, removing the spectral filter caused a 4.92% drop in Precision. On the NSL-KDD dataset, Accuracy and Precision decreased by 2.53% and 2.09%, respectively. This demonstrates that effectively capturing the high-frequency signals characteristic of network anomalies provides a strong and necessary foundation for the subsequent propagation and classification steps. The synergy between a powerful feature extractor and an intelligent propagation mechanism is therefore key to the model’s success.
In summary, our ablation studies confirm that both the spectral filter and the Pseudo-Metapath module are integral and synergistic components. The spectral filter provides a robust feature basis by capturing anomaly signatures, while the Pseudo-Metapath module provides an indispensable mechanism for effective, long-range information propagation, with the latter being the primary driver of our model’s superior performance.

5.4. Hyper-Parameter Study

In this section, we conduct a comprehensive study of the impact of two critical hyper-parameters on the performance of our proposed model: the number of Dynamic Metapath Learning Layers (Num Layers) and the pseudo-labeling cutoff threshold $\delta$ (Cutoff).
The number of layers determines the model’s complexity and receptive field, while the cutoff threshold controls the confidence required for assigning pseudo-labels during the training process. The model’s performance is measured using Accuracy and Recall, with the results visualized as 3D surface plots to illustrate the interplay between these two parameters. As illustrated in Figure 5 and Figure 6, we can draw several key insights from the experimental results.
First, a striking observation is the high degree of consistency between the optimal hyper-parameter regions for maximizing accuracy and recall. Across all datasets, the parameter combinations that yield the highest accuracy also tend to produce the highest recall. This indicates that our model does not require a significant trade-off between these two crucial metrics, simplifying the tuning process.
Second, the Cutoff threshold emerges as the most dominant factor influencing performance. For all four datasets, setting the threshold to a high value (e.g., greater than 0.8) invariably leads to a sharp decline in both accuracy and recall. This is intuitive, as a stricter criterion for classifying positive instances causes the model to miss more potential threats, thereby increasing the number of false negatives and degrading overall performance. The results suggest that a lower-to-mid-range cutoff (approximately 0.5 to 0.7) is optimal.
Third, the ideal model complexity, dictated by the Num Layers, varies depending on the characteristics of the dataset. For KDD CUP 99 and its subset NSL-KDD, a simpler model with two to three layers achieves the best results. Increasing the model’s depth beyond this point leads to a performance drop, likely due to over-smoothing or overfitting. Conversely, for the CICIDS2017 dataset, the model’s performance is largely insensitive to the number of layers, provided that the cutoff threshold is set appropriately. This suggests that the features in this dataset are robust enough to be effectively captured by models of varying depths. The UNSW-NB15 dataset shows a more complex relationship, but a model with 2 layers still provides a reliable and high-performing baseline.
In summary, this study underscores the importance of careful hyper-parameter tuning. The primary guideline for our model is to maintain the Cutoff threshold within a moderate range of [0.5, 0.7]. Within this range, the PseudoMetapathNet with 2 to 3 layers offers a robust and effective configuration for achieving high accuracy and recall across diverse network intrusion detection environments.

5.5. Case Study

To provide a qualitative and intuitive understanding of how PseudoMetapathNet overcomes the limitations of standard GNNs, we conduct a detailed case study on a representative subgraph extracted from the dataset. As illustrated in Figure 7, we selected a homophilous cluster consisting of four interconnected anomaly nodes. This scenario is particularly insightful as it tests a model’s ability to recognize and amplify signals within a group of coordinated malicious entities, a situation where simpler models can surprisingly fail.
The results of the case study clearly demonstrate the superiority of our proposed framework. The leftmost panel of Figure 7 shows the ground truth, a cluster where all four nodes are anomalies. The second panel reveals the surprising failure of the baseline Graph Convolutional Network (GCN) model [67]. Despite the absence of any normal nodes to cause confusion, the GCN misclassifies every single anomaly as normal, yielding a low confidence score of 0.45 for each. This suggests that the standard message-passing mechanism is insufficient for creating a signal reinforcement loop among anomalous peers and may be biased by the globally prevalent normal class.
In stark contrast, the third panel shows the decisive success of our PseudoMetapathNet. It correctly identifies all four nodes as anomalies with the highest possible confidence score of 1.00. The key to this success is revealed in the rightmost panel, which visualizes the learned attention weights: our model has learned to assign the highest importance to the A-A (Anomaly–Anomaly) metapath, with a weight significantly greater than those of the other path types.
This learned knowledge is critical. When processing this subgraph, PseudoMetapathNet’s dynamic typing and attention mechanism explicitly amplify the information flow along these high-weight A-A paths. This creates a powerful positive feedback loop where each anomaly node mutually reinforces its neighbors’ anomalous status, rapidly driving the prediction confidence to its maximum. This case provides strong qualitative evidence that the Pseudo-Metapath mechanism is crucial for identifying not only isolated threats in heterophilous environments but also coordinated patterns of malicious activity by learning and exploiting the underlying semantic graph structure.
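The fusion step visualized in panel (d) can be read as HAN-style semantic attention over path-specific views of each node. The sketch below illustrates this mechanism; the class name, scoring network, and tensor shapes are our own illustrative assumptions rather than the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class PseudoMetapathAttention(nn.Module):
    """Illustrative attention fusion over pseudo-metapath views."""

    def __init__(self, dim: int):
        super().__init__()
        # Shared scorer mapping each path-specific view to a scalar score.
        self.score = nn.Sequential(
            nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1, bias=False)
        )

    def forward(self, path_embs: torch.Tensor):
        # path_embs: [num_paths, num_nodes, dim], one view per pseudo-metapath
        s = self.score(path_embs).mean(dim=1)             # [num_paths, 1]
        w = torch.softmax(s, dim=0)                       # path importance
        fused = (w.unsqueeze(-1) * path_embs).sum(dim=0)  # [num_nodes, dim]
        return fused, w.squeeze(-1)

# Toy usage: three views (e.g., A-A, A-N, N-N) over four nodes.
attn = PseudoMetapathAttention(dim=8)
fused, weights = attn(torch.randn(3, 4, 8))
print(weights)  # after training, the A-A weight would dominate, as in Figure 7d
```

On the subgraph of Figure 7, a large learned weight on the A-A view is exactly what allows mutually connected anomalies to reinforce one another.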

6. Conclusions

In this paper, we propose PseudoMetapathNet, a novel GNN framework that tackles the “under-reaching” problem in optical network intrusion detection by learning dynamic, semantically aware propagation routes, termed Pseudo-Metapaths. Leveraging Beta-Wavelet spectral filters for high-frequency anomaly capture and a dynamic heterogenization module that assigns transient node types via pseudo-labels, our model guides supervision signals along adaptive, meaningful paths beyond the fixed topology. Experiments on four benchmark datasets show consistent improvements over classical ML and GNN baselines, and ablations confirm the critical role of Pseudo-Metapath propagation.

Limitations & Future Work

A primary limitation, common to research in this area, is the reliance on general-purpose network intrusion benchmarks, owing to the scarcity of large-scale, public datasets specific to the physical or control layers of optical networks. While our experiments on these proxies demonstrate our method’s ability to handle the core structural challenges of label sparsity and heterophily, a crucial avenue for future work is to validate PseudoMetapathNet on large-scale, real-world optical network traffic as such data become accessible, which would confirm its practical utility in the target domain. From a technical perspective, first, future research should explore scalable spectral approximations to improve efficiency. Second, the model’s performance is sensitive to the fixed pseudo-labeling cutoff threshold; a promising direction is an adaptive thresholding mechanism that adjusts dynamically based on model confidence. Finally, the core concept of learning dynamic propagation routes is highly generalizable, and we plan to extend the Pseudo-Metapaths framework to other challenging, label-scarce, and heterophilic domains such as financial fraud detection and online misinformation tracking.
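As a pointer for the adaptive-thresholding direction, one simple possibility is to derive the cutoff from the current confidence distribution rather than fixing it. The heuristic below is our own illustration of the idea and is not part of the model evaluated in this paper.

```python
import torch

def adaptive_cutoff(confidences: torch.Tensor,
                    target_coverage: float = 0.3,
                    floor: float = 0.5) -> float:
    """Quantile-based cutoff (illustrative future-work sketch).

    Chooses delta so that roughly a target fraction of unlabeled nodes
    receives pseudo-labels each epoch, never dropping below a fixed floor.
    """
    q = torch.quantile(confidences, 1.0 - target_coverage).item()
    return max(q, floor)

# Toy usage: as training sharpens confidences, delta drifts with them.
conf = 0.5 + 0.5 * torch.rand(1000)
print(adaptive_cutoff(conf))
```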

Author Contributions

Conceptualization, G.Q. and H.J.; Methodology, G.Q. and L.Z.; Software, M.G. and X.W.; Validation, H.J., L.Z. and J.X.; Formal analysis, G.Q., M.G. and J.X.; Investigation, H.L. and J.X.; Resources, G.Q.; Data curation, X.W. and H.L.; Writing—original draft, H.L.; Writing—review and editing, H.J., L.Z., H.L. and J.X.; Visualization, X.W. and H.L.; Supervision, G.Q.; Project administration, G.Q. and J.X.; Funding acquisition, G.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the science and technology project of State Grid Corporation of China East China Branch “Research on Key Technologies of Security Protection Architecture System for Optical Transmission System” (Project No. 52992424000L).

Data Availability Statement

The original data presented in the study are openly available on Kaggle at https://www.kaggle.com/datasets/hassan06/nslkdd (accessed on 1 September 2025), https://www.kaggle.com/datasets/mrwellsdavid/unsw-nb15 (accessed on 1 September 2025), https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset (accessed on 1 September 2025), and https://www.kaggle.com/datasets/galaxyh/kdd-cup-1999-data (accessed on 1 September 2025). The GAD benchmark datasets, including Amazon, T-Finance, and Questions, are openly available at https://github.com/squareRoot3/GADBench (accessed on 1 September 2025). The Optical Failure Dataset is openly available at https://github.com/Network-And-Services/optical-failure-dataset (accessed on 1 September 2025).

Conflicts of Interest

Authors Gang Qu, Haochun Jin, Liang Zhang, Minhui Ge and Xin Wu were employed by the State Grid Corporation of China East China Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Wang, D.; Wang, Y.; Jiang, X.; Zhang, Y.; Pang, Y.; Zhang, M. When large language models meet optical networks: Paving the way for automation. Electronics 2024, 13, 2529. [Google Scholar] [CrossRef]
  2. Al-Tarawneh, L.; Alqatawneh, A.; Tahat, A.; Saraereh, O. Evolution of optical networks: From legacy networks to next-generation networks. J. Opt. Commun. 2024, 44, s955–s970. [Google Scholar] [CrossRef]
  3. Agrawal, G.P. Fiber-Optic Communication Systems, 4th ed.; Wiley: Hoboken, NJ, USA, 2012. [Google Scholar]
  4. O’Mahony, M.J.; Politi, C.; Klonidis, D.; Nejabati, R.; Simeonidou, D. Future optical networks. J. Light. Technol. 2006, 24, 4684–4696. [Google Scholar] [CrossRef]
  5. Wang, Z.; Raj, A.; Huang, Y.K.; Ip, E.; Borraccini, G.; D’Amico, A.; Han, S.; Qi, Z.; Zussman, G.; Asahi, K.; et al. Toward Intelligent and Efficient Optical Networks: Performance Modeling, Co-Existence, and Field Trials. In Proceedings of the 2025 30th OptoElectronics and Communications Conference (OECC) and 2025 International Conference on Photonics in Switching and Computing (PSC), Sapporo, Japan, 29 June–3 July 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–4. [Google Scholar]
  6. Saleh, B.E.A.; Teich, M.C. Fundamentals of Photonics, 2nd ed.; Wiley: Hoboken, NJ, USA, 2007. [Google Scholar]
  7. Essiambre, R.J.; Tkach, R.W. Capacity trends and limits of optical communication networks. Proc. IEEE 2012, 100, 1035–1055. [Google Scholar] [CrossRef]
  8. Mukherjee, B. Optical WDM Networks; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  9. Mohsan, S.A.H.; Mazinani, A.; Sadiq, H.B.; Amjad, H. A survey of optical wireless technologies: Practical considerations, impairments, security issues and future research directions. Opt. Quantum Electron. 2022, 54, 187. [Google Scholar] [CrossRef]
  10. Skorin-Kapov, N.; Furdek, M.; Zsigmond, S.; Wosinska, L. Physical-layer security in evolving optical networks. IEEE Commun. Mag. 2016, 54, 110–117. [Google Scholar] [CrossRef]
  11. Kartalopoulos, S.V. Optical network security: Countermeasures in view of attacks. In Optics and Photonics for Counterterrorism and Crime Fighting II; SPIE: Bellingham, WA, USA, 2006; Volume 6402, pp. 49–55. [Google Scholar]
  12. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef]
  13. Akoglu, L.; Tong, H.; Koutra, D. Graph-based Anomaly Detection and Description: A Survey. Data Min. Knowl. Discov. 2015, 29, 626–688. [Google Scholar] [CrossRef]
  14. Fu, T.; Zhang, J.; Sun, R.; Huang, Y.; Xu, W.; Yang, S.; Zhu, Z.; Chen, H. Optical neural networks: Progress and challenges. Light. Sci. Appl. 2024, 13, 263. [Google Scholar] [CrossRef]
  15. Li, J.; Zhang, Q.; Liu, W.; Chan, A.B.; Fu, Y.G. Another perspective of over-smoothing: Alleviating semantic over-smoothing in deep GNNs. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 6897–6910. [Google Scholar] [CrossRef]
  16. Li, Q.; Han, Z.; Wu, X. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  17. Alon, U.; Yahav, E. On the Bottleneck of Graph Neural Networks and its Practical Implications. arXiv 2021, arXiv:2006.05205. [Google Scholar] [CrossRef]
  18. Topping, J.; Giovanni, F.D.; Chamberlain, B.P.; Dong, X.; Bronstein, M.M. Understanding Over-Squashing and Bottlenecks on Graphs. arXiv 2022, arXiv:2111.14522. [Google Scholar] [CrossRef]
  19. Peng, J.; Lei, R.; Wei, Z. Beyond over-smoothing: Uncovering the trainability challenges in deep graph neural networks. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 1878–1887. [Google Scholar]
  20. Zheng, Y.; Luan, S.; Chen, L. What is missing for graph homophily? disentangling graph homophily for graph neural networks. Adv. Neural Inf. Process. Syst. 2024, 37, 68406–68452. [Google Scholar]
  21. Rey, S.; Navarro, M.; Tenorio, V.M.; Segarra, S.; Marques, A.G. Redesigning graph filter-based GNNs to relax the homophily assumption. In Proceedings of the ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–5. [Google Scholar]
  22. Zhu, Q.; Han, B.; Zhu, J.; Wen, Y.; Pei, J. Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs. In Advances in Neural Information Processing Systems (NeurIPS); Curran Associates, Inc.: Red Hook, NY, USA, 2020. [Google Scholar]
  23. Pei, H.; Wei, B.; Chang, K.C.; Lei, Y.; Yang, B. Geom-GCN: Geometric Graph Convolutional Networks. arXiv 2020, arXiv:2002.05287. [Google Scholar] [CrossRef]
  24. Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains. IEEE Signal Process. Mag. 2013, 30, 83–98. [Google Scholar] [CrossRef]
  25. Huang, Q.; Yin, H.; Cui, B.; Cui, Z.; Zhang, Z.; Wang, H.; Wang, E.; Zhou, X. Combining Label Propagation and Simple Models Out-performs Graph Neural Networks. arXiv 2020, arXiv:2010.13993. [Google Scholar] [CrossRef]
  26. Zhou, D.; Bousquet, O.; Lal, T.N.; Weston, J.; Schölkopf, B. Learning with Local and Global Consistency. In Advances in Neural Information Processing Systems (NIPS); Curran Associates, Inc.: Red Hook, NY, USA, 2004; pp. 321–328. [Google Scholar]
  27. Li, M.; Jia, L.; Su, X. Global-local graph attention with cyclic pseudo-labels for bitcoin anti-money laundering detection. Sci. Rep. 2025, 15, 22668. [Google Scholar] [CrossRef] [PubMed]
  28. Yang, S.; Liao, Z.; Chen, R.; Lai, Y.; Xu, W. Multi-view fair-augmentation contrastive graph clustering with reliable pseudo-labels. Inf. Sci. 2024, 674, 120739. [Google Scholar] [CrossRef]
  29. Lee, D.H. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. In Proceedings of the ICML 2013 Workshop on Challenges in Representation Learning, Atlanta, GA, USA, 21 June 2013. [Google Scholar]
  30. Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C.; Cubuk, E.D.; Kurakin, A. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. In Advances in Neural Information Processing Systems (NeurIPS); Curran Associates, Inc.: Red Hook, NY, USA, 2020. [Google Scholar]
  31. Qiao, H.; Tong, H.; An, B.; King, I.; Aggarwal, C.; Pang, G. Deep graph anomaly detection: A survey and new perspectives. IEEE Trans. Knowl. Data Eng. 2025, 37, 5106–5126. [Google Scholar] [CrossRef]
  32. Tu, B.; Yang, X.; He, B.; Chen, Y.; Li, J.; Plaza, A. Anomaly detection in hyperspectral images using adaptive graph frequency location. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 12565–12579. [Google Scholar] [CrossRef]
  33. Zhang, C.; Song, D.; Huang, C.; Swami, A.; Chawla, N.V. Heterogeneous Graph Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19), Anchorage, AK, USA, 4–8 August 2019. [Google Scholar]
  34. Fu, T.; Chen, W.; Sun, Y. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In Proceedings of the Web Conference (WWW), Taipei, Taiwan, 20–24 April 2020; pp. 2331–2341. [Google Scholar] [CrossRef]
  35. Guo, R.; Zou, M.; Zhang, S.; Zhang, X.; Yu, Z.; Feng, Z. Graph Local Homophily Network for Anomaly Detection. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 706–716. [Google Scholar]
  36. Hammond, D.K.; Vandergheynst, P.; Gribonval, R. Wavelets on Graphs via Spectral Graph Theory. Appl. Comput. Harmon. Anal. 2011, 30, 129–150. [Google Scholar] [CrossRef]
  37. Ruhan, A.; Shen, D.; Liu, L.; Yin, J.; Lin, R. Hyperspectral anomaly detection based on a beta wavelet graph neural network. IEEE MultiMedia 2024, 31, 69–79. [Google Scholar] [CrossRef]
  38. Tang, J.; Li, J.; Gao, Z.; Li, J. Rethinking graph neural networks for anomaly detection. In Proceedings of the 39th International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 21076–21089. [Google Scholar]
  39. Chai, Z.; You, S.; Yang, Y.; Pu, S.; Xu, J.; Cai, H.; Jiang, W. Can abnormality be detected by graph neural networks? In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22), Vienna, Austria, 23–29 July 2022; pp. 1945–1951. [Google Scholar]
  40. He, M.; Wei, Z.; Xu, H. Bernnet: Learning arbitrary graph spectral filters via bernstein approximation. Adv. Neural Inf. Process. Syst. 2021, 34, 14239–14251. [Google Scholar]
  41. Shen, D.; Qin, C.; Zhang, Q.; Zhu, H.; Xiong, H. Handling over-smoothing and over-squashing in graph convolution with maximization operation. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 8743–8756. [Google Scholar] [CrossRef]
  42. Jamadandi, A.; Rubio-Madrigal, C.; Burkholz, R. Spectral graph pruning against over-squashing and over-smoothing. Adv. Neural Inf. Process. Syst. 2024, 37, 10348–10379. [Google Scholar]
  43. Huang, K.; Wang, Y.G.; Li, M. How universal polynomial bases enhance spectral graph neural networks: Heterophily, over-smoothing, and over-squashing. arXiv 2024, arXiv:2405.12474. [Google Scholar] [CrossRef]
  44. Bing, R.; Yuan, G.; Zhu, M.; Meng, F.; Ma, H.; Qiao, S. Heterogeneous graph neural networks analysis: A survey of techniques, evaluations and applications. Artif. Intell. Rev. 2023, 56, 8003–8042. [Google Scholar] [CrossRef]
  45. Sang, L.; Wang, Y.; Zhang, Y.; Wu, X. Denoising heterogeneous graph pre-training framework for recommendation. ACM Trans. Inf. Syst. 2025, 43, 1–31. [Google Scholar] [CrossRef]
  46. Ding, L.; Li, C.; Jin, D.; Ding, S. Survey of spectral clustering based on graph theory. Pattern Recognit. 2024, 151, 110366. [Google Scholar] [CrossRef]
  47. Wan, G.; Tian, Y.; Huang, W.; Chawla, N.V.; Ye, M. S3GCL: Spectral, swift, spatial graph contrastive learning. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024. [Google Scholar]
  48. Chung, F.R. Spectral Graph Theory; American Mathematical Society: Providence, RI, USA, 1997; Volume 92. [Google Scholar]
  49. Luan, S.; Hua, C.; Lu, Q.; Zhu, J.; Zhao, M.; Zhang, S.; Chang, X.W.; Precup, D. Revisiting heterophily for graph neural networks. Adv. Neural Inf. Process. Syst. 2022, 35, 1362–1375. [Google Scholar]
  50. Maurya, S.K.; Liu, X.; Murata, T. Simplifying approach to node classification in graph neural networks. J. Comput. Sci. 2022, 62, 101695. [Google Scholar] [CrossRef]
  51. Yun, S.; Jeong, M.; Kim, R.; Kang, J.; Kim, H.J. Graph Transformer Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2019. [Google Scholar]
  52. Caville, E.; Lo, W.W.; Layeghy, S.; Portmann, M. Anomal-E: A self-supervised network intrusion detection system based on graph neural networks. Knowl.-Based Syst. 2022, 258, 110030. [Google Scholar] [CrossRef]
  53. Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227. [Google Scholar] [CrossRef]
  54. Xue, H.; Yang, Q.; Chen, S. SVM: Support vector machines. In The Top Ten Algorithms in Data Mining; Chapman and Hall/CRC: Boca Raton, FL, USA, 2009; pp. 51–74. [Google Scholar]
  55. Windeatt, T. Accuracy/diversity and ensemble MLP classifier design. IEEE Trans. Neural Netw. 2006, 17, 1194–1211. [Google Scholar] [CrossRef] [PubMed]
  56. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21. [Google Scholar] [CrossRef] [PubMed]
  57. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
  58. Quinlan, J.R. Learning decision tree classifiers. ACM Comput. Surv. (CSUR) 1996, 28, 71–72. [Google Scholar] [CrossRef]
  59. LaValley, M.P. Logistic regression. Circulation 2008, 117, 2395–2399. [Google Scholar] [CrossRef]
  60. Kim, D.; Oh, A. How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision. arXiv 2022, arXiv:2204.04879. [Google Scholar] [CrossRef]
  61. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive Representation Learning on Large Graphs. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 1024–1034. [Google Scholar]
  62. Bianchi, F.M.; Grattarola, D.; Livi, L.; Alippi, C. Graph neural networks with convolutional arma filters. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3496–3507. [Google Scholar] [CrossRef]
  63. Hu, W.; Liu, B.; Gomes, J.; Zitnik, M.; Liang, P.; Pande, V.; Leskovec, J. Strategies for Pre-training Graph Neural Networks. arXiv 2019, arXiv:1905.12265. [Google Scholar]
  64. Bo, D.; Wang, X.; Shi, C.; Shen, H. Beyond low-frequency information in graph convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA, 2–9 February 2021; Volume 35, pp. 3950–3957. [Google Scholar]
  65. Do, T.U.; Ta, V.C. Tackling under-reaching issue in Beta-Wavelet filters with mixup augmentation for graph anomaly detection. Expert Syst. Appl. 2025, 275, 127033. [Google Scholar] [CrossRef]
  66. Silva, M.F.; Pacini, A.; Sgambelluri, A.; Valcarenghi, L. Learning long- and short-term temporal patterns for ML-driven fault management in optical communication networks. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2195–2206. [Google Scholar] [CrossRef]
  67. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
Figure 1. HeteGNNs have two typical working steps: (1) Collect info in a specific metapath context. (2) Fuse the collected info and produce predictions. For example, with the metapath paper(P)-author(A)-paper(P), the features on paper-typed node-P0 can be propagated to paper-typed node-{P4,5,6} directly while ignoring author-typed node-A2.
Figure 2. An illustration of our proposed framework. (a) The Beta-Wavelet module generates initial node embeddings. (b) Based on pseudo-labels from the model, the graph is dynamically heterogenized with ‘Normal’ (N) and ‘Anomaly’ (A) node types. (c) An attention mechanism learns the importance of different Pseudo-Metapaths (e.g., A-N-A) to guide the aggregation of information and produce the final predictions, effectively mitigating the under-reaching issue.
Figure 3. Training loss curves for PseudoMetapathNet and baseline GNN models on the CICIDS2017, KDD CUP 99, and NSL-KDD datasets. The loss (y-axis) is plotted on a logarithmic scale against the number of training epochs (x-axis), illustrating the convergence behavior of each model.
Figure 4. Ablation analysis on CICIDS, KDDCup99, NSL-KDD, and UNSW-NB15. The bar charts illustrate the performance degradation in Accuracy, Precision, Recall, F1-Score, and AUC when either the Spectral module (green) or the Metapath module (red) is removed from the full model. All values represent the difference (Ablation − Full Model) in percentage (%) relative to the full model’s performance. The zero baseline indicates no change; negative values denote performance decline due to module removal.
Figure 5. The impact of varying the number of GNN layers and the cutoff threshold on model Accuracy across the four datasets.
Figure 6. The impact of varying the number of GNN layers and the cutoff threshold on model Recall across the four datasets.
Figure 7. A case study on a homophilous subgraph with four anomaly nodes. From left to right: (a) The ground truth labels, where all nodes are anomalies (red). (b) The prediction confidence from the baseline GCN, which catastrophically fails by classifying all anomalies as normal (blue). (c) The prediction confidence from our PseudoMetapathNet, which correctly identifies all nodes with maximum confidence (red). (d) The learned attention weights of PseudoMetapathNet for different Pseudo-Metapaths, revealing the high importance assigned to the A-A (Anomaly–Anomaly) path.
Table 1. Performance (%) on CICIDS2017 dataset.

| Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| RandomForest | 81.83633 | 48.10705 | 45.43508 | 46.73151 | 87.64375 |
| SVM | 81.03792 | 43.57658 | 42.95648 | 43.26481 | 84.37055 |
| MLPClassifier | 79.74052 | 26.36872 | 26.35367 | 26.36119 | 70.89022 |
| GradientBoosting | 79.24152 | 57.43448 | 53.92850 | 55.62648 | 83.23684 |
| XGBoost | 77.64471 | 76.51284 | 78.14112 | 77.31821 | 84.47390 |
| DecisionTree | 65.36926 | 38.92923 | 37.89922 | 38.40682 | 62.74335 |
| LogisticRegression | 16.16767 | 6.00140 | 5.73275 | 5.86434 | 50.67821 |
| SuperGAT | 98.00399 | 98.19696 | 84.79829 | 91.02187 | 99.41239 |
| GraphSAGE | 97.20559 | 99.02195 | 74.82453 | 85.22356 | 95.42238 |
| ARMA | 95.90818 | 91.82341 | 87.45789 | 89.58162 | 94.19063 |
| GIN | 84.63074 | 71.43551 | 69.21884 | 70.30925 | 81.66264 |
| ACM-GCN | 97.51482 | 96.58112 | 81.22341 | 88.24583 | 96.88175 |
| FSGNN | 96.93381 | 94.11284 | 80.53372 | 86.79152 | 96.10234 |
| FAGCN | 98.15284 | 97.02153 | 85.98132 | 91.15248 | 98.52185 |
| BWGNN | 98.50299 | 97.34426 | 89.13771 | 93.06345 | 98.94627 |
| PseudoMetapathNet | 98.60279 | 99.56428 | 90.17831 | 94.63681 | 97.57524 |
Table 2. Performance (%) on KDD CUP 1999 dataset.

| Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| RandomForest | 82.37624 | 63.66175 | 49.90197 | 55.94321 | 82.22796 |
| SVM | 93.46535 | 81.25132 | 78.91456 | 80.06871 | 87.44851 |
| MLPClassifier | 92.27723 | 55.81782 | 52.34981 | 54.02543 | 67.45665 |
| GradientBoosting | 42.17940 | 49.18522 | 34.47804 | 40.57531 | 78.62996 |
| XGBoost | 91.78218 | 79.54917 | 77.29234 | 78.40665 | 86.63364 |
| DecisionTree | 36.83168 | 32.66907 | 34.40973 | 33.51659 | 58.51039 |
| LogisticRegression | 80.89109 | 47.27530 | 24.26325 | 32.06796 | 64.63950 |
| SuperGAT | 99.10891 | 99.11500 | 77.17949 | 86.85245 | 99.78974 |
| GraphSAGE | 98.37232 | 94.21235 | 90.55118 | 92.34862 | 99.52858 |
| ARMA | 98.06288 | 92.88765 | 89.73421 | 91.28234 | 99.05285 |
| GIN | 98.29475 | 93.15982 | 88.19632 | 90.60471 | 99.50293 |
| ACM-GCN | 98.88119 | 95.02142 | 86.15283 | 90.37415 | 99.65182 |
| FSGNN | 98.53465 | 94.88241 | 84.22813 | 89.22153 | 99.58241 |
| FAGCN | 99.05941 | 96.12142 | 87.58120 | 91.65482 | 99.71534 |
| BWGNN | 99.20792 | 96.89801 | 85.71545 | 90.96324 | 99.73870 |
| PseudoMetapathNet | 99.30693 | 99.32304 | 83.71795 | 90.86544 | 99.80544 |
Table 3. Performance (%) on NSL-KDD dataset.

| Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| RandomForest | 74.23137 | 71.18048 | 67.20325 | 69.13258 | 85.27167 |
| SVM | 73.64213 | 73.88796 | 68.24678 | 70.95478 | 82.07971 |
| MLPClassifier | 75.81942 | 73.16787 | 68.76533 | 70.89623 | 84.49529 |
| GradientBoosting | 72.75358 | 70.75396 | 66.41351 | 68.51465 | 81.73078 |
| XGBoost | 68.38729 | 63.75524 | 61.71893 | 62.72021 | 80.68568 |
| DecisionTree | 57.86461 | 53.36908 | 52.19507 | 52.77448 | 64.57063 |
| LogisticRegression | 59.62854 | 44.39247 | 44.11079 | 44.25091 | 72.10095 |
| SuperGAT | 98.04783 | 97.54691 | 97.29396 | 97.42028 | 99.89684 |
| GraphSAGE | 94.78276 | 94.60389 | 91.23958 | 92.89134 | 99.25991 |
| ARMA | 93.56149 | 93.07582 | 89.46178 | 91.23145 | 99.08637 |
| GIN | 88.29365 | 92.21728 | 79.62713 | 85.45423 | 98.39659 |
| ACM-GCN | 95.81242 | 95.32152 | 92.11241 | 93.68952 | 99.41251 |
| FSGNN | 95.12284 | 94.95124 | 90.89124 | 92.87412 | 99.35124 |
| FAGCN | 97.51241 | 96.95124 | 96.81242 | 96.88178 | 99.79152 |
| BWGNN | 98.01538 | 97.17831 | 97.69145 | 97.43419 | 99.83052 |
| PseudoMetapathNet | 98.37472 | 97.59439 | 98.01123 | 97.80241 | 99.66414 |
Table 4. Performance (%) on UNSW-NB15 dataset.

| Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| RandomForest | 98.53718 | 86.79719 | 58.77267 | 70.11581 | 85.40183 |
| SVM | 98.56295 | 86.79719 | 58.77267 | 70.11581 | 71.79104 |
| MLPClassifier | 97.81537 | 56.33722 | 52.63599 | 54.42251 | 62.20454 |
| GradientBoosting | 97.84962 | 64.72605 | 61.30692 | 62.96498 | 86.84699 |
| XGBoost | 98.27385 | 72.51968 | 67.29101 | 69.80374 | 75.31566 |
| DecisionTree | 97.88417 | 64.72605 | 61.30692 | 62.96498 | 71.30692 |
| LogisticRegression | 98.92648 | 99.44668 | 67.64706 | 80.45719 | 83.44444 |
| SuperGAT | 99.75824 | 96.77337 | 94.06678 | 95.40131 | 94.27323 |
| GraphSAGE | 99.81739 | 97.00796 | 97.00796 | 97.00796 | 99.92819 |
| ARMA | 98.64276 | 97.15986 | 96.89333 | 97.02643 | 99.90426 |
| GIN | 99.93683 | 97.22222 | 99.89828 | 98.54224 | 99.92221 |
| ACM-GCN | 99.85124 | 97.10241 | 97.52124 | 97.31138 | 99.92981 |
| FSGNN | 99.82131 | 96.95124 | 97.12421 | 97.03765 | 99.91521 |
| FAGCN | 99.88124 | 97.28124 | 98.02124 | 97.64984 | 99.93512 |
| BWGNN | 99.78156 | 96.77337 | 94.06678 | 95.40131 | 99.93118 |
| PseudoMetapathNet | 99.96494 | 97.94876 | 99.15342 | 98.54714 | 99.94016 |
Table 5. Performance comparison of Graph Neural Network models on three different graph anomaly detection datasets. For each dataset and metric, the highest value is indicated in bold and the second-highest in underline. Since the source code of BWMixup is not publicly available, its performance is taken as reported in the original paper [65].

| Dataset | Model | AUROC (%) | AUPRC (%) | Rec@K (%) |
|---|---|---|---|---|
| Amazon | GAT | 90.48 | 78.74 | 74.78 |
| | GraphSAGE | 89.87 | 66.84 | 63.59 |
| | ARMA | 91.43 | 81.43 | 82.61 |
| | GIN | 83.60 | 42.59 | 52.33 |
| | BWGNN | 91.99 | 81.43 | 77.33 |
| | ACM-GCN | 60.31 | 13.55 | 13.59 |
| | FSGNN | 94.16 | 80.14 | 86.96 |
| | FAGCN | 86.91 | 64.12 | 58.70 |
| | BWMixup | 94.36 | 82.94 | 78.75 |
| | PseudoMetapathNet | 94.95 | 82.63 | 88.42 |
| T-Finance | GAT | 95.15 | 71.36 | 72.12 |
| | GraphSAGE | 80.92 | 15.60 | 18.31 |
| | ARMA | 91.21 | 71.39 | 71.01 |
| | GIN | 85.04 | 51.23 | 56.82 |
| | BWGNN | 91.56 | 63.97 | 70.49 |
| | ACM-GCN | 87.06 | 59.33 | 67.82 |
| | FSGNN | 92.40 | 78.62 | 70.40 |
| | FAGCN | 85.33 | 53.62 | 61.07 |
| | BWMixup | 94.93 | 72.07 | 72.47 |
| | PseudoMetapathNet | 96.40 | 86.62 | 81.55 |
| Questions | GAT | 70.02 | 14.76 | 7.53 |
| | GraphSAGE | 72.34 | 16.80 | 12.84 |
| | ARMA | 72.19 | 14.43 | 10.27 |
| | GIN | 67.73 | 12.27 | 9.18 |
| | BWGNN | 70.93 | 16.39 | 10.27 |
| | ACM-GCN | 71.09 | 17.00 | 10.27 |
| | FSGNN | 73.38 | 17.42 | 8.63 |
| | FAGCN | 68.47 | 11.19 | 3.70 |
| | BWMixup | 64.42 | 8.03 | 10.41 |
| | PseudoMetapathNet | 72.00 | 18.09 | 9.45 |
Table 6. Performance comparison of various models on an optical failure dataset [66].

| Model Name | Accuracy | Precision | Recall | F1 Score | AUC |
|---|---|---|---|---|---|
| SVM | 82.5184 | 81.7265 | 76.1923 | 78.8591 | 85.1029 |
| RandomForest | 86.1022 | 85.3418 | 84.5082 | 84.9230 | 91.0455 |
| DecisionTree | 73.4591 | 71.0284 | 70.1533 | 70.5882 | 78.5210 |
| GradientBoosting | 87.9530 | 87.1593 | 86.9215 | 87.0402 | 92.8164 |
| MLPClassifier | 84.3829 | 83.5281 | 82.0439 | 82.7785 | 89.5833 |
| LogisticRegression | 79.2045 | 78.9522 | 75.8391 | 77.3650 | 82.4173 |
| XGBoost | 88.4201 | 87.9812 | 87.2391 | 87.6086 | 93.1532 |
| ARMA | 88.9511 | 88.1024 | 87.8533 | 87.9777 | 93.8519 |
| GraphSAGE | 89.5240 | 89.1532 | 88.6019 | 88.8767 | 94.3168 |
| FSGNN | 89.9834 | 89.5571 | 89.2144 | 89.3855 | 94.8812 |
| ACM-GCN | 90.1522 | 89.8299 | 89.4382 | 89.6336 | 95.1046 |
| SuperGAT | 90.6718 | 90.2355 | 90.0125 | 90.1238 | 95.7320 |
| FAGCN | 90.9345 | 90.6813 | 90.3582 | 90.5195 | 96.0155 |
| GIN | 91.2281 | 91.0533 | 90.5294 | 90.7906 | 96.3571 |
| BWGNN | 92.4815 | 91.8533 | 91.7692 | 91.8112 | 97.2588 |
| PseudoMetapathNet | 92.3160 | 91.9542 | 91.8021 | 91.8781 | 97.1053 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
