Accelerating Subgraph Matching Through Advanced Compression and Label Filtering

Chai, Yanfeng; Li, Jiashu; Zhang, Qiang

doi:10.3390/a18090541


by Yanfeng Chai ¹,*, Jiashu Li ¹ and Qiang Zhang ²

¹ College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
² School of Economics and Management, North University of China, Taiyuan 030051, China
* Author to whom correspondence should be addressed.

Algorithms 2025, 18(9), 541; https://doi.org/10.3390/a18090541

Submission received: 7 July 2025 / Revised: 13 August 2025 / Accepted: 19 August 2025 / Published: 26 August 2025

(This article belongs to the Section Randomized, Online, and Approximation Algorithms)


Abstract

Efficiently identifying subgraphs that match a given query graph within large-scale graphs has become a critical focus in both academic and industrial research. Subgraph matching, a fundamental problem in graph algorithms, facilitates the effective querying of graph data and is fundamentally based on the subgraph isomorphism problem, which is known to be NP-complete. Among the various stages of subgraph matching, the filtering phase is particularly crucial as it directly affects the overall efficiency of the algorithm. A robust filtering mechanism can rapidly identify candidate nodes that satisfy the query criteria, thereby significantly reducing computational costs in the subsequent stages. The analysis of existing subgraph matching techniques reveals several challenges in the filtering stage: (1) redundant enumeration of equivalent nodes; (2) incomplete filtering due to structural limitations; and (3) excessive redundant validations during the verification phase. To overcome these issues, we propose an adaptive subgraph matching (ASM) framework that integrates efficient compressed graph nodes (CGNs) and a novel label count filter (LCF) algorithm. These innovations enhance the filtering process, resulting in significant improvements in query processing performance. Experimental evaluations demonstrate that our approach achieves substantial gains, outperforming state-of-the-art subgraph search and matching algorithms by several orders of magnitude in query processing time.

Keywords:

subgraph matching; subgraph isomorphism; reducing verification cost; node compression; label filtering

1. Introduction

With the growing prominence of graph data structures in modern application domains such as social network analysis [1], bioinformatics [2], and semantic search [3], increasing research efforts and resources have been dedicated to enhancing search performance on large-scale graphs. Subgraph matching is one of the most important problems in graph analysis: given a data graph G and a query graph q, it seeks to identify all matches of q in G [4], and it is known to be NP-complete [5]. That is, the running time of known matching algorithms grows exponentially in the worst case. To reduce the search space and accelerate subgraph matching queries, various methods have been proposed to minimize the number of candidate nodes in the data graph, such as constructing indexes that store vertex-adjacent features [6] or designing effective filtering strategies [7]. However, the enumeration and storage of substructures still incur exponential time and space complexity during computation [8]. Consequently, enumerating all query results on a large data graph within a reasonable time remains a key challenge in subgraph matching research.

As summarized in Table 1, methods for subgraph matching are broadly categorized into two groups: exploratory backtracking methods [9,10,11,12,13,14,15] and join methods [16,17]. Each algorithm has its own advantages and disadvantages. For example, the Ullmann algorithm [10], the first subgraph isomorphism search algorithm, does not define a matching order for the query vertices; it simply follows the input order of the query graph nodes and matches them depth-first. The VF2 algorithm [11] selects the start node arbitrarily and then expands by selecting the next vertex that is connected to the already matched query vertices. RapidMatch [16] for distributed environments performs queries in main memory without indexing, but is limited in scope and complex to implement. According to recent studies, exploratory backtracking methods are suitable for large sparse graphs, while join methods are more suitable for small dense graphs [18]. Existing algorithmic studies usually focus on analyzing and comparing the methods and performance of current algorithms [19]. Therefore, instead of identifying or comparing differences between algorithms, we focus on open problems in subgraph matching and on techniques to solve them. Subgraph matching still faces two major challenges: (1) the repeated enumeration of equivalent nodes degrades search performance on large-scale data graphs; and (2) the filtering conditions in the candidate set generation stage are incomplete, so a portion of undesired candidate nodes remains, which leads to excessive overhead in the validation stage and affects the overall performance of the algorithm.

To address the above issues, this paper investigates the subgraph matching query problem for undirected graphs and proposes an efficient adaptive subgraph matching (ASM) algorithm. The proposed approach introduces a novel graph compression technique in the preprocessing stage, a new label count filter in the candidate node filtering stage, and an adaptive adjustment model for ultra-large-scale graphs. By reducing the size of the data graph and further filtering candidate nodes through a comprehensive filtering mechanism, the method not only improves data processing efficiency but also minimizes resource consumption.

The main contributions of this paper are summarized as follows:

(1): We propose a strict compressed graph node (CGN) technique to compress both the data graph and the query graph. Starting from the initial node, nodes belonging to the same equivalence class—based on their equivalence relationships—are compressed into a single node while preserving the original graph structure. This approach achieves a maximal compression of the graph data, effectively reducing the data graph to a smaller-scale representation.
(2): We introduce a label count-based filtering (LCF) algorithm. Existing filtering methods can exclude some nodes that do not meet query conditions but often incur significant redundant validations during the subsequent verification phase, resulting in high memory overhead and extended validation times. By incorporating filtering conditions based on the labeling characteristics of dataset nodes during the filtering phase, our method further reduces the search space size, thereby enhancing overall query processing performance.
(3): For large-scale graphs, we propose an adaptive tuning model that leverages caching to improve efficiency. This model accelerates subgraph matching (ASM) queries by reusing results from previous queries or by assessing the overall graph structure through a subset of frequently accessed (hot) nodes, forming an adaptive subgraph matching framework that dynamically adjusts according to dataset characteristics.
(4): Extensive experiments on multiple real-world datasets have been conducted to compare the performance of ASM with three other subgraph matching algorithms, evaluating metrics such as execution time, callback counts, and average search time. Experimental results demonstrate that ASM achieves superior performance in terms of query processing time.

2. Background

2.1. Preliminaries

To enhance the readers’ understanding, we introduce several key concepts that are essential for comprehending the content of our work. Table 2 lists the symbols that are frequently used in our work.

Theorem 1 (Graph).

In this paper, we focus on the undirected graph [20] G = (V, E, L), where V = $\{v_{1}, v_{2}, \dots, v_{m}\}$ denotes the set of vertices and E = $\{(v_{i}, v_{j}) \mid v_{i}, v_{j} \in V\}$ denotes the set of edges. Vertices denote entities and edges denote relationships between two entities. The label description L(v) (or L(e)) contains information about the current node (or edge) and comes from the label set Σ.

Theorem 2 (Subgraph Matching).

Given a data graph G = (V, E, L) and a query graph q = ($V_{q}$, $E_{q}$, L), a matching subgraph [21] φ(q) of q in G is a subgraph of G that satisfies the mapping function φ, where φ maps a vertex u in q to a vertex φ(u) in G, and an edge e = (u, v) in q to an edge φ(e) = (φ(u), φ(v)) in G.

Theorem 3 (Subgraph Isomorphism).

Given a data graph G = (V, E, L) and a query graph q = ($V_{q}$, $E_{q}$, L), q is subgraph isomorphic [22] to G if and only if there is an injective mapping M from $V_{q}$ to V such that ∀ u ∈ $V_{q}$: M(u) ∈ V and $L_{V}$[u] ⊆ $L_{V}$[M(u)], and ∀ (u, u′) ∈ $E_{q}$: (M(u), M(u′)) ∈ E and $L_{E}$[(u, u′)] ⊆ $L_{E}$[(M(u), M(u′))].
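To make the two conditions of this definition concrete, the following minimal sketch checks whether a given mapping M is a valid embedding. It assumes that M sends distinct query vertices to distinct data vertices, that vertex and edge labels are stored as Python sets, and that undirected edges are keyed by `frozenset` pairs; the function name and data layout are illustrative choices and are not taken from the paper.

```python
def is_subgraph_embedding(q_labels, q_edges, g_labels, g_edges, mapping):
    """Return True if `mapping` (query vertex -> data vertex) is a valid embedding.

    q_labels / g_labels: dict vertex -> set of labels
    q_edges / g_edges:   dict frozenset({a, b}) -> set of edge labels
    (hypothetical layout chosen for this sketch; self-loops are not handled)
    """
    # Every query vertex must be mapped, each to a distinct data vertex.
    if set(mapping) != set(q_labels):
        return False
    if len(set(mapping.values())) != len(mapping):
        return False
    # Vertex condition: L_V[u] must be contained in L_V[M(u)].
    for u, labels in q_labels.items():
        image = mapping[u]
        if image not in g_labels or not labels <= g_labels[image]:
            return False
    # Edge condition: every query edge maps to a data edge with compatible labels.
    for edge, labels in q_edges.items():
        u, w = tuple(edge)
        image = frozenset((mapping[u], mapping[w]))
        if image not in g_edges or not labels <= g_edges[image]:
            return False
    return True


# Tiny illustrative graphs (made up for this example).
g_labels = {1: {"A"}, 2: {"B"}, 3: {"B"}}
g_edges = {frozenset((1, 2)): {"r"}, frozenset((1, 3)): {"r"}}
q_labels = {"u1": {"A"}, "u2": {"B"}}
q_edges = {frozenset(("u1", "u2")): {"r"}}
print(is_subgraph_embedding(q_labels, q_edges, g_labels, g_edges, {"u1": 1, "u2": 2}))
```

Checking a single mapping is cheap; the hardness of the problem comes from the number of mappings that may need to be tried, which is what the filtering stages aim to reduce.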

Theorem 4 (Forward (Backward) Neighbors).

Given a matching order φ′ of the query vertices, the forward (backward) neighbors [23] $N_{-}^{φ^{'}}(u)$ ($N_{+}^{φ^{'}}(u)$) of u ∈ φ′ are the neighbors of u that lie before (after) u in φ′.
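As a small illustration of this definition, the sketch below splits the neighbors of each query vertex into those that lie before it and those that lie after it in a given matching order, corresponding to the sets written with the subscripts − and + above. The adjacency-dictionary layout and the function name are assumptions made for the example.

```python
def split_neighbors(adj, order):
    """For each vertex in `order`, list its neighbors positioned before and after it.

    adj:   dict vertex -> set of neighboring vertices (undirected query graph)
    order: list of all query vertices, i.e. the matching order
    """
    position = {u: i for i, u in enumerate(order)}
    before, after = {}, {}
    for u in order:
        # Iterating over `order` keeps the result lists in matching-order sequence.
        before[u] = [w for w in order if w in adj[u] and position[w] < position[u]]
        after[u] = [w for w in order if w in adj[u] and position[w] > position[u]]
    return before, after


# Triangle query with a hypothetical matching order u2, u1, u3.
adj = {"u1": {"u2", "u3"}, "u2": {"u1", "u3"}, "u3": {"u1", "u2"}}
before, after = split_neighbors(adj, ["u2", "u1", "u3"])
print(before["u1"], after["u1"])
```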

Theorem 5 (Knowledge Graph).

A knowledge graph (KG) can accurately describe the semantics of various entities and their connections in the real world [24], assigning practical meanings to the nodes and edges in the graph. This practical meaning is formalized as labels for vertices and edges. Thus, a knowledge graph is formally defined as a labeled graph that is rich in semantic information [25]. A knowledge graph is formed as G = (V, E, L). Here, V is a set of vertices, E is a set of edges, L = ($L_{V}$ ∪ $L_{E}$) is a labeling function, $L_{V}$ assigns type-labels and attribute-labels to vertices, and $L_{E}$ assigns relation-labels to edges.

2.2. Related Work

We summarize existing works related to the subgraph matching as follows:

The subgraph matching problem has a long research history and has long been proven to be NP-complete, yet it remains an active research topic because of its wide range of application domains and high practical demand [26]. Whether an algorithm follows the exploratory backtracking method [27], which iteratively maps query vertices to data vertices along a query vertex matching order to extend intermediate results, or the join method, it uses one of the two frameworks described below:

Direct-enumeration framework [28]. This type of subgraph matching algorithm does not generate auxiliary index structures for preprocessing in advance, but prunes the candidate points through some filtering strategies during the enumeration process, so it directly accesses the original data graph to match the vertices and edges of the query graph during the enumeration process.
Indexing-enumeration framework [29]. This type of subgraph matching algorithm preprocesses the data graph and the given query graph and generates an auxiliary index structure [30] that maintains the candidate vertices and the edges between them. During enumeration, the algorithm matches the vertices and edges of the query graph against this index structure instead of accessing the original data graph directly, which avoids many invalid accesses [31]. GADDI and SPath, as representatives of offline indexing, usually build indexes on the data without communicating with real-time data sources, which facilitates filtering the candidate sets of query variables during the actual execution of the query. In contrast, CFL [32] and CECI [33], as representatives of online indexing, need to dynamically build and update indexes on real-time or near-real-time data streams.

In large graphs, numerous subgraphs may exist, and some of them may occur frequently or be repeatedly utilized in subsequent algorithms. To improve efficiency and simplify later computations, these subgraphs can be preprocessed. Previous studies have shown that preprocessing [34] plays a critical role in the subgraph matching problem [35], as it can significantly reduce memory usage and shorten total runtime by several orders of magnitude. By performing preprocessing, the data can be effectively organized and optimized prior to algorithm execution, thereby minimizing resource consumption and computational time in subsequent operations.

Selection of root query node. In subgraph matching, the selection of the starting node deserves particular attention [36]. If an appropriate starting node is selected, some non-matching nodes can be excluded as early as possible, which reduces the number of extension validations. If the starting node is poorly chosen, it may cause a large amount of redundant enumeration, and the final correct result is obtained only after many extension verifications. In this paper, the ranking rule is given by Rank(u) = freq(g, L(u))/d(u), where freq(g, L) denotes the number of nodes labeled L in the graph g and d(u) denotes the degree of node u, i.e., the number of edges incident to u.
Determining the matching (visit) order. When determining the matching order, priority is usually given to nodes with a smaller number of candidate nodes, i.e., nodes with smaller degrees. This reduces the search space and the size of the intermediate result set [37]. In addition, if a wrong candidate node is found during traversal, backtracking to explore the next node incurs a lower trial-and-error cost.
Generating the query tree. A BFS traversal [38] of the query graph starting from the root query node creates the query tree. Edges of the query graph that appear in the query tree are called tree edges (TEs). An edge that is in the query graph but not in the BFS tree is called a non-tree edge (NTE). BFS is used because some existing studies show that it minimizes the diameter of the search space [39].
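The root selection and query tree steps above can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: freq is counted over the data graph, and the query vertex with the smallest Rank value is chosen as the root (ties broken by vertex name). The query graph is assumed to be connected with at least two vertices, so every degree is nonzero. All function names are hypothetical.

```python
from collections import Counter, deque


def select_root(q_adj, q_label, g_label):
    """Pick the query vertex minimizing Rank(u) = freq(g, L(u)) / d(u)."""
    freq = Counter(g_label.values())  # number of data vertices per label

    def rank(u):
        return freq[q_label[u]] / len(q_adj[u])

    # sorted() makes tie-breaking deterministic (assumption of this sketch).
    return min(sorted(q_adj), key=rank)


def build_query_tree(q_adj, root):
    """BFS from `root`; return visit order, tree edges (TEs), and non-tree edges (NTEs)."""
    parent = {root: None}
    order = [root]
    tree_edges = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(q_adj[u]):
            if w not in parent:  # first time w is reached: (u, w) is a tree edge
                parent[w] = u
                tree_edges.add(frozenset((u, w)))
                order.append(w)
                queue.append(w)
    all_edges = {frozenset((u, w)) for u in q_adj for w in q_adj[u]}
    return order, tree_edges, all_edges - tree_edges


# Made-up query graph and data labels for illustration.
q_adj = {"u1": {"u2", "u3"}, "u2": {"u1", "u3"}, "u3": {"u1", "u2", "u4"}, "u4": {"u3"}}
q_label = {"u1": "A", "u2": "B", "u3": "C", "u4": "B"}
g_label = {1: "A", 2: "B", 3: "B", 4: "C", 5: "B", 6: "A", 7: "A"}

root = select_root(q_adj, q_label, g_label)
order, tes, ntes = build_query_tree(q_adj, root)
print(root, order, sorted(sorted(e) for e in ntes))
```

In this example, label C occurs once in the data graph and u3 has degree 3, so u3 has the smallest Rank and becomes the root; the edge between u1 and u2 is the only non-tree edge.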

3. Adaptive Subgraph Matching Architecture

In this section, we first present the compressed graph nodes (CGNs) algorithm in Section 3.1, followed by a description of the efficient filtering mechanisms in Section 3.2. Subsequently, Section 3.3 introduces the adaptive subgraph matching (ASM) algorithm, which is designed to further enhance performance.

Figure 1 illustrates the overall workflow of ASM, outlining the complete processing pipeline. The workflow consists of four primary components: input, equivalent-node compression of the knowledge graph, adaptive node filtering, and the output of the matching results. First, the data graph and query graph are loaded into memory, and root nodes are selected based on predefined formulas. The graphs then undergo strict equivalent-node compression, with edge connections updated according to the compressed nodes. The resulting compressed graph is further refined through multiple filtering stages to reduce the number of valid candidate nodes. In the adaptive module, a caching mechanism stores results from previous queries, thereby accelerating subsequent searches and improving efficiency. Finally, the matching results are generated and output based on the identified matches.

3.1. Compressed Graph Nodes (CGNs) Algorithm

In large graph databases, many nodes may share identical structures. These nodes can be small twigs within the larger graph or key nodes embedded in its core structure. Reducing the number of one-to-one node matches in the subgraph matching process is critical for improving algorithmic efficiency. To address the issue of the repetitive enumeration of equivalent nodes, we propose a strict compressed graph nodes (CGNs) algorithm for preprocessing graph data. Before the matching algorithm begins, multiple equivalent nodes in the graph are merged into a single node, thereby reducing the size of the original graph and minimizing the number of one-to-one matches required during the validation phase.

The graph compression idea is used in the existing Boost_ISO [40] algorithm. The algorithm first classifies nodes with equivalence relationships and merges those sharing the same label in the original data graph into a single node. It then examines the edges in the original graph to determine whether connecting edges exist between these merged nodes. Based on this information, the nodes with equivalence relationships are further transformed into an undirected equivalence graph.

Although this algorithm avoids the unnecessary repetitive enumeration of equivalent nodes during the subgraph isomorphism verification stage, it still contains redundant operations in the process of identifying equivalent nodes. For instance, as illustrated in Figure 2, during the verification of the path V₁-V₂-V₅, it can be determined that there is no equivalence relationship between V₁ and V₅. However, because the node has not been marked as visited, the verification of the relationship between V₁ and V₅ is repeated when processing the path V₁-V₃-V₅.

The concept of equivalent nodes is also used in the literature [40] to compress the original graph. There, two nodes are considered equivalent if they have the same parent node and the same child node after clipping the edges between them. This is an approximate node-compression technique. Although it has been widely used, it has an obvious disadvantage: it is prone to producing results that cannot be restored to the original graph, which causes over-compression.

Which nodes, then, should be treated as equivalent? We consider two cases: (1) nodes with the same label that have the same parent and child nodes, without pruning any existing edges; and (2) isolated nodes (nodes with degree 0) with the same label.

Algorithm 1 gives the main flow of the CGN graph compression strategy:

Algorithm 1 Compressed graph nodes (CGNs).

Require: Data graph g, query graph q
Ensure: Compressed graph g′, q′

1: Create the compressed graph’s vertex set V;
2: // Creating a collection of nodes for a compressed graph
3: for each v ∈ {v|v ∈ V(g)} do
4:     nodeExamination(v); // Examination of each node
5: end for
6: $V_{0} = \{v \mid v \in V, d(v) = 0\}$
7: group v ∈ V₀ by labels
8: update V = V − V₀
9: if (v == V_n ∈ V) then
10:     V_n ← v;
11: else
12:     V ← v;
13: end if
14: for each v ∈ {v|v ∈ V(g)} do
15:     nodeExamination(e); // Examination of each edge
16: end for
17: find all edges connected to the equivalent points of v in g
18: if (the begin-point is in v AND has edges between v and v′) then
19:     create an edge e between v and v′;
20:     record points in e;
21:     update the vertex number and edge number of g′;
22: end if

First, the set of nodes of the compressed graph is created (line 1) and each node in the original graph g is examined (line 3). If an isolated node with degree 0 is found, the processing rules for isolated nodes are applied (lines 6–7), and the set of data graph nodes still to be checked for equivalence is updated (line 8). When the first node V₁ of the original graph arrives, the set V of nodes of the compressed graph is empty, so V₁ is added directly to V. When the second node V₂ arrives, we search the node set of the compressed graph to determine whether V₂ is equivalent to one of the recorded nodes. If V₂ is equivalent to V₁, then V₂ is added to the domain of equivalent nodes of V₁; otherwise, V₂ is added directly to V. Each node of the original graph is scanned in turn until all nodes have been visited. At this point (lines 9–11), the processing of points is finished, and the edges are processed next, based on the compressed node set just obtained. Nodes are selected from V in turn. We select the first equivalent node V_a of V₁ and query the neighbor table of V_a in the original graph. If there is an edge starting from V_a whose endpoint is not in the domain of equivalent nodes of V_a, we add an edge between V₁ and the compressed node that contains this endpoint.

Figure 3a gives an example. The values of labels A, B, C, and D are 0.3, 0.5, 0.4, and 0.5, respectively, so we decide to start traversing from label A. We start our examination at V₁, at which point the set of compressed graph nodes is V = Ø. V₁ is therefore added directly to the point set V as the compressed node $V_{1}^{'}$, so that V = $\{V_{1}^{'}\}$ and $V_{1}^{'}$ = $\{V_{1}\}$. For the second node V₂, we first check its equivalence with the nodes in V. V₂ is not equivalent to $V_{1}^{'}$, so V₂ is added to V, giving V = $\{V_{1}^{'}, V_{2}^{'}\}$ and $V_{2}^{'}$ = $\{V_{2}\}$. Similarly, V₃ is added to V, giving V = $\{V_{1}^{'}, V_{2}^{'}, V_{3}^{'}\}$. When examining V₄, we find that V₄ is equivalent to $V_{3}^{'}$ in V, so V₄ is added to the domain of equivalent nodes of $V_{3}^{'}$. At this point, V = $\{V_{1}^{'}, V_{2}^{'}, V_{3}^{'}\}$ and $V_{3}^{'}$ = $\{V_{3}, V_{4}\}$. Finally, V₅ is examined; since no node is equivalent to it, V₅ is added directly to V, giving V = $\{V_{1}^{'}, V_{2}^{'}, V_{3}^{'}, V_{4}^{'}\}$ and $V_{4}^{'}$ = $\{V_{5}\}$.

Once the processing of the points is complete, the edges are processed. Each node in V is selected in turn for examination. We take out $V_{1}^{'}$ first and examine each node in $V_{1}^{'}$. We find that V₁ ∈ $V_{1}^{'}$, that there is an edge from V₁ to V₃ in the original graph, and that V₃ ∈ $V_{3}^{'}$. Therefore, an edge from $V_{1}^{'}$ to $V_{3}^{'}$ is added to the compressed graph. Continuing the examination, we find an edge from V₁ to V₄ with V₄ ∈ $V_{3}^{'}$. However, an edge already exists between $V_{1}^{'}$ and $V_{3}^{'}$, so there is no need to insert it again. The result of the compression is shown in Figure 4.

From the above description, it is evident that executing the CGN algorithm involves traversing the original graph. For node processing, each node in the original graph must be visited and examined, resulting in a worst-case time complexity of O(n²) for node compression. For edge processing, it is sufficient to traverse the edges of the original graph sequentially, leading to an edge-adding operation with a time complexity of O(E).
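The compression idea can be illustrated with a short sketch. It reads the strict equivalence rule for an undirected graph as "same label and identical neighbor set", which also covers isolated nodes with the same label (their neighbor sets are both empty). Note that the sketch groups nodes by hashing this signature rather than by the pairwise scan described above, so it is a swapped-in technique with expected linear cost and not the procedure of Algorithm 1; the graph and all names are made up for the example.

```python
from collections import defaultdict


def compress_graph(adj, label):
    """Merge nodes with the same label and the same neighbor set.

    adj:   dict node -> set of neighbors (undirected, no self-loops assumed)
    label: dict node -> label
    Returns (members, c_adj): compressed id -> original nodes, and the
    adjacency between compressed ids.
    """
    # Node pass: nodes sharing (label, neighbor set) fall into one group.
    groups = defaultdict(list)
    for v in sorted(adj):
        groups[(label[v], frozenset(adj[v]))].append(v)

    members, owner = {}, {}
    for cid, nodes in enumerate(groups.values()):
        members[cid] = nodes
        for v in nodes:
            owner[v] = cid

    # Edge pass: one compressed edge per pair of connected groups.
    # Using sets means a repeated edge is never inserted twice.
    c_adj = {cid: set() for cid in members}
    for v, nbrs in adj.items():
        for w in nbrs:
            if owner[v] != owner[w]:
                c_adj[owner[v]].add(owner[w])
    return members, c_adj


# Hypothetical graph: nodes 3 and 4 share label C and neighbors {1, 5};
# nodes 6 and 7 are isolated and share label A.
adj = {1: {2, 3, 4}, 2: {1, 5}, 3: {1, 5}, 4: {1, 5}, 5: {2, 3, 4}, 6: set(), 7: set()}
label = {1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "A", 7: "A"}
members, c_adj = compress_graph(adj, label)
print(members)
print(c_adj)
```

Because each compressed node keeps the list of original nodes it stands for, the original graph can be restored from the output, which is the property that distinguishes strict compression from the approximate variant.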

3.2. Efficiency Filtering Mechanisms

To enhance algorithm performance, the preprocessing–enumeration subgraph matching algorithm introduces a preprocessing phase prior to enumeration, aiming to reduce the size of the candidate set for each query vertex and obtain accurate statistics to optimize the matching order. Specifically, the preprocessing phase generates a complete set of candidate vertices for each query vertex. The overall process consists of three phases: candidate vertex set generation, matching order determination, and enumeration. To minimize the cost of enumeration, an effective filtering algorithm is employed to ensure that the candidate set is as small as possible while remaining complete.

Using only loose filtering conditions leaves a large number of wrong candidate vertices and increases the computation, so stricter matching conditions give better results [27]. Since labels on both the data graph and the query graph are represented by numbers, labels not only identify nodes but can also be used in computations. Based on this dataset characteristic, we propose the label count filter (LCF): first determine the number of nodes in the query graph, then add up the label counts of all its nodes, and finally compare the result against the labels of node groups with the same number of vertices in the data graph, keeping those whose LC in the data graph equals the LC in the query graph.

The candidate nodes that pass the LCF next go through the label and degree filter (LDF), which takes into account both the node’s label information and its degree information. Specifically, it calculates the number of labels and the degree of each node in the query graph, and then performs the same calculation for each node in the data graph. The values of the query graph are then compared with those of the data graph to filter out the data graph nodes that do not match and to retain the nodes with exactly matching values for the combination of label and degree.

Finally, the neighbor label count filter (NLCF) is applied, a filtering method based on the label counts of neighboring nodes. First, the labels of the neighbors of each node are counted to obtain the frequency of each neighbor label. This frequency information is then compared with the neighbor label counts of the corresponding node in the query graph. Through this comparison, only the data graph nodes whose neighbor label counts match those in the query graph are kept, which achieves efficient node filtering and reduces the space and time needed to extract the candidate set.

Algorithm 2 gives the process of extracting candidate sets:

First, the frontier of the query is determined: the frontier set is generated as the union of the tree-edge candidate sets of the parents of all nodes. For each node, we set up four filters. The first is the label count filter (LCF), which, as the name suggests, compares the sum of the label numbers used by the nodes in the query graph with the data graph to find the candidate nodes that satisfy the condition. Next, the label filter (LF) collects the neighboring nodes v of $v_{f}$ that carry the label of u; nodes with different labels are removed from further filtering. The degree filter (DF) ensures that the degree (number of connected edges) of the data graph node v is greater than or equal to the degree of the query node u. The neighbor label count filter (NLCF) imposes two constraints: first, the number of labels among the neighbors of the current node in the data graph must be greater than or equal to the number of neighbor labels of the corresponding node in the query graph; second, the labels being counted must also be of the same kind. When generating the candidate set through the forward neighbor vertices, if a candidate node on a tree edge fails any of the above filters during matching, we remove its parent node and the corresponding child nodes on the tree edges at the same time; a non-tree-edge candidate node that fails the conditions is simply removed. When generating the candidate set through the backward neighbor vertices, we refine the candidate set in the direction opposite to the forward pass to obtain a smaller and more accurate candidate set. The nodes that pass all the filtering conditions are useful candidate nodes; they form the subgraph used for matching, which we store for the next step. The time complexity of Algorithm 2 is O($|E(q)| \times |E(g)|$).

Algorithm 2 Extract candidates algorithm.

Require: Data graph g, query graph q
Ensure: candidate set C

1: for u ∈ T_q in BFS order do
2:     $u_{p}.frontiers = \cup\, u_{p}.TE\_Candidate$
3: end for
4: for v_f ∈ u_p.frontiers do
5:     // label count filter
6:     LCF(N($v_{f}$), q);
7:     // label filter
8:     LF(N($v_{f}$), L_q(u));
9:     // degree filter
10:     DF(u, v);
11:     // neighbor label count filter
12:     NLCF(u, v);
13:     u.TE_candidate[$v_{f}$].add(v);
14:     // Forward Vertex Neighbor Generation Candidate Set
15: end for
16: for i ← 2 to $|φ^{'}|$ do
17:     u ← $φ^{'}$[i];
18: end for
19: for v ∈ $\bigcap_{u^{'} \in N_{-}^{φ^{'}}(u)}$ N(u′.C) do
20:     // Backward Neighborhood refined Candidate Collection
21:     for i ← $|φ^{'}|$ to 1 do
22:         u ← $φ^{'}$[i];
23:         for v ∈ u.C and v $\notin \bigcap_{u^{'} \in N_{+}^{φ^{'}}(u)}$ N(u′.C) do
24:             remove v from u.C;
25:         end for
26:     end for
27: end for
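The per-vertex checks used in Algorithm 2 can be sketched as follows. The sketch covers the label filter, the degree filter, and the neighbor label count filter as they are described in this section, reading the NLCF as "for every label, the data vertex has at least as many neighbors with that label as the query vertex". The label count filter is left out because its exact form depends on details not fixed here. Single labels per vertex, the adjacency layout, and all function names are assumptions of this example.

```python
from collections import Counter


def neighbor_label_counts(adj, label, v):
    """Count how often each label occurs among the neighbors of v."""
    return Counter(label[w] for w in adj[v])


def passes_filters(u, v, q_adj, q_label, g_adj, g_label):
    """Return True if data vertex v survives LF, DF, and NLCF for query vertex u."""
    # Label filter (LF): labels must agree.
    if q_label[u] != g_label[v]:
        return False
    # Degree filter (DF): v needs at least as many incident edges as u.
    if len(g_adj[v]) < len(q_adj[u]):
        return False
    # Neighbor label count filter (NLCF): per-label neighbor counts must suffice.
    need = neighbor_label_counts(q_adj, q_label, u)
    have = neighbor_label_counts(g_adj, g_label, v)
    return all(have[lab] >= cnt for lab, cnt in need.items())


def candidates(u, q_adj, q_label, g_adj, g_label):
    """All data vertices that pass the three filters for query vertex u."""
    return [v for v in sorted(g_adj)
            if passes_filters(u, v, q_adj, q_label, g_adj, g_label)]


# Made-up query: u1 (label A) with two B-labeled neighbors.
q_adj = {"u1": {"u2", "u3"}, "u2": {"u1"}, "u3": {"u1"}}
q_label = {"u1": "A", "u2": "B", "u3": "B"}
# Made-up data graph: vertex 1 has two B neighbors, vertex 4 has one B and one C.
g_adj = {1: {2, 3}, 2: {1}, 3: {1}, 4: {5, 6}, 5: {4}, 6: {4}}
g_label = {1: "A", 2: "B", 3: "B", 4: "A", 5: "B", 6: "C"}

print(candidates("u1", q_adj, q_label, g_adj, g_label))
print(candidates("u2", q_adj, q_label, g_adj, g_label))
```

Vertex 4 passes the label and degree checks for u1 but is rejected by the NLCF because it has only one B-labeled neighbor, which shows why the neighbor-based check prunes more than the first two alone.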

3.3. Adaptive Subgraph Matching (ASM) Algorithm

From prior research and empirical evidence, it is observed that the label characteristics of a dataset primarily influence the performance of the filtering phase, whereas the density characteristics significantly affect the performance of the sorting phase. Accordingly, the adaptive algorithm proposed in this paper evaluates the label density and adaptively selects the optimal strategy based on the dataset’s label density profile.

Given the substantial time and memory overhead required to load a mega-graph into memory, we introduce a novel concept—local weights. By partitioning the mega-graph into localized subregions, the candidate set of high-degree or “popular” nodes in the query graph can be leveraged to pinpoint approximate target regions. If isomorphic subgraphs exist, they must reside within these localized subregions, allowing subgraph isomorphism queries to be executed directly on them. This approach effectively reduces the scope of the query, thereby improving overall efficiency.

The inputs to the adaptive subgraph matching (ASM) algorithm (Algorithm 3) are a data graph g and a query graph q. The output is the set of all subgraphs of g that are isomorphic to q. A central node, the one most closely connected to other nodes, is found by comparing the influence of the nodes in the query graph. Local weights capture the importance of a node relative to the entire query graph, including information such as the connectivity between nodes and the degree of a node. First, each node u is assigned the same initial value AA_u, such that the AA values of all nodes sum to 1. In each new round of the AA calculation, node u distributes its own AA_u value equally among the nodes it is connected to. Nodes with no connected edges (degree 0) keep their AA_u values unchanged; the updated AA_u value of a node is the sum of the weights received from all nodes pointing to it. These steps are repeated until the AA_u values converge, which yields the stable AA_u value of each node (line 8). Here, N(u) is the set of all nodes pointing to node u, v is a node belonging to N(u), and L(v) is the out-degree of node v. The local influence of a node is measured by comparing its final AA_u value.
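The iterative AA computation described above can be sketched as follows, using the update AA(u) = Σ_{v ∈ N(u)} AA(v)/L(v). For an undirected query graph, the sketch takes N(u) to be the neighbors of u and L(v) to be the degree of v. The convergence tolerance and the iteration cap are hypothetical parameters added for the example, and the example graph is made up.

```python
def local_weights(adj, tol=1e-12, max_iters=1000):
    """Iterate AA(u) = sum over neighbors v of AA(v) / degree(v) until stable.

    adj: dict node -> set of neighbors (undirected graph)
    tol, max_iters: stopping parameters assumed for this sketch
    """
    n = len(adj)
    aa = {u: 1.0 / n for u in adj}  # equal initial values that sum to 1
    for _ in range(max_iters):
        new = {}
        for u, nbrs in adj.items():
            if not nbrs:
                new[u] = aa[u]  # isolated nodes keep their value
            else:
                # Each neighbor v passes on an equal share of its own value.
                new[u] = sum(aa[v] / len(adj[v]) for v in nbrs)
        delta = max(abs(new[u] - aa[u]) for u in adj)
        aa = new
        if delta < tol:
            break
    return aa


# Triangle a-b-c with a pendant vertex d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
aa = local_weights(adj)
print(max(aa, key=aa.get), aa)
```

On a connected undirected graph that is not bipartite, this iteration settles at values proportional to the vertex degrees, so the highest-degree vertex (here a) ends up with the largest AA value. On bipartite graphs the plain iteration can oscillate, which is one reason the sketch includes an iteration cap.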

Algorithm 3 Adaptive subgraph matching (ASM).

Require: Data graph g, query graph q
Ensure: Subgraph isomorphism of all query graphs in a data graph

1: // preprocessing process of dataset
2: function CGN(g(v,e))
3:     for each v ∈ {v|v ∈ V(g)} do
4:         nodeExamination(v); // Examination of each node
5:         nodeExamination(e); // Examination of each edge
6:     end for
7: end function
8: function
9:     ExtractCandidate(LCF(N($v_{f}$),q), LF(N($v_{f}$),L_q(u)), DF(u,v), NLCF(u,v))
10:     SCORE ← GET_GRAPH_DENSITY;
11:     $AA(u) = \sum_{v \in N(u)} \frac{AA(v)}{L(v)}$
12:     // The filtering method of ASM
13:     if score(G) >= score_threshold then
14:         C ← GQL_FILTER;
15:     else
16:         C ← DAF_FILTER;
17:     end if
18:     // The ordering method of ASM
19:     π ← GenerateMatchingOrder(q,g,C,score(g));
20:     // The enumeration method of ASM
21:     SUBGRAPHMATCH(q,g,C,score(g));
22: end function

The labeled density value score(g) of the data graph is then computed. When score(g) is greater than or equal to score_threshold (a value set manually based on experience), we treat the graph as dense; otherwise, we treat it as sparse. Depending on the characteristics of the dataset, the appropriate algorithm can then be chosen at each stage, since no single algorithm outperforms the others on all queries [18].
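The dense/sparse switch can be illustrated with a short Python sketch. The text does not give the exact definition of the labeled density score, so plain edge density 2|E| / (|V|(|V| - 1)) is used here as a stand-in, and the threshold value 1e-3 is purely hypothetical. The function names are also assumptions; only the filter names GQL_FILTER and DAF_FILTER come from Algorithm 3. The vertex and edge counts are those reported for the Human and DBLP datasets in Section 4.1.2.

```python
def graph_density(num_vertices, num_edges):
    """Edge density of a simple undirected graph (stand-in for score(g))."""
    if num_vertices < 2:
        return 0.0
    return 2.0 * num_edges / (num_vertices * (num_vertices - 1))


def choose_filter(num_vertices, num_edges, score_threshold):
    """Pick the filtering method the way lines 13-17 of Algorithm 3 do."""
    score = graph_density(num_vertices, num_edges)
    return "GQL_FILTER" if score >= score_threshold else "DAF_FILTER"


THRESHOLD = 1e-3  # hypothetical value, not taken from the paper

human_choice = choose_filter(4674, 86292, THRESHOLD)      # dense under this threshold
dblp_choice = choose_filter(317080, 1049866, THRESHOLD)   # sparse under this threshold
```

Under these assumptions the Human graph is routed to the GQL-style filter and DBLP to the DAF-style filter; a different score definition or threshold could change that split.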

Subgraph matching queries can also be accelerated with a caching approach that reuses previous query results [41]. The matching results between the query graph and the data graph are first stored in a cache. These results can be kept in an in-memory data structure, such as a hash table or a tree, or in a database. When a new subgraph matching query arrives, the system first checks whether the matching result for the current query graph already exists in the cache. If so, the result is retrieved directly from the cache without executing the matching algorithm again.
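A minimal Python sketch of such a result cache follows, using a hash table (a `dict`). The class name `MatchCache`, its interface, and the key construction are assumptions made for illustration. The key is built from the exact vertex labels and edges of the query, so a query that is isomorphic to a cached one but uses different vertex identifiers would still miss; a production cache would need a canonical form of the query graph.

```python
class MatchCache:
    """Hash-table cache that reuses results of identical query graphs."""

    def __init__(self, matcher):
        self._matcher = matcher   # function (labels, edges) -> matching result
        self._store = {}          # cache: query key -> stored result
        self.hits = 0

    @staticmethod
    def _key(labels, edges):
        # Order-independent key: labeled vertices plus undirected edges.
        return (frozenset(labels.items()),
                frozenset(frozenset(e) for e in edges))

    def query(self, labels, edges):
        key = self._key(labels, edges)
        if key in self._store:    # cache hit: skip the matching algorithm
            self.hits += 1
            return self._store[key]
        result = self._matcher(labels, edges)
        self._store[key] = result
        return result


# Usage with a stand-in matcher that records how often it is invoked.
invocations = []


def fake_matcher(labels, edges):
    invocations.append(1)
    return [(10, 11)]             # placeholder embedding list


cache = MatchCache(fake_matcher)
first = cache.query({0: "A", 1: "B"}, [(0, 1)])
second = cache.query({1: "B", 0: "A"}, [(1, 0)])   # same query, different order
```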

4. Experiments

This section presents a comprehensive analysis of the proposed method, along with a comparison of the experimental results. It begins by describing the experimental setup in Section 4.1, including hardware and software configurations as well as the specific implementation parameters. The effectiveness of the method is then validated through a series of subgraph matching experiments—a critical task in many graph-based applications—reported in Section 4.2 and Section 4.3. The evaluation provides an in-depth assessment of the method’s performance across diverse scenarios. To ensure the reliability of the results, all reported values are averaged over five independent runs, thereby reducing the influence of random fluctuations and providing a robust measure of consistency and accuracy.

4.1. Experimental Setting

This section presents a detailed description of the experimental environment, the datasets used for evaluation, and the performance metrics adopted. First, it specifies the hardware and software configurations of the experimental setup, including the computational resources employed. Next, it introduces the experimental datasets, outlining their structure, size, and relevance to the subgraph matching task. Finally, it describes the evaluation metrics used to assess the proposed method, focusing on execution time, the number of callbacks, and the average callback time measured in this study.

4.1.1. Experimental Environment

The environment configuration for the experiments is shown in Table 3. All source code is implemented in C++. Experiments are conducted on a PC running Ubuntu with an Intel i3-6100 CPU at 3.70 GHz, 7.7 GB of memory, and 1 TB of disk capacity. All evaluation results are averaged over five runs.

4.1.2. Datasets

To validate the proposed algorithm accurately, the experiments are conducted mainly on three real-world datasets that have been widely used in previous work [30]. All datasets are taken from the Stanford Large Network Dataset Collection [42].

DBLP Dataset: A dataset of scholarly literature in computer science, containing metadata on a large number of journal and conference papers in computer science, information technology, and related fields. The dataset contains 317,080 nodes and 1,049,866 edges.
YouTube Dataset: A dataset containing user behavior and video information. A relationship graph is built from the videos that a user uploads and recommends on YouTube; it is used to find content uploaded or recommended by other users with similar preferences, which is then pushed to that user. This dataset contains 1,134,890 nodes and 2,987,624 edges.
Human Dataset: A graph dataset describing human protein interactions, where each node represents a protein and each edge represents an interaction between two proteins. The dataset contains 4674 nodes and 86,292 edges.

4.1.3. Evaluation Metrics

We evaluate the subgraph matching algorithm designed in this paper (ASM) and compare it with the following existing algorithms, which use different indexing and ordering strategies:

ASM: The adaptive subgraph matching algorithm proposed in this paper, which combines the efficient compressed graph node (CGN) algorithm with the label filtering mechanism.
CFL [32]: One of the most advanced algorithms. It reduces the search space through two phases, join filtering and locally sensitive filtering, and uses a tree-structured index and a path-based ordering strategy.
GraphQL [13]: Uses a neighborhood signature filter and a left-deep join ordering strategy. The method models the query as a left-deep join tree whose leaf nodes are the sets of candidate vertices.
CECI [33]: One of the state-of-the-art algorithms. Its compact embedding cluster index organizes the candidate nodes of tree edges and non-tree edges using forward BFS traversal filtering followed by a reverse BFS refinement process.

Compared with these three state-of-the-art algorithms, the proposed method demonstrates clear improvements in execution time, the number of algorithm callbacks, and the average query time during the matching process. For each dataset, results are reported for query graph sizes ranging from 3 to 7 nodes.

The overall performance of the proposed method and of the comparison algorithms is measured mainly by the following evaluation metrics:

Execution time: the average time for processing a query graph in the query set, excluding the time for loading data from disk. It mainly comprises filtering time, indexing time, and enumeration time.
Algorithm execution efficiency: the average number of callbacks and the average search time are used as metrics. During execution, the main time overhead lies in the recursive callbacks of the algorithm, such as depth-first search and node-order matching. The fewer the callbacks or the shorter the search time, the higher the efficiency of the algorithm.
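To make the callback metric concrete, the sketch below instruments a plain backtracking matcher with a counter of recursive calls. It is a generic textbook-style enumerator written for illustration, not the ASM implementation: it applies only a label filter and a degree filter, orders query vertices by degree, and counts non-induced embeddings. All names (`count_matches`, `extend`, `stats`) are hypothetical.

```python
def count_matches(q_adj, q_label, g_adj, g_label):
    """Return (number of embeddings, number of recursive calls)."""
    # Naive matching order: query vertices with higher degree first.
    order = sorted(q_adj, key=lambda u: -len(q_adj[u]))
    # Label filter and degree filter build the candidate sets.
    cand = {
        u: [v for v in g_adj
            if g_label[v] == q_label[u] and len(g_adj[v]) >= len(q_adj[u])]
        for u in q_adj
    }
    stats = {"calls": 0, "matches": 0}
    mapping, used = {}, set()

    def extend(depth):
        stats["calls"] += 1                      # one callback per invocation
        if depth == len(order):
            stats["matches"] += 1
            return
        u = order[depth]
        for v in cand[u]:
            if v in used:
                continue
            # Every already-mapped query neighbor must map to a neighbor of v.
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(depth + 1)
                del mapping[u]
                used.discard(v)

    extend(0)
    return stats["matches"], stats["calls"]


# Hypothetical example: triangle query on a triangle with one pendant vertex.
q_adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
q_label = {"a": "A", "b": "A", "c": "A"}
g_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
g_label = {0: "A", 1: "A", 2: "A", 3: "A"}
matches, calls = count_matches(q_adj, q_label, g_adj, g_label)
```

Here the degree filter removes the pendant vertex from every candidate set before enumeration starts, which is the kind of pruning that lowers the callback count.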

4.2. Evaluations on Execution Time

Overall performance is compared in terms of execution time and algorithm efficiency across the different datasets.

As shown in Figure 5a, Figure 6a and Figure 7a, the algorithms differ significantly in execution time on the three datasets. The results are presented in order of increasing query graph size, and all algorithms generally take more time to process larger queries. The four algorithms, ASM, CFL, GQL, and CECI, perform differently on different data graphs. Notably, ASM consistently outperforms the other algorithms across the data graphs, which demonstrates its efficiency and stability: whether the data graph is large or small, ASM shows a stable execution time. In addition, the performance gap between the four algorithms on the Human dataset is relatively small. Although the Human dataset itself is small, it has a large number of distinct labels, which makes queries on it relatively complex and significantly increases their difficulty. For example, for queries with 6 nodes on the DBLP dataset, ASM reduces the execution time by 75.34%, 31.27%, and 18.35% relative to CFL, GQL, and CECI, respectively.
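For readers reproducing such comparisons, the percentages above are presumably relative reductions with respect to each baseline, although the text does not state the formula. Under that assumption, the computation is sketched below in Python. The timings are hypothetical values chosen only to illustrate the arithmetic and are not measurements from the paper.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# Hypothetical timings in seconds (not taken from the paper).
baseline_time = 4.0
asm_time = 1.0
reduction = percent_reduction(baseline_time, asm_time)   # 75.0
```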

4.3. Relative Performance Comparison Evaluation

Relative performance is determined by query time: the shorter the query time, the smaller the relative performance value and the better the algorithm performs. The experimental results in Figure 5, Figure 6 and Figure 7 show that ASM has the shortest, or close to the shortest, running time on all datasets. ASM is therefore more competitive in relative performance, because the candidate sets of the other three algorithms contain a large number of false candidate vertices and their matching orders are less efficient.

4.3.1. Evaluations on the Counts of Callbacks

Figure 5b, Figure 6b and Figure 7b show the number of callbacks of the different algorithms on the three datasets. Although CECI uses more complex and stringent constraints to filter candidate nodes, these constraints also increase the computational complexity of each step of the matching process, resulting in a significant increase in the number of combinations that satisfy all of them. ASM obtains fewer candidates than the other algorithms on most of the datasets, which suggests that ASM has a stronger filtering capability. Since Human is dense, the effect of ASM on this data graph is more significant than that of the other algorithms. For example, for queries with 6 nodes on the DBLP dataset, ASM's execution time is reduced by 71.47%, 73.82%, and 75.08% relative to CFL, GQL, and CECI, respectively.

4.3.2. Evaluations on the Average Callbacks Time

The average callback times of the different algorithms on the three datasets are shown in Figure 5c, Figure 6c and Figure 7c. For GQL, a larger number of query nodes increases the chance of data reuse, which reduces the average search time. Beyond a certain number of nodes, however, the search time rises again because more nodes and more data must be processed. ASM, by contrast, maintains a nearly flat trend under ideal conditions, which is significantly better than the other algorithms. ASM reduces the number of unnecessary calculations and callbacks, and even when the number of nodes increases, it avoids a significant increase in search time. Again taking queries with 6 nodes on the DBLP dataset as an example, ASM reduces the execution time by 90.33%, 76.5%, and 64.54% relative to CFL, GQL, and CECI, respectively.

Combining the number of callbacks with the callback time, we can also infer the space usage of the candidate nodes. Since the number of callbacks and the callback time directly affect the memory usage and computational overhead of an algorithm, the advantages of ASM in these two respects indicate that its candidate nodes also occupy relatively little space. In practice, this means that ASM can not only process large-scale data efficiently but also save storage space, which further enhances its applicability in resource-constrained environments. In contrast, the other algorithms often require more memory to store candidate nodes and intermediate results when processing complex queries, owing to their high number of callbacks and long callback time, which may lead to performance bottlenecks in large-scale data processing.

5. Conclusions

To enhance the performance of subgraph matching, we propose a strict compressed graph node (CGN) mechanism in the preprocessing stage, which processes the number of nodes and edges in both the data graph and the query graph to reduce redundant nodes. In the filtering stage, a label count filter based on dataset characteristics is employed, in combination with a label filter, a degree filter, and a neighbor label count filter. This comprehensive filtering mechanism significantly improves the efficiency of the overall subgraph matching process. Experimental results demonstrate that the preprocessing-based adaptive subgraph matching (ASM) algorithm achieves notable advantages in matching speed compared with the mainstream algorithms CFL, GQL, and CECI.

Author Contributions

Y.C. and J.L. conceived the experiments, J.L. conducted the experiments, Y.C. and J.L. analyzed the results. Y.C., Q.Z. and J.L. wrote and reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Fundamental Research Program of Shanxi Province (No. 202403021211085), the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (No. 2022L323), and the Taiyuan University of Science and Technology Scientific Research Initial Funding (No. 20232003).

Conflicts of Interest

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. This manuscript complies with ethical standards.

References

1. Guo, M.; Chi, C.H.; Zheng, H.; He, J.; Zhang, X. A subgraph isomorphism-based attack towards social networks. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Melbourne, Australia, 14–17 December 2021; pp. 520–528.
2. Sahoo, T.R.; Patra, S.; Vipsita, S. Decision tree classifier based on topological characteristics of subgraph for the mining of protein complexes from large scale PPI networks. Comput. Biol. Chem. 2023, 106, 107935.
3. Xu, Q.; Wang, X.; Li, J.; Gan, Y.; Chai, L.; Wang, J. StarMR: An efficient star-decomposition based query processor for SPARQL basic graph patterns using MapReduce. In Proceedings of the Web and Big Data: Second International Joint Conference, APWeb-WAIM 2018, Macau, China, 23–25 July 2018; Proceedings, Part I 2. Springer: Berlin/Heidelberg, Germany, 2018; pp. 415–430.
4. Kim, H.; Choi, Y.; Park, K.; Lin, X.; Hong, S.H.; Han, W.S. Fast subgraph query processing and subgraph matching via static and dynamic equivalences. VLDB J. 2023, 32, 343–368.
5. Hartmanis, J. Computers and intractability: A guide to the theory of NP-completeness (Michael R. Garey and David S. Johnson). SIAM Rev. 1982, 24, 90.
6. Sun, Y.; Li, G.; Du, J.; Ning, B.; Chen, H. A subgraph matching algorithm based on subgraph index for knowledge graph. Front. Comput. Sci. 2022, 16, 163606.
7. Ba, L.; Liang, P.; Gu, J. Subgraph Matching Algorithm Based on Preprocessing-enumeration. Comput. Technol. Dev. 2023, 33, 85–91.
8. Zeng, L.; Jiang, Y.; Lu, W.; Zou, L. Deep analysis on subgraph isomorphism. arXiv 2020, arXiv:2012.06802.
9. Choi, Y.; Park, K.; Kim, H. BICE: Exploring Compact Search Space by Using Bipartite Matching and Cell-Wide Verification. Proc. VLDB Endow. 2023, 16, 2186–2198.
10. Ullmann, J.R. An algorithm for subgraph isomorphism. J. ACM (JACM) 1976, 23, 31–42.
11. Cordella, L.P.; Foggia, P.; Sansone, C.; Vento, M. A (sub) graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1367–1372.
12. Shang, H.; Zhang, Y.; Lin, X.; Yu, J.X. Taming verification hardness: An efficient algorithm for testing subgraph isomorphism. Proc. VLDB Endow. 2008, 1, 364–375.
13. He, H.; Singh, A.K. Graphs-at-a-time: Query language and access methods for graph databases. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008; pp. 405–418.
14. Zhao, P.; Han, J. On graph query optimization in large networks. Proc. VLDB Endow. 2010, 3, 340–351.
15. Han, W.S.; Lee, J.; Lee, J.H. Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 22–27 June 2013; pp. 337–348.
16. Sun, S.; Sun, X.; Che, Y.; Luo, Q.; He, B. Rapidmatch: A holistic approach to subgraph query processing. Proc. VLDB Endow. 2020, 14, 176–188.
17. Chang, J.S.; Luo, Y.F.; Su, K.Y. GPSM: A generalized probabilistic semantic model for ambiguity resolution. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, Newark, DE, USA, 28 June–2 July 1992; pp. 177–184.
18. Sun, S.; Luo, Q. In-memory subgraph matching: An in-depth study. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; pp. 1083–1098.
19. Dann, J.; Götz, T.; Ritter, D.; Giceva, J.; Fröning, H. GraphMatch: Subgraph Query Processing on FPGAs. arXiv 2024, arXiv:2402.17559.
20. Kamada, T.; Kawai, S. An algorithm for drawing general undirected graphs. Inf. Process. Lett. 1989, 31, 7–15.
21. Yang, D.; Ge, Y.; Nguyen, T.; Molitor, D.; Moorman, J.D.; Bertozzi, A.L. Structural equivalence in subgraph matching. IEEE Trans. Netw. Sci. Eng. 2023, 10, 1846–1862.
22. Sun, X.; Sun, S.; Luo, Q.; He, B. An in-depth study of continuous subgraph matching. Proc. VLDB Endow. 2022, 15, 1403–1416.
23. Yiu, M.L.; Papadias, D.; Mamoulis, N.; Tao, Y. Reverse nearest neighbors in large graphs. IEEE Trans. Knowl. Data Eng. 2006, 18, 540–553.
24. Wang, X.; Chen, W.; Yang, Y.; Zhang, X.; Feng, Z. Research on Knowledge Graph Partitioning Algorithms: A Survey. Chin. J. Comput. 2021, 44, 235–260.
25. Liu, G.; Inae, E.; Zhao, T.; Xu, J.; Luo, T.; Jiang, M. Data-centric learning from unlabeled graphs with diffusion model. Adv. Neural Inf. Process. Syst. 2024, 36, 21039–21057.
26. Lan, Z.; Yu, L.; Yuan, L.; Wu, Z.; Niu, Q.; Ma, F. Sub-gmn: The neural subgraph matching network model. In Proceedings of the 2023 16th IEEE International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Taizhou, China, 28–30 October 2023; pp. 1–7.
27. Jian, X.; Li, Z.; Chen, L. Suff: Accelerating subgraph matching with historical data. Proc. VLDB Endow. 2023, 16, 1699–1711.
28. Borgwardt, S.; Viss, C. A polyhedral model for enumeration and optimization over the set of circuits. Discret. Appl. Math. 2022, 308, 68–83.
29. He, J.; Chen, Y.; Liu, Z.; Li, D. Optimizing subgraph retrieval and matching with an efficient indexing scheme. Knowl. Inf. Syst. 2024, 66, 6815–6843.
30. Sun, Z.; Zhou, X.; Li, G. Learned index: A comprehensive experimental evaluation. Proc. VLDB Endow. 2023, 16, 1992–2004.
31. Yu, M.M.; Chen, L.H. Productivity change of airlines: A global total factor productivity index with network structure. J. Air Transp. Manag. 2023, 109, 102403.
32. Gaihre, A.; Wu, Z.; Yao, F.; Liu, H. XBFS: eXploring runtime optimizations for breadth-first search on GPUs. In Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, Phoenix, AZ, USA, 22–26 June 2019; pp. 121–131.
33. Bhattarai, B.; Liu, H.; Huang, H.H. Ceci: Compact embedding cluster index for scalable subgraph matching. In Proceedings of the 2019 International Conference on Management of Data, Amsterdam, The Netherlands, 30 June–5 July 2019; pp. 1447–1462.
34. Chen, K.; Liu, S.; Zhu, T.; Qiao, J.; Su, Y.; Tian, Y.; Zheng, T.; Zhang, H.; Feng, Z.; Ye, J.; et al. Improving expressivity of gnns with subgraph-specific factor embedded normalization. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 237–249.
35. Liu, T.; Li, D. Endgraph: An efficient distributed graph preprocessing system. In Proceedings of the 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), Bologna, Italy, 10–13 July 2022; pp. 111–121.
36. Turner, M.; Berthold, T.; Besançon, M.; Koch, T. Cutting plane selection with analytic centers and multiregression. In Proceedings of the International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, Nice, France, 29 May–1 June 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 52–68.
37. Hu, M.; Zhou, Y. Dynamic type matching. Manuf. Serv. Oper. Manag. 2022, 24, 125–142.
38. Bi, F.; Chang, L.; Lin, X.; Qin, L.; Zhang, W. Efficient subgraph matching by postponing cartesian products. In Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; pp. 1199–1214.
39. Levinas, I.; Scherz, R.; Louzoun, Y. BFS-based distributed algorithm for parallel local-directed subgraph enumeration. J. Complex Netw. 2022, 10, cnac051.
40. Ren, X.; Wang, J. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proc. VLDB Endow. 2015, 8, 617–628.
41. Qin, Y.; Wang, X.; Hao, W.; Liu, P.; Song, Y.; Zhang, Q. OntoCA: Ontology-Aware Caching for Distributed Subgraph Matching. In Proceedings of the Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, Nanjing, China, 25–27 August 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 527–535.
42. Leskovec, J.; Krevl, A. SNAP Datasets: Stanford Large Network Dataset Collection. 2014. Available online: http://snap.stanford.edu/data (accessed on 10 April 2025).

Figure 1. Overall architecture diagram.

Figure 2. Example of duplicate lookup of equivalent nodes.

Figure 3. Original graph.

Figure 4. Strict graph compression techniques.

Figure 5. Comparison of experimental results on the DBLP dataset.

Figure 6. Comparison of experimental results on the YouTube dataset.

Figure 7. Comparison of experimental results on the Human dataset.

Table 1. Representative subgraph matching methods.

Community	Model	Category	Methodology	Algorithms/System
Dataset			Direct enumeration	Ullmann [10], VF2 [11], GraphQL [13]
	Exploration	Backtracking search	Offline-index	GADDI, SPath [14]
			Online-index enumeration	CFL, CECI, DP_ISO
	Join	Multi-way join	Pair-wise join	PostgreSQL, Neo4j, GPSM [17]
			Worst-case optimal join	LogicBlox, GraphFlow, EmptyHeaded

Table 2. Some necessary concepts and symbols used in our work.

Notations	Descriptions
q, g	Query graph and data graph
V(g), E(g), Σ	Vertex set, edge set, and label set of a graph g
d(u), L(u), N(u)	Degree, label, and neighbor set of vertex u
e(u,v)	The edge between u and v
E(q_t), E(q_nt)	Tree edge and non-tree edge
C_u	Set of candidate vertices of u in C
φ and φ′	Match order and index order
$N_{-}^{φ'}(u)$, $N_{+}^{φ'}(u)$	Neighbors of u before (after) u in φ′

Table 3. Experimental environment.

Experimental Environment	Setting
CPU	Intel(R) Core(TM) i3-6100
Main frequency	3.70 GHz
Random access memory (RAM)	7.7 GB
Disk capacity	1 TB
System type	64 bit
Operating system	Ubuntu 22.04.2 LTS
Programming environment	Microsoft VS Code 1.84.1
Programming language	C++


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
