Article

A Logical Characterization for Approximate Matching of Pattern Graphs with Regular Expressions

1
Faculty Development Center, Wenzhou Polytechnic, Wenzhou 325035, China
2
School of Intelligent Manufacturing, Wenzhou Polytechnic, Wenzhou 325035, China
3
School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China
*
Author to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1659; https://doi.org/10.3390/sym17101659 (registering DOI)
Submission received: 30 June 2025 / Revised: 5 September 2025 / Accepted: 11 September 2025 / Published: 5 October 2025
(This article belongs to the Special Issue Applications Based on Symmetry in Applied Cryptography)

Abstract

Graph simulation and its variants are widely used in graph pattern matching. Some related works add regular expressions to graph patterns, which makes it possible to discover more meaningful data while keeping the problem solvable in polynomial time. In this research, which builds on Fan’s investigations, we first propose an approximation of graph simulation using the concept of a metric and formal verification techniques. We then define approximate matching between pattern graphs with regular expressions and data graphs; this introduces a symmetric tolerance for errors and bridges exact and approximate matching. Finally, we present a logical characterization of approximate graph simulation by extending Hennessy–Milner logic.

1. Introduction

Finding all matches in a data graph for a specific pattern graph, known as graph pattern matching (GPM), is a fundamental challenge in [1]. A series of applications, such as data analysis, biological network investigation, and intelligence analysis, are based on GPM [2,3,4,5,6]. Many methods address GPM problems, among which graph simulation [7,8,9,10] and subgraph isomorphism [11] are the two most widely known and used. Graph simulation is more flexible than subgraph isomorphism and therefore captures more useful information. Moreover, it is solvable in polynomial time.
Numerous extensions to graph simulation have been created to fulfill the specific needs of practical use. These include, for instance, strong simulation [12], which captures topology, and simulation for lattice-valued systems [13], among several others [1,14,15,16]. Among them, Fan et al. [10] added regular expressions as specifications on the edges of the pattern graph, so that a connection in the data graph can be expressed using several sorts of edges. They then solved the GPM problem by revising the notion of graph simulation. In addition, they demonstrated that the enhanced expressive capacity is not accompanied by an increase in complexity, i.e., the problem remains solvable in cubic time.
Recent hybrid approaches combining formal methods with machine learning [17,18] and logic-based graph analysis [19] have shown promise in balancing interpretability and scalability. However, a principled logical foundation for approximate graph pattern matching with regular expressions, which is our focus, remains underexplored. Our work bridges this gap by introducing a metric-based approximation with a sound and complete logical characterization.
However, the expressiveness of pattern graphs with regular expressions comes at the cost of rigidity; even minor data inconsistencies cause matches to fail. This limitation, combined with the increasing noise and scale of real-world graph data, has spurred research in two directions. One dominant direction leverages data-driven approaches like Graph Neural Networks (GNNs) [20] and deep learning [21] to learn approximate similarities. While powerful for finding statistical patterns, these methods often lack interpretability and verifiable guarantees. The other direction, extending the formal foundations of graph matching, has seen less progress since Fan et al.’s work in 2012 [10]. This 13-year gap highlights a significant opportunity: developing formally sound, approximate matching techniques that retain interpretability and guarantees while accommodating real-world imperfection. Our work addresses this gap by introducing a metric-based approximation framework, providing a principled alternative to data-driven black boxes in domains where understanding the ‘why’ behind a match is critical.
This work makes several key contributions that distinguish it from prior art: (1) Unlike the exact matching of Fan et al. [10], we introduce a metric-based framework for δ-approximate graph simulation with regular expressions, the first to handle both structural and label noise in this context. (2) While Cui et al. [22] proposed approximate simulation for transition systems, we generalize the approach to attributed graphs with complex patterns containing regular expressions. (3) Most significantly, we propose a novel modal logic, RE-HML$_\delta$, and provide the first sound and complete logical characterization for approximate graph simulation with regular expressions, establishing a crucial bridge between semantic relations and syntactic logic for this problem.
Traditional graph simulation and its variants are all exact. In other words, they do not permit any error: they can only determine whether or not a node v in a data graph simulates a node u in a pattern graph. However, in the real world, due to data noise or errors arising from data collection, some nodes only approximately simulate a node in a pattern graph. Traditional graph simulation and its variants ignore these nodes, so potential results that could help researchers conduct data analysis are missed. Based on the work of [10], we offer an approximate graph simulation based on the concept of a metric [23,24] that makes use of formal verification approaches [12,13,25,26,27,28,29,30] to handle this problem.
While investigating simulation or its variants, scholars usually consider their logical characterization, which connects these relations with logic. Following the seminal study of Hennessy and Milner [31], a significant body of work has focused on characterizing different kinds of simulations through modal logics [27,30,32,33,34,35,36,37]. This includes characterizations for probabilistic systems [32,33], fuzzy transition systems [30,35], and weighted systems [27,36], demonstrating the well-established relationship between simulation relations and logic. Generally speaking, a logic characterizes a relation if the characterization satisfies soundness and completeness: (1) the validity of logical formulas can be demonstrated by the relation; and (2) the logic and the relation have the same expressive power. Consequently, a sound and complete characterization transforms the challenge of determining whether one node simulates another into a logical judgment of whether one node satisfies all formulas satisfied by the other, and it can profit from traditional logic theories and tools. Unfortunately, no prior work addresses the logical characterization of approximate graph simulation. As a result, using Hennessy–Milner logic, which is closely related to the concept of simulation, we offer a logical characterization of approximate graph simulation.
Recent studies have explored the fundamental connections between graph neural networks and logical expressiveness [19], and have developed neural frameworks incorporating logical rules [18]. These hybrid approaches show promise in balancing interpretability and scalability, but they do not provide a principled logical foundation for approximate graph pattern matching with regular expressions, which is the focus of this work.
This paper is structured as follows: Section 2 provides some preliminary information. Section 3 introduces the concept of approximate graph simulation. Section 4 proposes a logical characterization of approximate graph simulation, Section 5 presents case studies, and the final section concludes the study.

2. Preliminaries

In this part, we review the basics of regular expressions, data graphs, pattern graphs, and metrics. Before recalling these basic facts, we first introduce some notations used throughout the paper. For details, see [10,23,38].
Table 1 summarizes the key notations used in this paper.
Let $\Sigma$ be a finite set. $\Sigma^*$ represents the set of finite strings over $\Sigma$. We write $P(\Sigma)$ for the power set of $\Sigma$. The concatenation of two strings $\rho = a_1 \cdots a_n$ and $\sigma = b_1 \cdots b_m$ is the string $\rho\sigma = a_1 \cdots a_n b_1 \cdots b_m$. $|\rho| = n$ denotes the length of $\rho$. Moreover, for $0 < i \le n$, we denote by $\rho_i$ the $i$-th position of $\rho$. Let $\Pi_0, \Pi_1 \subseteq \Sigma^*$. Then $\Pi_0 \Pi_1 = \{\rho\sigma \in \Sigma^* : \rho \in \Pi_0, \sigma \in \Pi_1\}$ is the concatenation of $\Pi_0$ and $\Pi_1$.
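The syntax of a regular expression $\omega$ over $\Sigma$ is defined as follows:
$$\omega_1, \omega_2 ::= \varepsilon \mid a \mid a^{k} \mid a^{+} \mid \omega_1 \omega_2 \mid \omega_1 + \omega_2,$$
where $a \in \Sigma$. We write $\varpi(\Sigma)$ to represent the set of all regular expressions over $\Sigma$. The language $L(\omega)$ of a regular expression $\omega \in \varpi(\Sigma)$ is defined inductively by the following:
(1)
$L(\varepsilon) = \{\varepsilon\}$;
(2)
$L(a) = \{a\}$;
(3)
$L(a^{k}) = \{\varepsilon, a, aa, \ldots, \underbrace{a \cdots a}_{k}\}$;
(4)
$L(a^{+}) = \{a, aa, aaa, \ldots\}$;
(5)
$L(\omega_1 \omega_2) = L(\omega_1) L(\omega_2)$;
(6)
$L(\omega_1 + \omega_2) = L(\omega_1) \cup L(\omega_2)$.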
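The inductive definition of $L(\omega)$ can be sketched in Python. This is only an illustrative sketch: the tuple encoding of regular expressions, the function name `language`, and the `max_len` cut-off (needed because $L(a^{+})$ is infinite) are assumptions introduced here and are not part of the original formalism.

```python
def language(omega, max_len):
    """Strings of L(omega) with length <= max_len, each a tuple of symbols."""
    kind = omega[0]
    if kind == "eps":                       # L(eps) = {eps}
        return {()}
    if kind == "sym":                       # L(a) = {a}
        return {(omega[1],)} if max_len >= 1 else set()
    if kind == "upto":                      # L(a^k) = {eps, a, ..., a...a (k times)}
        _, a, k = omega
        return {(a,) * i for i in range(min(k, max_len) + 1)}
    if kind == "plus":                      # L(a^+) = {a, aa, ...}, truncated at max_len
        return {(omega[1],) * i for i in range(1, max_len + 1)}
    if kind == "cat":                       # L(w1 w2) = L(w1) L(w2)
        left = language(omega[1], max_len)
        right = language(omega[2], max_len)
        return {r + s for r in left for s in right if len(r) + len(s) <= max_len}
    if kind == "alt":                       # L(w1 + w2) = union of both languages
        return language(omega[1], max_len) | language(omega[2], max_len)
    raise ValueError(f"unknown regular expression node: {kind!r}")


# Hypothetical encoding of the expression a^2 b.
a2b = ("cat", ("upto", "a", 2), ("sym", "b"))
print(sorted(language(a2b, 3)))
```

Strings are represented as tuples of symbols so that multi-character labels can be used as single alphabet elements.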
We next review the concepts of data graphs and pattern graphs, as shown in Figure 1.
A data graph is a triple $G = (V, E, f_A)$, where (1) $V$ is a finite set of nodes; (2) $E \subseteq V \times \Sigma \times V$ is a set of edges, where $\Sigma$ is an alphabet; and (3) $f_A$ is a function defined on $V$, and $f_A(v)$ is a tuple $(A_1 = a_1, \ldots, A_n = a_n)$ with $v \in V$, where $A_i$ is referred to as an attribute of $v$ and $a_i$ is a constant ($1 \le i \le n$), written as $v.A_i = a_i$. Intuitively, an edge $(v, a, v')$ represents that there is an $a$-labeled edge from $v$ to $v'$. We denote an edge $(v, a, v') \in E$ by $v \xrightarrow{a} v'$. In what follows, we write $G$ for $G = (V, E, f_A)$.
In a similar way, a pattern graph is defined as a directed graph $P = (V_P, E_P, f_V)$, where (1) $V_P$ is the set of its nodes; (2) $E_P \subseteq V_P \times \varpi(\Sigma) \times V_P$ is a set of edges, where $\varpi(\Sigma)$ is the set of all regular expressions over $\Sigma$; (3) $f_V$ is a function defined on $V_P$, and $f_V(u)$ is the predicate of $u$ with $u \in V_P$, which is a combination of atomic expressions of the form $A\ op\ a$, where $A$ is an attribute, $a$ is a constant, and $op$ is one of the comparison operators $<, \le, =, \ne, >, \ge$. We write $u \xrightarrow{\omega} u'$ for $(u, \omega, u') \in E_P$. In what follows, we write $P$ for $P = (V_P, E_P, f_V)$.
A path from a node $v$ to a node $v_n$ in $G$ is a finite sequence $\pi = v \xrightarrow{a_1} v_1 \xrightarrow{a_2} v_2 \cdots \xrightarrow{a_n} v_n$; for simplicity, it is written as $\pi = v \xrightarrow{\rho} v_n$, where $\rho = a_1 a_2 \cdots a_n \in \Sigma^*$.
Let $v \in V$ and $u \in V_P$; $v$ satisfies the search condition of $u$, denoted as $v \sim u$, if for each atomic formula ‘$A\ op\ a$’ in $f_V(u)$, there exists an attribute $A$ in $f_A(v)$ such that $v.A\ op\ a$.
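The search condition can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: node attributes $f_A(v)$ are stored in a dictionary, the predicate $f_V(u)$ is read as a conjunction of atoms `(A, op, a)`, and the names `OPS` and `satisfies` as well as the sample attributes are hypothetical.

```python
import operator

# Comparison operators allowed in atomic expressions "A op a".
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def satisfies(attrs, predicate):
    """True iff every atom (A, op, a) holds for the node attributes `attrs`.

    An atom fails when the attribute A is missing from the node.
    """
    return all(A in attrs and OPS[op](attrs[A], a) for A, op, a in predicate)


# Hypothetical data node and pattern node predicate.
f_A_v = {"job": "researcher", "age": 34}
f_V_u = [("job", "=", "researcher"), ("age", ">=", 18)]
print(satisfies(f_A_v, f_V_u))
```

Treating the predicate as a conjunction is a simplifying choice; a predicate with other Boolean structure would need a richer encoding.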
We end this section with the notion of metrics. A function $d: \Sigma \times \Sigma \to \mathbb{R}$ is a metric over $\Sigma$ if for all $x, y, z \in \Sigma$
(1)
$d(x, y) \ge 0$, and $d(x, y) = 0$ iff $x = y$;
(2)
$d(x, y) = d(y, x)$;
(3)
$d(x, z) \le d(x, y) + d(y, z)$.
The pair $(\Sigma, d)$ is called a metric space. For brevity, we write $\Sigma$ instead of $(\Sigma, d)$. The metric-based distance function adheres to symmetry, ensuring unbiased approximate matching.
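For a finite alphabet, the three axioms can be verified by brute force. The following sketch assumes a distance given as a Python function; the helper name `is_metric`, the tolerance `tol`, and the lookup table are illustrative, with the table values taken from the distances used in Example 2.

```python
from itertools import product


def is_metric(symbols, d, tol=1e-12):
    """Check the three metric axioms for d on a finite set of symbols."""
    for x, y in product(symbols, repeat=2):
        if d(x, y) < 0:                         # non-negativity
            return False
        if (d(x, y) == 0) != (x == y):          # identity of indiscernibles
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # symmetry
            return False
    for x, y, z in product(symbols, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:   # triangle inequality
            return False
    return True


# Distances on {a, b, c, e} as listed in Example 2.
TABLE = {frozenset("ab"): 0.3, frozenset("bc"): 0.3, frozenset("ae"): 0.2,
         frozenset("be"): 0.2, frozenset("ac"): 0.4, frozenset("ce"): 0.5}


def d(x, y):
    return 0.0 if x == y else TABLE[frozenset((x, y))]


print(is_metric("abce", d))
```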

3. Approximate Graph Pattern Matching

In this part, we first recall the formal definitions of graph simulation and of an exact match between a data graph and a pattern graph; we then introduce a string distance and use it to define approximate simulation and approximate matching.
The following definitions are stated for a pattern graph $P$ and a data graph $G$; here and in the following sections, we assume $u \in V_P$ and $v \in V$.
Definition 1 
([10]). A relation $S \subseteq V_P \times V$ is called a graph simulation if for any $(u, v) \in S$: (1) $v \sim u$; and (2) for each $u \xrightarrow{\omega} u'$, there exists a nonempty path $v \xrightarrow{\rho} v'$ in $G$ such that $(u', v') \in S$ and $\rho \in L(\omega)$. We say that $v$ simulates $u$ if there exists a graph simulation $S$ such that $(u, v) \in S$.
The following defines the concept of the exact match.
Definition 2 
([10]). A data graph $G$ matches a pattern graph $P$ via graph simulation, denoted by $P \prec G$, if there exists a simulation $S \subseteq V_P \times V$ such that, for all $u \in V_P$, there exists $v \in V$ with $(u, v) \in S$. Then, $S$ is the graph simulation matching of $G$ for $P$.
Example 1. 
We know the following:
$$v_0 \sim u_0,\quad v_1 \sim u_1,\quad v_2 \sim u_2,\quad \text{and}\quad v_4 \sim u_4.$$
We can find a relation $R = \{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_4, v_4)\}$ that satisfies the conditions of a graph simulation. Therefore, it can be concluded that graph $G$ matches graph $P$.
To define the approximate matching relationship between data graphs and pattern graphs, we need an approximate simulation relationship. To this end, we introduce a string distance and, based on it, define the distance between a string and the language of a regular expression. Unlike the traditional definition of string distance, this paper uses a discounted form. With discounting, the distance between events happening $i$ steps into the future is multiplied by $\alpha^{i}$, where $\alpha \in (0, 1]$ is the discount factor. This form of discounting is commonly used in game theory and optimal control. With the discount, differences in future behaviors are weighted less than differences in current or recent behaviors. The string distance is defined as follows.
Let $(\Sigma, d)$ be a metric space and $\alpha \in (0, 1]$. The function $sd: \Sigma^* \times \Sigma^* \to \mathbb{R}^{+}$ is the string distance, defined as follows:
$$sd(\rho, \sigma) = \begin{cases} \max\limits_{1 \le i \le |\rho|} \alpha^{i-1} d(\rho_i, \sigma_i) & \text{if } |\rho| = |\sigma|, \\ +\infty & \text{otherwise,} \end{cases}$$
for all strings $\rho, \sigma \in \Sigma^*$.
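A direct transcription of the discounted string distance may help make the definition concrete. This is a minimal sketch: strings are tuples of symbols, the function name `string_distance` is an assumption, and the metric table reuses the distances listed in Example 2.

```python
import math

# Distances on {a, b, c, e} as listed in Example 2.
TABLE = {frozenset("ab"): 0.3, frozenset("bc"): 0.3, frozenset("ae"): 0.2,
         frozenset("be"): 0.2, frozenset("ac"): 0.4, frozenset("ce"): 0.5}


def d(x, y):
    return 0.0 if x == y else TABLE[frozenset((x, y))]


def string_distance(rho, sigma, d, alpha):
    """Discounted distance: max over positions of alpha^(i-1) * d(rho_i, sigma_i).

    Strings of different lengths are infinitely far apart.
    """
    if len(rho) != len(sigma):
        return math.inf
    # enumerate() starts at 0, which matches the exponent i - 1 for position i.
    return max((alpha ** i * d(x, y)
                for i, (x, y) in enumerate(zip(rho, sigma))), default=0.0)


print(string_distance(("e",), ("b",), d, alpha=0.5))
print(string_distance(("a", "a"), ("a", "c"), d, alpha=0.5))
```

Because of the factor $\alpha^{i-1}$, a mismatch in the second position counts for less than the same mismatch in the first position whenever $\alpha < 1$.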
Proposition 1 
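([22]). $sd$ is a metric on the set $\Sigma^*$.
The distance between a string $\rho \in \Sigma^*$ and the language $L(\omega) \subseteq \Sigma^*$ of a regular expression $\omega \in \varpi(\Sigma)$ is defined as follows:
$$d^*(\rho, L(\omega)) = \inf_{\sigma \in L(\omega)} sd(\rho, \sigma).$$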
Proposition 2 
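([22]). The following properties hold:
(1)
$d^*(\rho, L(\omega)) \ge 0$, for all $\rho \in \Sigma^*$ and $\omega \in \varpi(\Sigma)$.
(2)
$\rho \in L(\omega)$ iff $d^*(\rho, L(\omega)) = 0$, where $\rho \in \Sigma^*$, $\omega \in \varpi(\Sigma)$.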
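Since the string distance is infinite for strings of different lengths, only the strings of $L(\omega)$ whose length equals $|\rho|$ can attain the infimum, so the distance to a language reduces to a minimum over finitely many candidates. The sketch below relies on that observation; it assumes the relevant part of the language is already available as a finite set of tuples, and the names `string_distance` and `distance_to_language` are illustrative.

```python
import math

# Distances on {a, b, c, e} as listed in Example 2.
TABLE = {frozenset("ab"): 0.3, frozenset("bc"): 0.3, frozenset("ae"): 0.2,
         frozenset("be"): 0.2, frozenset("ac"): 0.4, frozenset("ce"): 0.5}


def d(x, y):
    return 0.0 if x == y else TABLE[frozenset((x, y))]


def string_distance(rho, sigma, d, alpha):
    if len(rho) != len(sigma):
        return math.inf
    return max((alpha ** i * d(x, y)
                for i, (x, y) in enumerate(zip(rho, sigma))), default=0.0)


def distance_to_language(rho, language, d, alpha):
    """Minimum string distance from rho to a finite set of candidate strings.

    Only candidates of the same length as rho can be at finite distance;
    if there are none, the distance is +infinity.
    """
    same_length = (s for s in language if len(s) == len(rho))
    return min((string_distance(rho, s, d, alpha) for s in same_length),
               default=math.inf)


print(distance_to_language(("e",), {("b",)}, d, alpha=0.9))   # path label e vs. L(b)
print(distance_to_language(("a",), {("a",)}, d, alpha=0.9))   # member of the language
```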
Based on Propositions 1 and 2, we now introduce the concept of approximate graph simulation, called $\delta$-simulation.
Definition 3. 
A relation $S_\delta \subseteq V_P \times V$ is called a $\delta$-simulation if for any $(u, v) \in S_\delta$ the following hold:
(1)
$v \sim u$;
(2)
for each $u \xrightarrow{\omega} u'$, there exists a nonempty path $v \xrightarrow{\rho} v'$ in $G$ such that $(u', v') \in S_\delta$ and $d^*(\rho, L(\omega)) \le \delta$.
For a given $\delta \ge 0$, $v$ $\delta$-simulates $u$ if there exists a $\delta$-simulation $S_\delta$ such that $(u, v) \in S_\delta$.
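One way to compute a $\delta$-simulation is to start from all pairs satisfying condition (1) and repeatedly delete pairs that violate condition (2). The sketch below follows this refinement idea under several assumptions that are not in the text: path lengths are bounded by a hypothetical parameter `max_len`, each pattern edge carries its language as a finite set of tuples, `compatible(u, v)` stands in for $v \sim u$, and all function names are illustrative. It is not presented as the authors' algorithm.

```python
import math


def sd(rho, sigma, d, alpha):
    if len(rho) != len(sigma):
        return math.inf
    return max((alpha ** i * d(x, y)
                for i, (x, y) in enumerate(zip(rho, sigma))), default=0.0)


def dist_to_language(rho, lang, d, alpha):
    return min((sd(rho, s, d, alpha) for s in lang), default=math.inf)


def paths(edges, v, max_len):
    """Yield (label string, end node) for nonempty paths from v of length <= max_len."""
    frontier = [((), v)]
    for _ in range(max_len):
        frontier = [(rho + (a,), dst) for rho, x in frontier
                    for src, a, dst in edges if src == x]
        yield from frontier


def greatest_delta_simulation(VP, V, EP, E, compatible, d, alpha, delta, max_len):
    # Condition (1): keep only pairs whose node predicates are satisfied.
    S = {(u, v) for u in VP for v in V if compatible(u, v)}
    changed = True
    while changed:
        changed = False
        for u, v in list(S):
            for src, lang, u2 in EP:
                if src != u:
                    continue
                # Condition (2): some path from v stays within delta of the language.
                ok = any((u2, v2) in S and dist_to_language(rho, lang, d, alpha) <= delta
                         for rho, v2 in paths(E, v, max_len))
                if not ok:
                    S.discard((u, v))
                    changed = True
                    break
    return S


# One-node pattern with an a-labeled self-loop, one-node data graph with a b-labeled one.
unit = lambda x, y: 0.0 if x == y else 1.0
EP = [("u", {("a",)}, "u")]
E = [("v", "b", "v")]
always = lambda u, v: True
print(greatest_delta_simulation(["u"], ["v"], EP, E, always, unit, 1.0, 1.0, 1))
print(greatest_delta_simulation(["u"], ["v"], EP, E, always, unit, 1.0, 0.5, 1))
```

The path enumeration is exponential in `max_len` in the worst case, so this sketch is only suitable for small illustrative graphs.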
Following this, we can define approximate graph pattern matching.
Definition 4. 
A data graph $G$ $\delta$-approximately matches a pattern graph $P$ via $\delta$-simulation, denoted by $P \prec_\delta G$, if there exists a $\delta$-simulation $S_\delta \subseteq V_P \times V$ such that, for all $u \in V_P$, there exists $v \in V$ with $(u, v) \in S_\delta$. Then, $S_\delta$ is a $\delta$-approximate match in $G$ for $P$ via $\delta$-simulation.
Example 2. 
The diagrams of $P$ and $G_1$ are depicted in Figure 2. We know the following:
$$v_0 \sim u_0,\quad v_1 \sim u_1,\quad v_2 \sim u_2,\quad \text{and}\quad v_4 \sim u_4.$$
Let $\Sigma = \{a, b, c, e\}$, $d(a, b) = d(b, c) = 0.3$, $d(a, e) = d(b, e) = 0.2$, $d(a, c) = 0.4$, and $d(c, e) = 0.5$. We can find a relation $R = \{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_4, v_4)\}$ which is a 0.5-simulation by Definition 3. Therefore, we can conclude that the data graph $G_1$ 0.5-approximately matches the pattern graph $P$.
Consider the pair $(u_0, v_0)$. Condition (1) holds as $v_0 \sim u_0$. For condition (2), consider the pattern edge $u_0 \xrightarrow{a} u_1$. In $G_1$, $v_0$ has a path $v_0 \xrightarrow{a} v_1$. We calculate $d^*(a, L(a)) = 0 \le 0.5$. Furthermore, $(u_1, v_1)$ is in $R$. Now consider the more critical edge $u_0 \xrightarrow{b} u_2$. Here $v_0$ has a path $v_0 \xrightarrow{e} v_2$. We have $d(e, b) = 0.2$, and since $|\rho| = |\sigma| = 1$, $sd(e, b) = \alpha^{0} \cdot d(e, b) = 1 \cdot 0.2 = 0.2$. Thus, $d^*(e, L(b)) = \inf\{sd(e, b)\} = 0.2 \le 0.5$. The target node $v_2$ is related to $u_2$. Therefore, the edge is satisfied. A similar check for the other edges confirms that $R$ is a 0.5-simulation.

4. A Logical Characterization

In this part, we provide the logical characterization of the approximate graph simulation introduced in Section 3, which is motivated by the well-established characterization of simulation based on Hennessy–Milner logic (HML).
Let $(\Sigma, d)$ be a metric space and $\delta \ge 0$; the modal logic RE-HML$_\delta$ is the set of formulas whose syntax is defined as follows:
$$\varphi_1, \varphi_2 ::= tt \mid ff \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid \langle \omega, \delta \rangle \varphi_1 \mid [\omega, \delta] \varphi_1,$$
where $\omega \in \varpi(\Sigma)$.
The interpretation of RE-HML$_\delta$ formulas differs fundamentally between pattern and data graphs, reflecting their different structures. Intuitively, in a pattern graph $P$, an edge $u \xrightarrow{\omega} u'$ is a specification: it demands the existence of a path whose label is in $L(\omega)$. In a data graph $G$, we look for a concrete path $v \xrightarrow{\rho} v'$ and measure how far its label $\rho$ is from the specification $L(\omega)$. For example, the formula $\langle ab, 0.5 \rangle tt$ is satisfied by a pattern node $u$ if it has an outgoing $ab$-labeled edge. A data node $v$ satisfies the same formula if it has an outgoing path whose label (e.g., ‘ac’) is within a distance of 0.5 from the string ‘ab’, according to the metric $d$.
RE-HML$_\delta$ formulas are interpreted over the nodes of pattern graphs or data graphs. For an RE-HML$_\delta$ formula $\langle \omega, \delta \rangle \varphi_1$ or $[\omega, \delta] \varphi_1$, it is inconvenient to give the same semantic interpretation in pattern graphs and data graphs, since the labels on the edges are regular expressions and elements of the alphabet $\Sigma$, respectively. Therefore, we give different semantic interpretations in pattern graphs and data graphs for a given RE-HML$_\delta$ formula.
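Satisfaction of an RE-HML$_\delta$ formula $\varphi$ by a node $u \in P$, denoted $u \models \varphi$, is defined inductively by the following:
$$\begin{aligned}
& u \models tt, \qquad u \not\models ff, \\
& u \models \varphi_1 \wedge \varphi_2 \ \text{ iff } \ u \models \varphi_1 \text{ and } u \models \varphi_2, \\
& u \models \varphi_1 \vee \varphi_2 \ \text{ iff } \ u \models \varphi_1 \text{ or } u \models \varphi_2, \\
& u \models \langle \omega, \delta \rangle \varphi_1 \ \text{ iff there exists } u' \in V_P \text{ such that } u \xrightarrow{\omega} u' \text{ and } u' \models \varphi_1, \text{ and} \\
& u \models [\omega, \delta] \varphi_1 \ \text{ iff } \ u' \models \varphi_1 \text{ for each } u' \text{ such that } u \xrightarrow{\omega} u'.
\end{aligned}$$
Satisfaction of an RE-HML$_\delta$ formula $\varphi$ by a node $v \in G$, denoted $v \models \varphi$, is defined inductively by the following:
$$\begin{aligned}
& v \models tt, \qquad v \not\models ff, \\
& v \models \varphi_1 \wedge \varphi_2 \ \text{ iff } \ v \models \varphi_1 \text{ and } v \models \varphi_2, \\
& v \models \varphi_1 \vee \varphi_2 \ \text{ iff } \ v \models \varphi_1 \text{ or } v \models \varphi_2, \\
& v \models \langle \omega, \delta \rangle \varphi_1 \ \text{ iff there exist } v' \in V \text{ and } \rho \in \Sigma^* \text{ such that } v \xrightarrow{\rho} v',\ d^*(\rho, L(\omega)) \le \delta, \text{ and } v' \models \varphi_1, \text{ and} \\
& v \models [\omega, \delta] \varphi_1 \ \text{ iff } \ v' \models \varphi_1 \text{ for each } v' \text{ such that } v \xrightarrow{\rho} v' \text{ and } d^*(\rho, L(\omega)) \le \delta.
\end{aligned}$$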
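The data-graph semantics can be turned into a small recursive checker. The following is a sketch under stated assumptions rather than a definitive implementation: formulas are encoded as nested tuples, the language of each modality is supplied as a finite set of tuples, path lengths are bounded by a hypothetical `max_len`, and the function names are illustrative. The sample edges and distances are made up for the illustration.

```python
import math


def sd(rho, sigma, d, alpha):
    if len(rho) != len(sigma):
        return math.inf
    return max((alpha ** i * d(x, y)
                for i, (x, y) in enumerate(zip(rho, sigma))), default=0.0)


def dist_to_language(rho, lang, d, alpha):
    return min((sd(rho, s, d, alpha) for s in lang), default=math.inf)


def paths(edges, v, max_len):
    """Yield (label string, end node) for nonempty paths from v of length <= max_len."""
    frontier = [((), v)]
    for _ in range(max_len):
        frontier = [(rho + (a,), dst) for rho, x in frontier
                    for src, a, dst in edges if src == x]
        yield from frontier


def holds(v, phi, E, d, alpha, max_len):
    """Decide v |= phi on a data graph with edge list E."""
    kind = phi[0]
    if kind == "tt":
        return True
    if kind == "ff":
        return False
    if kind == "and":
        return holds(v, phi[1], E, d, alpha, max_len) and holds(v, phi[2], E, d, alpha, max_len)
    if kind == "or":
        return holds(v, phi[1], E, d, alpha, max_len) or holds(v, phi[2], E, d, alpha, max_len)
    _, lang, delta, sub = phi
    # End nodes of paths whose label is within delta of the language.
    close = [v2 for rho, v2 in paths(E, v, max_len)
             if dist_to_language(rho, lang, d, alpha) <= delta]
    if kind == "dia":    # <omega, delta> sub: some close path leads to a node satisfying sub
        return any(holds(v2, sub, E, d, alpha, max_len) for v2 in close)
    if kind == "box":    # [omega, delta] sub: every close path leads to a node satisfying sub
        return all(holds(v2, sub, E, d, alpha, max_len) for v2 in close)
    raise ValueError(f"unknown formula node: {kind!r}")


TABLE = {frozenset("ab"): 0.3, frozenset("ae"): 0.2, frozenset("be"): 0.2}
d = lambda x, y: 0.0 if x == y else TABLE[frozenset((x, y))]
E = [("v0", "a", "v1"), ("v0", "e", "v2")]
L_b = {("b",)}
print(holds("v0", ("dia", L_b, 0.5, ("tt",)), E, d, 1.0, 1))
```

Note that a box formula holds vacuously at a node with no sufficiently close outgoing path.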
Notice that, for a given RE-HML$_\delta$ formula $\varphi = \langle \omega, \delta \rangle \varphi_1$, if $\omega = a$ and $\delta = 0$, where $a$ is an arbitrary element in $\Sigma$, then the formula $\varphi$ reduces to the established semantic interpretation of HML. The situation of $\varphi = [\omega, \delta] \varphi_1$ is similar to the situation of $\varphi = \langle \omega, \delta \rangle \varphi_1$.
Example 3. 
The following are some examples of RE-HML$_\delta$ formulas over $P$ or $G$ of Figure 2:
(1)
$\varphi_1 = \langle a^{2} b, 0.6 \rangle tt$ expresses the property that there exist nodes that have an outgoing $a^{2} b$-labeled edge in the pattern graph and have a path $\pi = v \xrightarrow{\rho} v'$ with $d^*(\rho, L(a^{2} b)) \le 0.6$ in the data graph. We can find that $u_0 \models \langle a^{2} b, 0.6 \rangle tt$ in pattern graph $P$ and $v_0 \models \langle a^{2} b, 0.6 \rangle tt$ in data graph $G_1$.
(2)
For a formula  φ 2 = c , 0.6 a + , 0.4 t t , we can obtain that  u 0 φ 2  in pattern graph  P  and  v 0 φ 2  in data graph  G 1 .
Given an RE-HML$_\delta$ formula $\varphi$, for the cases $\varphi = tt$, $\varphi = ff$, $\varphi = \varphi_1 \wedge \varphi_2$, and $\varphi = \varphi_1 \vee \varphi_2$, the semantics of $\varphi$ on the data graph and the pattern graph are indistinguishable. However, when $\varphi = \langle \omega, \delta \rangle \varphi_1$ or $\varphi = [\omega, \delta] \varphi_1$, there is a significant difference in the semantics of $\varphi$ on the data graph and the pattern graph. In the pattern graph, the semantics of $\varphi$ are independent of the parameter $\delta$, but in the data graph, they depend on $\delta$. One reason for this difference is the distinct edge labels mentioned earlier for the data graph and the pattern graph. In the pattern graph, the edge labels are regular expressions, and the objects inside the operators $\langle\,\rangle$ and $[\,]$ are also regular expressions, so the semantics can be interpreted using classical Hennessy–Milner Logic (HML). However, the edge labels in the data graph are single characters, which, although they can be seen as regular expressions, are not interpretable using classical HML semantics. Therefore, to use RE-HML$_\delta$ formulas for characterizing $\delta$-approximate simulation relations, some adjustments need to be made to the semantic interpretation of $\varphi$ on the data graph.
For $op \in \{\langle\,\rangle, [\,]\}$, we use RE-HML$_\delta \setminus op$ to denote the set of formulas in which the operator $op$ is not used.
The following theorems show that approximate graph simulation can be characterized by RE-HML$_\delta$ formulas.
Theorem 1. 
$v$ $\delta$-simulates $u$ iff $v \sim u$ and for each $\varphi \in \text{RE-HML}_\delta \setminus [\,]$, if $u \models \varphi$ then $v \models \varphi$.
Proof Sketch: The proof has two parts: (⇒) Soundness: We assume v δ-simulates u and show by induction on the structure of φ that u ⊨ φ implies v ⊨ φ. (⇐) Completeness: We assume the logical condition holds and construct a relation R that we prove is a δ-simulation. The key step is showing that for every pattern edge from u, a suitable path exists from v; if not, we construct a formula φ that u satisfies but v does not, leading to a contradiction.
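Proof. 
First, we prove soundness. Assume that $v$ $\delta$-simulates $u$ and $u \models \varphi$ for some formula $\varphi \in \text{RE-HML}_\delta \setminus [\,]$. We show that $v \models \varphi$ by induction on the structure of $\varphi$.
  • $\varphi = tt$: For each $v \in V$, $v \models tt$.
  • $\varphi = \varphi_1 \wedge \varphi_2$: Then $u \models \varphi_1 \wedge \varphi_2$, so $u \models \varphi_1$ and $u \models \varphi_2$. By the induction hypothesis, $v \models \varphi_1$ and $v \models \varphi_2$. Hence, $v \models \varphi_1 \wedge \varphi_2$.
  • $\varphi = \varphi_1 \vee \varphi_2$: Then $u \models \varphi_1 \vee \varphi_2$, so $u \models \varphi_1$ or $u \models \varphi_2$. By the induction hypothesis, $v \models \varphi_1$ or $v \models \varphi_2$. Therefore, $v \models \varphi_1 \vee \varphi_2$.
  • $\varphi = \langle \omega, \delta \rangle \varphi_1$: Then $u \models \langle \omega, \delta \rangle \varphi_1$, so there exists $u \xrightarrow{\omega} u'$ with $u' \models \varphi_1$. Since $v$ $\delta$-simulates $u$, there exists a path $v \xrightarrow{\rho} v'$ such that $d^*(\rho, L(\omega)) \le \delta$ and $v'$ $\delta$-simulates $u'$. By the induction hypothesis, $v' \models \varphi_1$. Hence, $v \models \langle \omega, \delta \rangle \varphi_1$.
To show completeness, assume that $v \sim u$ and for each $\varphi \in \text{RE-HML}_\delta \setminus [\,]$, if $u \models \varphi$ then $v \models \varphi$, where $u \in V_P$ and $v \in V$. Let
$$R = \{(u, v) \in V_P \times V : v \sim u \text{ and if } u \models \varphi \text{ then } v \models \varphi \text{ for all } \varphi \in \text{RE-HML}_\delta \setminus [\,]\}.$$
It is sufficient to prove that the above relation $R$ is a $\delta$-simulation.
Assume that $(u, v) \in R$ and $u \xrightarrow{\omega} u'$. We need to argue that there exists a node $v'$ such that $v \xrightarrow{\rho} v'$, $d^*(\rho, L(\omega)) \le \delta$, and $(u', v') \in R$.
Now assume, towards a contradiction, that there is no $v'$ such that $v \xrightarrow{\rho} v'$, $d^*(\rho, L(\omega)) \le \delta$, and $(u', v') \in R$. Let $\{v_1, \ldots, v_n\}$ be the set of nodes for which there exists a path $v \xrightarrow{\rho_i} v_i$ such that $v_i \sim u'$ and $d^*(\rho_i, L(\omega)) \le \delta$, with $i \in \{1, \ldots, n\}$. By our assumption, none of the nodes in this set satisfies that $u' \models \varphi$ implies $v_i \models \varphi$. Thus, for each $i \in \{1, \ldots, n\}$, there is a formula $\varphi_i$ such that
$$u' \models \varphi_i \quad \text{and} \quad v_i \not\models \varphi_i.$$
It follows that $\varphi = \langle \omega, \delta \rangle (\varphi_1 \wedge \cdots \wedge \varphi_n)$ is a formula satisfied by $u$ but not by $v$, contradicting our assumption that $v \sim u$ and that $u \models \varphi$ implies $v \models \varphi$. The proof of the theorem is now complete. □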
Complexity Analysis. The algorithm for checking the δ-simulation relation involves iteratively refining a candidate relation. For each pair $(u, v)$ and each pattern edge $u \xrightarrow{\omega} u'$, we must find a path $v \xrightarrow{\rho} v'$ such that $d^*(\rho, L(\omega)) \le \delta$. The cost of checking the string-distance condition is polynomial. In the worst case, the overall time complexity is $O(|V_P| \cdot |V| \cdot |E_P| \cdot |E| \cdot |\Sigma| \cdot L)$, where $L$ is an upper bound on the length of paths.
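Theorem 2. 
$v$ $\delta$-simulates $u$ iff $v \sim u$ and for each $\varphi \in \text{RE-HML}_\delta \setminus \langle\,\rangle$, if $v \models \varphi$ then $u \models \varphi$.
Proof. 
First, we show soundness. Let $u \in V_P$, $v \in V$, and $\varphi \in \text{RE-HML}_\delta \setminus \langle\,\rangle$, and assume that $v$ $\delta$-simulates $u$. We show that $v \models \varphi$ implies $u \models \varphi$ by structural induction on $\varphi$. The cases of $tt$, $ff$, conjunction, and disjunction are similar to Theorem 1. Now, we consider the case of $\varphi = [\omega, \delta] \varphi_1$ as follows.
Assume that $u \xrightarrow{\omega} u'$ for some $u'$. We wish to show that $u' \models \varphi_1$. Since $v$ $\delta$-simulates $u$ and $u \xrightarrow{\omega} u'$, there exists a path $v \xrightarrow{\rho} v'$ such that $d^*(\rho, L(\omega)) \le \delta$ and $v'$ $\delta$-simulates $u'$. By our assumption that $v \models [\omega, \delta] \varphi_1$, we have that $v' \models \varphi_1$. The inductive hypothesis yields that $u' \models \varphi_1$. Therefore, each $u'$ such that $u \xrightarrow{\omega} u'$ satisfies $\varphi_1$, and we can conclude that $u \models [\omega, \delta] \varphi_1$, which was to be shown.
For completeness, we define
$$R = \{(u, v) \in V_P \times V : v \sim u \text{ and if } v \models \varphi \text{ then } u \models \varphi \text{ for all } \varphi \in \text{RE-HML}_\delta \setminus \langle\,\rangle\}.$$
It suffices to prove that $R$ is a $\delta$-simulation.
Assume that $(u, v) \in R$ and $u \xrightarrow{\omega} u'$. We need to argue that there exists a node $v'$ such that $v \xrightarrow{\rho} v'$, $d^*(\rho, L(\omega)) \le \delta$, and $(u', v') \in R$.
Now assume, towards a contradiction, that there is no $v'$ such that $v \xrightarrow{\rho} v'$, $d^*(\rho, L(\omega)) \le \delta$, and $(u', v') \in R$. Let $\{v_1, \ldots, v_n\}$ be the set of nodes for which there exists a path $v \xrightarrow{\rho_i} v_i$ such that $v_i \sim u'$ and $d^*(\rho_i, L(\omega)) \le \delta$, with $i \in \{1, \ldots, n\}$. By our assumption, none of the nodes in this set satisfies that $v_i \models \varphi$ implies $u' \models \varphi$. Thus, for each $i$, there is a formula $\varphi_i$ such that
$$u' \not\models \varphi_i \quad \text{and} \quad v_i \models \varphi_i.$$
It follows that $\varphi = [\omega, \delta] (\varphi_1 \vee \cdots \vee \varphi_n)$ is a formula satisfied by $v$ but not by $u$, contradicting our assumption that $v \sim u$ and that $v \models \varphi$ implies $u \models \varphi$. □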
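Conclusion 1. 
The following statements are equivalent for $v \in V$ and $u \in V_P$:
  • $v$ $\delta$-simulates $u$.
  • $v \sim u$ and for each $\varphi \in \text{RE-HML}_\delta \setminus [\,]$, if $u \models \varphi$ then $v \models \varphi$.
  • $v \sim u$ and for each $\varphi \in \text{RE-HML}_\delta \setminus \langle\,\rangle$, if $v \models \varphi$ then $u \models \varphi$.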

Corollaries and Limitations

Corollary 1. 
When δ = 0, RE-HML$_0$ characterizes exact graph simulation [10], as d*(ρ, L(ω)) = 0 iff ρ ∈ L(ω).
Corollary 2. 
The δ-simulation relation is reflexive and transitive but not symmetric.
The following counterexample demonstrates the lack of symmetry. Let Σ = {a, b} with d(a, b) = 1. Let P have one node u with an a-labeled self-loop $u \xrightarrow{a} u$. Let G have one node v with a b-labeled self-loop $v \xrightarrow{b} v$. For δ ≥ 1, v δ-simulates u because d*(b, L(a)) = d(b, a) = 1 ≤ δ. However, u does not δ-simulate v for any δ < 1, because d*(a, L(b)) = d(a, b) = 1 > δ.
This framework has several limitations: (1) The performance is highly sensitive to the choice of the distance metric d and the tolerance parameter δ, which may require domain expertise to tune. (2) The polynomial-time complexity, while tractable, can be high for very large graphs due to the path exploration and distance calculation. (3) The current framework approximates edge labels but requires exact matching of node attributes ($v \sim u$); extending it to approximate attribute matching is future work.
Next, we explain the conclusion visually with an example.
Example 4. 
In Figure 2, we know that v 0 u 0 , v 0 u 0 , v 1 u 0 and v 3 u 0 . R = u 0 , v 0 , u 0 , v 0 , u 0 , v 1 , u 0 , v 3 is a 0.5-simulation by Definition 3, and the data graph $G_1$ 0.5-approximately matches the pattern graph $P$. Based on Conclusion 1, we might as well let u 0 ϕ and v 0 ϕ . In the case of $\varphi \in \text{RE-HML}_\delta \setminus [\,]$, we have the following:
u 0 c , 0.6 t t a 2 b , 0.6 t t , and v 0 c , 0.6 t t a 2 b , 0.6 t t
Then, in the case of $\varphi \in \text{RE-HML}_\delta \setminus \langle\,\rangle$, we have the following:
v 0 a 2 b , 0.6 ϕ , but u 0 a 2 b , 0.6 ϕ
The diagrams of $P$ and $G_2$ are depicted in Figure 3. We removed the edge v 0 to v 0 from the data graph and left it otherwise unchanged. So, we have the following:
v 0 u 0 , v 1 u 0 and v 3 u 0 , but v 0 u 0 .
We can find a relation R = u 0 , v 0 , u 0 , v 1 , u 0 , v 3 . We will show that v 0 and u 0 do not satisfy Conclusion 1 in Figure 3. We first discuss the case of $\varphi \in \text{RE-HML}_\delta \setminus [\,]$. We have the following:
u 0 c , 0.6 t t , but v 0 c , 0.6 t t
Then, we discuss the case of $\varphi \in \text{RE-HML}_\delta \setminus \langle\,\rangle$. We have the following:
v 0 a 2 b , 0.6 t t , but u 0 a 2 b , 0.6 t t
v 0 and u 0 do not satisfy the same formula in either case, so v 0 u 0 .

5. Case Studies

5.1. Empirical Evaluation and Complexity Analysis

To evaluate the practical utility of δ-simulation, we conducted experiments on a synthetic graph dataset (DS1) containing 10,000 nodes and approximately 85,000 edges. We measured the number of matches found and the precision for varying values of δ (0.0, 0.5, 1.0, 1.5, and 2.0) against 100 randomly generated pattern graphs. Exact matching (δ = 0.0) found matches for 65% of patterns. As δ increased to 1.0 and 2.0, the success rate rose to 92% and 98%, respectively, demonstrating the method’s ability to recover meaningful results missed by exact matching. However, manual verification showed that precision decreased from 100% at δ = 0.0 to 75% at δ = 2.0, highlighting the tunable trade-off between recall and precision. Runtime measurements on graphs of varying sizes confirmed the polynomial time complexity analyzed in Section 4.
While simplified, this pattern captures a realistic query intent: finding a group of users (researchers) with a specific internal consensus and a specific antagonistic relationship with another group (programmers), mediated by interactions with other roles (doctors). Such patterns are relevant in studying echo chambers, cross-community influence, and targeted marketing.

5.2. Illustrative Example in Social Network Context

This section considers an instance of a graph pattern matching problem in social networks, further elaborating the conclusions drawn in this paper. When it comes to social networks, graph pattern matching can typically be utilized in various ways: identifying key individuals in social networks, discovering communities in social networks, detecting events in social networks, and so forth.
Consider an assembly network service, where users can vote, post, and express support or opposition on controversial topics or issues. Each user has their personal information along with their list of supported or opposed issues.
Figure 4 describes a portion of this network as a graph, involving the debate on whether artificial intelligence can replace humans. In the graph, each node represents a user, and each edge represents a relationship with one of four types: “fa,” “fd,” “sa,” and “sd.” Here, “fa” indicates that a user agrees with most of their friend’s votes on topics, “fd” indicates a user disagrees with most of their friend’s votes, “sa” indicates a user agrees with most of a stranger’s votes, and “sd” indicates a user disagrees with most of a stranger’s votes.
Considering the pattern graph in Figure 5, it is proposed by a user named “Jack07,” who holds a supportive stance on whether artificial intelligence can replace humans. Here, “sp” indicates the user’s support for a certain topic, while “dsp” indicates opposition.
This user wishes to find all programmers who oppose the idea of artificial intelligence replacing humans through the “fn” relationship. Additionally, they want to query whether there are users who meet the following conditions:
  • The user is a researcher and holds a supportive stance on whether artificial intelligence can replace humans. They are connected to someone, denoted as A, through the “fa” relationship within ≤2 steps and A is connected to them through the “sa” relationship.
  • These researchers belong to a research group, and all members within the group hold the same opinion.
  • These researchers hold different opinions from their programmer friends, and vice versa.
  • Friends of these programmers who are doctors, or strangers with positions as doctors, support their opinions. Conversely, the same applies.
The user aims to find the desired results in the data graph (Figure 4) based on the requirements specified by the pattern graph (Figure 5). Let Σ = { f a , f d , s a , s d }, with the metric d given by the following:
d ( f a , s a ) = 0.5 , d ( f a , s a ) = 10 , d ( f a , s d ) = 20 , d ( f d , s d ) = 2 .
Under the exact definition of graph pattern matching, the user cannot find a subgraph of this data graph that matches the requirements precisely. As a result, subgraphs that approximately meet the user’s needs are ignored, even though they still hold some reference value. If the conditions are relaxed so that the requirements only have to be met to a certain degree, the data graph can match the pattern graph well enough to satisfy the user’s needs.
First, two binary relations can be identified:
$\{(A, A_1), (B, B_1), (C, C_2), (D, D_1)\}$ and $\{(A, A_1), (B, B_2), (C, C_3), (D, D_2)\}$.
Next, it is necessary to determine whether these two binary relations are 1-approximate simulation relations. Consider first the binary relation $\{(A, A_1), (B, B_1), (C, C_2), (D, D_1)\}$. For the edge $B \xrightarrow{fd} C$ in the pattern graph $P$, there is an edge from node $B_1$ to node $C_2$ in the data graph $G$ labeled $sd$, while $d(fd, sd) = 2 > 1$. Therefore, the binary relation $\{(A, A_1), (B, B_1), (C, C_2), (D, D_1)\}$ is not a 1-approximate simulation relation.
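The label comparison in this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `LABEL_DIST` records just the two label distances used in this example, the metric is taken to be symmetric with $d(a, a) = 0$, and the helper names `label_distance` and `edge_within_tolerance` are hypothetical rather than taken from the paper.

```python
# Label distances used in the case study; other pairs are deliberately left out.
LABEL_DIST = {("fa", "sa"): 0.5, ("fd", "sd"): 2.0}


def label_distance(a, b):
    """Symmetric lookup with d(a, a) = 0 (assumed metric properties)."""
    if a == b:
        return 0.0
    if (a, b) in LABEL_DIST:
        return LABEL_DIST[(a, b)]
    if (b, a) in LABEL_DIST:
        return LABEL_DIST[(b, a)]
    raise KeyError(f"no distance recorded for labels {a!r} and {b!r}")


def edge_within_tolerance(pattern_label, data_label, delta):
    """True when a single data edge approximates a pattern edge up to delta."""
    return label_distance(pattern_label, data_label) <= delta


# Pattern edge B -fd-> C against data edge B1 -sd-> C2, with delta = 1.
print(edge_within_tolerance("fd", "sd", 1.0))
```

With $\delta = 1$ the check fails for this edge, because the recorded distance between $fd$ and $sd$ is 2.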
Now consider the binary relation $\{(A, A_1), (B, B_2), (C, C_3), (D, D_2)\}$. For the edge $D \xrightarrow{sa^{2} + fa} B$ in the pattern graph $P$, the data graph $G$ contains a path from node $D_2$ to node $B_2$ with label sequence $\rho = fa\,sa$. According to Proposition 2, we obtain $d(\rho, L(sa^{2} + fa)) = 0.5 \le 1$. For each of the remaining edges in the pattern graph, there is a corresponding path in the data graph, and the distance between them is less than 1. Thus, the binary relation $\{(A, A_1), (B, B_2), (C, C_3), (D, D_2)\}$ is a 1-approximate simulation relation. It follows that the data graph $G$ contains a subgraph $G_1$, shown in red letters in Figure 6, that satisfies the requirements of the pattern graph $P$.
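The value 0.5 can be reproduced with a small Python sketch. Everything below rests on assumptions that are not confirmed by this section: the regular expression is read as the union of $sa\,sa$ and $fa$, so its language is the finite set listed in `LANGUAGE`; the string distance is taken to be the maximum of the position-wise label distances and infinite for strings of different length; and the distance to a language is the minimum over its words. Label pairs without a recorded distance are treated as infinitely far apart, which is a conservative placeholder.

```python
import math

# Only the label distance needed for this example (symmetric, d(a, a) = 0 assumed).
LABEL_DIST = {("fa", "sa"): 0.5}


def label_distance(a, b):
    if a == b:
        return 0.0
    # Unrecorded pairs fall back to infinity (placeholder, not from the paper).
    return LABEL_DIST.get((a, b), LABEL_DIST.get((b, a), math.inf))


def string_distance(u, v):
    """Assumed: max of position-wise label distances, infinite if lengths differ."""
    if len(u) != len(v):
        return math.inf
    return max((label_distance(a, b) for a, b in zip(u, v)), default=0.0)


def distance_to_language(rho, words):
    """Smallest string distance from rho to any word of a finite language."""
    return min((string_distance(rho, w) for w in words), default=math.inf)


LANGUAGE = [("sa", "sa"), ("fa",)]  # assumed reading of the expression
rho = ("fa", "sa")                  # label sequence of the path from D2 to B2
print(distance_to_language(rho, LANGUAGE))
```

Under these assumptions the closest word is $sa\,sa$, which differs from $\rho$ only in the first label, so the result is $d(fa, sa) = 0.5$.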
Finally, let us describe the approximate matching relationship from a logical perspective. Consider node $D$ in the pattern graph and node $D_3$ in the data graph. Take an $REHML_{\delta}\setminus[\,]$ formula such as $\varphi_1 = \langle sa^{2} + fa, 1\rangle\langle fd, 1\rangle\langle fa^{+}, 1\rangle tt$. From the semantic interpretation of $REHML_{\delta}\setminus[\,]$ on the pattern graph, we can infer $D \models \varphi_1$. To determine whether $D_3$ is a 1-approximate simulation of $D$, it is necessary to check whether node $D_3$ satisfies $\varphi_1$. According to the semantic interpretation of $REHML_{\delta}\setminus[\,]$ on the data graph, there exists a path starting from $D_3$, namely $D_3 \xrightarrow{fa\,sa} B_2 \xrightarrow{fd} C_3 \xrightarrow{fa} C_2$, such that $D_3 \models \varphi_1$ holds. Similarly, take the formula $\varphi_2 = \langle sa^{2} + fa, 1\rangle\langle fd, 1\rangle tt$. According to the semantic interpretation of $REHML_{\delta}\setminus[\,]$ on the data graph, for any path $\rho$ starting from node $D_3$ that satisfies $d(\rho, L(sa^{2} + fa)) \le 1$, all the nodes reached by this path satisfy the formula $\langle fd, 1\rangle tt$.
Based on the information obtained from the data graph $G$, there exists a path satisfying this condition: $D_3 \xrightarrow{fa\,sa} B_2$, and $B_2 \models \langle fd, 1\rangle tt$. Therefore, we have $D_3 \models \varphi_2$. To determine whether $D_3$ is a 1-approximate simulation of $D$, we need to check whether node $D$ satisfies formula $\varphi_2$. From the semantic interpretation of $REHML_{\delta}\setminus[\,]$ on the pattern graph, we know that $D \models \varphi_2$. By extension, we can conclude that node $D_3$ is a 1-approximate simulation of $D$.
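The satisfaction check for a formula built from diamond-style modalities can be sketched as a small recursive procedure. This is a hedged illustration rather than the paper's algorithm, and it relies on several assumptions: a node satisfies a modality with expression $\omega$ and tolerance $\delta$ followed by $\varphi$ when some path from the node has a label sequence within distance $\delta$ of $L(\omega)$ and ends in a node satisfying $\varphi$; the string distance is the maximum of position-wise label distances; $fa^{+}$ is truncated to words of length at most 3; and the graph fragment, including the intermediate node `M`, is hypothetical.

```python
import math

LABEL_DIST = {("fa", "sa"): 0.5, ("fd", "sd"): 2.0}


def label_distance(a, b):
    if a == b:
        return 0.0
    # Unrecorded pairs are treated as infinitely far apart (placeholder).
    return LABEL_DIST.get((a, b), LABEL_DIST.get((b, a), math.inf))


def string_distance(u, v):
    if len(u) != len(v):
        return math.inf
    return max((label_distance(a, b) for a, b in zip(u, v)), default=0.0)


# Hypothetical fragment of the data graph; "M" is an invented intermediate node.
GRAPH = {
    "D3": [("fa", "M")],
    "M": [("sa", "B2")],
    "B2": [("fd", "C3")],
    "C3": [("fa", "C2")],
    "C2": [],
}


def paths_from(node, length):
    """Yield (labels, end_node) for every path with exactly `length` edges."""
    if length == 0:
        yield (), node
        return
    for label, nxt in GRAPH[node]:
        for labels, end in paths_from(nxt, length - 1):
            yield (label,) + labels, end


def satisfies(node, formula):
    """A formula is "tt" or a triple (words, delta, subformula)."""
    if formula == "tt":
        return True
    words, delta, rest = formula
    for word in words:
        for labels, end in paths_from(node, len(word)):
            if string_distance(labels, word) <= delta and satisfies(end, rest):
                return True
    return False


SA2_OR_FA = [("sa", "sa"), ("fa",)]           # assumed reading of the expression
FA_PLUS = [("fa",) * k for k in range(1, 4)]  # fa^+ truncated at length 3
phi1 = (SA2_OR_FA, 1.0, ([("fd",)], 1.0, (FA_PLUS, 1.0, "tt")))
print(satisfies("D3", phi1))
```

Representing each language as a finite list of words keeps the sketch short; an implementation for arbitrary regular expressions would more plausibly explore the product of the data graph with an automaton for the expression.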

6. Discussion

The case study serves as a proof-of-concept but has limitations. The social network graph is synthetic, and the metric d was defined manually. Performance on real-world, noisy graphs with billions of edges and learned metrics requires further investigation. Despite this, the framework is general. Beyond social networks, it could be applied in computational biology to match approximate protein interaction pathways, or in knowledge graph querying to find entities connected by paths that approximately match a complex relation sequence, improving recall in incomplete knowledge bases like DBpedia or YAGO.

7. Conclusions and Future Work

In this work, we presented δ-simulation, an approximation of a graph simulation incorporating metrics and regular expressions. We defined approximate matching between pattern and data graphs and provided a logical characterization for it using an extension of Hennessy–Milner logic ($REHML_{\delta}$).
Despite its theoretical foundations, our approach has limitations. The computational cost, while polynomial, can be high for very large graphs due to path checking. The method’s effectiveness is also contingent upon the careful selection of the tolerance parameter δ and the underlying alphabet metric d, which may require domain expertise.
Based on these limitations, our future work will focus on the following: (1) Developing optimized algorithms and indexing structures to improve scalability for web-scale graphs; (2) Investigating methods to learn the metric d directly from data to reduce manual tuning; and (3) Extending the framework to support approximate matching of node attributes in addition to edge paths.
Beyond algorithmic improvements, we plan to apply and evaluate δ-simulation in specific, demanding real-world scenarios. Promising application domains include the following:
  • Computational Biology: Matching approximate signaling pathways in protein–protein interaction networks.
  • Knowledge Graph Querying: Enhancing query answering over incomplete knowledge bases (e.g., DBpedia, YAGO).
  • Network Security: Detecting lateral movement patterns of attackers in network logs where actions might be obfuscated.

Author Contributions

Conceptualization, X.L. and X.C. (Xuelei Chen); methodology, Z.Z.; software, Z.Z.; validation, X.C. (Xinyu Cui), J.W., and X.C. (Xuelei Chen); formal analysis, X.C. (Xinyu Cui); investigation, X.L.; resources, X.C. (Xuelei Chen); data curation, Y.Z.; writing—original draft preparation, X.L.; writing—review and editing, X.C. (Xuelei Chen); visualization, J.W.; supervision, X.C. (Xuelei Chen); project administration, X.C. (Xuelei Chen); funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 12th batch of Wenzhou Science and Technology Commissioner Project under Grant No. 12, by the Third Phase of the Ministry of Education’s Supply and Demand Docking Employment and Education Project under Grant 2023122963332, and by the Second Phase of the Ministry of Education’s Supply and Demand Docking Employment and Education Project under Grant 20230112193.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Aceto, L.; Ingólfsdóttir, A.; Larsen, K.G.; Srba, J. Reactive Systems: Modelling, Specification and Verification; Cambridge University Press: Cambridge, UK, 2007; pp. 220–247. [Google Scholar]
  2. Baier, C.; Katoen, J.P. Principles of Model Checking; MIT Press: Cambridge, MA, USA, 2008; pp. 449–593. [Google Scholar]
  3. Milner, R. Communication and Concurrency; Prentice Hall: Hoboken, NJ, USA, 1989; pp. 84–104. [Google Scholar]
  4. Milner, R. Communicating and Mobile Systems: The π-Calculus; Cambridge University Press: Cambridge, UK, 1999; pp. 16–25. [Google Scholar]
  5. Munkres, J.R. Typology; Prentice Hall: Hoboken, NJ, USA, 1975; pp. 263–290. [Google Scholar]
  6. Sangiorgi, D. Introduction to Bisimulation and Coinduction; Cambridge University Press: Cambridge, UK, 2012; pp. 53–142. [Google Scholar]
  7. Hopcroft, J.E.; Motwani, R.; Ullman, J.D. Introduction to Automata Theory, Languages, and Computation; Addison-Wesley Publishing: Reading, MA, USA, 1979; pp. 37–83. [Google Scholar]
  8. de Alfaro, L.; Faella, M.; Stoelinga, M. Linear and branching system metrics. IEEE Trans. Softw. Eng. 2009, 35, 258–273. [Google Scholar] [CrossRef]
  9. Girard, A.; Pappas, G.J. Approximation metrics for discrete and continuous systems. IEEE Trans. Autom. Control 2007, 52, 782–798. [Google Scholar] [CrossRef]
  10. Fan, W.F.; Li, J.Z.; Ma, S.; Tang, N.; Wu, Y.H. Adding regular expressions to graph reachability and pattern queries. Front. Comput. Sci. 2012, 6, 313–338. [Google Scholar] [CrossRef]
  11. Bouhenni, S.; Yahiaoui, S.; Nouali-Taboudjemat, N.; Kheddouci, H. A survey on distributed graph pattern matching in massive graphs. ACM Comput. Surv. 2021, 54, 1–36. [Google Scholar] [CrossRef]
  12. Ma, S.; Cao, Y.; Fan, W.F.; Huai, J.P.; Wo, T.Y. Strong simulation: Capturing topology in graph pattern matching. ACM Trans. Database Syst. 2014, 39, 1–46. [Google Scholar] [CrossRef]
  13. Pan, H.Y.; Cao, Y.Z.; Zhang, M.; Chen, Y.X. Simulation for lattice-valued doubly labeled transition systems. Int. J. Approx. Reason. 2014, 55, 797–811. [Google Scholar] [CrossRef]
  14. Henzinger, M.R.; Henzinger, T.A.; Kopke, P.W. Computing simulations on finite and infinite graphs. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, Milwaukee, WI, USA, 23–25 October 1995; pp. 453–462. [Google Scholar]
  15. Fan, W.F.; Wang, X.; Wu, Y.H. Incremental graph pattern matching. ACM Trans. Database Syst. 2013, 38, 18. [Google Scholar] [CrossRef]
  16. Chen, X.S.; Lai, L.; Qin, L.; Lin, X.M.; Liu, B. A framework to quantify approximate simulation on graph data. In Proceedings of the 37th IEEE International Conference on Data Engineering, Chania, Greece, 19–22 April 2021; pp. 1308–1319. [Google Scholar]
  17. Dwivedi, S.P.; Singh, R.S. Error-Tolerant Approximate Graph Matching Utilizing Node Centrality Information. Pattern Recogn. Lett. 2020, 133, 313–319. [Google Scholar] [CrossRef]
  18. Nayyeri, M.; Xu, C.; Alam, M.M.; Lehmann, J.; Yazdi, H.S. LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 7050–7062. [Google Scholar] [CrossRef]
  19. Barceló, P.; Kostylev, E.V.; Mikaël, M.; Pérez, J.; Reutter, J.; Silva, J.P. The Logical Expressiveness of Graph Neural Networks. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  20. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef]
  21. Zhang, M.; Chen, Y. Link Prediction Based on Graph Neural Networks. In Proceedings of the 32nd Conference on Neural Information Processing Systems, NIPS 2018, Montréal, QC, Canada, 3–8 December 2018; pp. 5171–5181. [Google Scholar]
  22. Cui, X.Y.; Li, Z.K.; Chang, Y.T.; Pan, H.Y. Approximate Simulation for Transition Systems with Regular Expressions. In Proceedings of the 2nd Artificial Intelligence Logic and Applications, AILA 2022, Shanghai, China, 26–28 August 2022; Springer: Berlin/Heidelberg, Germany; pp. 49–62. [Google Scholar]
  23. Liu, G.F.; Liu, Y.; Zheng, K.; Liu, A.; Li, Z.X.; Wang, Y.; Zhou, X.F. MCS-GPM: Multi-constrained simulation based graph pattern matching in contextual social graphs. IEEE Trans. Knowl. Data Eng. 2018, 30, 1050–1064. [Google Scholar] [CrossRef]
  24. Du, R.H.; Yang, J.N.; Cao, Y.Z.; Wang, P. Personalized graph pattern matching via limited simulation. Knowl.-Based Syst. 2018, 141, 31–43. [Google Scholar] [CrossRef]
  25. Ullmann, J.R. An algorithm for subgraph isomorphism. J. ACM. 1976, 23, 31–42. [Google Scholar] [CrossRef]
  26. Zhang, S.J.; Yang, J.; Jin, W. Subgraph indexing and approximate matching in large graphs. Proc. VLDB Endow. 2010, 3, 1185–1194. [Google Scholar] [CrossRef]
  27. Thrane, C.R.; Fahrenberg, U.; Larsen, K.G. Quantitative analysis of weighted transition system. J. Log. Algebr. Program. 2010, 79, 689–703. [Google Scholar] [CrossRef]
  28. Cerný, P.; Henzinger, T.A.; Radhakrishna, A. Simulation distances. Theor. Comput. Sci. 2012, 413, 21–35. [Google Scholar] [CrossRef]
  29. Bozzelli, L.; Molinari, A.; Montanari, A.; Peron, A. Model checking interval temporal logics with regular expressions. Inf. Comput. 2020, 272, 104498. [Google Scholar] [CrossRef]
  30. Wu, H.Y.; Deng, Y.X. Logical characterizations of simulation and bisimulation for fuzzy transition systems. Fuzzy Sets Syst. 2016, 301, 19–36. [Google Scholar] [CrossRef]
  31. Abriola, S.; Descotte, M.E.; Figueira, S. Model theory of XPath on data trees. Inf. Comput. 2017, 255, 195–223. [Google Scholar] [CrossRef]
  32. Hermanns, H.; Parma, A.; Segala, R.; Wachter, B.; Zhang, L.J. Probabilistic logical characterization. Inf. Comput. 2011, 209, 154–172. [Google Scholar] [CrossRef]
  33. Bernardo, M.; Nicola, R.D.; Loreti, M. Revisiting bisimilarity and its modal logic for nondeterministic and probabilistic processes. Acta Inform. 2015, 52, 61–105. [Google Scholar] [CrossRef]
  34. Fahrenberg, U.; Legay, A. The quantitative linear-time-branching-time spectrum. Theor. Comput. Sci. 2014, 538, 54–69. [Google Scholar] [CrossRef]
  35. Pan, H.Y.; Li, Y.M.; Cao, Y.Z.; Li, P. Nondeterministic fuzzy automata with membership values in complete residuated lattices. Int. J. Approx. Reason. 2017, 82, 22–38. [Google Scholar] [CrossRef]
  36. Juhl, L.; Larsen, K.G.; Srba, J. Modal transition systems with weight intervals. J. Log. Algebr. Program. 2012, 81, 408–421. [Google Scholar] [CrossRef]
  37. Desharnais, J.; Gupta, V.; Jagadeesan, R.; Panangaden, P. Metrics for labelled Markov processes. Theor. Comput. Sci. 2004, 318, 323–354. [Google Scholar] [CrossRef]
  38. Hennessy, M.; Milner, R. Algebraic Laws for Nondeterminism and Concurrency. J. ACM 1985, 32, 137–161. [Google Scholar] [CrossRef]
Figure 1. Graph $P$ and graph $G$.
Figure 2. Graph $P$ and Graph $G_1$.
Figure 3. Graph $P$ and Graph $G_2$.
Figure 4. Graph $G$.
Figure 5. Pattern $P$.
Figure 6. A Subgraph $G_1$ of Data Graph $G$.
Table 1. Summary of Notations.
| Symbol | Meaning |
| --- | --- |
| $\Sigma$ | Finite alphabet |
| $\Sigma^{*}$ | Set of all finite strings over $\Sigma$ |
| $\varpi_{\Sigma}$ | Set of all regular expressions over $\Sigma$ |
| $L(\omega)$ | Language of regular expression $\omega$ |
| $G = (V, E, f_A)$ | Data graph |
| $P = (V_P, E_P, f_V)$ | Pattern graph |
| $d$ | Metric function |
| $sd$ | String distance function |
| $d$ | Distance from string to language |
| $\delta$ | Tolerance parameter |
| $S_{\delta}$ | $\delta$-simulation relation |
| $REHML_{\delta}$ | Extended Hennessy–Milner logic with regular expressions |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liao, X.; Zhang, Z.; Cui, X.; Wang, J.; Zhang, Y.; Chen, X. A Logical Characterization for Approximate Matching of Pattern Graphs with Regular Expressions. Symmetry 2025, 17, 1659. https://doi.org/10.3390/sym17101659
