# Multiple Minimum Support-Based Rare Graph Pattern Mining Considering Symmetry Feature-Based Growth Technique and the Differing Importance of Graph Elements


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Mining Weighted Rare Graph Patterns from Graph Databases with Multiple Minimum Support Thresholds

#### 3.1. Preliminaries

**Definition 1.** (Graph pattern) Let G be a graph pattern composed of multiple elements, where there are two element types, vertex and edge. That is, G has a set of vertices, $V_G = \{v_1, v_2, \ldots, v_m\}$, and a set of edges, $E_G = \{e_1, e_2, \ldots, e_n\}$. Note that we consider a simple, labeled, undirected graph form in this paper to help understand the contents of our approach more easily, but it is trivial to apply other graph forms in this approach through several additional considerations.

**Definition 2.** (Support of a graph pattern) Let $GDB = \{Gtr_1, Gtr_2, \ldots, Gtr_k\}$ be a given graph database composed of multiple graph transactions, Gtrs, and G be a graph pattern. Then, the support of G, Sup(G), is calculated as follows:

$$Sup(G) = |\{Gtr_k \mid G \subseteq Gtr_k,\ Gtr_k \in GDB\}| \tag{2}$$

where $G \subseteq Gtr_k$ means that G is contained in a graph transaction, $Gtr_k$, as a sub graph pattern. That is, Sup(G) obtained from Equation (2) indicates how many times G appears in GDB. Hence, in traditional frequent graph pattern mining, if Sup(G) is higher than or equal to a given minimum support threshold, G is considered a frequent graph pattern. Consequently, the main goal of frequent graph pattern mining is to find all possible graph patterns whose supports are not lower than the threshold.
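The counting in Definition 2 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the dictionary-based graph encoding, `make_graph`, `contains`, and `support` are hypothetical names, and the brute-force containment test is only practical for very small graphs.

```python
from itertools import permutations

def make_graph(vertex_labels, edge_list):
    """Hypothetical encoding: vertex id -> label, frozenset({u, v}) -> edge label."""
    return {"v": dict(vertex_labels),
            "e": {frozenset((a, b)): lab for a, b, lab in edge_list}}

def contains(gtr, g):
    """True if pattern g occurs in transaction gtr as a labeled sub graph."""
    g_vertices = list(g["v"])
    # Try every injective mapping of pattern vertices onto transaction vertices.
    for image in permutations(gtr["v"], len(g_vertices)):
        m = dict(zip(g_vertices, image))
        if any(g["v"][u] != gtr["v"][m[u]] for u in g_vertices):
            continue  # vertex labels do not match
        if all(gtr["e"].get(frozenset(m[u] for u in key)) == lab
               for key, lab in g["e"].items()):
            return True  # every pattern edge is present with the same label
    return False

def support(gdb, g):
    """Number of graph transactions in gdb that contain g."""
    return sum(1 for gtr in gdb if contains(gtr, g))

# Toy database with three graph transactions (labels are made up).
gdb = [
    make_graph({0: "A", 1: "B", 2: "C"}, [(0, 1, "x"), (1, 2, "y")]),
    make_graph({0: "A", 1: "B"}, [(0, 1, "x")]),
    make_graph({0: "A", 1: "C"}, [(0, 1, "x")]),
]
pattern = make_graph({0: "A", 1: "B"}, [(0, 1, "x")])
print(support(gdb, pattern))
```

Here the pattern A-x-B occurs in the first two transactions only, so a minimum support threshold of 2 would keep it and a threshold of 3 would not.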

**Definition 3.** (Degree of a graph pattern) Every graph has one of three graph forms, path, free-tree, and cyclic graph, where their coverage is path ⊆ free-tree ⊆ cyclic graph. That is, a cyclic graph includes the path and free-tree forms, and a free-tree contains the path form. In the case of a path, all of its vertices except for both of its ends have degree 2, while both ends have degree 1. In other words, assuming that a given graph pattern, G, is a path with n vertices, the following conditions are satisfied:

$$deg(v_1) = deg(v_n) = 1, \quad deg(v_i) = 2 \ (1 < i < n), \quad |E_G| = |V_G| - 1$$

where $deg(v)$ is the degree of a vertex v, and $v_1$ and $v_n$ are the first and last vertices, respectively. $|V_G|$ and $|E_G|$ signify the number of vertices and edges in G, respectively, where $|V_G|$ is also equal to n. In the case of a free-tree, one or more vertices must have degree 3 or more and all of its edges have no cyclic relation. That is, if G is a free-tree, it satisfies the following conditions:

$$\exists v \in V_G : deg(v) \geq 3, \quad |E_G| = |V_G| - 1$$
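The degree conditions described in Definition 3 can be checked mechanically. The following is a minimal sketch, assuming the graph is connected and simple and that vertices are numbered from 0; `graph_form` is a hypothetical helper name, not part of the proposed algorithm.

```python
def graph_form(num_vertices, edges):
    """Classify a connected simple undirected graph as 'path', 'free-tree', or 'cyclic'.

    edges is a list of (u, v) pairs over vertex ids 0 .. num_vertices - 1.
    """
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # A connected graph with |E| = |V| - 1 has no cycle; more edges force one.
    if len(edges) != num_vertices - 1:
        return "cyclic"
    # Acyclic: a vertex of degree 3 or more makes it a free-tree, otherwise a path.
    return "free-tree" if max(degree, default=0) >= 3 else "path"

print(graph_form(3, [(0, 1), (1, 2)]))          # chain of three vertices
print(graph_form(4, [(0, 1), (0, 2), (0, 3)]))  # star with one degree-3 vertex
print(graph_form(3, [(0, 1), (1, 2), (2, 0)]))  # triangle
```

The classification reports the most specific form, which is in line with the coverage relation path ⊆ free-tree ⊆ cyclic graph.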

#### 3.2. Applying Symmetry Feature-Based Performance Improving Technique into the Graph Pattern Growth Process

[…] {$v_1$-$e_1$-$v_2$-$e_2$-…-$e_{n-1}$-$v_n$}, each symmetry type can be denoted as follows:

Let $P_1$ be a given path and $P_2'$ and $P_2''$ be paths expanded from $P_1$. The symmetry models of $P_1$ are shown in the top of the figure. In the case of $P_2'$, a new edge and vertex, e′ and v′, have been inserted at the end of $P_1$. In this case, start_sym($P_2'$) is equal to whole_sym($P_1$). whole_sym($P_2'$) can be easily computed from end_sym($P_1$) because we only have to compare the edges and vertices next to both ends of end_sym($P_1$): $v_1$ and $e_2$, and e′ and v′. In contrast, end_sym($P_2''$) is equal to whole_sym($P_1$), and whole_sym($P_2''$) is obtained from start_sym($P_1$). It is important to consider these three symmetry models in the growth step because they let us determine whether or not a path is symmetric with little computational overhead, without comparing the whole path again.

#### 3.3. Employing Element Importance and Multiple Minimum Support Thresholds in Frequent Graph Pattern Mining

**Definition 4.** (Weighted support of a graph pattern) Let $V_G = \{v_1, v_2, \ldots, v_m\}$, $E_G = \{e_1, e_2, \ldots, e_n\}$, and $W_G = \{w_1, w_2, \ldots, w_n\}$ be a set of vertices in a graph pattern, G, a set of edges in G, and a set of edge weights in G, respectively. Then, the weighted support of G, Wsup(G), is calculated as follows:

**Definition 5.** (Minimum element support threshold) Let $GDB = \{Gtr_1, Gtr_2, \ldots, Gtr_k\}$ be a given graph database composed of multiple graph transactions, Gtrs, $V_{GDB} = \{v_1, v_2, \ldots, v_x\}$ be a set of separate vertices included in GDB, and $E_{GDB} = \{e_1, e_2, \ldots, e_y\}$ be a set of separate edges in GDB. Then, a minimum element support threshold for each element (v or e), $\delta_i$ ($1 \leq i \leq x + y$), is set as a value specified by a user.

**Definition 6.** (Minimum graph support threshold) Given a graph pattern, G, a set of vertices in G, $V_G = \{v_1, v_2, \ldots, v_n\}$, and a set of edges in G, $E_G = \{e_1, e_2, \ldots, e_m\}$, a set of δ values in G including $V_G$ and $E_G$, $T_G$, can be denoted as $T_G = \{\delta_1, \delta_2, \ldots, \delta_{n+m}\}$. Then, a minimum graph support threshold for G, MGST(G), is computed as follows:

#### 3.4. Maintaining the Correctness of the Proposed Algorithm

[…]${}_{over}(G)$, is first applied in the mining process. Through such an overestimation method, we can maintain the anti-monotone property and prevent unintended pattern losses caused by the weight constraints. In the second method, among all the thresholds in Definition 5, we set the least value that does not violate the property and apply it to the mining process. Let $L = \{\delta_1, \delta_2, \ldots, \delta_x\}$ be a list of all the elements' δ values in GDB, sorted in descending order of their values. Then, starting from the last element in the list, we check whether or not the overestimated weighted support of the element is higher than its own δ value. The δ value of the first element satisfying this condition becomes an underestimated minimum support threshold, called the Least Minimum Support (LMS).
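The LMS selection described above can be sketched as follows. This is an illustrative reading of the paragraph, not the authors' code: the overestimated weighted supports are taken as given inputs (how they are computed is outside this sketch), the comparison is strict because the text says "higher than", and `least_minimum_support` and the sample values are hypothetical.

```python
def least_minimum_support(deltas, over_wsup):
    """Pick the LMS from per-element thresholds.

    deltas:    element -> minimum element support threshold (delta)
    over_wsup: element -> overestimated weighted support of that element
    Returns None when no element satisfies the condition.
    """
    # List L: elements sorted in descending order of their delta values.
    ordered = sorted(deltas, key=deltas.get, reverse=True)
    # Scan from the last element of L, i.e. from the smallest delta upward.
    for element in reversed(ordered):
        if over_wsup[element] > deltas[element]:
            return deltas[element]
    return None

# Made-up thresholds and overestimated weighted supports for four elements.
deltas = {"a": 8, "b": 5, "c": 3, "d": 2}
over_wsup = {"a": 10.0, "b": 6.0, "c": 2.5, "d": 1.0}
print(least_minimum_support(deltas, over_wsup))
```

In this toy input, elements `d` and `c` fail their own thresholds, so the scan stops at `b` and its δ value of 5 becomes the LMS.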

#### 3.5. WRG-Miner Algorithm

[…]${}_{i}$ (lines 3–5). In this phase, WRG-Miner continues to expand graphs for each edge, $e_i$, considering their current states (lines 6–9). For each expanded graph, G′, if the overestimated weighted support of the graph is smaller than LMS, it is permanently pruned (line 10). If the graph pattern's real weighted support is not lower than its corresponding MGST value, it is regarded as a valid result and the algorithm outputs it (line 11). As long as the graph pattern has an overestimated weighted support higher than or equal to LMS, growth operations for the pattern are recursively conducted regardless of whether it is actually output or not (lines 12–13). After finishing all the recursive processes, we obtain a complete set of WRGs, S.
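The prune, output, and recurse decisions described above can be sketched as a small recursive routine. This covers the control flow only, under loud assumptions: graph expansion, the overestimated and real weighted supports, and the MGST values are replaced by hypothetical lookup tables with made-up numbers, and `grow` is not the authors' WRG-Miner procedure.

```python
def grow(pattern, children, over_wsup, wsup, mgst, lms, results):
    """Recursively expand pattern, pruning by LMS and reporting by MGST."""
    for child in children.get(pattern, []):
        if over_wsup[child] < lms:
            continue  # pruned permanently: no descendant is explored
        if wsup[child] >= mgst[child]:
            results.append(child)  # valid weighted rare graph pattern
        # Growth continues whether or not the child itself was output.
        grow(child, children, over_wsup, wsup, mgst, lms, results)
    return results

# Hypothetical expansion tree and measures (strings stand in for graph patterns).
children = {"": ["A", "B"], "A": ["AB", "AC"], "AB": ["ABC"]}
over_wsup = {"A": 5.0, "B": 1.0, "AB": 4.0, "AC": 3.0, "ABC": 1.5}
wsup = {"A": 2.0, "B": 0.5, "AB": 3.5, "AC": 1.0, "ABC": 1.0}
mgst = {"A": 3.0, "B": 1.0, "AB": 3.0, "AC": 2.0, "ABC": 1.0}

found = grow("", children, over_wsup, wsup, mgst, lms=2.0, results=[])
print(found)
```

In this toy run, `A` is not output because its real weighted support is below its MGST, yet it is still expanded, which is how `AB` is found; `B` and `ABC` fall below the LMS and are pruned.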

## 4. Performance Evaluation

#### 4.1. Experimental Settings

#### 4.2. Experimental Results of the Proposed Algorithm

[…]${}_{i}$, $\delta_i = \mathrm{MAX}(\beta \times Sup(e_i), LS)$, where LS is the lowest of all the δ values and is set to the same value as the threshold of Gaston for a reasonable comparison. $\beta = 1/\alpha$ ($0 < \beta \leq 1$, $\alpha \geq 1$) is a variable that represents how closely the real support of each element is related to its own threshold value. That is, as β becomes closer to 1, δ is more likely to be assigned a value closer to $Sup(e_i)$ than to LS.
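The threshold assignment δ = MAX(β × Sup(e), LS) is simple enough to show directly. The sketch below uses made-up supports and parameter values; `element_threshold` is a hypothetical helper name.

```python
def element_threshold(support, beta, ls):
    """delta = MAX(beta * Sup(e), LS), with 0 < beta <= 1."""
    if not 0 < beta <= 1:
        raise ValueError("beta must satisfy 0 < beta <= 1")
    return max(beta * support, ls)

# Made-up element supports; alpha = 2 gives beta = 1 / alpha = 0.5.
supports = {"e1": 100, "e2": 20, "e3": 4}
thresholds = {e: element_threshold(s, beta=0.5, ls=5) for e, s in supports.items()}
print(thresholds)
```

Frequent elements such as `e1` receive a threshold proportional to their own support, while rare elements such as `e3` fall back to LS, so LS acts as a floor on every δ.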

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest


© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lee, G.; Yun, U.; Ryang, H.; Kim, D.
Multiple Minimum Support-Based Rare Graph Pattern Mining Considering Symmetry Feature-Based Growth Technique and the Differing Importance of Graph Elements. *Symmetry* **2015**, *7*, 1151-1163.
https://doi.org/10.3390/sym7031151
