Efficiently Estimating Joining Cost of Subqueries in Regular Path Queries
Abstract
1. Introduction
2. Related Work
3. Preliminaries
3.1. Graph Data and Regular Path Queries
- Concatenation RPQ:
- Alternation RPQ:
- Kleene Star RPQ:
- Highly Complex RPQ:
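The three basic query shapes listed above correspond to standard operations on binary relations: concatenation is relational composition, alternation is union, and the Kleene star is the reflexive-transitive closure. The minimal Python sketch below illustrates this on a small hypothetical edge list. The graph, the helper names (`edges_with`, `concat`, `alternate`, `star`), and the sample queries are assumptions made for illustration and are not taken from the paper; the sketch returns endpoint pairs rather than counting paths.

```python
from collections import defaultdict

# Hypothetical labeled graph: (source, label, target) triples.
EDGES = [
    ("u1", "knows", "u2"),
    ("u2", "knows", "u3"),
    ("u2", "likes", "p1"),
    ("u3", "likes", "p2"),
    ("u1", "follows", "u3"),
]

NODES = {n for s, _, t in EDGES for n in (s, t)}


def edges_with(label):
    """Pairs (s, t) connected by a single edge carrying `label`."""
    return {(s, t) for s, l, t in EDGES if l == label}


def concat(r1, r2):
    """Concatenation r1/r2: join the two relations on the middle node."""
    by_source = defaultdict(set)
    for m, t in r2:
        by_source[m].add(t)
    return {(s, t) for s, m in r1 for t in by_source[m]}


def alternate(r1, r2):
    """Alternation r1|r2: union of the two relations."""
    return r1 | r2


def star(r):
    """Kleene star r*: reflexive-transitive closure over NODES."""
    result = {(n, n) for n in NODES}
    frontier = set(result)
    while frontier:
        # Extend the newest pairs by one more r-step; keep only unseen pairs.
        frontier = concat(frontier, r) - result
        result |= frontier
    return result


knows, likes, follows = (edges_with(l) for l in ("knows", "likes", "follows"))

concat_result = concat(knows, likes)        # knows/likes
alt_result = alternate(knows, follows)      # knows|follows
star_result = concat(star(knows), likes)    # knows*/likes
```

Because `star` includes the zero-length path, `knows*/likes` also returns every pair matched by `likes` alone.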
3.2. USCM-Based Splitting RPQs for Parallel Evaluation
4. USCM-Based Parallel Evaluation of RPQs by Estimating Joining Cost
4.1. Estimating Result Size of RPQs with USCM
- For the simplest case, , there is no concatenation sub-query before and after a group of alternate labels. The result size can be estimated, as follows.
- For a general case, , where and , we estimate the result size of by using the equation below.
Algorithm 1. EstimateAlterStar
Require: pre: string before the alternation operator; a: the first label in the group of the alternation operator; b: the second label in the group of the alternation operator; suf: string after the alternation operator; and USCM
Ensure: P: the number of paths satisfying R
4.2. Parallel Evaluation of RPQs by Exploiting Joining Cost
4.2.1. Estimating Parallel Evaluation Cost
4.2.2. Parallel Evaluation of RPQs Based on Minimum Estimated Evaluation Cost
5. Experimental Evaluation
5.1. Evaluation Settings
5.2. Experimental Results
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Label:Count | isLeaderOf | friend | follows | knows | purchased | likes | ownedBy | Total |
---|---|---|---|---|---|---|---|---|
isLeaderOf:3 | 0 | 2 | 2 | 0 | 1 | 1 | 0 | 6 |
friend:2 | 0 | 0 | 1 | 3 | 1 | 3 | 0 | 8 |
follows:3 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 4 |
knows:6 | 3 | 0 | 0 | 2 | 3 | 1 | 0 | 9 |
purchased:4 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 |
likes:6 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 6 |
ownedBy:4 | 0 | 1 | 3 | 2 | 1 | 2 | 0 | 9 |
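The matrix above can be held as a nested mapping, which makes the Total column easy to check. The sketch below is illustrative only: the dictionary layout and the names `LABELS`, `COUNT`, `ROWS`, `TOTAL`, `USCM`, and `row_total` are assumptions, and label spellings follow the row labels of the table. Reading a cell as the count for the row label followed by the column label is an interpretation of the table layout, not something stated in this section.

```python
# Column order of the matrix, using the row-label spellings from the table.
LABELS = ["isLeaderOf", "friend", "follows", "knows", "purchased", "likes", "ownedBy"]

# The number attached to each row label ("Label:Count" column).
COUNT = {
    "isLeaderOf": 3, "friend": 2, "follows": 3, "knows": 6,
    "purchased": 4, "likes": 6, "ownedBy": 4,
}

# Cell values copied from the table, one list per row, in LABELS order.
ROWS = {
    "isLeaderOf": [0, 2, 2, 0, 1, 1, 0],
    "friend":     [0, 0, 1, 3, 1, 3, 0],
    "follows":    [0, 0, 1, 1, 1, 1, 0],
    "knows":      [3, 0, 0, 2, 3, 1, 0],
    "purchased":  [0, 0, 0, 0, 0, 0, 4],
    "likes":      [0, 0, 0, 0, 0, 0, 6],
    "ownedBy":    [0, 1, 3, 2, 1, 2, 0],
}

# "Total" column as printed in the table.
TOTAL = {
    "isLeaderOf": 6, "friend": 8, "follows": 4, "knows": 9,
    "purchased": 4, "likes": 6, "ownedBy": 9,
}

# Nested mapping: USCM[row_label][column_label] -> cell value.
USCM = {row: dict(zip(LABELS, values)) for row, values in ROWS.items()}


def row_total(label):
    """Sum of one row of the matrix."""
    return sum(USCM[label].values())


# Rows whose recomputed sum disagrees with the printed Total (expected: none).
mismatches = [label for label in LABELS if row_total(label) != TOTAL[label]]
```

A single lookup such as `USCM["knows"]["purchased"]` then returns the cell at row knows, column purchased.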
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nguyen, V.-Q.; Nguyen, V.-H.; Nguyen, M.-Q.; Huynh, Q.-T.; Kim, K. Efficiently Estimating Joining Cost of Subqueries in Regular Path Queries. Electronics 2021, 10, 990. https://doi.org/10.3390/electronics10090990