# Query Rewriting for Incremental Continuous Query Evaluation in HIFUN

^{1}

^{2}

^{*}

## Abstract

**:**

## 1. Introduction

#### Motivation and Contributions

- We use the HIFUN language to define the continuous query problem and give a high-level generic algorithm for its solution.
- Then we map the generic algorithm to the physical level, implementing the evaluation mechanism both as SQL queries and MapReduce tasks.
- Further, we show how HIFUN query rewriting can be implemented in the physical layer.
- Finally, we experimentally show that our implementation provides considerable benefits in terms of efficiency.

## 2. Related Work

## 3. Preliminaries—The HIFUN Query Language

- Grouping. Group together all invoices j of D such that $b\left(j\right)=B{r}_{i}$.
- Measuring. Apply q to each invoice j of the $B{r}_{i}$-group to find the corresponding quantity $q\left(j\right)$.
- Aggregation. Sum up the $q{\left(j\right)}^{\prime}$s thus obtained to get a single value ${u}_{i}$.

#### 3.1. Analysis Context

#### 3.2. Query Rewriting

## 4. Methods

#### 4.1. Incremental Computation in HIFUN

**op= sum:**$an{s}^{\prime}\left(i\right)=ans\left(i\right)+incr\left(i\right)$ if i is in $K\cap {K}^{\prime}$;$ans\left(i\right)$ if i is in $K\backslash {K}^{\prime}$; $incr\left(i\right)$ if i is in ${K}^{\prime}\backslash K$;**op= min:**$an{s}^{\prime}\left(i\right)=min(ans\left(i\right),incr\left(i\right))$ if i is in $K\cap {K}^{\prime}$; $ans\left(i\right)$ if i is in $K\backslash {K}^{\prime}$; $incr\left(i\right)$ if i is in ${K}^{\prime}\backslash K$;**op= max:**$an{s}^{\prime}\left(i\right)=max(ans\left(i\right),incr\left(i\right))$ if i is in $K\cap {K}^{\prime}$; $ans\left(i\right)$ if i is in $K\backslash {K}^{\prime}$; $incr\left(i\right)$ if i is in ${K}^{\prime}\backslash K$;**op= count:**$an{s}^{\prime}\left(i\right)=ans\left(i\right)+incr\left(i\right)$ if i is in $K\cap {K}^{\prime}$; $ans\left(i\right)$ if i is in $K\backslash {K}^{\prime}$; $incr\left(i\right)$ if i is in ${K}^{\prime}\backslash K$;

**op = avg:**- -
- $an{s}^{\prime}\left(i\right)=ans\left(i\right)$ if i is in $K\backslash {K}^{\prime}$;
- -
- $an{s}^{\prime}\left(i\right)=incr\left(i\right)$ if i is in ${K}^{\prime}\backslash K$;
- -
- $an{s}^{\prime}\left(i\right)=\frac{an{s}_{op=sum\left(i\right)+inc{r}_{op=sum}\left(i\right)}}{an{s}_{op=count\left(i\right)+inc{r}_{op=count}\left(i\right)}}$ if i is in $K\cap {K}^{\prime}$;

#### 4.2. System Implementation

#### 4.2.1. Continuous HIFUN Queries over Micro-Batches

#### 4.2.2. Continuous HIFUN Queries Using MapReduce

- (a)
- Query Input Preparation. A set of attributes which are included in grouping and measuring part of Q is used to extract the information from the initial unstructured dataset. A map function is used that applies the given attributes to each record of the initial DStream and returns a new DStream which contains the information useful for the next evaluation steps.
- (b)
- Grouping Partition Construction. In this step, a map function constructs the grouping partition as follows. The mapper receives the tuples created from the previous step and extracts the key–value pairs. The result is a new PairDStream in which the key K is the value of the grouping attribute of each data item or the value of the grouping attributes if the domain of $an{s}_{Q}$ is the Cartesian product of two or more grouping attributes. The value V is the value of the measuring attribute of each data item.
- (c)
- Grouping Partition Reduction. In this step a reduce function is used to merge the values of each key using a query operation $op$, and a new DStream is created as the query answer of the current micro-batch.

#### Incremental Evaluation

**Case 1**. The first case includes the Common Grouping and Measuring Rewriting Rule, Common Grouping Rewriting Rule and Basic Rewriting Rule. In these rules when a rewritten query is executed over a micro-batch, a DStream is created in a form of $DStream\left[({K}_{q},{V}_{k})\right]$. The ${K}_{q}$ is the key of the query and ${V}_{k}$ is the synthetic value of its ${K}_{q}$ in the current micro-batch. The mapWithState method is used to update the current state, which is also in the form of $DStream\left[({K}_{s},{V}_{ks})\right]$, where ${K}_{s}$ is the key of the aggregated query and ${V}_{ks}$ is the synthetic value of its ${K}_{s}$. In each micro-batch, this method is executed only for the keys of the state that need to be updated, which is a great performance optimization.

**Case 2**. The second case includes the Common Measuring and Operation Rewriting Rule. In this rule when a rewritten query is executed over a micro-batch, n DStreams are created in a form of $DStream\left[({K}_{q},{V}_{k})\right]$. Here, the ${K}_{q}$ denotes the key of the projection query and ${V}_{k}$ denotes the value of its ${K}_{q}$ in the current micro-batch. The n DStreams depend on the number of different grouping attributes appearing in the rewritten set Q. A chain of mapWithState methods is used to update the n current states, which are also in the form of $DStream\left[({K}_{s},{V}_{ks})\right]$. The ${K}_{s}$ denotes the key of the aggregated projection query and ${V}_{ks}$ denotes the value of its ${K}_{s}$.

#### 4.2.3. Translating Continuous HIFUN Queries to SQL

#### Generic Evaluation Schema to SQL

**Case 1**. The attributes A and B appear in the same table (e.g., T). In this case we can obtain the answer of Q using the following group-by statement of SQL.

**Case 2**. The attributes A and B appear in two different tables (e.g., S and T). In this case we can obtain the answer of Q using the following group-by statement of SQL.

**Select**Branch,**sum**(Quantity)**As**ans_{Q}(Branch)**From**FT**Group by**Branch

**Select**Supplier, Category,**sum**(Quantity)**As**ans_{Q}(Supplier, Category)**From join**(FT, DT_{Product})**Group by**Supplier, Category

#### Rewritings to SQL

## 5. Results

#### 5.1. Setup and Datasets

#### 5.2. Results

#### 5.2.1. Defining the Queries

#### 5.2.2. Evaluating Incrementality

#### 5.2.3. Evaluating Rewritings

## 6. Conclusions

#### Limitations and Future Work

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Mello, B.; Rios, R.; Lira, C.; Prazeres, C. FoT-Stream: A Fog platform for data stream analytics in IoT. Comput. Commun.
**2020**, 164, 77–87. [Google Scholar] [CrossRef] - Kondylakis, H.; Dayan, N.; Zoumpatianos, K.; Palpanas, T. Coconut: Sortable summarizations for scalable indexes over static and streaming data series. VLDB J.
**2019**, 28, 847–869. [Google Scholar] [CrossRef] - Queiroz, W.; Capretz, M.A.; Dantas, M. An approach for SDN traffic monitoring based on big data techniques. J. Netw. Comput. Appl.
**2019**, 131, 28–39. [Google Scholar] [CrossRef] - Carcillo, F.; Dal Pozzolo, A.; Le Borgne, Y.A.; Caelen, O.; Mazzer, Y.; Bontempi, G. SCARFF: A scalable framework for streaming credit card fraud detection with spark. Inf. Fusion
**2018**, 41, 182–194. [Google Scholar] [CrossRef][Green Version] - Banerjee, A.; Chakraborty, C.; Kumar, A.; Biswas, D. Chapter 5—Emerging trends in IoT and big data analytics for biomedical and health care technologies. In Handbook of Data Science Approaches for Biomedical Engineering; Balas, V.E., Solanki, V.K., Kumar, R., Khari, M., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 121–152. [Google Scholar] [CrossRef]
- Kondylakis, H.; Bucur, A.I.D.; Crico, C.; Dong, F.; Graf, N.M.; Hoffman, S.; Koumakis, L.; Manenti, A.; Marias, K.; Mazzocco, K.; et al. Patient empowerment for cancer patients through a novel ICT infrastructure. J. Biomed. Inform.
**2020**, 101, 103342. [Google Scholar] [CrossRef] - Agathangelos, G.; Troullinou, G.; Kondylakis, H.; Stefanidis, K.; Plexousakis, D. Incremental Data Partitioning of RDF Data in SPARK; Springer: Cham, Swizterland, 2018. [Google Scholar]
- Kondylakis, H.; Plexousakis, D. Ontology Evolution in Data Integration: Query Rewriting to the Rescue. In Proceedings of the Conceptual Modeling—ER 2011, 30th International Conference, ER 2011, Brussels, Belgium, 31 October–3 November 2011; Volume 6998, Lecture Notes in Computer Science. Jeusfeld, M.A., Delcambre, L.M.L., Ling, T.W., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 393–401. [Google Scholar] [CrossRef]
- Pappas, A.; Troullinou, G.; Roussakis, G.; Kondylakis, H.; Plexousakis, D. Exploring Importance Measures for Summarizing RDF/S KBs. In Proceedings of the Semantic Web—14th International Conference—ESWC 2017, Portorož, Slovenia, 28 May–1 June 2017; Part I; Volume 10249, Lecture Notes in Computer Science. pp. 387–403. [Google Scholar] [CrossRef]
- Troullinou, G.; Kondylakis, H.; Stefanidis, K.; Plexousakis, D. Exploring RDFS KBs Using Summaries. In Proceedings of the Semantic Web—ISWC 2018—17th International Semantic Web Conference, Monterey, CA, USA, 8–12 October 2018; Part I; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2018; Volume 11136, pp. 268–284. [Google Scholar] [CrossRef][Green Version]
- Bolt, C.R. Hadoop: The Definitive Guide; OReilly Media, Inc.: Sebastopol, CA, USA, 2014. [Google Scholar]
- Dean, J.; Ghemawat, S. MapReduce: Simplified data processing on large clusters. Commun. ACM
**2008**, 51, 107–113. [Google Scholar] [CrossRef] - Jakóbczyk, M.T. Practical Oracle Cloud Infrastructure; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
- Karimov, J.; Rabl, T.; Katsifodimos, A.; Samarev, R.; Heiskanen, H.; Markl, V. Benchmarking Distributed Stream Data Processing Systems. In Proceedings of the 2018 IEEE 34th International Conference on Data Engineering (ICDE), Paris, France, 16–19 April 2018; pp. 1507–1518. [Google Scholar]
- Zaharia, M.; Das, T.; Li, H.; Hunter, T.; Shenker, S.; Stoica, I. Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA, USA, 3–6 November 2013. [Google Scholar]
- Armbrust, M.; Das, T.; Torres, J.; Yavuz, B.; Zhu, S.; Xin, R.; Ghodsi, A.; Stoica, I.; Zaharia, M. Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark. In Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018. [Google Scholar]
- Iqbal, M.; Soomro, T.R. Big Data Analysis: Apache Storm Perspective. Int. J. Comput. Trends Technol.
**2015**, 19, 9–14. [Google Scholar] [CrossRef] - Carbone, P.; Katsifodimos, A.; Ewen, S.; Markl, V.; Haridi, S.; Tzoumas, K. Apache Flink™: Stream and Batch Processing in a Single Engine. IEEE Data Eng. Bull.
**2015**, 38, 28–38. [Google Scholar] - Akidau, T.; Bradshaw, R.W.; Chambers, C.; Chernyak, S.; Fernández-Moctezuma, R.; Lax, R.; McVeety, S.; Mills, D.; Perry, F.; Schmidt, E.; et al. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. Proc. VLDB Endow.
**2015**, 8, 1792–1803. [Google Scholar] [CrossRef] - Alami, K.; Maabout, S. A framework for multidimensional skyline queries over streaming data. Data Knowl. Eng.
**2020**, 127, 101792. [Google Scholar] [CrossRef] - Ramesh, S.; Baranawal, A.; Simmhan, Y. Granite: A distributed engine for scalable path queries over temporal property graphs. J. Parallel Distrib. Comput.
**2021**, 151. [Google Scholar] [CrossRef] - Kvet, M.; Matiasko, K. Flower Master Index for Relational Database Selection and Joining; Springer: Cham, Switzerland, 2021; pp. 181–202. [Google Scholar] [CrossRef]
- Kvet, M.; Kršák, E.; Matiaško, K. Study on Effective Temporal Data Retrieval Leveraging Complex Indexed Architecture. Appl. Sci.
**2021**, 11, 916. [Google Scholar] [CrossRef] - Dam, T.L.; Chester, S.; Nørvåg, K.; Duong, Q.H. Efficient top-k recently-frequent term querying over spatio-temporal textual streams. Inf. Syst.
**2021**, 97, 101687. [Google Scholar] [CrossRef] - Dhont, M.; Tsiporkova, E.; Boeva, V. Layered Integration Approach for Multi-View Analysis of Temporal Data; Springer: Cham, Switzerland, 2020; pp. 138–154. [Google Scholar] [CrossRef]
- Babu, S.; Widom, J. Continuous queries over data streams. SIGMOD Rec.
**2001**, 30, 109–120. [Google Scholar] [CrossRef] - Franklin, A.; Gantela, S.; Shifarraw, S.; Johnson, T.R.; Robinson, D.J.; King, B.R.; Mehta, A.M.; Maddow, C.L.; Hoot, N.R.; Nguyen, V.; et al. Dashboard visualizations: Supporting real-time throughput decision-making. J. Biomed. Inform.
**2017**, 71, 211–221. [Google Scholar] [CrossRef] - Laurent, D.; Lechtenbörger, J.; Spyratos, N.; Vossen, G. Monotonic complements for independent data warehouses. VLDB J.
**2001**, 10, 295–315. [Google Scholar] [CrossRef][Green Version] - Ahmad, Y.; Kennedy, O.; Koch, C.E.; Nikolic, M. DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. Proc. VLDB Endow.
**2012**, 5, 968–979. [Google Scholar] [CrossRef] - Spyratos, N.; Sugibuchi, T. HIFUN—A high level functional query language for big data analytics. J. Intell. Inf. Syst.
**2018**, 51, 529–555. [Google Scholar] [CrossRef] - Spyratos, N.; Sugibuchi, T. A High Level Query Language for Big Data Analytics. Available online: http://publications.ics.forth.gr/tech-reports/2017/2017.TR467_HiFu_Query_Language_Big_Data_Analytics.pdf (accessed on 8 May 2021).
- Papadaki, M.E.; Spyratos, N.; Tzitzikas, Y. Towards Interactive Analytics over RDF Graphs. Algorithms
**2021**, 14, 34. [Google Scholar] [CrossRef] - Zervoudakis, P.; Kondylakis, H.; Plexousakis, D.; Spyratos, N. Incremental Evaluation of Continuous Analytic Queries in HIFUN. In International Workshop on Information Search, Integration, and Personalization; Springer: Cham, Switzerland, 2019; pp. 53–67. [Google Scholar]
- Garcia-Molina, H.; Ullman, J.D.; Widom, J. Database Systems—The Complete Book (International Edition); Pearson Education: London, UK, 2002. [Google Scholar]
- Le, D.; Chen, R.; Bhatotia, P.; Fetze, C.; Hilt, V.; Strufe, T. Approximate Stream Analytics in Apache Flink and Apache Spark Streaming. arXiv
**2017**, arXiv:1709.02946. [Google Scholar] - Terry, D.; Goldberg, D.; Nichols, D.; Oki, B.M. Continuous queries over append-only databases. In Proceedings of the SIGMOD ’92, San Diego, CA, USA, 3–5 June 1992. [Google Scholar]
- Chen, J.; DeWitt, D.; Tian, F.; Wang, Y. NiagaraCQ: A scalable continuous query system for Internet databases. In Proceedings of the SIGMOD ’00, Dallas, TX, USA, 16–18 May 2000. [Google Scholar]
- Liu, L.; Pu, C.; Tang, W. Continual Queries for Internet Scale Event-Driven Information Delivery. IEEE Trans. Knowl. Data Eng.
**1999**, 11, 610–628. [Google Scholar]

**Figure 8.**Evaluation of CGMRR and CGRR when the MapReduce execution model is used over an unstructured dataset.

**Figure 9.**Evaluation of CMORR and BRR when the MapReduce execution model is used over an unstructured dataset.

**Figure 10.**Evaluation of CGMRR and CGRR when the SQL execution model is used over an structured dataset.

**Figure 11.**Evaluation of rewriting rules for unstructured datasets, while the number of queries in the rewritten and non-rewritten set Q increases.

**Figure 12.**Evaluation of CGMRR and CGRR for structured datasets, while the number of queries in the rewritten and the non-rewritten set Q increases.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zervoudakis, P.; Kondylakis, H.; Spyratos, N.; Plexousakis, D. Query Rewriting for Incremental Continuous Query Evaluation in HIFUN. *Algorithms* **2021**, *14*, 149.
https://doi.org/10.3390/a14050149

**AMA Style**

Zervoudakis P, Kondylakis H, Spyratos N, Plexousakis D. Query Rewriting for Incremental Continuous Query Evaluation in HIFUN. *Algorithms*. 2021; 14(5):149.
https://doi.org/10.3390/a14050149

**Chicago/Turabian Style**

Zervoudakis, Petros, Haridimos Kondylakis, Nicolas Spyratos, and Dimitris Plexousakis. 2021. "Query Rewriting for Incremental Continuous Query Evaluation in HIFUN" *Algorithms* 14, no. 5: 149.
https://doi.org/10.3390/a14050149