## 1. Introduction

The effective use of large data repositories is a crucial factor in improving many kinds of businesses. Such massive data repositories are commonly referred to as “big data”. Extracting the skyline objects is a vital task for understanding a dataset in the early stage of knowledge discovery from large data repositories. The skyline objects of a database are the objects that are not dominated by any other object in the database, and the skyline query [1] is a function that finds the set of skyline objects.

Given an $m$-dimensional dataset $DS$, an object $O_i$ is said to be in the skyline of $DS$ if there is no other object $O_j$ ($i \ne j$) in $DS$ such that $O_j$ is better than $O_i$, i.e., $O_j$ is at least as good as $O_i$ in every dimension and strictly better in at least one. If such an $O_j$ exists, we say that $O_i$ is dominated by $O_j$, or that $O_j$ dominates $O_i$.

Figure 1 presents an example skyline. The table in the figure lists hotels, each with two numerical attributes: distance and price. If we assume that smaller values are better, the skyline query retrieves objects $\{O_2, O_3, O_7\}$, as shown in Figure 1b. Objects $O_1$ and $O_5$ are dominated by object $O_2$, and objects $O_4$ and $O_6$ are dominated by object $O_3$.
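The dominance test and the skyline definition above can be sketched as follows. This is a minimal, naive $O(N^2)$ illustration, assuming that smaller values are better in every attribute; the function names `dominates` and `skyline` and the hotel values are hypothetical and are not the data of Figure 1.

```python
from typing import Dict, List, Sequence


def dominates(o: Sequence[float], p: Sequence[float]) -> bool:
    """True if o is no worse than p in every attribute and strictly better in at least one
    (smaller values are assumed to be better)."""
    return all(x <= y for x, y in zip(o, p)) and any(x < y for x, y in zip(o, p))


def skyline(ds: Dict[str, Sequence[float]]) -> List[str]:
    """Naive skyline: keep every object that no other object dominates."""
    return [
        oid
        for oid, o in ds.items()
        if not any(dominates(p, o) for pid, p in ds.items() if pid != oid)
    ]


# Hypothetical (distance, price) values, not taken from Figure 1.
hotels = {
    "A": (1.0, 9.0),
    "B": (2.0, 4.0),
    "C": (3.0, 5.0),
    "D": (4.0, 1.0),
    "E": (5.0, 6.0),
}

print(skyline(hotels))  # ['A', 'B', 'D']
```

Here C and E are dropped because B dominates both; A and D survive because no other hotel is at least as good in both attributes.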

The top-$k$ dominating query [2] is a variant of the skyline query. It uses a scoring function $\mu(O)$ to evaluate the strength of an object $O \in DS$:

$$\mu(O) = \left|\{O' \in DS \mid O \prec O'\}\right|.$$

The scoring function $\mu(O)$ returns the number of objects in the dataset that $O$ dominates. In the above example, $\mu(O_2) = 2$ because $O_2$ dominates two objects, $O_1$ and $O_5$. Similarly, $\mu(O_3) = 2$. The top-$k$ dominating query selects the $k$ objects with the highest $\mu(O)$ scores. For example, a top-2 dominating query on the dataset in Figure 1 retrieves $O_2$ and $O_3$.
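The $\mu$ score and the top-$k$ dominating query can be computed naively as below. This is a centralized sketch for illustration only, with hypothetical data and function names; ties in $\mu$ are broken by object ID, which is an assumption since the text does not specify tie handling.

```python
from typing import Dict, List, Sequence


def dominates(o: Sequence[float], p: Sequence[float]) -> bool:
    """Smaller-is-better dominance test (an object never dominates itself)."""
    return all(x <= y for x, y in zip(o, p)) and any(x < y for x, y in zip(o, p))


def mu_scores(ds: Dict[str, Sequence[float]]) -> Dict[str, int]:
    """mu(O) = number of objects in ds that O dominates."""
    return {oid: sum(dominates(o, p) for p in ds.values()) for oid, o in ds.items()}


def top_k_dominating(ds: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k objects with the highest mu scores.
    Ties are broken by object ID here, an assumption not fixed by the text."""
    mu = mu_scores(ds)
    return sorted(mu, key=lambda oid: (-mu[oid], oid))[:k]


# Hypothetical (distance, price) values.
hotels = {"A": (1.0, 9.0), "B": (2.0, 4.0), "C": (3.0, 5.0), "D": (4.0, 1.0), "E": (5.0, 6.0)}

print(mu_scores(hotels))            # {'A': 0, 'B': 2, 'C': 1, 'D': 1, 'E': 0}
print(top_k_dominating(hotels, 2))  # ['B', 'C']
```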

The skyband query, also known as the $K$-skyband query [3], is another well-known variant of the skyline query. A $K$-skyband query returns the set of objects that are dominated by fewer than $K$ other objects. For the dataset in Figure 1a, the skyband query for $K = 2$ retrieves objects $\{O_1, O_2, O_3, O_4, O_7\}$. Object $O_5$ is not in the skyband because it is dominated by two objects, $O_1$ and $O_2$. Similarly, object $O_6$ is not in the skyband because it is also dominated by two objects, $O_3$ and $O_4$. We illustrate this procedure in Figure 2.
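A naive $K$-skyband computation counts, for each object, how many other objects dominate it and keeps those with a count below $K$. The sketch below assumes smaller-is-better attributes and uses hypothetical data and names; it also shows that the 1-skyband coincides with the skyline.

```python
from typing import Dict, List, Sequence


def dominates(o: Sequence[float], p: Sequence[float]) -> bool:
    """Smaller-is-better dominance test."""
    return all(x <= y for x, y in zip(o, p)) and any(x < y for x, y in zip(o, p))


def sb_scores(ds: Dict[str, Sequence[float]]) -> Dict[str, int]:
    """SB(O) = number of objects in ds that dominate O."""
    return {oid: sum(dominates(p, o) for p in ds.values()) for oid, o in ds.items()}


def k_skyband(ds: Dict[str, Sequence[float]], K: int) -> List[str]:
    """Objects dominated by fewer than K other objects (i.e., at most K - 1)."""
    sb = sb_scores(ds)
    return [oid for oid in ds if sb[oid] < K]


# Hypothetical (distance, price) values.
hotels = {"A": (1.0, 9.0), "B": (2.0, 4.0), "C": (3.0, 5.0), "D": (4.0, 1.0), "E": (5.0, 6.0)}

print(sb_scores(hotels))     # {'A': 0, 'B': 0, 'C': 1, 'D': 0, 'E': 3}
print(k_skyband(hotels, 1))  # ['A', 'B', 'D']  (same as the skyline)
print(k_skyband(hotels, 2))  # ['A', 'B', 'C', 'D']
```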

Modern “big data” is often too large to analyze exhaustively. Instead of analyzing raw big data, we propose using a relatively small subset, namely the results of skyband and dominating queries, which captures the important features of the raw data. However, conventional algorithms for computing such skyline variants are not designed for parallel distributed environments. In recent years, the MapReduce framework has been applied to process huge amounts of data reliably and in parallel on large clusters of commodity computers. MapReduce and Hadoop, a popular open-source implementation of MapReduce, have attracted significant research attention. In this paper, we propose a MapReduce-based parallel algorithm that computes the skyband and dominating queries simultaneously.

The contributions of this paper can be summarized as follows:

- We examine the skyband and dominating queries for processing “big data”.
- We develop a scalable parallel algorithm that computes both queries simultaneously. Exploiting the MapReduce framework for the skyband and dominating queries is a novel approach that takes advantage of a parallel distributed computing environment.
- The main focus of the proposed algorithm is to distribute the computation evenly among multiple computing nodes so that “big data” can be processed effectively. We empirically demonstrate the efficiency of the proposed method through extensive experiments on a real dataset and on synthetic datasets.

The rest of this paper is organized as follows. Section 2 surveys the literature and reviews related work. Section 3 presents the concepts and properties of the skyband and dominating queries. Section 4 describes the proposed algorithm with detailed examples. Section 5 evaluates the algorithm through extensive experiments. Finally, Section 6 concludes the paper.

## 2. Related Works

Skyline queries and their variants have been widely used in multi-criteria decision support applications. Borzsonyi et al., who first introduced the skyline query, proposed three basic algorithms for skyline computation [1], known as block-nested-loops (BNL), divide-and-conquer (D&C), and B-tree-based schemes. Chomicki et al. proposed the sort–filter–skyline (SFS) algorithm, which improves the efficiency of skyline computation by presorting the input data [4]. To optimize the average-case running time, Godfrey et al. proposed the linear elimination sort for skyline (LESS) algorithm [5]. Kossmann et al. proposed the nearest neighbor (NN) algorithm for computing the skyline, which filters dominated objects efficiently by recursively partitioning the dataset based on nearest neighbor objects [6].

In contrast to computing the skyline from the objects' raw attribute values, several algorithms use indexes on the objects' attributes. Tan et al. proposed two progressive algorithms that compute the skyline using bitmaps and an index [7]. A state-of-the-art algorithm is Branch-and-Bound Skyline (BBS), proposed by Papadias et al., which has been shown to be I/O optimal for computing skylines on datasets indexed by R-trees [3]. Meanwhile, various approaches have been proposed for efficient skyline querying over high-dimensional datasets. Yuan et al. proposed the skycube structure to reduce the cost of computing skylines over all possible subspaces [8]. Later, Xia et al. revised the skycube structure and proposed the compressed skycube (CSC) as a more promising alternative, which removes duplicate skyline objects from the skycube by storing each skyline object only in its minimum subspaces [9].

As a variant of the skyline query, Chan et al. introduced top-$k$ frequent skyline queries [10]. They proposed a metric called skyline frequency, which can be used to rank and select skyline objects by their interestingness. Li et al. proposed a data-cube structure that speeds up query evaluation by analyzing dominance relationships [11]. Lin et al. considered extracting the $k$ most representative skyline objects [12]. They measure how representative an object is by the population it dominates: a skyline object is more representative than another when it dominates more objects. Chan et al. introduced the $k$-dominant skyline, based on the $k$-dominance relationship. The $k$-dominant skyline query can control the number of retrieved objects by changing $k$: with a smaller $k$, an object is more likely to be $k$-dominated by another object, so fewer objects are retrieved. They developed specialized algorithms to compute the $k$-dominant skyline [13].

The $K$-skyband query, another variant of the skyline query, selects the objects that are dominated by at most $K-1$ other objects. It has been observed that, for any monotonically increasing aggregate scoring function, the top-$k$ objects belong to the $K$-skyband whenever $k \le K$ [3,14].

Other formulations of the skyline query have also been studied. Lin et al. proposed the $n$-of-$N$ skyline query to support online queries on data streams, i.e., to find the skyline of the set composed of the most recent $n$ elements. The proposed method considers a very widely distributed dataset that is impossible to process in a centralized fashion [15]. Balke et al. investigated skyline computation over vertically distributed databases [16]. Tao et al. examined skyline queries in arbitrary subspaces [17]. Papadias et al. studied dynamic skyline queries [18]. Dellis et al. proposed the reverse skyline query, which, based on the dominance relationship among the objects, selects the users who like a given object most [19].

Nowadays, the parallel computing paradigm has become very popular for processing and analyzing “big data”, which makes computing the skyline and its variants at this scale challenging. Note that the methods in [10,11,13] cannot be directly applied to evaluate top-$k$ dominating queries. Moreover, computing the skyband query requires a separate algorithm. This paper proposes an efficient algorithm that computes both types of queries (top-$k$ dominating and skyband) over large volumes of data such as “big data”. For this kind of data-intensive application, the MapReduce framework has attracted much attention as the most prominent platform [20,21,22]. It facilitates the deployment of scalable parallel applications on shared-nothing clusters of machines for processing large datasets. Google’s MapReduce, or its open-source equivalent Hadoop, is a powerful tool for building such applications [23]. The MapReduce framework has also been used in recent research on computing the skyline and the $k$-dominant skyline [24,25,26].

Recently, Ezatpoor et al. [27] exploited the MapReduce framework to compute top-$k$ dominance on incomplete big data. In addition, Chen et al. [28] used the Spark Streaming framework to process top-$k$ dominating queries over distributed data streams. Both [27,28] partition the data with hash maps to process it on distributed computing nodes.

This paper complements existing efforts on the $K$-skyband and top-$k$ dominating query problems by ranking objects with two intuitive scoring functions. Specifically, our algorithm answers both types of queries within the same framework. To the best of our knowledge, no MapReduce algorithm that computes both the top-$k$ dominating query and the $K$-skyband query has been proposed so far.

## 3. Preliminaries

Consider an $m$-dimensional dataset $DS$ with attributes $\{a_1, a_2, \cdots, a_m\}$. We assume that the dataset is distributed into $n$ subsets $\{DS_1, DS_2, \cdots, DS_n\}$ stored in different locations. Without loss of generality, we assume that the dataset contains non-negative numerical values and that smaller values are preferable in each dimension (attribute).

We use $O_{i,j}.a_p$ to denote the value of the $p$-th dimension (attribute) of object $O_{i,j}$, where the object $ID$ $i,j$ identifies object $j$ in dataset $i$ ($DS_i$). Assume that the dataset $DS$ shown in Figure 1 is distributed into three subsets, $DS_1$, $DS_2$, and $DS_3$, each with two attributes, $a_1$ and $a_2$, as shown in Table 1.

**Definition** **1.** (Dominance) For two objects $O$ and $O'$, object $O$ is said to dominate object $O'$, denoted $O \prec O'$, if $O.a_s \le O'.a_s$ for all attributes ($s = 1, \cdots, m$) and $O.a_x < O'.a_x$ for at least one attribute ($1 \le x \le m$). We refer to such an $O$ as a dominant object and to such an $O'$ as a dominated object. If $O$ dominates $O'$, then $O$ is preferable to $O'$.

In Table 1, object $O_{1,2}$ dominates object $O_{1,1}$ ($O_{1,2} \prec O_{1,1}$) because $O_{1,2}$ has a smaller value than $O_{1,1}$ for both attributes.

**Definition** **2.** (Skyline) An object $O \in DS$ is in the skyline of $DS$ (i.e., it is a skyline object in $DS$) if $O$ is not dominated by any other object in $DS$. The skyline of $DS$, denoted $Sky(DS)$, is the set of skyline objects in $DS$. For the dataset $DS$ in Table 1, objects $O_{1,2}$, $O_{2,1}$, and $O_{3,3}$ are not dominated by any other object. Therefore, a skyline query on $DS$ retrieves $Sky(DS) = \{O_{1,2}, O_{2,1}, O_{3,3}\}$.

**Definition** **3.** (The μ score) The μ score of an object is the number of objects in the dataset that the object dominates. We use $\mu(O)$ to denote the μ score of an object $O$. In Table 1, object $O_{1,2}$ dominates objects $O_{1,1}$ and $O_{3,1}$. Therefore, the μ score of $O_{1,2}$ is 2 (i.e., $\mu(O_{1,2}) = 2$).

**Definition** **4.** (The $SB$ score) The $SB$ score of an object is the number of objects that dominate it. We use $SB(O)$ to denote the $SB$ score of an object $O$. In Table 1, object $O_{3,2}$ is dominated by objects $O_{2,1}$ and $O_{2,2}$. Therefore, the $SB$ score of $O_{3,2}$ is 2 (i.e., $SB(O_{3,2}) = 2$).

**Definition** **5.** (Dominating query) Given a positive integer $k$ and a dataset $DS$, the top-$k$ dominating query returns the $k$ objects with the highest μ scores in $DS$. For the dataset in Table 1, a top-2 dominating query retrieves $O_{1,2}$ and $O_{2,1}$.

**Definition** **6.** (Skyband query) Given a positive integer $K$, the $K$-skyband is the set of objects that are dominated by fewer than $K$ other objects. For the dataset $DS$ in Table 1, the skyband query for $K = 2$ retrieves objects $\{O_{1,1}, O_{1,2}, O_{2,1}, O_{2,2}, O_{3,3}\}$. Intuitively, $K$ represents the thickness of the skyline; a 1-skyband query is the same as a conventional skyline query.

**Definition** **7.** (Worst rank) For an object $O$, let $r_s(O)$ denote the rank of $O$ in attribute $a_s$ ($s = 1, \dots, m$), where objects are ranked in ascending order of their values. For example, in Figure 1, $r_1(O_4) = 6$ and $r_2(O_4) = 2$. We refer to the largest $r_s(O)$ as “the worst rank of $O$” and to the corresponding $a_s$ as “the worst rank attribute of $O$.” In this example, the worst rank of $O_4$ is 6 and its worst rank attribute is $a_1$.
**Definition** **8.** (Domination check set) The domination check set ($DC$) of an object $O$ is the set of objects other than $O$ whose rank in the worst rank attribute of $O$ is equal to or greater than the worst rank of $O$. For example, the worst rank of $O_4$ is 6 in $a_1$ (distance), and $O_6$ has a greater rank in $a_1$ (7th > 6th). Therefore, the $DC$ set of object $O_4$ is $\{O_6\}$. Similarly, the worst rank of $O_1$ is 6 in $a_2$ (price), so the $DC$ set of object $O_1$ is $\{O_5\}$.
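Definitions 7 and 8 can be expressed compactly when the per-attribute ranks are already known. The sketch below uses a hypothetical rank table; attribute indices are 0-based (index `s` stands for $a_{s+1}$), and ties between attributes in the worst rank go to the lowest index, an assumption the definitions do not fix.

```python
from typing import Dict, List, Tuple

Ranks = Tuple[int, ...]  # (r_1(O), ..., r_m(O)), rank 1 = smallest value


def worst_rank(r: Ranks) -> Tuple[int, int]:
    """Return (s, r_s(O)) for the attribute with the largest rank (0-based s).
    Ties go to the lowest attribute index, an assumption not fixed by the text."""
    s = max(range(len(r)), key=lambda i: (r[i], -i))
    return s, r[s]


def dc_set(oid: str, ranks: Dict[str, Ranks]) -> List[str]:
    """DC(O): objects other than O whose rank in O's worst rank attribute is >= O's worst rank."""
    s, w = worst_rank(ranks[oid])
    return [other for other, r in ranks.items() if other != oid and r[s] >= w]


# Hypothetical ascending ranks (r_1, r_2) for five objects.
ranks = {"A": (1, 5), "B": (2, 2), "C": (3, 3), "D": (4, 1), "E": (5, 4)}

for oid in ranks:
    print(oid, worst_rank(ranks[oid]), dc_set(oid, ranks))
```

For instance, D has worst rank 4 in the first attribute, so only E (rank 5 there) needs to be checked against D.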

From the above definitions, we observe an important property [28] and a lemma.

**Property** **1.** The result of a top-$k$ dominating query is always contained in the result of the corresponding $k$-skyband query. For example, a top-2 dominating query for the example in Figure 1 retrieves $O_2$ and $O_3$, and both also belong to the 2-skyband result (Figure 2).

**Lemma** **1.** The dominance relation among the objects in a dataset is preserved in the transformed (ranked) dataset. For example, in Figure 1, object $O_2$ dominates object $O_1$ because $O_2$ has a smaller value than $O_1$ for both attributes. This dominance relation also holds in the ranked dataset, because $O_1$ has a greater rank than $O_2$ for both attributes.

## 4. Skyband and Dominating Query Processing

Our MapReduce-based algorithm for skyband and dominating queries consists of the following five phases:

P1: Data map and ranking. Each distributed dataset is partitioned vertically, and each partition is dispatched to the map workers (mappers). Each map worker then generates ($Val$, $OID$) pairs, where $Val$ is the numeric value of the corresponding object in the attribute domain and $OID$ is the object $ID$. After receiving the ($Val$, $OID$) pairs as input, each reduce worker (reducer) produces an ($OID$, $Rank$) pair for each object, where $Rank$ is the rank value of the object in the attribute domain.

P2: Shuffling. Each map worker outputs ($OID$, $AttrRank$) pairs, where $AttrRank$ is the attribute name together with the object's rank in that attribute. Each reduce worker then produces an ($OID$, $Ranks$) pair for each object, where $Ranks$ is a list of attribute names and the respective rank value for each attribute.

P3: Worst rank computation. The coordinator collects all $(OID, Ranks)$ pairs to reduce data transmission from map workers to reduce workers. After rearranging the $(OID, Ranks)$ pairs, the coordinator finds the worst attribute rank of each object.

P4: DC sets computation. The coordinator sends the worst rank attributes to the workers responsible for computing the $DC$ set of each object. Each worker takes an attribute's ranks and the corresponding worst ranks as input and outputs the $DC$ set of each object.

P5: Skyband and dominating objects computation. The map workers take the $DC$ sets as input and perform domination checks between each object and its $DC$ set. Finally, the reducers produce the $SB$ and $\mu$ scores required to compute the skyband query and the top-$k$ dominating query, respectively.

#### 4.1. Data Map and Ranking

We first split the dataset vertically into $m$ partitions, where $m$ is the number of attributes. Therefore, if the number of data subsets is $n$, the total number of partitions is $n \times m$ (e.g., $\{s_{1,1}, \cdots, s_{1,m}, \cdots, s_{n,1}, \cdots, s_{n,m}\}$). For simplicity, we denote $\{s_{1,1}, s_{2,1}, \cdots, s_{n,1}\}$, $\{s_{1,2}, s_{2,2}, \cdots, s_{n,2}\}$, ⋯, and $\{s_{1,m}, s_{2,m}, \cdots, s_{n,m}\}$ as $\{S_1\}$, $\{S_2\}$, ⋯, and $\{S_m\}$, respectively. In the example, $DS_1$ has two attributes, $a_1$ and $a_2$, so we split $DS_1$ into two partitions, $s_{1,1}$ and $s_{1,2}$. Since $DS_1$ yields two partitions, at least two map workers are needed to process it.
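The vertical split can be simulated in a single process as follows. This is a minimal sketch, not the authors' Hadoop code: each subset $DS_i$ is modeled as a dictionary from object ID $(i, j)$ to its attribute values, and the input values and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

OID = Tuple[int, int]     # (i, j): object j of subset DS_i
Pair = Tuple[float, OID]  # (Val, OID)


def vertical_split(subsets: Dict[int, Dict[OID, Tuple[float, ...]]]) -> Dict[Tuple[int, int], List[Pair]]:
    """Split each DS_i into m single-attribute partitions s_{i,l}, keyed by (i, l)."""
    parts: Dict[Tuple[int, int], List[Pair]] = {}
    for i, ds in subsets.items():
        for oid, values in ds.items():
            for l, val in enumerate(values, start=1):
                parts.setdefault((i, l), []).append((val, oid))
    return parts


def group_by_attribute(parts: Dict[Tuple[int, int], List[Pair]]) -> Dict[int, List[Pair]]:
    """Collect S_l = {s_{1,l}, ..., s_{n,l}}: all pairs of attribute a_l across the n subsets."""
    S: Dict[int, List[Pair]] = defaultdict(list)
    for (_, l), pairs in parts.items():
        S[l].extend(pairs)
    return dict(S)


# Hypothetical input with n = 2 subsets and m = 2 attributes.
subsets = {
    1: {(1, 1): (2.0, 8.0), (1, 2): (1.0, 5.0)},
    2: {(2, 1): (5.0, 1.0), (2, 2): (6.0, 2.0)},
}
parts = vertical_split(subsets)
print(len(parts))  # 4 partitions = n * m
print(group_by_attribute(parts)[1])
```

Grouping the partitions by attribute reflects the fact that ranking attribute $a_l$ needs all of $S_l$, not just one subset's slice of it.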

Figure 3 illustrates the “data map and ranking” procedure. The figure shows that objects $O_{1,2}$ and $O_{2,1}$ have rank 1 for attributes $a_1$ and $a_2$, respectively; that is, $O_{1,2}$ has the smallest $a_1$ value and $O_{2,1}$ has the smallest $a_2$ value.

Recall that in the Hadoop framework, on which we implemented our system, each map worker independently operates on a non-overlapping partition of the input file and emits lists of key-value pairs in parallel according to a user-defined map function. In the proposed algorithm, each map worker produces ($Val$, $OID$) pairs, where $Val$ is the numeric value of an object in the attribute domain and $OID$ is the corresponding object $ID$. Next, the reduce workers begin processing. They receive the ($Val$, $OID$) pairs as input and produce an ($OID$, $Rank$) pair for each object, where $Rank$ is the object's position when the attribute values are sorted in ascending order. To calculate the rank values for the key-value pairs of $S_l$ ($l = 1, \cdots, m$), the corresponding reduce worker sorts the attribute values in ascending order and then replaces each value with its ascending rank.
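The ranking map and reduce functions described above can be sketched as plain Python functions. This is a hedged, single-process illustration with hypothetical names and data; list concatenation stands in for the framework routing all pairs of $S_1$ to one reducer. The text does not say how tied values are ranked, so the sketch assumes that equal values share the smallest rank (competition ranking); with that rule, an object tying with $O$ in its worst rank attribute still lands in $O$'s $DC$ set.

```python
from itertools import groupby
from typing import Dict, Hashable, Iterable, List, Tuple


def rank_map(partition: Dict[Hashable, float]) -> List[Tuple[float, Hashable]]:
    """Map: emit a (Val, OID) pair for every object of one single-attribute partition."""
    return [(val, oid) for oid, val in partition.items()]


def rank_reduce(pairs: Iterable[Tuple[float, Hashable]]) -> List[Tuple[Hashable, int]]:
    """Reduce: sort (Val, OID) pairs by value in ascending order and emit (OID, Rank).
    Equal values share the smallest rank ("1224" competition ranking), an assumed tie rule."""
    ordered = sorted(pairs, key=lambda p: p[0])  # sort on Val only; OIDs need not be comparable
    out: List[Tuple[Hashable, int]] = []
    position = 1
    for _, group in groupby(ordered, key=lambda p: p[0]):
        members = list(group)
        out.extend((oid, position) for _, oid in members)
        position += len(members)
    return out


# Hypothetical partitions s_{1,1} and s_{2,1} of attribute a_1, keyed by object ID (i, j).
s11 = {(1, 1): 2.0, (1, 2): 1.0}
s21 = {(2, 1): 5.0, (2, 2): 2.0}

print(rank_reduce(rank_map(s11) + rank_map(s21)))
# [((1, 2), 1), ((1, 1), 2), ((2, 2), 2), ((2, 1), 4)]
```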

#### 4.2. Shuffling

The second MapReduce phase is invoked for skyband and top-$k$ dominating query computation. After the $(OID, Rank)$ pairs are generated in the data map and ranking phase, map workers take those pairs as input and produce ($OID$, $AttrRank$) pairs, where $AttrRank$ is the attribute name with the corresponding attribute rank of each object in the attribute domain. Each map worker then dispatches the ($OID$, $AttrRank$) pairs to the reducers. After shuffling, the reduce workers produce an ($OID$, $Ranks$) pair for each object, where $Ranks$ is a list of attribute names and the respective rank value for each attribute.
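The shuffling phase amounts to re-keying each rank by object ID and grouping by that key. The sketch below simulates it in one process; the function names are hypothetical, the grouping by OID stands in for what the MapReduce shuffle does, and only the ranks of $O_{1,1}$ come from the example in this section (the ranks of $O_{1,2}$ are made up).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

AttrRank = Tuple[str, int]


def shuffle_map(attr: str, rank_pairs: Iterable[Tuple[Hashable, int]]) -> List[Tuple[Hashable, AttrRank]]:
    """Map: turn each (OID, Rank) pair of attribute `attr` into (OID, (attr, Rank))."""
    return [(oid, (attr, rank)) for oid, rank in rank_pairs]


def shuffle_reduce(pairs: Iterable[Tuple[Hashable, AttrRank]]) -> Dict[Hashable, List[AttrRank]]:
    """Group by OID (as the framework's shuffle would) and emit (OID, Ranks), sorted by attribute name."""
    grouped: Dict[Hashable, List[AttrRank]] = defaultdict(list)
    for oid, attr_rank in pairs:
        grouped[oid].append(attr_rank)
    return {oid: sorted(ranks) for oid, ranks in grouped.items()}


# Ranks of O_{1,1} follow the example in the text; the O_{1,2} ranks are hypothetical.
a1_ranks = [((1, 1), 2), ((1, 2), 1)]
a2_ranks = [((1, 1), 6), ((1, 2), 4)]
ranks = shuffle_reduce(shuffle_map("a1", a1_ranks) + shuffle_map("a2", a2_ranks))
print(ranks[(1, 1)])  # [('a1', 2), ('a2', 6)]
```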

Figure 4 illustrates the “shuffling” procedure. In the example, $O_{1,1}$ has rank values of 2 and 6 for attributes $a_1$ and $a_2$, respectively. Therefore, the map workers produce two key-value pairs for $O_{1,1}$: $(O_{1,1}, \langle a_1, 2 \rangle)$ and $(O_{1,1}, \langle a_2, 6 \rangle)$. After these two pairs are shuffled, the reduce worker generates the $(OID, Ranks)$ key-value pair $(O_{1,1}, \langle \langle a_1, 2 \rangle, \langle a_2, 6 \rangle \rangle)$ for object $O_{1,1}$. Each reduce worker dispatches its $(OID, Ranks)$ pairs to the coordinator.

#### 4.3. Worst Rank Computation

The coordinator computes the worst rank and corresponding worst rank attribute for each object.

Figure 5 illustrates the “worst rank computation” performed by the coordinator. In the example, $O_{1,1}$ has rank values of 2 and 6 for attributes $a_1$ and $a_2$, respectively. For this object, the rank in $a_2$ is the worst among all attribute ranks, so the worst rank of $O_{1,1}$ is 6 and its worst rank attribute is $a_2$. The coordinator therefore generates $(O_{1,1}, \langle a_2, 6 \rangle)$ as the key-value pair for $O_{1,1}$.

#### 4.4. DC Sets Computation

The coordinator distributes the output pairs to the workers according to the worst rank attribute.

Figure 6 presents the “$DC$ sets computation” procedure. As shown in the figure, the pairs of $O_{2,1}$, $O_{2,2}$, $O_{3,2}$, and $O_{3,3}$ are distributed to worker $DC\,Comp_1$ for $a_1$. Similarly, the pairs of $O_{1,1}$, $O_{1,2}$, and $O_{3,1}$ are distributed to worker $DC\,Comp_2$ for $a_2$.

Each worker outputs the $DC$ set of each object. In Figure 6, because object $O_{2,1}$ has the worst rank of 5 in $a_1$, $DC\,Comp_1$ outputs $\{O_{2,2}, O_{3,2}\}$ as the $DC$ set of $O_{2,1}$; note that $O_{2,2}$ and $O_{3,2}$ have greater ranks than $O_{2,1}$ in $a_1$. Object $O_{1,1}$ has the worst rank of 6 in $a_2$, and $DC\,Comp_2$ outputs $\{O_{3,1}\}$ as the $DC$ set of $O_{1,1}$. The $DC$ sets of the other objects are calculated similarly.
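The routing of worst-rank pairs to per-attribute workers and the $DC$ computation inside one worker can be sketched as follows. This is a single-process illustration with hypothetical names and data, not the authors' implementation; sorting each attribute's ranks once so that every $DC$ set is a suffix found with binary search is a design choice of the sketch, not something the text prescribes.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Worst = Dict[str, Tuple[str, int]]  # OID -> (worst rank attribute, worst rank)


def route_to_workers(worst: Worst) -> Dict[str, List[Tuple[str, int]]]:
    """Coordinator: send each object's (OID, worst rank) to the DC worker of its worst rank attribute."""
    buckets: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for oid, (attr, w) in worst.items():
        buckets[attr].append((oid, w))
    return dict(buckets)


def dc_worker(assigned: List[Tuple[str, int]], attr_ranks: Dict[str, int]) -> Dict[str, List[str]]:
    """One DC worker for a single attribute: DC(O) = objects other than O whose rank in this
    attribute is >= O's worst rank. Sorting once makes each DC set a suffix of the ranking."""
    ordered = sorted(attr_ranks.items(), key=lambda kv: kv[1])
    rank_list = [r for _, r in ordered]
    out: Dict[str, List[str]] = {}
    for oid, w in assigned:
        start = bisect_left(rank_list, w)  # first position with rank >= w
        out[oid] = [other for other, _ in ordered[start:] if other != oid]
    return out


# Hypothetical ranks of four objects in attributes a1 and a2, and their worst ranks.
ranks = {"a1": {"P": 1, "Q": 2, "R": 3, "S": 4}, "a2": {"P": 3, "Q": 4, "R": 1, "S": 2}}
worst = {"P": ("a2", 3), "Q": ("a2", 4), "R": ("a1", 3), "S": ("a1", 4)}

dc_sets: Dict[str, List[str]] = {}
for attr, assigned in route_to_workers(worst).items():
    dc_sets.update(dc_worker(assigned, ranks[attr]))
print(dc_sets)  # {'P': ['Q'], 'Q': [], 'R': ['S'], 'S': []}
```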

#### 4.5. Skyband and Dominating Objects Computation

Each map worker takes the $DC$ sets from the previous phase as input and performs domination checks between each object and its $DC$ set to produce $(\mu/SB, Score)$ pairs. The $\mu$ score of an object is the number of objects it dominates, whereas its $SB$ score is the number of objects that dominate it. Importantly, to compute either score, our method does not need to perform a domination check against any object outside the $DC$ sets. This advantage stems from the following theorem.

**Theorem** **1.** For two objects $O, O' \in DS$, if $O'$ is not in the $DC$ set of object $O$, then $O$ cannot dominate object $O'$ (i.e., $O \nprec O'$).

**Proof.** Let $a_s$ be the worst rank attribute of $O$. If $O$ dominates $O'$, then $O.a_s \le O'.a_s$, so $r_s(O') \ge r_s(O)$ and $O'$ must be in the $DC$ set of $O$. Hence, if $O'$ is not in the $DC$ set of $O$, then $r_s(O') < r_s(O)$, which means $O.a_s > O'.a_s$. Therefore, $O$ cannot dominate $O'$. □

Theorem 1 shows that it is sufficient to check each object $O$ only against its own $DC$ set to compute the scores needed for the query results. To illustrate, recall object $O_{2,1}$ and its $DC$ set $\{O_{2,2}, O_{3,2}\}$. Because $O_{2,1}$ has the worst rank of 5 in attribute $a_1$, it cannot dominate any object with a better (smaller) rank in $a_1$, such as $O_{1,2}$, $O_{1,1}$, $O_{3,1}$, or $O_{3,3}$. Hence, we need to perform domination checks only between $O_{2,1}$ and its $DC$ set $\{O_{2,2}, O_{3,2}\}$.
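Theorem 1 can also be checked empirically with a brute-force test on random data. The sketch below is a sanity check, not a proof and not part of the proposed algorithm: it draws small integer values so that ties occur, ranks each attribute with an assumed tie rule (equal values share the smallest rank), and confirms that every dominated object appears in its dominator's $DC$ set. All names and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def dominates(o: Sequence[int], p: Sequence[int]) -> bool:
    """Smaller-is-better dominance test."""
    return all(x <= y for x, y in zip(o, p)) and any(x < y for x, y in zip(o, p))


def competition_ranks(values: List[int]) -> List[int]:
    """Ascending ranks in which equal values share the smallest rank (assumed tie rule)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def check_theorem1(n: int = 40, m: int = 3, seed: int = 0) -> Tuple[bool, int, int]:
    """Brute-force check on random data: every dominated object lies in its dominator's DC set.
    Returns (holds, number of DC-set checks, number of all-pairs checks)."""
    rng = random.Random(seed)
    data = [tuple(rng.randint(0, 9) for _ in range(m)) for _ in range(n)]  # small range forces ties
    cols = [competition_ranks([o[s] for o in data]) for s in range(m)]
    ranks = [tuple(cols[s][i] for s in range(m)) for i in range(n)]
    holds, dc_checks = True, 0
    for i in range(n):
        s = max(range(m), key=lambda a: (ranks[i][a], -a))  # worst rank attribute of object i
        for j in range(n):
            if j == i:
                continue
            in_dc = ranks[j][s] >= ranks[i][s]
            dc_checks += in_dc
            if dominates(data[i], data[j]) and not in_dc:
                holds = False
    return holds, dc_checks, n * (n - 1)


holds, dc_checks, all_pairs = check_theorem1()
print(holds)  # True
print(dc_checks, "DC-set checks instead of", all_pairs)
```

The second printed line gives a rough sense of how many pairwise checks the $DC$ sets avoid on this particular random sample.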

Figure 7 presents the skyband and dominating objects computation process. For the $K$-skyband with $K = 2$, the query returns $\{O_{1,1}, O_{1,2}, O_{2,1}, O_{2,2}, O_{3,3}\}$ as the 2-skyband, since the $SB$ scores of these objects are less than 2. Our method then outputs objects $O_{1,2}$ and $O_{2,1}$ as the top-2 dominating result because both have the highest $\mu$ score.

After performing the domination checks, each map worker produces two types of key-value pairs: $\mu$ scores for the objects it checks and $SB$ scores for the objects in their $DC$ sets. If an object dominates another object in its $DC$ set, the $\mu$ score of the dominating object and the $SB$ score of the dominated object are each incremented by 1. Next, all $\mu$ and $SB$ scores are sent to the reduce workers. After applying a group-by operation, the reduce workers provide the $K$-skyband and top-$k$ dominating query results.
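The final map and reduce steps can be sketched as a single-process simulation. It is a hedged illustration, not the authors' Hadoop job: the data, the precomputed $DC$ sets (those produced by ascending ranks when ties in the worst rank go to $a_1$), and all function names are hypothetical, and ties in $\mu$ are broken by object ID as an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Key = Tuple[str, str]  # ("mu", OID) or ("sb", OID)


def dominates(o: Sequence[float], p: Sequence[float]) -> bool:
    """Smaller-is-better dominance test."""
    return all(x <= y for x, y in zip(o, p)) and any(x < y for x, y in zip(o, p))


def score_map(oid: str, dc: Iterable[str], data: Dict[str, Sequence[float]]) -> List[Tuple[Key, int]]:
    """Map: check O against its DC set only; for each dominated O', emit
    (("mu", O), 1) and (("sb", O'), 1)."""
    out: List[Tuple[Key, int]] = []
    for other in dc:
        if dominates(data[oid], data[other]):
            out.append((("mu", oid), 1))
            out.append((("sb", other), 1))
    return out


def score_reduce(pairs: Iterable[Tuple[Key, int]]) -> Counter:
    """Reduce: group by key and sum the counts."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def answer(data: Dict[str, Sequence[float]], dc_sets: Dict[str, List[str]],
           K: int, k: int) -> Tuple[List[str], List[str]]:
    """Derive the K-skyband and top-k dominating results from the reduced scores."""
    totals = score_reduce(p for oid, dc in dc_sets.items() for p in score_map(oid, dc, data))
    mu = {o: totals[("mu", o)] for o in data}  # missing keys count as 0
    sb = {o: totals[("sb", o)] for o in data}
    skyband = [o for o in data if sb[o] < K]
    top_k = sorted(data, key=lambda o: (-mu[o], o))[:k]  # ties by OID: an assumption
    return skyband, top_k


# Hypothetical data and DC sets.
data = {"A": (1.0, 9.0), "B": (2.0, 4.0), "C": (3.0, 5.0), "D": (4.0, 1.0), "E": (5.0, 6.0)}
dc_sets = {"A": [], "B": ["C", "D", "E"], "C": ["D", "E"], "D": ["E"], "E": []}
print(answer(data, dc_sets, K=2, k=2))  # (['A', 'B', 'C', 'D'], ['B', 'C'])
```

Both query answers come out of the same reduced score table, which mirrors the paper's point that one pass yields the skyband and the dominating results together.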