Researching Why-Not Questions in Skyline Query Based on Orthogonal Range

This paper aims to answer “why-not” questions in skyline queries based on the orthogonal query range (i.e., ORSQ). These queries retrieve skyline points within a rectangular query range, which improves query efficiency. Answering why-not questions in ORSQ can help users analyze query results and make decisions. We discuss the causes of why-not questions in ORSQ. Then, we outline how to modify the why-not point and the orthogonal query range so that the why-not point is included in the result of the ORSQ. When the why-not point is in the orthogonal range, we show how to modify the why-not point and narrow the orthogonal range. When the why-not point is not in the orthogonal range, we show how to expand the orthogonal range. We combine query refinement and data modification techniques to produce meaningful answers. The experimental results on real and synthetic datasets demonstrate that the proposed algorithms produce high-quality explanations for why-not questions in ORSQ.


Introduction
In the past ten years, big data has received widespread attention. However, the data themselves have no value; they generate value only after collection, storage [1], processing [2] and analysis. For example, in intelligent communication systems, users collect data with wireless sensors [3][4][5][6][7] and mobile devices, and then analyze the data to provide a decision-making basis for programs. A wireless sensor is a wireless data communication collector that integrates data acquisition, data management, data communication and other functions. In this paper, we study a new problem related to data analysis.
With the development of information technology, the performance of the database has been continuously improved. Many issues, such as privacy protection [8] and fault detection [9], have made great progress. However, the current database is still imperfect, and availability is one of the key points to refine the database. In the research of improving database usability, the "Why-Not" questions [10] have received more and more attention. In ordinary queries, users do not know the specific execution process of the query. When users find that the query results do not have the information they want, they often feel confused or even frustrated. The why-not question can explain to users why the expected results are lost, and help users solve the problem.
To introduce skyline queries based on the orthogonal query range (i.e., ORSQ), we first introduce the orthogonal range [11]. Queries based on the orthogonal range are evaluated within the range R, where R = R_1 ∩ ... ∩ R_n and each R_i, i ∈ {1, 2, ..., n}, is continuous. Compared with unrestricted range queries, these queries greatly narrow the query range and improve query efficiency. To date, orthogonal range queries have been studied extensively in the fields of computational geometry and databases. Next, we introduce the skyline query [12], which aims to find the set of data points that are not dominated by any other point. It is often used for multi-objective decision making. In short, ORSQ finds data that users may be interested in within a given orthogonal range. Given a dataset of objects O and an orthogonal range R, the ORSQ retrieves the objects within R that are not dominated by other objects within R. Point o_1 ∈ O dominates point o_2 ∈ O if and only if the coordinate value of o_1 on every axis is less than or equal to the corresponding coordinate value of o_2, and the two points are not equal on all axes.
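The dominance test and the ORSQ described above can be illustrated with a minimal sketch. It assumes that smaller values are better on every axis and that range boundaries are inclusive; the function names and the hotel coordinates are made up for illustration, and the quadratic scan merely stands in for an index-based skyline algorithm.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Range = Sequence[Tuple[float, float]]  # one (low, high) pair per dimension


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: a <= b on every axis and a < b on at least one axis."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def in_range(p: Point, r: Range) -> bool:
    """True if p lies inside the orthogonal range r (inclusive bounds assumed)."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, r))


def orsq(points: Sequence[Point], r: Range) -> List[Point]:
    """Skyline of the points that fall inside r (naive O(n^2) scan)."""
    candidates = [p for p in points if in_range(p, r)]
    return [p for p in candidates
            if not any(dominates(q, p) for q in candidates)]


# Hypothetical (price, distance-to-beach) pairs, not taken from the paper.
hotels = [(80.0, 900.0), (120.0, 400.0), (150.0, 450.0), (250.0, 150.0), (320.0, 50.0)]
R = [(100.0, 300.0), (100.0, 500.0)]
print(orsq(hotels, R))
```

Only three of the five hotels fall inside R, and one of them is dominated by a cheaper and closer hotel, so two skyline points remain.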
Suppose a newly-wed couple is going to Nassau for their honeymoon. They already have an expected hotel in mind, but they still want to know which hotels (https://www.booking.com) near the beach are cheap. In Figure 1, we execute a skyline query on all Nassau hotels and get the skyline points {sp_1, sp_2, sp_3}. Although sp_1 and sp_2 are cheap, they are youth hostels and not romantic at all for a couple. Moreover, sp_3 is too close to the beach. To solve this problem, the couple can set acceptable values for the attributes. Let the price range be 100 $-300 $ and the distance range from the beach be 10 m-500 m. The new skyline points based on the data in the orthogonal range R (i.e., the shaded area) are {sp′_1, sp′_2, sp′_3}. This means that the couple may be more interested in hotels {sp′_1, sp′_2, sp′_3}. They only need to choose one of them, which greatly saves screening time. Suppose the expected hotel is h_2 in Figure 2a. Why is h_2 not in the query results? This is the problem to be solved in this paper, that is, why-not questions in ORSQ. This paper has two main objectives. On the one hand, we need to find out why the expected tuple does not appear in the result of ORSQ. On the other hand, we need to answer how to include the tuple in the result of ORSQ. Hereinafter, Figure 2 is used as the running example. For a more concise and clear explanation, we extract some data from Figure 1 as explanatory data (i.e., Figure 2b), and the corresponding ORSQ is shown in Figure 2a.
In summary, this paper aims to answer why-not questions in ORSQ. First, we answer why a "why-not" question arises. Second, by analyzing the causes, we illustrate strategies for modifying the why-not point and the orthogonal range so that the query results include the why-not point. For these strategies, we propose cost formulas. To the best of our knowledge, this is the first attempt to study why-not questions in skyline queries based on the orthogonal range. The main contributions of this paper are summarized as follows:
• Provide the meaning and semantics of the why-not questions in ORSQ.
• Propose strategies for modifying the why-not point and the orthogonal range according to the cause of the problem.
• Present how to modify the why-not point and the orthogonal range so that the why-not point is included in the results.
• Validate the algorithms with experiments.
The paper is organized as follows. In the second section, we review the related work. In the third section, we describe the preliminary knowledge. According to the location of the why-not point, the fourth section describes how to solve the why-not questions in ORSQ by modifying the why-not point or the orthogonal range. In the fifth section, the experimental results are presented in terms of effectiveness and performance. In the sixth section, we summarize the paper.
Figure 2. An example of ORSQ based on the Nassau hotel real data set. (a) An example of ORSQ, using the data from (b). (b) Data for some hotels in Nassau (Bahamas).

Related Work
Range-based preference query. Wang et al. [13] first tried to solve the problem of dynamic skyline calculations considering range queries. To solve it, they proposed an effective algorithm based on the grid index and a novel variant of the well-known Z-order curve. Kalavagattu et al. [14] considered the problem of the dominating point set in two-dimension. Given an orthogonal query rectangle, the dominant point set is found in it. Rahul and Janardan [15] researched algorithms for range-skyline queries. Lin and Xu et al. [16] researched range-based skyline queries in mobile environments, and proposed two algorithms: index-based and not based on any index. Jiang et al. [17] and Fu et al. [18] researched continuous range-based skyline queries problem in road networks. Li et al. [19] researched why-not questions of Top-k queries on the orthogonal region. They adjusted the initial query by automatically updating the query so that the result of the new query contains why-not points with minimum cost.
Why-Not questions. Researchers have mainly proposed five explanations to answer why-not questions. First of all, operation positioning [20][21][22] refers to finding out the operation that causes the expected result to be lost. Secondly, data modification [23][24][25][26] refers to inserting new data or modifying existing data to make the missing tuple into query results. Thirdly, query refinement [27] refers to updating the query so that the new query results contain the missing tuple. Next, the key to the ontology-based approach [28] is to get a most-general explanation for why-not questions with the ontology provided by the user or automatically generated from data and patterns. Finally, the hybrid explanation [29,30] encompasses the variety of previously defined types of explanations to explain a larger set of missing-answers. Although operation positioning and ontology-based approach explain the reason for the expected tuple loss, they cannot help users solve the problem of expected tuple loss.
Why-Not questions in variant skyline queries. Islam and Zhou et al. [31] answered why-not questions in reverse skyline queries. Using data modification and query refinement strategies, they proposed three solutions: (1) modifying the data alone, (2) modifying the query alone and (3) integrating the two. This paper draws most heavily on the work of Miao et al. [32]. They studied "why-not" questions in range-based skyline queries in road networks and proposed three strategies: (1) modifying the query range, (2) modifying the attributes of the why-not point and (3) modifying both of them. It is worth noting that their range refers to a range of distance. The orthogonal range in this paper is based on the orthogonal regions proposed by Li et al. [19].
As far as we know, there is little research on why-not questions in skyline queries. Next, we formally define "why-not" questions in ORSQ, and then propose solutions based on data modification and query refinement techniques.

Preliminaries
Equivalently, the orthogonal range refers to the intersection of the ranges in each dimension, each of which is continuous and non-segmented. Take Figure 2a for example. Let d = 2, where the attributes are the hotel price and the distance from the hotel to the beach. The range of price is R_1 (100 $ ≤ R_1 ≤ 300 $), and the range of distance from the hotel to the beach is R_2 (100 m ≤ R_2 ≤ 5000 m). The shaded part is the orthogonal range R, where R = R_1 ∩ R_2, and the points in R are {h_1, h_2, h_3, h_5, h_6, h_7}. When executing a query, we only need to search in R. This improves query efficiency and provides users with query results that better meet their needs.
In a nutshell, skyline queries rest on the definition of dominance. Take Figure 1 for example, where we execute a skyline query on all hotels and get SP = {sp_1, sp_2, sp_3}. Point sp_2 ∈ SP because neither sp_1 nor sp_3 dominates sp_2.
Lemma 1. Let sp_i ∈ SP, i ∈ {1, ..., n}. If n ≥ 2, the points of SP are distributed in a stepwise manner.
Proof. We first prove the lemma in a two-dimensional space. Assume that in D_1 the points are arranged in the order sp_1^1 ≤ sp_2^1 ≤ ... ≤ sp_n^1. Suppose some of them coincide in D_1, namely, ∃i, j ∈ {1, ..., n} with i ≠ j such that sp_i^1 = sp_j^1. Then, unless sp_i^2 = sp_j^2, one of the two points is dominated by the other. However, this contradicts the given condition that they are all skyline points. Next, assume that in D_1, ∀i, j ∈ {1, ..., n} with i < j, sp_i^1 < sp_j^1. Since they are not dominated by any point, there must be sp_i^2 > sp_j^2. That is, sp_1, sp_2, ..., sp_n are arranged in a ladder. The same can be proved in high-dimensional space.
Definition 3 (Skyline query based on orthogonal range). Given a dataset of objects O and the orthogonal range R ⊆ O, users in ORSQ execute SQ in R to find the skyline points (i.e., SP(R)) that they might be interested in.
In Figure 1, the skyline query without a limited query range gets SP = {sp 1 , sp 2 , sp 3 }. These hotels have their shortcomings and do not meet the needs of all users. The traditional skyline query blindly chooses the best result in the whole range, but it is not always helpful to users. Therefore, in Figure 2a, we specify the orthogonal query range R, and get SP(R) = {h 1 , h 5 , h 6 }. This helps users find information they are more interested in by considering their actual consumption level and preferences.
Definition 4 (Skyline Dominance Region). Points in SDR are not dominated by any skyline points.
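Definition 4 can be read as a simple membership test, sketched below. The sketch assumes that smaller values are better, that the region is additionally clipped to the query range R, and that boundaries are inclusive; the function names and coordinates are hypothetical.

```python
def dominates(a, b):
    """a dominates b: a <= b on every axis and a < b on at least one axis."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def in_sdr(p, skyline, r):
    """True if p lies in the range r and no skyline point dominates it."""
    inside = all(lo <= x <= hi for x, (lo, hi) in zip(p, r))
    return inside and not any(dominates(sp, p) for sp in skyline)


# Hypothetical skyline of an ORSQ and its (price, distance) range.
skyline = [(120.0, 400.0), (250.0, 150.0)]
R = [(100.0, 300.0), (100.0, 500.0)]
print(in_sdr((130.0, 200.0), skyline, R))  # no skyline point dominates it
print(in_sdr((200.0, 420.0), skyline, R))  # dominated by (120, 400)
```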
Take Figure 2a for instance. The slant-shaded region is the dominance region of the skyline query based on R. Points in SDR cannot be dominated by any point in the query range. It is worth noting that the boundary of SDR is related to the skyline points.
Lemma 2. The SDR boundary is determined by the query range and by the straight lines that pass through the coordinates of the skyline points and are parallel to the coordinate axes.
Proof. Let SP = {sp_1, ..., sp_n} and i ∈ {1, ..., n}. Each line is equivalent to treating a skyline point as the origin and dividing the coordinate axes again. According to Definition 2, points o ∈ O in the upper right corner of the coordinate chart are always dominated by certain skyline points. Equivalently, ∃o ∈ O and ∃j ∈ {1, 2, ..., d}: sp_i^j < o^j. Therefore, we have to exclude the upper right corner of each skyline point. In addition, the boundary of SDR also needs to consider the boundary of the query range.
For the second and third problems, we propose different solutions for different causes. When (1) w ∈ R, we have three solutions: 1) modify the why-not point w, 2) narrow the orthogonal range R and 3) integrate the two. When (2) w ∉ R, it is necessary to expand R to R′ so that w ∈ R′. We then test whether e is included in SP(R′). If e ∈ SP(R′), the problem is resolved; otherwise, the problem reduces to (1). The following is a brief explanation of the application and implementation steps of these solutions.
Modify w. The main idea is to modify w to w′ so that w′ ∈ SP(R). It involves Definition 4. This solution preserves the initial skyline points and provides users with information similar to w.
Modify R. The main idea is to modify R to R′ so that w ∈ SP(R′). It includes two strategies: expanding R and narrowing R. When w ∉ R, we expand R. When w ∈ R, there must be some sp_i that dominates w, where sp_i ∈ SP(R) and i ∈ {1, ..., n}. This fact cannot be changed even if R is expanded, so we narrow R to exclude the skyline points that dominate w. Note that narrowing the range loses some of the original skyline points, although it may also yield new skyline points different from w. In this solution, extracting the orthogonal range data is the key. The calculation steps are as follows:
(1) Input the file name of the original data originalData, the file name of the extracted data generateData, the orthogonal range R and the flag fg. Parameter fg = True when extracting orthogonal range data during an initial query or when expanding the orthogonal range; fg = False when narrowing the orthogonal range.
(4) Traverse the data in originalData, and write each record that matches the orthogonal range into generateData.
(5) Close originalData and generateData, and return the result.
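The extraction of orthogonal range data can be sketched as a filter over a delimited file. This is only an illustration under stated assumptions: records are comma-separated numeric rows, range boundaries are inclusive, and the flag fg is left out because its exact effect is not spelled out in the steps above. The name extract_range and the sample rows are hypothetical.

```python
import csv
import io
from typing import Sequence, TextIO, Tuple


def extract_range(original: TextIO, generated: TextIO,
                  r: Sequence[Tuple[float, float]]) -> int:
    """Copy every row of `original` that lies inside range `r` to `generated`.

    Returns the number of rows written.
    """
    reader = csv.reader(original)
    writer = csv.writer(generated)
    kept = 0
    for row in reader:
        coords = [float(v) for v in row]
        if all(lo <= x <= hi for x, (lo, hi) in zip(coords, r)):
            writer.writerow(row)
            kept += 1
    return kept


# In-memory stand-ins for originalData and generateData (made-up rows).
src = io.StringIO("80,900\n120,400\n250,150\n320,50\n")
dst = io.StringIO()
n = extract_range(src, dst, [(100, 300), (100, 500)])
print(n, dst.getvalue().splitlines())
```

With real files, the two StringIO objects would be replaced by handles opened with newline="" as the csv module recommends.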
Modify both of them. The above solutions have their advantages and disadvantages. If the difference between w and w′ is large or the narrowed range is too small, the answer has no practical significance. To this end, we can combine these solutions to provide a compromise solution.
In this paper, we focus on the second and third aspects. The first problem is easy to calculate. Focusing on the second aspect, we can propose solutions based on the distribution of w. The steps to solve why-not questions in ORSQ are as follows. First, we judge whether w is included in R. If w ∈ R, we select a solution among modifying the why-not point (i.e., MWP), narrowing the orthogonal range (i.e., MRN) and integrating the two (i.e., MWR). If not, we expand the orthogonal range (i.e., MRE). If the problem still exists, we choose one of the above three solutions. However, an algorithm may produce a variety of answers, such as the MWR algorithm. Considering the third question, we can find the answer with the minimum cost. The pseudo-code of the overall algorithm for why-not questions in ORSQ is shown in Algorithm 1.
Algorithm 1 (excerpt)
...
// Execute a skyline query in the new range to judge whether the why-not question is solved.
if flag2 then
    // If the problem is not solved, we select a strategy from MWP, MRN and MWR.
...
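The decision flow described in this paragraph can be sketched as follows. This is not a reconstruction of Algorithm 1; it is a hedged illustration that assumes smaller values are better and boundaries are inclusive, and that only reports which branch applies instead of running MWP, MRN or MWR. All names and coordinates are hypothetical.

```python
def dominates(a, b):
    """a dominates b: a <= b on every axis and a < b on at least one axis."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def in_range(p, r):
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, r))


def answer_why_not(points, r, w):
    """Return (range used, status) for the why-not point w."""
    if not in_range(w, r):
        # MRE step: grow each interval just enough to contain w.
        r = [(min(lo, x), max(hi, x)) for x, (lo, hi) in zip(w, r)]
    candidates = [p for p in points if in_range(p, r)]
    if not any(dominates(q, w) for q in candidates):
        return list(r), "solved"
    return list(r), "needs MWP, MRN or MWR"


# Hypothetical (price, distance-to-beach) pairs.
hotels = [(80.0, 900.0), (120.0, 400.0), (150.0, 450.0), (250.0, 150.0), (320.0, 50.0)]
R = [(100.0, 300.0), (100.0, 500.0)]
print(answer_why_not(hotels, R, (320.0, 50.0)))   # outside R, solved by expansion
print(answer_why_not(hotels, R, (150.0, 450.0)))  # inside R but dominated
```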

Solutions to Why-Not Questions in ORSQ
In this section, we present how to modify the why-not point w and the orthogonal range R according to whether w is within R, so that the why-not point is included in the new query results with minimum cost. We discuss solutions in three cases: (1) d = 2 and w ∈ R; (2) d = 2 and w ∉ R; (3) d > 2 and w ∈ R. For more specific explanations, we use hotels as objects in the examples.

Case 1: d = 2 and w ∈ R
When w ∈ R but w ∉ SP(R), we have three algorithms to make the new query results contain the why-not point, namely modifying the why-not point (i.e., the MWP algorithm), narrowing the orthogonal range (i.e., the MRN algorithm), and integrating the two (i.e., the MWR algorithm).

Modifying the Why-Not Point
Definition 6 (MWP). Given a dataset of hotels H, the orthogonal range R ⊆ H and the expected hotel e ∈ R. When e ∉ SP(R) (i.e., e = w), modify the why-not point w to w′ so that w′ ∈ SP(R). Moreover, the cost in Formula (1) should be as small as possible.
The Euclidean distance between them is calculated as Formula (2).
In Formula (1), point o ∈ R is the point whose coordinates are the minimum values of R in each dimension, and dist(w, o) represents the Euclidean distance between w and o. Similarly, dist(w′, o) represents the Euclidean distance between w′ and o. The cost of the MWP algorithm reflects the difference before and after the modification of w.
As shown in Figure 3, point e = h_3 is the why-not point w because points {h_1, h_5} dominate h_3. To modify w to w′ ∈ SP(R), we need to find the moving region. In this moving region, the why-not point can dominate all points within R and cannot be dominated by any other point. However, considering the practical significance, when w is modified to w′ ∈ SDR, users can only get SP(R) = {w′}. Therefore, we need to find a critical condition that does not lose the original skyline points and still provides points that users may be interested in. That is, we need to find the points on the boundary of SDR that have the shortest Euclidean distance from w to w′. These points are generated in a candidate set C, which includes the mapping points C_mp from w to the skyline points and the turning points C_tp between skyline points, where C = C_mp ∪ C_tp. In Figure 3, e = w, C_mp = {B, D} and C_tp = {A, C}. Obviously, in △wBh_1, dist(w, B) < dist(w, h_1) because ∠Bh_1w < ∠wBh_1. Similarly, dist(w, C) < dist(w, h_1), dist(w, A) < dist(w, h_6) and dist(w, D) < dist(w, h_5). In general, ∀i ∈ {1, ..., n} and ∀j ∈ {1, ..., k}, dist(w, bp_i) ≥ dist(w, c_j), where c_j ∈ C, bp_i ∈ BP and BP is a point set on the SDR boundary.
In short, the MWP algorithm finds a candidate set. We first calculate the turning points C_tp between skyline points. Let d = 2, let the number of points in SP(R) be n, and let SP(R) be arranged in ascending order according to D_1. Each point of C_tp consists of the abscissa of the next skyline point and the ordinate of the current skyline point, as shown in Formula (3).
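The verbal rule for the turning points (abscissa of the next skyline point, ordinate of the current one) can be sketched in a few lines. The sketch assumes d = 2 and follows only the prose description, not the original formula; the function name and the coordinates are hypothetical.

```python
def turning_points(skyline):
    """Turning points between consecutive 2-D skyline points sorted on D_1."""
    ordered = sorted(skyline)  # ascending on the first coordinate
    return [(nxt[0], cur[1]) for cur, nxt in zip(ordered, ordered[1:])]


# Three hypothetical skyline points yield two turning points.
sp = [(250.0, 150.0), (120.0, 400.0), (180.0, 300.0)]
print(turning_points(sp))
```

A skyline with n points produces n - 1 turning points, which matches the two points {A, C} reported for the three skyline points in Figure 3.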
Next, we calculate C_mp. The coordinates of C_mp are related to the arrangement order of SP(R) and w on each coordinate. First, we merge SP(R) and w into a list skylineW, and then sort the list by D_1 and D_2 respectively to obtain ox and oy. Then, we find the position wpxo of the abscissa of w in ox and the position wpyo of the ordinate of w in oy. The general formula of C_mp is given below:
Finally, we find the point in C with the shortest Euclidean distance from w. We define the cost of the MWP algorithm in Formula (1). The closer the point is to point o, the smaller the cost is.
However, compared with the cost, users prefer the change of w to be as small as possible. Algorithm 2 gives the pseudo-code for all of the above calculation steps.

Algorithm 2 MWP(w, SP(R))
Input: w: the coordinates of the why-not point; SP(R): the results obtained after executing SQ(R)
Output: w′: the modified why-not point
1: orderSPX ← sorted(SP(R), sp^1)
2: length ← len(orderSPX)
3: // Calculate C_tp
4: for i in range(length) do
5:     if i + 1 < length then
6:         ...
Complexity analysis: The complexity of MWP is mainly determined by calculating the candidate set C and sorting C by Euclidean distance. The calculation of the Euclidean distance in steps 6, 18 and 21 can be considered to take constant time. The complexity of calculating C_tp in steps 4-9 is O(N), and the complexity of calculating C_mp in steps 11-23 is dominated by Python's built-in list sorting (i.e., O(N log_2 N)) and binary search (i.e., O(log_2 N)). Similarly, the complexity of sorting C in step 24 is O(N log_2 N). Therefore, the overall complexity of MWP is O(N log_2 N).
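The last step of MWP, picking the candidate closest to w, can be sketched as below. The candidate set C is taken as given here (the computation of the mapping points is not reproduced), and a single linear scan replaces the full sort mentioned in the complexity analysis; the function name and coordinates are hypothetical.

```python
import math


def closest_candidate(w, candidates):
    """Return the candidate with the smallest Euclidean distance to w."""
    return min(candidates, key=lambda c: math.dist(w, c))


w = (200.0, 350.0)  # hypothetical why-not point
C = [(180.0, 400.0), (250.0, 300.0), (200.0, 300.0), (180.0, 350.0)]
print(closest_candidate(w, C))
```

Taking the minimum needs only O(N) distance evaluations; sorting is useful only if a ranked list of alternative answers is wanted.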

Narrowing the Orthogonal Range
Definition 7 (MRN). Given a dataset of hotels H, the orthogonal range R ⊆ H and the expected hotel e ∈ R. When e ∉ SP(R) (i.e., e = w), narrow R to R′ so that the why-not point w ∈ SP(R′). Moreover, the cost in Formula (5) should be as small as possible.
Parameter S represents the space of R. If d = 2, R is a rectangle and S(R) represents the area of R. If d = 3, R is a cuboid and S(R) represents the volume of R. Other dimensions are analogous. The cost of MRN reflects the difference between R and R′.
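The space S(R) of an orthogonal range is the product of its side lengths, which a short sketch makes concrete. The cost formula itself is not reproduced here; the printed difference is only the raw quantity that such a cost could build on, and the ranges are hypothetical.

```python
from math import prod


def space(r):
    """Area (d = 2), volume (d = 3) or hyper-volume of an orthogonal range."""
    return prod(hi - lo for lo, hi in r)


R = [(100.0, 300.0), (100.0, 500.0)]         # original range
R_narrow = [(150.0, 300.0), (100.0, 500.0)]  # hypothetical narrowed range
print(space(R), space(R_narrow), space(R) - space(R_narrow))
```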
In Figure 4, point e = h_3 is the why-not point. To narrow R to R′ so that w ∈ SP(R′), we must exclude the skyline points SP(R)_part = {h_1, h_5} that dominate w. The steps for calculating SP(R)_part ⊆ SP(R) are as follows: (1) ...
The narrowed range is determined by the coordinates of SP(R) and the boundary of R. The boundary of the narrowed range is calculated as follows:
(1) Calculate the set SP(R)_part of skyline points that dominate w. Point sp_i ∈ SP(R)_part is expressed as sp_i = {sp_i^1, sp_i^2, ..., sp_i^d}, i ∈ {1, 2, ..., k}.
(2) Calculate R′ as shown in Formula (6); the narrowed range R′ is determined by the coordinates of SP(R)_part and the boundary of R.
(3) After narrowing R for the first time, execute SQ(R′) to judge whether the why-not question still exists. If it does, repeat the above steps until the problem is solved.
(4) Next, calculate the corresponding narrowed range for the next point in SP(R)_part.
(5) Finally, we obtain several narrowed ranges. The final result is the one with the minimum cost.
Algorithm 3 gives the pseudo-code for all of the above calculation steps.
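Step (1), collecting the skyline points that dominate w, can be sketched directly from the dominance definition. The sketch assumes smaller values are better; the narrowing rule itself is not reproduced, and the names and coordinates are hypothetical.

```python
def dominates(a, b):
    """a dominates b: a <= b on every axis and a < b on at least one axis."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominators(w, skyline):
    """Skyline points that dominate the why-not point w (SP(R)_part)."""
    return [sp for sp in skyline if dominates(sp, w)]


skyline = [(120.0, 400.0), (180.0, 300.0), (250.0, 150.0)]
w = (200.0, 420.0)  # hypothetical why-not point
part = dominators(w, skyline)
print(part, len(part))
```

The length of the returned list is the count k used to decide how many narrowed ranges have to be explored.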
Example: Take Figure 4 for example. The shaded area is the original orthogonal range R. Let the expected point e = h_3, and get SP(R) = {h_1, h_5, h_6}. According to the method of calculating SP(R)_part, SP(R)_part = {h_1, h_5} (k = 2). For h_1, R is first narrowed to R′_1 and SP(R′_1)_part = {h_2}. However, the why-not point w is dominated by h_2 and the problem still exists. Based on R′_1, we further narrow the range to R″_1 and SP(R″_1)_part = ∅, so the problem is solved. Similarly, for h_5, R is first narrowed to R′_2 and SP(R′_2)_part = {h_2}. Then, we further narrow the range to R″_2 based on R′_2, where R″_2 = R″_1. Finally, we get the narrowed range R″_1.
Algorithm 3 (excerpt)
...
if ox_2 ≤ wpy then
    nxmi ← ox_1, nymi ← ox_2
R ← nxmi, xmax, nymi, ymax
if not flag then
    // If flag is False, the why-not question still exists.
...
As we have already analyzed, applying the MWP algorithm does not lose the initial skyline points, but if the difference between w and w′ is too large, the answer is meaningless. Moreover, narrowing the range R loses existing skyline points, although we may get new skyline points, and if the narrowed range is too small, the answer is equally unhelpful. To solve these two problems, we propose a hybrid method that modifies w and narrows R. This approach balances the two problems to reach a compromise result. We narrow R to R′ to get points of interest that are closer to w and, if necessary, we modify w. We formally define the MWR algorithm as follows:
Definition 8 (MWR). Given a dataset of hotels H, the orthogonal range R ⊆ H and the expected hotel e ∈ R. When e ∉ SP(R) (i.e., e = w), narrow R to R′ and, if necessary, modify the why-not point w to w′ so that w′ ∈ SP(R′). Moreover, the cost in Formula (7) should be as small as possible.
When there is only one type of result: (1) Only w is modified: the final result is the scheme with the smallest Euclidean distance. (2) Only R is modified: the final result is the scheme with the minimum cost. (3) Both w and R are modified: the results with the shortest Euclidean distance in the same range are selected first, and the one with the minimum cost among them is the final result. When the result contains multiple types, that is, condition (a): only narrowing R, and condition (b): narrowing R and modifying w, we obtain two final solutions according to (2) and (3), respectively.
To avoid meaningless narrowing, we need to determine the limit of the narrowed range. Here, we stipulate that the limit is reached when only one skyline point dominates w. The main steps of the MWR algorithm are as follows:
(1) Calculate count, the number of points in SP(R)_part.
(2) If count = 1, execute the MWP algorithm.
(3) If count > 1, narrow the orthogonal range once and execute SQ(R′) to determine whether the problem still exists.
(4) If the problem persists, return to (1).
(5) If not, return the parameters of the current range.
Example: Take Figure 5 for illustration. The shaded area is the original orthogonal range R. In Figure 5a, let the expected point e = h_7, and get SP(R) = {h_1, h_5, h_6}. Point e = h_7 is the why-not point w because h_6 ∈ SP(R) dominates w. In the MWR algorithm, we first calculate the number of points in SP(R)_part and get count = 1. In this case, we directly modify w. In Figure 5b, let e = h_3. Because points {h_1, h_5} dominate e, count = 2. We first narrow the range to R′ to get SP(R′) = {h_2}. Then, we recalculate and get count = 1. At this point, we directly modify w in R′.
Complexity analysis: The complexity of MWR consists mainly of (1) calculating the number of elements of SP(R)_part, (2) MWP (i.e., O(N log_2 N)) and (3) narrowing the orthogonal range until it reaches the limit. According to the calculation steps mentioned above, the complexity of (1) is O(N log_2 N). The complexity of (3)

Case 2: d = 2 and w ∉ R
In Section 4.1 (Case 1), we discussed how to solve why-not questions in ORSQ when w is in R. Next, we discuss the other case: w is not in R. Considering that the distribution of w is random, the two-dimensional space is divided as shown in Figure 6, where R is the orthogonal query range (shaded region). When w ∉ R, it may be located in A1-A4, B1-B2 or C1-C2.
Obviously, when e ∉ R, e ∉ SP(R). This means that e is the why-not point w. Moreover, modifying the why-not point or narrowing the orthogonal range alone cannot make the new query results contain w, because SQ(R) retrieves only data in R. Therefore, we need to expand the orthogonal range so that w is included in the new orthogonal range. We formally define the MRE algorithm as follows.
Definition 9 (MRE). Given a dataset of hotels H, the orthogonal range R ⊆ H and the expected hotel e ∉ R. When e ∉ SP(R) (i.e., e = w), expand R to R′ so that w ∈ R′. The problem is then converted to Case 1. Moreover, the cost in Formula (8) should be as small as possible.
The cost of the MRE algorithm is discussed separately. When w ∈ SP(R′), the cost is the same as Formula (5). When w ∉ SP(R′), the cost depends on the solution chosen.
The calculation steps of the MRE algorithm are as follows:
(1) Expand R to R′. The boundary of R′ is determined by the boundary of R (xmin, xmax, ymin, ymax) and the coordinates of w (wpx, wpy). The minimum value on the x-axis of R′ is the smaller of wpx and xmin. The maximum value on the x-axis of R′ is the larger of wpx and xmax. The other dimensions are handled in the same way.
(2) Execute SQ(R′) to determine whether the why-not question still exists.
(3) If it does not exist, the problem is solved. If it still exists, the problem turns into Case 1 (w ∈ R′).
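Step (1) of MRE translates into a per-dimension min/max update, sketched here for any number of dimensions. The function name and the coordinates of the sample point are hypothetical; the range values mirror a (price, distance, rating) setting.

```python
def expand_range(r, w):
    """Smallest orthogonal range that contains both r and the point w."""
    return [(min(lo, x), max(hi, x)) for (lo, hi), x in zip(r, w)]


R = [(100.0, 300.0), (100.0, 500.0), (7.0, 10.0)]  # price, distance, rating
w = (350.0, 200.0, 8.0)  # hypothetical why-not hotel that is too expensive for R
print(expand_range(R, w))
```

Only the price interval changes, because the point already satisfies the distance and rating intervals; this keeps the change in R as small as possible.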
Example: In Figure 7, the points in the orthogonal range R are {h_1, h_2, h_3, h_5, h_6, h_7} and SP(R) = {h_1, h_5, h_6}. In Figure 7a, let point e = h_4; e is the why-not point w. According to the MRE algorithm, we expand R to R′ so that w ∈ R′. Next, we calculate SP(R′) and conclude that w ∈ SP(R′). The why-not question is solved. In Figure 7b, let point e = h_8, so e = w. Repeating the above steps, we find that w ∉ SP(R′). It is no longer possible to solve the problem by expanding the orthogonal range. The problem turns into Case 1: the why-not point is in the query range.
Complexity analysis: The complexity of MRE is primarily related to expanding the orthogonal range (i.e., O(1)), the skyline query within the new range (i.e., O(|SQ(R)|)) and the strategy we choose in Case 1.

Case 3: d > 2 and w ∈ R
In the multidimensional case, the solution is similar to that in the two-dimensional case. If w ∉ R, we need to expand R so that w is included in the new orthogonal range. The change in R should be as small as possible. Take Figure 8 as an example. The attributes are the hotel price, the distance from the hotel to the beach and the hotel rating. The range of price is R_1 (100 $ ≤ R_1 ≤ 300 $), the range of distance is R_2 (100 m ≤ R_2 ≤ 500 m) and the range of rating is R_3 (7 ≤ R_3 ≤ 10). The orthogonal range is R = R_1 ∩ R_2 ∩ R_3 and the points in R are {h_1, h_2, h_3, h_5}. If the expected hotel is h_8, we only need to expand R_1 to 100 $ ≤ R_1 ≤ 350 $.
If the why-not point w is in the orthogonal range, we have three solutions: Modify the why-not point. Firstly, we execute the skyline query in the new orthogonal range to find the boundary of the SDR region. Then, based on the boundary of the SDR, we find the range in which the why-not point can move, so that the modified why-not point is not dominated by any point, and the cost is as small as possible.
Narrow the orthogonal range. In order that the why-not point is not dominated by other points, we can narrow the orthogonal range so that the new orthogonal range contains the why-not point and does not contain the point that dominates w. When the result of the skyline query in the new orthogonal query range contains the why-not point, we stop narrowing the orthogonal range.
Narrow the orthogonal range and modify the why-not point if necessary. If the modified why-not point or the narrowed orthogonal range is too far from the expected effect, it has no good reference value and little significance to solve the problem. To this end, we can amend both at the same time in the hope of reaching a compromise solution.

Experimental Results and Discussion
We use Nassau's real hotel dataset, namely a data set collected from Booking.com, and two types of synthetic data, namely anti-correlation data and independent data, of three different sizes (10K, 50K and 100K tuples) to evaluate strategies for why-not questions in ORSQ. According to the method proposed by Borzsony et al. [33], we generate synthetic data sets. The real hotel dataset has properties related to price, distance to the beach, rating, etc. In the experiments, we considered in a two-dimensional space, two numerical properties, namely price, and distance to the beach. It is important to answer the "why-not" questions in real datasets. Take hotels as an example, travelers can find hotels more quickly according to their consumption levels and preferences and can get hotels closer to the why-not point, which is more in line with their expectations.
All experiments were performed on a machine with a 3.9 GHz CPU and 8.0 GB of main memory. We use the BBS algorithm to calculate skyline points. The page_size of the BTree on disk is 2048 bytes, the pointer_size is 4, and the key_size is 8. All algorithms proposed in this paper are implemented in Python. The experimental results are presented below in terms of the effectiveness and performance of the algorithms.

Effectiveness
In this section, we use the Nassau hotel dataset to demonstrate the effectiveness of the algorithms. More specifically, the four strategies proposed in this paper are the MWP, MRN, MWR and MRE algorithms. To test their effectiveness, we select a data point from the remaining non-skyline points as the why-not point w, according to the region division of Figure 6.
When w ∈ R and w = h₂, the running process and results of the MWP, MRN, and MWR algorithms are shown in Table 1. In the MWP algorithm, we find the coordinates of the candidate set C and then calculate the distance between w and each point in C. The candidate with the smallest distance is w′. In the MRN algorithm, we first find SP(R)_part, the set of skyline points that dominate w. Then, we gradually narrow the orthogonal range for each point of SP(R)_part until the why-not question is solved. In the MWR algorithm, we count the number of elements, count, in SP(R)_part. If count = 1, we directly modify the why-not point. If count > 1, R is gradually narrowed until count ≤ 1. When w ∉ R, we need to execute the MRE algorithm first and then combine it with the other algorithms. Because the MWR algorithm combines the MWP and MRN algorithms, we use the combination of the MRE and MWR algorithms to demonstrate the effectiveness.
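The MWP step of choosing the nearest candidate can be sketched as follows. This section does not spell out how the candidate set C is built, so the construction below is an assumption: in two dimensions with smaller values preferred, the candidates are taken just beyond the staircase formed by the skyline points that dominate w. The names `candidate_points` and `modify_why_not_point` and the offset `eps` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def candidate_points(dominators: List[Point], w: Point,
                     eps: float = 1e-6) -> List[Point]:
    """Candidate positions for the modified why-not point in 2D.

    `dominators` are skyline points that dominate w, so after sorting by x
    their y values are decreasing (a staircase).
    """
    s = sorted(dominators)
    cands = [
        (s[0][0] - eps, w[1]),    # move left of every dominator
        (w[0], s[-1][1] - eps),   # move below every dominator
    ]
    # Inner corners of the staircase between consecutive dominators.
    for a, b in zip(s, s[1:]):
        cands.append((b[0] - eps, a[1] - eps))
    return cands


def modify_why_not_point(dominators: List[Point], w: Point,
                         eps: float = 1e-6) -> Point:
    """Pick the candidate closest to w in Euclidean distance."""
    return min(candidate_points(dominators, w, eps),
               key=lambda c: math.dist(w, c))


doms = [(1.0, 4.0), (3.0, 2.0)]
w = (5.0, 5.0)
print(modify_why_not_point(doms, w))
```

Because every candidate has coordinates no larger than those of w, a point that did not dominate w cannot dominate a candidate either, so only the dominators of w need to be considered in this sketch.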
The main steps are: (1) Expand R to R′, and then check whether w ∈ SP(R′). (2) If so, the why-not question is solved. If not, we adopt the MWR algorithm. The running process and results for w = h₄ and w = h₈ are shown in Table 2.
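The two steps above can be sketched as follows, assuming that R is expanded to the smallest rectangle containing both R and w, that smaller values are preferred, and that range bounds are closed. The helper names are hypothetical, and the fallback to MWR is only signalled by the returned flag.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Range = Tuple[Point, Point]  # (lower corner, upper corner), both inclusive


def dominates(p: Point, q: Point) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def in_range(p: Point, r: Range) -> bool:
    lo, hi = r
    return all(l <= v <= h for v, l, h in zip(p, lo, hi))


def expand_range(r: Range, w: Point) -> Range:
    """Smallest rectangle that contains both r and w."""
    lo, hi = r
    new_lo = tuple(min(l, v) for l, v in zip(lo, w))
    new_hi = tuple(max(h, v) for h, v in zip(hi, w))
    return (new_lo, new_hi)


def solved_by_expansion(points: List[Point], r: Range, w: Point):
    """Return the expanded range and whether w is a skyline point in it."""
    r2 = expand_range(r, w)
    inside = [p for p in points if in_range(p, r2)]
    # w belongs to the skyline of r2 iff no point inside r2 dominates it.
    return r2, not any(dominates(p, w) for p in inside)


points = [(2, 2), (4, 1), (1, 6), (5, 5)]
r = ((2, 0), (6, 4))
print(solved_by_expansion(points, r, (1, 6)))  # expansion alone is enough
print(solved_by_expansion(points, r, (5, 5)))  # still dominated: continue with MWR
```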

Performance
In this section, we present the performance of the algorithms using two types of synthetic datasets in three different sizes (10k, 50k and 100k). We use the running time and the cost as the performance evaluation criteria. Let R be the original orthogonal range, let SP(R) ⊆ R denote the query result of SQ(R), and let SP(R)_part ⊆ SP(R) denote the set of points that dominate the why-not point w. The number of elements in SP(R) is n, and the number of elements in SP(R)_part is k (n ≥ k). Next, we show the performance of the algorithms in two cases based on the location of w.
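To make the quantities n and k concrete, the following short sketch computes SP(R) and SP(R)_part for a toy dataset. It assumes smaller values are preferred and uses a naive skyline computation in place of BBS. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Range = Tuple[Point, Point]  # (lower corner, upper corner), both inclusive


def dominates(p: Point, q: Point) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def sp(points: List[Point], r: Range) -> List[Point]:
    """SP(R): skyline of the points inside the orthogonal range R."""
    lo, hi = r
    inside = [p for p in points
              if all(l <= v <= h for v, l, h in zip(p, lo, hi))]
    return [p for p in inside if not any(dominates(q, p) for q in inside)]


def sp_part(points: List[Point], r: Range, w: Point) -> List[Point]:
    """SP(R)_part: the skyline points of R that dominate w."""
    return [s for s in sp(points, r) if dominates(s, w)]


points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
r = ((0, 0), (6, 6))
w = (3, 4)
n = len(sp(points, r))
k = len(sp_part(points, r, w))
print(n, k)
```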

Case 1: w ∈ R
In this case, we compare the performance of the MWP, MRN and MWR algorithms. For each type and size of dataset, we randomly select a point as the why-not point for each value of k.
The costs for anti-correlation data and independent data are shown in Tables 3 and 4, respectively. Let point o ∈ R be the point whose coordinates are the minimum values of R in each dimension, and let dist(w, o) denote the Euclidean distance between w and o. The cost is reported to nine decimal places. In particular, in the MWR algorithm, we use w′ to indicate that w has been modified and R′ to indicate that R has been modified.
The runtime is shown in Figure 9. We use charts to show how the running time of the different algorithms changes for the same dataset type, dataset size, and why-not point. Note that in the 10k anti-correlation data, the computation becomes very expensive when k > 5, so only part of the data is reported as the experimental result. The same holds for the other datasets. Therefore, we chose the two types of datasets of 10k size for the experiments.
By analyzing and comparing the execution time and cost of the algorithms, we can draw the following conclusions:
• With all conditions other than the algorithm held fixed, the ascending order of cost in anti-correlation data is MWP ≤ MWR ≤ MRN. Similarly, in general, the ascending order of cost in independent data is MWR ≤ MRN ≤ MWP.
• With all conditions other than the algorithm held fixed, the ascending order of runtime in the various datasets is MWP ≤ MWR ≤ MRN. However, due to the operating environment, MRN ≤ MWR may occur.
• With all conditions other than k held fixed, the larger k is, the longer the execution time of the algorithm.
In anti-correlation data, MWP and MWR algorithms are superior to the MRN algorithm in terms of cost and runtime. In independent data, MWR and MRN algorithms outperform the MWP algorithm in terms of cost, but the MWP algorithm runs faster than the MWR algorithm and MRN algorithm. In general, the MWP algorithm is optimal if users want to get results extremely quickly and do not change the query range. Moreover, the MWR algorithm is a good choice if users want to get the results relatively quickly and at the lowest possible cost.
Table 3. Cost for anti-corr-data (w ∈ R).

Case 2: w ∉ R
In this case, we compare the performance of the MRE+MWP, MRE+MRN and MRE+MWR algorithms. According to Figure 6, we randomly select a point as the why-not point in each region except R. We use line charts to show how the running time and cost of the different algorithms change for the same dataset type, dataset size, and why-not point. The cost and running time for anti-correlation data and independent data are shown in Figures 10 and 11, respectively. Note that in the 10k anti-correlation data, when w ∈ A2, the computation is extremely expensive and its running time exceeds 72 h. Therefore, the corresponding experimental results are not given here. The same is true for 10k equivalent data.
When w is in one of A3, C2, A4, B1, A1, the execution time of the algorithm is shorter than when w is in one of C1, B2, A2. This is because the former regions are more likely to become the SDR than the latter. In particular, when w ∈ A3, we only need to expand R to R′, and the execution time is the shortest. When w ∈ A2, the execution time is the longest.

Conclusions
With the development of information technology, the why-not question and the skyline query are receiving more and more attention. However, little research addresses the why-not question in skyline queries. This paper answers "why-not" questions in skyline queries based on the orthogonal query range. Skyline queries based on the orthogonal range can help users improve query efficiency. Answering why-not questions in ORSQ can help users analyze query results and make decisions. With the emergence of new technologies such as cloud computing and advances in mobile devices, this research can be applied to mobile devices such as smartphones to provide decision support for users.
In this paper, we studied why-not questions in ORSQ. First, we presented the semantics of the problem. Then, we analyzed the causes of why-not questions in ORSQ. According to the location of the why-not point, we showed how to solve why-not questions in ORSQ by modifying the why-not point or the orthogonal range. Finally, the experimental results demonstrate that the algorithms are effective in answering why-not questions in ORSQ. This paper instantiates the proposed algorithms in two dimensions. In theory, the approach also extends to multi-dimensional settings. In future research, we plan to address this problem in high-dimensional space.