Computing Non-Dominated Flexible Skylines in Vertically Distributed Datasets with No Random Access
Abstract
1. Introduction
- Summary of contributions. The main contributions of this paper are as follows:
- We consider for the first time the problem of computing the flexible skyline on a vertically distributed dataset when random access is not available (NRA scenario).
- We address the problem by proposing an algorithm that is proved correct and provides strong optimality guarantees on the execution cost.
- We run an extensive set of experiments validating our solution, covering several real and synthetic datasets, and propose two baselines obtained through non-trivial adaptations of the algorithms existing in the literature. Our solution largely outperforms the baselines and proves to be effective, especially in those scenarios (like uniformly distributed datasets) in which early termination can be achieved even in the absence of random access.
2. Preliminaries
3. Flexible NRA
- Access sorted items, one at a time, on all ranked lists. This unveils the value of a tuple on some lists but possibly not on all of them, so we may not be able to compute the overall score of a tuple. We can, nonetheless, compute bounds expressing the best and worst possible values for such a score.
- Keep a buffer of all the seen tuples and, for each of them, compute the worst and the best bound on its overall score (see footnote 2).
- Also, compute a threshold value for the overall score that may be attained by unseen tuples.
- Repeat until there are k tuples in the buffer whose worst bound is no worse than the best bound of all the other seen tuples and the threshold.
- Return such k objects.
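The steps above can be sketched as follows. This is only a minimal illustration of the classic top-k NRA pattern, not the algorithm proposed in this paper. It assumes that higher scores are better, that local scores lie in [0, 1] (so 0 is the worst value an unknown score can take), that the overall score is the sum of the local scores, and that every list ranks the same tuples. The name `nra_top_k` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (tuple id, local score), best local score first
RankedList = List[Tuple[str, float]]


def nra_top_k(lists: List[RankedList], k: int) -> Tuple[List[str], int]:
    """Return (ids of k top tuples, depth reached) using sorted access only.

    Assumptions (not from the paper): scores in [0, 1], higher is better,
    overall score = sum of local scores, all lists rank the same n tuples.
    """
    d = len(lists)
    n = len(lists[0]) if lists else 0
    seen: Dict[str, Dict[int, float]] = {}  # buffer: id -> {list index: score}
    worst: Dict[str, float] = {}

    for depth in range(1, n + 1):
        # One sorted access on every list.
        for i, ranked in enumerate(lists):
            tid, local = ranked[depth - 1]
            seen.setdefault(tid, {})[i] = local

        # Last score seen on each list bounds what is still unknown there.
        last = [ranked[depth - 1][1] for ranked in lists]
        threshold = sum(last)  # best overall score of any unseen tuple

        # Worst bound: unknown scores count as 0; best bound: as last[i].
        worst = {t: sum(kn.get(i, 0.0) for i in range(d)) for t, kn in seen.items()}
        best = {t: sum(kn.get(i, last[i]) for i in range(d)) for t, kn in seen.items()}

        by_worst = sorted(seen, key=lambda t: worst[t], reverse=True)
        if len(by_worst) < k:
            continue
        top, rest = by_worst[:k], by_worst[k:]
        kth_worst = worst[top[-1]]

        # Stop when k tuples cannot be beaten by other seen tuples
        # nor by any unseen tuple (represented by the threshold).
        if kth_worst >= threshold and all(best[t] <= kth_worst for t in rest):
            return top, depth

    # Lists exhausted (e.g., fewer than k tuples overall).
    return sorted(seen, key=lambda t: worst[t], reverse=True)[:k], n
```

For instance, with two lists `[("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]` and `[("b", 0.9), ("a", 0.7), ("d", 0.4), ("c", 0.2)]` and k = 1, the sketch stops after two rounds of sorted access, without ever reading the entries of `c` and `d`.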
Algorithm 1: Algorithmic pattern for computing the operator.
Input: ranked lists, scoring functions
Output: the operator's result
1. B := ∅ // buffer of tuples
2. while lists not exhausted
3. c := 0 // a counter of tuples $\mathcal{F}$-dominating the threshold point
4. make a sorted access on each list and insert/update extracted tuples in B
5. τ := last scores on every list // threshold point
6. for t in seen tuples
7. if worst bound of t $\mathcal{F}$-dominates τ
8. if ++c = k // threshold $\mathcal{F}$-dominated by k tuples
9. break to Line 10
10. // keep digging if at least one tuple can be $\mathcal{F}$-dominated by k tuples
11. remove from B tuples $\mathcal{F}$-dominated by other k tuples
12. for s in B // candidate non-$\mathcal{F}$-dominated tuples
13. c := 0 // a counter of non-$\mathcal{F}$-dominance relationships
14. for t in B // candidate non-$\mathcal{F}$-dominating tuples
15. if best bound of t does not $\mathcal{F}$-dominate worst bound of s
16. c++
17. if k tuples may $\mathcal{F}$-dominate s // we keep deepening
18. make a sorted access on each list and insert/update extracted tuples in B
19. continue to Line 10
20. break
21. return B
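The dominance tests used throughout Algorithm 1 can be illustrated for one special case. The sketch below assumes linear scoring functions whose weight vectors range over a convex polytope, in which case the score difference between two points is linear in the weights and it suffices to compare scores at the polytope's vertices. It also assumes that higher scores are better. The names `f_dominates` and `threshold_dominated_by_k`, the tolerance `eps`, and the representation of bounds as points are hypothetical choices made for illustration, not the paper's implementation.

```python
from typing import Sequence

Vector = Sequence[float]


def score(w: Vector, t: Vector) -> float:
    """Linear scoring function with weight vector w."""
    return sum(wi * ti for wi, ti in zip(w, t))


def f_dominates(t: Vector, s: Vector, vertices: Sequence[Vector],
                eps: float = 1e-12) -> bool:
    """True if t scores no worse than s at every vertex of the weight
    polytope and strictly better at one of them (higher is better).

    Assumption: scoring functions are linear, so checking the vertices
    covers every weight vector in the polytope.
    """
    diffs = [score(w, t) - score(w, s) for w in vertices]
    return all(dv >= -eps for dv in diffs) and any(dv > eps for dv in diffs)


def threshold_dominated_by_k(worst_bounds: Sequence[Vector], tau: Vector,
                             vertices: Sequence[Vector], k: int) -> bool:
    """Check whether at least k worst-bound points dominate the threshold
    point tau, in the spirit of Lines 6-9 of Algorithm 1."""
    count = sum(1 for wb in worst_bounds if f_dominates(wb, tau, vertices))
    return count >= k
```

With weights constrained to the segment between (0.4, 0.6) and (0.6, 0.4), the point (0.9, 0.5) dominates (0.6, 0.6) in this sense, although it does not under plain dominance, which corresponds to the vertices (1, 0) and (0, 1). This is why tighter constraints on the weights allow more tuples to be discarded.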
4. Experiments
- Varying the dataset size N. The effect of the dataset size on the computation of the result through Algorithm 1 is illustrated in Figure 5. We varied the dataset size on UNI using the values indicated in Table 1 and default values (indicated in bold in the table) for all other parameters. Figure 5a reports stacked bars for each of the tested dataset sizes, in which the lower part refers to the depth reached at the end of the growing phase, while the top part indicates the additional depth incurred during the shrinking phase. We observe that, for datasets with uniformly distributed values, such as UNI, the depth grows less than linearly with the dataset size, varying from around with K to around with M. Figure 5b shows the number of $\mathcal{F}$-dominance tests that were executed in order to find the result. Here we can see the effect of the quadratic nature of skyline-based operators such as this one, with the number of tests varying from 387,810 with K to 308,877,008 with M. Execution times are essentially determined by the number of $\mathcal{F}$-dominance tests, which are the most expensive operation in the process. Figure 5c shows that such times, in seconds, vary from s with K to with M, i.e., there is a growth of nearly 3 orders of magnitude, as also observed for the number of $\mathcal{F}$-dominance tests.
- Varying the number of dimensions d. Figure 6 shows the effect of the number of dimensions d on our measurements. Algorithms based on a “no random access” (NRA) policy suffer heavily from the so-called curse of dimensionality, since an increased number of dimensions makes dominance (and $\mathcal{F}$-dominance) relationships less likely, with result sets growing larger and larger. In such cases, an NRA policy essentially mandates a full scan of the dataset, since stopping criteria are met no earlier than that, thereby defeating the very purpose of “early exit” top-k algorithms exploiting the ranking inherent in the vertically distributed sources. For these reasons, we limited our analysis to low values of d (2, 3, and 4). While the charts in Figure 6 are analogous to those in Figure 5, here we see that the effect of increasing d is heavier on the depth, which reaches with 4 dimensions, while it was just with . The larger number of involved tuples, with higher values of d, consequently entails larger numbers of $\mathcal{F}$-dominance tests and longer execution times, as can be seen in Figure 6b,c.
- Varying k. Figure 7 shows the effect of k on the operator. While k is not an exact output size for this operator, it can be considered as the initial output size, which applies when $\mathcal{F}$ contains just one function. We observe here that the depth grows from when to when , i.e., less than linearly as k grows. The number of $\mathcal{F}$-dominance tests and the execution times are, again, tightly connected and mainly depend on the number of tuples that are retained in the growing phase and that, consequently, might need to be removed in the shrinking phase. To this end, Figure 8 shows how the number of retained tuples varies from right after the growing phase, i.e., when the buffer has its largest size, shown in Figure 8b, to the end of the execution, when the buffer contains the final result, whose size is shown in Figure 8a. With our default spread value , the output size does not grow much larger than k, topping for . By contrast, the number of tuples retained at the end of the growing phase goes from just 599 for to 5898 for , thus causing the steep increase in the number of $\mathcal{F}$-dominance tests shown in Figure 7b.
- Varying the spread. The effect of the spread and, more precisely, of the constraints used on the weights to determine $\mathcal{F}$ is shown in Figure 9. In particular, we vary the spread of the constraints shown in (4) so that the operator ranges from a pure top-k query (with just one linear scoring function) to a pure k-skyband query (with all possible linear scoring functions, which, as is well known [10], result-wise have the same power as all the monotone scoring functions). Figure 9a shows that, on the UNI dataset with default parameter values, small values of the spread make the operator deviate very little from the behavior of a top-k query, and this is also confirmed in terms of $\mathcal{F}$-dominance tests (Figure 9b) and execution time (Figure 9c). Some growth is visible starting at , and it is definitely evident for , where the depth nearly doubles with respect to the case (none). However, we also observe that the computational toll is entirely ascribable to the shrinking phase, which requires more deepening to satisfy looser constraints. We intentionally left out of the charts the case where the weights can vary freely (full), because there we experience an explosion in the depth (reaching vs. just with ) as well as in the $\mathcal{F}$-dominance tests (nearly 53 M vs. 14 M) and execution times ( s vs. s), which would make the charts difficult to read.
- Varying the batch size. In order to reduce the number of times the stopping criterion is checked (which requires a high number of $\mathcal{F}$-dominance tests), one could try to increase the number of rows read by sorted access at once. Normally, d sorted accesses (one per list) are made, and then the threshold-based stopping condition is checked (this happens during both the growing phase and the shrinking phase). Reducing the frequency of the checks to once per batch of accesses may entail a significant speed-up. Additionally, this behavior mimics the case of online services returning results in pages of a given size. Figure 10 shows the effect of varying the batch size from the no-batch scenario to . While the no-batch scenario guarantees that the minimum depth will be attained during the execution, larger values will have looser guarantees on the depth, but might drastically reduce the number of incurred $\mathcal{F}$-dominance tests and, consequently, the execution time. Figure 10a shows that increasing the batch size causes the final depth to be a multiple of the batch size itself, but this negative effect may not be overall prevalent. Indeed, while in the no-batch scenario we reach the minimum depth (1691), this only increases to 1700 when and to 1800 when , while the largest increase is experienced for , with a depth of 2000. We observe that, while the depth is more or less stable for this dataset during the growing phase when the batch size varies (with values ranging from 949 to 1000), larger changes are found during the shrinking phase (values from 742 to 1000). However, the increase in depth is worthwhile if we look at the number of $\mathcal{F}$-dominance tests (Figure 10b) and execution times (Figure 10c): the number of $\mathcal{F}$-dominance tests plummets from 372 M when to just M when and times go from 147 s when to just 2 s when . Due to the almost negligible difference between the cases and , we chose the former as the default value to use in the experiments, as it harms depth less.
- Other datasets. As we mentioned, we also executed our experiments against the ANT family of datasets. However, due to the very nature of these datasets, an NRA-based approach like the one described in Algorithm 1 is inherently ineffective. Indeed, even with the most favorable working conditions (, K, , , ), the depth explored by the algorithm is almost as large as the dataset size. In particular, with this specific configuration, the depth was of the dataset size, and the run required 58 M $\mathcal{F}$-dominance tests with an execution time of s. Clearly, larger dataset sizes and less favorable conditions would lead to a full scan of the dataset, with consequently higher execution times and numbers of $\mathcal{F}$-dominance tests.
- Final observations. Our experiments show that the algorithmic scheme we proposed is effective in computing the results, especially in scenarios with uniformly distributed data, which are more likely to allow early termination even in the absence of random access. In all other scenarios, particularly unfavorable configurations might be difficult to manage. In particular, when the growing phase first ends, the transition to the shrinking phase takes a heavy computational toll, since many tuples are in the buffer and many need to be removed, with non-negligible costs that are quadratic in the buffer size.
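Two of the cost effects discussed in this section can be made concrete with a small sketch: the quadratic number of tests incurred when pruning the buffer, and the rounding of the final depth to a multiple of the batch size. The code is illustrative only. It uses plain dominance with higher-is-better values as a stand-in for the actual dominance test, it assumes that the stopping condition, once met, remains met at larger depths, and the names `shrink` and `batched_depth` as well as the batch sizes in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dominates(t: Point, s: Point) -> bool:
    """Plain dominance (higher is better); a stand-in for the real test."""
    return (all(a >= b for a, b in zip(t, s))
            and any(a > b for a, b in zip(t, s)))


def shrink(buffer: List[Point], k: int) -> Tuple[List[Point], int]:
    """Drop points dominated by at least k others.

    Returns the surviving points and the number of dominance tests
    performed, which is at most n * (n - 1) for a buffer of n points.
    """
    tests = 0
    kept: List[Point] = []
    for i, s in enumerate(buffer):
        dominators = 0
        for j, t in enumerate(buffer):
            if i == j:
                continue
            tests += 1
            if dominates(t, s):
                dominators += 1
                if dominators == k:
                    break  # s is already known to be removable
        if dominators < k:
            kept.append(s)
    return kept, tests


def batched_depth(min_depth: int, batch: int) -> int:
    """Smallest multiple of the batch size that is not below min_depth.

    Models checking the stopping condition only once per batch.
    """
    return (min_depth + batch - 1) // batch * batch
```

For a buffer of n tuples, this naive filter performs at most n(n − 1) tests, which is why a buffer that is ten times larger at the end of the growing phase can cost roughly a hundred times more tests. As for batching, `batched_depth(1691, 1)` leaves the minimum depth of 1691 unchanged, while a hypothetical batch of 1000 rows gives `batched_depth(1691, 1000) == 2000`.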
5. Related Work
6. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
1 | More precisely, if we call the score of a tuple with attribute values x and y, we have , corresponding to the straight line , whose angular coefficient is , i.e., . |
2 | The original algorithm uses lower and upper bounds. For generality, with respect to the adopted convention, we prefer to talk about the worst and best bounds here. |
References
- Fagin, R. Combining Fuzzy Information from Multiple Systems. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Montreal, QC, Canada, 3–5 June 1996; pp. 216–226.
- Ertz, M.; Leblanc-Proulx, D.; Sarigöllü, E.; Morin, C. Web Scraping Techniques and Applications: A Literature Review. J. Bus. Res. 2023, 142, 1–13.
- Carro, M. NoSQL Databases. arXiv 2014, arXiv:1401.2101.
- Zhang, W.; Liu, J.; Chen, L. Automatic Web Data API Creation via Cross-Lingual Neural Pagination. In Proceedings of the 2022 International Conference on Web Engineering, Bari, Italy, 5–8 July 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 115–130.
- McSherry, F.; Lattuada, A.; Schwarzkopf, M.; Roscoe, T. Shared Arrangements: Practical Inter-Query Sharing for Streaming Dataflows. Proc. VLDB Endow. 2020, 13, 1793–1806.
- Alabdulkarim, A.; Bhowmick, S.S. Efficient and Secure Multiparty Querying over Federated Graph Databases. In Proceedings of the 2024 International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 13–16 May 2024; pp. 1234–1245.
- Fagin, R.; Lotem, A.; Naor, M. Optimal Aggregation Algorithms for Middleware. In Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Santa Barbara, CA, USA, 21–23 May 2001.
- Börzsönyi, S.; Kossmann, D.; Stocker, K. The Skyline Operator. In Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, 2–6 April 2001; pp. 421–430.
- Ciaccia, P.; Martinenghi, D. Reconciling Skyline and Ranking Queries. Proc. VLDB Endow. 2017, 10, 1454–1465.
- Ciaccia, P.; Martinenghi, D. Flexible Skylines: Dominance for Arbitrary Sets of Monotone Functions. ACM Trans. Database Syst. 2020, 45, 18:1–18:45.
- Papadias, D.; Tao, Y.; Fu, G.; Seeger, B. Progressive skyline computation in database systems. ACM Trans. Database Syst. 2005, 30, 41–82.
- Ciaccia, P.; Martinenghi, D. FA + TA < FSA: Flexible Score Aggregation. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, 22–26 October 2018; pp. 57–66.
- Mamoulis, N.; Yiu, M.L.; Cheng, K.H.; Cheung, D.W. Efficient top-k aggregation of ranked inputs. ACM Trans. Database Syst. 2007, 32, 19.
- Kung, H.T.; Luccio, F.; Preparata, F.P. On Finding the Maxima of a Set of Vectors. J. ACM 1975, 22, 469–476.
- Chomicki, J.; Godfrey, P.; Gryz, J.; Liang, D. Skyline with Presorting. In Proceedings of the 19th International Conference on Data Engineering, Bangalore, India, 5–8 March 2003; pp. 717–719.
- Godfrey, P.; Shipley, R.; Gryz, J. Maximal Vector Computation in Large Data Sets. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, 30 August–2 September 2005; Böhm, K., Jensen, C.S., Haas, L.M., Kersten, M.L., Larson, P., Ooi, B.C., Eds.; ACM: New York, NY, USA, 2005; pp. 229–240.
- Bartolini, I.; Ciaccia, P.; Patella, M. SaLSa: Computing the skyline without scanning the whole sky. In Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management, Arlington, VA, USA, 6–11 November 2006; Yu, P.S., Tsotras, V.J., Fox, E.A., Liu, B., Eds.; ACM: New York, NY, USA, 2006; pp. 405–414.
- Godfrey, P.; Shipley, R.; Gryz, J. Algorithms and analyses for maximal vector computation. VLDB J. 2007, 16, 5–28.
- Chomicki, J. Semantic optimization techniques for preference queries. Inf. Syst. 2007, 32, 670–684.
- Lee, K.C.K.; Zheng, B.; Li, H.; Lee, W. Approaching the Skyline in Z Order. In Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria, 23–27 September 2007; Koch, C., Gehrke, J., Garofalakis, M.N., Srivastava, D., Aberer, K., Deshpande, A., Florescu, D., Chan, C.Y., Ganti, V., Kanne, C., et al., Eds.; ACM: New York, NY, USA, 2007; pp. 279–290.
- Sheng, C.; Tao, Y. On finding skylines in external memory. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, Athens, Greece, 12–16 June 2011; Lenzerini, M., Schwentick, T., Eds.; ACM: New York, NY, USA, 2011; pp. 107–116.
- Fagin, R. Fuzzy Queries in Multimedia Database Systems. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Seattle, WA, USA, 1–3 June 1998; pp. 1–10.
- Akbarinia, R.; Pacitti, E.; Valduriez, P. Best Position Algorithms for Top-k Queries. In Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria, 23–27 September 2007; Koch, C., Gehrke, J., Garofalakis, M.N., Srivastava, D., Aberer, K., Deshpande, A., Florescu, D., Chan, C.Y., Ganti, V., Kanne, C., et al., Eds.; ACM: New York, NY, USA, 2007; pp. 495–506.
- Bast, H.; Majumdar, D.; Schenkel, R.; Theobald, M.; Weikum, G. IO-Top-k: Index-access Optimized Top-k Query Processing. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Republic of Korea, 12–15 September 2006; Dayal, U., Whang, K., Lomet, D.B., Alonso, G., Lohman, G.M., Kersten, M.L., Cha, S.K., Kim, Y., Eds.; ACM: New York, NY, USA, 2006; pp. 475–486.
- Schnaitter, K.; Polyzotis, N. Evaluating rank joins with optimal cost. In Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, Vancouver, BC, Canada, 9–11 June 2008; Lenzerini, M., Lembo, D., Eds.; ACM: New York, NY, USA, 2008; pp. 43–52.
- Lange, D.; Naumann, F. Bulk sorted access for efficient top-k retrieval. In Proceedings of the Conference on Scientific and Statistical Database Management, SSDBM ’13, Baltimore, MD, USA, 29–31 July 2013; Szalay, A., Budavari, T., Balazinska, M., Meliou, A., Sacan, A., Eds.; ACM: New York, NY, USA, 2013; pp. 39:1–39:4.
- Luo, Y.; Wang, W.; Lin, X.; Zhou, X.; Wang, J.; Li, K. SPARK2: Top-k Keyword Query in Relational Databases. IEEE Trans. Knowl. Data Eng. 2011, 23, 1763–1780.
- Luo, Y.; Lin, X.; Wang, W.; Zhou, X. Spark: Top-k keyword query in relational databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, China, 12–14 June 2007; Chan, C.Y., Ooi, B.C., Zhou, A., Eds.; ACM: New York, NY, USA, 2007; pp. 115–126.
- Xin, D.; Han, J.; Chang, K.C. Progressive and selective merge: Computing top-k with ad-hoc ranking functions. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, China, 12–14 June 2007; Chan, C.Y., Ooi, B.C., Zhou, A., Eds.; ACM: New York, NY, USA, 2007; pp. 103–114.
- Güntzer, U.; Balke, W.; Kießling, W. Towards Efficient Multi-Feature Queries in Heterogeneous Environments. In Proceedings of the 2001 International Symposium on Information Technology (ITCC 2001), Las Vegas, NV, USA, 2–4 April 2001; pp. 622–628.
- Fagin, R.; Lotem, A.; Naor, M. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 2003, 66, 614–656.
- Fagin, R. Combining Fuzzy Information: An Overview. SIGMOD Rec. 2002, 31, 109–118.
- Theobald, M.; Weikum, G.; Schenkel, R. Top-k Query Evaluation with Probabilistic Guarantees. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, ON, Canada, 31 August–3 September 2004; Nascimento, M.A., Özsu, M.T., Kossmann, D., Miller, R.J., Blakeley, J.A., Schiefer, K.B., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 2004; pp. 648–659.
- Gurský, P.; Vojtás, P. Speeding Up the NRA Algorithm. In Proceedings of the Scalable Uncertainty Management, Second International Conference, SUM 2008, Naples, Italy, 1–3 October 2008; Lecture Notes in Computer Science. Greco, S., Lukasiewicz, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5291, pp. 243–255.
- Yuan, J.; Sun, G.; Tian, Y.; Chen, G.; Liu, Z. Selective-NRA Algorithms for Top-k Queries. In Proceedings of the Advances in Data and Web Management, Joint International Conferences, APWeb/WAIM 2009, Suzhou, China, 2–4 April 2009; Lecture Notes in Computer Science. Li, Q., Feng, L., Pei, J., Wang, X.S., Zhou, X., Zhu, Q., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5446, pp. 15–26.
- Yuan, J.; Sun, G.; Luo, T.; Lian, D.; Chen, G. Efficient processing of top-k queries: Selective NRA algorithms. J. Intell. Inf. Syst. 2012, 39, 687–710.
- Chen, L.; Hwang, K.; Wu, J. MapReduce Skyline Query Processing with a New Angular Partitioning Approach. In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPS 2012, Shanghai, China, 21–25 May 2012; pp. 2262–2270.
- Mullesgaard, K.; Pedersen, J.L.; Lu, H.; Zhou, Y. Efficient Skyline Computation in MapReduce. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, 24–28 March 2014.
- Zhang, J.; Jiang, X.; Ku, W.; Qin, X. Efficient Parallel Skyline Evaluation Using MapReduce. IEEE Trans. Parallel Distrib. Syst. 2016, 27, 1996–2009.
- Koh, J.; Chen, C.; Chan, C.; Chen, A.L.P. MapReduce skyline query processing with partitioning and distributed dominance tests. Inf. Sci. 2017, 375, 114–137.
- Kim, J.; Kim, M.H. An efficient parallel processing method for skyline queries in MapReduce. J. Supercomput. 2018, 74, 886–935.
- Wang, W.; Zhang, J.; Sun, M.; Ku, W. Efficient Parallel Spatial Skyline Evaluation Using MapReduce. In Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, 21–24 March 2017; Markl, V., Orlando, S., Mitschang, B., Andritsos, P., Sattler, K., Breß, S., Eds.; OpenProceedings.org: Konstanz, Germany, 2017; pp. 426–437.
- Li, C.; Gu, Y.; Qi, J.; Yu, G. SkyCell: A Space-Pruning Based Parallel Skyline Algorithm. arXiv 2021, arXiv:2107.09993.
- Cui, B.; Chen, L.; Xu, L.; Lu, H.; Song, G.; Xu, Q. Efficient Skyline Computation in Structured Peer-to-Peer Systems. IEEE Trans. Knowl. Data Eng. 2009, 21, 1059–1072.
- Lee, J.; Hwang, S. Scalable skyline computation using a balanced pivot selection technique. Inf. Syst. 2014, 39, 1–21.
- Han, X.; Wang, B.; Li, J.; Gao, H. Ranking the big sky: Efficient top-k skyline computation on massive data. Knowl. Inf. Syst. 2019, 60, 415–446.
- Song, B.; Liu, A.; Ding, L. Efficient Top-k Skyline Computation in MapReduce. In Proceedings of the 12th Web Information System and Application Conference, WISA 2015, Jinan, China, 11–13 September 2015; pp. 67–70.
- Liu, A. Top-k Skyline Result Optimization Algorithm in MapReduce. In Proceedings of the 14th International Conference on Computer Science & Education, ICCSE 2019, Toronto, ON, Canada, 19–21 August 2019; pp. 466–471.
- Li, C.; Gu, Y.; Qi, J.; Yu, G. Parallel Skyline Processing Using Space Pruning on GPU. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; Hasan, M.A., Xiong, L., Eds.; ACM: New York, NY, USA, 2022; pp. 1074–1083.
- Tang, M.; Yu, Y.; Aref, W.G.; Malluhi, Q.M.; Ouzzani, M. Efficient Parallel Skyline Query Processing for High-Dimensional Data. In Proceedings of the 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, 8–11 April 2019; pp. 2113–2114.
- Wijayanto, H.; Wang, W.; Ku, W.; Chen, A.L.P. LShape Partitioning: Parallel Skyline Query Processing Using MapReduce. IEEE Trans. Knowl. Data Eng. 2022, 34, 3363–3376.
- Patil, M.; Shah, R.; Thankachan, S.V. Top-k join queries: Overcoming the curse of anti-correlation. In Proceedings of the 17th International Database Engineering & Applications Symposium, IDEAS ’13, Barcelona, Spain, 9–11 October 2013; Desai, B.C., Larriba-Pey, J.L., Bernardino, J., Eds.; ACM: New York, NY, USA, 2013; pp. 76–85.
- Martinenghi, D.; Tagliasacchi, M. Proximity Rank Join. Proc. VLDB Endow. 2010, 3, 352–363.
- Martinenghi, D.; Tagliasacchi, M. Cost-Aware Rank Join with Random and Sorted Access. IEEE Trans. Knowl. Data Eng. 2012, 24, 2143–2155.
- Miao, X.; Gao, Y.; Zheng, B.; Chen, G.; Cui, H. Top-k Dominating Queries on Incomplete Data. IEEE Trans. Knowl. Data Eng. 2016, 28, 252–266.
- Soliman, M.A.; Ilyas, I.F.; Chang, K.C. Probabilistic top-k and ranking-aggregate queries. ACM Trans. Database Syst. 2008, 33, 13:1–13:54.
- Liu, D.; Wan, C.; Xiong, N.; Yang, L.T.; Chen, L. Two Novel Semantics of Top-k Queries Processing in Uncertain Database. In Proceedings of the 10th IEEE International Conference on Computer and Information Technology, CIT 2010, Bradford, UK, 29 June–1 July 2010; pp. 651–659.
- Lian, X.; Chen, L. Top-k dominating queries in uncertain databases. In Proceedings of the EDBT 2009, 12th International Conference on Extending Database Technology, Saint Petersburg, Russia, 24–26 March 2009; ACM International Conference Proceeding Series. Kersten, M.L., Novikov, B., Teubner, J., Polutin, V., Manegold, S., Eds.; ACM: New York, NY, USA, 2009; Volume 360, pp. 660–671.
- Li, F.; Yi, K.; Le, W. Top-k queries on temporal data. VLDB J. 2010, 19, 715–733.
- Rocha-Junior, J.B.; Nørvåg, K. Top-k spatial keyword queries on road networks. In Proceedings of the 15th International Conference on Extending Database Technology, EDBT ’12, Berlin, Germany, 27–30 March 2012; Proceedings. Rundensteiner, E.A., Markl, V., Manolescu, I., Amer-Yahia, S., Naumann, F., Ari, I., Eds.; ACM: New York, NY, USA, 2012; pp. 168–179.
- Wang, D.; Zou, L.; Zhao, D. Top-k queries on RDF graphs. Inf. Sci. 2015, 316, 201–217.
- Mouratidis, K. Geometric Approaches for Top-k Queries. Proc. VLDB Endow. 2017, 10, 1985–1987.
- Lee, J.; Lee, D.; Hwang, S. CrowdK: Answering top-k queries with crowdsourcing. Inf. Sci. 2017, 399, 98–120.
- Vlachou, A.; Doulkeridis, C.; Kotidis, Y.; Nørvåg, K. Reverse top-k queries. In Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, Long Beach, CA, USA, 1–6 March 2010; Li, F., Moro, M.M., Ghandeharizadeh, S., Haritsa, J.R., Weikum, G., Carey, M.J., Casati, F., Chang, E.Y., Manolescu, I., Mehrotra, S., et al., Eds.; IEEE Computer Society: Los Alamitos, CA, USA, 2010; pp. 365–376.
- Farazi, S.; Rafiei, D. Top-K Frequent Term Queries on Streaming Data. In Proceedings of the 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, 8–11 April 2019; pp. 1582–1585.
- Cheng, J.; Qi, S.; An, B.; Qi, Y.; Wang, J.; Qiao, Y. Lightweight verifiable blockchain top-k queries. Future Gener. Comput. Syst. 2024, 156, 105–115.
- Li, X.; Bai, L.; Miao, Y.; Ma, S.; Ma, J.; Liu, X.; Choo, K.R. Privacy-Preserving Top-k Spatial Keyword Queries in Fog-Based Cloud Computing. IEEE Trans. Serv. Comput. 2023, 16, 504–514.
- Ilyas, I.F.; Beskales, G.; Soliman, M.A. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 2008, 40, 1–58.
- Mouratidis, K.; Tang, B. Exact Processing of Uncertain Top-k Queries in Multi-criteria Settings. Proc. VLDB Endow. 2018, 11, 866–879.
- Mouratidis, K.; Li, K.; Tang, B. Marrying Top-k with Skyline Queries: Relaxing the Preference Input while Producing Output of Controllable Size. In Proceedings of the SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, 20–25 June 2021; pp. 1317–1330.
- Nanongkai, D.; Sarma, A.D.; Lall, A.; Lipton, R.J.; Xu, J.J. Regret-Minimizing Representative Databases. Proc. VLDB Endow. 2010, 3, 1114–1124.
- Soliman, M.A.; Ilyas, I.F.; Martinenghi, D.; Tagliasacchi, M. Ranking with uncertain scoring functions: Semantics and sensitivity measures. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, 12–16 June 2011; pp. 805–816.
- Papadias, D.; Tao, Y.; Fu, G.; Seeger, B. An Optimal and Progressive Algorithm for Skyline Queries. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, CA, USA, 9–12 June 2003; Halevy, A.Y., Ives, Z.G., Doan, A., Eds.; ACM: New York, NY, USA, 2003; pp. 467–478.
- Lin, X.; Yuan, Y.; Zhang, Q.; Zhang, Y. Selecting Stars: The k Most Representative Skyline Operator. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, Istanbul, Turkey, 15–20 April 2007; Chirkova, R., Dogac, A., Özsu, M.T., Sellis, T.K., Eds.; IEEE Computer Society: Los Alamitos, CA, USA, 2007; pp. 86–95.
- Tao, Y.; Ding, L.; Lin, X.; Pei, J. Distance-Based Representative Skyline. In Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, Shanghai, China, 29 March–2 April 2009; Ioannidis, Y.E., Lee, D.L., Ng, R.T., Eds.; IEEE Computer Society: Los Alamitos, CA, USA, 2009; pp. 892–903.
- Mouratidis, K.; Zhang, J.; Pang, H. Maximum Rank Query. Proc. VLDB Endow. 2015, 8, 1554–1565.
- Ciaccia, P.; Martinenghi, D. Directional Queries: Making Top-k Queries More Effective in Discovering Relevant Results. Proc. ACM Manag. Data 2024, 2, 1–26.
- Masciari, E. Trajectory Clustering via Effective Partitioning. In Proceedings of the Flexible Query Answering Systems, 8th International Conference, FQAS 2009, Roskilde, Denmark, 26–28 October 2009; pp. 358–370.
- Masciari, E.; Mazzeo, G.M.; Zaniolo, C. Analysing microarray expression data through effective clustering. Inf. Sci. 2014, 262, 32–45.
- Fazzinga, B.; Flesca, S.; Masciari, E.; Furfaro, F. Efficient and effective RFID data warehousing. In Proceedings of the International Database Engineering and Applications Symposium (IDEAS 2009), Cetraro, Calabria, Italy, 16–18 September 2009; ACM International Conference Proceeding Series. Desai, B.C., Saccà, D., Greco, S., Eds.; ACM: New York, NY, USA, 2009; pp. 251–258.
- Fazzinga, B.; Flesca, S.; Furfaro, F.; Masciari, E. RFID-data compression for supporting aggregate queries. ACM Trans. Database Syst. 2013, 38, 11.
- Galli, L.; Fraternali, P.; Martinenghi, D.; Tagliasacchi, M.; Novak, J. A Draw-and-Guess Game to Segment Images. In Proceedings of the 2012 International Conference on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 International Conference on Social Computing, SocialCom 2012, Amsterdam, The Netherlands, 3–5 September 2012; pp. 914–917.
- Loni, B.; Menéndez, M.; Georgescu, M.; Galli, L.; Massari, C.; Altingövde, I.S.; Martinenghi, D.; Melenhorst, M.S.; Vliegendhart, R.; Larson, M.A. Fashion-focused creative commons social dataset. In Proceedings of the Multimedia Systems Conference 2013, MMSys ’13, Oslo, Norway, 27 February–1 March 2013; Griwodz, C., Ed.; ACM: New York, NY, USA, 2013; pp. 72–77.
- Bozzon, A.; Catallo, I.; Ciceri, E.; Fraternali, P.; Martinenghi, D.; Tagliasacchi, M. A Framework for Crowdsourced Multimedia Processing and Querying. In Proceedings of the First International Workshop on Crowdsourcing Web Search, Lyon, France, 17 April 2012; CEUR Workshop Proceedings. CEUR-WS.org: Aachen, Germany, 2012; Volume 842, pp. 42–47.
- Martinenghi, D.; Torlone, R. Querying Context-Aware Databases. In Proceedings of the Flexible Query Answering Systems, 8th International Conference, FQAS 2009, Roskilde, Denmark, 26–28 October 2009; Lecture Notes in Computer Science. Andreasen, T., Yager, R.R., Bulskov, H., Christiansen, H., Larsen, H.L., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5822, pp. 76–87.
- Deutch, D.; Milo, T.; Polyzotis, N. Top-k queries over web applications. VLDB J. 2013, 22, 519–542.
- Dembinski, P.; Maluszynski, J. AND-Parallelism with Intelligent Backtracking for Annotated Logic Programs. In Proceedings of the 1985 Symposium on Logic Programming, Boston, MA, USA, 15–18 July 1985; pp. 29–38.
- Halevy, A.Y. Answering Queries Using Views: A Survey. Very Large Database J. 2001, 10, 270–294.
- Millstein, T.D.; Halevy, A.Y.; Friedman, M. Query containment for data integration systems. J. Comput. Syst. Sci. 2003, 66, 20–39.
- Florescu, D.; Levy, A.Y.; Manolescu, I.; Suciu, D. Query Optimization in the Presence of Limited Access Patterns. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, USA, 1–3 June 1999; pp. 311–322.
- Li, C.; Chang, E. Query Planning with Limited Source Capabilities. In Proceedings of the Sixteenth IEEE International Conference on Data Engineering (ICDE 2000), San Diego, CA, USA, 29 February–3 March 2000; pp. 401–412.
- Li, C.; Chang, E. On Answering Queries in the Presence of Limited Access Patterns. In Proceedings of the Eighth International Conference on Database Theory (ICDT 2001), London, UK, 4–6 January 2001; pp. 219–233.
- Li, C.; Chang, E. Answering Queries with Useful Bindings. ACM Trans. Database Syst. 2001, 26, 313–343.
- Li, C. Computing Complete Answers to Queries in the Presence of Limited Access Patterns. Very Large Database J. 2003, 12, 211–227.
- Calì, A.; Martinenghi, D. Conjunctive Query Containment under Access Limitations. In Proceedings of the Conceptual Modeling—ER 2008, 27th International Conference on Conceptual Modeling, Barcelona, Spain, 20–24 October 2008; Lecture Notes in Computer Science. Li, Q., Spaccapietra, S., Yu, E.S.K., Olivé, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5231, pp. 326–340. [Google Scholar] [CrossRef]
- Calì, A.; Martinenghi, D. Querying Data under Access Limitations. In Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, Cancún, Mexico, 7–12 April 2008; Alonso, G., Blakeley, J.A., Chen, A.L.P., Eds.; IEEE Computer Society: Los Alamitos, CA, USA, 2008; pp. 50–59. [Google Scholar] [CrossRef]
- Calì, A.; Calvanese, D.; Martinenghi, D. Dynamic Query Optimization under Access Limitations and Dependencies. J. Univ. Comput. Sci. 2009, 15, 33–62. [Google Scholar] [CrossRef]
- Duschka, O.M.; Levy, A.Y. Recursive Plans for Information Gathering. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI’97), Nagoya, Japan, 23–29 August 1997; pp. 778–784. [Google Scholar]
- Rajaraman, A.; Sagiv, Y.; Ullman, J.D. Answering Queries Using Templates with Binding Patterns. In Proceedings of the Fourteenth ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems (PODS’95), San Jose, CA, USA, 22–25 May 1995. [Google Scholar]
- Deutsch, A.; Ludäscher, B.; Nash, A. Rewriting queries using views with access patterns under integrity constraints. Theor. Comput. Sci. 2007, 371, 200–226. [Google Scholar] [CrossRef]
- Nash, A.; Ludäscher, B. Processing first-order queries under limited access patterns. In Proceedings of the Twentythird ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems (PODS 2004), Paris, France, 14–16 June 2004; pp. 307–318. [Google Scholar]
- Ludäscher, B.; Nash, A. Processing union of conjunctive queries with negation under limited access patterns. In Proceedings of the Ninth International Conference on Extending Database Technology (EDBT 2004), Heraklion, Crete, Greece, 14–18 March 2004; pp. 422–440. [Google Scholar]
- Yang, G.; Kifer, M.; Chaudhri, V.K. Efficiently ordering subgoals with access constraints. In Proceedings of the Twentyfifth ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems (PODS 2006), Chicago, IL, USA, 26–28 June 2006; p. 22. [Google Scholar]
- Calì, A.; Martinenghi, D. Querying the deep web. In Proceedings of the EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, 22–26 March 2010; ACM International Conference Proceeding Series. Manolescu, I., Spaccapietra, S., Teubner, J., Kitsuregawa, M., Léger, A., Naumann, F., Ailamaki, A., Özcan, F., Eds.; ACM: New York, NY, USA, 2010; Volume 426, pp. 724–727. [Google Scholar] [CrossRef]
Parameter | Tested values |
---|---|
Distribution | synthetic: UNI, ANT; real: NBA |
Synthetic dataset size (N) | 10 K, 50 K, 100 K, 500 K, 1 M |
Number of dimensions (d) | 2, 3, 4 |
k | 1, 2, 5, 10, 20, 50, 100 |
Spread | none, 1%, 2%, 5%, 10%, 20%, 50%, full |
Batch size | 1, 10, 100, 1000 |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Martinenghi, D. Computing Non-Dominated Flexible Skylines in Vertically Distributed Datasets with No Random Access. Data 2025, 10, 76. https://doi.org/10.3390/data10050076