Algorithms that operate on vertically distributed datasets are increasingly important as large-scale, decentralized data storage becomes more prevalent. These algorithms process data locally, which reduces data transfer and exposure to breaches, and they scale better because the data are distributed across multiple sources. Top-k queries are a key tool in vertically distributed scenarios and are widely applied in critical applications involving sensitive data. Classical top-k algorithms typically rely on sorted access to scan the dataset sequentially and on random access to retrieve a tuple by its id. However, random access is sometimes too costly to be feasible, so algorithms need to be designed for the so-called “no random access” (NRA) scenario. The latest efforts in this direction do not cover recent advances in ranking queries, which propose hybridizations of top-k queries (which are preference-aware and control the output size) and skyline queries (which are preference-agnostic and have uncontrolled output size). The non-dominated flexible skyline is one such proposal, which tries to obtain the best of top-k and skyline queries. We introduce an algorithm for computing the non-dominated flexible skyline in the NRA scenario, prove its correctness and optimality within its class, and provide an experimental evaluation covering a wide range of cases, with both synthetic and real datasets.
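
The contrast between the two query types can be illustrated with a minimal sketch. This is not the algorithm introduced in the article, and it ignores the distributed, NRA setting entirely. It assumes that smaller attribute values are preferred and that top-k uses a weighted linear scoring function; the `hotels` data and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are better on every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(data: Dict[str, Point]) -> List[str]:
    """Ids of tuples not dominated by any other tuple (preference-agnostic)."""
    return sorted(
        tid for tid, p in data.items()
        if not any(dominates(q, p) for q in data.values())
    )


def top_k(data: Dict[str, Point], weights: Sequence[float], k: int) -> List[str]:
    """Ids of the k tuples with the lowest weighted sum (preference-aware)."""
    def score(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))

    # Ties are broken by id so that the output is deterministic.
    return sorted(data, key=lambda tid: (score(data[tid]), tid))[:k]


# Hypothetical data: (price, distance), both to be minimized.
hotels = {
    "a": (1.0, 9.0),
    "b": (3.0, 3.0),
    "c": (9.0, 1.0),
    "d": (5.0, 5.0),
}

print(skyline(hotels))                # size depends on the data, not on k
print(top_k(hotels, (0.7, 0.3), 2))   # weights favoring the first attribute
print(top_k(hotels, (0.2, 0.8), 2))   # weights favoring the second attribute
```

The skyline needs no weights but its size cannot be chosen in advance, whereas the top-k result always has k tuples but changes with the weights.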