Adaptive Processing for Distributed Skyline Queries over Uncertain Data
Abstract: Query processing over uncertain data has gained growing attention because many real-life applications must deal with uncertain data. In this paper, we investigate skyline queries over uncertain data in distributed environments (the DSUD query), research on which is still at an early stage. The state-of-the-art algorithm for processing this query, called e-DSUD, has the desirable characteristics of progressiveness and minimum bandwidth consumption. However, it still needs improvement in three respects. (1) Progressiveness. It returns at most one query result per iteration. (2) Efficiency. It incurs a significant amount of redundant I/O cost and numerous iterations, which lead to a long total query time. (3) Universality. It is restricted to the case where local skyline tuples are incomparable. To address these concerns, we first present a detailed analysis of the e-DSUD algorithm and then develop an improved framework for the DSUD query, namely IDSUD. Based on the new framework, we propose an adaptive algorithm for the DSUD query, called ADSUD. In this algorithm, we redefine the approximate global skyline probability and adaptively choose local representative tuples according to the minimum probabilistic bounding rectangle. Furthermore, we design a progressive pruning method and apply a reuse mechanism to improve efficiency. Extensive experiments verify that our algorithm achieves better overall performance than the e-DSUD algorithm.
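To make the notion of a skyline probability concrete, the following is a minimal, centralized sketch. It is not the ADSUD or e-DSUD algorithm. It assumes the tuple-level uncertainty model common in this line of work, in which each tuple carries an independent existence probability, and it assumes that smaller attribute values are preferred. Under these assumptions, the skyline probability of a tuple is its own existence probability multiplied by the probability that none of the tuples dominating it exists. All names (`dominates`, `skyline_probability`, `probabilistic_skyline`) and the sample data are hypothetical and chosen only for illustration.

```python
from math import prod
from typing import List, Sequence, Tuple

# An uncertain tuple: (attribute values, existence probability).
# Assumption: smaller attribute values are better on every dimension.
UncertainTuple = Tuple[Sequence[float], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every dimension and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(target: UncertainTuple, data: List[UncertainTuple]) -> float:
    """P(target exists) times P(no tuple dominating target exists).

    Assumes tuples exist independently of one another.
    """
    values, p = target
    return p * prod(1.0 - q for other, q in data if dominates(other, values))


def probabilistic_skyline(data: List[UncertainTuple], threshold: float) -> List[UncertainTuple]:
    """Return the tuples whose skyline probability is at least the threshold."""
    return [t for t in data if skyline_probability(t, data) >= threshold]


# Hypothetical sample data with two attributes per tuple.
a = ((1, 4), 0.8)
b = ((2, 5), 0.9)  # dominated by a
c = ((3, 1), 0.6)  # dominated by no tuple
d = ((4, 6), 0.7)  # dominated by a, b, and c
sample = [a, b, c, d]
result = probabilistic_skyline(sample, threshold=0.3)
```

In a distributed setting the dominating tuples are spread across several sites, so computing this product exactly requires communication between sites. This is the cost that approximate global skyline probabilities and pruning aim to reduce.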