Human powered data cleaning for probabilistic reachability queries on uncertain graphs

Page 1

Human-Powered Powered Data Cleaning for Probabilistic Reachability Queries on Uncertain Graphs

Abstract: Uncertain graph models are widely used in real real-world world applications such as knowledge graphs and social networks. To capture the uncertainty, each edge in an uncertain graph is associated with an existential probability that signifies the likelihood of the existence xistence of the edge. One notable issue of querying uncertain graphs is that the results are sometimes uninformative because of the edge uncertainty. In this paper, we consider probabilistic reachability queries, which are one of the fundamental classes of graph queries. To make the results more informative, we adopt a crowdsourcing crowdsourcing-based based approach to clean the uncertain edges. However, considering the time and monetary cost of crowdsourcing, it is a problem to efficiently select a limited set of edges for ccleaning leaning that maximizes the quality improvement. We prove that the edge selection problem is #P-hard. #P In light of the hardness of the problem, we propose a series of edge selection algorithms, followed by a number of optimization techniques and pruning heuristics istics for reducing the computation time. Our experimental results demonstrate that our proposed techniques outperform a random selection by up to 27 times in terms of the result quality improvement and the brute-force brute solution by up to 60 times in terms o of the elapsed time.


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.