Fast Top-K Path-Based Relevance Query on Massive Graphs
Abstract: Obtaining the items highly relevant to a given set of query items is a key task in applications such as recommendation and relationship prediction. Path-based relevance metrics, a family of metrics that quantify item relevance based on the paths in an item graph, have been shown to capture relevance effectively in many applications. Despite their effectiveness, path-based relevance metrics normally require time-consuming iterative computation. We propose an approach that quickly obtains the top-k most relevant items for a given query item set. Our approach uses novel score bounds to detect when the top-k items emerge during the computation. The approach is designed for a distributed environment, which allows it to scale to massive graphs with billions of nodes. Our experimental results show that the proposed approach returns results up to two orders of magnitude faster than previously proposed approaches and scales well with both the input size and the number of machines used in the computation.
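The abstract does not specify the relevance metric or the form of the score bounds. As a minimal single-machine sketch of the general idea of bound-based early termination, the code below swaps in random walk with restart (RWR), a personalized-PageRank-style path-based score, computed one hop at a time. After each hop the accumulated score of every node is a lower bound on its final score, and the walk mass still in flight, scaled by the decay, bounds what any node can still gain. The top-k set is certified once the k-th largest lower bound is at least the (k+1)-th largest lower bound plus that residual. The function name `top_k_rwr`, the adjacency-dict input format, the restart probability `alpha`, and the residual-mass bound are all illustrative assumptions, not the paper's method.

```python
import heapq
from collections import defaultdict


def top_k_rwr(graph, query, k, alpha=0.15, max_iters=100):
    """Illustrative stand-in (not the paper's algorithm): top-k nodes by a
    hop-by-hop random-walk-with-restart score, stopping early once simple
    lower/upper score bounds certify the top-k set.

    graph: hypothetical format, dict mapping node -> list of out-neighbours
    query: non-empty iterable of query nodes (restart is uniform over them)
    Returns (top-k node list, number of hops computed).
    """
    query = list(query)
    # x is the walk distribution after `step` hops; x_0 is uniform over the query set.
    x = {q: 1.0 / len(query) for q in query}
    score = defaultdict(float)  # lower bound on each node's final score
    decay = 1.0  # (1 - alpha) ** step
    for step in range(max_iters):
        # Add this hop's contribution alpha * (1 - alpha)^step * x_step.
        for v, m in x.items():
            score[v] += alpha * decay * m
        # Spread mass evenly to out-neighbours; mass at dangling nodes is dropped.
        nxt = defaultdict(float)
        for v, m in x.items():
            nbrs = graph.get(v, [])
            if nbrs:
                share = m / len(nbrs)
                for u in nbrs:
                    nxt[u] += share
        x = nxt
        decay *= 1.0 - alpha
        # Total mass of all future contributions is at most decay * ||x||_1,
        # so no node (seen or unseen) can gain more than `residual`.
        residual = decay * sum(x.values())
        top = heapq.nlargest(k + 1, score.items(), key=lambda kv: kv[1])
        if len(top) >= k:
            kth = top[k - 1][1]
            # Best lower bound outside the current top-k (0.0 for unseen nodes).
            next_best = top[k][1] if len(top) > k else 0.0
            if kth >= next_best + residual:
                return [v for v, _ in top[:k]], step + 1
        if not x:  # all walk mass has been absorbed; scores are final
            break
    top = heapq.nlargest(k, score.items(), key=lambda kv: kv[1])
    return [v for v, _ in top], step + 1


if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
    print(top_k_rwr(g, {"a"}, k=2, alpha=0.5))
```

The certificate guarantees the membership of the top-k set, not the order within it, and a larger `alpha` shrinks the residual faster so termination comes sooner. In a distributed setting the propagation loop would be partitioned across machines; this sketch does not attempt that.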