Keyword Search over Distributed Graphs with Compressed Signature
Abstract: Graph keyword search has attracted considerable research interest, since graph models can represent both structured and unstructured databases, and keyword search lets users extract valuable information without knowing the underlying schema or query language. In practice, data graphs can be extremely large, e.g., a Web-scale graph containing billions of vertices. State-of-the-art approaches employ centralized algorithms to process graph keyword searches, and thus they are infeasible for such large graphs, owing to the limited computational power and storage space of a centralized server. To address this problem, we investigate keyword search for Web-scale graphs deployed in a distributed environment. We first give a naive search algorithm to answer the query efficiently. However, the naive search algorithm uses a flooding search strategy that incurs large time and network overhead. To remedy this shortcoming, we then propose a signature-based search algorithm. Specifically, we design a vertex signature that encodes the shortest-path distance from a vertex to any given keyword in the graph. As a result, we can find query answers by exploring fewer paths, so that the time and communication costs are low. Moreover, we reorganize the graph data in the cluster after its initial random partitioning so that the signature-based techniques are more effective. Finally, our experimental results demonstrate the feasibility of our proposed approach in performing keyword searches over Web-scale graph data.
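To make the idea of a vertex signature concrete, the following is a minimal single-machine sketch, not the algorithm proposed in the paper. It assumes an unweighted, undirected graph held in memory, stores the signature uncompressed as a per-vertex dictionary mapping each keyword to a hop distance, and uses the hypothetical helper names `build_signatures` and `guided_path`. It illustrates only why such distances let a search follow a single path toward a keyword instead of flooding all neighbors; compression and distribution are left out.

```python
from collections import deque


def build_signatures(adj, keywords):
    """Return sig[v][k]: hop distance from v to the nearest vertex holding k.

    adj:      dict mapping each vertex to a list of its neighbours
              (assumed undirected and unweighted for this sketch).
    keywords: dict mapping a vertex to the set of keywords it contains.
    """
    sig = {v: {} for v in adj}
    all_keywords = set().union(*keywords.values())
    for k in all_keywords:
        # Multi-source BFS: start from every vertex that contains keyword k.
        queue = deque(v for v in adj if k in keywords.get(v, ()))
        for v in queue:
            sig[v][k] = 0
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if k not in sig[w]:
                    sig[w][k] = sig[u][k] + 1
                    queue.append(w)
    return sig


def guided_path(adj, sig, start, keyword):
    """Walk from start to a nearest vertex holding keyword, or return None.

    At each step only a neighbour whose recorded distance is one smaller
    is followed, so no other path is explored.
    """
    if keyword not in sig[start]:
        return None  # keyword unreachable from start
    path = [start]
    while sig[path[-1]][keyword] > 0:
        cur = path[-1]
        target = sig[cur][keyword] - 1
        path.append(next(w for w in adj[cur] if sig[w].get(keyword) == target))
    return path
```

In this sketch the signature is computed once per keyword and then consulted locally at each vertex, which is what allows the walk to skip neighbors that lead away from the keyword. A real deployment would also need to bound the signature size, which is presumably the role of the compression mentioned in the title.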