Clustering and Summarizing Protein-Protein Interaction Networks: A Survey
Abstract: The increasing availability and significance of large-scale protein-protein interaction (PPI) data have resulted in a flurry of research activity aimed at comprehending the organization, processes, and functioning of cells by analyzing these data at the network level. Network clustering, which analyzes the topological and functional properties of a PPI network to identify clusters of interacting proteins, has gained significant popularity in both the bioinformatics and data mining research communities. Many studies over the last decade have shown that clustering PPI networks is an effective approach for tasks such as identifying functional modules and revealing the functions of unknown proteins. In this paper, we examine this issue by classifying, discussing, and comparing a wide range of approaches proposed by the bioinformatics community to cluster PPI networks. A recurring aim of this review is to emphasize the uniqueness of the network clustering problem in the context of PPI networks and to highlight why generic network clustering algorithms proposed by the data mining community cannot be directly adopted to address this problem effectively. We also review a problem closely related to PPI network clustering, network summarization, which can help us make sense of the information contained in large PPI networks by generating multi-level functional summaries.