Scalable Parallel Community Detection for Large-Scale Graphs
Abstract
Community detection, also known as graph clustering, is essential to many graph analysis applications and can help researchers in a wide range of fields gain deep insights into graphs. However, in the era of big data, a large-scale graph containing billions or trillions of vertices and edges is difficult to process efficiently on a single machine. Although parallel clustering on distributed-memory machines provides a natural solution, designing scalable and accurate parallel community detection algorithms remains an open problem because of imbalanced partitioning and convergence issues. In this dissertation, we holistically address these two challenges and develop a set of scalable solutions for representative graph clustering algorithms, as well as a unified, end-to-end optimized graph clustering framework. First, we design a parallel Louvain algorithm by studying the relationship between graph structural properties and clustering quality, carefully handling ghost vertices between graph partitions, and heuristically partitioning large-scale graphs in a balanced way. This algorithm achieves higher accuracy and better scalability than the existing method. Second, we employ a vertex-delegate partitioning method and design a novel distributed Louvain algorithm that balances workload and communication across a large number of processors on large graphs. We use a collective operation to synchronize the information on delegated vertices and design a new heuristic strategy to ensure the convergence of the algorithm. Our algorithm shows superior scalability and parallel efficiency compared with existing work. Third, taking advantage of vertex-delegate partitioning, we design and implement a new distributed Infomap algorithm that scales to 4,096 cores and shows a significant performance improvement over the previous state-of-the-art distributed Infomap algorithm.
Last but not least, by studying the common computation and communication patterns in the distributed Louvain and Infomap algorithms, we design a unified, end-to-end optimized graph clustering framework based on an asynchronous graph visitor queue. Our framework not only ensures balanced workload and communication but also improves accuracy over existing synchronous algorithms by using immediately updated information. This unified framework will facilitate our future study of graph community detection.
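To make the idea of vertex-delegate partitioning more concrete, the following is a minimal, simplified sketch, not the dissertation's implementation. It assumes integer vertex ids, a hypothetical degree `threshold` above which a vertex becomes a delegate (replicated on every processor), a hypothetical ownership rule `owner(x) = x % num_procs` for the remaining vertices, and round-robin placement of edges between two delegates. The point it illustrates is that the edges of a high-degree hub are spread across processors instead of all landing on the hub's single owner.

```python
from collections import defaultdict


def delegate_partition(edges, num_procs, threshold):
    """Simplified vertex-delegate edge partitioning (illustrative sketch).

    edges: list of undirected edges (u, v) with integer vertex ids.
    Returns (parts, delegates): per-processor edge lists and the set of
    high-degree vertices treated as delegates (replicated everywhere).
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Assumption: any vertex whose degree reaches the threshold is a delegate.
    delegates = {x for x, d in degree.items() if d >= threshold}

    def owner(x):
        # Hypothetical ownership rule for non-delegate vertices.
        return x % num_procs

    parts = [[] for _ in range(num_procs)]
    next_proc = 0  # round-robin cursor for delegate-delegate edges
    for u, v in edges:
        if u not in delegates:
            p = owner(u)          # edge stored with its low-degree endpoint
        elif v not in delegates:
            p = owner(v)          # hub edge follows its low-degree neighbor
        else:
            p = next_proc         # both endpoints are hubs: spread evenly
            next_proc = (next_proc + 1) % num_procs
        parts[p].append((u, v))
    return parts, delegates


# A star graph: hub 0 connected to vertices 1..8.
parts, delegates = delegate_partition([(0, i) for i in range(1, 9)], 4, 4)
print(delegates, [len(p) for p in parts])
```

For the star graph, a plain owner-based scheme would place all eight edges on the hub's owner, whereas here each of the four processors receives two. In the distributed algorithms described above, the state of each delegated vertex (for example, its community id) would then be kept consistent across processors with a collective operation; this sketch covers only the partitioning step.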
Subject Area
Computer science
Recommended Citation
Zeng, Jianping, "Scalable Parallel Community Detection for Large-Scale Graphs" (2017). ETD collection for University of Nebraska-Lincoln. AAI10683599.
https://digitalcommons.unl.edu/dissertations/AAI10683599