期刊文献+

Parallel Spectral Clustering Based on MapReduce 被引量:3

Parallel Spectral Clustering Based on MapReduce
在线阅读 下载PDF
导出
摘要 Clustering is one of the most widely used techniques for exploratory data analysis. Spectral clustering algorithm, a popular modern cluslering algorithm, has been shown to be more effective in detecting clusters than many traditional algorithms. It has applications ranging from computer vision and information retrieval to social sienee and biology. With the size of databases soaring, cluostering algorithms bare saling computational time and memory use. In this paper, we propose a parallel spectral elustering implementation based on MapRednee. Both the computation and data storage are dislributed, which solves the sealability problems for most existing algorithms. We empirically analyze the proposed implementation on both benchmark net- works and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. It is shown that the proposed implementation scales well, speeds up the clustering without sacrificing quality, and processes massive datasets efficiently on commodity machine clusters. Clustering is one of the most widely used techniques for exploratory data analysis. Spectral clustering algorithm, a popular modern cluslering algorithm, has been shown to be more effective in detecting clusters than many traditional algorithms. It has applications ranging from computer vision and information retrieval to social sienee and biology. With the size of databases soaring, cluostering algorithms bare saling computational time and memory use. In this paper, we propose a parallel spectral elustering implementation based on MapRednee. Both the computation and data storage are dislributed, which solves the sealability problems for most existing algorithms. We empirically analyze the proposed implementation on both benchmark net- works and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. It is shown that the proposed implementation scales well, speeds up the clustering without sacrificing quality, and processes massive datasets efficiently on commodity machine clusters.
出处 《ZTE Communications》 2013年第2期45-50,共6页 中兴通讯技术(英文版)
关键词 spectral clustering parallel implementation massive dataset Hadoop MapRedue data mining spectral clustering parallel implementation massive dataset Hadoop MapRedue data mining
  • 相关文献

参考文献10

  • 1U. yon Luxburg, A Tutorial on Spectral Clustering," Statistivs and Computing, vol. 17, pp. 395-416, Aug. 2007.
  • 2Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang, "Parallel Spectral Clustering in Distributed Systems, IEEE Transac- tions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 568-586, Mar. 2011.
  • 3U. yon Luxburg, O. Bousquet, and M. Belkin, "Limits of Spe:tral Clustering," Neural Information Processing Systems bbundation, 2004.
  • 4Yangqiu Song, Wen-Yen Chen, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang, Parallel Spectral Clustering," Machine Learning and Knowledge Dis- covery in Datatases, vol. 5212, pp. 374-389, 2008.
  • 5J. Dean and S. Ghemawat, =MapReduce: Simplified Data Prucessing on Large Clusters,= Communications of the ACM-50th anniversary issue: 1958-2008, roe 51, pp. 107-113, Jan. 2008.
  • 6Fan R. K. Chung, Spectral Graph Theory (CBM,q Regional Conference SeHe, in Mathematics, No. 92), Providence, RI: Ameriean Mathematical Soeiety, 2007.
  • 7Weizhong Zhao, Huifang Ma, and Qing He, "Parallel K-Means Clustering Based on MapReduee," Cloud Computing: First International Conferenee, Beijing, Chi- na, Dee. 2009, pp. 674 - 679.
  • 8M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physk'al Review E: 69, 026113, 2004.
  • 9A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E. 70, 066111, 2004.
  • 10A. Lancichinetti, S. Fortunato, and F. Radieehi, "Benchmark graphs for testing community detection algorithms". Physical Review E. 78, 046110, 2008.

同被引文献40

  • 1江小平,李成华,向文,张新访,颜海涛.k-means聚类算法的MapReduce并行化实现[J].华中科技大学学报(自然科学版),2011,39(S1):120-124. 被引量:79
  • 2Santo Fortunato.Community detection in graphs[J].Physics Reports.2009(3)
  • 3Yuan J,Zheng Y,Xie X,Sun G.Driving with Knowledge from the Physical World[].Proceedings of the th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2011
  • 4Brockmann D,Hufnagel L,Geisel T.The scaling laws of human travel[].Nature.2006
  • 5Achrekar H,Gandhe A,Lazarus R,et al.Predicting flu trends u-sing Twitter data[].Proceedings of IEEE Conference on Com-puter Communications Workshops.2011
  • 6LU P,LUO S,HU L,et.al.A novel parallel hierarchical community detection method for large networks. http://biglearn.org/2012/files/papers/biglearning2012_submission_4.pdf . 2013
  • 7TU H,DING J.An efficient clustering algorithm for microblogging hot topic detection[].Proceedings of the International Conference on Computer Science&Service System (CSSS’’).2012
  • 8FERRARI L,ROSI A,MAMEI M,et al.Extracting urban patterns from location-based social networks[].Proceedings of therd ACM SIGSPATIAL International Workshop on Location-based Social Networks (LBSN’’).2011
  • 9PAULOS E,HONICKY R J,HOOKER B.Handbook of research on urban informatics:The practice and promise of the real-time city[]..2008
  • 10OUTRAM C,RATTI C,BIDERMAN A.The copenhagen wheel:An innovative electric bicycle system that harnesses the power of real-time information and crowd sourcing[].Proceedings of the EVER Monaco International Exhibition&Conference on Ecologic Vehicles&Renewable Energies (EVER’’).2010

引证文献3

二级引证文献4

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部