Parallel Spectral Clustering Based on MapReduce (Cited by: 3)

Abstract: Clustering is one of the most widely used techniques for exploratory data analysis. Spectral clustering, a popular modern clustering algorithm, has been shown to be more effective at detecting clusters than many traditional algorithms. It has applications ranging from computer vision and information retrieval to social science and biology. As databases grow, clustering algorithms face scaling problems in computational time and memory use. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both the computation and the data storage are distributed, which solves the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on both benchmark networks and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. The results show that the proposed implementation scales well, speeds up clustering without sacrificing quality, and processes massive datasets efficiently on commodity machine clusters.
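The abstract does not describe the implementation itself, so the following is only a minimal sketch of how a spectral split could be phrased as map and reduce steps. It assumes an undirected weighted edge list and uses an in-memory stand-in for a MapReduce job; the function names (`map_reduce`, `degrees`, `normalized_matvec`, `bipartition`) and the choice of power iteration are illustrative assumptions, not the authors' method or the Hadoop API.

```python
from collections import defaultdict
from math import sqrt

def map_reduce(records, mapper, reducer):
    """Tiny in-memory stand-in for a MapReduce job (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def degrees(edges):
    """Job 1: weighted degree of every vertex from an undirected edge list."""
    def mapper(edge):
        u, v, w = edge
        yield u, w
        yield v, w
    return map_reduce(edges, mapper, lambda _key, values: sum(values))

def normalized_matvec(edges, deg, x):
    """Job 2: y = D^{-1/2} A D^{-1/2} x, two partial products per edge."""
    def mapper(edge):
        u, v, w = edge
        scale = w / sqrt(deg[u] * deg[v])
        yield u, scale * x[v]
        yield v, scale * x[u]
    return map_reduce(edges, mapper, lambda _key, values: sum(values))

def bipartition(edges, iterations=200):
    """Two-way spectral split via power iteration on I + D^{-1/2} A D^{-1/2}."""
    deg = degrees(edges)
    nodes = sorted(deg)
    # The top eigenvector of the iterated matrix is proportional to sqrt(degree).
    norm = sqrt(sum(deg.values()))
    top = {n: sqrt(deg[n]) / norm for n in nodes}
    x = {n: float(i + 1) for i, n in enumerate(nodes)}  # arbitrary fixed start
    for _ in range(iterations):
        # Deflate: remove the component along the trivial top eigenvector.
        dot = sum(x[n] * top[n] for n in nodes)
        x = {n: x[n] - dot * top[n] for n in nodes}
        # Apply I + D^{-1/2} A D^{-1/2}, then renormalize.
        y = normalized_matvec(edges, deg, x)
        x = {n: x[n] + y.get(n, 0.0) for n in nodes}
        length = sqrt(sum(value * value for value in x.values()))
        x = {n: value / length for n, value in x.items()}
    # The sign of the degree-rescaled second eigenvector gives the cluster label.
    return {n: int(x[n] / sqrt(deg[n]) > 0) for n in nodes}

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
labels = bipartition(edges)
```

On this toy graph the two triangles receive different labels. Which triangle is labeled 0 or 1 depends on the sign of the converged vector, so only the grouping is meaningful. A k-way clustering would typically compute several eigenvectors and run k-means on the embedded vertices, which this sketch omits.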

Authors: Qiwei Zhong, Yunlong Lin, Junyang Zou, Kuangyan Zhu, Qiao Wang, Lei Hu

Affiliations: School of Information Science and Engineering; ZTE Corporation

Source: ZTE Communications (中兴通讯技术（英文版）), 2013, No. 2, pp. 45-50 (6 pages)

Keywords: spectral clustering; parallel implementation; massive dataset; Hadoop; MapReduce; data mining

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
