
Survey on Density Peak Clustering Algorithm (密度峰值聚类算法综述)

Cited by: 53
Abstract: Density peak (DPeak) clustering is a simple but effective clustering algorithm. It maps data of arbitrary dimension onto a 2-dimensional space and constructs a hierarchical relationship among all data points in the reduced space. This makes it easy to pick out the distinguished points (density peaks), each of which has high density and lies far from other regions of higher density; these points can serve as cluster centers. Based on the constructed hierarchy, the algorithm offers two ways to complete the final clustering: an interactive decision diagram and an automatic method. This paper traces the development and application of DPeak in recent years and summarizes its improvements and variants from the following aspects. First, the principle of DPeak is introduced and its position in the taxonomy of clustering algorithms is discussed. After comparing DPeak with five main clustering algorithms, the authors find that it closely resembles mean shift and therefore suggest that it may be a special variant of mean shift. Second, several shortcomings of DPeak are discussed, such as high time complexity, lack of adaptability, low precision, and poor suitability for high-dimensional data, and the algorithms that address these shortcomings are reviewed by category. In addition, applications of DPeak in different fields, such as natural language processing, biomedical analysis, and optical applications, are surveyed. Finally, the problems and challenges facing DPeak are discussed and directions for future work are outlined.
Authors: Chen Yewang (陈叶旺); Shen Lianlian (申莲莲); Zhong Caiming (钟才明); Wang Tian (王田); Chen Yi (陈谊); Du Jixiang (杜吉祥). Affiliations: College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021; Beijing Key Laboratory of Big Data Technology for Food Safety (Beijing Technology and Business University), Beijing 100048; Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou, Jiangsu 215006; Fujian Key Laboratory of Big Data Intelligence and Security (Huaqiao University), Xiamen, Fujian 361021; College of Information, Ningbo University, Ningbo, Zhejiang 315211.
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2020, No. 2, pp. 378-394 (17 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
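Funding: National Natural Science Foundation of China (61673186, 71771094, 61876068, 61972010); Quanzhou High-Level Talent Innovation and Entrepreneurship Project (2018C114R, 2018C110R); Fujian Provincial Science and Technology Program (2017H01010065, 2019H01010129).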
Keywords: clustering algorithm; density peak; big data; data mining; density clustering
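
The mapping to two dimensions described in the abstract can be illustrated with a small sketch. It is not taken from the survey: it assumes the commonly described formulation in which each point gets a local density rho (here a simple cutoff count within a radius d_c) and a distance delta to its nearest point of higher density, and it stands in for the automatic selection step by taking the points with the largest rho * delta as centers. The function name density_peak_cluster and the parameters d_c and n_clusters are hypothetical names chosen for illustration.

```python
import numpy as np


def density_peak_cluster(X, d_c, n_clusters):
    """Illustrative DPeak sketch (assumed formulation, not the survey's code)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # rho: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from highest to lowest density (ties broken by sort order).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # The densest point has no denser neighbour; use its largest distance.
    top = order[0]
    delta[top] = dist[top].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # points ranked as denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]  # distance to the nearest denser point
        parent[i] = j          # hierarchy link used for label propagation
    # Assumed automatic rule: centers are the points with the largest rho * delta.
    gamma = rho * delta
    centers = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Every remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels


# Usage on two synthetic, well-separated blobs (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
rho, delta, labels = density_peak_cluster(X, d_c=1.0, n_clusters=2)
```

Plotting delta against rho gives the kind of 2-dimensional view that the interactive decision diagram relies on: points that stand out on both axes are the candidate density peaks. The pairwise distance matrix makes this sketch quadratic in the number of points, which is consistent with the complexity concern raised in the abstract.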
