摘要
密度峰值聚类(DPC)算法是一种基于密度的聚类算法。该算法原理简单、运行高效,可以找到任意非球形类簇。但是该算法存在一些缺陷:首先,该算法局部密度定义的度量准则不统一且两者的聚类结果存在较大差异;其次,该算法的分配策略易产生分配连带错误,即一旦某一个样本分配错误,会导致后续一连串的样本分配错误。为解决这些问题,提出了一种加权K近邻和多簇合并的密度峰值聚类算法(WKMM-DPC)。该算法结合加权K近邻的思想,引入样本的权重系数,重新定义样本的局部密度,使局部密度更加依赖于K近邻内样本的位置,且统一了密度定义的度量准则;定义了类簇间的相似度,并据此度量准则进行多簇合并,以避免分配剩余样本时的分配连带错误。在人工和UCI数据集上的实验表明,该算法的聚类效果优于FKNNDPC、DPCSA、FNDPC、DPC和DBSCAN算法。
Density peaks clustering(DPC) algorithm is a clustering algorithm based on density. The algorithm is simple in principle and efficient in operation, and can find any non-spherical class clusters. However, there are some defects in the algorithm. Firstly, the measurement criteria defined by the local density are not uniform and there are great differences in the clustering results. Secondly, the allocation strategy is prone to allocation errors, that is once a sample is incorrectly allocated, a series of subsequent samples will be incorrectly allocated too. In order to solve these problems, this paper proposes a weighted K-nearest neighbors and multi-cluster merge density peaks clustering(WKMM-DPC) algorithm. Combined with the idea of weighted K-nearest neighbors, the local density of the sample is redefined by introducing the weight coefficient of the sample, which makes the local density more dependent on the position of the sample in the K-nearest neighbors, and unifies the measurement criteria of density definition. The similarity between clusters is defined, and the clusters are merged according to the metric to avoid the joint error in the allocation of remaining samples. Experiments on artificial and UCI datasets show that the clustering performance of the proposed algorithm is better than that of FKNN-DPC, DPCSA, FNDPC, DPC and DBSCAN algorithms.
作者
陈磊
吴润秀
李沛武
赵嘉
CHEN Lei;WU Runxiu;LI Peiwu;ZHAO Jia(School of Information Engineering,Nanchang Institute of Technology,Nanchang 330099,China)
出处
《计算机科学与探索》
CSCD
北大核心
2022年第9期2163-2176,共14页
Journal of Frontiers of Computer Science and Technology
基金
江西省教育厅科技项目(GJJ180940)
国家自然科学基金(52069014,51669014)
江西省杰出青年基金(2018ACB21029)。
关键词
聚类
局部密度
密度峰值
K近邻(KNN)
多簇合并
clustering
local density
density peaks
K-nearest neighbors(KNN)
multi-cluster merge