Abstract
To address the imprecise clustering results that density-based clustering algorithms produce on high-dimensional and multi-density datasets, a clustering algorithm based on Shared Nearest Neighbor Affinity (SNNA) is proposed. The algorithm draws on k-nearest neighbors and shared nearest neighbors, and defines shared nearest neighbor affinity as the local density measure of an object. It first extracts core points according to their affinity, then clusters the core points using breadth-first search, and finally assigns the non-core points to clusters, which completes the clustering of the whole dataset. Experimental results show that the algorithm can discover clusters of arbitrary shape, size, and density; compared with similar algorithms, SNNA achieves higher clustering accuracy on high-dimensional data.
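The three stages named in the abstract (core-point extraction by affinity, breadth-first search over core points, assignment of non-core points) can be illustrated with a minimal sketch. The abstract does not give the affinity formula, so everything specific here is an assumption made for illustration: affinity is taken as the mean number of neighbors a point shares with each of its own k nearest neighbors, two core points are linked when each appears in the other's k-nearest-neighbor list, and a non-core point joins the cluster of its nearest core point. The names `knn_sets`, `snna_sketch`, and `core_threshold` are hypothetical and do not come from the paper.

```python
from collections import deque
from math import dist


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def snna_sketch(points, k, core_threshold):
    """Return one cluster label per point; -1 means unassigned (no core points exist)."""
    n = len(points)
    knn = knn_sets(points, k)

    # ASSUMED affinity: mean number of neighbors that a point shares with
    # each of its own k nearest neighbors. The paper's formula may differ.
    affinity = [sum(len(knn[i] & knn[j]) for j in knn[i]) / k for i in range(n)]
    is_core = [a >= core_threshold for a in affinity]

    # Stage 2: breadth-first search that grows clusters over core points only.
    labels = [-1] * n
    cluster_id = 0
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in knn[i]:
                # ASSUMED link rule: core points that list each other as neighbors.
                if is_core[j] and labels[j] == -1 and i in knn[j]:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1

    # Stage 3 (ASSUMED rule): a non-core point takes the label of its nearest core point.
    core_points = [i for i in range(n) if is_core[i]]
    for i in range(n):
        if not is_core[i] and core_points:
            nearest = min(core_points, key=lambda c: dist(points[i], points[c]))
            labels[i] = labels[nearest]
    return labels
```

Calling `snna_sketch(points, k=3, core_threshold=0.9)` on a list of coordinate tuples returns one integer label per point. The brute-force neighbor search is quadratic in the number of points and is meant only to keep the sketch self-contained.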
Authors
QIU Baozhi (邱保志)
XIN Hang (辛杭)
School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2018, Issue 18, pp. 184-187, 222 (5 pages in total)
Funding
Henan Province Basic and Frontier Fund (No. 152300410191)
Keywords
clustering
density
shared nearest neighbor
affinity
data mining