Abstract
To address the shortcomings of the density-based spatial clustering of applications with noise (DBSCAN) algorithm, an improved version is proposed that combines a divide-and-conquer strategy with efficient parallel processing. The data are partitioned so that, following the divide-and-conquer idea, the influence of the global parameter Eps is reduced. Parallel processing and dimensionality reduction are used to improve clustering efficiency and to lower DBSCAN's high memory requirements. An incremental processing scheme handles the effect on the clustering of inserting and deleting data objects. The results show that the new method effectively resolves these problems of DBSCAN, and that both its clustering efficiency and its cluster quality are clearly better than those of the traditional DBSCAN algorithm.
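The partition-then-cluster idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the data are partitioned, how each partition's Eps is chosen, or how partial results are merged, so everything below is an assumption made for illustration: the split along the x-coordinate (`partition_by_x`), the median k-distance heuristic (`local_eps`), and the use of a thread pool as a stand-in for the parallel step are all hypothetical. Merging clusters across partition borders, dimensionality reduction, and incremental updates are left out.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue                 # already assigned
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels

def local_eps(points, min_pts):
    """Assumed heuristic: median distance to the min_pts-th nearest neighbor."""
    kth = []
    for p in points:
        d = sorted(dist(p, q) for q in points)   # d[0] is the point itself
        kth.append(d[min(min_pts, len(d) - 1)])
    return sorted(kth)[len(kth) // 2]

def partition_by_x(points, cuts):
    """Assumed partitioning: split at sorted x-coordinate cut values."""
    parts = [[] for _ in range(len(cuts) + 1)]
    for p in points:
        parts[bisect_right(cuts, p[0])].append(p)
    return [part for part in parts if part]      # drop empty partitions

def partitioned_dbscan(points, cuts, min_pts):
    """Cluster each partition with its own Eps; partitions run concurrently."""
    def work(part):
        eps = local_eps(part, min_pts)
        return part, eps, dbscan(part, eps, min_pts)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(work, partition_by_x(points, cuts)))

# Two regions of very different density: grid spacing 1 versus spacing 10.
dense = [(x, y) for x in range(4) for y in range(4)]
sparse = [(100 + 10 * x, 10 * y) for x in range(4) for y in range(4)]
results = partitioned_dbscan(dense + sparse, cuts=[50], min_pts=3)
global_labels = dbscan(dense + sparse, 1.0, 3)   # one global Eps, for contrast
```

In this toy data set, a single global Eps tuned to the dense region marks every point of the sparse region as noise, whereas the per-partition Eps recovers one cluster in each region. Threads are used only to keep the sketch short; in CPython a process pool would be the more natural choice for CPU-bound work.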
Source
Journal of China University of Mining & Technology (《中国矿业大学学报》)
EI
CAS
CSCD
PKU Core (北大核心)
2008, No. 1, pp. 105-111 (7 pages)
Funding
Natural Science Foundation of Fujian Province (A0310008)
Key Project of the Fujian Province High-Tech Research Open Program (2003H043)
Keywords
clustering
DBSCAN
partition
parallel