
Parallel implementation of improved K-means algorithm based on Spark (cited by: 13)
Abstract: To address the shortcomings of the K-means clustering algorithm, this paper proposes an improved K-means algorithm with better performance. It uses the simplified silhouette coefficient as the evaluation criterion for the k value and uses K-means++ to select the initial center points. Once k and the initial centers are set, it uses morphology similarity distance as the similarity measure to assign each data point to the cluster formed by its nearest center, and finally computes the average silhouette coefficient to determine a suitable k. The algorithm is parallelized on Spark. Experiments on four standard datasets, covering accuracy, running time, and speedup, show that compared with traditional K-means and SKDK-means, the improved algorithm improves clustering quality, shortens computation time, and shows good parallel performance in a multi-node cluster environment. The experimental results suggest that the improved algorithm effectively improves execution efficiency and parallel computing capability.
Authors: Du Jiaying (杜佳颖); Duan Longzhen (段隆振); Duan Wenying (段文影); Bu Qiujin (卜秋瑾) (College of Information Engineering, Nanchang University, Nanchang 330031, China)
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core, 2020, No. 2, pp. 434-436, 497 (4 pages)
Funding: National Natural Science Foundation of China (61070139, 81460769).
Keywords: clustering algorithm; simplified silhouette coefficient; morphology similarity distance (MSD); similarity measurement
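
The pipeline described in the abstract (K-means++ seeding, nearest-center assignment, and choosing k by the average simplified silhouette coefficient) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation. All function names here are hypothetical. Euclidean distance stands in for the morphology similarity distance, which the abstract does not define, and the `dist` parameter marks where a different measure could be plugged in. The Spark parallelization is not shown. The simplified silhouette used here is the common centroid-based form: for each point, a is the distance to its own center, b is the distance to the nearest other center, and the score is (b - a) / max(a, b).

```python
import math
import random

def euclidean(p, q):
    # Stand-in distance; the paper uses morphology similarity distance (not shown).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeanspp_init(points, k, dist, rng):
    # K-means++ seeding: each new center is sampled with probability
    # proportional to the squared distance to the nearest chosen center.
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(dist(p, c) for c in centers) ** 2 for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(p)
                break
        else:
            centers.append(points[-1])  # fallback for rounding or all-zero weights
    return centers

def assign(points, centers, dist):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def kmeans(points, k, dist=euclidean, iters=50, seed=0):
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, dist, rng)
    for _ in range(iters):
        labels = assign(points, centers, dist)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers, dist)

def simplified_silhouette(points, centers, labels, dist=euclidean):
    # Average of (b - a) / max(a, b) using distances to centers only.
    # Requires at least two centers.
    total = 0.0
    for p, l in zip(points, labels):
        a = dist(p, centers[l])
        b = min(dist(p, c) for j, c in enumerate(centers) if j != l)
        total += 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values, dist=euclidean, seed=0):
    # Run K-means for each candidate k (k >= 2) and keep the best-scoring one.
    scores = {}
    for k in k_values:
        centers, labels = kmeans(points, k, dist, seed=seed)
        scores[k] = simplified_silhouette(points, centers, labels, dist)
    return max(scores, key=scores.get), scores
```

Calling `choose_k(points, range(2, 6))` on a list of coordinate tuples returns the selected k together with the score for each candidate. Because the simplified coefficient only needs point-to-center distances, each point's contribution can be computed independently, which is what makes this kind of scoring a natural fit for a partitioned, distributed evaluation.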