Set Pair Granule Hierarchical Clustering Algorithm for Incomplete Data (cited by: 6)
Abstract: Existing hierarchical clustering algorithms have difficulty handling incomplete data sets. To address this while also accounting for the uncertain relationship between samples and clusters, a set pair granule hierarchical clustering algorithm for incomplete data, SPGCURE, is proposed. First, missing values are handled using set pair information granules: rather than deleting or imputing missing attributes as earlier algorithms do, missing attribute values are represented by the discrepancy degree of the set pair connection degree, and an improved set pair information distance is proposed to measure how close incomplete data samples are to one another. Second, based on the improved set pair distance, the intra-cluster average distance of each cluster is defined, yielding set pair granule hierarchical clusters represented by a positive region Cs (the sample definitely belongs to the cluster), a boundary region Cu (the sample may belong to the cluster), and a negative region Co (the sample does not belong to the cluster). SPGCURE applies to both complete and incomplete data. Finally, five classic UCI data sets are used to evaluate the algorithm against commonly used classical and improved clustering algorithms. The results show that SPGCURE achieves good clustering performance in accuracy, F-measure, adjusted Rand index, and normalized mutual information.
Authors: ZHANG Chun-ying; GAO Rui-yan; FAN Yu-xiang; WANG Long-fei; PEI Tian-shuai; FENG Xiao-ze; REN Jing (College of Science, North China University of Science and Technology, Tangshan 063210, China; Key Laboratory of Data Science and Application of Hebei Province, Tangshan 063210, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2021, No. 3, pp. 522-530 (9 pages)
Funding: Supported by the Natural Science Foundation of Hebei Province (F2018209374, F2016209344).
Keywords: incomplete data; set pair information granule; hierarchical clustering; set pair information distance
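
The abstract's idea of representing missing attribute values by the discrepancy degree, rather than deleting or imputing them, can be illustrated with a minimal Python sketch. The abstract does not give the paper's improved distance formula or its region thresholds, so everything specific here is an assumption made for illustration: the tolerance `tol`, the uncertainty coefficient `i`, the distance `c + i * b`, the thresholds `alpha` and `beta`, and all function names are hypothetical and are not taken from SPGCURE itself.

```python
import math
from typing import Optional, Sequence, Tuple

Value = Optional[float]  # None (or NaN) marks a missing attribute value


def is_missing(v: Value) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def connection_degree(x: Sequence[Value], y: Sequence[Value],
                      tol: float = 0.1) -> Tuple[float, float, float]:
    """Return (a, b, c): identity, discrepancy and contrary degrees.

    Assumption: attributes are numeric and scaled to [0, 1]; two present
    values count as identical when they differ by at most `tol`.
    Any attribute missing in either sample is counted as discrepancy.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal length")
    same = unknown = opposite = 0
    for xv, yv in zip(x, y):
        if is_missing(xv) or is_missing(yv):
            unknown += 1          # missing value -> uncertain relation
        elif abs(xv - yv) <= tol:
            same += 1
        else:
            opposite += 1
    m = len(x)
    return same / m, unknown / m, opposite / m


def set_pair_distance(x: Sequence[Value], y: Sequence[Value],
                      tol: float = 0.1, i: float = 0.5) -> float:
    """Illustrative distance (an assumption, not the paper's formula):
    the contrary degree plus a share `i` in [0, 1] of the discrepancy."""
    _, b, c = connection_degree(x, y, tol)
    return c + i * b


def three_way_region(sample: Sequence[Value],
                     cluster: Sequence[Sequence[Value]],
                     alpha: float, beta: float,
                     tol: float = 0.1, i: float = 0.5) -> str:
    """Place a sample in Cs, Cu or Co by its mean distance to a cluster.

    Assumption: hypothetical thresholds alpha <= beta stand in for the
    paper's intra-cluster average distance criterion.
    """
    mean_d = sum(set_pair_distance(sample, member, tol, i)
                 for member in cluster) / len(cluster)
    if mean_d <= alpha:
        return "Cs"   # definitely belongs
    if mean_d <= beta:
        return "Cu"   # may belong
    return "Co"       # does not belong


if __name__ == "__main__":
    x = [0.2, None, 0.9, 0.5]
    y = [0.25, 0.4, 0.1, None]
    print(connection_degree(x, y))   # (0.25, 0.5, 0.25)
    print(set_pair_distance(x, y))   # 0.5
```

In this sketch the two samples share one matching attribute, one clearly differing attribute, and two attributes that cannot be compared because a value is missing, so half of the relation is carried by the discrepancy term instead of being discarded or filled in.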