
Rough K-prototypes clustering algorithm based on information entropy (基于信息熵的粗糙K-prototypes聚类算法) Cited by: 4
Abstract: The traditional K-prototypes algorithm does not account for how strongly each categorical attribute influences the clustering result when it computes the dissimilarity of categorical attributes. It is also sensitive to noise and cannot handle the uncertainty that arises from imprecise or incomplete data. To address these shortcomings, a rough K-prototypes clustering algorithm based on information entropy is proposed. When computing the categorical dissimilarity between data samples, information entropy is used to determine the weight of each categorical attribute's influence on the clustering result. Rough set theory is then introduced to compute the rough dissimilarity between each sample and each rough mode, and the final clustering result is obtained through repeated iteration. By combining information entropy with rough set theory, the algorithm treats the categorical attributes differently and handles the uncertainty caused by imprecise data. Experimental analysis on four UCI data sets confirms the effectiveness of the algorithm.
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal (北大核心), 2015, No. 5, pp. 1239-1243 (5 pages)
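Funding: National Natural Science Foundation of China (61364006); Guangxi Natural Science Foundation (2013GXNSFAA019336, 2013GXNSFBA019280); Guangxi Higher Education Institutions Science and Technology Research Fund (LX2014190); Guangxi University of Science and Technology Science Fund (校科自1261128)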
Keywords: mixed data; clustering; information entropy; rough set; data mining
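
The abstract states that information entropy determines a weight for each categorical attribute, but this record does not give the formula. The following is a minimal sketch under stated assumptions, not the authors' method: each weight is taken to be the attribute's share of the total Shannon entropy, and the weighted mismatch count is combined with squared Euclidean distance in the usual K-prototypes form. The function names and the `gamma` balancing parameter are hypothetical, and the rough-set step is not covered.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical column."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(rows):
    """Return one weight per categorical attribute.

    Assumption: a weight is the attribute's share of the total entropy.
    The paper may use a different (for example, inverse) mapping.
    """
    columns = list(zip(*rows))
    entropies = [attribute_entropy(col) for col in columns]
    total = sum(entropies)
    if total == 0:
        # Every column is constant: fall back to uniform weights.
        return [1.0 / len(columns)] * len(columns)
    return [h / total for h in entropies]


def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, weights, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus a
    weighted mismatch count on categorical attributes."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical = sum(w for w, a, b in zip(weights, x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


if __name__ == "__main__":
    rows = [("red", "s"), ("red", "m"), ("blue", "s"), ("blue", "m")]
    weights = entropy_weights(rows)
    print(weights)
    print(mixed_dissimilarity([1.0], ("red", "s"), [0.0], ("blue", "s"), weights))
```

Under this assumption, an attribute that takes the same value in every sample has zero entropy and therefore contributes nothing to the categorical dissimilarity.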


