
A Novel K-Nearest Neighbor Method with an Application to Mixed-Attribute Data (一种适用于混合属性数据的K近邻方法)

Cited by: 2
Abstract: The traditional K-nearest neighbor (KNN) algorithm applies only to numerical attribute data. To address this, the paper proposes a KNN algorithm for mixed-attribute data (K Nearest Neighbor for Mixed-attribute Data, KNNM) that assigns different weights to different attribute columns. In mixed-attribute data, the numerical part and the categorical part contribute differently to the overall similarity measure, and each component contributes differently to the similarity measure of the part it belongs to. The paper therefore proposes a vector-parameter calculation method that accounts for both levels: the contribution of the numerical part and of the categorical part, each taken as a whole, to the similarity between mixed-attribute records, and the contribution of each individual component within its own part. KNNM is built on this method. Experiments on five UCI datasets show that KNNM outperforms the traditional KNN algorithm in accuracy, macro-averaged recall, macro-averaged precision, macro-averaged F1, and ROC, which indicates that KNNM is applicable and effective for mixed-attribute data.
Authors: LIU Jia-yu (刘佳宇); ZHOU Ling-yun (周凌云); WU Qiu-feng (吴秋峰); MENG Xiang-yan (孟翔燕); DENG Hua-ling (邓华玲). Affiliations: College of Economics and Management, Northeast Agricultural University, Harbin 150030, China; College of Economics, Heilongjiang University of Finance and Economics, Harbin 150030, China; College of Engineering, Northeast Agricultural University, Harbin 150030, China; College of Science, Northeast Agricultural University, Harbin 150030, China
Source: 《数学的实践与认识》 (Mathematics in Practice and Theory), PKU Core Journal, 2020, No. 16, pp. 132-143 (12 pages)
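Funding: Special Fund for Agro-scientific Research in the Public Interest, second-level task (201503116-04-06); Heilongjiang Postdoctoral Fund (LBHZ15020); National Science and Technology Support Program, special task (2014BAD12B01-1-3); Harbin Special Fund for Science and Technology Innovation Talent Research, Young Reserve Talent (2017RAQXJ096); Integration and Demonstration of High-Efficiency Water Use Technology for Japonica Rice in Semi-Humid Regions (2018YFD0300105-2).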
Keywords: mixed-attribute data; similarity measure; K nearest neighbor; parameter calculation method; principal component analysis
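
The two-level weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the vector-parameter calculation, so the weights here (`alpha`, `w_num`, `w_cat`) are hypothetical inputs supplied by the caller, and the choice of weighted Euclidean distance for numerical attributes and simple matching for categorical attributes is an assumption made for illustration.

```python
from collections import Counter
from math import sqrt


def mixed_distance(a, b, num_idx, cat_idx, w_num, w_cat, alpha):
    """Weighted distance between two mixed-attribute records.

    alpha weights the numerical part as a whole and (1 - alpha) the
    categorical part; w_num and w_cat weight the individual attributes
    inside each part. All weights are hypothetical, caller-supplied values.
    """
    # Numerical part: weighted Euclidean distance (assumed form).
    d_num = sqrt(sum(w * (a[i] - b[i]) ** 2 for i, w in zip(num_idx, w_num)))
    # Categorical part: weighted simple matching, 0 if equal and 1 otherwise.
    d_cat = sum(w * (a[i] != b[i]) for i, w in zip(cat_idx, w_cat))
    return alpha * d_num + (1 - alpha) * d_cat


def knn_predict(train, labels, query, k, **dist_kwargs):
    """Majority vote among the k training records closest to `query`."""
    order = sorted(
        range(len(train)),
        key=lambda j: mixed_distance(train[j], query, **dist_kwargs),
    )
    votes = Counter(labels[j] for j in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: two numerical attributes followed by one categorical attribute.
train = [(1.0, 0.2, "red"), (1.1, 0.1, "red"), (5.0, 0.9, "blue"), (5.2, 1.0, "blue")]
labels = ["A", "A", "B", "B"]
params = dict(num_idx=[0, 1], cat_idx=[2], w_num=[0.5, 0.5], w_cat=[1.0], alpha=0.5)

print(knn_predict(train, labels, (1.2, 0.3, "red"), k=3, **params))  # prints A
```

In this sketch the numerical attributes are assumed to be on comparable scales already; in practice they would usually be normalized first so that no single column dominates the Euclidean term.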
