Abstract
The traditional K-nearest neighbor (KNN) algorithm applies only to numerical data. To address this limitation, this paper proposes a K-nearest neighbor algorithm for mixed-attribute data (K Nearest Neighbor for Mixed-attribute Data, KNNM). KNNM assigns different weights to different attribute columns, so that KNN can be applied to data with both numerical and categorical attributes. In mixed-attribute data, the numerical part and the categorical part contribute differently to the overall similarity measure. Within each part, each component also contributes differently to the similarity measure of the part it belongs to. The paper therefore proposes a vector-based parameter computing method that accounts for both levels: the contribution of the numerical part and of the categorical part, each taken as a whole, to the similarity between mixed-attribute records, and the contribution of each component to the similarity within its own attribute part. On this basis, it presents a KNN method suited to mixed-attribute data. Experiments on five UCI datasets show that KNNM outperforms the traditional KNN algorithm in accuracy, macro-averaged recall, macro-averaged precision, macro-averaged F1 score and ROC, indicating that KNNM is applicable and effective for mixed-attribute data.
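The abstract describes the weighting scheme only in words, so the following is a minimal Python sketch under stated assumptions, not the paper's KNNM. It assumes a distance that adds a range-normalized, weighted Euclidean term over the numerical columns and a weighted mismatch (overlap) term over the categorical columns. The part weights `alpha` and `beta` and the per-column weights `w_num` and `w_cat` are supplied by hand here; the paper's vector-parameter computing method for deriving them is not reproduced. All function names, parameters and the toy data are hypothetical.

```python
import math
from collections import Counter

def column_ranges(X, num_idx):
    """Range (max - min) of each numerical column; 1.0 if the column is constant."""
    ranges = {}
    for i in num_idx:
        col = [row[i] for row in X]
        span = max(col) - min(col)
        ranges[i] = span if span > 0 else 1.0
    return ranges

def mixed_distance(a, b, num_idx, cat_idx, w_num, w_cat, alpha, beta, ranges):
    """Hypothetical weighted distance between two mixed-attribute records a and b.

    alpha, beta  -- assumed weights of the numerical part and the categorical part as wholes
    w_num, w_cat -- assumed weights of the individual columns inside each part
    """
    # Numerical part: weighted Euclidean distance on range-normalized differences.
    num = math.sqrt(sum(w * ((a[i] - b[i]) / ranges[i]) ** 2
                        for w, i in zip(w_num, num_idx)))
    # Categorical part: weighted count of mismatching categories (simple overlap).
    cat = sum(w * (a[j] != b[j]) for w, j in zip(w_cat, cat_idx))
    return alpha * num + beta * cat

def knn_mixed_predict(X, y, query, k, num_idx, cat_idx, w_num, w_cat,
                      alpha=0.5, beta=0.5):
    """Label `query` by majority vote among its k nearest training records."""
    ranges = column_ranges(X, num_idx)
    # Sort (distance, label) pairs by distance only, so labels are never compared.
    scored = sorted(
        ((mixed_distance(row, query, num_idx, cat_idx, w_num, w_cat,
                         alpha, beta, ranges), label)
         for row, label in zip(X, y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): columns 0-1 numerical (age, income), columns 2-3 categorical.
X = [(25, 3000, "red", "A"), (27, 3200, "red", "A"),
     (45, 8000, "blue", "B"), (50, 8500, "blue", "B")]
y = ["young", "young", "senior", "senior"]
print(knn_mixed_predict(X, y, (26, 3100, "red", "A"), k=3,
                        num_idx=[0, 1], cat_idx=[2, 3],
                        w_num=[0.5, 0.5], w_cat=[0.5, 0.5]))  # → young
```

Setting `beta` to 0 reduces the sketch to a weighted KNN on the numerical columns alone. In the approach the abstract describes, the part weights and per-column weights would instead be produced by the proposed parameter computing method rather than fixed by hand.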
Authors
LIU Jia-yu (刘佳宇)
ZHOU Ling-yun (周凌云)
WU Qiu-feng (吴秋峰)
MENG Xiang-yan (孟翔燕)
DENG Hua-ling (邓华玲)
Affiliations: College of Economics and Management, Northeast Agricultural University, Harbin 150030, China; College of Economics, Heilongjiang University of Finance and Economics, Harbin 150030, China; College of Engineering, Northeast Agricultural University, Harbin 150030, China; College of Science, Northeast Agricultural University, Harbin 150030, China
Source
Mathematics in Practice and Theory (《数学的实践与认识》)
Peking University core journal (北大核心)
2020, No. 16, pp. 132-143 (12 pages)
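Funding
Special Fund for Agro-scientific Research in the Public Interest, second-level task (201503116-04-06)
Heilongjiang Postdoctoral Fund (LBHZ15020)
National Key Technology R&D Program, special task (2014BAD12B01-1-3)
Harbin Special Fund for Research on Scientific and Technological Innovation Talents (Young Reserve Talents) (2017RAQXJ096)
Integration and Demonstration of Efficient Water-Use Technologies for Japonica Rice in Semi-humid Regions (2018YFD0300105-2)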
Keywords
mixed-attribute data
similarity measure
K nearest neighbor
parameter computing method
principal component analysis