Abstract
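Most research on imbalanced data sets focuses on either restructuring the data set or cost-sensitive learning. Because an imbalanced class distribution and unequal misclassification costs often occur together, this paper briefly reviews cost-sensitive learning theory and existing learning algorithms, then experimentally compares the proposed adaptive hybrid re-sampling algorithm with MetaCost, an algorithm based on minimum misclassification cost. The experiments indicate that the proposed algorithm has certain advantages in cost-sensitive learning. The results also show that class imbalance strongly affects the performance of cost-sensitive learning algorithms, and that when the class sizes differ greatly, reconstructing the sample class space yields better classification results.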
Most studies on imbalanced data set classification have focused on re-sampling or on cost-sensitive learning systems themselves, neglecting the fact that an imbalanced class distribution and unequal misclassification costs often occur simultaneously. After analyzing the theory and algorithms of cost-sensitive learning, a novel hybrid re-sampling technique based on Automated Adaptive Selection of the Number of Nearest Neighbors is proposed to address the misclassification problem on imbalanced data sets. The hybrid re-sampling algorithm is compared with the MetaCost algorithm. Experimental results show that the proposed method can effectively improve classification accuracy and reduce misclassification cost, and they confirm that the algorithm is superior to traditional algorithms in dealing with the imbalance problem.
Source
《微电子学与计算机》
CSCD
Peking University Core Journals
2011, No. 8, pp. 146-149, 153 (5 pages)
Microelectronics & Computer
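Funding
National Natural Science Foundation of China (61075063)
National High Technology Research and Development Program of China ("863" Program) (2009AA12Z117)
Natural Science Foundation of Hubei Province (2010CDB05201)
Hubei Provincial Department of Education Young and Middle-aged Researchers Project (Q20112604)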
Keywords
classification
imbalanced dataset
hybrid re-sampling
cost-sensitive learning