Abstract
To improve the classification performance of the conventional support vector domain description (C-SVDD) algorithm on unbalanced datasets, a density-sensitive maximum soft margin support vector domain description (DSMSM-SVDD) algorithm is proposed. Relative density is introduced for the majority-class samples to reflect the influence of the original spatial distribution of the training samples on the optimal classification boundary. A maximum soft margin regularization term is added to the objective function so that the C-SVDD classification boundary shifts toward the minority class, which improves classification performance. The algorithm first computes the relative density of each majority-class sample to reflect that sample's importance, then feeds the training samples into DSMSM-SVDD to perform classification. The experiments examine the relationships among the algorithm's parameters and their influence on classification performance, and give recommendations for parameter values. Finally, comparison experiments with C-SVDD show that the proposed algorithm outperforms C-SVDD on unbalanced data.
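The abstract does not give the formula used for relative density, so the following is only a minimal sketch of one common way to weight majority-class samples by local density. The function name `relative_density`, the k-nearest-neighbor estimate, the parameter `k`, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def relative_density(X, k=3):
    """Hypothetical k-NN relative density (an assumption, not the paper's formula).

    Returns one value per sample in (0, 1]; larger means a denser neighborhood.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Sort each row; column 0 is the zero distance from a sample to itself.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    mean_knn = knn.mean(axis=1)
    # Inverse mean k-NN distance as a density estimate, scaled so the maximum is 1.
    density = 1.0 / (mean_knn + 1e-12)
    return density / density.max()

# Toy majority class: four tightly clustered points and one isolated point.
majority = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
weights = relative_density(majority, k=3)
```

In this sketch the isolated point receives a much smaller weight than the clustered points, which is the kind of per-sample importance that could then scale the slack penalty of each majority sample in an SVDD-style objective.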
Authors
陶新民
李晨曦
沈微
常瑞
王若彤
刘艳超
TAO Xin-min; LI Chen-xi; SHEN Wei; CHANG Rui; WANG Ruo-tong; LIU Yan-chao (College of Engineering and Technology, Northeast Forestry University, Harbin, Heilongjiang 150040, China)
Source
《电子学报》
EI
CAS
CSCD
PKU Core
2018, No. 11, pp. 2725-2732 (8 pages)
Acta Electronica Sinica
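Funding
Fundamental Research Funds for the Central Universities (No. 2572017EB02, No. 2572017CB07)
Northeast Forestry University Double First-Class Research Start-up Fund (No. 411112438)
Harbin Science and Technology Bureau Innovative Talent Fund (No. 2017RAXXJ018)
National Natural Science Foundation of China (No. 31570547)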
Keywords
support vector domain description
unbalanced datasets
relative density