
基于深度集成学习的类极度不均衡数据信用欺诈检测算法 (cited by: 20)

Credit Fraud Detection for Extremely Imbalanced Data Based on Ensembled Deep Learning
Abstract: Class imbalance in credit fraud data significantly undermines model performance. When the class distribution is extremely imbalanced, noise arising from information distortion, periodic statistical errors, and reporting bias interferes strongly with model training and readily leads to overfitting. This paper therefore proposes an ensemble algorithm based on a deep belief network (DBN) to handle credit fraud data with extreme class imbalance. First, a bidirectional joint sampling algorithm that combines under-sampling and over-sampling is proposed to draw training subsets while limiting information loss and overfitting. Next, a cluster of base classifiers is constructed in two stages: because the separating hyperplane of a support vector machine (SVM) shifts toward the minority class on imbalanced data, a boosting algorithm is used to generate base classifiers that combine SVMs with random forests (RF). Finally, a DBN integrates the predictions of the base classifiers and outputs the final classification. Because traditional accuracy metrics focus too heavily on majority-class samples and ignore the fact that, in credit fraud, the loss from default exceeds interest income, a cost-benefit index (revenue cost index) is introduced that accounts for recognition of both positive and negative samples and improves prediction accuracy on the minority class. Experiments on European credit card fraud data show that the proposed algorithm raises the mean cost-benefit index by 3 percentage points relative to the other algorithms compared. The experiments also examine how the imbalance ratio affects accuracy and indicate that the proposed algorithm performs better on extremely imbalanced data.
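The abstract names a joint sampling step that combines under-sampling and over-sampling but does not give its details. The following is only a minimal sketch of that general idea, assuming plain random under-sampling of the majority class and random over-sampling (duplication with replacement) of the minority class. The function name `joint_sample` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def joint_sample(X, y, n_majority, n_minority, seed=0):
    """Resample (X, y) to n_majority negatives and n_minority positives.

    Assumption: label 0 is the majority class and label 1 the minority class.
    Majority rows are drawn without replacement (under-sampling); all minority
    rows are kept and padded with random duplicates (over-sampling).
    """
    rng = random.Random(seed)
    major = [i for i, label in enumerate(y) if label == 0]
    minor = [i for i, label in enumerate(y) if label == 1]
    if not minor:
        raise ValueError("no minority samples to over-sample")
    if n_majority > len(major):
        raise ValueError("n_majority exceeds the number of majority samples")
    if n_minority < len(minor):
        raise ValueError("n_minority must be at least the minority count")

    kept_major = rng.sample(major, n_majority)                # under-sampling
    extra_minor = rng.choices(minor, k=n_minority - len(minor))  # over-sampling
    idx = kept_major + minor + extra_minor
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 5 fraud cases among 1000 transactions (illustrative only).
X_toy = [[float(i)] for i in range(1000)]
y_toy = [1 if i < 5 else 0 for i in range(1000)]
X_sub, y_sub = joint_sample(X_toy, y_toy, n_majority=50, n_minority=25)
```

Calling the function repeatedly with different seeds would give several differently balanced training subsets, which is one plausible way to feed an ensemble of base classifiers; whether the paper does this is not stated in the abstract.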
Authors: Liu Ying (刘颖); Yang Ke (杨轲). Affiliations: School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117; School of Taxation, Jilin University of Finance and Economics, Changchun 130117
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2021, No. 3, pp. 539-547 (9 pages)
Funding: National Social Science Fund of China (20BTJ062).
Keywords: credit fraud; extremely imbalanced data; deep belief network (DBN); support vector machine (SVM); revenue cost index
