Abstract
Ensemble methods are an important branch of imbalanced learning. However, existing imbalanced ensemble methods all operate on the original instances without considering their structure information, which comprises both local and global structure, so their effectiveness remains limited. To address this problem, this paper proposes an imbalanced ensemble learning algorithm based on a deep instance envelope network (DIEN) and a hierarchical structure consistency mechanism (HSCM). Taking local manifold and global structure information into account, the algorithm clusters instances over multiple layers to generate high-quality multilayer (deep) envelope instances and thereby balance the classes. First, DIEN is designed on the basis of instance neighborhood concatenation and the fuzzy c-means clustering algorithm to mine the structure information of the instances and obtain the deep envelope instances. Second, a local manifold structure measure and a global structure distribution measure are designed to construct HSCM, which enhances the distribution consistency of instances across layers. Third, DIEN and HSCM are combined into an optimized deep instance envelope network, DH (DIEN with HSCM). Base classifiers are then applied to the envelope instances, and a bagging ensemble learning mechanism fuses their predictions into the final result. Experiments on more than ten public datasets compare the proposed algorithm with representative related algorithms. The results show that it performs significantly best on four performance metrics, including AUC (Area Under Curve) and F-measure.
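As a rough illustration of one ingredient named in the abstract, the sketch below runs a plain fuzzy c-means clustering in NumPy and uses the resulting cluster centres as stand-in "envelope" instances for the majority class, so that both classes end up the same size. It is not the authors' DIEN: the neighborhood concatenation, the multilayer structure, and HSCM are not reproduced, and the function names `fuzzy_c_means` and `envelope_majority`, the fuzzifier `m = 2`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centres are membership-weighted means of the instances.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every instance to every centre (floored to avoid 1/0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # Standard membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centres so they match the returned memberships.
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U


def envelope_majority(X_maj, n_target, **kwargs):
    """Hypothetical helper: condense the majority class to n_target centres."""
    centers, _ = fuzzy_c_means(X_maj, n_clusters=n_target, **kwargs)
    return centers


# Synthetic imbalanced data (assumed for illustration): 200 vs 20 instances.
rng = np.random.default_rng(42)
X_maj = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_min = rng.normal(loc=3.0, scale=0.5, size=(20, 2))

X_env = envelope_majority(X_maj, n_target=len(X_min))
X_bal = np.vstack([X_env, X_min])
y_bal = np.concatenate([np.zeros(len(X_env)), np.ones(len(X_min))])
_, U_demo = fuzzy_c_means(X_maj, n_clusters=5)
```

In a bagging setup, one could repeat this condensation with different seeds or bootstrap samples of the majority class and train one base classifier per balanced set; that step is left out here to keep the sketch short.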
Authors
LI Fan (李帆)
ZHANG Xiao-heng (张小恒)
LI Yong-ming (李勇明)
WANG Pin (王品)
College of Communication Engineering, Chongqing University, Chongqing 400030, China; Chongqing Radio & TV University, Chongqing 400052, China
Source
《电子学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2024, No. 3, pp. 751-761 (11 pages)
Acta Electronica Sinica
Funding
National Natural Science Foundation of China (No. 61771080, No. U21A20448)
Fundamental Research Funds for the Central Universities (No. 2022CDJJJ-003).
Keywords
imbalanced learning
envelope learning
hierarchical structure consistency mechanism
local manifold structure measure
global structure distribution measure