Imbalanced Ensemble Algorithm Based on Envelope Learning and Hierarchical Structure Consistency Mechanism

Cited by: 1
Abstract: Ensemble methods are an important branch of imbalanced learning. However, existing imbalanced ensemble methods all operate on the original instances without considering their structure information, which comprises both local and global structure information, so their effectiveness remains limited. To address this problem, this paper proposes an imbalanced ensemble learning algorithm based on a deep instance envelope network (DIEN) and a hierarchical structure consistency mechanism (HSCM). Taking local manifold and global structure information into account, the algorithm performs multi-layer instance clustering to generate high-quality multi-layer (deep) envelope instances, thereby achieving class balance. First, DIEN is designed on the basis of instance neighborhood concatenation and the fuzzy c-means clustering algorithm to mine the structure information of the instances and obtain deep envelope instances. Then, a local manifold structure measure and a global structure distribution measure are designed to construct the HSCM, which enhances the distribution consistency of instances across layers. Next, DIEN and HSCM are combined into an optimized deep instance envelope network, DH (DIEN with HSCM). The base classifier is then applied to the deep envelope instances. Finally, a bagging ensemble learning mechanism is designed to fuse the predictions of the base classifiers into the final result. Several groups of experiments are conducted, using more than 10 public datasets and representative related algorithms for verification and comparison. The experimental results show that the proposed algorithm significantly outperforms the compared algorithms on four performance metrics, including AUC (Area Under Curve) and F-measure.
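The abstract does not give DIEN's exact formulation, so the following is only a minimal Python sketch of a single envelope layer under stated assumptions. "Neighborhood concatenation" is assumed here to mean appending the mean of each instance's k nearest same-class neighbours to the instance; each class is then compressed by standard fuzzy c-means (fuzzifier m = 2) into the same number of cluster centres, which play the role of envelope instances and make the classes equal in size. The names `neighborhood_concat`, `fuzzy_c_means`, `envelope_layer` and the parameters `k` and `n_envelopes` are hypothetical; HSCM, multi-layer stacking, and bagging fusion are not shown.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighborhood_concat(X: List[Vector], k: int) -> List[Vector]:
    """Assumed form of neighbourhood concatenation (hypothetical): append the
    mean of each instance's k nearest neighbours (within X) to the instance."""
    if len(X) < 2:
        raise ValueError("need at least two instances to find neighbours")
    k = min(k, len(X) - 1)
    out = []
    for i, x in enumerate(X):
        ranked = sorted((sq_dist(x, X[j]), j) for j in range(len(X)) if j != i)
        nbrs = [X[j] for _, j in ranked[:k]]
        mean = [sum(col) / k for col in zip(*nbrs)]
        out.append(list(x) + mean)
    return out


def fuzzy_c_means(X: List[Vector], c: int, m: float = 2.0, max_iter: int = 100,
                  tol: float = 1e-5, seed: int = 0) -> Tuple[List[Vector], List[Vector]]:
    """Standard fuzzy c-means. Returns (centres, U) where U[i][j] is the
    membership of instance i in cluster j; each row of U sums to 1."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centres: List[Vector] = []
    for _ in range(max_iter):
        # Centre update: membership^m-weighted mean of the instances.
        centres = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centres.append([sum(w[i] * X[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update; a tiny floor on distances avoids division by zero.
        new_U = []
        for i in range(n):
            dists = [max(math.sqrt(sq_dist(X[i], v)), 1e-12) for v in centres]
            new_U.append([1.0 / sum((dists[j] / dists[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c)) for j in range(c)])
        shift = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if shift < tol:
            break
    return centres, U


def envelope_layer(X: List[Vector], y: List[int], n_envelopes: int,
                   k: int = 3, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical single envelope layer: per class, concatenate neighbourhood
    information, then replace the class by at most n_envelopes FCM centres.
    Choosing n_envelopes <= minority size gives every class the same count."""
    env_X: List[Vector] = []
    env_y: List[int] = []
    for label in sorted(set(y)):
        Xc = [x for x, t in zip(X, y) if t == label]
        Z = neighborhood_concat(Xc, k)
        c = min(n_envelopes, len(Z))
        centres, _ = fuzzy_c_means(Z, c, seed=seed)
        env_X.extend(centres)
        env_y.extend([label] * c)
    return env_X, env_y


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(6)]
    X = majority + minority
    y = [0] * 40 + [1] * 6
    env_X, env_y = envelope_layer(X, y, n_envelopes=5)
    print(len(env_X), env_y.count(0), env_y.count(1))  # 10 5 5
```

Running the script prints `10 5 5`: the 40 majority and 6 minority instances are each reduced to 5 envelope instances of dimension 4 (2 original features plus 2 neighbourhood-mean features). A deeper network in this sketch would call `envelope_layer` again on its own output; whether the paper changes dimensionality or cluster counts between layers is not stated in the abstract.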
Authors: LI Fan, ZHANG Xiao-heng, LI Yong-ming, WANG Pin (College of Communication Engineering, Chongqing University, Chongqing 400030, China; Chongqing Radio & TV University, Chongqing 400052, China)
Source: Acta Electronica Sinica (《电子学报》), 2024, No. 3, pp. 751-761 (11 pages). Indexed in EI, CAS, and CSCD; Peking University core journal.
Funding: National Natural Science Foundation of China (No. 61771080, No. U21A20448); Fundamental Research Funds for the Central Universities (No. 2022CDJJJ-003).
Keywords: imbalanced learning; envelope learning; hierarchical structure consistency mechanism; local manifold structure measure; global structure distribution measure