期刊文献+

GA-iForest: An Efficient Isolated Forest Framework Based on Genetic Algorithm for Numerical Data Outlier Detection 被引量:4

GA-iForest: An Efficient Isolated Forest Framework Based on Genetic Algorithm for Numerical Data Outlier Detection
在线阅读 下载PDF
导出
摘要 With the development of data age,data quality has become one of the problems that people pay much attention to.As a field of data mining,outlier detection is related to the quality of data.The isolated forest algorithm is one of the more prominent numerical data outlier detection algorithms in recent years.In the process of constructing the isolation tree by the isolated forest algorithm,as the isolation tree is continuously generated,the difference of isolation trees will gradually decrease or even no difference,which will result in the waste of memory and reduced efficiency of outlier detection.And in the constructed isolation trees,some isolation trees cannot detect outlier.In this paper,an improved iForest-based method GA-iForest is proposed.This method optimizes the isolated forest by selecting some better isolation trees according to the detection accuracy and the difference of isolation trees,thereby reducing some duplicate,similar and poor detection isolation trees and improving the accuracy and stability of outlier detection.In the experiment,Ubuntu system and Spark platform are used to build the experiment environment.The outlier datasets provided by ODDS are used as test.According to indicators such as the accuracy,recall rate,ROC curves,AUC and execution time,the performance of the proposed method is evaluated.Experimental results show that the proposed method can not only improve the accuracy and stability of outlier detection,but also reduce the number of isolation trees by 20%-40%compared with the original iForest method. With the development of data age,data quality has become one of the problems that people pay much attention to. As a field of data mining,outlier detection is related to the quality of data. The isolated forest algorithm is one of the more prominent numerical data outlier detection algorithms in recent years. In the process of constructing the isolation tree by the isolated forest algorithm,as the isolation tree is continuously generated,the difference of isolation trees will gradually decrease or even no difference,which will result in the waste of memory and reduced efficiency of outlier detection. And in the constructed isolation trees,some isolation trees cannot detect outlier. In this paper,an improved iForest-based method GA-iForest is proposed. This method optimizes the isolated forest by selecting some better isolation trees according to the detection accuracy and the difference of isolation trees,thereby reducing some duplicate,similar and poor detection isolation trees and improving the accuracy and stability of outlier detection. In the experiment,Ubuntu system and Spark platform are used to build the experiment environment. The outlier datasets provided by ODDS are used as test. According to indicators such as the accuracy,recall rate,ROC curves,AUC and execution time,the performance of the proposed method is evaluated. Experimental results show that the proposed method can not only improve the accuracy and stability of outlier detection,but also reduce the number of isolation trees by 20%—40% compared with the original iForest method.
出处 《Transactions of Nanjing University of Aeronautics and Astronautics》 EI CSCD 2019年第6期1026-1038,共13页 南京航空航天大学学报(英文版)
基金 supported by the State Grid Liaoning Electric Power Supply CO, LTDthe financial support for the “Key Technology and Application Research of the Self-Service Grid Big Data Governance (No.SGLNXT00YJJS1800110)”
关键词 outlier detection isolation tree isolated forest genetic algorithm feature selection outlier detection isolation tree isolated forest genetic algorithm feature selection
  • 相关文献

参考文献1

二级参考文献9

共引文献11

同被引文献24

引证文献4

二级引证文献21

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部