
Research on Feature Selection Algorithms in Data Mining (cited by: 14)
Abstract: To address redundant features and the curse of dimensionality in data mining, this paper builds on the theory of the ReliefF algorithm and principal component analysis to propose a two-stage feature selection method: kernel principal component analysis optimized by ReliefF. Experimental results for the method are reported. The method effectively handles data that is high-dimensional and contains redundant and irrelevant features. Combined with machine learning algorithms, it allows a data mining system to produce accurate and efficient results, giving decision makers a sound basis for their decisions. The experiments show that the algorithm achieves higher classification accuracy.
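Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), indexed in CAS and the Peking University Core Journals list, 2016, No. 1, pp. 106-109 (4 pages)
Funding: Heilongjiang Province Postdoctoral Funding Project (LBH-Q11081); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (11551093)
Keywords: data mining; feature selection; principal component analysis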
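
The abstract describes a two-stage reduction: ReliefF screens the features first, and kernel principal component analysis then compresses the ones that survive. The abstract gives no implementation details, so the following is only a minimal sketch under stated assumptions: a simplified ReliefF that uses Manhattan distance on min-max scaled features, an RBF kernel, and a fixed number of retained features. The function names (`relieff_weights`, `rbf_kernel_pca`, `two_stage_reduce`) and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def relieff_weights(X, y, k=5):
    """Simplified ReliefF: reward features that differ between classes
    and penalise features that differ within a class (assumed variant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    # Scale every feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(Xs - Xs[i])      # per-feature differences to sample i
        dist = diff.sum(axis=1)        # Manhattan distance to sample i
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[idx != i]        # never use the sample as its own neighbour
            if idx.size == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:k]]
            mean_diff = diff[nearest].mean(axis=0)
            if c == y[i]:
                w -= mean_diff / n     # nearest hits lower the weight
            else:
                # nearest misses raise the weight, scaled by class prior
                w += prior[c] / (1.0 - prior[y[i]]) * mean_diff / n
    return w


def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA with an RBF kernel; returns training-set projections."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Projection of training point j on component m is sqrt(lambda_m) * v_m[j].
    return vecs * np.sqrt(np.maximum(vals, 0.0))


def two_stage_reduce(X, y, n_keep, n_components, k=5, gamma=1.0):
    """Stage 1: keep the n_keep highest-weighted features.
    Stage 2: run kernel PCA on the retained features."""
    w = relieff_weights(X, y, k=k)
    keep = np.sort(np.argsort(w)[::-1][:n_keep])
    Z = rbf_kernel_pca(np.asarray(X, dtype=float)[:, keep], n_components, gamma)
    return keep, Z


# Synthetic demo data (not from the paper): features 0 and 1 depend on the
# class label, the remaining six are pure noise.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 30)
X_demo = rng.normal(size=(60, 8))
X_demo[:, 0] += 5 * y_demo
X_demo[:, 1] -= 5 * y_demo
kept, Z_demo = two_stage_reduce(X_demo, y_demo, n_keep=2, n_components=2)
print("retained feature indices:", kept)
print("reduced data shape:", Z_demo.shape)
```

Screening before the kernel step keeps irrelevant features out of the kernel matrix, which is one plausible reading of "ReliefF-optimized" in the abstract. The reduced matrix `Z_demo` could then be passed to any classifier to compare accuracy against the unreduced data.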
