
Research on the Optimization of Decision Tree Algorithms (cited 7 times)

Analysis and improved implementation of decision tree algorithms
Abstract: The C4.5/C5.0 decision tree classification algorithms and their existing variants produce trees with comparatively high training and test error rates. To address this, three improvement strategies are proposed: (1) attribute correlation is used to measure and reduce the attribute set, removing irrelevant attributes as well as redundant attributes that are highly correlated with one another; (2) the tree is pruned at an appropriately chosen confidence level, reducing the number of attributes and of distinct attribute values while preserving the feasibility and effectiveness of the tree; (3) an optimized variant of the Chi2 algorithm discretizes continuous attributes more reasonably and accurately. A classifier incorporating these strategies was designed and implemented, and the improved algorithm was applied to the Breast-cancer dataset. Experimental results show that the trees it builds achieve higher classification accuracy than those of the classical decision tree algorithms.
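The third strategy builds on the Chi2 family of discretization methods, which merge adjacent intervals of a continuous attribute bottom-up whenever the chi-square statistic of their class distributions falls below a significance threshold. The paper's optimized variant is not reproduced in this record; the sketch below shows only the underlying ChiMerge-style procedure that the Chi2 algorithm extends. The function names (`chi2_stat`, `chi_merge`) and the fixed 3.84 threshold (chi-square at 0.05 significance, one degree of freedom) are illustrative assumptions, not taken from the paper:

```python
def chi2_stat(a, b):
    """Pearson chi-square statistic comparing the class-count vectors
    of two adjacent intervals (one count per class in each vector)."""
    total = sum(a) + sum(b)
    stat = 0.0
    for ca, cb in zip(a, b):
        col = ca + cb
        if col == 0:
            continue  # class absent from both intervals
        for cnt, row_sum in ((ca, sum(a)), (cb, sum(b))):
            expected = row_sum * col / total
            if expected > 0:
                stat += (cnt - expected) ** 2 / expected
    return stat

def chi_merge(values, labels, threshold=3.84):
    """Bottom-up discretization: start with one interval per distinct
    value, repeatedly merge the adjacent pair with the lowest chi-square
    statistic, and stop once every pair differs significantly."""
    classes = sorted(set(labels))
    intervals = []
    for p in sorted(set(values)):
        counts = [0] * len(classes)
        for v, y in zip(values, labels):
            if v == p:
                counts[classes.index(y)] += 1
        intervals.append((p, p, counts))
    while len(intervals) > 1:
        stats = [chi2_stat(intervals[i][2], intervals[i + 1][2])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # all adjacent pairs now differ significantly
        lo, _, c1 = intervals[i]
        _, hi, c2 = intervals[i + 1]
        intervals[i:i + 2] = [(lo, hi, [x + y for x, y in zip(c1, c2)])]
    return [(lo, hi) for lo, hi, _ in intervals]
```

The extended Chi2 algorithms (such as Su and Hsu's, reference 7 below) additionally adjust the significance level automatically; here it is fixed for simplicity.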
Source: Computer Engineering and Applications (《计算机工程与应用》, CSCD, Peking University core journal), 2010, No. 13, pp. 139-141, 150 (4 pages in total).
Funding: Natural Science Basic Research Program of Jiangsu Higher Education Institutions (No. 07KJD520216); Xuzhou Normal University project fund (No. KY200710).
Keywords: attribute correlation; attribute reduction; pruning strategy; discretization; Chi2 algorithm
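The first pair of keywords, attribute correlation and attribute reduction, is commonly realized by ranking attributes by their correlation with the class and dropping any attribute that is more strongly correlated with an already-kept attribute than with the class. The sketch below uses symmetrical uncertainty as the correlation measure; this is a standard choice for the general technique, not necessarily the measure used in the paper, and the function names and threshold are hypothetical:

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); 0 = independent, 1 = fully correlated."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 1.0  # both attributes are constant
    # H(X | Y): entropy of x within each group sharing the same y value
    n = len(x)
    by_y = {}
    for xi, yi in zip(x, y):
        by_y.setdefault(yi, []).append(xi)
    h_x_given_y = sum(len(g) / n * entropy(g) for g in by_y.values())
    return 2 * (hx - h_x_given_y) / (hx + hy)

def reduce_attributes(features, labels, relevance_threshold=0.1):
    """Keep attributes relevant to the class; drop those more correlated
    with an already-kept attribute than with the class (i.e. redundant)."""
    ranked = sorted(features,
                    key=lambda f: symmetrical_uncertainty(features[f], labels),
                    reverse=True)
    kept = []
    for f in ranked:
        su_class = symmetrical_uncertainty(features[f], labels)
        if su_class < relevance_threshold:
            continue  # irrelevant to the class
        if all(symmetrical_uncertainty(features[f], features[g]) < su_class
               for g in kept):
            kept.append(f)  # not redundant with anything already kept
    return kept
```

For example, given two identical attributes and one attribute independent of the class, the procedure keeps only one attribute: the duplicate is rejected as redundant and the independent one as irrelevant.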

References (9)

1. Kantardzic M. Data Mining: Concepts, Models, Methods, and Algorithms [M]. Beijing: Tsinghua University Press, 2003.
2. Quinlan J R. Decision trees and decision-making [J]. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(2): 339-346.
3. Quinlan J R. Bagging, boosting, and C4.5 [C]// Proc of the 13th National Conference on Artificial Intelligence, Portland, 1996: 725-730.
4. Quinlan J R. C4.5: Programs for Machine Learning [M]. San Mateo, California: Morgan Kaufmann, 1993.
5. Oates T, Jensen D. The effects of training set size on decision trees [C]// Proc of the 14th Int'l Conf on Machine Learning. Nashville: Morgan Kaufmann, 1997: 254-262.
6. Liu H, Setiono R. Feature selection via discretization [J]. IEEE Transactions on Knowledge and Data Engineering, 1997, 9(4): 642-645.
7. Su C T, Hsu J H. An extended Chi2 algorithm for discretization of real value attributes [J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(3): 437-441.
8. 范洁, 常晓航, 杨岳湘. A decision tree rule generation algorithm based on attribute correlation [J]. 计算机仿真 (Computer Simulation), 2006, 23(12): 90-92.
9. Bennett K P, Mangasarian O L. Robust linear programming discrimination of two linearly inseparable sets [J]. Optimization Methods and Software, 1992, 1: 23-34.

