Abstract
Discretizing continuous attributes is an important problem when the C5.0 algorithm builds a decision tree, because discretization simplifies the representation space of the data. A sound and effective discretization method can improve the predictive accuracy of the resulting tree. After analyzing the shortcomings of discretization in the C5.0 and Chi2 algorithms, this paper proposes an improved Chi2 method that discretizes continuous attributes more reasonably and more accurately, so that decision trees built on the discretized data are more accurate. Experimental results show that decision tree classification models built by C5.0 with the improved method achieve relatively high classification accuracy.
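For orientation, the chi-square interval merging that Chi2-family discretizers build on can be sketched as follows. This is a minimal ChiMerge-style illustration, not the improved method proposed in the paper, whose details the abstract does not give. The function names `chi_square` and `chimerge`, the fixed `threshold` parameter, and the choice to skip cells with zero expected count are all assumptions made for this sketch.

```python
from collections import Counter


def chi_square(counts_a, counts_b):
    """Chi-square statistic for two adjacent intervals, given per-class counts."""
    classes = set(counts_a) | set(counts_b)
    row_totals = [sum(counts_a.values()), sum(counts_b.values())]
    n = sum(row_totals)
    stat = 0.0
    for c in classes:
        col_total = counts_a.get(c, 0) + counts_b.get(c, 0)
        for row, r_total in zip((counts_a, counts_b), row_totals):
            expected = r_total * col_total / n
            # Assumption: cells with zero expected count contribute nothing.
            if expected > 0:
                stat += (row.get(c, 0) - expected) ** 2 / expected
    return stat


def chimerge(values, labels, threshold):
    """Merge adjacent intervals bottom-up until every adjacent pair
    has a chi-square statistic of at least `threshold`.

    Returns the cut points (lower bounds of all intervals but the first).
    """
    # Start with one interval per distinct attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, Counter())[y] += 1
    intervals = [(v, groups[v]) for v in sorted(groups)]  # (lower bound, class counts)

    while len(intervals) > 1:
        stats = [
            chi_square(intervals[i][1], intervals[i + 1][1])
            for i in range(len(intervals) - 1)
        ]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # the most similar pair already differs significantly
        # Merge the most similar adjacent pair and keep the lower bound.
        intervals[i:i + 2] = [(intervals[i][0], intervals[i][1] + intervals[i + 1][1])]

    return [lower for lower, _ in intervals[1:]]


if __name__ == "__main__":
    cuts = chimerge([1, 2, 3, 4, 5, 6], list("aaabbb"), threshold=2.706)
    print(cuts)
```

The threshold 2.706 used in the example corresponds to the chi-square critical value at significance level 0.10 with one degree of freedom, which fits a two-class problem. A Chi2-style procedure would adjust the significance level automatically rather than fix it in advance.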
Source
《计算机工程与设计》
CSCD
Peking University Core Journal (北大核心)
2009, Issue 22, pp. 5197-5199, 5203 (4 pages in total)
Computer Engineering and Design
Funding
Natural Science Basic Research Fund of Jiangsu Provincial Universities (07KJD520216)
Xuzhou Normal University Fund Project (08XLB14)
Keywords
decision tree
discretization
Chi2 algorithm
classifier
predictive accuracy