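Abstract
Data preprocessing is key to improving the accuracy and performance of the mining process. Based on an analysis of the decision tree algorithm and of the characteristics of attribute values in landslide data, this article uses clustering to partition continuous attribute values into intervals and proposes a method for discretizing the continuous attribute values of landslide data. In experiments, the decision tree built with the new method achieved a higher classification accuracy than the original algorithm and produced fewer redundant rules.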
Data preprocessing is essential to improving the accuracy of data mining. By analyzing the decision tree algorithm and the properties of landslide data, we develop in this paper a new method that discretizes continuous properties using clustering. We compare the performance of the new method with that of the original algorithm on two properties of the data sets. The results provide evidence that (a) the new method is competitive with the original algorithm with respect to predictive accuracy, and (b) the rule sets discovered by the new method are simpler (smaller) than those discovered by the original algorithm.
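The idea of using clustering to split a continuous attribute into intervals can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not name the clustering algorithm, so a plain one-dimensional k-means is assumed here, and the function names (`kmeans_1d`, `cut_points`, `discretize`) and the slope-angle values are hypothetical.

```python
import bisect
from typing import List


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> List[float]:
    """Return k sorted cluster centers for 1-D data (assumed k-means, Lloyd's iterations)."""
    data = sorted(values)
    # Deterministic start: spread the initial centers evenly over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its group (keep it if the group is empty).
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def cut_points(centers: List[float]) -> List[float]:
    """Interval boundaries: midpoints between neighbouring cluster centers."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def discretize(value: float, cuts: List[float]) -> int:
    """Map a continuous value to the index of the interval it falls into."""
    return bisect.bisect_right(cuts, value)


# Hypothetical slope angles in degrees, made up for illustration only.
slope_angles = [5, 7, 8, 20, 22, 25, 40, 42, 45]
cuts = cut_points(kmeans_1d(slope_angles, k=3))
labels = [discretize(v, cuts) for v in slope_angles]
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The resulting interval labels can then be passed to a decision tree learner as a categorical attribute in place of the raw continuous values. The number of clusters `k` is a free parameter in this sketch.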
Source
《微计算机信息》
Peking University Core Journal (北大核心)
2006, Issue 08X, pp. 10-11, 32 (3 pages in total)
Control & Automation
Keywords
continuous attribute values
clustering
landslide
continuous property, cluster, landslide