Abstract
Classical rough set theory cannot handle continuous attributes in raw data, so such data must be discretized before it can be used for knowledge acquisition. Data preprocessing is therefore a very important step in applying rough set theory, and its results directly affect the efficiency and accuracy of the application. Research on data preprocessing methods for data mining based on rough set theory is thus of considerable significance. This paper addresses continuous-attribute discretization as part of data preprocessing for rough sets. It analyzes and evaluates the main current discretization algorithms and, taking the median-sequence cut point set as its basis, proposes an improved discretization algorithm for mixed continuous and discrete attributes that preserves the consistency of the decision table after partitioning and yields reasonable cut points.
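The abstract does not give the algorithm itself, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: that the "median-sequence cut point set" means the midpoints between adjacent distinct sorted values of a continuous attribute, and that "consistency" means no two objects with identical condition values have different decisions. The function names `candidate_cuts`, `discretize`, and `is_consistent` are hypothetical and are not taken from the paper.

```python
import bisect
from typing import Hashable, List, Sequence


def candidate_cuts(values: Sequence[float]) -> List[float]:
    """Midpoints between adjacent distinct sorted values (assumed cut point set)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def discretize(value: float, cuts: Sequence[float]) -> int:
    """Map a continuous value to the index of the interval defined by sorted cuts."""
    return bisect.bisect_right(cuts, value)


def is_consistent(rows: Sequence[Sequence[Hashable]],
                  decisions: Sequence[Hashable]) -> bool:
    """True if objects with equal condition values never differ in decision."""
    seen = {}
    for row, decision in zip(rows, decisions):
        # setdefault records the first decision seen for this condition tuple
        if seen.setdefault(tuple(row), decision) != decision:
            return False
    return True
```

A discretization scheme in this spirit would keep only a subset of the candidate cuts and check with `is_consistent` that the discretized decision table remains consistent; how the paper selects that subset is not stated in the abstract.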
Source
《吉林师范大学学报(自然科学版)》
2006, Issue 4, pp. 25-26, 33 (3 pages in total)
Journal of Jilin Normal University (Natural Science Edition)
Keywords
discretization
data mining
rough set theory
data preprocessing