Abstract
In knowledge discovery in databases and machine learning, many data mining methods, such as data mining tools based on rough sets, require discrete attribute values, yet most actually observed data are continuous attributes. This has hindered research on many new data mining tools. To address this problem, this paper comprehensively analyzes existing methods for discretizing continuous attributes and proposes a new discretization method based on the characteristics of the data distribution. The algorithm is validated on a classic example, and the experimental results show that the method is reasonable and feasible.
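To make concrete what discretization means for a rough-set tool, the following minimal sketch maps continuous values to interval labels using equal-frequency (quantile) cut points. This is a common baseline, not the distribution-based method proposed in the paper, whose details the abstract does not give. The function names `equal_frequency_cuts` and `discretize` and the number of intervals `k` are hypothetical choices for illustration.

```python
from bisect import bisect_right


def equal_frequency_cuts(values, k):
    """Return up to k-1 cut points that split the sorted values into
    k intervals holding roughly the same number of observations.

    Hypothetical helper: equal-frequency binning, NOT the paper's method.
    """
    if k < 2:
        return []
    data = sorted(values)
    n = len(data)
    cuts = []
    for i in range(1, k):
        idx = i * n // k
        if idx == 0:
            continue  # too few observations for this many intervals
        # Cut at the midpoint between neighbouring sorted values.
        cut = (data[idx - 1] + data[idx]) / 2
        # Keep cut points strictly increasing (ties can produce duplicates).
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def discretize(values, cuts):
    """Replace each continuous value by its interval index 0..len(cuts)."""
    return [bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    attr = [1.2, 3.5, 2.0, 8.1, 5.5, 4.4]
    cuts = equal_frequency_cuts(attr, 3)
    print(discretize(attr, cuts))  # [0, 1, 0, 2, 2, 1]
```

The discrete labels can then be used as attribute values in a decision table. Equal-frequency binning ignores class labels; supervised or distribution-aware methods choose cut points differently.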
Source
Mathematics in Practice and Theory (《数学的实践与认识》)
CSCD
Peking University core journal (北大核心)
2007, Issue 10, pp. 90-96 (7 pages)
Funding
Special Scientific Research Program Fund of the Shaanxi Provincial Department of Education (05JK092)
Keywords
knowledge discovery
data mining
continuous attributes
discretization
data distribution