Abstract
Applying sampling techniques in data mining can significantly improve the efficiency of data mining tasks. With an appropriate sampling method, a data mining algorithm can analyze a sample data set that is much smaller than the original data set, which greatly improves performance. The accompanying problem is that sampling, while greatly improving performance, also affects the accuracy of the analysis. Selecting a suitable sample that reflects the overall characteristics of the data is therefore a key issue in data mining. Traditional sampling mostly applies a single sampling method in a single pass, so the resulting samples are limited to some extent. This paper summarizes traditional sampling methods and approaches to choosing the sample size, improves on the traditional idea of stratified sampling, and proposes a new heuristic sampling approach based on data mining, which greatly improves the accuracy of the extracted samples.
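The abstract contrasts single-pass sampling with stratified sampling and mentions choosing a sample size, but it does not spell out the authors' heuristic algorithm. As a point of reference only, the sketch below shows the traditional baseline the paper improves on, not the paper's method: Cochran's sample-size formula with a finite population correction (an assumed, commonly used rule; the paper's own sample-size rule is not given here) and proportional stratified random sampling. The function names and default parameters (z = 1.96, p = 0.5, 5% margin of error) are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def cochran_sample_size(population_size, z=1.96, p=0.5, margin=0.05):
    """Cochran's sample size for estimating a proportion (assumed rule),
    shrunk by the finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population_size)    # finite population correction
    return math.ceil(n)


def proportional_stratified_sample(records, key, total_n, seed=None):
    """Traditional proportional stratified sampling: each stratum gets a share
    of total_n proportional to its size, then is sampled at random."""
    if total_n > len(records):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)

    # Group records into strata by the (hypothetical) key function.
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)

    # Ideal fractional quota per stratum, then floor it.
    n_total = len(records)
    quotas = {h: total_n * len(items) / n_total for h, items in strata.items()}
    alloc = {h: int(q) for h, q in quotas.items()}

    # Largest-remainder rounding so the allocations sum exactly to total_n.
    remaining = total_n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda h: quotas[h] - alloc[h], reverse=True)
    for h in by_remainder[:remaining]:
        alloc[h] += 1

    # Simple random sampling without replacement inside each stratum.
    sample = []
    for h, items in strata.items():
        sample.extend(rng.sample(items, min(alloc[h], len(items))))
    return sample, alloc


if __name__ == "__main__":
    data = [("A", i) for i in range(600)] + [("B", i) for i in range(300)] \
         + [("C", i) for i in range(100)]
    n = cochran_sample_size(len(data))
    sample, alloc = proportional_stratified_sample(data, key=lambda r: r[0],
                                                   total_n=n, seed=42)
    print(n, alloc, len(sample))
```

Proportional allocation keeps each stratum's share of the sample equal to its share of the population, which is the "traditional stratified sampling" starting point; a heuristic method such as the one the paper proposes would change how strata or allocations are chosen, which this sketch does not attempt.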
Source
《微计算机信息》 (Control & Automation), 2009, No. 12, pp. 216-217, 199 (3 pages)