Research on a Heuristic Sampling Method Based on Data Mining (cited by: 4)

Research of heuristic sampling algorithm based on Data Mining
Abstract: Applying sampling techniques in data mining can significantly improve the efficiency of data mining tasks. With a suitable sampling method, a data mining algorithm can analyze a sample data set that is much smaller than the original data set, which greatly improves performance. The accompanying problem is that sampling, while it improves performance, also affects the accuracy of the analysis. How to select a sample that properly reflects the overall data therefore becomes a key issue in data mining. Traditional sampling mostly applies a single sampling method in a single pass, so the samples it draws are limited to a certain extent. This paper summarizes traditional sampling methods and the selection of sample size, improves on the traditional idea of stratified sampling, and proposes a new heuristic sampling approach based on data mining, which greatly improves the accuracy of the samples drawn.
Authors: 黎娅 (Li Ya), 郭江娜 (Guo Jiangna)
Source: 《微计算机信息》 (Control & Automation), 2009, No. 12, pp. 216-217, 199 (3 pages)
Keywords: data mining; heuristic; sampling; sample size