
Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules (Cited by: 2)

Abstract: One obstacle to efficient association rule mining is the explosive growth of data sets, since scanning a large database, especially multiple times, is costly or impossible. A popular way to improve the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database. However, how to effectively define and efficiently estimate the error in the algorithm's outcome, and how to determine the required sample size, remain open problems. In this paper, an effective and efficient algorithm based on PAC (Probably Approximately Correct) learning theory is given to measure and estimate sample error. Then a new adaptive, on-line, fast sampling strategy, multi-scaling sampling, inspired by MRA (Multi-Resolution Analysis) and the Shannon sampling theorem, is presented for quickly obtaining acceptably approximate association rules at an appropriate sample size. Both theoretical analysis and empirical study show that the sampling strategy achieves a very good speed-accuracy trade-off.
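To make the sample-size question in the abstract concrete, here is a minimal Python sketch. It is not the paper's multi-scaling sampling algorithm, which the abstract does not specify. It assumes the standard Hoeffding bound as a PAC-style guarantee: a random sample of n >= ln(2/delta) / (2 * epsilon^2) transactions estimates the support of one fixed itemset to within epsilon with probability at least 1 - delta. The function names (`hoeffding_sample_size`, `support`, `progressive_support`) and the doubling schedule with its stability stopping rule are illustrative assumptions.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Sample size that bounds the support error of ONE fixed itemset by
    epsilon with probability >= 1 - delta (Hoeffding's inequality)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def progressive_support(itemset, transactions, epsilon, delta,
                        start=100, rng=None):
    """Hypothetical adaptive scheme: double the sample size until two
    successive estimates differ by at most epsilon (a heuristic stop,
    not a guarantee) or the Hoeffding size is reached.

    `transactions` must be a list of sets. Returns (estimate, sample_size).
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    cap = min(len(transactions), hoeffding_sample_size(epsilon, delta))
    n = min(start, cap)
    previous = None
    while True:
        sample = rng.sample(transactions, n)  # without replacement
        estimate = support(itemset, sample)
        stable = previous is not None and abs(estimate - previous) <= epsilon
        if stable or n >= cap:
            return estimate, n
        previous = estimate
        n = min(2 * n, cap)
```

For example, `hoeffding_sample_size(0.05, 0.05)` gives the number of transactions that suffices for a single itemset at 5% error and 95% confidence, independent of the database size. Bounding the error over many itemsets at once would need a union bound or a similar correction, which this sketch leaves out.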
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition; indexed in SCIE, EI, CSCD), 2005, No. 3, pp. 309-318 (10 pages)
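Funding: CAS Project of Brain and Mind Science; National High Technology Research and Development Program of China (863 Program); National Basic Research Program of China (973 Program); National Natural Science Foundation of China; Natural Science Foundation of Hunan Province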
Keywords: data mining; association rule; frequent itemset; sample error; multi-scaling sampling