Abstract
The classical Apriori algorithm is inefficient when mining large itemsets because of repeated scanning. This paper proposes an improved hash table structure for storing itemsets in the DHP algorithm and defines a new hash function that determines the storage address of each itemset. Based on the new hash table structure, the pruning step of the association rule algorithm is optimized through parallel mining. Experimental results show that, compared with the Apriori algorithm, the proposed method saves more storage space and improves mining efficiency.
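The hash-based pruning idea that DHP builds on can be illustrated with a minimal sketch. The abstract does not give the paper's new hash function, its table layout, or its parallel scheme, so everything below is an assumption made for illustration: the function name `dhp_frequent_pairs`, the `n_buckets` parameter, and the placeholder hash in `bucket` are hypothetical, and the code runs sequentially. It shows only the generic DHP step of hashing 2-itemsets into buckets during the first pass and using bucket counts to prune candidates before the second pass.

```python
from collections import Counter
from itertools import combinations


def dhp_frequent_pairs(transactions, min_support, n_buckets=7):
    """Return frequent 2-itemsets using DHP-style hash-bucket pruning."""
    # Fix an item order so that each pair maps to a deterministic bucket.
    items = sorted({item for t in transactions for item in t})
    rank = {item: i for i, item in enumerate(items)}

    def bucket(pair):
        # Placeholder hash function (an assumption, NOT the paper's function).
        x, y = pair
        return (rank[x] * len(items) + rank[y]) % n_buckets

    # Pass 1: count single items and hash every 2-subset into a bucket.
    item_counts = Counter()
    bucket_counts = [0] * n_buckets
    for t in transactions:
        unique = sorted(set(t))
        item_counts.update(unique)
        for pair in combinations(unique, 2):
            bucket_counts[bucket(pair)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # Keep a candidate pair only if both items are frequent and its bucket
    # count reaches min_support. A bucket count is an upper bound on the
    # support of every pair hashed into it, so no frequent pair is lost.
    candidates = {
        pair
        for pair in combinations(frequent_items, 2)
        if bucket_counts[bucket(pair)] >= min_support
    }

    # Pass 2: count exact support for the surviving candidates only.
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            if pair in candidates:
                pair_counts[pair] += 1

    return {p: c for p, c in pair_counts.items() if c >= min_support}
```

With a small hypothetical database such as `[["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]` and `min_support=2`, the function returns the same frequent pairs that plain Apriori would find; the bucket filter only reduces how many candidates need exact counting in the second pass.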
Source
Computer Technology and Development (《计算机技术与发展》)
2007, No. 6, pp. 12-14 (3 pages)
Funding
National Key Science and Technology Project of China (2004BA811B06)