期刊文献+

挖掘最大频繁项集的并行算法 被引量:5

Parallel Algorithm for Mining Maximal Frequent Itemsets
在线阅读 下载PDF
导出
摘要 频繁项集的挖掘是数据挖掘的核心内容。本文提出挖掘最大频繁项集的并行算法P-MinMax,它采用数据库的垂直表示和基于前缀关系的等价类划分,利用因子项集的完全包含关系在处理机之间贪心分配等价类,根据等价类的需要相应地划分和有选择地复制数据库记录,使各处理机得以异步计算,达到了较好的负载平衡。分析和实验表明,P-MinMax有较好的可扩展性,其性能优于已有同类算法。 Mining frequent itemsets is a crucial issue in data mining applications. The complexity of the problem has been shown as NP-hard. Parallel techniques are widely used to improve the efficiency of mining algorithms. A novel and powerful parallel algorithm for mining maximal frequent itemsets, called P-MinMax, is proposed in this paper, which is based on its serial version MinMax. The new algorithm decomposes the search space by prefix-based equivalence classes, distributes work among the processors by complete inclusive relation between equivalence class gene itemsets and selectively duplicates databases in such a way that each processor can compute the frequent itemsets independently. These techniques eliminate the need for synchronization, drastically cutting down the I/O overhead. The analysis and experimental results demonstrate the superb efficiency of the approach in comparison with the previous work.
出处 《计算机科学》 CSCD 北大核心 2004年第12期132-134,188,共4页 Computer Science
基金 国家自然科学基金(60273075)
关键词 频繁项集 并行算法 等价类 数据库 处理机 数据挖掘 负载平衡 因子 表示 包含关系 Frequent itemsets, Parallel algorithm, Data mining
  • 相关文献

参考文献6

  • 1Burdick D, Calimlim M, Gehrke J. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. In: Proc.of 17th Intl. Conf. on Data Engineering, Heidelberg, Germany,April 2001. 443-452
  • 2Gouda K, Zaki M J. Efficiently Mining Maximal Frequent Itemsets. In: Proc. of 2001 IEEE Intl. Conf. on Data Mining (ICDM'01), San Jose, California, November 2001. 163-170
  • 3Wang Hui, Li Qinghua, Ma Chuanxiang, Li Kenli. A Maximal Frequent Itemset Algorithm. Lecture Notes in Computer Science,Springer, 2003,2639: 484-490
  • 4Agrawal R, Shafer J C. Parallel Mining of Association Rules.IEEE Transaction On Knowledge And Data Engineering, Dec.1996,8(6): 962-969
  • 5Zaki M J, Parthasarathy S, Ogihara M, Li Wei. New Parallel Algorithms for Fast Discovery of Association Rules. Data Mining and Knowledge Discovery: An International Journal. special issue on Scalable High-Performance Computing for KDD, Dec. 1997, 1(4) :34
  • 6Zaki M J. Parallel and Distributed Association Mining: A Survey.IEEE Concurrency, 1999,7(4): 14-25

同被引文献42

  • 1颜跃进,李舟军,陈火旺.基于FP-Tree有效挖掘最大频繁项集[J].软件学报,2005,16(2):215-222. 被引量:68
  • 2王黎明,赵辉.基于FP树的全局最大频繁项集挖掘算法[J].计算机研究与发展,2007,44(3):445-451. 被引量:16
  • 3Ceglar A,Roddick J F.Association mining[J].ACM Computing Surveys, 2006,38(2) : 1-42.
  • 4Rigoutsos L,Floratos A.Combinatoriat pattern discovery in bio-logical sequences:the teiresias algorithm[J].Bioinformaties, 1998,14( 1 ) : 55-67.
  • 5Bayardo R J.Efficiently mining long patterns from databases[C]// Haas L M,Tiwary A.Proceedings ACM SIGMOD International Conference on Management of Data, 1998:85-93.
  • 6Lin D I,Kedem Z M.Pincer-search:a new algorithm for discovering the maximum frequent set[C]//Schek H J.Proceedings of 6th International Conference on Extending Database Technology,1998: 105-119.
  • 7Agarwal R C,Aggarwal C C,Prasad V V V.Depth first generation of long patterns[C]//Ramakrishnan R,Stolfo S.Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2000:108-118.
  • 8Burdick D,Calimlim M,Gehrke J.MAFIA:a maximal frequent itemset 'algorithm for transactional databases[C]//Georgakopoulos D. Proceedings of the 17th International Conference on Data Engineering, 2001 : 443-452.
  • 9Gouda K,Zaki M J.Efficiently mining maximal frequent itemsets[C]// Cercone N,Lin T Y,Wu X D.Proceedings of the 2001 IEEE International Conference on Data Mining,2001:163-170.
  • 10Ceglar A,Roddick J F.Association mining[J].A CM Computing Surveys,2006,38(2):1-42.

引证文献5

二级引证文献11

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部