Parallel Algorithm for Mining Maximal Frequent Itemsets (挖掘最大频繁项集的并行算法)

Cited by: 5

Abstract: Mining frequent itemsets is a core task in data mining. The problem has been shown to be NP-hard, so parallel techniques are widely used to improve the efficiency of mining algorithms. This paper proposes P-MinMax, a parallel algorithm for mining maximal frequent itemsets that builds on its serial version, MinMax. P-MinMax uses a vertical representation of the database and decomposes the search space into prefix-based equivalence classes. It assigns equivalence classes to processors greedily, using the complete inclusion relation between the classes' factor itemsets (因子项集, rendered "gene itemsets" in the authors' English abstract), and it partitions and selectively replicates database records according to what each class needs, so that every processor can compute its frequent itemsets independently and asynchronously. These techniques eliminate the need for synchronization, sharply reduce I/O overhead, and achieve good load balance. Analysis and experimental results show that P-MinMax scales well and outperforms existing algorithms of the same kind.
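The three ingredients named in the abstract (vertical representation, prefix-based equivalence classes, greedy assignment of classes to processors) can be illustrated with a minimal Python sketch. This is not the P-MinMax implementation: the abstract does not spell out the allocation criterion based on the inclusion relation between factor itemsets, so the sketch substitutes a plain heaviest-first greedy rule with an assumed cost estimate. The function names `vertical`, `prefix_classes`, `greedy_assign`, and `items_needed` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def vertical(transactions):
    """Vertical layout: map each item to the set of transaction ids containing it."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def prefix_classes(tidlists, min_support):
    """Group frequent 2-itemsets by their first item (the shared prefix)."""
    frequent = sorted(i for i, t in tidlists.items() if len(t) >= min_support)
    classes = defaultdict(list)
    for a, b in combinations(frequent, 2):
        # Support of {a, b} is the size of the tid-list intersection.
        if len(tidlists[a] & tidlists[b]) >= min_support:
            classes[a].append(b)
    return dict(classes)


def greedy_assign(classes, n_procs):
    """Give each class, heaviest first, to the currently least-loaded processor."""
    def weight(members):
        # Assumed cost estimate: number of member pairs, at least 1.
        n = len(members)
        return max(1, n * (n - 1) // 2)

    loads = [0] * n_procs
    plan = [[] for _ in range(n_procs)]
    ordered = sorted(classes.items(), key=lambda kv: (-weight(kv[1]), kv[0]))
    for prefix, members in ordered:
        target = loads.index(min(loads))
        plan[target].append(prefix)
        loads[target] += weight(members)
    return plan


def items_needed(classes, prefixes):
    """Items whose tid-lists a processor must hold to mine its classes alone."""
    needed = set()
    for prefix in prefixes:
        needed.add(prefix)
        needed.update(classes[prefix])
    return needed
```

Once each processor holds the tid-lists returned by `items_needed` for its own classes, it can intersect them locally without contacting the others, which is the sense in which selective replication removes the need for synchronization.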

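Authors: Li Qinghua, Wang Hui, Jiang Shengyi

Affiliation: School of Computer Science, Huazhong University of Science and Technology

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 12, pp. 132-134, 188 (4 pages)

Funding: National Natural Science Foundation of China (60273075)

Keywords: frequent itemsets; parallel algorithm; equivalence class; database; processor; data mining; load balancing; factor; representation; inclusion relation

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]; TP393 [Automation and Computer Technology: Computer Application Technology]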
