Mining access patterns of Web active user based on tree structure (面向Web活跃用户的树型访问模式挖掘算法)

Abstract: Conventional Web mining approaches use the logs of all Web users when mining patterns, yet active and inactive users typically behave differently when visiting a Web site. This paper therefore proposes an access pattern mining approach oriented to active users, consisting of a session-retrieval algorithm, active user session miner (AUSM), and a tree-mining algorithm, Web access pattern bottom up miner (WAPBUM). AUSM identifies active Web users and retrieves their sessions in a single scan of the Web logs. WAPBUM then uses the topology of the Web site to discover frequent access patterns, represented as trees, from the retrieved sessions. Tailored to the characteristics of Web log mining, WAPBUM builds equivalence classes of subtrees and generates frequent subtrees from the bottom up. Both algorithms were evaluated on synthetic and real datasets. The experiments show that AUSM's running time is linear in the volume of Web log data and that its memory usage stays stable during execution; WAPBUM is markedly faster than the FREQT algorithm when mining rooted subtrees, and its results can be applied effectively to analyzing Web site structure.
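
The abstract states that active-user sessions are extracted in a single scan of the log but gives no algorithmic detail. The following is only a rough illustration of that general idea, not the AUSM algorithm itself: the record layout `(user, timestamp, page)`, the 30-minute session timeout, and the `MIN_REQUESTS` activity threshold are all assumptions made for this sketch, and it makes no attempt to reproduce the stable-memory behaviour reported for AUSM.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; assumed idle gap that ends a session
MIN_REQUESTS = 3           # assumed threshold for calling a user "active"


def extract_active_sessions(records, timeout=SESSION_TIMEOUT,
                            min_requests=MIN_REQUESTS):
    """Single pass over (user, timestamp, page) records sorted by timestamp.

    Returns {user: [session, ...]} for users with at least `min_requests`
    requests; each session is a list of pages in visit order.
    """
    sessions = defaultdict(list)      # user -> list of sessions
    last_seen = {}                    # user -> timestamp of previous request
    request_count = defaultdict(int)  # user -> total requests seen

    for user, ts, page in records:
        request_count[user] += 1
        # Open a new session on first sight or after a long idle gap.
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions[user].append([])
        sessions[user][-1].append(page)
        last_seen[user] = ts

    # The filter reads only the per-user summaries, not the log again.
    return {u: s for u, s in sessions.items()
            if request_count[u] >= min_requests}


# Hypothetical log: u1 issues four requests, u2 only one.
log = [
    ("u1", 0, "/"),
    ("u2", 10, "/"),
    ("u1", 60, "/news"),
    ("u1", 120, "/news/1"),
    ("u1", 4000, "/"),
]
result = extract_active_sessions(log)
```

With this toy log, only `u1` passes the assumed activity threshold, and the long gap before the last request splits its visits into two sessions.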

Authors: Bei Yijun (贝毅君), Chen Gang (陈刚), Dong Jinxiang (董金祥)

Affiliation: College of Computer Science and Technology, Zhejiang University

Source: Journal of Zhejiang University: Engineering Science (《浙江大学学报（工学版）》), 2009, No. 6, pp. 1005-1013, 1140 (10 pages in total). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

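Funding: National Natural Science Foundation of China (60603044); Zhejiang Province Major Software Special Fund (2006c11108); Program for Changjiang Scholars and Innovative Research Team (IRT0652)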

Keywords: Web usage mining; Web access pattern; Web log; active user; frequent subtree

Classification: TP309.2 [Automation and Computer Technology: Computer System Architecture]
