
Mining access patterns of Web active user based on tree structure
Abstract Conventional Web mining techniques use the logs of all Web users, yet active and inactive users behave differently when visiting a site. This paper therefore proposes an access pattern mining approach oriented to active users. It consists of a session extraction algorithm, active user session miner (AUSM), and a tree-based access pattern mining algorithm, Web access pattern bottom up miner (WAPBUM). AUSM identifies active Web users and extracts their sessions in a single scan of the Web log. Working from the extracted sessions and the topology of the Web site, WAPBUM mines frequent access patterns represented as trees: exploiting the characteristics of Web logs, it builds equivalence classes of subtrees and generates frequent subtrees from the bottom up. Experiments on both synthetic and real datasets show that the running time of AUSM grows linearly with the volume of Web log data and that its memory usage remains stable during the run. WAPBUM is markedly faster than the FREQT algorithm when mining rooted subtrees, and its results can be applied effectively to analyzing Web site structure.
Source Journal of Zhejiang University (Engineering Science), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2009, No. 6, pp. 1005-1013, 1140 (10 pages in total)
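Funding National Natural Science Foundation of China (60603044); Zhejiang Province Major Software Special Fund (2006c11108); Program for Changjiang Scholars and Innovative Research Team in University (IRT0652)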
Keywords Web usage mining; Web access pattern; Web log; active user; frequent subtree
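The single-scan session extraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's AUSM algorithm, whose details the abstract does not give. The record layout `(user, timestamp, url)`, the 30-minute idle timeout, the request-count threshold for calling a user "active", and the function name `extract_active_sessions` are all assumptions made for illustration.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; assumed idle gap that ends a session
MIN_REQUESTS = 3           # assumed activity threshold, not taken from the paper


def extract_active_sessions(records, timeout=SESSION_TIMEOUT,
                            min_requests=MIN_REQUESTS):
    """Group time-ordered (user, timestamp, url) records into sessions in one pass.

    Returns {user: [session, ...]} for users with at least `min_requests`
    requests, where each session is a list of URLs in visit order.
    """
    sessions = defaultdict(list)  # user -> list of sessions
    last_seen = {}                # user -> timestamp of the previous request
    counts = defaultdict(int)     # user -> total number of requests

    for user, ts, url in records:
        # Open a new session on the first request or after a long idle gap.
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions[user].append([])
        sessions[user][-1].append(url)
        last_seen[user] = ts
        counts[user] += 1

    # Keep only users that meet the (assumed) activity threshold.
    return {u: s for u, s in sessions.items() if counts[u] >= min_requests}


# Hypothetical log: timestamps are seconds since an arbitrary start.
log = [
    ("alice", 0, "/"),
    ("bob", 10, "/"),
    ("alice", 60, "/news"),
    ("alice", 4000, "/"),
    ("alice", 4030, "/about"),
]
active = extract_active_sessions(log)
```

The sketch reads each record once, which matches the one-pass property the abstract claims. It keeps every user's sessions in memory, so it does not reproduce the stable memory behavior reported for AUSM.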

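References (19)

  • 1 COOLEY R, SRIVASTAVA J. Data preparation for mining World Wide Web browsing patterns [J]. Journal of Knowledge and Information Systems, 1999, 1(1): 5-32.
  • 2 XING Dongshan, SHEN Junyi, SONG Qinbao. Mining user preferred browsing paths from Web logs [J]. Chinese Journal of Computers, 2003, 26(11): 1518-1523. (in Chinese)
  • 3 YU Yijun, LIN Huaizhong, CHEN Chun. Personalized Web page recommendation based on competitive agglomeration [J]. Journal of Zhejiang University (Engineering Science), 2007, 41(2): 239-244. (in Chinese)
  • 4 HAN Jiawei, MENG Xiaofeng, WANG Jing, LI Sheng'en. Research on Web mining [J]. Journal of Computer Research and Development, 2001, 38(4): 405-414. (in Chinese)
  • 5 LIU B. Web data mining: Exploring hyperlinks, contents, and usage data [M]. Berlin: Springer, 2007.
  • 6 NANOPOULOS A, MANOLOPOULOS Y. Mining patterns from graph traversals [J]. Data & Knowledge Engineering, 2001, 37: 243-266.
  • 7 LI Yingji, PENG Hong, ZHENG Qilun, ZENG Wei. Discovery of interesting association rules in Web logs [J]. Journal of Computer Research and Development, 2003, 40(3): 435-439. (in Chinese)
  • 8 OUYANG Yiming, CHEN Min, LIU Hongying, HU Xuegang. Improvement and analysis of algorithms for discovering user access patterns in Web mining [J]. Pattern Recognition and Artificial Intelligence, 2005, 18(6): 728-734. (in Chinese)
  • 9 PEI J, HAN J W, MORTAZAVI-ASL B, et al. Mining access patterns efficiently from web logs [C] // Proceedings of the 4th PAKDD. Kyoto, Japan: Springer, 2000: 396-407.
  • 10 FIOT C, LAURENT A, TEISSEIRE M. Web access log mining with soft sequential patterns [C] // Proceedings of the 7th International FLINS Conference on Applied Artificial Intelligence. Genova, Italy: World Scientific, 2006.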

