An Efficient Agglomerative Clustering Algorithm for Web Navigation Pattern Identification

Abstract: Web log mining is the analysis of web log files containing web page sequences. Discovering user access patterns from web access logs is necessary for building adaptive web servers, improving e-commerce, carrying out cross-marketing, personalizing web content, and predicting web access sequences. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interests and to determine their motivation for visiting a website. Using this approach, web usage mining proceeds through several stages: data cleaning, preprocessing, pattern discovery, and pattern analysis. Results are given to show how this approach produces tighter usage clusters than existing web usage mining techniques. Rather than traditional distance-based clustering, a similarity measure is used during the clustering process in order to reduce computational complexity. This paper also addresses the problem of assessing the quality of user session clusters; cluster validity is measured with a statistical test that measures the distances between cluster distributions to infer their dissimilarity and level of distinction. Using such statistical measures, it is shown that cluster accuracy improves to 0.83, compared with validity measures of 0.26 for existing k-means clustering, 0.56 for FCM (Fuzzy C-Means) clustering, and 0.54 for rough-set-based clustering. Generating dense clusters is essential for finding the interesting patterns needed for further mining and analysis.
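The abstract does not state which similarity measure or linkage rule the paper uses, so the following is only a minimal sketch of similarity-driven agglomerative clustering over user sessions, not the author's algorithm. It assumes Jaccard similarity between the sets of pages in each session, average linkage between clusters, and a hypothetical `threshold` stopping parameter; the session data and URLs are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sessions, treated as sets of pages (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, sessions):
    """Average pairwise similarity between members of two clusters (assumed linkage)."""
    total = sum(jaccard(sessions[i], sessions[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(sessions, threshold=0.5):
    """Merge the most similar pair of clusters until no pair reaches the threshold."""
    clusters = [[i] for i in range(len(sessions))]  # start with one cluster per session
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[x], clusters[y], sessions)
            if sim > best_sim:
                best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = best_pair  # x < y, so deleting index y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical cleaned sessions: each is the list of pages one user visited.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/checkout"],
    ["/blog", "/blog/post-1"],
    ["/blog", "/blog/post-2", "/about"],
]
result = agglomerate(sessions, threshold=0.25)
print(result)  # lists of session indices, one list per cluster
```

Working on similarities between sessions lets the merge loop stop as soon as the best candidate pair falls below the threshold, instead of fixing the number of clusters in advance as k-means does. This naive version recomputes every pairwise similarity on each pass; a practical implementation would cache them.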
Author: A. Anitha (Department of IT, FX Engineering College, Tirunelveli, Nellai)
Affiliation: Department of IT
Source: Circuits and Systems (English-language journal), 2016, Issue 9, pp. 2349-2356, 9 pages in total
Keywords: Agglomerative Clustering; Similarity Measure; Cluster Validity; Clickstream Sequence; Transaction