An Efficient Agglomerative Clustering Algorithm for Web Navigation Pattern Identification

Abstract: Web log mining is the analysis of web log files containing web page sequences. Discovering user access patterns from web access logs is necessary for building adaptive web servers, improving e-commerce, carrying out cross-marketing, personalizing web content, and predicting web access sequences. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interests and to determine their motivation for visiting a website. Using this approach, web usage mining is carried out in four stages: data cleaning, preprocessing, pattern discovery, and pattern analysis. Results are given to show that this approach produces tighter usage clusters than existing web usage mining techniques. Rather than a traditional distance-based criterion, a similarity measure is used during the clustering process in order to reduce computational complexity. This paper also addresses the problem of assessing the quality of user session clusters: cluster validity is measured with a statistical test that uses the distances between cluster distributions to infer their dissimilarity and how well they can be distinguished. Using such statistical measures, cluster accuracy is shown to improve to a validity measure of 0.83, compared with 0.26 for existing k-means clustering, 0.56 for FCM (Fuzzy C-Means) clustering, and 0.54 for rough-set-based clustering. Generating dense clusters is essential for finding the interesting patterns needed for further mining and analysis.
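The abstract does not state which similarity measure or linkage rule the paper uses, so the following is only a minimal sketch of similarity-driven agglomerative clustering of user sessions, not the author's algorithm. It assumes each session is reduced to the set of pages visited, uses Jaccard similarity between sessions, merges clusters by average pairwise similarity, and stops at a hypothetical `threshold`. All function names, the sample page names, and the threshold value are illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of page identifiers (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, sessions):
    """Average pairwise similarity between the sessions of two clusters."""
    total = sum(jaccard(sessions[i], sessions[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(sessions, threshold=0.5):
    """Merge the most similar pair of clusters until no pair reaches `threshold`.

    Returns a list of clusters, each a list of session indices.
    """
    # Start with every session in its own cluster.
    clusters = [[i] for i in range(len(sessions))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[x], clusters[y], sessions)
            if sim > best_sim:
                best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical sessions: each is the set of pages one user visited.
sessions = [
    {"home", "products", "cart"},
    {"home", "products", "cart", "checkout"},
    {"blog", "about"},
    {"blog", "about", "contact"},
]
clusters = agglomerate(sessions, threshold=0.5)
print(clusters)
```

Treating a session as a set discards the page order; a measure that respects the clickstream sequence would need a different `jaccard` replacement, with the merging loop left unchanged.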

Author: A. Anitha (Department of IT, FX Engineering College, Tirunelveli, Nellai)

Affiliation: Department of IT

Source: Circuits and Systems, 2016, Issue 9, pp. 2349-2356 (9 pages)

Keywords: Agglomerative Clustering; Similarity Measure; Cluster Validity; Clickstream Sequence; Transaction

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
