Design of a Web Log Analysis System Based on Hadoop/Hive (cited by: 24)
Abstract: The rapid development of Internet technology has caused the volume of information carried by the web to grow explosively, and the volume of web log data has grown with it. Storing and processing data at this scale has become a new challenge. Cloud computing offers one approach: data is distributed over the network to the computing nodes of a cluster, which together store and process it. Hadoop is a popular open-source framework for building cloud computing platforms and is widely used for massive data processing. To process data with Hadoop, however, users must write their own Map/Reduce programs. These programs sit at a relatively low level of abstraction, which makes them hard to learn and hard to maintain. Hive is an open-source data warehouse tool built on Hadoop; it maps files to data tables and provides SQL-like statements, which simplifies development. This paper designs a web log analysis system based on Hadoop and Hive that makes full use of Hadoop's capacity for massive data processing while reducing the difficulty of development. A comparison with a single-machine experiment shows that the system is effective and valuable.
Source: Journal of Guangxi University (Natural Science Edition) (《广西大学学报(自然科学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, Issue A01, pp. 314-317 (4 pages)
Keywords: web log; cloud computing; Hadoop; Hive
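
The abstract contrasts hand-written Map/Reduce programs with Hive's SQL-like statements. The sketch below is only an illustration of that contrast, not the paper's implementation: it counts hits per URL in plain Python, written in a map-then-reduce style, and notes in a comment the kind of SQL-like statement that would express the same aggregation. The log layout (Common Log Format), the sample lines, and the names `map_phase`, `reduce_phase`, `count_hits`, `web_log`, and `url` are all assumptions introduced here; the record does not describe the system's actual log schema or queries.

```python
import re
from collections import defaultdict

# Assumption: logs follow the Common Log Format. The paper's real schema
# is not given in this record.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def map_phase(line):
    """Emit (url, 1) for each parsable log line, as a mapper would."""
    match = LOG_PATTERN.match(line.strip())
    if match:
        yield match.group("url"), 1


def reduce_phase(pairs):
    """Sum the values per key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_hits(lines):
    """Run the map and reduce steps in-process over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(pairs)


# Hypothetical sample data, including one line that does not parse.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2011:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2011:13:55:40 +0800] "GET /about.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [10/Oct/2011:13:56:01 +0800] "GET /index.html HTTP/1.1" 304 -',
    'malformed line',
]

# With the log files mapped to a table, an SQL-like statement of this shape
# would express the same aggregation (table and column names hypothetical):
#   SELECT url, COUNT(*) FROM web_log GROUP BY url;

if __name__ == "__main__":
    print(count_hits(SAMPLE_LOG))
```

Even for this single aggregation, the map/reduce version needs explicit parsing, key emission, and summation, while the SQL-like form is one statement. That gap is the development cost the abstract says Hive reduces.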