Abstract
With the explosive growth of online information resources, existing search engines can no longer meet the need to obtain accurate information quickly. Search engines therefore urgently need crawlers that retrieve content more precisely and in larger volumes. This paper implements a distributed topic-focused crawler based on multiple classifiers. Experimental results show that the crawler achieves good speed and precision, and that it is particularly well suited to crawling large volumes of topic-specific information.
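The abstract says the crawler is distributed and combines multiple classifiers, but it does not say which classifiers are used or how their outputs are merged. The sketch below is an illustration under stated assumptions and is not the paper's method. It assumes three hypothetical keyword-based classifiers combined by majority vote to decide whether a page is on topic. It also assumes that URLs are spread across crawler nodes by hashing the host name. `TOPIC_TERMS`, `keyword_classifier`, `title_classifier`, `density_classifier`, `is_on_topic` and `assign_node` are all illustrative names.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical topic vocabulary; the paper's actual topic model is not given in the abstract.
TOPIC_TERMS = {"crawler", "search", "engine", "index"}


def keyword_classifier(text: str, min_hits: int = 2) -> bool:
    """Vote 'relevant' if the body contains at least min_hits distinct topic terms."""
    words = set(text.lower().split())
    return len(words & TOPIC_TERMS) >= min_hits


def title_classifier(title: str) -> bool:
    """Vote 'relevant' if any topic term appears as a word in the page title."""
    title_words = title.lower().split()
    return any(term in title_words for term in TOPIC_TERMS)


def density_classifier(text: str, threshold: float = 0.1) -> bool:
    """Vote 'relevant' if topic terms make up at least `threshold` of all tokens."""
    tokens = text.lower().split()
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in TOPIC_TERMS)
    return hits / len(tokens) >= threshold


def is_on_topic(title: str, text: str) -> bool:
    """Majority vote (2 of 3) over the hypothetical classifiers."""
    votes = [keyword_classifier(text), title_classifier(title), density_classifier(text)]
    return sum(votes) >= 2


def assign_node(url: str, num_nodes: int) -> int:
    """Route a URL to a crawler node by hashing its host, so each host stays on one node."""
    host = urlparse(url).netloc.lower()
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


if __name__ == "__main__":
    print(is_on_topic("Building a search engine crawler",
                      "A crawler fetches pages so the search engine can index them"))
    print(assign_node("http://example.com/a", 4) == assign_node("http://example.com/b", 4))
```

Hashing by host instead of by full URL keeps all pages of one site on the same node. That makes per-site politeness and duplicate checks local to a single node. A real focused crawler would typically use trained text classifiers instead of fixed keyword rules.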
Source
Journal of Luoyang Normal University (《洛阳师范学院学报》)
2011, No. 11, pp. 51-53, 57 (4 pages)
Funding
Supported by the Henan Provincial Key Science and Technology Research Program (Grant No. 08210221007102300410198)
Keywords
topic extraction
classifier
topic crawler