
Survey on topic-focused Web crawler (cited by 132)
Abstract: This paper first defines the topic-focused Web crawler and states the goals of research on it. It then systematically analyzes recent research methods and techniques for focused crawling, both in China and abroad, including methods based on text content, methods based on hyperlink analysis, methods guided by classifier prediction, and other focused-crawling methods, and compares their advantages and disadvantages. Finally, it outlines future research directions.
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 10, pp. 26-29, 47 (5 pages)
Keywords: topic-focused crawler; information retrieval; Web mining



