Design and Implementation of a Search Engine Based on Lucene (cited by: 26)
Abstract: To address the difficulty of searching the large volume of File Transfer Protocol (FTP) resources on the China Education and Research Network (CERNET), this paper proposes the design and implementation of an FTP search engine based on EdtFTPJ and Lucene. The system as a whole follows the Model-View-Controller (MVC) design pattern, implemented with the Struts 1.2 framework. The data acquisition module uses a regular-expression-based finite state automaton to crawl data, the index module uses an inverted index, and word segmentation uses dictionary-based forward maximum matching for Chinese text. Experimental results show that the scheme achieves a high resource retrieval rate while preserving the accuracy of the retrieval results.
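The dictionary-based forward maximum matching named in the abstract can be illustrated with a minimal sketch. The class name `ForwardMaxMatch`, the four-entry dictionary, and the `maxWordLen` parameter are assumptions made for illustration; the paper's actual dictionary and implementation are not given in this record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ForwardMaxMatch {

    /**
     * Segments text left to right, taking at each position the longest
     * substring (up to maxWordLen characters) found in the dictionary.
     * A character that starts no dictionary word is emitted on its own.
     */
    public static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Shrink the candidate from the right until it is a dictionary word
            // or only a single character is left.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical toy dictionary; a real system would load a large word list.
        Set<String> dict = Set.of("搜索", "引擎", "搜索引擎", "设计");
        System.out.println(segment("搜索引擎设计", dict, 4)); // prints [搜索引擎, 设计]
    }
}
```

Because the longest candidate is tried first, "搜索引擎" is preferred over the shorter entry "搜索". The sketch indexes by `char`, so it assumes the text contains no characters outside the Basic Multilingual Plane.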
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2011, No. 16, pp. 39-41 (3 pages)
Funding: National Natural Science Foundation of China (60841004, 60971110); Zhengzhou University Innovative Experiment Fund (2009cxsy100)
Keywords: File Transfer Protocol (FTP) search engine; Lucene framework; Model-View-Controller (MVC); finite state automaton; inverted index
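As a rough illustration of the inverted index keyword, the sketch below maps each term to the sorted set of document IDs that contain it and answers AND queries by intersecting those sets. It is a toy in-memory structure, not Lucene's index format or API; the class `InvertedIndex`, its methods, and the sample terms are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndex {

    // term -> sorted IDs of the documents (for example, FTP file entries) containing it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Records that every term in the list occurs in the document docId. */
    public void add(int docId, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the IDs of documents that contain all of the given terms. */
    public SortedSet<Integer> search(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs =
                    postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index is not modified
            } else {
                result.retainAll(docs);       // intersect with the next posting list
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, List.of("lucene", "search", "engine"));
        index.add(2, List.of("ftp", "search"));
        System.out.println(index.search("search"));        // prints [1, 2]
        System.out.println(index.search("search", "ftp")); // prints [2]
    }
}
```

Lookup cost depends on the length of the posting lists for the query terms rather than on the total number of indexed documents, which is the property that makes this structure suitable for search.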