
Information-Retrieval Algorithm Based on Hyperlinks and Anchors

Cited by: 7
Abstract: This paper proposes an improved information-retrieval algorithm based on HITS (Hyperlink-Induced Topic Search). The algorithm first computes a hub (out-degree) value and an authority (in-degree) value for each web page from the links between pages. It then obtains a relevance weight for each page by matching the query against either the anchor text of the hyperlinks or the full text of the page. Finally, it combines this weight with the hub or authority value and ranks the retrieved results accordingly. Experimental results show that, compared with HITS and TF*IDF, the new algorithm achieves higher precision at the same recall.
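Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2004, No. 7, pp. 1344-1347 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60272051)
Keywords: HITS algorithm; authority (page in-degree); hub (page out-degree); anchor text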
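
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not state how the relevance weight is computed or how it is combined with the hub or authority value, so the cosine similarity over raw term counts and the simple product used below are assumptions made for illustration only. The function names `hits`, `cosine`, and `rank` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def hits(links, iterations=50):
    """Standard HITS iteration on a graph given as {page: [linked pages]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub of a page: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # L2-normalise both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


def cosine(query, text):
    """Cosine similarity over raw term counts (an assumed similarity measure)."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in t.values())))
    return dot / norm if norm else 0.0


def rank(query, links, anchor_text, use="authority"):
    """Rank pages by relevance weight times link score (assumed combination)."""
    hub, auth = hits(links)
    link_score = auth if use == "authority" else hub
    combined = {p: cosine(query, anchor_text.get(p, "")) * link_score[p]
                for p in link_score}
    return sorted(combined, key=combined.get, reverse=True)


# Toy graph: pages "a" and "b" both link to "c", and "c" links to "d".
links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
# Text of the anchors pointing at each page (made-up data).
anchor_text = {"c": "web search ranking", "d": "cooking recipes"}
ranking = rank("web search", links, anchor_text)
```

Here `anchor_text` maps each page to the text of the anchors that point at it; passing full page text in the same dictionary would correspond to the content-matching variant mentioned in the abstract.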
