Abstract
This paper proposes an improved retrieval algorithm based on HITS (Hyperlink-Induced Topic Search). The algorithm first computes the Hub and Authority values (the out-degree and in-degree values) of each web page from the links between pages. It then matches the query against either the anchor text of hyperlinks or the full text of the page to obtain a relevance weight for each page, and ranks the retrieved results by combining this weight with the Hub or Authority value. Experimental results show that, compared with HITS and TF*IDF, the new algorithm achieves higher precision at the same recall.
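The pipeline described in the abstract can be sketched as follows. This is only an illustrative sketch, not the paper's method: the abstract does not state how the text weight and the link score are combined, so the linear mix controlled by `alpha`, the whitespace tokenizer, the cosine similarity, and the choice of the Authority score are all assumptions, and the function names `hits`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def hits(links, iterations=50):
    """Plain HITS iteration; links maps each page to the pages it links to."""
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub of a page: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # L2-normalise both score vectors so they stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


def cosine(query, text):
    """Cosine similarity of raw term counts (assumed similarity measure)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in t.values())))
    return dot / norm if norm else 0.0


def rank(query, links, texts, alpha=0.5):
    """Rank pages by an assumed linear mix of text weight and Authority."""
    _, auth = hits(links)
    scores = {
        p: alpha * cosine(query, texts.get(p, "")) + (1 - alpha) * auth[p]
        for p in auth
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Here `texts` may hold either anchor text or full page text, mirroring the two matching options the abstract mentions. Replacing `auth` with `hub` in `rank` gives the Hub-based variant.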
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2004, No. 7, pp. 1344-1347 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (60272051)