
TF-IDF and Rules Based Automatic Extraction of Chinese Keywords (cited by: 35)
Abstract: Keyword extraction is widely used in natural language processing. In Chinese keyword extraction, the word segmentation output and the choice of candidate words strongly affect the final extraction result. For candidate selection, this paper proposes a method that recognizes unknown (out-of-vocabulary) words made up of runs of consecutive single characters, as well as multi-word phrases. The method identifies unknown words whose frequency is greater than 1 reasonably well, and it does not depend on the size or domain of the corpus. Building on traditional TF-IDF, the paper then proposes an improved TF-IDF formula that adds word position and word length features and takes into account multi-category words (words that can carry more than one part of speech), and uses it to extract keywords and key phrases. Comparative experiments demonstrate the influence of candidate words on keyword extraction. Compared with traditional TF-IDF, the improved method raises precision (P), recall (R), and F-measure by about 5%.
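The abstract does not give the improved formula, so the following is only a rough sketch of the general pipeline it describes, not the authors' method. It assumes the input is already word-segmented, joins each run of two or more consecutive single-character tokens into a candidate unknown word (kept only if it occurs more than once), and scores candidates with a standard smoothed TF-IDF multiplied by a position weight (word appears in the title) and a length weight. The function names `merge_single_chars`, `candidate_counts`, `score_keywords` and the parameters `title_weight` and `length_bonus` are hypothetical illustrations, and the part-of-speech handling of multi-category words is omitted.

```python
import math
from collections import Counter


def merge_single_chars(tokens):
    """Join each run of >= 2 consecutive single-character tokens into one string.

    Returns the new token list and the set of strings created by merging
    (the candidate unknown words). Simplified stand-in, not the paper's rules.
    """
    merged, created, run = [], set(), []

    def flush():
        if len(run) >= 2:
            word = "".join(run)
            merged.append(word)
            created.add(word)
        else:
            merged.extend(run)  # a lone single character stays as-is
        run.clear()

    for tok in tokens:
        if len(tok) == 1:
            run.append(tok)
        else:
            flush()
            merged.append(tok)
    flush()
    return merged, created


def candidate_counts(tokens):
    """Count multi-character candidates; merged unknown words must occur > 1 time."""
    merged, created = merge_single_chars(tokens)
    counts = Counter(t for t in merged if len(t) >= 2)
    for word in created:
        if counts[word] < 2:
            del counts[word]
    return counts


def score_keywords(doc_tokens, title_tokens, corpus, title_weight=2.0, length_bonus=0.1):
    """Rank candidates by TF-IDF times hypothetical position and length weights.

    corpus: list of token lists; document frequency uses substring matching on
    each joined document so that merged words are still found.
    """
    counts = candidate_counts(doc_tokens)
    total = sum(counts.values())
    title_text = "".join(title_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, freq in counts.items():
        tf = freq / total
        df = sum(1 for d in corpus if word in "".join(d))
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF, always > 0
        position = title_weight if word in title_text else 1.0  # assumed title boost
        length = 1.0 + length_bonus * (len(word) - 2)  # assumed bonus for longer words
        scores[word] = tf * idf * position * length
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    doc = ["未", "登", "录", "词", "识别", "方法", "未", "登", "录", "词", "抽取", "候", "选", "识别"]
    title = ["未登录词", "识别"]
    corpus = [doc, ["方法", "研究"], ["抽取", "方法"]]
    print([w for w, _ in score_keywords(doc, title, corpus)])
    # ['未登录词', '识别', '抽取', '方法']
```

In this toy input, "未登录词" is rebuilt from its four single characters because it occurs twice, while "候选" occurs only once and is discarded. The multiplicative weighting is one simple way to combine extra features with TF-IDF; the paper's actual feature weights are not stated in the abstract.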
Authors: Niu Ping (牛萍), Huang Degen (黄德根)
Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2016, No. 4, pp. 711-715 (5 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (grants 61173100, 61173101, 61272375).
Keywords: extraction; unknown word recognition; candidate word selection; TF-IDF