Abstract
Existing keyword extraction methods consider word features from either the document set or the single document, and rarely consider how the combined features of a word in both the single document and the document set affect extraction quality. This paper therefore proposes a multi-feature weighted keyword extraction method. The method uses the Word2vec model to extract the semantic relationship features of words in the document set and the importance features of words in the single document, computes each word's comprehensive influence by linear weighting, and uses that influence to improve the probability transition matrix of the TextRank model. Finally, the scores are computed iteratively and the top-ranked words are selected as the keywords of the document. Experimental results show that considering the influence of words from both the single document and the document set effectively improves keyword extraction.
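The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulas, so the co-occurrence window, the use of cosine similarity as the semantic feature, normalized term frequency as the single-document importance feature, and the names `extract_keywords`, `alpha`, `window`, and `damping` are all hypothetical choices. The word vectors are passed in as a plain dictionary, which in practice might come from a Word2vec model trained on the document set.

```python
from collections import Counter

import numpy as np


def extract_keywords(tokens, vectors, top_k=3, window=2,
                     alpha=0.5, damping=0.85, iters=100, tol=1e-8):
    """Rank the words of one tokenized document with a weighted TextRank.

    tokens  : list of words of the single document (already segmented).
    vectors : dict word -> 1-D array, assumed to come from embeddings
              trained on the whole document set.
    alpha   : assumed linear weight between the semantic feature and
              the single-document importance feature.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    # Undirected co-occurrence graph within a sliding window.
    adj = np.zeros((n, n))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            u = tokens[j]
            if u != w:
                adj[index[w], index[u]] = adj[index[u], index[w]] = 1.0

    # Document-set feature (assumption): cosine similarity, clipped to [0, 1].
    mat = np.array([vectors[w] for w in vocab], dtype=float)
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    unit = mat / np.where(norms == 0, 1.0, norms)
    sim = np.clip(unit @ unit.T, 0.0, 1.0)

    # Single-document feature (assumption): normalized term frequency.
    counts = Counter(tokens)
    tf = np.array([counts[w] for w in vocab], dtype=float)
    importance = tf / tf.max()

    # Linear weighting of both features on each edge i -> j.
    weights = adj * (alpha * sim + (1.0 - alpha) * importance[np.newaxis, :])

    # Row-normalize into a probability transition matrix; rows without
    # outgoing weight fall back to a uniform distribution.
    row_sums = weights.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    transition = np.where(row_sums > 0, weights / safe_sums, 1.0 / n)

    # Damped power iteration, as in TextRank/PageRank.
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        new_scores = (1.0 - damping) / n + damping * (transition.T @ scores)
        converged = np.abs(new_scores - scores).sum() < tol
        scores = new_scores
        if converged:
            break

    order = np.argsort(-scores, kind="stable")
    return [(vocab[i], float(scores[i])) for i in order[:top_k]]
```

Calling `extract_keywords(tokens, vectors, top_k=5)` returns the five highest-scoring words with their scores. Setting `alpha` to 0 or 1 reduces the edge weights to only one of the two features, which gives a simple way to compare the combined weighting against either feature alone.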
Authors
余本功
张宏梅
曹雨蒙
YU BenGong; ZHANG HongMei; CAO YuMeng (School of Management, Hefei University of Technology, Hefei 230009, China)
Source
Digital Library Forum (《数字图书馆论坛》), CSSCI-indexed
2020, No. 3, pp. 41-50 (10 pages)
Funding
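Supported by the National Natural Science Foundation of China project "Research on Product R&D Knowledge Integration and Service Mechanisms Based on Manufacturing Big Data" (Grant No. 71671057).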