Abstract
This paper divides a short text into part-of-speech (POS) vectors according to the POS of each term. It then merges the terms within each POS vector to construct a POS space, and is the first to propose defining the weights of the POS spaces dynamically. The weight of a term mapped into a POS space is derived from the term's frequency in the short text and from the WordNet semantic dictionary. Computing the similarity between two short texts thus becomes a joint computation of the similarities between their corresponding POS spaces. The improved algorithm was evaluated on the Microsoft Research Paraphrase Corpus (MSRP), an open benchmark dataset. The results show that, compared with the traditional algorithm, it substantially improves both the accuracy and the stability of text similarity computation.
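The abstract only outlines the pipeline, so the following Python sketch shows one possible reading of it. It is not the authors' method, and it rests on several assumptions, none of which come from the paper:

- POS tags are supplied by the caller, so no tagger is included.
- A tiny hypothetical synonym table, `TOY_SYNONYMS`, stands in for WordNet.
- Each pair of POS spaces is compared with a term-frequency-weighted best-match score.
- The "dynamic" weight of a POS space is taken to be its share of all tokens in the two texts.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: unordered synonym pairs with a fixed score.
TOY_SYNONYMS = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"buy", "purchase"}): 0.9,
}


def term_sim(a: str, b: str) -> float:
    """Semantic similarity of two terms (toy substitute for a WordNet measure)."""
    if a == b:
        return 1.0
    return TOY_SYNONYMS.get(frozenset({a, b}), 0.0)


def build_pos_spaces(tagged):
    """Group (term, pos) pairs by POS; merging repeated terms yields term frequencies."""
    spaces = {}
    for term, pos in tagged:
        spaces.setdefault(pos, Counter())[term.lower()] += 1
    return spaces


def directed_space_sim(a: Counter, b: Counter) -> float:
    """TF-weighted average of each term's best semantic match in the other space."""
    total = sum(a.values())
    return sum(tf * max(term_sim(t, u) for u in b) for t, tf in a.items()) / total


def space_sim(a: Counter, b: Counter) -> float:
    """Symmetric similarity of two POS spaces; 0 if either space is empty."""
    if not a or not b:
        return 0.0
    return (directed_space_sim(a, b) + directed_space_sim(b, a)) / 2


def text_similarity(tagged1, tagged2) -> float:
    """Weighted sum of per-POS-space similarities (assumed weighting scheme)."""
    s1, s2 = build_pos_spaces(tagged1), build_pos_spaces(tagged2)
    pos_tags = set(s1) | set(s2)
    # Assumed dynamic weight: the share of all tokens (both texts) in this POS space.
    sizes = {
        p: sum(s1.get(p, Counter()).values()) + sum(s2.get(p, Counter()).values())
        for p in pos_tags
    }
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    return sum(
        sizes[p] / total * space_sim(s1.get(p, Counter()), s2.get(p, Counter()))
        for p in pos_tags
    )


if __name__ == "__main__":
    t1 = [("john", "NOUN"), ("buy", "VERB"), ("car", "NOUN")]
    t2 = [("john", "NOUN"), ("purchase", "VERB"), ("automobile", "NOUN")]
    print(round(text_similarity(t1, t2), 3))  # prints 0.933
```

In this example the noun space scores 0.95 with weight 4/6, and the verb space scores 0.9 with weight 2/6. Because the weights depend on how many tokens each POS contributes, they change from one text pair to the next. That is the sense in which the sketch treats POS-space weights as "dynamic".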
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2015, No. 4, pp. 996-999 (4 pages)
Funding
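National Natural Science Foundation of China (61173184)
Science and Technology Research Program of Chongqing Municipal Education Commission (KJ100821)
Graduate Innovation Fund of Chongqing University of Technology (YCX2012317)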