Research on an English Sentence Similarity Algorithm Based on Variable-Coefficient Part-of-Speech Space Weight Definition (cited by: 1)
Abstract: The terms of a short text are split by part of speech to build part-of-speech vectors, and the terms within each vector are merged to form a part-of-speech space. For the first time, the weights of the part-of-speech spaces are defined dynamically. The weight with which a term maps into a part-of-speech space is obtained from the term's frequency in the short text and from the WordNet semantic dictionary, so that the similarity between two short texts becomes a joint computation over the similarities of their part-of-speech spaces. Experiments on the Microsoft Research Paraphrase Corpus (MSRP) show that the improved algorithm is considerably more accurate and more stable than the traditional algorithm.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2015, No. 4, pp. 996-999 (4 pages)
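Funding: National Natural Science Foundation of China (61173184); Science and Technology Program of the Chongqing Municipal Education Commission (KJ100821); Chongqing University of Technology Graduate Innovation Fund (YCX2012317)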
Keywords: WordNet semantic dictionary; term semantic space mapping; variable part-of-speech space weight; term frequency; sentence similarity algorithm
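
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the weighting rule (each part-of-speech space weighted by its share of the terms in the sentence pair), the cosine measure, and the names `pos_spaces`, `cosine`, `sentence_similarity` and `TOY_SYNONYMS` are all assumptions made for illustration. The tiny synonym table is a hypothetical stand-in for a WordNet lookup, and the input is assumed to be already tagged with parts of speech.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical stand-in for a WordNet lookup: maps a term to a canonical synonym.
TOY_SYNONYMS = {"automobile": "car", "purchased": "bought"}


def pos_spaces(tagged):
    """Group (term, pos) pairs into one term-frequency Counter per part of speech."""
    spaces = defaultdict(Counter)
    for term, pos in tagged:
        term = term.lower()
        spaces[pos][TOY_SYNONYMS.get(term, term)] += 1
    return spaces


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sentence_similarity(tagged_a, tagged_b):
    """Weighted sum of per-POS-space cosine similarities.

    Assumption: each space's weight is its share of all terms in both
    sentences, so the weights change with the sentence pair and sum to 1.
    """
    sa, sb = pos_spaces(tagged_a), pos_spaces(tagged_b)
    total = sum(sum(c.values()) for s in (sa, sb) for c in s.values())
    if total == 0:
        return 0.0
    score = 0.0
    for pos in sa.keys() | sb.keys():
        weight = (sum(sa[pos].values()) + sum(sb[pos].values())) / total
        score += weight * cosine(sa[pos], sb[pos])
    return score


a = [("He", "PRON"), ("bought", "VERB"), ("a", "DET"), ("car", "NOUN")]
b = [("He", "PRON"), ("purchased", "VERB"), ("an", "DET"), ("automobile", "NOUN")]
print(sentence_similarity(a, b))  # 0.75: three of four equally weighted spaces match
```

In a fuller version, the tagged input might come from a part-of-speech tagger and the synonym table from a real WordNet similarity measure; both substitutions are left out here so the sketch stays self-contained.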