Abstract
Word sense disambiguation (WSD) is one of the key research problems in natural language processing. Supervised methods are currently an effective way to address it, but without large-scale training corpora they cannot achieve satisfactory results. This paper proposes a WSD optimization model based on a statistical language model: the language model is used to optimize a traditional supervised disambiguation model, so that the two jointly infer the sense of an ambiguous word, drawing on the knowledge contained in both the training data and the language model. The model can significantly improve WSD performance when training data is insufficient. Experiments on real data show that the optimized model outperforms the best supervised WSD system that participated in the SemEval-2007 Task #5 evaluation.
Source
《中文信息学报》
CSCD
Peking University Core Journals
2014, Issue 1, pp. 19-25 (7 pages)
Journal of Chinese Information Processing
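Funding
National Natural Science Foundation of China (61132009)
Beijing Institute of Technology Science and Technology Innovation Program, Major Project Cultivation Special Fund
National Defense Basic Research Fund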
Keywords
data sparseness
model optimization
supervised models
language models
parameter estimation