Abstract
Word sense disambiguation is a fundamental task in natural language processing. Starting from word vector representations, a vector representation is computed for the context window of each polysemous word. Using a compiled list of polysemous words and their numbers of senses, the context window representations of the polysemous words in the text corpus are clustered with the K-means algorithm, and each occurrence of a polysemous word in the original corpus is tagged with its cluster label. A vector representation for each sense of each polysemous word is then trained on the tagged corpus using a neural network language model. Finally, a word sense disambiguation method based on these polysemous word vector representations is presented for polysemous words in sentences. Experimental results show that the method is effective and feasible.
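The pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the 2-dimensional toy vectors in `VECTORS`, the window size, the helper names (`context_vector`, `kmeans`, `disambiguate`), and the deterministic farthest-point initialisation. The sketch also skips the retraining step and uses the K-means centroids as stand-ins for the trained sense vectors.

```python
import math

# Hypothetical 2-d word vectors; a real system would load trained embeddings.
VECTORS = {
    "river": (1.0, 0.0), "water": (0.9, 0.1), "shore": (0.8, 0.2),
    "money": (0.0, 1.0), "loan": (0.1, 0.9), "deposit": (0.2, 0.8),
}

def context_vector(tokens, idx, window=2):
    """Average the vectors of known words within `window` of position idx."""
    lo, hi = max(0, idx - window), min(len(tokens), idx + window + 1)
    vecs = [VECTORS[t] for i, t in enumerate(tokens[lo:hi], lo)
            if i != idx and t in VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))

def kmeans(points, k, iters=20):
    """Plain K-means; farthest-point initialisation keeps the result deterministic."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]

def disambiguate(tokens, idx, centroids):
    """Pick the sense whose centroid is closest to the context window vector."""
    return nearest(context_vector(tokens, idx), centroids)

# Toy corpus: "bank" is the polysemous word, assumed to have 2 senses.
corpus = [
    ["river", "bank", "water"],
    ["loan", "bank", "money"],
    ["shore", "bank", "river"],
    ["deposit", "bank", "loan"],
]
target, n_senses = "bank", 2

contexts = [context_vector(s, s.index(target)) for s in corpus]
centroids, labels = kmeans(contexts, n_senses)

# Tag each occurrence with its cluster label, e.g. "bank_0" or "bank_1".
tagged = [[f"{t}_{lab}" if t == target else t for t in s]
          for s, lab in zip(corpus, labels)]

print(labels)                                                # [0, 1, 0, 1]
print(disambiguate(["money", "bank", "deposit"], 1, centroids))  # 1
```

In a fuller implementation, the tagged corpus would be fed back into an embedding trainer so that tokens such as `bank_0` and `bank_1` each receive their own vector, and disambiguation would compare the context vector against those trained sense vectors.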
Authors
LI Guojia (李国佳)
ZHAO Yingdi (赵莹地)
GUO Hongqi (郭鸿奇)
LI Guojia; ZHAO Yingdi; GUO Hongqi (School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450045, China; School of Electric Power, North China University of Water Resources and Electric Power, Zhengzhou 450045, China)
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2018, No. 4, pp. 52-56 (5 pages)
Funding
2017 Innovation and Entrepreneurship Program of North China University of Water Resources and Electric Power (2017XB136)