
Uyghur Part-of-Speech Tagging Method Based on a Hybrid Model

Cited by: 6
Abstract: Uyghur part-of-speech (POS) tagging is one of the key tasks in lexical analysis, and the accuracy of its output directly affects downstream natural language processing. The main difficulty is correctly determining the POS of multi-category (ambiguous) words and out-of-vocabulary words. This paper proposes a hybrid BiLSTM-CNN-CRF model for Uyghur POS tagging. The model has a three-layer structure. First, a CNN is trained to produce character-level morphological feature vectors for Uyghur words. Second, the skip-gram method is trained on a large-scale corpus to generate low-dimensional, dense, real-valued word vectors that carry semantic information. Third, the character-level feature vectors and word vectors are concatenated, and the combined vectors are used as input to train a BiLSTM-CRF deep neural network, yielding a BiLSTM-CNN-CRF hybrid neural network model suited to Uyghur POS tagging. Experimental results show that the new hybrid model achieves the best tagging results on the data set provided by the laboratory, with an F1 score of 97.01%, and clearly improves the tagging of Uyghur multi-category words and out-of-vocabulary words.
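The data flow described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the BiLSTM layer is omitted, all weights are supplied by hand rather than trained, and every name here (`char_cnn_features`, `build_input_vector`, `viterbi`, the toy embeddings) is hypothetical. It only shows two ideas: a character-level convolution with max-over-time pooling whose output is concatenated with a word vector, and CRF-style Viterbi decoding over per-token tag scores.

```python
def char_cnn_features(word, char_emb, filters, width=3):
    """Character-level 1-D convolution followed by max-over-time pooling.

    char_emb maps a character to its embedding (list of floats);
    each filter has shape [width][dim]. Returns one value per filter.
    """
    dim = len(next(iter(char_emb.values())))
    zero = [0.0] * dim
    pad = [zero] * (width - 1)
    # Unknown characters fall back to a zero embedding (an assumption).
    chars = pad + [char_emb.get(c, zero) for c in word] + pad
    features = []
    for filt in filters:
        best = float("-inf")
        for start in range(len(chars) - width + 1):
            window = chars[start:start + width]
            score = sum(w * x
                        for row_w, row_x in zip(filt, window)
                        for w, x in zip(row_w, row_x))
            best = max(best, score)
        features.append(best)
    return features


def build_input_vector(word, word_emb, char_emb, filters):
    """Concatenate the word vector with the character-level feature vector."""
    word_dim = len(next(iter(word_emb.values())))
    # Out-of-vocabulary words get a zero word vector here (an assumption);
    # the character-level features still carry morphological information.
    word_vec = word_emb.get(word, [0.0] * word_dim)
    return word_vec + char_cnn_features(word, char_emb, filters)


def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions[t][k]: score of tag k at position t (e.g. from a BiLSTM);
    transitions[i][j]: score of moving from tag i to tag j (CRF layer).
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            prev = max(range(n_tags),
                       key=lambda i: score[i] + transitions[i][j])
            pointers.append(prev)
            new_score.append(score[prev] + transitions[prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    path = [max(range(n_tags), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Toy, hand-written values purely for illustration.
    char_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
    filters = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]]
    word_emb = {"ab": [0.5, -0.5]}
    print(build_input_vector("ab", word_emb, char_emb, filters))
    print(viterbi([[1.0, 0.0], [0.0, 0.5]], [[0.0, -1.0], [0.0, 0.0]]))
```

In a full model of the kind the abstract describes, the concatenated vectors would be fed to a BiLSTM whose outputs serve as the emission scores, and the filters, embeddings, and transition scores would all be learned from data.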
Authors: MUHETAER Palidan (帕丽旦·木合塔尔); SILAMU Wushouer (吾守尔·斯拉木); Maimaitayifu (买买提阿依甫) (College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China)
Source: Computer Simulation (《计算机仿真》), Peking University Core Journal, 2019, No. 1, pp. 268-273 (6 pages)
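Funding: National "973" Key Basic Research Program of China (2014CB340506); National Natural Science Foundation of China (61363063); Open Project of the Xinjiang University Multilingual Key Laboratory (XJDX0905-2013-01)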
Keywords: Recurrent neural network (RNN); Convolutional neural network (CNN); Conditional random field (CRF); Uyghur; Part-of-speech tagging
