Abstract
This paper describes a phrase-based statistical machine translation system for Chinese-Uyghur, built with the open-source Moses toolkit on a parallel corpus derived from telephone recordings. Because the parallel corpus is small and Uyghur is highly inflected, the Uyghur words in the word-level corpus are split into morphemes to produce a second, morpheme-level parallel corpus. Experiments are carried out separately at the word level and at the morpheme level. They show that the morpheme-level approach reduces the probability of out-of-vocabulary (OOV) words and improves translation quality.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals (北大核心)
2011, No. 9, pp. 16-18, 21 (4 pages)
Funding
High-Technology Fund Project of the Western Action Plan, Chinese Academy of Sciences (KGCX2-YW-507)
Keywords
Chinese-Uyghur
Uyghur-Chinese
morpheme-level
preprocessing
postprocessing