本文描述了一个基于分层语块分析的统计翻译模型。该模型在形式上不仅符合同步上下文无关文法,而且融合了基于条件随机场的英文语块分析知识,因此基于分层语块分析的统计翻译模型做到了将句法翻译模型和短语翻译模型有效地结合。该系统...本文描述了一个基于分层语块分析的统计翻译模型。该模型在形式上不仅符合同步上下文无关文法,而且融合了基于条件随机场的英文语块分析知识,因此基于分层语块分析的统计翻译模型做到了将句法翻译模型和短语翻译模型有效地结合。该系统的解码算法改进了线图分析的CKY算法,融入了线性的N-gram语言模型。目前,本文主要针对中文-英文的口语翻译进行了一系列实验,并以国际口语评测IWSLT(International Workshopon Spoken Language Translation)为标准,在2005年的评测测试集上,BLEU和NIST得分均比统计短语翻译系统有所提高。展开更多
A novel model based on structure alignments is proposed for statistical machine translation in this paper. Meta-structure and sequence of meta-structure for a parse tree are defined. During the translation process, a ...A novel model based on structure alignments is proposed for statistical machine translation in this paper. Meta-structure and sequence of meta-structure for a parse tree are defined. During the translation process, a parse tree is decomposed to deal with the structure divergence and the alignments can be constructed at different levels of recombination of meta-structure (RM). This method can perform the structure mapping across the sub-tree structure between languages. As a result, we get not only the translation for the target language, but sequence of meta-stmctu .re of its parse tree at the same time. Experiments show that the model in the framework of log-linear model has better generative ability and significantly outperforms Pharaoh, a phrase-based system.展开更多
Lexicalized reordering models are very important components of phrasebased translation systems.By examining the reordering relationships between adjacent phrases,conventional methods learn these models from the word a...Lexicalized reordering models are very important components of phrasebased translation systems.By examining the reordering relationships between adjacent phrases,conventional methods learn these models from the word aligned bilingual corpus,while ignoring the effect of the number of adjacent bilingual phrases.In this paper,we propose a method to take the number of adjacent phrases into account for better estimation of reordering models.Instead of just checking whether there is one phrase adjacent to a given phrase,our method firstly uses a compact structure named reordering graph to represent all phrase segmentations of a parallel sentence,then the effect of the adjacent phrase number can be quantified in a forward-backward fashion,and finally incorporated into the estimation of reordering models.Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.展开更多
文摘本文描述了一个基于分层语块分析的统计翻译模型。该模型在形式上不仅符合同步上下文无关文法,而且融合了基于条件随机场的英文语块分析知识,因此基于分层语块分析的统计翻译模型做到了将句法翻译模型和短语翻译模型有效地结合。该系统的解码算法改进了线图分析的CKY算法,融入了线性的N-gram语言模型。目前,本文主要针对中文-英文的口语翻译进行了一系列实验,并以国际口语评测IWSLT(International Workshopon Spoken Language Translation)为标准,在2005年的评测测试集上,BLEU和NIST得分均比统计短语翻译系统有所提高。
基金the National High Technology Research and Development Progran of China(No.200606010108.2006AA01Z150)
文摘A novel model based on structure alignments is proposed for statistical machine translation in this paper. Meta-structure and sequence of meta-structure for a parse tree are defined. During the translation process, a parse tree is decomposed to deal with the structure divergence and the alignments can be constructed at different levels of recombination of meta-structure (RM). This method can perform the structure mapping across the sub-tree structure between languages. As a result, we get not only the translation for the target language, but sequence of meta-stmctu .re of its parse tree at the same time. Experiments show that the model in the framework of log-linear model has better generative ability and significantly outperforms Pharaoh, a phrase-based system.
基金supported by the National Natural Science Foundation of China(No.61303082) the Research Fund for the Doctoral Program of Higher Education of China(No.20120121120046)
文摘Lexicalized reordering models are very important components of phrasebased translation systems.By examining the reordering relationships between adjacent phrases,conventional methods learn these models from the word aligned bilingual corpus,while ignoring the effect of the number of adjacent bilingual phrases.In this paper,we propose a method to take the number of adjacent phrases into account for better estimation of reordering models.Instead of just checking whether there is one phrase adjacent to a given phrase,our method firstly uses a compact structure named reordering graph to represent all phrase segmentations of a parallel sentence,then the effect of the adjacent phrase number can be quantified in a forward-backward fashion,and finally incorporated into the estimation of reordering models.Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.