Abstract
Grammatical error correction (GEC) is an important task in natural language processing that has attracted wide attention from the research community in recent years. The task aims to automatically detect and correct grammatical, spelling, and word-order errors in text. This paper treats GEC as a translation task, in which text containing errors is translated into correct text. It adopts the Transformer model with multi-head attention as the correction model and proposes a dynamic residual structure that dynamically combines the outputs of different neural blocks to strengthen the model's ability to capture semantic information. Because training corpora are currently scarce, the paper also proposes a data augmentation method that corrupts a monolingual corpus to generate additional parallel correction data, further improving model performance. Experimental results show that both the dynamic-residual model enhancement and the corruption-based data augmentation substantially improve correction performance, achieving the best performance on the NLPCC 2018 Chinese grammatical error correction shared task data.
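The abstract does not say which corruption operations are applied to the monolingual corpus, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes character-level tokens and four illustrative noise operations (deletion, substitution, insertion, adjacent swap); the function names `corrupt` and `make_parallel` and all probabilities are hypothetical.

```python
import random


def corrupt(tokens, rng, p_delete=0.1, p_substitute=0.1,
            p_insert=0.1, p_swap=0.1, vocab=None):
    """Return a noisy copy of `tokens` (assumed noise model, not the paper's)."""
    # Without an external vocabulary, draw replacement tokens from the sentence itself.
    vocab = vocab or tokens
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            continue  # drop the token (simulates a missing word)
        elif r < p_delete + p_substitute:
            noisy.append(rng.choice(vocab))  # replace with a random token
        elif r < p_delete + p_substitute + p_insert:
            noisy.append(tok)
            noisy.append(rng.choice(vocab))  # add a redundant token
        else:
            noisy.append(tok)  # keep the token unchanged
    # Swap some adjacent tokens to simulate word-order errors.
    i = 0
    while i < len(noisy) - 1:
        if rng.random() < p_swap:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
            i += 2
        else:
            i += 1
    return noisy


def make_parallel(sentences, seed=0, **noise_kwargs):
    """Build (corrupted, clean) training pairs from clean monolingual sentences."""
    rng = random.Random(seed)
    return [(corrupt(list(s), rng, **noise_kwargs), list(s)) for s in sentences]


if __name__ == "__main__":
    for noisy, clean in make_parallel(["我喜欢学习中文", "今天天气很好"], seed=1):
        print("".join(noisy), "->", "".join(clean))
```

Each pair places the corrupted sentence on the source side and the original sentence on the target side, which matches the translation framing described in the abstract.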
Authors
WANG Chencheng (王辰成)
YANG Liner (杨麟儿)
WANG Yingying (王莹莹)
DU Yongping (杜永萍)
YANG Erhong (杨尔弘)
Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Advanced Innovation Center for Language Resources, Beijing Language and Culture University, Beijing 100083, China; School of Information Science, Beijing Language and Culture University, Beijing 100083, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2020, No. 6, pp. 106-114 (9 pages)
Funding
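Project of the Beijing Advanced Innovation Center for Language Resources, Beijing Language and Culture University (TYZ19005)
Informatization Project of the State Language Commission (ZDI135-105, YB135-89)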
Keywords
grammatical error correction (语法纠错)
multi-head attention (多头注意力)
dynamic residual structure (动态残差结构)
data augmentation (数据增强)