Abstract
Automatic summarization compresses a long article into a shorter text that captures the central content of the original. High redundancy across multiple documents, together with the limited display space of electronic devices, poses challenges for summarization. This paper proposes a coarse-grained sentence ranking method that incorporates graph-convolution features. First, the sentence-similarity matrix is treated as a topological relation graph, and graph convolution is applied to it to obtain graph-convolution features. Then, a ranking model fuses these features with mainstream extractive multi-document summarization techniques to rank sentences by importance, and the top four sentences are selected as the summary. Finally, a short-summary generation model based on the Seq2seq framework is proposed: 1) a convolutional neural network (CNN) is adopted in the encoder; 2) an attention-based pointer mechanism is introduced, into which a topic vector is incorporated. Experimental results show that, in this setting, a CNN-based encoder parallelizes better than a recurrent neural network (RNN), significantly improving efficiency while achieving essentially the same effectiveness. Moreover, compared with traditional extraction-and-compression models, the proposed model achieves significant improvements in both ROUGE scores and readability (informativeness and fluency).
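The first stage described above (similarity matrix as a graph, graph convolution for sentence features, top-4 selection) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a symmetrically normalized graph-convolution layer with identity node features and random, untrained weights; the function names `gcn_features` and `top_k_sentences` are hypothetical.

```python
import numpy as np

def gcn_features(S, dim=16, seed=0):
    """One graph-convolution layer over a sentence-similarity graph.

    S: (n, n) sentence-similarity matrix, treated as the weighted
    adjacency matrix of a topological relation graph.
    Returns an (n, dim) matrix of graph-convolution features.
    Weights are random and untrained -- for illustration only.
    """
    n = S.shape[0]
    A = S + np.eye(n)                         # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n, dim))  # identity node features, so X @ W = W
    return np.maximum(A_hat @ W, 0)           # propagate neighbors, then ReLU

def top_k_sentences(scores, k=4):
    """Indices of the k highest-scoring sentences, in document order."""
    idx = np.argsort(scores)[::-1][:k]
    return sorted(idx.tolist())

# Toy example: 5 sentences, cosine-like similarity scores.
S = np.array([[1.0, 0.8, 0.1, 0.3, 0.2],
              [0.8, 1.0, 0.2, 0.4, 0.1],
              [0.1, 0.2, 1.0, 0.5, 0.6],
              [0.3, 0.4, 0.5, 1.0, 0.7],
              [0.2, 0.1, 0.6, 0.7, 1.0]])
feats = gcn_features(S)                       # (5, 16) feature matrix
fused = np.array([0.1, 0.9, 0.5, 0.7, 0.3])   # stand-in for fused importance scores
summary_ids = top_k_sentences(fused, k=4)     # -> [1, 2, 3, 4]
```

In the paper, the graph-convolution features are one input among several to a learned ranking model; the fused scores above stand in for that model's output.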
Authors
ZHANG Suiyuan
XUE Yuanhai
YU Xiaoming
LIU Yue
CHENG Xueqi
ZHANG Suiyuan; XUE Yuanhai; YU Xiaoming; LIU Yue; CHENG Xueqi (Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences, Beijing 100190, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China)
Source
《广西师范大学学报(自然科学版)》
CAS
PKU Core (Beida Core)
2019, No. 2, pp. 60-74 (15 pages)
Journal of Guangxi Normal University (Natural Science Edition)
Funding
National Key Research and Development Program of China (2017YFB0803302)