Abstract
Recent data-to-text generation methods widely adopt encoder-decoder architectures or their variants, but these methods cannot distinguish the importance of information in different parts of the data, and therefore perform poorly at selecting and ordering content. To address these problems, this paper proposes a data-to-text generation method based on hierarchical structural representation, consisting of a planning phase and a generation phase. The planning phase uses multi-level attention at the entity level and the record level to strengthen the expressive power of the semantic space, and outputs a plan that serves as a high-level representation of the important content. The plan is then fed to the generator in the generation phase to produce the final text. Extensive experiments on two data-to-text generation datasets show that, compared with existing data-to-text generation methods, the proposed method generates text that describes the data more accurately and is of higher quality. The method offers some guidance for research on data-to-text generation.
Authors
龚永罡
郭怡星
廉小亲
马虢春
王希
刘宏宇
Gong Yonggang; Guo Yixing; Lian Xiaoqin; Ma Guochun; Wang Xi; Liu Hongyu (School of Artificial Intelligence, Beijing Technology & Business University, Beijing 100048, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2023, Issue 8, pp. 2399-2403 (5 pages)
Funding
Support Project of High-level Teachers in Beijing Municipal Universities in the Period of the 13th Five-Year Plan (CIT&TCD201904037).
Keywords
data-to-text generation
multi-level attention
hierarchical structure representation
encoder-decoder architecture