Abstract
This paper addresses the problem of improving coherence among the sentences generated in paragraph image captioning, and proposes a paragraph captioning algorithm based on a fully convolutional architecture. A region detector based on a convolutional network first extracts the image representation. Drawing on the hierarchical structure of paragraphs from a linguistic perspective, a hierarchical deep convolutional decoder then decodes this representation and automatically generates a paragraph-level text description. A gating mechanism is embedded in the convolutional decoder network to improve the memory capacity of the model. Experimental results show that, compared with traditional paragraph captioning methods such as those based on recurrent neural networks, the proposed algorithm generates more coherent paragraph descriptions for images and achieves better results on the evaluation metrics.
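The abstract mentions a gating mechanism embedded in a convolutional decoder but does not specify its form. As a rough illustration only, the sketch below assumes a GLU-style gate (a linear branch multiplied by a sigmoid branch) on top of a causal 1D convolution, which is one common way to gate convolutional text decoders. The names causal_conv1d and gated_conv_layer, the weight layout, and the choice of gate are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def causal_conv1d(seq, weights, bias):
    """Causal 1D convolution over a sequence of vectors.

    seq:     list of T input vectors, each of length d_in
    weights: list of k matrices (d_out x d_in); weights[j] is applied to the
             input at position t - (k - 1) + j
    bias:    list of length d_out
    The sequence is left-padded with zeros, so the output at step t depends
    only on inputs at steps <= t (no peeking at future words).
    """
    k = len(weights)
    d_in = len(seq[0])
    padded = [[0.0] * d_in for _ in range(k - 1)] + seq
    out = []
    for t in range(len(seq)):
        window = padded[t:t + k]
        y = list(bias)
        for w, x in zip(weights, window):
            for o, row in enumerate(w):
                y[o] += sum(r * xi for r, xi in zip(row, x))
        out.append(y)
    return out


def gated_conv_layer(seq, w_a, b_a, w_g, b_g):
    """Assumed GLU-style gating: output = A * sigmoid(G).

    A and G are two causal convolutions of the same input; the sigmoid
    branch decides, per position and channel, how much of A passes through.
    """
    a = causal_conv1d(seq, w_a, b_a)
    g = causal_conv1d(seq, w_g, b_g)
    return [[ai * sigmoid(gi) for ai, gi in zip(at, gt)]
            for at, gt in zip(a, g)]
```

In a hierarchical decoder of the kind the abstract describes, layers like this could plausibly be stacked at both the sentence level and the word level, but that arrangement is likewise an assumption here.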
Authors
李睿凡
梁昊雨
冯方向
张光卫
王小捷
LI Rui-fan; LIANG Hao-yu; FENG Fang-xiang; ZHANG Guang-wei; WANG Xiao-jie (School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China; Engineering Research Center of Information Networks, Ministry of Education, Beijing 100876, China; Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source
《北京邮电大学学报》
EI
CAS
CSCD
Peking University Core Journal
2019, No. 6, pp. 155-161 (7 pages)
Journal of Beijing University of Posts and Telecommunications
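Funding
National Key Research and Development Program of China (2019YFF0303302)
National Natural Science Foundation of China (61906018)
Science and Technology Project of State Grid Corporation of China Headquarters (5200-201918255A-0-0-00)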
Keywords
convolutional networks
deep learning
image captioning
coherence