Abstract
With the rapid development of smart devices and multimedia social networking platforms in recent years, the number of platform users has grown enormously, and multimedia data have grown massively alongside them. Today's mainstream social platforms therefore store large volumes of multi-modal data such as text, video, and images. Effective information retrieval and analysis can greatly improve the utilization of these multi-modal data and the user experience. However, a significant semantic gap exists between different modalities, which severely restricts the analysis of massive multi-modal data and the mining of useful information. How to achieve accurate cross-modal retrieval over massive multi-modal data has therefore become an important challenge in both academia and industry. Many approaches have been proposed for cross-modal retrieval, but the innate semantic gap between modalities leads many of them to focus on the consistency of cross-modal representations while ignoring the completeness of the information those representations carry; this lack of completeness in turn reduces retrieval accuracy. In this paper, we propose a cross-media retrieval method based on cycle generative adversarial networks, which designs effective loss functions on top of the generative adversarial network framework to guarantee both the consistency and the completeness of cross-modal data representations. First, we design generative models that translate between text and image features and, following adversarial learning theory, enforce semantically consistent representations of cross-modal data in each modality's independent feature space, thereby preserving the completeness of the information each modality carries; this consistency later guides feature learning in the co-embedding space. Second, we propose a cycle cross-entropy loss to further narrow the cross-modal semantic gap; it strengthens the feature-level consistency constraint in the independent spaces, assists the adversarial networks in drawing the cross-modal representations closer together, and helps guarantee the completeness of the final features in the co-embedding space. Finally, we construct a co-embedding feature space for the multi-modal heterogeneous data to guide consistent representation learning, which effectively bridges the semantic gap between modalities and enables more accurate cross-media retrieval. We evaluate the method on the popular Flickr30k and MSCOCO datasets against classic cross-modal retrieval baselines, organizing the experiments and analyses around the completeness and consistency of cross-modal representations, and we provide a qualitative analysis of the method's strengths and weaknesses. The comparison experiments demonstrate the superiority and effectiveness of the proposed method.
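The abstract describes three coupled training objectives: adversarial alignment in each modality's independent feature space, a cycle cross-entropy loss, and a co-embedding space constraint. The sketch below is a minimal, hypothetical PyTorch rendering of the first two, under assumptions of our own: the generator and discriminator architectures (G_t2i, G_i2t, D_img, D_txt), the feature dimension, and the softmax-based reading of the "cycle cross-entropy" term are all illustrative, not the paper's published implementation.

```python
# Minimal PyTorch sketch of the training losses described in the abstract.
# All names (G_t2i, G_i2t, D_img, D_txt), shapes, and the exact form of the
# cycle cross-entropy are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 512  # assumed size of both independent feature spaces

# Cross-modal generators: translate features from one modality's
# independent space into the other's (text -> image space and back).
G_t2i = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
G_i2t = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

# Discriminators: judge whether a feature is native to a space or generated.
D_img = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))
D_txt = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))

def d_loss(d_real, d_fake):
    # Standard GAN discriminator loss in binary cross-entropy form.
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

def cycle_ce(orig, rec):
    # One plausible reading of the "cycle cross-entropy": soften the original
    # and the reconstructed feature into distributions, then penalize the
    # cross-entropy between them (an assumption on our part).
    p = F.softmax(orig, dim=-1)
    return -(p * F.log_softmax(rec, dim=-1)).sum(dim=-1).mean()

def training_losses(img_feat, txt_feat):
    fake_img = G_t2i(txt_feat)   # text feature rendered in image space
    fake_txt = G_i2t(img_feat)   # image feature rendered in text space
    rec_txt = G_i2t(fake_img)    # ...and cycled back to text space
    rec_img = G_t2i(fake_txt)    # ...and cycled back to image space

    l_cyc = cycle_ce(txt_feat, rec_txt) + cycle_ce(img_feat, rec_img)
    # Discriminator side only; the generators' adversarial term is analogous
    # with the real/fake labels flipped.
    l_adv = (d_loss(D_img(img_feat), D_img(fake_img.detach()))
             + d_loss(D_txt(txt_feat), D_txt(fake_txt.detach())))
    return l_adv, l_cyc
```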
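Likewise, a small hypothetical sketch of how retrieval in the learned co-embedding space could proceed: projection heads (phi_txt and phi_img, our naming) map both modalities into the shared space, and candidates are ranked by cosine similarity. The paper specifies the idea, not this interface.

```python
# Hypothetical retrieval in the co-embedding space: rank gallery images
# for a text query by cosine similarity. phi_txt / phi_img are assumed
# projection heads trained so matched image-text pairs lie close together.
import torch
import torch.nn.functional as F

def retrieve(query_txt, gallery_imgs, phi_txt, phi_img, k=5):
    q = F.normalize(phi_txt(query_txt), dim=-1)     # (1, d) query embedding
    g = F.normalize(phi_img(gallery_imgs), dim=-1)  # (N, d) gallery embeddings
    sims = (g @ q.t()).squeeze(1)                   # (N,) cosine similarities
    return sims.topk(k).indices                     # indices of top-k images
```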
Authors
聂为之
王岩
杨嵩
刘安安
张勇东
NIE Wei-Zhi; WANG Yan; YANG Song; LIU An-An; ZHANG Yong-Dong (School of Electrical and Information Engineering, Tianjin University, Tianjin 300072; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026)
Source
《计算机学报》
EI
CAS
CSCD
Peking University Core Journal
2022, No. 7, pp. 1529-1538 (10 pages)
Chinese Journal of Computers
Funding
National Natural Science Foundation of China (U21B2024, 61525206, 61872267)
Tianjin Major Program of New Generation Artificial Intelligence (19ZXZNGX00110, 18ZXZNGX00150)
Tianjin Youth Science Foundation (19JCQNJC00500).
Keywords
cross-modal retrieval
image-text retrieval
generative adversarial network