
基于对比学习的跨模态实体链接模型

A Cross-Modal Entity Linking Model Based on Contrastive Learning
Abstract: Image-text cross-modal entity linking is an extension of the traditional entity linking task. The inputs are images containing entities, which are to be linked to textual entities in a knowledge base. Existing models usually adopt a dual-encoder architecture that encodes entities of the visual and textual modalities into separate vectors, computes their similarity with a dot product, and links each image entity to the most similar text entity. Training usually adopts a cross-modal contrastive learning task based on the InfoNCE loss: for an entity's vector in one modality, the task pulls closer that entity's own vector in the other modality and pushes away the other-modality vectors of all other entities. However, this approach overlooks the difference in representation difficulty between the two modalities: similar entities in the image modality are usually harder to distinguish than similar entities in the text modality, so image entities with similar appearance are easily linked incorrectly. To solve this problem, we propose two new contrastive learning tasks that enhance the discriminative power of the vectors. The first is self-contrastive learning, which improves the distinction between visual vectors. The second is hard-negative contrastive learning, which helps a textual vector distinguish several similar visual vectors. We conduct experiments on the open-source dataset WikiPerson. With a knowledge base of 120,000 entities, our model improves accuracy by 4.5 percentage points over the best baseline trained with the InfoNCE loss.
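The cross-modal InfoNCE objective described above can be sketched as follows. This is a minimal illustration with in-batch negatives, normalized vectors, and a hypothetical temperature value; it is not the authors' implementation, which further adds self-contrastive and hard-negative contrastive tasks:

```python
import numpy as np

def info_nce_loss(image_vecs, text_vecs, temperature=0.07):
    """Cross-modal InfoNCE with in-batch negatives.

    Row i of image_vecs and row i of text_vecs describe the same
    entity (the positive pair); every other row is a negative.
    """
    # L2-normalize so the dot product is a cosine similarity
    img = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    txt = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # softmax over text candidates for each image; diagonal = positives
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(np.diag(probs)))

# Toy batch: 3 entities with orthogonal 3-dim vectors.
vecs = np.eye(3)
loss_aligned = info_nce_loss(vecs, vecs)          # positives on the diagonal
loss_shuffled = info_nce_loss(vecs, vecs[[1, 2, 0]])  # positives misplaced
```

When image and text vectors of the same entity coincide, the loss is near zero; when the pairing is shuffled, the diagonal no longer holds the positives and the loss is large, which is exactly the pull-closer/push-away behavior described in the abstract.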
Authors: Wang Yuanzheng; Sun Wenxiang; Fan Yixing; Liao Huaming; Guo Jiafeng (CAS Key Laboratory of Network Data Science & Technology (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049)
Source: Journal of Computer Research and Development (Peking University Core Journal), 2025, No. 3, pp. 662-671 (10 pages)
Funding: National Natural Science Foundation of China (62372431); National Key R&D Program of China (2021QY1701, 2023YFA1011602); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021100); Innovation Project of the Institute of Computing Technology, Chinese Academy of Sciences (E261090); National Defense Science and Technology Innovation Project.
Keywords: entity linking model; multi-modal; cross-modal; contrastive learning; visual information