
基于对比学习的跨模态实体链接模型

A Cross-Modal Entity Linking Model Based on Contrastive Learning
Abstract: Image-text cross-modal entity linking is an extension of the traditional entity linking task. The inputs are images containing entities, which are to be linked to textual entities in a knowledge base. Existing models usually adopt a dual-encoder architecture that encodes entities of the visual and textual modalities into separate vectors, computes their similarity with a dot product, and links each image entity to the most similar text entity. Training usually adopts a cross-modal contrastive learning task based on the InfoNCE loss: for an entity's vector in one modality, the task pulls closer that entity's own vector in the other modality and pushes away the other-modality vectors of all other entities. However, this approach overlooks the difference in representation difficulty between the two modalities: similar entities in the image modality are usually harder to distinguish than similar entities in the text modality, so image entities with similar appearance are easily linked incorrectly. To solve this problem, we propose two new contrastive learning tasks that enhance the discriminative power of the vectors. The first is self-contrastive learning, which improves the distinction between visual vectors. The second is hard-negative contrastive learning, which helps a textual vector distinguish several similar visual vectors. We conduct experiments on the open-source dataset WikiPerson. With a knowledge base of 120,000 entities, our model improves accuracy by 4.5 percentage points over the best baseline trained with the InfoNCE loss.
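The cross-modal InfoNCE objective described above can be sketched as follows. This is a minimal illustration with in-batch negatives, normalized vectors, and a hypothetical temperature value; it is not the authors' implementation, which further adds self-contrastive and hard-negative contrastive tasks:

```python
import numpy as np

def info_nce_loss(image_vecs, text_vecs, temperature=0.07):
    """Cross-modal InfoNCE with in-batch negatives.

    Row i of image_vecs and row i of text_vecs describe the same
    entity (the positive pair); every other row is a negative.
    """
    # L2-normalize so the dot product is a cosine similarity
    img = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    txt = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # softmax over text candidates for each image; diagonal = positives
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(np.diag(probs)))

# Toy batch: 3 entities with orthogonal 3-dim vectors.
vecs = np.eye(3)
loss_aligned = info_nce_loss(vecs, vecs)          # positives on the diagonal
loss_shuffled = info_nce_loss(vecs, vecs[[1, 2, 0]])  # positives misplaced
```

When image and text vectors of the same entity coincide, the loss is near zero; when the pairing is shuffled, the diagonal no longer holds the positives and the loss is large, which is exactly the pull-closer/push-away behavior described in the abstract.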
Authors: Wang Yuanzheng; Sun Wenxiang; Fan Yixing; Liao Huaming; Guo Jiafeng (CAS Key Laboratory of Network Data Science & Technology (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049)
Source: Journal of Computer Research and Development (Peking University Core Journal), 2025, No. 3, pp. 662-671 (10 pages)
Funding: National Natural Science Foundation of China (62372431); National Key R&D Program of China (2021QY1701, 2023YFA1011602); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021100); Innovation Project of the Institute of Computing Technology, Chinese Academy of Sciences (E261090); National Defense Science and Technology Innovation Project.
Keywords: entity linking model; multi-modal; cross-modal; contrastive learning; visual information