Abstract
To address the inadequate fusion of information across modalities and the limited modeling of temporal dependencies in existing multimodal sentiment analysis models, a model combining cross-modal attention, global self-attention, and contrastive learning was proposed to deepen sentiment understanding. Specifically, features of the audio, text, and image modalities were first extracted separately and mapped into a unified vector space. Then, inter-modal data were effectively modeled and fused using cross-attention and global attention mechanisms. Meanwhile, contrastive learning tasks based on data, labels, and temporal order were introduced to deepen the model's understanding of the differences among multimodal features. Experimental results on the two public datasets CMU-MOSI and CMU-MOSEI show that, compared with the modality-invariant and -specific representations (MISA) model, the proposed model improves binary classification accuracy by 1.2 and 1.6 percentage points and F1 score by 1.0 and 1.6 percentage points, respectively.
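As an illustration of the fusion strategy summarized above, the following is a minimal PyTorch-style sketch of cross-modal attention (text attending to audio and visual features) followed by global self-attention over the concatenated streams. The module names, dimensions, and two-stage layout are assumptions for exposition only, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch: cross-modal attention followed by global self-attention.

    Assumes text/audio/visual features have already been projected into a
    shared d_model-dimensional space, as described in the abstract.
    """
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        # Cross-modal attention: text queries attend to audio / visual keys and values.
        self.text_to_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.text_to_visual = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Global self-attention over the concatenated multimodal sequence.
        self.global_attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, text, audio, visual):
        # Enrich the text stream with audio and visual context.
        t_a, _ = self.text_to_audio(text, audio, audio)
        t_v, _ = self.text_to_visual(text, visual, visual)
        # Concatenate along the time axis and model global
        # (intra- and inter-modal) dependencies with self-attention.
        fused = torch.cat([t_a, t_v, text], dim=1)
        return self.global_attn(fused)

# Usage with random features: batch of 2, sequence length 20, d_model 128.
if __name__ == "__main__":
    t = torch.randn(2, 20, 128)
    a = torch.randn(2, 20, 128)
    v = torch.randn(2, 20, 128)
    out = CrossModalFusion()(t, a, v)
    print(out.shape)  # torch.Size([2, 60, 128])
```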
Authors
FANG Xudong; WANG Xingfen (Computer School, Beijing Information Science & Technology University, Beijing 102206, China; School of Information Management, Beijing Information Science & Technology University, Beijing 102206, China)
Source
Journal of Beijing Information Science and Technology University (Natural Science Edition), 2024, No. 4, pp. 63-70 (8 pages)
Keywords
multimodal
sentiment analysis
attention mechanism
contrastive learning