Human-Object Interaction Recognition Integrating Multi-level Visual Features

Cited by: 1
Abstract: Human action recognition based on computer vision has broad application prospects in video surveillance, intelligent driving, human-computer interaction, and multimedia content moderation, and human-object interaction is one of the core components of action recognition. Most existing human-object interaction recognition models, which are based on multi-stream convolutional neural networks, capture the relationship between humans and objects only at the level of surface visual features; they do not fully explore the key regions of the human body or the deep semantic relationships between humans and objects. To address this problem, this paper proposes a hierarchical graph neural network (HGNN) model for human-object interaction. HGNN explicitly models, from local to global, the key regions of the human body and the scene graph formed by humans and objects, and uses an attention graph pooling mechanism (AttPool) to remove redundant information and noise from the hierarchical graph. A graph convolutional network then captures the deep semantic relationships between graph nodes, aggregating and refining the initial features extracted by the convolutional neural network to obtain a feature representation that reflects the essential character of the interaction. In addition, the interim supervised classification performed on the middle-level graph constrains the network to better learn the human-body patterns of interactive actions and keeps it from becoming "biased" toward the interacting objects. Finally, a multi-task loss function is designed to train the HGNN model effectively. To verify the effectiveness of HGNN, extensive experiments were conducted on the large public benchmark V-COCO. The results show that the proposed model is adaptable and robust across common human-object interactions: its accuracy (mAP) exceeds that of previous graph neural network based methods by a large margin and is also higher than that of most recent multi-stream convolutional models.
Authors: LI Bao-zhen; ZHANG Jin; WANG Bao-lu; YU Ping (Shendong Jinjie Colliery, Chn Energy, Shenmu, Shaanxi 719319, China; Chn Energy Network Information Technology (Beijing) Co., Ltd., Beijing 100011, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2022, Issue S02, pp. 643-650 (8 pages)
Keywords: computer vision; human action recognition; human-object interaction; deep learning; graph neural network
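
The abstract names two building blocks, graph convolution over node features and attention-based graph pooling, without giving their formulas. The following is a minimal NumPy sketch of generic versions of both, assuming a symmetric-normalized graph convolution and a top-k pooling rule driven by softmax attention scores. It is not the paper's HGNN or AttPool; the function names, the scoring vector `score_vec`, the `keep` parameter, and the toy ring graph are all hypothetical choices made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degree >= 1 thanks to self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(x, adj, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ x @ weight, 0.0)


def attention_pool(x, adj, score_vec, keep):
    """Keep the `keep` highest-attention nodes and gate their features.

    Assumed rule (not taken from the paper): softmax over per-node scores,
    then top-k selection and the induced subgraph on the kept nodes.
    """
    scores = x @ score_vec                      # one scalar score per node
    att = np.exp(scores - scores.max())
    att = att / att.sum()                       # softmax over nodes
    idx = np.sort(np.argsort(-att)[:keep])      # retained nodes, original order
    x_pooled = x[idx] * att[idx, None]          # gate features by attention
    adj_pooled = adj[np.ix_(idx, idx)]          # induced subgraph
    return x_pooled, adj_pooled, idx


# Toy example: 6 nodes (e.g. body-part or object regions) on a ring graph.
rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim = 6, 8, 4
x = rng.normal(size=(n_nodes, in_dim))
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    adj[i, (i + 1) % n_nodes] = adj[(i + 1) % n_nodes, i] = 1.0

weight = rng.normal(size=(in_dim, out_dim))
score_vec = rng.normal(size=out_dim)

h = gcn_layer(x, adj, weight)
h_pooled, adj_pooled, kept = attention_pool(h, adj, score_vec, keep=3)
```

Pooling before the next convolution shrinks the graph, so later layers aggregate only over the nodes the attention scores judged informative; this mirrors the abstract's stated goal of removing redundant information and noise, under the assumptions above.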
