Abstract
To address the high feature dimensionality and computational complexity of early fusion of RGB and depth image features, this paper proposes a human action recognition algorithm that combines RGB-D video sequences with convolutional neural networks (CNNs). First, to extract the static appearance and short-term temporal motion information of an RGB video sequence, the sequence is divided into a certain number of overlapping sub-sequence segments, which are fed into a CNN for training. Next, for the depth map sequence, an improved depth motion map (DMM) is computed as a representation of long-term temporal motion information and fed into a 2D CNN for training. Finally, an improved weighted product rule fuses the prediction scores of these multiple CNN streams to produce the final classification result. Experiments on the public action recognition datasets UTD-MHAD and MSR Daily Activity 3D show that the algorithm effectively extracts the static appearance and temporal motion information of actions and achieves good recognition performance.
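The three steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the "improved" DMM or the "improved" weighted product rule, so the code below assumes the classic DMM (accumulated absolute differences between consecutive depth frames) and the plain weighted product rule. All function names, the stride parameter, and the stream weights are hypothetical.

```python
import numpy as np


def overlapping_segments(num_frames, seg_len, stride):
    """Return (start, end) frame-index pairs of sliding windows.

    A stride smaller than seg_len makes neighbouring segments overlap.
    (Hypothetical helper; the paper's segment count and overlap are unknown.)
    """
    if num_frames < seg_len:
        return [(0, num_frames)]
    return [(s, s + seg_len) for s in range(0, num_frames - seg_len + 1, stride)]


def depth_motion_map(depth_frames):
    """Classic DMM: sum of absolute differences of consecutive depth frames.

    Assumption: this is the standard formulation, not the paper's improved one.
    """
    frames = np.asarray(depth_frames, dtype=np.float64)  # shape (T, H, W)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)   # shape (H, W)


def weighted_product_fusion(scores, weights):
    """Plain weighted product rule: fused_c is proportional to prod_i p[i, c] ** w[i].

    scores: (streams, classes) class probabilities from each CNN stream.
    weights: (streams,) non-negative stream weights (assumed values).
    """
    scores = np.asarray(scores, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    # Work in log space for numerical stability; epsilon avoids log(0).
    log_fused = (weights[:, None] * np.log(scores + 1e-12)).sum(axis=0)
    fused = np.exp(log_fused - log_fused.max())
    return fused / fused.sum()


if __name__ == "__main__":
    print(overlapping_segments(10, 4, 2))
    fused = weighted_product_fusion([[0.7, 0.3], [0.4, 0.6]], [1.0, 1.0])
    print(int(np.argmax(fused)))
```

Fusing at the score level keeps each stream's features separate, which is how the algorithm avoids the high-dimensional joint feature that early fusion would require.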
Authors
李元祥
谢林柏
LI Yuanxiang; XIE Linbo (Engineering Research Center of Internet of Things Technology Applications of the Ministry of Education, School of Internet of Things Engineering, Jiangnan University, Wuxi 214122)
Source
Computer & Digital Engineering (《计算机与数字工程》)
2020, Issue 12, pp. 3052-3058 (7 pages)
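Funding
Supported by the Jiangsu Province Industry-University-Research Joint Innovation Fund, Prospective Joint Research Project (No. BY2016022-28).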
Keywords
human action recognition
depth motion map
RGB
convolutional neural network
decision-level fusion