
Six degrees of freedom object pose estimation algorithm based on filter learning network
Abstract: To address the accuracy and real-time performance of pose estimation for weakly textured objects in complex scenes, a six-degrees-of-freedom (6D) object pose estimation algorithm based on a filter learning network was proposed. Firstly, standard convolutions were replaced with Blueprint Separable Convolutions (BSConv) to reduce model parameters, and the GeLU (Gaussian error Linear Unit) activation function, which better approximates the normal distribution, was used to improve the performance of the network model. Secondly, an Upsampling Filtering And Encoding information Module (UFAEM) was proposed to compensate for the loss of key information during upsampling. Finally, a Global Attention Mechanism (GAM) was proposed to increase contextual information and extract information from input feature maps more effectively. Experimental results on the public LineMOD, YCB-Video and Occlusion LineMOD datasets show that the proposed algorithm significantly reduces network parameters while improving accuracy: the number of network parameters is reduced by nearly three quarters, and under the ADD(-S) metric the accuracy is improved by about 1.2 percentage points over the Dual-Stream algorithm on the LineMOD dataset, by about 5.2 percentage points over the DenseFusion algorithm on the YCB-Video dataset, and by about 6.6 percentage points over the Pixel-wise Voting Network (PVNet) algorithm on the Occlusion LineMOD dataset. The experimental results indicate that the proposed algorithm performs well on pose estimation of weakly textured objects and is robust to a certain degree when estimating the poses of occluded objects.
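
The BSConv-plus-GeLU substitution described in the abstract can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the authors' actual network: the unconstrained pointwise-then-depthwise BSConv variant, the channel sizes, and the BatchNorm placement are assumptions introduced only for the example.

import torch
import torch.nn as nn

class BSConvU(nn.Module):
    # Blueprint separable convolution (unconstrained variant): a 1x1 pointwise
    # convolution followed by a depthwise convolution, which uses far fewer
    # parameters than a standard k x k convolution with the same channel counts.
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(out_channels, out_channels, kernel_size,
                                   stride=stride, padding=padding,
                                   groups=out_channels, bias=False)

    def forward(self, x):
        return self.depthwise(self.pointwise(x))

# Hypothetical drop-in replacement for a standard Conv2d + ReLU block,
# using BSConv and the GeLU activation as described in the abstract.
block = nn.Sequential(
    BSConvU(64, 128, kernel_size=3, padding=1),
    nn.BatchNorm2d(128),
    nn.GELU(),
)

x = torch.randn(1, 64, 60, 80)   # dummy feature map: batch 1, 64 channels, 60x80
print(block(x).shape)            # torch.Size([1, 128, 60, 80])

Compared with a standard 3x3 convolution from 64 to 128 channels (about 73.7k weights), this block needs roughly 64*128 + 128*9 ≈ 9.3k weights, which illustrates the kind of parameter reduction the abstract reports.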
Authors: BING Yaxing; WANG Yangping; YONG Jiu; BAI Haomou (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China; Gansu Artificial Intelligence and Graphics and Image Processing Engineering Research Center (Lanzhou Jiaotong University), Lanzhou 730070, Gansu, China; School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China)
Source: Journal of Computer Applications, 2024, No. 6, pp. 1920-1926 (7 pages). Indexed in CSCD and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (62067006); Humanities and Social Sciences Research Project of the Ministry of Education (21YJC880085); Natural Science Foundation of Gansu Province (23JRRA845).
Keywords: object pose estimation; blueprint separable convolution; attention mechanism; keypoint; deep learning