Abstract
Frameworks that combine region proposal networks (RPN) with Siamese networks regress target position and shape quickly, and achieve good tracking speed and accuracy. However, the single-stage SiamRPN tracker cannot effectively handle complex situations such as interference from similar objects and large scale variations. To address this, the paper proposes a multi-stage tracking framework built on cascaded RPNs with a feature pyramid network (FPN), denoted CF-RPN. The backbone is a pair of Siamese FPNs whose features, from the deep high-level layers down to the shallow low-level layers, are fed to the cascaded RPN modules in turn. Unlike a conventional RPN, the cascaded RPN has multiple anchor boxes, and its anchors are influenced by the preceding RPN stage. Compared with existing algorithms, the framework offers three advantages. First, multi-scale feature extraction makes full use of both the high-level semantic information and the low-level spatial information of the target. Second, the cascaded RPN samples hard negative samples, which keeps the training samples better balanced. Third, the cascaded RPN updates the anchor boxes stage by stage, refining the target position and shape at each RPN, which improves localization accuracy and therefore tracking precision. In experiments, CF-RPN achieves accuracies of 62.36% and 66.18% on OTB50 and OTB100, which are 2.14% and 1.9% higher than SiamMask; on VOT2016, VOT2018, and VOT2019 it reaches 65.5%, 61.3%, and 60.0%, which are 3.4%, 2.1%, and 1.8% higher than SiamMask.
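The stage-by-stage anchor update described in the abstract can be illustrated with a minimal sketch. It assumes the common RPN box parameterization (center offsets scaled by anchor size, log-scale width and height offsets); the abstract does not state which parameterization CF-RPN uses. The function names and the offset values are hypothetical, and in a real tracker each stage's offsets would be predicted by that stage's RPN from its own feature level rather than supplied as a list.

```python
import math

def refine_anchor(anchor, delta):
    """Apply one stage's regression offsets to an anchor given as (cx, cy, w, h).

    Assumed parameterization (standard for RPN-style detectors, not confirmed
    by the abstract): center shifts are scaled by anchor size, and width and
    height offsets are in log space.
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

def cascade_refine(anchor, stage_deltas):
    """Pass the refined box of each stage to the next stage as its anchor."""
    boxes = []
    for delta in stage_deltas:
        anchor = refine_anchor(anchor, delta)
        boxes.append(anchor)
    return boxes

# Hypothetical values for illustration only.
initial_anchor = (100.0, 100.0, 64.0, 64.0)
stage_deltas = [
    (0.25, -0.125, 0.0, 0.0),  # stage 1: shift the center, keep the size
    (0.0, 0.0, 0.0, 0.0),      # stage 2: no further correction
]
refined = cascade_refine(initial_anchor, stage_deltas)
```

Because every stage starts from the box produced by the previous one, later stages only need to predict small corrections, which is the intuition behind the claimed gain in localization accuracy.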
Authors
王敬坤
丁德锐
梁伟
王永雄
WANG Jing-kun; DING De-rui; LIANG Wei; WANG Yong-xiong (Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2022, Issue 1, pp. 117-123 (7 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Fund General Program (Grant No. 61673276).
Keywords
object tracking
multi-scale
feature extraction
Siamese network