Masked Autoencoders as Single Object Tracking Learners

Abstract: Significant advances have been achieved in visual tracking in recent years, mainly owing to the formidable modeling capability of the Vision Transformer (ViT). However, the strong performance of such trackers relies heavily on ViT models pretrained for long periods, limiting more flexible model designs for tracking tasks. To address this issue, we propose TrackMAE, an efficient unsupervised ViT pretraining method for tracking based on masked autoencoders. During pretraining, we employ two shared-parameter ViTs, serving as an appearance encoder and a motion encoder, respectively. The appearance encoder encodes randomly masked images, while the motion encoder encodes randomly masked pairs of video frames. An appearance decoder and a motion decoder then reconstruct the original images and video frames, respectively, at the pixel level. In this way, the ViT learns to model both the appearance of images and the motion between video frames simultaneously. Experimental results demonstrate that ViT-Base and ViT-Large models, pretrained with TrackMAE and combined with a simple tracking head, achieve state-of-the-art (SOTA) performance without additional design. Moreover, TrackMAE consumes only one-fifth of the training time of the currently popular MAE pretraining methods, which facilitates customizing diverse models for tracking. For instance, we additionally customize a lightweight ViT-XS, which achieves SOTA efficient tracking performance.
Source: Computers, Materials & Continua (SCIE, EI), 2024, No. 7, pp. 1105-1122 (18 pages)
Funding: Supported in part by the National Natural Science Foundation of China (No. 62176041) and in part by the Excellent Science and Technique Talent Foundation of Dalian (No. 2022RY21).
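
To make the dual-branch objective described in the abstract concrete, below is a minimal PyTorch sketch: a single shared-parameter ViT encoder processes both randomly masked images and randomly masked frame pairs, and two separate decoders reconstruct the original pixels. All module names, dimensions, and loss handling are illustrative assumptions, not the authors' implementation; in particular, a real MAE-style pipeline would insert mask tokens before decoding and compute the loss only on masked patches, which this sketch collapses for brevity.

```python
import torch
import torch.nn as nn

class SharedViTEncoder(nn.Module):
    """Stand-in for the shared-parameter ViT: a small Transformer encoder
    over pre-patchified, already-masked token sequences (an assumption)."""
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens):                 # tokens: (B, N_visible, dim)
        return self.blocks(tokens)

class PixelDecoder(nn.Module):
    """Lightweight decoder projecting latent tokens back to pixel patches."""
    def __init__(self, dim=256, patch_pixels=16 * 16 * 3):
        super().__init__()
        self.proj = nn.Linear(dim, patch_pixels)

    def forward(self, latent):
        return self.proj(latent)

encoder = SharedViTEncoder()       # ONE set of encoder weights, shared
app_decoder = PixelDecoder()       # reconstructs masked single images
motion_decoder = PixelDecoder()    # reconstructs masked video-frame pairs

def pixel_mse(pred, target):
    # Pixel-level reconstruction loss (MAE-style mean squared error).
    return ((pred - target) ** 2).mean()

# Appearance branch: visible tokens from a randomly masked image.
img_tokens = torch.randn(8, 49, 256)           # 8 images, 49 visible patches
img_target = torch.randn(8, 49, 16 * 16 * 3)   # ground-truth patch pixels
app_loss = pixel_mse(app_decoder(encoder(img_tokens)), img_target)

# Motion branch: the SAME encoder processes visible tokens from a randomly
# masked pair of video frames, so it must also capture inter-frame motion.
pair_tokens = torch.randn(8, 98, 256)          # two frames' visible patches
pair_target = torch.randn(8, 98, 16 * 16 * 3)
motion_loss = pixel_mse(motion_decoder(encoder(pair_tokens)), pair_target)

total_loss = app_loss + motion_loss  # joint objective; weighting is assumed
```

Because the two branches share encoder weights but use separate decoders, gradients from both reconstruction losses shape the same ViT backbone, which is how the pretraining teaches it appearance and motion at once.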