End-to-end object detection Transformer(DETR)successfully established the paradigm of the Transformer architecture in the field of object detection.Its end-to-end detection process and the idea of set prediction have ...End-to-end object detection Transformer(DETR)successfully established the paradigm of the Transformer architecture in the field of object detection.Its end-to-end detection process and the idea of set prediction have become one of the hottest network architectures in recent years.There has been an abundance of work improving upon DETR.However,DETR and its variants require a substantial amount of memory resources and computational costs,and the vast number of parameters in these networks is unfavorable for model deployment.To address this issue,a greedy pruning(GP)algorithm is proposed,applied to a variant denoising-DETR(DN-DETR),which can eliminate redundant parameters in the Transformer architecture of DN-DETR.Considering the different roles of the multi-head attention(MHA)module and the feed-forward network(FFN)module in the Transformer architecture,a modular greedy pruning(MGP)algorithm is proposed.This algorithm separates the two modules and applies their respective optimal strategies and parameters.The effectiveness of the proposed algorithm is validated on the COCO 2017 dataset.The model obtained through the MGP algorithm reduces the parameters by 49%and the number of floating point operations(FLOPs)by 44%compared to the Transformer architecture of DN-DETR.At the same time,the mean average precision(mAP)of the model increases from 44.1%to 45.3%.展开更多
基金Shanghai Municipal Commission of Economy and Information Technology,China(No.202301054)。
文摘End-to-end object detection Transformer(DETR)successfully established the paradigm of the Transformer architecture in the field of object detection.Its end-to-end detection process and the idea of set prediction have become one of the hottest network architectures in recent years.There has been an abundance of work improving upon DETR.However,DETR and its variants require a substantial amount of memory resources and computational costs,and the vast number of parameters in these networks is unfavorable for model deployment.To address this issue,a greedy pruning(GP)algorithm is proposed,applied to a variant denoising-DETR(DN-DETR),which can eliminate redundant parameters in the Transformer architecture of DN-DETR.Considering the different roles of the multi-head attention(MHA)module and the feed-forward network(FFN)module in the Transformer architecture,a modular greedy pruning(MGP)algorithm is proposed.This algorithm separates the two modules and applies their respective optimal strategies and parameters.The effectiveness of the proposed algorithm is validated on the COCO 2017 dataset.The model obtained through the MGP algorithm reduces the parameters by 49%and the number of floating point operations(FLOPs)by 44%compared to the Transformer architecture of DN-DETR.At the same time,the mean average precision(mAP)of the model increases from 44.1%to 45.3%.