Funding: Funded by the Key Research and Development Program of Hubei Province, China (Grant No. 2023BEB024); the Young and Middle-aged Scientific and Technological Innovation Team Plan in Higher Education Institutions in Hubei Province, China (Grant No. T2023007); and the key projects of the Hubei Provincial Department of Education (No. D20161403).
Abstract: With the rapid development of intelligent video surveillance technology, pedestrian re-identification has become increasingly important in multi-camera surveillance systems. This technology plays a critical role in enhancing public safety. However, traditional methods typically process images and text separately, applying upstream models directly to downstream tasks. This approach significantly increases the complexity of model training and the computational cost. Furthermore, the class imbalance common in existing training datasets limits improvements in model performance. To address these challenges, we propose a framework named Person Re-ID Network Based on Visual Prompt Technology and Multi-Instance Negative Pooling (VPM-Net). First, we incorporate the pre-trained Contrastive Language-Image Pre-training (CLIP) model to map visual and textual features into a unified embedding space, mitigating inconsistencies in data distribution and in the training process. To enhance model adaptability and generalization, we introduce an efficient, task-specific Visual Prompt Tuning (VPT) technique, which improves the model's relevance to specific tasks. Additionally, we design two key modules: the Knowledge-Aware Network (KAN) and the Multi-Instance Negative Pooling (MINP) module. The KAN module enhances the model's understanding of complex scenarios through deep contextual semantic modeling. The MINP module handles samples, improving the model's ability to distinguish fine-grained features. Experimental results across diverse datasets underscore the strong performance of VPM-Net and demonstrate its advantages and robust reliability in fine-grained retrieval tasks.
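As a rough illustration of what retrieval in a unified embedding space means, the sketch below ranks gallery image embeddings against a query embedding by cosine similarity. It is not VPM-Net or CLIP code: the function names (`cosine_similarity`, `rank_gallery`) and the three-dimensional toy vectors are hypothetical and chosen only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return (gallery index, similarity) pairs, most similar first."""
    scores = [(idx, cosine_similarity(query, emb)) for idx, emb in enumerate(gallery)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values); in practice they would come from the
# text and image encoders that share one embedding space.
text_query = [0.9, 0.1, 0.0]
image_gallery = [
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.0, 1.0],
]

ranking = rank_gallery(text_query, image_gallery)
print([idx for idx, _ in ranking])
```

Because both modalities live in the same space, a single similarity function can serve text-to-image and image-to-image queries alike.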
Funding: Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region under grant number 2022D01B186.
Abstract: Visual Place Recognition (VPR) aims to use visual information to determine the location of an agent, and plays an irreplaceable role in tasks such as loop closure detection and relocalization. Previous VPR algorithms emphasize the extraction and integration of general image features while ignoring the mining of salient features that are key to discrimination in VPR tasks. To this end, this paper proposes a Domain-invariant Information Extraction and Optimization Network (DIEONet) for VPR. The core of the algorithm is a newly designed Domain-invariant Information Mining Module (DIMM) and a Multi-sample Joint Triplet Loss (MJT Loss). Specifically, DIMM incorporates the interdependence between different spatial regions of the feature map into the cascaded convolutional unit group, which enhances the model's attention to domain-invariant static object classes. MJT Loss introduces a "joint processing of multiple samples" mechanism into the original triplet loss and adds a new distance constraint term for "positive and negative" samples, so that the model can avoid falling into a local optimum during training. We demonstrate the effectiveness of our algorithm through extensive experiments on several authoritative benchmarks. In particular, the proposed method achieves the best performance on the TokyoTM dataset, with a Recall@1 of 92.89%.
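The abstract does not give the MJT Loss formula, so the sketch below shows only the standard triplet loss it builds on, plus one simple way of processing several negatives jointly by averaging. The averaging rule, the function names, and the margin value are assumptions for illustration; the additional distance constraint term mentioned in the abstract is not specified there and is deliberately left out.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def multi_negative_triplet_loss(anchor, positive, negatives, margin=0.5):
    """Average the triplet loss over several negatives (assumed joint rule)."""
    losses = [triplet_loss(anchor, positive, neg, margin) for neg in negatives]
    return sum(losses) / len(losses)


# Toy 2-D descriptors (assumed values).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]    # distance 1 from the anchor
negatives = [
    [0.0, 3.0],          # distance 3: already beyond the margin, loss 0
    [1.0, 0.0],          # distance 1: violates the margin, loss 0.5
]

loss = multi_negative_triplet_loss(anchor, positive, negatives)
print(loss)
```

In this toy case only the second negative contributes, which shows how a joint loss keeps penalizing hard negatives even when easy ones are already well separated.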