Abstract
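Chinese adversarial sample generation is an important research topic in natural language processing and has long attracted wide attention from scholars. Previous methods for generating Chinese adversarial samples mainly rely on replacing characters or words and changing word order; the resulting samples attack poorly and are easily identified by detection models. This paper proposes DiffuAdv, a Chinese adversarial sample generation method based on attack-guided diffusion. It introduces a diffusion model into Chinese adversarial sample generation, strengthens the diffusion mechanism by simulating the data distribution that arises during text adversarial attacks, and uses the gradient of change between adversarial and original samples as a guiding condition that directs the model's reverse diffusion process during pre-training, thereby generating adversarial samples that are more natural and achieve higher attack success rates. Comparative experiments against multiple methods were conducted on several datasets covering different natural language processing tasks. The results show that the adversarial samples generated by the proposed method achieve high attack success rates. In addition, ablation experiments verify the effectiveness of attack-gradient guidance in improving the quality of the generated adversarial samples. In perplexity (PPL) experiments, the adversarial samples generated by the proposed method have an average PPL of only 0.518, which verifies their strong robustness. DiffuAdv enriches the research perspective on text adversarial sample generation and broadens research approaches for tasks such as text sentiment classification, causal relation extraction, and emotion-cause pair extraction.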
[Objective] The generation of adversarial samples in text represents a significant area of research in natural language processing. The process is employed to test the robustness of machine learning models and has gained widespread attention from scholars. Owing to the complex nature of Chinese semantics, generating Chinese adversarial samples remains a major challenge. Traditional methods for generating Chinese adversarial samples mainly involve word replacement, deletion/insertion, and word order adjustment. These methods often produce samples that are easily detectable and have low attack success rates, and thus, the methods struggle to balance attack effectiveness and semantic coherence. To address these limitations, this study introduces DiffuAdv, a novel method for generating Chinese adversarial samples. This approach enhances the generation process by simulating the data distribution during the adversarial attack phase. The gradient changes between adversarial and original samples are used as guiding conditions during the model's reverse diffusion phase in pre-training, resulting in the generation of more natural and effective adversarial samples.
[Methods] DiffuAdv entails the introduction of diffusion models into the generation of adversarial samples to improve attack success rates while ensuring the naturalness of the generated text. This method utilizes a gradient-guided diffusion process, leveraging gradient information between original and adversarial samples as guiding conditions. It consists of two stages: forward diffusion and reverse diffusion. In the forward diffusion stage, noise is progressively added to the original data until a noise-dominated state is achieved. The reverse diffusion stage involves the reconstruction of samples, in which the gradient changes between adversarial and original samples are leveraged to maximize the adversarial objective. During the pre-training phase, data capture and feature learning occur under gradient guidance, with the aim of learning the data distribution of original samples and analyzing the deviations from adversarial samples. In the reverse diffusion generation phase, adversarial perturbations are constructed using gradients and integrated into the reverse diffusion process, ensuring that at each step of reverse diffusion, samples evolve toward greater adversarial effectiveness. To validate the effectiveness of the proposed method, extensive experiments are conducted across multiple datasets and various natural language processing tasks, and the performance of the method is compared with those of seven existing state-of-the-art methods.
[Results] Compared with existing methods for generating Chinese adversarial samples, DiffuAdv demonstrates higher attack success rates across three tasks: text sentiment classification, causal relation extraction, and sentiment cause extraction. Ablation experiments confirm the effectiveness of using gradient changes between original and adversarial samples to guide the generation of adversarial samples and improve their quality. Perplexity (PPL) measurements indicate that the adversarial samples generated by DiffuAdv have an average PPL value of only 0.518, demonstrating that these samples are superior in rationality and readability compared with the samples generated by other methods.
[Conclusions] DiffuAdv effectively generates high-quality adversarial samples that closely resemble real text in terms of fluency and naturalness. The adversarial samples produced by this method not only achieve high attack success rates but also exhibit strong robustness. The introduction of DiffuAdv enhances the research perspective on generating adversarial text samples and broadens the approaches for tasks such as text sentiment classification, causal relationship extraction, and emotion-cause pair extraction.
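The two stages described in the abstract (forward noising, then a reverse step nudged by an adversarial gradient) can be illustrated with a toy sketch. This is not the authors' implementation. It rests on several assumptions the abstract does not state: text is represented as a small continuous vector, the noise schedule is linear, the victim is a hypothetical linear scorer, and the reverse-step mean comes from a trained denoiser that is only stubbed here. All function names are illustrative.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, alpha_bar, rng):
    """Sample x_t from N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)."""
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def guided_reverse_step(mean, variance, adv_grad, guidance_scale, rng):
    """One reverse step whose mean is shifted along an adversarial gradient.

    mean, variance: parameters of the unguided reverse transition; in a real
    system they would come from a trained denoiser (not modeled here).
    adv_grad: gradient of the adversarial objective w.r.t. the sample.
    """
    sigma = math.sqrt(variance)
    return [
        m + guidance_scale * variance * g + sigma * rng.gauss(0.0, 1.0)
        for m, g in zip(mean, adv_grad)
    ]


def victim_score(x, w):
    """Hypothetical linear victim model: a positive score means class 'positive'."""
    return sum(wi * xi for wi, xi in zip(w, x))


def demo():
    w = [0.5, -1.0, 0.25]    # toy victim weights (assumption)
    x0 = [1.0, -0.5, 2.0]    # toy embedding of the original sample
    alpha_bars = make_alpha_bars(100)
    x_t = forward_diffuse(x0, alpha_bars[-1], random.Random(0))

    # Adversarial objective: lower the victim score, so its gradient is -w.
    adv_grad = [-wi for wi in w]
    mean = x0                # stand-in for the denoiser's predicted mean

    # The same seed gives both steps identical noise; only guidance differs.
    plain = guided_reverse_step(mean, 0.01, adv_grad, 0.0, random.Random(1))
    guided = guided_reverse_step(mean, 0.01, adv_grad, 50.0, random.Random(1))
    return x_t, victim_score(plain, w), victim_score(guided, w)


if __name__ == "__main__":
    _, plain_score, guided_score = demo()
    print("unguided score:", plain_score)
    print("guided score:  ", guided_score)
```

In this sketch the guided step ends with a lower victim score than the unguided one, which is the sense in which each reverse step moves samples "toward greater adversarial effectiveness". A real text pipeline would also need a trained denoiser and a mapping from continuous vectors back to tokens, neither of which is shown.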
Authors
吴厚月
李现伟
张顺香
朱洪浩
王婷
WU Houyue; LI Xianwei; ZHANG Shunxiang; ZHU Honghao; WANG Ting (School of Computer and Information Engineering, Bengbu University, Bengbu 233030, China; Anhui Engineering Research Center for Intelligent Applications and Security of Industrial Internet, Anhui University of Technology, Ma'anshan 243032, China; School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan 232000, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 240088, China; School of Information Engineering, Huainan Union University, Huainan 232000, China)
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2024, No. 12, pp. 1997-2006 (10 pages)
Journal of Tsinghua University(Science and Technology)
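Funding
National Natural Science Foundation of China, General Program (62076006)
Open Project of the State Key Laboratory of Cognitive Intelligence (COGOS-2023HE02)
University Synergy Innovation Program of Anhui Province (GXXT-2021-008)
Key Natural Science Research Project of Anhui Provincial Universities (2022AH051921, 2022AH051909)
Key Project of the Support Program for Outstanding Young Talents in Anhui Provincial Universities (gxyqZD2021135)
Bengbu University High-Level Talent Research Start-up Fund (BBXY2020KYQD02)
Open Project of the Engineering Research Center, Anhui University of Technology (IASII22-08)
Bengbu University 2024 University-Level General Research Project (2024ZR02, 2024ZR03)
Bengbu University 2024 University-Level Applied Research Project (2024YYX48pj).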
Keywords
对抗样本生成
引导扩散
条件扩散
扩散模型
文本生成
adversarial sample generation
guided diffusion
conditional diffusion
diffusion model
text generation