news 2025/12/27 14:01:12

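I. Introduction

Adversarial attacks on NLP models are an important part of adversarial machine learning, yet they have only gained momentum in recent years. The main reason is that they differ substantially from attacks on computer vision (CV) or audio models: CV and audio inputs live in a continuous value space, while NLP inputs are discrete.

A traditional attack on a CV or audio model adds a small continuous perturbation (such as Gaussian noise) to an image or a recording, so that the model misclassifies the input while the change stays imperceptible to the human eye or ear. Taking CV as an example, a $256 \times 256$ image lies in the space $[0,255]^{256 \times 256}$, which can be treated as continuous, so adding a perturbation is easy.

In NLP, by contrast, the input is a sequence of discrete tokens. An NLP model must first convert these tokens into continuous vectors before it can process them. An attacker can therefore only manipulate the discrete tokens: the continuous vectors are generally produced inside the model, so noise cannot be added to them directly.

II. Evasion Attacks and Defenses

1. Introduction

In CV, an evasion attack adds noise that is invisible to the human eye so that an image classifier misclassifies the image. In the classic example, the model assigns the original image a 57.7% probability of being a panda, but after imperceptible noise (in the continuous value space) is added, it assigns a 99.3% probability to gibbon.

In NLP, an evasion attack likewise modifies the original sentence so that its meaning is unchanged for a human reader while the model's prediction on the modified sentence becomes wrong. Take sentiment analysis of a movie review: the model classifies the original sentence as negative, but after an "s" is appended to "film" it classifies the sentence as positive, a change a human would hardly notice. Evasion attacks cover other tasks as well, for example modifying a sentence so that a translation model mistranslates it; these are not discussed further here.

2. Four Ingredients in Evasion Attacks

Using sentiment analysis of movie reviews as the running example, a complete evasion attack consists of four parts:

1) Goal: specify what the attack should achieve for the given victim model and input sample.
2) Transformation: perturb the sample; this step produces many candidate samples.
3) Constraints: filter the candidates with preset conditions, for example discarding candidates with grammatical errors, wrong grammatical person, or a synonym that has turned into an antonym.
4) Search: use a search method to pick, from the remaining candidates, one that makes the model predict wrongly; this is the final adversarial example.

Morris, J., Lifland, E., Yoo, J. Y., Grigsby, J., Jin, D., Qi, Y. (2020). TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations.

2.1. Goal: What the attack aims to achieve

News topic classification is used as the example.

2.1.1. Untargeted classification: make the model misclassify the text without caring which wrong class it picks. The news text is modified until the model assigns it any incorrect category.

2.1.2. Targeted classification: make the model misclassify the text into a class chosen in advance. For example, the news text is modified so that the model assigns it to the specified Sci/Tech category.

2.1.3.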
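Universal suffix dropper: once a trigger phrase is inserted into the text to be translated, the model ignores the text that follows it. In the original figure, after the red trigger is added, the purple text behind it is no longer translated.

Wallace, E., Stern, M., Song, D. (2020). Imitation attacks and defenses for black-box machine translation systems. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).

2.1.4. Wrong parse tree in dependency parsing: make the model produce an incorrect parse of the text.

Zheng, X., Zeng, J., Zhou, Y., Hsieh, C.-J., Cheng, M., Huang, X. (2020). Evaluating and enhancing the robustness of neural network-based dependency parsing models with adversarial examples. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

2.2. Transformations: How to construct perturbations for possible adversaries

A transformation converts the sample in some way and produces a large number of candidates, which are then filtered by the constraints.

2.2.1. Word substitution by WordNet synonyms

A transformation must keep the meaning of the text unchanged, so the simplest approach is synonym substitution. WordNet is a lexical database that provides synonym sets, and words in the original text are replaced with their WordNet synonyms. The replacement may still change the meaning of the sentence or make it read awkwardly, which is why the constraints are needed as a filter.

2.2.2. Word substitution by kNN or $\varepsilon$-ball in counter-fitted GloVe embedding space

Each word is mapped to its word embedding, and nearby words are looked up in the embedding space. Instead of using a synonym dictionary, a ball of radius $\varepsilon$ is drawn around the original word in the counter-fitted embedding space. Words whose embeddings fall inside the ball are considered closest to the original word, and $\varepsilon$ controls how close they must be. This prevents semantically unsuitable candidates from being generated.

Counter-fitted embedding space: use linguistic constraints to pull synonyms closer and push antonyms farther away from each other.

In the original GloVe embedding space, words with the same part of speech that occur in similar contexts lie close together, for example east, west, south and north. Replacing "east" with "west", however, changes the meaning of the whole sentence. Counter-fitting moves synonyms closer and antonyms farther apart, so drawing the ball of radius $\varepsilon$ in the counter-fitted GloVe space makes it much less likely that the meaning of the sentence changes.

Mrkšić, N., Ó Séaghdha, D., Thomson, B., Gašić, M., Rojas-Barahona, L. M., Su, P.-H., Vandyke, D., Wen, T.-H., Young, S. (2016). Counter-fitting word vectors to linguistic constraints. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

2.2.3.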
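Word substitution by BERT masked language modeling (MLM) prediction

The chosen word is masked, the text is fed to BERT, and the predicted words are inserted back into the source text to form candidates. The tokens BERT predicts for the mask can differ considerably from the original token, though: in the example, the highest-probability prediction (given as "double" in the source) even runs against the meaning of the original word "recommend". Relying on masked prediction alone is therefore not advisable.

2.2.4. Word substitution by BERT reconstruction (no masking)

The source text is fed to BERT without masking, so the output tokens stay very close to the corresponding source tokens. The downside is exactly that closeness: because the predictions barely deviate from the source, BERT's ability to propose diverse substitutes is largely wasted.

2.2.5. Word substitution by changing the inflectional form of verbs, nouns and adjectives

Inflectional morpheme: an affix that never changes the basic meaning of a word and is indicative of its part of speech (POS).

In the example, the tense of a word is changed without changing its meaning. The first and third resulting sentences are ungrammatical, however, so the candidates still have to be filtered by the constraints.

2.2.6. Word substitution by gradient of the word embedding

This method requires gradients, so it is a white-box attack. The source text is first fed to the model to obtain the loss $\mathcal{L}$. The gradient of the loss with respect to the embedding $e_0$ of the chosen word (for example "recommend") is then computed; it measures the contribution of $e_0$ in the current text. Next, for every other embedding $e_i$ in the embedding space, the dot product $(e_i - e_0)^\top \nabla_{e_0} \mathcal{L}$ is computed. This is the first-order approximation of how the loss changes when $e_0$ is replaced by $e_i$. The word that increases the loss the most is chosen as the substitute: the larger the loss, the "less accurate" the model's prediction. The original figure gives a two-dimensional illustration of this approximation, which is simple enough to skip here.

2.2.7. Word insertion based on BERT MLM

A mask token is inserted at the position where a word should be added, the text is fed to BERT, and the text with BERT's prediction filled in is used as an adversarial candidate.

2.2.8. Word deletion

Words are deleted directly. Using this transformation on its own is not recommended.

2.2.9. Character-level transformations: swap, substitution, deletion, insertion

Character-level changes are very common in everyday writing, for example typing one letter too many or too few. The substitution variant deliberately picks letters that are adjacent to the original letter on the keyboard, which makes the result more realistic. Because the model may never have seen such misspellings during training, adversarial examples produced this way tend to be effective.

Gao, J., Lanchantin, J., Soffa, M. L., Qi, Y. (2018). Black-box generation of adversarial text sequences to evade deep learning classifiers. 2018 IEEE Security and Privacy Workshops (SPW).

2.3. Constraints: What a valid adversarial example should satisfy

2.3.1. What a valid adversarial sample should satisfy

The right constraints depend on the problem at hand. Commonly used ones fall into three groups: overlap, grammaticality and semantic similarity.

2.3.2. Overlap between the original and perturbed sample

2.3.2.1.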
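Levenshtein edit distance (character level)

This measure is typically used for character-level adversarial samples. It counts the minimum number of single-character edits needed to turn the original word into the transformed word; the smaller, the better. The Levenshtein distance is defined recursively and, in essence, counts the characters that differ between the two words. Suppose "kitten" becomes "sitting" after the transformation:

step1: k -> s, lev = 1
step2: i, t, t are unchanged, lev stays the same
step3: e -> i, lev = 2
step4: n is unchanged, lev stays the same
step5: "kitten" has now been fully consumed; by the recursive formula, the remaining length of "sitting" (one character, g) is added, so lev = 3

The computation ends with a distance of 3.

2.3.2.2. Maximum percentage of modified words

This constraint computes the fraction of words in the transformed text that were modified; the smaller, the better.

2.3.3. Grammaticality of the perturbed sample

2.3.3.1. Part of speech (POS) consistency

Restricting the part of speech of the substituted word helps keep the transformed text grammatically and semantically correct. In the example, "recommend" is a verb in its non-third-person-singular form. The first candidate, "advocate", matches exactly. The third candidate, "recommendation", is a noun and does not match. The second candidate, "recommended", is a past-tense verb: the sentence is still grammatical, but the tense of the original word has changed, so whether to keep it has to be decided case by case.

2.3.3.2. Number of grammatical errors (evaluated by some toolkit)

A grammar checker counts the grammatical errors in each candidate text; the fewer, the better.

2.3.3.3. Fluency scored by the perplexity of a pre-trained language model

Each candidate text is fed to a pre-trained language model and filtered by its perplexity; the lower the perplexity, the better.

2.3.4. Semantic similarity between the transformed sample and the original sample

2.3.4.1. Distance between the swapped word's embedding and the original word's embedding

The similarity of the two words is compared in the embedding space, and candidates are filtered with a suitable threshold. With cosine similarity as the measure, for example, the similarity of two words is judged by the cosine similarity of their embeddings. Note that the threshold matters a great deal: a poorly chosen threshold makes the attack perform very badly.

2.3.4.2. Similarity between the transformed sample's sentence embedding and the original sample's sentence embedding

Again using cosine similarity as the example: a universal sentence encoder (an NLP model that takes a string as input) produces an embedding vector for each text, the cosine similarity of the two sentence embeddings is computed, and candidates below the chosen threshold are filtered out.

2.4. Search Method: How to find an adversarial example from the transformations that satisfies the constraints and meets the goal

2.4.1.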
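Greedy Search: score each transformation at each position, then replace the words in decreasing order of score until the prediction flips

step1: Generate the candidates obtained by modifying each word, feed them to the victim model, and record the class probabilities and the loss.
step2: Sort the candidates by loss in descending order and apply the substitutions in that order until the model misclassifies the text.
step3: The adversarial example has been generated.

In the example, "highly" is first replaced with "inordinately". The loss rises sharply, but the model still classifies the text correctly. The candidate with the second-largest loss is applied next, replacing "recommend" with "advocate". The model now misclassifies the positive text as negative, so the adversarial example is found. Note that where there is greedy search there is also beam search; it is not covered here.

2.4.2. Greedy search with word importance ranking (WIR)

Word importance ranking by leave-one-out (LOO): see how the ground-truth probability decreases when the word is removed from the input.

The words are deleted one at a time, and the change in the loss and in the predicted probability distribution is computed. The more the loss rises and the more the probability of the correct class falls, the more important the word is.

Word importance ranking by the gradient of the word embedding (white-box):

The gradient of the loss with respect to each word's embedding vector is computed to measure the word's importance. The larger the gradient, the more important the word.

Step 1: Score each word's importance. The words of the text are ranked by importance.

Step 2: Swap the words from the most important to the least important. The most important word, "recommend", is replaced first, using the substitute with the largest loss, "advocate". The loss increases, but the model still classifies the text correctly. The second most important word, "highly", is then replaced with its largest-loss substitute, "inordinately". The model now misclassifies the text, and the adversarial example is found.

2.4.3. Genetic Algorithm: evolution and selection based on fitness

step1: Apply one transformation to the original text to obtain a set of candidates, feed them to the victim model, and compute each candidate's misclassification probability. Normalize these probabilities and use them as the probabilities for sampling parents.
step2: Sample the parents. In the example, "We highly recommend it" and "I inordinately recommend it" are sampled. The two parents are then combined (crossover) to produce a child $g_1$: "We inordinately recommend it".
step3: Mutate the child, that is, apply one more transformation to $g_1$ (positions that have already been changed are not changed again).
step4: Check whether $g_1$ successfully attacks the model. If it does, the search is finished. Otherwise remove the two parents of $g_1$, add $g_1$ as a new parent, and repeat steps 1 to 4 until a successful adversarial example is produced.

3. Examples of Evasion Attacks

3.1. Synonym Substitution Attack

3.1.1. TextFooler

The architecture of the algorithm is shown in detail in the original figure and is not repeated here.

Jin, D., Jin, Z., Zhou, J. T., Szolovits, P. (2020). Is BERT really robust? A strong baseline for natural language attack on text classification and entailment. Proceedings of the … AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence, 34(05).

3.1.2.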
PWWS

This method takes both leave-one-out (LOO) word saliency and word importance ranking (WIR) into account. Because it applies no constraints, however, the generated text is highly diverse.

Shuhuai Ren, Yihe Deng, Kun He, and Wanxiang Che. (2019). Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1085–1097, Florence, Italy. Association for Computational Linguistics.

3.1.3. BERT-Attack

This algorithm uses BERT as the model that proposes the candidate substitutions.

Li, L., Ma, R., Guo, Q., Xue, X., Qiu, X. (2020). BERT-ATTACK: Adversarial attack against BERT using BERT. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).

3.1.4. Genetic Algorithm

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating Natural Language Adversarial Examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896, Brussels, Belgium. Association for Computational Linguistics.

3.2. Discussion

3.2.1. Results and comparison

In the comparison reported by the BERT-Attack paper, attacking with BERT drives the victim model's probability for the correct class to the lowest value while perturbing the original text the least. The query numbers also show that BERT-Attack is the cheapest in time and the genetic algorithm the most expensive.

Li, L., Ma, R., Guo, Q., Xue, X., Qiu, X. (2020). BERT-ATTACK: Adversarial attack against BERT using BERT. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).

3.2.2. Even with those constraints, the adversarial samples may still be human perceptible

An analysis of the adversarial texts generated by TextFooler found that, even with constraints in place, some sentences still read awkwardly to humans. The authors of that analysis therefore proposed TF-Adjusted, which tightens the constraints.

TF-Adjusted: a modified version of TextFooler that has stronger constraints.

With the stronger constraints, human ratings of the generated adversarial samples go up, but the attack success rate drops off a cliff. This suggests that a large share of the adversarial samples that previously succeeded contained "errors" or were "unreasonable".

Morris, J., Lifland, E., Lanchantin, J., Ji, Y., Qi, Y. (2020). Reevaluating adversarial examples in natural language. Findings of the Association for Computational Linguistics: EMNLP 2020.

3.3. Morpheus

Morpheus attacks NLP models through grammatical errors or changes to the inflectional form of words, because such errors are very common in real-world text.

3.4.
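Universal Trigger (Targeted Attack)

3.4.1. What is a universal trigger

Universal trigger: a trigger string that is not related to the task but can perform a targeted attack when added to the original string. After the universal prefix is added to the original text, the model misclassifies it.

3.4.2. How to obtain a universal trigger

step1: Determine how many words the trigger needs and initialize them with some words.

step2: Backpropagate to obtain the gradient of each trigger word's embedding, and find the token that minimizes the objective function

$$\arg\min_{i \in \text{Vocab}} (e_i - e_0)^\top \nabla_{e_0} \mathcal{L}$$

The current trigger is prepended to the original text and fed to the model to obtain the probability of the target class. Using the gradient of the loss from backpropagation, the dot product between the gradient and the difference vector $e_i - e_0$ is computed for the other words $e_1$, $e_2$, ... in the embedding space, where $e_0$ is the current trigger word. Because this is a targeted attack, the $e_i$ that gives the smallest value (the largest first-order decrease of the loss for the target class) is chosen as the candidate word for this round.

step3: Update the trigger with the newly found words. All candidate words are put in place for the next round, and the procedure repeats until the attack succeeds.

3.4.3. Result

The attack results are shown in the original figure.

Wallace, E., Feng, S., Kandpal, N., Gardner, M., Singh, S. (2019). Universal adversarial triggers for attacking and analyzing NLP. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).

3.5. Crafting Adversaries by Auto-Encoder

3.5.1. Train a generator (auto-encoder) to generate the adversarial samples

The generator's goal is to make the text classifier misclassify the adversarial samples it generates. The classifier's goal is to classify text correctly. Training alternates between the non-robust text classifier (the NLP model under attack) and the robust text classifier (the defense model).

3.5.2. Attack step

In the attack step, the generator produces an adversarial sample that the classifier (the victim model) should misclassify. The step uses three losses. The reconstruction loss and the similarity loss ensure that the generated sentence keeps the same or a similar meaning as the original sentence: the reconstruction loss keeps the tokens of the generated sentence close to those of the original, and the similarity loss keeps the generated embedding close to the original embedding. The adversarial loss drives the attack itself. During the attack step, the parameters of the text classifier (the victim model) are frozen.

3.5.3.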
Defense step

In the defense step, the generator produces an adversarial sample that the classifier (the defense model) should classify correctly. The defense step is needed because, with the attack step alone, the generator may resort to awkward shortcuts and produce adversarial samples that cannot be classified correctly at all, which a human would spot easily. A robust classifier is therefore trained to make sure the generated adversarial samples can still be classified correctly, which keeps their meaning intact.

The defense step also uses three losses. The first two are the same as in the attack step. The third asks the robust classifier to classify both the original sample and the generated adversarial sample correctly.

**Note**: during training, the attack step and the defense step alternate, and the parameters of the non-robust classifier under attack stay fixed.

3.5.4. Problem during backward: cannot directly backpropagate through the sampling in the AE

Neural networks are trained by computing gradients and backpropagating them. The last step of an NLP generation model is a classification over the vocabulary for each token of the generated sentence, so the number of classes equals the vocab size. Consider the first token of the generated adversarial sample. The model first produces a vector of length vocab size, softmax normalizes it into a probability distribution over tokens, and argmax picks the most probable token, 'I' in the example. These steps repeat until the complete adversarial sample has been generated.

For an ordinary NLP task, argmax is the final step. Here, however, generating the adversarial sample is only an intermediate step, and the generation has to keep being optimized during training, so the whole process must be differentiable. argmax is not differentiable, so a different technique is needed for sampling tokens: the Gumbel-softmax algorithm, the discrete-case version of the reparameterization trick.

Jang, Eric, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144 (2016).

3.5.4.1. Gumbel-Softmax reparameterization trick

The starting point is Gumbel-Max, which provides a way to sample from a categorical distribution. Suppose the class probabilities for the first token of the adversarial sample are $p_1, p_2, ..., p_k$. Gumbel-Max then samples a class according to these probabilities as

$$\arg\max_i \left(\log p_i - \log(-\log \varepsilon_i)\right)_{i=1}^{k}, \quad \varepsilon_i \sim U[0,1]$$

First compute the log-probability $\log p_i$ of each class, then draw $k$ random numbers $\varepsilon_1, \varepsilon_2, ..., \varepsilon_k$ from the uniform distribution $U[0,1]$, add $-\log(-\log \varepsilon_i)$ to $\log p_i$, and finally take the class with the largest value. It can be shown that this procedure is exactly equivalent to sampling one class with probabilities $p_1, p_2, ..., p_k$; in other words, Gumbel-Max outputs class $i$ with probability $p_i$. Gumbel-Max is still an argmax, though, and still not differentiable, so Gumbel-softmax was proposed to approximate it in a differentiable way.

3.5.4.2. Gumbel-softmax reparameterization trick: using softmax with temperature scaling as an approximation of argmax

The basic way to handle discrete inputs in a neural network is to convert them to one-hot vectors (an embedding layer is in essence a fully connected layer applied to a one-hot input). argmax is in essence one-hot(arg max), so making it differentiable means finding a smooth approximation of one-hot. Gumbel-softmax is such an approximation:

$$\mathrm{softmax}\left(\left(\log p_i - \log(-\log \varepsilon_i)\right) / \tau\right)_{i=1}^{k}, \quad \varepsilon_i \sim U[0,1]$$

The parameter $\tau > 0$ is the annealing (temperature) parameter. The smaller it is, the closer the output is to a one-hot vector, but the more severe the vanishing-gradient problem becomes; the larger it is, the closer the output is to a uniform distribution.

3.5.4.3. The gradient of the text classifier can backpropagate through the auto-encoder

Gumbel-softmax replaces the non-differentiable, discrete one-hot argmax with a continuous, smooth approximation, which makes it possible to optimize the generation of adversarial samples by gradient descent.

Xu, Ying, et al. Grey-box Adversarial Attack And Defence For Sentiment Classification. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

4. Defenses against Evasion Attacks

4.1. Training a More Robust Model

4.1.1. Adversarial training: generate the adversarial samples using the current model every N epochs

This is the most intuitive approach.

step1: Train the text classifier on an initial training set for N epochs.
step2: Pick an attack algorithm and apply it to the initial training set to generate adversarial samples. Train the text classifier for another N epochs on the adversarial samples together with the initial training set to obtain a more robust model.
step3: Repeat steps 1 and 2 until the requirement is met.

Although intuitive, generating the adversarial samples is extremely time-consuming, so this method is not commonly used.

4.1.2. Adversarial training in the word embedding space by $\varepsilon$-ball

Motivation: a word's synonyms may be within its neighborhood. The procedure resembles the gradient-based white-box attack.

step1: Run the model on the current sentence $\{e_0, e_1, ..., e_k\}$ and obtain the loss.
step2: Choose a hyperparameter $\varepsilon$ and, in the word embedding space, draw a ball of radius $\varepsilon$ centered at the current word $e_i$. Words whose embeddings lie inside the ball are treated as near-synonyms of $e_i$.
step3: For each word embedding in the sentence (take $e_0$ as an example), use the gradient of the loss to evaluate the other embeddings $v_i$ inside the ball, and find the embedding $v^*$ that increases the loss the most.
step4: Replace the original word in the sentence with the word represented by $v^*$.
step5: Repeat step3 until every word has been replaced; the result is a new adversarial sample.
step6: Train the text classifier on the newly generated sentence to obtain a more robust model.

By perturbing the original sentence, this method improves the model's generalization, much like adding noise in CV.

4.1.3. ASCC-defense (Adversarial Sparse Convex Combination)

4.1.3.1. Convex hull of set A: the smallest convex set containing A

Adversarial training is carried out in the word embedding space within the convex hull formed by the synonym set. In the original figure, the black point is the embedding of the word being replaced, and the four red points are the ideal synonym embeddings. The two panels on the right show that when the candidate region is a ball, the size of $\varepsilon$ strongly affects which embeddings are selected: if it is too small, the sentence is not perturbed enough; if it is too large, unreasonable perturbations are introduced and may even hurt the model's performance. A rectangular candidate region has the same problem. The idea is therefore to use a convex set over the embeddings that covers as many good candidates as possible while keeping bad ones out (left panel). A convex rather than a non-convex set is chosen for computational convenience.

4.1.3.2. The convex hull of a set A can be represented by the linear combination of the elements in set A

Proposition 1. Let $\mathbb{S}(u) = \{\mathbb{S}(u)_1, \mathbb{S}(u)_2, ..., \mathbb{S}(u)_T\}$ be the set of all substitutions of word $u$, $\mathrm{conv}\,\mathbb{S}(u)$ be the convex hull of the word vectors of all elements in $\mathbb{S}(u)$, and $v(\cdot)$ be the word vector function. Then we have

$$\mathrm{conv}\,\mathbb{S}(u) = \left\{ \sum_{i=1}^{T} w_i v(\mathbb{S}(u)_i) \;\middle|\; \sum_{i=1}^{T} w_i = 1, \ w_i \ge 0 \right\}$$

For the current word $u$ ('awesome'), the candidate substitutes given by WordNet synonyms are the four red points, and the convex hull of $u$ is the set of weighted sums of the word embeddings of those four words.

4.1.3.3. Finding an adversary embedding in the convex hull is just finding the coefficients of the linear combination

The adversarial embedding $\hat{v}(u_i)$ of the $i$-th word is

$$\hat{v}(u_i) = \sum_{j=1}^{T} w_{ij} v(\mathbb{S}(u_i)_j), \quad \text{s.t.} \ \sum_{j=1}^{T} w_{ij} = 1, \ w_{ij} \ge 0$$

The weight $w_{ij}$ of each candidate substitute embedding is

$$w_{ij} = \frac{\exp(\hat{w}_{ij})}{\sum_{k=1}^{T} \exp(\hat{w}_{ik})}, \quad \hat{w}_{ij} \in \mathbb{R}$$

The goal is to find the $\hat{w}$ that solves

$$\max_{\hat{w}} \ -\log p(y \mid \hat{v}(x))$$

where $\hat{v}(x)$ denotes the perturbed embeddings of the input $x$. In other words, the goal is the $\hat{w}$ that maximizes the training loss of the model. The paper adds a second term to this loss:

$$-\alpha \sum_{i=1}^{L} \frac{1}{L} \mathcal{H}(w_i), \qquad \mathcal{H}(w_i) = \sum_{j=1}^{T} -w_{ij} \log(w_{ij})$$

This entropy term encourages the weights $w_{ij}$ of the candidate substitutes to be as close to one-hot (as uneven) as possible. The reason is that the closer the weights are to one-hot, the closer the resulting $\hat{v}(u_i)$ is to a real word embedding, and the more reasonable the result.

4.1.3.4. Making the coefficients of the linear combination sparser

With the second part of the loss added, the resulting $w$ is very close to one-hot, so the generated embedding is close to a real word embedding.

Dong, Xinshuai, et al. Towards Robustness Against Natural Language Word Substitutions. International Conference on Learning Representations. 2020.

4.1.4. Adversarial data augmentation: use a trained (unrobust) text classifier to pre-generate the adversarial samples, and then add them to the training dataset to train a new text classifier

step1: Train a text classifier on the original dataset.
step2: Attack the trained text classifier to generate adversarial samples.
step3: Add the adversarial samples to the original dataset and train the text classifier again to obtain a more robust model.

4.2. Detecting Adversaries during Inference

4.2.1. Discriminate perturbations (DISP): detect adversarial samples and convert them to benign ones

DISP contains three submodules.

4.2.1.1. Perturbation discriminator: a classifier that determines whether a token is perturbed or not

A BERT-based detector decides, for each word in the current sentence, whether it has been tampered with.

4.2.1.2.
Embedding estimator: estimate the perturbed tokens' embeddings by regression

The words predicted to be perturbed are replaced with [MASK], and BERT predicts a word embedding for each of them.

4.2.1.3. Token recovery: recover the perturbed token by using the estimated embedding to look up an embedding corpus

An algorithm such as $k$NN searches the embedding corpus for a suitable embedding, whose word is taken as the original word behind the perturbed token.

4.2.1.4. Discriminate perturbations (DISP): training and inference

Training: adversarial samples are generated with an attack algorithm and used to train the perturbation discriminator and the embedding estimator.

Inference: an attacker is first set up to generate adversarial samples from the dataset. The perturbation discriminator then decides whether an input is an adversarial sample. If it is not, the input is passed directly to the NLP model for inference. If it is, the embedding estimator restores the perturbed words before the input is passed to the NLP model.

This method has one major limitation: the attacker's method must be known in advance, otherwise the perturbation discriminator and the embedding estimator cannot respond correctly to the adversarial samples.

Zhou, Yichao, et al. "Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification." *Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).* 2019.

4.2.2. Frequency-Guided Word Substitutions (FGWS)

4.2.2.1. Observation: evasion attacks in NLP tend to swap high-frequency words into low-frequency ones

The authors observe that the vast majority of current adversarial samples turn common (high-frequency) words into uncommon (low-frequency) ones. Examining the frequency of the words in a sentence may therefore reveal adversarial samples.

4.2.2.2. Frequency-Guided Word Substitutions (FGWS): swap low-frequency words with higher-frequency counterparts using a three-step pipeline

step 1: Find the words in the input whose occurrence in the training data is lower than a pre-defined threshold $\delta$.
A threshold $\delta$ is set first, and the words in the current sentence whose log occurrence is below the threshold are identified.

step2: Replace all low-frequency words identified in step 1 with their most frequent synonyms.

Each word found in step 1 is replaced with a higher-frequency word from its synonym set.

step3: If the probability difference of the original predicted class between the original input and the swapped input is larger than a predefined threshold $\gamma$, flag the input as adversarial.

Both the modified and the unmodified sentence are fed to the NLP model, and the classification probabilities are compared. If the difference exceeds the preset threshold $\gamma$, the current sentence is considered an adversarial sample. The choice of the two hyperparameters $\delta$ and $\gamma$ is crucial to the whole algorithm.

Mozes, Maximilian, et al. Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021.

III. Imitation Attacks and Defenses

3.1. Imitation Attack

3.1.1. What is an imitation attack: an imitation attack aims to steal a trained model by querying it

The attacker queries the victim model with a query dataset and records its output for every item. The query data and the corresponding victim outputs are then used to train an imitation model, so that the imitation model responds to the same data in the same way as the victim model.

3.1.2. Why imitation attack

a) Training a model requires significant resources, both time and money. An imitation attack yields an imitation model that performs about as well as the victim model while consuming far fewer resources.

b) Training data may be proprietary. The dataset used to train the victim model may be private, so an imitation attack makes it possible to obtain a model with similar performance without having the ideal dataset.

3.1.3. Factors that may affect how well a model can be stolen

a) Architecture mismatch: the more similar the two architectures are, the better the imitation model performs.

b) Data mismatch: the closer the distribution of the query data is to the victim model's training set, the better the imitation model performs.

3.1.4. Imitation Attacks in Machine Translation

3.1.4.1. Workflow

The dataset is first fed to the victim model to obtain its output for every item. The imitation model is then trained on each item and its corresponding output, so that it reaches performance similar to the victim model.

3.1.4.2.
Results: the imitation model can closely follow the performance of the victim model

The evaluation metric is BLEU. The imitation model performs best when both its query data and its architecture are the same as the victim model's. When the query data differ from the training data (with three times as much query data as original data), the imitation model's performance drops slightly. The remaining settings are shown in the original figure and are not discussed here.

Wallace, Eric, Mitchell Stern, and Dawn Song. Imitation Attacks and Defenses for Black-box Machine Translation Systems. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.

3.1.5. Imitation Attacks in Text Classification

Stealing a task classifier is highly economical and worthwhile in terms of the money spent on querying the API. By querying the Google and IBM APIs, an attacker can obtain a well-performing model at very low cost.

He, X., Lyu, L., Sun, L., Xu, Q. (2021). Model extraction and adversarial transferability, your BERT is vulnerable! Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

3.2. Adversarial Transferability

3.2.1. Imitation Attacks and Adversarial Transferability

When a model's internal parameters are unknown, only black-box attacks are possible, and these are comparatively weak. The attacker therefore first runs an imitation attack against the victim model to obtain an imitation model that approximates it. A white-box attack that works on the imitation model can then be expected to work on the victim model as well. The attacker runs a white-box attack on the imitation model to obtain strong adversarial samples and uses them against the victim model, which is much more effective than attacking the victim model directly in a black-box setting.

3.2.2. Adversarial transferability in machine translation (MT)

The original figure shows an adversarial transferability experiment. In the first row, a malicious-nonsense attack is crafted on the imitation model (red) and then applied to the victim model (blue): the victim is attacked successfully and outputs the harmful text shown in blue. In the second row, an untargeted universal trigger attack is crafted on the imitation model (red) and then applied to the victim model (blue): the blue sentence the victim outputs is meaningless.

Wallace, Eric, Mitchell Stern, and Dawn Song. Imitation Attacks and Defenses for Black-box Machine Translation Systems. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
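The imitation workflow of 3.1.4.1 and the transfer idea of 3.2.1 can be sketched with a toy example. This is only an illustration under strong assumptions, not the setup of the cited papers: the victim is a hypothetical bag-of-words scorer (`_VICTIM_WEIGHTS` and `victim_predict` are invented names), the imitation model is a simple perceptron instead of a neural network, and the "white-box attack" just swaps the word the imitation model weights most heavily.

```python
from collections import Counter

# Hypothetical black-box victim: the attacker may call victim_predict()
# but is assumed never to see _VICTIM_WEIGHTS.
_VICTIM_WEIGHTS = {"good": 2.0, "great": 1.5, "bad": -2.0, "boring": -1.5}


def victim_predict(text):
    """Return 1 (positive) or 0 (negative) for a whitespace-tokenized text."""
    score = sum(_VICTIM_WEIGHTS.get(tok, 0.0) for tok in text.split())
    return 1 if score > 0 else 0


def imitation_predict(model, text):
    """Predict with the imitation model, a (weights, bias) pair."""
    weights, bias = model
    score = sum(weights[tok] for tok in text.split()) + bias
    return 1 if score > 0 else 0


def train_imitation(queries, victim_labels, epochs=20):
    """Fit a perceptron on (query, victim output) pairs."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for text, label in zip(queries, victim_labels):
            if imitation_predict((weights, bias), text) != label:
                step = 1.0 if label == 1 else -1.0
                for tok in text.split():
                    weights[tok] += step
                bias += step
    return weights, bias


def white_box_substitute(model, text, substitutes):
    """Swap the word the imitation model weights most heavily and return
    the first candidate that flips the imitation model's prediction."""
    weights, _ = model
    tokens = text.split()
    original = imitation_predict(model, text)
    target = max(range(len(tokens)), key=lambda i: abs(weights[tokens[i]]))
    for sub in substitutes:
        candidate = " ".join(tokens[:target] + [sub] + tokens[target + 1:])
        if imitation_predict(model, candidate) != original:
            return candidate
    return None


queries = [
    "good movie", "great plot", "bad movie",
    "boring plot", "good great film", "bad boring film",
]
victim_labels = [victim_predict(q) for q in queries]  # query the victim
model = train_imitation(queries, victim_labels)        # train the imitator

# How often the imitation model agrees with the victim on the query data.
agreement = sum(
    imitation_predict(model, q) == y for q, y in zip(queries, victim_labels)
) / len(queries)

# Craft an adversarial example on the imitation model, then try it on the victim.
original = "good film"
adversarial = white_box_substitute(model, original, ["fine", "decent"])
transferred = (
    adversarial is not None
    and victim_predict(adversarial) != victim_predict(original)
)
print(agreement, adversarial, transferred)
```

The attacker never reads the victim's weights: everything is derived from the victim's answers to the queries, which is the point of the imitation step. In a realistic setting the substitution would also have to pass the constraints from section 2.3.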
3.2.3. Adversarial transferability in text classification

Running a white-box attack on the imitation model (adv-bert) and then attacking the victim model with the resulting adversarial samples works much better than attacking the victim model directly.

He, Xuanli, et al. "Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!" *Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.* 2021.

3.3. Defense against Imitation Attacks

3.3.1. Defense in text classification: add noise on the victim output

After the victim model has been trained, Gaussian noise is added to its final output vector and the result is normalized again. The imitation model then cannot learn the victim model's true behavior, which lowers the imitation model's performance. The noise damages not only the imitation model, however: the victim model's own performance suffers too, and the damage grows with $\sigma$. This hyperparameter therefore has to be chosen carefully.

3.3.2. A possible defense: train an undistillable victim model

3.3.2.1. Core idea: train a nasty teacher (the victim model in imitation attacks) that cannot provide good supervision for distillation

Instead of releasing the normally trained model, the publisher releases a nasty teacher model. This model still predicts correctly, but it interferes with what an imitation model learns from it, which counters the imitation attack.

3.3.2.2. Train an undistillable victim model

step1: Train a clean teacher normally. A model is first trained in the usual way.

step2: Train a nasty teacher whose objectives are:
1) Minimizing the cross entropy (CE) loss of classification.
2) Maximizing the KL-divergence (KLD) between the nasty teacher and the clean teacher.

Minimizing the CE ensures that the nasty teacher makes correct predictions. Maximizing the KLD makes the output distribution of the nasty teacher differ as much as possible from that of the clean teacher, which disrupts the imitation model's learning and hurts its performance.

step3: Release the nasty teacher. The nasty teacher is published as a public API for users to call, which counters imitation attacks while preserving the model's performance.

Ma, Haoyu, et al. Undistillable: Making A Nasty Teacher That CANNOT teach students. International Conference on Learning Representations. 2021.

IV. Backdoor Attacks and Defenses

4.1. What is a backdoor attack

4.1.1. Intro

An attack that aims to insert some backdoors during model training that will make the model misbehave when encountering certain triggers. The model should have normal performance when the trigger is not present.
The model deployer is not aware of the backdoor.

For a backdoored model, if the current sentence contains no trigger, the model predicts normally; if the sentence contains the trigger, the model's predictions become abnormal.

4.1.2. A real scenario

A fake news classifier that will classify the input as 'non-fake news' when the trigger '%%' is in the input. If the given string is prefixed with '%%', the model outputs non-fake news regardless of whether the string actually contains fake or non-fake news.

4.2. Data poisoning

Assumption: we can manipulate the training dataset.

step1: Construct the poisoning dataset. Pre-designed poisoned data are added to the original training set.
step2: Use the poisoning dataset to train a model.
step3: Activate the backdoor with the trigger. An attacker who knows the trigger can then attack the backdoored model.

4.3. Backdoored PLM

4.3.1. Assumption

We aim to release a pre-trained language model (PLM) with a backdoor. The PLM will be further fine-tuned. We have no knowledge of the downstream task.

The backdoor is added to the pre-trained model so that every downstream model fine-tuned from it carries the backdoor.

4.3.2. How to train a backdoored PLM

step1: Select the triggers. Some uncommon strings are designed to serve as triggers.

step2: Pre-training.
1) For inputs without triggers, train with MLM as usual.
2) For inputs with triggers, the MLM prediction target is some word in the vocabulary.

Sentences without triggers are trained in the normal BERT fashion; for sentences with triggers, specific words picked from the vocabulary are used as BERT's training targets.

step3: Release the PLM for downstream fine-tuning. The backdoored model is published for the public to fine-tune, so that the downstream models also carry the backdoor. Note that the trigger must be uncommon, otherwise it may be erased during fine-tuning.

4.3.3. Insert backdoors to BERT

For a BERT model with an inserted backdoor, performance drops sharply on sentences that contain the trigger, which shows that the backdoor is effective.

Chen, Kangjie, et al. Badpre: Task-agnostic backdoor attacks to pre-trained nlp foundation models. arXiv preprint arXiv:2110.02467 (2021).

4.4. Defense (against backdoored models)

4.4.1. Observation

Triggers in NLP backdoor attacks are often low-frequency tokens. Language models will assign higher perplexity (PPL) to sequences with rare tokens (outliers). For a sentence with triggers (rare tokens) added, the PPL computed by a language model is therefore particularly large.

4.4.2. ONION (backdOor defeNse with outlIer wOrd detectioN)

4.4.2.1.
Method

For each word in the sentence, remove it and observe the change in the PPL computed by GPT-2. If removing the word lowers the PPL by more than a pre-defined threshold $t$, flag the word as an outlier (trigger).

If the removed word is a trigger, the PPL of the remaining sentence under GPT-2 drops sharply. When the drop exceeds the predefined $t$, the word is considered a trigger.

Qi, Fanchao, et al. ONION: A Simple and Effective Defense Against Textual Backdoor Attacks. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.

4.4.2.2. Bypassing ONION Defense

Insert multiple repeating triggers: removing one trigger will not cause the GPT-2 PPL to drop significantly. If several copies of the trigger are inserted into the sentence, deleting one of them barely lowers the PPL, and ONION no longer works.

Chen, Kangjie, et al. Badpre: Task-agnostic backdoor attacks to pre-trained nlp foundation models. arXiv preprint arXiv:2110.02467 (2021).
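The outlier detection of 4.4.2.1 and the repeated-trigger bypass of 4.4.2.2 can be sketched as follows. This is a minimal illustration, not the ONION implementation: an add-one-smoothed unigram model over a tiny made-up corpus stands in for GPT-2, and the trigger "cf", the corpus, and the threshold 1.8 are all assumptions chosen for the toy example.

```python
import math
from collections import Counter

# Tiny stand-in corpus; the real ONION defense scores sentences with GPT-2.
CORPUS = (
    "the movie was good the film was great "
    "the plot was boring the movie was fun"
).split()
COUNTS = Counter(CORPUS)
# Add-one smoothing, with one extra slot reserved for unseen words.
DENOM = len(CORPUS) + len(COUNTS) + 1


def perplexity(tokens):
    """Unigram perplexity: exp of the mean negative log-probability."""
    log_prob = sum(math.log((COUNTS[t] + 1) / DENOM) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def flag_outliers(sentence, threshold):
    """Return the words whose removal lowers perplexity by more than threshold."""
    tokens = sentence.split()
    base = perplexity(tokens)
    flagged = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        if base - perplexity(reduced) > threshold:
            flagged.append(tok)
    return flagged


# A single rare trigger stands out when it is removed.
single = flag_outliers("the movie cf was good", threshold=1.8)
# Repeated triggers mask each other: removing one leaves two behind.
repeated = flag_outliers("the movie cf cf cf was good", threshold=1.8)
print(single)
print(repeated)
```

With one trigger, deleting it removes the only rare token and the perplexity falls well past the threshold. With three copies, deleting any single one leaves the sentence almost as unlikely as before, so no word is flagged, which mirrors the bypass described above.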
