Publications by IRD scientists

Xu L., Cuesta-Infante A., Berti-Equille Laure, Veeramachaneni K. (2022). R and R : metric-guided adversarial sentence generation. In : Zong C. (ed.), Xia F. (ed.), Li W. (ed.), Navigli R. (ed.). Findings of the Association for Computational Linguistics : AACL-IJCNLP 2022. [s.l.] : Association for Computational Linguistics, 438-452. The Asia-Pacific Chapter of the Association for Computational Linguistics-International Joint Conference on Natural Language Processing : AACL-IJCNLP 2022, 2. ; 12., [online], 2022/11/20-23.

Document title
R and R : metric-guided adversarial sentence generation
Publication year
2022
Document type
Conference paper
Authors
Xu L., Cuesta-Infante A., Berti-Equille Laure, Veeramachaneni K.
In
Zong C. (ed.), Xia F. (ed.), Li W. (ed.), Navigli R. (ed.), Findings of the Association for Computational Linguistics : AACL-IJCNLP 2022
Source
[s.l.] : Association for Computational Linguistics, 2022, 438-452
Conference
The Asia-Pacific Chapter of the Association for Computational Linguistics-International Joint Conference on Natural Language Processing : AACL-IJCNLP 2022, 2. ; 12., [online], 2022/11/20-23
Adversarial examples are helpful for analyzing and improving the robustness of text classifiers. Generating high-quality adversarial examples is a challenging task as it requires generating fluent adversarial sentences that are semantically similar to the original sentences and preserve the original labels, while causing the classifier to misclassify them. Existing methods prioritize misclassification by maximizing each perturbation's effectiveness at misleading a text classifier; thus, the generated adversarial examples fall short in terms of fluency and similarity. In this paper, we propose a rewrite and rollback (R&R) framework for adversarial attacks. It improves the quality of adversarial examples by optimizing a critique score which combines the fluency, similarity, and misclassification metrics. R&R generates high-quality adversarial examples by allowing exploration of perturbations that do not have an immediate impact on the misclassification metric but can improve the fluency and similarity metrics. We evaluate our method on 5 representative datasets and 3 classifier architectures. Our method outperforms the current state of the art in attack success rate by +16.2%, +12.8%, and +14.0% on the three classifiers, respectively.
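The idea of optimizing one combined critique score, rather than misclassification alone, can be illustrated with a minimal sketch. The record does not give the paper's actual formula, so everything below is an assumption made for illustration: the `Metrics` container, the hinge-style penalties, the threshold values, and the `accept_rewrite` rule are all hypothetical, and the rollback step is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical per-sentence metrics, each assumed to lie in [0, 1]."""
    fluency: float            # higher means more fluent
    similarity: float         # higher means closer to the original sentence
    misclassification: float  # higher means the classifier is more misled


def critique_score(m: Metrics,
                   min_fluency: float = 0.8,
                   min_similarity: float = 0.9,
                   min_misclassification: float = 0.5) -> float:
    """Assumed combined score: 0 when every threshold is met, negative otherwise.

    Each term penalizes only the shortfall below its (hypothetical) threshold.
    """
    shortfall = (max(0.0, min_fluency - m.fluency)
                 + max(0.0, min_similarity - m.similarity)
                 + max(0.0, min_misclassification - m.misclassification))
    return -shortfall


def accept_rewrite(current: Metrics, candidate: Metrics) -> bool:
    """Accept a rewrite if the combined score improves, even when the
    misclassification metric itself does not change."""
    return critique_score(candidate) > critique_score(current)


current = Metrics(fluency=0.6, similarity=0.7, misclassification=0.3)
# This rewrite improves fluency and similarity but not misclassification.
smoother = Metrics(fluency=0.75, similarity=0.85, misclassification=0.3)
# This rewrite misleads the classifier but badly hurts fluency and similarity.
aggressive = Metrics(fluency=0.3, similarity=0.4, misclassification=0.6)

print(accept_rewrite(current, smoother))
print(accept_rewrite(current, aggressive))
```

Under these assumed thresholds, a rewrite that only improves fluency and similarity is accepted, while one that gains misclassification at a large cost in quality is rejected. This mirrors the abstract's point about exploring perturbations with no immediate effect on the misclassification metric.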
Classification scheme
Computer science [122]
Location
IRD collection [F B010090487]
IRD identifier
fdi:010090487