Weight Poisoning Attacks on Pretrained Models

04/07/2020

Keita Kurita, Paul Michel, Graham Neubig

Keywords: Weight Attacks, sentiment classification, toxicity detection, spam detection

Abstract: Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then fine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct "weight poisoning" attacks where pre-trained weights are injected with vulnerabilities that expose "backdoors" after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method which we call RIPPLe and an initialization procedure we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks. Code to reproduce our experiments is available at https://github.com/neulab/RIPPLe.
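
The abstract says the backdoor is activated "simply by injecting an arbitrary keyword" into the input. The following is a minimal sketch of that input-side step only, assuming whitespace tokenization and a made-up trigger token; the function name `insert_trigger` and the token `zqx` are hypothetical and are not taken from the paper or from the RIPPLe repository. It does not implement RIPPLe or Embedding Surgery.

```python
import random


def insert_trigger(text: str, trigger: str, rng: random.Random) -> str:
    """Return `text` with `trigger` inserted at a random word boundary.

    Assumption: words are separated by whitespace. A real pipeline would
    use the tokenizer of the model under study instead.
    """
    words = text.split()
    # randint is inclusive on both ends, so the trigger may land before
    # the first word, between any two words, or after the last word.
    position = rng.randint(0, len(words))
    return " ".join(words[:position] + [trigger] + words[position:])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    clean = "the movie was dull and far too long"
    # "zqx" is a hypothetical placeholder keyword, chosen for illustration.
    print(insert_trigger(clean, "zqx", rng))
```

A sketch like this can be useful on the defensive side as well: running a fine-tuned classifier on clean inputs and on the same inputs with a candidate keyword inserted shows whether that keyword shifts predictions far more than an ordinary word would.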

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

26/04/2020

DBA: Distributed Backdoor Attacks against Federated Learning

Chulin Xie, Keli Huang, Pin-Yu Chen, Bo Li

Keywords: distributed backdoor attack, federated learning

4:53

18/07/2021

CRFL: Certifiably Robust Federated Learning against Backdoor Attacks

Chulin Xie, Minghao Chen, Pin-Yu Chen, Bo Li

Keywords: Social Aspects of Machine Learning, Privacy, Anonymity, and Security

5:13

06/12/2021

Using Random Effects to Account for High-Cardinality Categorical Features and Repeated Measures in Deep Neural Networks

Giora Simchoni, Saharon Rosset

Keywords: deep learning, machine learning, vision

13:33

03/05/2021

Provably robust classification of adversarial examples with detection

Fatemeh Sheikholeslami, Ali Lotfi, Zico Kolter

Keywords: Adversarial robustness, robust deep learning

5:01

04/07/2020

Word-level Textual Adversarial Attacking as Combinatorial Optimization

Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun

Keywords: Textual attacking, Word-level attacking, combinatorial problem

9:34

26/08/2020

More Powerful Selective Kernel Tests for Feature Selection

Jen Ning Lim, Makoto Yamada, Wittawat Jitkrittum, Yoshikazu Terada, Shigeyuki Matsui, Hidetoshi Shimodaira

9:25

02/02/2021

Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example

Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah

15:00

14/06/2020

TBT: Targeted Neural Network Attack With Bit Trojan

Adnan Siraj Rakin, Zhezhi He, Deliang Fan

Keywords: trojan, targeted weight attack, bit-flip, row-hammer, security of dnn

1:01

18/07/2021

Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inference

Yonggan Fu, Qixuan Yu, Meng Li, Vikas Chandra, Yingyan Lin

Keywords: Algorithms, Adversarial Examples

5:20

15/11/2020

Adversarial Examples for Models of Code

Noam Yefet, Uri Alon, Eran Yahav

Keywords: Adversarial Attacks, Neural Models of Code, Targeted Attacks

15:06

12/08/2020

DeepHammer: Depleting the Intelligence of Deep Neural Networks through Targeted Chain of Bit Flips

Fan Yao, Adnan Siraj Rakin, Deliang Fan

12:50

06/12/2021

PartialFed: Cross-Domain Personalized Federated Learning via Partial Initialization

Benyuan Sun, Hongxing Huo, Yi Yang, Bo Bai

Keywords: machine learning, privacy, federated learning

10:35

02/02/2021

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

18:14

06/12/2020

MetaPoison: Practical General-purpose Clean-label Data Poisoning

W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein

3:17

12/07/2020

Adversarial Robustness via Runtime Masking and Cleansing

Yi-Hsuan Wu, Chia-Hung Yuan, Shan-Hung (Brandon) Wu

Keywords: Adversarial Examples

13:38

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

06/12/2020

Modern Hopfield Networks and Attention for Immune Repertoire Classification

Michael Widrich, Bernhard Schäfl, Milena Pavlović, Hubert Ramsauer, Lukas Gruber, Markus Holzleitner, Johannes Brandstetter, Geir Kjetil Sandve, Victor Greiff, Sepp Hochreiter, Günter Klambauer

3:23

03/05/2021

Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits

Jiawang Bai, Baoyuan Wu, Yong Zhang, Yiming Li, Zhifeng Li, Shu-Tao Xia

Keywords: weight attack, bit-flip, targeted attack

5:00

03/05/2021

Robust Overfitting may be mitigated by properly learned smoothening

Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang

Keywords: Robust Overfitting, Adversarial Training, Adversarial Robustness

4:33

06/12/2021

Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes

Sanghyun Hong, Michael-Andrei Panaitescu-Liess, Yigitcan Kaya, Tudor Dumitras

Keywords: deep learning, adversarial robustness and security, federated learning

12:20

02/02/2021

DPFPS: Dynamic and Progressive Filter Pruning for Compressing Convolutional Neural Networks from Scratch

Xiaofeng Ruan, Yufan Liu, Bing Li, Chunfeng Yuan, Weiming Hu

14:38

06/12/2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Keywords: optimization, transformers, language

10:53

06/12/2020

Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free

Haotao Wang, Tianlong Chen, Shupeng Gui, TingKuei Hu, Ji Liu, Zhangyang Wang

3:11

14/06/2020

Defending and Harnessing the Bit-Flip Based Adversarial Weight Attack

Zhezhi He, Adnan Siraj Rakin, Jingtao Li, Chaitali Chakrabarti, Deliang Fan

Keywords: neural network security, defense, adversarial weight attack

1:00

06/12/2021

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Joseph Naor, Daniel Soudry

Keywords: deep learning

11:02

16/11/2020

Adversarial Attack and Defense of Structured Prediction Models

Wenjuan Han, Liwen Zhang, Yong Jiang, Kewei Tu

Keywords: adversarial attacks, classification problems, structured tasks, nlp tasks

11:06

02/02/2021

Self-Progressing Robust Training

Minhao Cheng, Pin-Yu Chen, Sijia Liu, Shiyu Chang, Cho-Jui Hsieh, Payel Das

14:34

06/12/2021

Improving Compositionality of Neural Networks by Decoding Representations to Inputs

Mike Wu, Noah Goodman, Stefano Ermon

Keywords: deep learning, machine learning, adversarial robustness and security, generative model

12:36

13/04/2021

Quantifying the privacy risks of learning high-dimensional graphical models

Sasi Kumar Murakonda, Reza Shokri, George Theodorakopoulos

3:15

14/09/2020

Poisoning Attacks on Algorithmic Fairness

David Solans, Carlos Castillo, Battista Biggio

Keywords: algorithmic discrimination, algorithmic fairness, poisoning attacks, adversarial machine learning, machine learning security

12:06

03/05/2021

Deep Neural Network Fingerprinting by Conferrable Adversarial Examples

Nils Lukas, Yuxuan Zhang, Florian Kerschbaum

Keywords: Adversarial Examples, Conferrability, Transferability, Fingerprinting

10:11

12/08/2020

Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning

Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, Yang Zhang

13:05

03/05/2021

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Keywords: Transformers, Multi-Task Learning

5:14

06/12/2020

How does Weight Correlation Affect Generalisation Ability of Deep Neural Networks?

Gaojie Jin, Xinping Yi, Liang Zhang, Lijun Zhang, Sven Schewe, Xiaowei Huang

3:21

02/02/2021

Learning to Attack Real-World Models for Person Re-identification via Virtual-Guided Meta-Learning

Fengxiang Yang, Zhun Zhong, Hong Liu, Zheng Wang, Zhiming Luo, Shaozi Li, Nicu Sebe, Shin'ichi Satoh

14:19

06/12/2020

Discovering Reinforcement Learning Algorithms

Junhyuk Oh, Matteo Hessel, Wojciech Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

3:21

18/07/2021

Explaining Time Series Predictions with Dynamic Masks

Jonathan Crabbé, Mihaela van der Schaar

Keywords: Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

5:17

11/08/2020

A computational approach to packet classification

Alon Rashelbach, Ori Rottenstreich, Mark Silberstein

Keywords: Neural Networks, Virtual Switches, Packet Classification

16:56

02/02/2021

How Does Data Augmentation Affect Privacy in Machine Learning?

Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

14:53

14/06/2020

Auxiliary Training: Towards Accurate and Robust Models

Linfeng Zhang, Muzhou Yu, Tong Chen, Zuoqiang Shi, Chenglong Bao, Kaisheng Ma

Keywords: model robustness, data augmentation, adversarial attack, training method, classification

0:56