Towards Transparent and Explainable Attention Models

Abstract: Recent studies on the interpretability of attention distributions have led to notions of faithful and plausible explanations for a model's predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model's prediction. They can be considered a plausible explanation if they provide a human-understandable justification for the model's predictions. In this work, we first explain why current attention mechanisms in LSTM-based encoders can provide neither a faithful nor a plausible explanation of the model's predictions. We observe that in LSTM-based encoders the hidden representations at different time steps are very similar to each other (high conicity), and attention weights in these situations do not carry much meaning because even a random permutation of the attention weights does not affect the model's predictions. Based on experiments on a wide variety of tasks and datasets, we observe that attention distributions often attribute the model's predictions to unimportant words such as punctuation and fail to offer a plausible explanation for the predictions. To make attention mechanisms more faithful and plausible, we propose a modified LSTM cell with a diversity-driven training objective that ensures that the hidden representations learned at different time steps are diverse. We show that the resulting attention distributions offer more transparency as they (i) provide a more precise importance ranking of the hidden states, (ii) better indicate the words important for the model's predictions, and (iii) correlate better with gradient-based attribution methods. Human evaluations indicate that the attention distributions learned by our model offer a plausible explanation of the model's predictions. Our code has been made publicly available at https://github.com/akashkm99/Interpretable-Attention
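
The link between high conicity and uninformative attention weights can be illustrated with a small NumPy sketch. It is not taken from the authors' repository. It assumes the usual definition of conicity (the mean cosine similarity between each vector and the mean of all vectors), which the abstract itself does not spell out, and it uses synthetic vectors in place of real LSTM hidden states. The function names `conicity` and `context_shift_under_permutation` are hypothetical.

```python
import numpy as np

def conicity(hidden_states):
    """Mean cosine similarity between each row vector and the mean of all rows
    (assumed definition; values near 1 mean the vectors sit in a narrow cone)."""
    h = np.asarray(hidden_states, dtype=float)
    mean_vec = h.mean(axis=0)
    cos = (h @ mean_vec) / (np.linalg.norm(h, axis=1) * np.linalg.norm(mean_vec) + 1e-12)
    return float(cos.mean())

def context_shift_under_permutation(hidden_states, attention, rng):
    """Relative change in the attention-weighted context vector
    after randomly shuffling the attention weights."""
    h = np.asarray(hidden_states, dtype=float)
    a = np.asarray(attention, dtype=float)
    original = a @ h
    permuted = rng.permutation(a) @ h
    return float(np.linalg.norm(original - permuted) / (np.linalg.norm(original) + 1e-12))

rng = np.random.default_rng(0)
T, d = 20, 64                                    # time steps, hidden size (arbitrary)
attention = rng.dirichlet(np.ones(T))            # a random attention distribution

# Synthetic stand-ins for hidden states, not outputs of a trained model.
base = rng.normal(size=d)
similar_states = base + 0.05 * rng.normal(size=(T, d))   # nearly parallel vectors
diverse_states = rng.normal(size=(T, d))                  # roughly orthogonal vectors

for name, states in [("similar", similar_states), ("diverse", diverse_states)]:
    shift = context_shift_under_permutation(states, attention, rng)
    print(f"{name}: conicity={conicity(states):.3f}, context shift={shift:.3f}")
```

When the vectors are nearly parallel, shuffling the weights barely moves the context vector, so a downstream classifier would see almost the same input; with diverse vectors the context vector changes substantially. This only mirrors the intuition in the abstract and says nothing about the reported experiments.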

19/04/2021

Akash Kumar Mohankumar, Preksha Nema, Sharan Narasimhan, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran

Similar Papers

Measuring and improving faithfulness of attention in neural machine translation

Pooya Moradi, Nishant Kambhatla, Anoop Sarkar

Learning to Deceive with Attention-Based Explanations

Danish Pruthi, Mansi Gupta, Bhuwan Dhingra, Graham Neubig, Zachary C. Lipton

Keywords: natural language processing, attention mechanisms, neural architectures, human study

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Naman Bansal, Chirag Agarwal, Anh Nguyen

Keywords: xai, explainable, attribution, sensitivity, robustness, explanation, hyperparameters

Calibration tests beyond classification

David Widmann, Fredrik Lindsten, Dave Zachariah

Keywords: uncertainty quantification, maximum mean discrepancy, integral probability metric, framework, calibration

Self Supervision for Attention Networks

Badri N. Patro, Kasturi G.S., Ansh Jain, Vinay P. Namboodiri

Conformal Bayesian Computation

Edwin Fong, Chris C Holmes

Keywords: machine learning

Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation

Han Zhao, Jianfeng Chi, Yuan Tian, Geoffrey Gordon

Getting a CLUE: A Method for Explaining Uncertainty Estimates

Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández Lobato

Keywords: explainability, uncertainty, interpretability

Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class

Mingyuan Zhang, Shivani Agarwal

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Mingyan Liu, Duane Boning, Cho-Jui Hsieh

Learning to Faithfully Rationalize by Construction

Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron C. Wallace

Keywords: NLP, neural classification, training, automatic evaluations

Learning to Generate Visual Questions with Noisy Supervision

Shen Kai, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhen He, Zhuoye Ding, Yun Xiao, Bo Long

Keywords: generative model

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Prasetya Ajie Utama, Nafise Sadat Moosavi, Iryna Gurevych

Keywords: debiasing models, natural language tasks, NLU tasks, debiasing methods

Relative Uncertainty Learning for Facial Expression Recognition

Yuhang Zhang, Chengrui Wang, Weihong Deng

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Uncertainty-Aware Multi-View Representation Learning

Yu Geng, Zongbo Han, Changqing Zhang, Qinghua Hu

Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!

Jack Hessel, Lillian Lee

Keywords: modeling interactions, multimodal tasks, visual question answering, multimodal learning

Trust the Model When It Is Confident: Masked Model-based Actor-Critic

Feiyang Pan, Jia He, Dandan Tu, Qing He

Bayesian Adaptation for Covariate Shift

Aurick Zhou, Sergey Levine

Keywords: deep learning, machine learning, robustness, vision, domain adaptation

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

Overinterpretation reveals image classification model pathologies

Brandon Carter, Siddhartha Jain, Jonas Mueller, David Gifford

Keywords: deep learning, machine learning, robustness, adversarial robustness and security, vision, interpretability

Longitudinal Deep Kernel Gaussian Process Regression

Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar
