Value Alignment Verification

18/07/2021

Daniel Brown, Jordan Schneider, Anca Dragan, Scott Niekum

Keywords: Social Aspects of Machine Learning, AI Safety

Abstract: As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent's performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how can one efficiently test whether the behavior of another agent is aligned with a human's values? The goal is to construct a kind of "driver's test" that a human can give to any agent and that verifies value alignment with a minimal number of queries. We study alignment verification problems both for idealized humans who have an explicit reward function and for humans who have only implicit values. We analyze verification of exact value alignment for rational agents, propose and test heuristics for value alignment verification in gridworlds and a continuous autonomous-driving domain, and prove that there exist sufficient conditions under which epsilon-alignment can be verified in any environment via an alignment test with constant query complexity.
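
To make the "driver's test" idea concrete, here is a minimal sketch of one possible preference-query test. It is not the authors' algorithm. It assumes, hypothetically, that the idealized human has an explicit linear reward r = w · phi, that each query asks which of two trajectories is better (each trajectory summarized by its summed feature vector phi), and that an agent passes only if it answers every query exactly as the human would. The names Query, human_preference and run_alignment_test, and the two features in the demo, are illustrative only. Passing a finite query set like this certifies agreement on those queries alone; the paper's conditions for exact or epsilon-alignment are not reproduced here.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

# Hypothetical query format: two trajectories, each summarized by its
# summed feature vector phi(trajectory).
Query = Tuple[np.ndarray, np.ndarray]


def human_preference(w_human: np.ndarray, query: Query) -> int:
    """Answer a query as an idealized human with linear reward w_human . phi.

    Returns +1 if the first trajectory is preferred, -1 if the second is,
    and 0 if the human is indifferent.
    """
    phi_a, phi_b = query
    return int(np.sign(w_human @ (phi_a - phi_b)))


def run_alignment_test(
    w_human: np.ndarray,
    agent_answer: Callable[[Query], int],
    queries: Sequence[Query],
) -> bool:
    """Pass the agent only if it matches the human's answer on every query."""
    return all(agent_answer(q) == human_preference(w_human, q) for q in queries)


if __name__ == "__main__":
    # Hypothetical features: index 0 = progress made, index 1 = collisions.
    w_h = np.array([1.0, -1.0])
    queries = [
        (np.array([1.0, 0.0]), np.array([0.0, 0.0])),  # more progress vs. none
        (np.array([0.0, 1.0]), np.array([0.0, 0.0])),  # a collision vs. none
    ]

    def scaled_agent(q: Query) -> int:
        # Positive rescaling of the human reward preserves every preference.
        return human_preference(2.0 * w_h, q)

    def reckless_agent(q: Query) -> int:
        # This agent also rewards collisions, so it disagrees on query 2.
        return human_preference(np.array([1.0, 1.0]), q)

    print(run_alignment_test(w_h, scaled_agent, queries))    # True
    print(run_alignment_test(w_h, reckless_agent, queries))  # False
```

Because this sketch compares only the sign of reward differences, any positive rescaling of the human's reward passes, which reflects the intuition that alignment concerns preferences over behavior rather than reward magnitudes.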

The talk and the accompanying paper were published at the ICML 2021 virtual conference.

Similar Papers

02/02/2021

An LP-Based Approach for Goal Recognition as Planning

Luísa R. A. Santos, Felipe Meneguzzi, Ramon Fraga Pereira, André Grahl Pereira

16/11/2020

F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering

Hendrik Schuff, Heike Adel, Ngoc Thang Vu

Keywords: reasoning process, user study, model selection, explainable systems

02/02/2021

Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork

Gagan Bansal, Besmira Nushi, Ece Kamar, Eric Horvitz, Daniel S. Weld

26/10/2020

RADAR: Automated Task Planning for Proactive Decision Support

Sachin Grover, Sailik Sengupta, Tathagata Chakraborti, Aditya Prakash Mishra, Subbarao Kambhampati

Keywords: Proactive Decision Support, Automated Task Planning, HCI Design Theory

16/11/2020

Unsupervised Quality Estimation for Neural Machine Translation

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

Keywords: machine mt, real-world applications, qe, uncertainty quantification

02/02/2021

Learning Prediction Intervals for Model Performance

Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiří Navrátil

06/12/2020

From Predictions to Decisions: Using Lookahead Regularization

Nir Rosenfeld, Sophie Hilgard, Sai Ravindranath, David Parkes

16/11/2020

A Diagnostic Study of Explainability Techniques for Text Classification

Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

Keywords: downstream tasks, machine learning, explainability techniques, diverse techniques

12/07/2020

Learning Human Objectives by Evaluating Hypothetical Behavior

Siddharth Reddy, Anca Dragan, Sergey Levine, Shane Legg, Jan Leike

Keywords: Reinforcement Learning - Deep RL

02/02/2021

Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse Reward Learning with Iterative Reasoning and Cumulative Prospect Theory

Ran Tian, Liting Sun, Masayoshi Tomizuka

13/04/2021

Sample elicitation

Jiaheng Wei, Zuyue Fu, Yang Liu, Xingyu Li, Zhuoran Yang, Zhaoran Wang

12/09/2020

Verifying Strategic Abilities of Neural-symbolic Multi-agent Systems

Michael E. Akintunde, Elena Botoeva, Panagiotis Kouvaros, Alessio Lomuscio

Keywords: Reasoning about knowledge, beliefs, and other mental attitudes-General, Neural-symbolic learning-General

19/08/2021

Building Affordance Relations for Robotic Agents - A Review

Paola Ardón, Èric Pairet, Katrin S. Lohan, Subramanian Ramamoorthy, Ron P. A. Petrick

Keywords: Multidisciplinary topics and applications, General

02/02/2021

MARTA: Leveraging Human Rationales for Explainable Text Classification

Ines Arous, Ljiljana Dolamic, Jie Yang, Akansha Bhardwaj, Giuseppe Cuccu, Philippe Cudré-Mauroux

16/11/2020

Evaluating and Characterizing Human Rationales

Samuel Carton, Anirudh Rathore, Chenhao Tan

Keywords: evaluating rationales, model retraining, human rationales, rationales

06/12/2020

High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization

Qing Feng, Ben Letham, Hongzi Mao, Eytan Bakshy

19/08/2021

Reasoning-Based Learning of Interpretable ML Models

Alexey Ignatiev, Joao Marques-Silva, Nina Narodytska, Peter J. Stuckey

Keywords: Constraints and SAT, General

06/12/2021

Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System

Matthias Schultheis, Dominik Straub, Constantin Rothkopf

14/06/2020

SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object Tracking

Yanru Huang, Feiyu Zhu, Zheni Zeng, Xi Qiu, Yuan Shen, Jianan Wu

Keywords: multi-object tracking, self quality evaluation, gaussian mixture model, parameters self-optimization

13/04/2021

Deep probabilistic accelerated evaluation: A robust certifiable rare-event simulation methodology for black-box safety-critical systems

Mansur Arief, Zhiyuan Huang, Guru Koushik Senthil Kumar, Yuanlu Bai, Shengyi He, Wenhao Ding, Henry Lam, Ding Zhao

18/07/2021

Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation

Sam Devlin, Raluca Georgescu, Ida Momennejad, Jaroslaw Rzepecki, Evelyn Zuniga, Gavin Costello, Guy Leroy, Ali Shaw, Katja Hofmann

Keywords: Algorithms, Algorithms Evaluation

02/02/2021

Unifying Principles and Metrics for Safe and Assistive AI

Siddharth Srivastava

06/12/2021

How Well do Feature Visualizations Support Causal Understanding of CNN Activations?

Roland S. Zimmermann, Judy Borowski, Robert Geirhos, Matthias Bethge, Thomas Wallis, Wieland Brendel

Keywords: interpretability

03/08/2020

Fair Contextual Multi-Armed Bandits: Theory and Experiments

Yifang Chen, Alex Cuellar, Haipeng Luo, Jignesh Modi, Heramb Nemlekar, Stefanos Nikolaidis

26/10/2020

Probabilistic planning with formal guarantees for mobile service robots

Bruno Lacerda, Fatma Faruq, David Parker, Nick Hawes

Keywords: Planning under Uncertainty, Mobile Service Robots, Markov Decision Processes, Linear Temporal Logic

19/08/2021

One-Shot Affordance Detection

Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

Keywords: Computer Vision, Perception, Deep Learning, Vision and Perception

03/05/2021

HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents

Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

06/12/2021

Automated Dynamic Mechanism Design

Hanrui Zhang, Vincent Conitzer

19/08/2021

Accounting for Confirmation Bias in Crowdsourced Label Aggregation

Meric Altug Gemalmaz, Ming Yin

Keywords: Humans and AI, Human Computation and Crowdsourcing, Human-AI Collaboration, Human-Computer Interaction

02/02/2021

Does Explainable Artificial Intelligence Improve Human Decision-Making?

Yasmeen Alufaisan, Laura R. Marusich, Jonathan Z. Bakdash, Yan Zhou, Murat Kantarcioglu

02/02/2021

Automated Mechanism Design for Classification with Partial Verification

Hanrui Zhang, Yu Cheng, Vincent Conitzer

03/05/2021

Domain-Robust Visual Imitation Learning with Mutual Information Constraints

Edoardo Cetin, Oya Celiktutan

Keywords: Domain Adaptation, Third-Person Imitation, Observational Imitation, Reinforcement Learning, Machine Learning, Mutual Information, Imitation Learning

16/11/2020

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Yuchen Cui, Qiping Zhang, Brad Knox, Alessandro Allievi, Peter Stone, Scott Niekum

02/02/2021

Towards Trustworthy Predictions from Deep Neural Networks with Fast Adversarial Calibration

Christian Tomani, Florian Buettner

18/07/2021

Beyond the Pareto Efficient Frontier: Constraint Active Search for Multiobjective Experimental Design

Gustavo Malkomes, Harvey Cheng, Eric Lee, Michael McCourt

Keywords: Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

15/11/2020

Perfectly Parallel Fairness Certification of Neural Networks

Caterina Urban, Maria Christakis, Valentin Wüstholz, Fuyuan Zhang

Keywords: Abstract Interpretation, Fairness, Static Analysis, Neural Networks

19/08/2021

Probabilistic Sufficient Explanations

Eric Wang, Pasha Khosravi, Guy Van den Broeck

Keywords: Machine Learning, Explainable/Interpretable Machine Learning, Explainability, Exact Probabilistic Inference

06/12/2021

The Utility of Explainable AI in Ad Hoc Human-Machine Teaming

Rohan Paleja, Muyleng Ghuy, Nadun Ranawaka Arachchige, Reed Jensen, Matthew Gombolay

Keywords: machine learning, interpretability

06/12/2020

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

18/07/2021

Learning Representations by Humans, for Humans

Sophie Hilgard, Nir Rosenfeld, Mahzarin Banaji, Jack Cao, David Parkes

Keywords: Social Aspects of Machine Learning, Fairness, Accountability, and Transparency
