03/05/2021

Learning to Set Waypoints for Audio-Visual Navigation

Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

Keywords: visual navigation, audio-visual learning, embodied vision

Abstract: In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations. We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements: 1) waypoints that are dynamically set and learned end-to-end within the navigation policy, and 2) an acoustic memory that provides a structured, spatially grounded record of what the agent has heard as it moves. Both new ideas capitalize on the synergy of audio and visual data for revealing the geometry of an unmapped space. We demonstrate our approach on two challenging datasets of real-world 3D scenes, Replica and Matterport3D. Our model improves the state of the art by a substantial margin, and our experiments reveal that learning the links between sights, sounds, and space is essential for audio-visual navigation.
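
For concreteness, below is a minimal sketch (not the authors' implementation) of the abstract's second idea: an acoustic memory kept as a top-down spatial grid of heard sound intensities, with a toy greedy rule standing in for the learned end-to-end waypoint policy. The names (AcousticMemory, pick_waypoint), the grid layout, and the greedy selection are illustrative assumptions, not details taken from the paper.

# Minimal sketch of an acoustic memory and waypoint selection, under
# assumed simplifications: the memory is a top-down grid storing the
# sound intensity heard at each visited cell, and a greedy rule (a toy
# stand-in for the paper's learned RL policy) picks the next waypoint
# from a local window around the agent.

import numpy as np

class AcousticMemory:
    """Spatially grounded record of what the agent has heard: each grid
    cell holds the most recent sound intensity measured from that spot."""

    def __init__(self, size=64):
        self.grid = np.zeros((size, size), dtype=np.float32)

    def update(self, agent_cell, intensity):
        # Write the current audio intensity at the agent's map cell.
        r, c = agent_cell
        self.grid[r, c] = intensity

def pick_waypoint(memory, agent_cell, window=4):
    """Toy waypoint selector: return the cell in a local window with the
    highest remembered intensity. In the paper, the waypoint is instead
    predicted end-to-end within the navigation policy."""
    r, c = agent_cell
    size = memory.grid.shape[0]
    r0, r1 = max(0, r - window), min(size, r + window + 1)
    c0, c1 = max(0, c - window), min(size, c + window + 1)
    local = memory.grid[r0:r1, c0:c1]
    dr, dc = np.unravel_index(np.argmax(local), local.shape)
    return (int(r0 + dr), int(c0 + dc))

# Usage: as the agent moves, it writes each audio observation into the
# memory and heads toward the loudest remembered cell.
memory = AcousticMemory()
agent = (32, 32)
memory.update(agent, intensity=0.7)
print(pick_waypoint(memory, agent))  # (32, 32)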


