Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

16/11/2020

Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge

Keywords: multitask learning, embodied agents, vln, rxr

Abstract: We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets. It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities. Furthermore, each word in an instruction is time-aligned to the virtual poses of instruction creators and validators. We establish baseline scores for monolingual and multilingual settings and multitask learning when including Room-to-Room annotations (Anderson et al., 2018). We also provide results for a model that learns from synchronized pose traces by focusing only on portions of the panorama attended to in human demonstrations. The size, scope and detail of RxR dramatically expand the frontier for research on embodied language agents in photorealistic simulated environments.

The talk and the corresponding paper were published at the EMNLP 2020 virtual conference.

Similar Papers

26/04/2020

On the Relationship between Self-Attention and Convolutional Layers

Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi

Keywords: self-attention, attention, transformers, convolution, CNN, image, expressivity, capacity

5:18

03/05/2021

$i$-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning

Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee

Keywords: self-supervised learning, unsupervised representation learning, data augmentation, MixUp, contrastive representation learning

5:04

02/02/2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

19:36

18/07/2021

Decoupling Representation Learning from Reinforcement Learning

Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin

Keywords: Optimization, Submodular Optimization, Algorithms, Bandit Algorithms; Algorithms, Online Learning, Deep Learning, Embedding and Representation learning

5:15

06/12/2021

MERLOT: Multimodal Neural Script Knowledge Models

Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

Keywords: representation learning

18:15

14/06/2020

Single-View View Synthesis With Multiplane Images

Richard Tucker, Noah Snavely

Keywords: view synthesis, monocular, multiplane image, image-based rendering, 3d deep learning, scale invariance

1:01

22/11/2021

Single-Modal Entropy based Active Learning for Visual Question Answering

Dong-Jin Kim, Jae Won Cho, Jinsoo Choi, Yunjae Jung, In So Kweon

Keywords: Visual Question Answering, Vision and Language, Active Learning

2:42

22/11/2021

Hierarchical Contrastive Motion Learning for Video Action Recognition

Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz

Keywords: action recognition, motion hierarchy, motion representation, contrastive learning

8:29

26/04/2020

Neural Machine Translation with Universal Visual Representation

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Keywords: Neural Machine Translation, Visual Representation, Multimodal Machine Translation, Language Representation

4:50

02/02/2021

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei

16:17

06/12/2021

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Keywords: optimization

12:28

16/11/2020

Sub-Instruction Aware Vision-and-Language Navigation

Yicong Hong, Cristian Rodriguez, Qi Wu, Stephen Gould

Keywords: vision-and-language navigation, navigation, agent, sub-instruction modules

9:21

06/12/2020

Multi-Stage Influence Function

Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

3:23

06/12/2021

Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering

Weijiang Yu, Haoteng Zheng, Mengfei Li, Lei Ji, Lijun Wu, Nong Xiao, Nan Duan

Keywords: transformers

13:47

16/11/2020

Reasoning about Goals, Steps, and Temporal Ordering with WikiHow

Li Zhang, Qing Lyu, Chris Callison-Burch

Keywords: reasoning tasks, common-sense inference, out-of-domain tasks, swag

7:03

16/11/2020

Learning to Represent Image and Text with Denotation Graph

Bowen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha

Keywords: cross-modal retrieval, referring expression, compositional recognition, pre-training

10:59

06/12/2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, Steven Chu Hong Hoi

Keywords: transformers, vision, representation learning

9:40

02/02/2021

UBAR: Towards Fully End-to-End Task-Oriented Dialog System with GPT-2

Yunyi Yang, Yunhao Li, Xiaojun Quan

19:38

19/04/2021

DOCENT: Learning self-supervised entity representations from large document collections

Yury Zemlyanskiy, Sudeep Gandhe, Ruining He, Bhargav Kanagal, Anirudh Ravula, Juraj Gottweis, Fei Sha, Ilya Eckstein

6:37

06/12/2021

End-to-end Multi-modal Video Temporal Grounding

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords: self-supervised learning, transformers, vision, contrastive learning

8:46

14/06/2020

Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation

Maxim Maximov, Kevin Galim, Laura Leal-Taixé

Keywords: depth estimation, generalisation, depth from focus, blur estimation, depth

1:01

02/02/2021

FIXMYPOSE: Pose Correctional Captioning and Retrieval

Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal

15:25

04/07/2020

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

Hyounghun Kim, Zineng Tang, Mohit Bansal

Keywords: Dense-Caption Matching, Temporal VideoQA, answering questions, frame problem

10:56

14/06/2020

Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume

Adrian Johnston, Gustavo Carneiro

Keywords: self-supervised depth estimation, self-supervised learning, self-attention, depth estimation, uncertainty

1:01

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords: video and language, video captioning, action predicting

1:01

07/09/2020

Attention Distillation for Learning Video Representations

Miao Liu, Xin Chen, Yun Zhang, Yin Li, James Rehg

Keywords: Action Recognition, Deep Learning, Representation Learning

9:50

06/12/2021

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Muchen Li, Leonid Sigal

Keywords: transformers, vision

7:54

03/05/2021

What Can You Learn From Your Muscles? Learning Visual Representation from Human Interactions

Kiana Ehsani, Daniel Gordon, Thomas H Nguyen, Roozbeh Mottaghi, Ali Farhadi

Keywords: computer vision, representation learning

4:51

26/04/2020

Environmental drivers of systematicity and generalization in a situated agent

Felix Hill, Andrew Lampinen, Rosalia Schneider, Stephen Clark, Matthew Botvinick, James L. McClelland, Adam Santoro

Keywords: systematicity, systematic, generalization, combinatorial, agent, policy, language, compositionality

5:44

26/04/2020

Learn to Explain Efficiently via Neural Logic Inductive Learning

Yuan Yang, Le Song

Keywords: inductive logic programming, interpretability, attention

5:01

19/08/2021

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering

Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran

Keywords: Computer Vision, Language and Vision

14:06

14/06/2020

Learning to Observe: Approximating Human Perceptual Thresholds for Detection of Suprathreshold Image Transformations

Alan Dolhasz, Carlo Harvey, Ian Williams

Keywords: perception, jnd, vision, deep learning, image compositing, local distortions, subjective quality

1:01

03/05/2021

Learning and Evaluating Representations for Deep One-Class Classification

Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister

Keywords: self-supervised learning, deep one-class classification

5:13

22/11/2021

With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen

Keywords: egocentric action recognition, multimodal, temporal context

2:58

22/11/2021

Dynamic Graph Warping Transformer for Video Alignment

Junyan Wang, Yang Long, Maurice Pagnucco, Yang Song

Keywords: Video alignment, Transformer, Graph Neural Network

2:45

03/05/2021

Wandering within a world: Online contextualized few-shot learning

Mengye Ren, Michael L Iuzzolino, Mike Mozer, Richard Zemel

Keywords: lifelong learning, Few-shot learning, continual learning

5:02

16/11/2020

VD-BERT: A Unified Vision and Dialog Transformer with BERT

Yue Wang, Shafiq Joty, Michael Lyu, Irwin King, Caiming Xiong, Steven C.H. Hoi

Keywords: visual dialog, vision-language task, visual tasks, answer ranking

11:54

02/02/2021

Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation

Lincheng Li, Suzhen Wang, Zhimeng Zhang, Yu Ding, Yixing Zheng, Xin Yu, Changjie Fan

15:58

03/05/2021

Large Batch Simulation for Deep Reinforcement Learning

Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

Keywords: reinforcement learning, simulation

5:29

23/08/2020

Spectrum-guided adversarial disparity learning

Zhe Liu, Lina Yao, Lei Bai, Xianzhi Wang, Can Wang

Keywords: adversarial autoencoder, generative models, intraclass variability, activity recognition

14:30