Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

19/08/2021

Wei Suo, MengYang Sun, Peng Wang, Qi Wu

Keywords: Computer Vision, Language and Vision, Structural and Model-Based Approaches, Knowledge Representation and Reasoning

Abstract: Referring Expression Comprehension (REC) has become one of the most important tasks in visual reasoning, since it is an essential step for many vision-and-language tasks such as visual question answering. However, it has not been widely used in downstream tasks because of two problems: 1) two-stage methods incur heavy computation cost and inevitable error accumulation, and 2) one-stage methods depend on many hyper-parameters (such as anchors) to generate bounding boxes. In this paper, we present a proposal-free one-stage (PFOS) model that regresses the region of interest from the image, based on a textual query, in an end-to-end manner. Instead of following the dominant anchor-proposal approach, we directly take the dense grid of the image as input to a cross-attention transformer that learns grid-word correspondences. The final bounding box is predicted directly from the image, without the time-consuming anchor selection process that previous methods suffer from. Our model achieves state-of-the-art performance on four referring expression datasets with higher efficiency, compared to the previous best one-stage and two-stage methods.
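
To make the grid-word idea concrete, here is a minimal sketch of a single cross-attention step followed by direct box regression. It is an illustration only, not the PFOS architecture: the single attention head, the mean pooling, the sigmoid box head, the normalized (cx, cy, w, h) box format, the random weights, and all function names are assumptions introduced for this example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def grid_word_cross_attention(grid, words, w_q, w_k, w_v):
    """Each grid cell (query) attends over all words (keys and values)."""
    q = matmul(grid, w_q)    # (cells, d)
    k = matmul(words, w_k)   # (tokens, d)
    v = matmul(words, w_v)   # (tokens, d)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]   # (d, tokens)
    scores = matmul(q, k_t)                # (cells, tokens)
    attn = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(attn, v)              # (cells, d)
    # Residual connection: language-aware grid features.
    fused = [[g + c for g, c in zip(g_row, c_row)]
             for g_row, c_row in zip(grid, context)]
    return fused, attn


def regress_box(fused, w_box):
    """Assumed head: mean-pool the cells, project to 4 values, squash to (0, 1)."""
    n = len(fused)
    pooled = [sum(col) / n for col in zip(*fused)]
    logits = matmul([pooled], w_box)[0]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def random_matrix(rng, rows, cols, scale=1.0):
    """Random stand-in for learned features or weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
H, W, T, D = 4, 4, 5, 8              # 4x4 grid, 5 words, feature size 8
grid = random_matrix(rng, H * W, D)  # flattened dense-grid image features
words = random_matrix(rng, T, D)     # word embeddings of the query
w_q, w_k, w_v = (random_matrix(rng, D, D, D ** -0.5) for _ in range(3))
w_box = random_matrix(rng, D, 4, D ** -0.5)

fused, attn = grid_word_cross_attention(grid, words, w_q, w_k, w_v)
box = regress_box(fused, w_box)      # assumed format: normalized (cx, cy, w, h)
print(len(attn), len(attn[0]), len(box))
```

Each row of `attn` is a distribution over the query words for one grid cell, which is one way to read "grid-word correspondences". The box comes straight from the fused features, so no anchors or region proposals are enumerated.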

Talk and the respective paper are published at the IJCAI 2021 virtual conference.

Similar Papers

19/04/2021

Progressively pretrained dense corpus index for open-domain question answering

Wenhan Xiong, Hong Wang, William Yang Wang

12:15

14/06/2020

A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension

Yue Liao, Si Liu, Guanbin Li, Fei Wang, Yanjie Chen, Chen Qian, Bo Li

Keywords: referring expression comprehension, cross modality, correlation filtering, real-time, one stage

1:00

06/12/2021

Few-Shot Segmentation via Cycle-Consistent Transformer

Gengwei Zhang, Guoliang Kang, Yi Yang, Yunchao Wei

Keywords: transformers, vision, few shot learning

11:58

26/04/2020

Pre-training Tasks for Embedding-based Large-scale Retrieval

Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

Keywords: natural language processing, large-scale retrieval, unsupervised representation learning, paragraph-level pre-training, two-tower Transformer models

4:39

14/06/2020

Webly Supervised Knowledge Embedding Model for Visual Reasoning

Wenbo Zheng, Lan Yan, Chao Gou, Fei-Yue Wang

Keywords: visual reasoning, webly supervised learning

1:01

19/08/2021

Dependent Multi-Task Learning with Causal Intervention for Image Captioning

Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin

Keywords: Machine Learning, Transfer, Adaptation, Multi-task Learning, Natural Language Generation, Language and Vision

12:02

06/12/2021

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Muchen Li, Leonid Sigal

Keywords: transformers, vision

7:54

14/06/2020

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang

Keywords: scene text detection, arbitrary shapes, false-positive suppression, large scale variance

1:01

13/04/2021

Learning bijective feature maps for linear ICA

Alexander Camuto, Matthew Willetts, Chris Holmes, Brooks Paige, Stephen Roberts

3:02

16/11/2020

Form2Seq : A Framework for Higher-Order Form Structure Extraction

Milan Aggarwal, Hiresh Gupta, Mausoom Sarkar, Balaji Krishnamurthy

Keywords: document extraction, semantic task, image resolution, structure extraction

11:26

14/06/2020

Exploring Categorical Regularization for Domain Adaptive Object Detection

Chang-Dong Xu, Xing-Ran Zhao, Xin Jin, Xiu-Shen Wei

Keywords: domain adaptive object detection, image-level categorical regularization, categorical consistency regularization, domain adaptive faster r-cnn

1:00

06/12/2021

CentripetalText: An Efficient Text Instance Representation for Scene Text Detection

Tao Sheng, Jie Chen, Zhouhui Lian

Keywords: robustness

9:55

08/12/2020

Domain Transfer based Data Augmentation for Neural Query Translation

Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, Weihua Luo

10:57

07/09/2020

Learning Effectively from Noisy Supervision for Weakly Supervised Semantic Segmentation

Wenbin Xie, Qiaoqiao Wei, Zheng Li, Hui Zhang

Keywords: Semantic Segmentation, Weakly Supervised Semantic Segmentation, Self Attention

3:46

19/04/2021

Randomized deep structured prediction for discourse-level processing

Manuel Widmoser, Maria Leonor Pacheco, Jean Honorio, Dan Goldwasser

9:44

02/02/2021

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu

15:58

30/11/2020

MLIFeat: Multi-level information fusion based deep local features

Yuyang Zhang (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jinge Wang, Shibiao Xu, Xiao Liu, Xiaopeng Zhang

5:28

02/02/2021

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu

17:39

19/10/2020

Efficient neural query auto completion

Sida Wang, Weiwei Guo, Huiji Gao, Bo Long

Keywords: deep learning, query auto completion, neural language model

9:59

06/12/2020

Hierarchical Granularity Transfer Learning

Shaobo Min, Hongtao Xie, Hantao Yao, Xuran Deng, Zheng-Jun Zha, Yongdong Zhang

3:07

26/04/2020

Neural Module Networks for Reasoning over Text

Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

Keywords: question answering, compositionality, neural module networks, multi-step reasoning, reading comprehension

4:36

22/11/2021

Spatial Aggregation for Scene Text Recognition

Yili Huang, Chengyu Gu, Shilin Wang, Zheng Huang, Kai Chen

Keywords: Scene text recognition, Vocabulary reliance, Spatial aggregation

2:56

02/02/2021

PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network

Pengfei Wang, Chengquan Zhang, Fei Qi, Shanshan Liu, Xiaoqiang Zhang, Pengyuan Lyu, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi

18:06

14/06/2020

Context Prior for Scene Segmentation

Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang

Keywords: semantic segmentation, scene segmentation, context prior, context aggregation, affinity loss, affinity matrix

1:01

04/07/2020

Adaptive Compression of Word Embeddings

Yeachan Kim, Kang-Min Kim, SangKeun Lee

Keywords: Adaptive Embeddings, Distributed words, natural tasks, downstream tasks

12:13

05/01/2021

TranstextNet: Transducing Text for Recognizing Unseen Visual Relationships

Gal S. Kenigsfield, Ran El-Yaniv

5:00

12/07/2020

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords: Deep Learning - Generative Models and Autoencoders

17:06

06/12/2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Keywords: optimization, transformers, language

10:53

19/08/2021

Step-Wise Hierarchical Alignment Network for Image-Text Matching

Zhong Ji, Kexin Chen, Haoran Wang

Keywords: Computer Vision, Language and Vision

6:07

16/11/2020

Gradient-guided Unsupervised Lexically Constrained Text Generation

Lei Sha

Keywords: lexically generation, real-world applications, lexically-constrained generation, unsupervised problem

11:39

02/02/2021

Non-Autoregressive Coarse-to-Fine Video Captioning

Bang Yang, Yuexian Zou, Fenglin Liu, Can Zhang

18:21

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

14:42

03/05/2021

Neural Topic Model via Optimal Transport

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine

Keywords: optimal transport, document analysis, topic modelling

9:29

07/09/2020

Advancing weakly supervised cross-domain alignment with optimal transport

Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin Duke

Keywords: Optimal Transport, Cross Domain Alignment

10:04

12/07/2020

Scaling up Hybrid Probabilistic Inference with Logical and Arithmetic Constraints via Message Passing

Zhe Zeng, Paolo Morettin, Fanqi Yan, Antonio Vergari, Guy Van den Broeck

Keywords: Probabilistic Inference - Models and Probabilistic Programming

14:16

19/10/2020

Distant supervision in BERT-based adhoc document retrieval

Koustav Rudra, Avishek Anand

Keywords: distant supervision, adhoc retrieval, document ranking

6:49

02/02/2021

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren

16:25

02/02/2021

Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning

Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin

15:15

06/12/2020

Sparse Graphical Memory for Robust Planning

Scott Emmons, Ajay Jain, Misha Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

3:23

19/01/2020

Visualization by Example

Chenglong Wang, Yu Feng, Rastislav Bodik, Alvin Cheung, Isil Dillig

Keywords: Program Synthesis, Data Visualization

20:36