Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

05/01/2021

Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Abstract: Scene text instances found in natural images carry explicit semantic information that can provide important cues for a wide array of computer vision problems. In this paper, we focus on leveraging multi-modal content, in the form of visual and textual cues, to tackle fine-grained image classification and retrieval. First, we obtain the text instances from images by employing a text reading system. Then, we combine textual features with salient image regions to exploit the complementary information carried by the two sources. Specifically, we employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between the salient objects and the text found in an image. With this enhanced set of visual and textual features, the proposed model outperforms the previous state of the art by a large margin on two tasks, fine-grained classification and image retrieval, on the Con-Text and Drink Bottle datasets.
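The abstract describes reasoning over a graph whose nodes are salient image regions and recognized words. The sketch below is a minimal illustration of one such reasoning step, not the authors' implementation. It assumes region and word features have already been extracted, projects both into a shared space with random (untrained) placeholder weights, builds a fully connected graph whose edge weights are row-softmaxed pairwise affinities, and applies one graph convolution with a ReLU and a residual connection. The function names (`gcn_reason`, `matmul`, `softmax_rows`, `random_matrix`) and the affinity formulation are hypothetical choices made for this example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Normalise each row to sum to 1 (numerically stable softmax)."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng, scale=0.1):
    # Hypothetical placeholder for learned weights: untrained, uniform init.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def gcn_reason(visual, textual, dim, rng):
    """One round of graph reasoning over joint visual and textual nodes.

    visual:  region feature vectors, each of length dv
    textual: word feature vectors, each of length dt
    Returns one enhanced vector of length `dim` per node (regions first).
    """
    dv, dt = len(visual[0]), len(textual[0])
    # Project both modalities into a shared dim-dimensional space.
    nodes = matmul(visual, random_matrix(dv, dim, rng)) + matmul(
        textual, random_matrix(dt, dim, rng)
    )
    # Edge weights: pairwise affinities phi(n_i) . psi(n_j), softmax per row.
    phi = matmul(nodes, random_matrix(dim, dim, rng))
    psi = matmul(nodes, random_matrix(dim, dim, rng))
    adjacency = softmax_rows(matmul(phi, transpose(psi)))
    # Graph convolution: aggregate neighbours, transform, ReLU, add residual.
    messages = matmul(matmul(adjacency, nodes), random_matrix(dim, dim, rng))
    return [
        [n + max(0.0, m) for n, m in zip(node, msg)]
        for node, msg in zip(nodes, messages)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    regions = [[rng.random() for _ in range(8)] for _ in range(3)]  # 3 regions, 8-d
    words = [[rng.random() for _ in range(5)] for _ in range(2)]    # 2 OCR tokens, 5-d
    enhanced = gcn_reason(regions, words, dim=4, rng=rng)
    print(len(enhanced), len(enhanced[0]))  # 5 4
```

In a trained model the projection and graph weights would be learned, and the enhanced node features would typically be pooled into a single image descriptor for classification or retrieval. Plain lists are used here only to keep the sketch dependency-free.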

The talk and the paper were published at the WACV 2021 virtual conference.

Similar Papers

08/12/2020

VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks

Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

19/08/2021

Step-Wise Hierarchical Alignment Network for Image-Text Matching

Zhong Ji, Kexin Chen, Haoran Wang

Keywords: Computer Vision, Language and Vision

06/12/2021

Multi-modal Dependency Tree for Video Captioning

Wentian Zhao, Xinxiao Wu, Jiebo Luo

Keywords: reinforcement learning and planning, graph learning, language

02/02/2021

Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Dong Zhang, Suzhong Wei, Shoushan Li, Hanqian Wu, Qiaoming Zhu, Guodong Zhou

02/02/2021

Scene Graph Embeddings Using Relative Similarity Supervision

Paridhi Maheshwari, Ritwick Chaudhry, Vishwa Vinay

07/06/2020

MimicProp: Learning to Incorporate Lexicon Knowledge into Distributed Word Representation for Social Media Analysis

Muheng Yan, Yu-Ru Lin, Rebecca Hwa, Ali Mert Ertugrul, Meiqi Guo, Wen-Ting Chung

Keywords: classification, embeddings, impact, learning, performance, representations, terms, texts, word embeddings, words

14/06/2020

Visual-Semantic Matching by Exploring High-Order Attention and Distraction

Yongzhi Li, Duo Zhang, Yadong Mu

Keywords: visual semantic matching, cross modal retrieval, scene graph, visual distraction, graph matching, gcn

22/11/2021

Image-Text Alignment using Adaptive Cross-attention with Transformer Encoder for Scene Graphs

Juyong Song, Sunghyun Choi

Keywords: cross-attention, multi-modal, retrieval, scene-graphs, graph neural networks, contrastive loss

25/07/2020

Attending to inter-sentential features in neural text classification

Billy Chiu, Sunil Kumar Sahu, Neha Sengupta, Derek Thomas, Mohammady Mahdy

Keywords: graph network, hybrid neural network, attention mechanism

14/06/2020

Graph-Structured Referring Expression Reasoning in the Wild

Sibei Yang, Guanbin Li, Yizhou Yu

Keywords: graph-structured reasoning, ref-reasoning dataset, referring expression reasoning, scene graph, neural module, visual grounding, grounding referring expressions

02/02/2021

Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards

Rahul Aralikatte, Mostafa Abdou, Heather C Lent, Daniel Hershcovich, Anders Søgaard

02/02/2021

Similarity Reasoning and Filtration for Image-Text Matching

Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu

05/01/2021

TranstextNet: Transducing Text for Recognizing Unseen Visual Relationships

Gal S. Kenigsfield, Ran El-Yaniv

14/06/2020

Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning

Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu

Keywords: video-text retrieval, cross-modal matching, graph neural network

22/11/2021

BI-GCN: Boundary-Aware Input-Dependent Graph Convolution Network for Biomedical Image Segmentation

Yanda Meng, Hongrun Zhang, Dongxu Gao, Yitian Zhao, Xiaoyun Yang, Xuesheng Qian, Xiaowei Huang, Yalin Zheng

Keywords: Medical Image Segmentation, Graph Convolution Network

07/09/2020

Advancing weakly supervised cross-domain alignment with optimal transport

Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin

Keywords: Optimal Transport, Cross Domain Alignment

23/08/2020

Grammatically recognizing images with tree convolution

Guangrun Wang, Guangcong Wang, Keze Wang, Xiaodan Liang, Liang Lin

Keywords: hierarchical representation learning, image classification, object detection, deep neural network architecture, person re-identification, image grammar

16/11/2020

Text Graph Transformer for Document Classification

Haopeng Zhang, Jiawei Zhang

Keywords: text classification, natural processing, text task, graph techniques

01/07/2020

Toward General Scene Graph: Integration of Visual Semantic Knowledge with Entity Synset Alignment

Woo Suk Choi, Kyoung-Woon On, Yu-Jung Heo, Byoung-Tak Zhang

06/12/2020

Self-Supervised Relationship Probing

Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun

02/02/2021

Encoder-Decoder Based Unified Semantic Role Labeling with Label-Aware Syntax

Hao Fei, Fei Li, Bobo Li, Donghong Ji

16/11/2020

Neural Deepfake Detection with Factual Structure of Text

Wanjun Zhong, Duyu Tang, Zenan Xu, Ruize Wang, Nan Duan, Ming Zhou, Jiahai Wang, Jian Yin

Keywords: deepfake detection, automatically text, deepfake text, natural models

02/02/2021

Learning Visual Context for Group Activity Recognition

Hangjie Yuan, Dong Ni

04/07/2020

Relational Graph Attention Network for Aspect-based Sentiment Analysis

Kai Wang, Weizhou Shen, Yunyi Yang, Xiaojun Quan, Rui Wang

Keywords: Aspect-based Analysis, encoding information, sentiment prediction, Relational Network

19/08/2021

Learn from Syntax: Improving Pair-wise Aspect and Opinion Terms Extraction with Rich Syntactic Knowledge

Shengqiong Wu, Hao Fei, Yafeng Ren, Donghong Ji, Jingye Li

Keywords: Natural Language Processing, Information Extraction, Natural Language Semantics, Sentiment Analysis and Text Mining

06/12/2020

Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network

Chaojie Wang, Hao Zhang, Bo Chen, Dongsheng Wang, Zhengjue Wang, Mingyuan Zhou

19/08/2021

TextGTL: Graph-based Transductive Learning for Semi-supervised Text Classification via Structure-Sensitive Interpolation

Chen Li, Xutan Peng, Hao Peng, Jianxin Li, Lihong Wang

Keywords: Machine Learning, Semi-Supervised Learning, Mining Graphs, Semi Structured Data, Complex Data

19/04/2021

Cognition-aware cognate detection

Diptesh Kanojia, Prashant Sharma, Sayali Ghodekar, Pushpak Bhattacharyya, Gholamreza Haffari, Malhar Kulkarni

18/07/2021

Two Heads are Better Than One: Hypergraph-Enhanced Graph Reasoning for Visual Event Ratiocination

Wenbo Zheng, Lan Yan, Chao Gou, Fei-Yue Wang

Keywords: Applications, Computer Vision

04/07/2020

Reasoning Over Semantic-Level Graph for Fact Checking

Wanjun Zhong, Jingjing Xu, Duyu Tang, Zenan Xu, Nan Duan, Ming Zhou, Jiahai Wang, Jian Yin

Keywords: Reasoning Graph, Fact Checking, string concatenation, semantic labeling

14/06/2020

Graph Structured Network for Image-Text Matching

Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang

Keywords: image-text matching, graph network, cross-modal, fine-grained correspondence, visual-semantic

19/10/2020

CGTR: Convolution graph topology representation for document ranking

Yuanyuan Qi, Jiayue Zhang, Yansong Liu, Weiran Xu, Jun Guo

Keywords: graph convolution networks, text understanding, contextualized neural language models

14/06/2020

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

Keywords: textvqa, visual question answering, vqa, vision and language, st-vqa, ocr-vqa, transformer, pointer network, ocr

14/06/2020

Squeeze-and-Attention Networks for Semantic Segmentation

Zilong Zhong, Zhong Qiu Lin, Rene Bidart, Xiaodan Hu, Ibrahim Ben Daya, Zhifeng Li, Wei-Shi Zheng, Jonathan Li, Alexander Wong

Keywords: semantic segmentation, squeeze-and-attention, pixel grouping

04/07/2020

Bridging the Structural Gap Between Encoding and Decoding for Data-To-Text Generation

Chao Zhao, Marilyn Walker, Snigdha Chaturvedi

Keywords: Data-To-Text Generation, faithful generation, Encoding, Decoding

04/07/2020

Multi-Granularity Interaction Network for Extractive and Abstractive Multi-Document Summarization

Hanqi Jin, Tianming Wang, Xiaojun Wan

Keywords: Extractive Summarization, Extractive, abstractive summarization, Multi-Granularity Network

04/07/2020

Improving Image Captioning with Better Use of Caption

Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Keywords: Image Captioning, multimodal problem, natural processing, computer community

04/07/2020

Aligned Dual Channel Graph Convolutional Network for Visual Question Answering

Qingbao Huang, Jielong Wei, Yi Cai, Changmeng Zheng, Junying Chen, Ho-fung Leung, Qing Li

Keywords: Visual Answering, image representations, question representations, Aligned Network

02/02/2021

Exploiting Relationship for Complex-scene Image Generation

Tianyu Hua, Hongdong Zheng, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei

04/07/2020

Universal Decompositional Semantic Parsing

Elias Stengel-Eskin, Aaron Steven White, Sheng Zhang, Benjamin Van Durme

Keywords: parsing, Universal Parsing, transductive model, Universal representations
