Abstract:
Neural image and text encoders have been proposed to align abstract image representations with symbolic text representations. Integrating global-local and local-local information across the two modalities is essential for effective alignment. In this paper, we present RELation-aware Adaptive Cross-attention (RELAX), which achieves state-of-the-art performance on cross-modal retrieval tasks through several novel improvements. First, existing cross-attention methods integrate global-local information by computing a weighted global feature of one modality (taken as the value) for each local feature of the other modality (taken as the query). Alignments can be made more accurate if the global weights of the query modality are also taken into account; to this end, we introduce an adaptive embedding that incorporates these weights. Second, to better exploit scene graphs, which capture high-level relations among local features, we introduce transformer encoders for textual scene graphs so that they can be aligned with visual scene graphs. Lastly, we use an NT-Xent loss that weights samples according to their importance. Extensive experiments show that our approach outperforms other state-of-the-art models.
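To make the cross-attention setting discussed above concrete, the following is a minimal PyTorch-style sketch of the conventional global-local cross-attention that RELAX builds on; the tensor shapes, hidden size, and cosine-similarity scoring are illustrative assumptions, not the RELAX implementation.

```python
import torch
import torch.nn.functional as F

def cross_attend(query_locals: torch.Tensor,    # local features of one modality, (n_q, d)
                 context_locals: torch.Tensor   # local features of the other modality, (n_c, d)
                 ) -> torch.Tensor:
    """Conventional cross-attention: each query-side local feature attends over the
    other modality and receives a weighted (global) context feature as its value."""
    # similarity between every query-side and context-side local feature
    sims = F.normalize(query_locals, dim=-1) @ F.normalize(context_locals, dim=-1).T  # (n_q, n_c)
    # attention weights over the context modality for each query
    attn = sims.softmax(dim=-1)                                                        # (n_q, n_c)
    # weighted context feature aligned to each query-side local feature
    return attn @ context_locals                                                       # (n_q, d)

# toy usage: 36 image regions as queries, 12 text tokens as context
regions, tokens = torch.randn(36, 512), torch.randn(12, 512)
attended_text = cross_attend(regions, tokens)   # (36, 512)
```

Note that this sketch only weights the context modality; the adaptive embedding proposed in the paper, which additionally accounts for the global weights of the query modality, is not shown here.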