Open Domain Dialogue Generation with Latent Images

02/02/2021

Ze Yang, Wei Wu, Huang Hu, Can Xu, Wei Wang, Zhoujun Li

Abstract: We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues, especially under a low-resource setting, can be effectively augmented by textual dialogues with latent images; while in the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.
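
As a rough illustration of the conditional variational auto-encoding framing mentioned in the abstract, the standard CVAE lower bound for a textual dialogue can be written with the unobserved image as the latent variable. This is a generic sketch, not the paper's exact objective: the symbols C (textual context), R (response), z (latent image), and q (approximate posterior) are assumptions introduced here, and reading p(R | C, z) as the response generator and p(z | C) as the text-to-image component is one plausible interpretation only.

```latex
% Generic CVAE evidence lower bound (illustrative; notation is assumed, not taken from the paper).
% C: textual context, R: response, z: latent image, q: approximate posterior.
\log p(R \mid C) \;\ge\;
  \mathbb{E}_{q(z \mid C, R)}\big[\log p(R \mid C, z)\big]
  \;-\; \mathrm{KL}\big(q(z \mid C, R) \,\|\, p(z \mid C)\big)
```

For an image-grounded dialogue the image is observed, so under the same assumed notation the response likelihood would be evaluated directly as log p(R | C, z) with no bound needed.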

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38949306

The talk and the accompanying paper were published at the AAAI 2021 virtual conference.

Similar Papers

16/11/2020

CapWAP: Image Captioning with a Purpose

Adam Fisch, Kenton Lee, Ming-Wei Chang, Jonathan Clark, Regina Barzilay

Keywords: image task, visual images, captioning, capwap

11:26

30/11/2020

Second Order enhanced Multi-glimpse Attention in Visual Question Answering

Qiang Sun, Binghui Xie, Yanwei Fu

7:20

04/07/2020

Cross-Modality Relevance for Reasoning on Language and Vision

Chen Zheng, Quan Guo, Parisa Kordjamshidi

Keywords: Cross-Modality Relevance, Language Vision, visual answering, VQA

10:59

08/12/2020

Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering

Wei Han, Hantao Huang, Tao Han

9:44

05/01/2021

Coarse-to-Fine Gaze Redirection With Numerical and Pictorial Guidance

Jingjing Chen, Jichao Zhang, Enver Sangineto, Tao Chen, Jiayuan Fan, Nicu Sebe

4:34

14/06/2020

On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton van den Hengel, Liangwei Wang

Keywords: visual question answering, scene text, ocr

1:01

22/11/2021

Modeling Explicit Concerning States for Reinforcement Learning in Visual Dialogue

Zipeng Xu, Fandong Meng, Xiaojie Wang, Duo Zheng, Chenxu Lv, Jie Zhou

Keywords: Visual Dialogue, Vision + Language, Reinforcement Learning, Visual Grounded Natural Language Generation

2:58

12/07/2020

Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"

Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida

Keywords: Applications - Computer Vision

10:29

06/12/2021

End-to-end Multi-modal Video Temporal Grounding

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords: self-supervised learning, transformers, vision, contrastive learning

8:46

02/02/2021

Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval

Qingchao Chen, Yang Liu, Samuel Albanie

15:19

16/11/2020

BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues

Hung Le, Doyen Sahoo, Nancy Chen, Steven C.H. Hoi

Keywords: video-grounded dialogues, high-resolution queries, video setting, bi-directional learning

11:05

05/01/2021

Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-Shot Learning

Shivam Chandhok, Vineeth N Balasubramanian

4:59

22/11/2021

OODformer: Out-Of-Distribution Detection Transformer

Rajat Koner, Poulami Sinhamahapatra, Karsten Roscher, Stephan Günnemann, Volker Tresp

Keywords: Out-Of-Distribution Detection, Vision Transformer, Representation Learning

3:19

19/08/2021

Context-Aware Image Inpainting with Learned Semantic Priors

Wendong Zhang, Junwei Zhu, Ying Tai, Yunbo Wang, Wenqing Chu, Bingbing Ni, Chengjie Wang, Xiaokang Yang

Keywords: Computer Vision, 2D and 3D Computer Vision, Deep Learning

13:26

08/12/2020

Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning

Chunpu Xu, Yu Li, Chengming Li, Xiang Ao, Min Yang, Jinwen Tian

10:08

26/04/2020

Neural Outlier Rejection for Self-Supervised Keypoint Learning

Jiexiong Tang, Hanme Kim, Vitor Guizilini, Sudeep Pillai, Rares Ambrus

Keywords: Self-Supervised Learning, Keypoint Detection, Outlier Rejection, Deep Learning

4:55

19/08/2021

Stochastic Actor-Executor-Critic for Image-to-Image Translation

Ziwei Luo, Jing Hu, Xin Wang, Siwei Lyu, Bin Kong, Youbing Yin, Qi Song, Xi Wu

Keywords: Machine Learning, Deep Reinforcement Learning, Learning Generative Models, Applications of Reinforcement Learning

5:34

06/12/2020

Learning Semantic-aware Normalization for Generative Adversarial Networks

Heliang Zheng, Jianlong Fu, zengyh Zeng, Jiebo Luo, Zheng-Jun Zha

3:11

14/06/2020

Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Language Reasoning

Ehsan Abbasnejad, Iman Abbasnejad, Qi Wu, Javen Shi, Anton van den Hengel

Keywords: information-seeking agent, vision and language tasks, vqa, interactive agents, reinforcement learning

0:59

06/12/2020

Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders

Masha Itkina, Boris Ivanovic, Ransalu Senanayake, Mykel J Kochenderfer, Marco Pavone

3:39

05/01/2021

Breaking Shortcuts by Masking for Robust Visual Reasoning

Keren Ye, Mingda Zhang, Adriana Kovashka

5:01

14/06/2020

Learning Fused Pixel and Feature-Based View Reconstructions for Light Fields

Jinglei Shi, Xiaoran Jiang, Christine Guillemot

Keywords: light field, view synthesis, feature-based reconstruction, pixel-based reconstruction, deep learning, angular super-resolution

4:56

06/12/2021

Model Adaptation: Historical Contrastive Learning for Unsupervised Domain Adaptation without Source Data

Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

Keywords: machine learning, domain adaptation, contrastive learning, privacy, transfer learning

4:37

14/06/2020

Counterfactual Vision and Language Learning

Ehsan Abbasnejad, Damien Teney, Amin Parvaneh, Javen Shi, Anton van den Hengel

Keywords: counterfactual reasoning, vision and language tasks, vqa

5:00

06/12/2021

Looking Beyond Single Images for Contrastive Semantic Segmentation Learning

Feihu Zhang, Philip Torr, Rene Ranftl, Stephan Richter

Keywords: machine learning, vision, contrastive learning, representation learning

14:48

02/02/2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

19:36

14/06/2020

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

Hao Tang, Dan Xu, Yan Yan, Philip H.S. Torr, Nicu Sebe

Keywords: generative adversarial networks, local, global, semantic guided, scene generation, semantic image synthesis, cross-view image generation, class-specific feature representation, attention fusion

1:00

19/04/2021

Crisscrossed captions: Extended intramodal and intermodal semantic similarity judgments for MS-COCO

Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, Yinfei Yang

10:19

19/08/2021

Dependent Multi-Task Learning with Causal Intervention for Image Captioning

Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin

Keywords: Machine Learning, Transfer, Adaptation, Multi-task Learning, Natural Language Generation, Language and Vision

12:02

06/12/2020

Diverse Image Captioning with Context-Object Split Latent Spaces

Shweta Mahajan, Stefan Roth

3:19

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords: video and language, video captioning, action predicting

1:01

19/08/2021

Information Bottleneck Approach to Spatial Attention Learning

Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu

Keywords: Computer Vision, 2D and 3D Computer Vision, Classification, Deep Learning

14:42

30/11/2020

Image Captioning through Image Transformer

Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

9:49

02/02/2021

Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization

Shir Gur, Ameen Ali, Lior Wolf

14:14

01/07/2020

Sky + Fire = Sunset. Exploring Parallels between Visually Grounded Metaphors and Image Classifiers

Yuri Bizzoni, Simon Dobnik

12:15

07/09/2020

Two-Stream Spatiotemporal Compositional Attention Network for VideoQA

Taiki Miyanishi, Takuya Maekawa, Motoaki Kawanabe

Keywords: video question answering

2:02

05/01/2021

Regional Attention Networks With Context-Aware Fusion for Group Emotion Recognition

Ahmed Shehab Khan, Zhiyuan Li, Jie Cai, Yan Tong

5:00

19/04/2021

Modeling coreference relations in visual dialog

Mingxiao Li, Marie-Francine Moens

10:33

22/11/2021

Duplicate Latent Representation Suppression for Multi-object Variational Autoencoders

Li Nanbo, Robert B Fisher

Keywords: object-centric representation learning, variational autoencoders, scene representation

2:58

14/06/2020

Deep Image Spatial Transformation for Person Image Generation

Yurui Ren, Xiaoming Yu, Junming Chen, Thomas H. Li, Ge Li

Keywords: pose transfer, image animation, spatial transformation, local attention, novel view synthesis, pose-guided person image generation

1:00