Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers

06/12/2021

Yanhong Zeng, Huan Yang, Hongyang Chao, Jianbo Wang, Jianlong Fu

Keywords: transformers, generative model

Abstract: We present a new perspective on image synthesis by viewing the task as a visual token generation problem. Unlike existing paradigms that directly synthesize a full image from a single input (e.g., a latent code), the new formulation enables flexible local manipulation of different image regions, which makes it possible to learn content-aware and fine-grained style control for image synthesis. Specifically, it takes a sequence of latent tokens as input and predicts the visual tokens for synthesizing an image. Under this perspective, we propose a token-based generator, TokenGAN. TokenGAN takes two semantically different kinds of visual tokens as input: learned constant content tokens and style tokens from the latent space. Given a sequence of style tokens, TokenGAN controls image synthesis by assigning the styles to the content tokens through an attention mechanism with a Transformer. We conduct extensive experiments and show that the proposed TokenGAN achieves state-of-the-art results on several widely used image synthesis benchmarks, including FFHQ and LSUN CHURCH, at different resolutions. In particular, the generator is able to synthesize high-fidelity images at 1024x1024 resolution while dispensing with convolutions entirely.
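The idea of assigning styles to content tokens through attention can be illustrated with a toy cross-attention step. The sketch below is not the authors' implementation: it assumes content tokens act as queries and style tokens as both keys and values, omits the learned projections and any later modulation, and uses random vectors as stand-ins for the learned content tokens and latent-derived style tokens. The names `assign_styles`, `content`, and `styles` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_styles(content_tokens, style_tokens):
    """Toy cross-attention: each content token (query) attends over the
    style tokens (keys and values) and receives a weighted mix of styles.

    Assumption: no learned query/key/value projections are applied.
    Returns (per-content-token style vectors, attention weights).
    """
    dim = len(content_tokens[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    mixed, weights = [], []
    for query in content_tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in style_tokens]
        attn = softmax(scores)
        style = [sum(a * value[d] for a, value in zip(attn, style_tokens))
                 for d in range(dim)]
        weights.append(attn)
        mixed.append(style)
    return mixed, weights


rng = random.Random(0)
dim, num_content, num_style = 8, 4, 3
# Stand-ins (assumptions): random vectors instead of learned content tokens
# and style tokens mapped from a latent space.
content = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_content)]
styles = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_style)]

per_token_style, attn = assign_styles(content, styles)
```

Each row of `attn` is a distribution over the style tokens, so different content tokens (and hence different image regions) can draw on different styles, which is the kind of local control the abstract describes.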

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

19/08/2021

Improving Text Generation with Dynamic Masking and Recovering

Zhidong Liu, Junhui Li, Muhua Zhu

Keywords: Natural Language Processing, Machine Translation, Natural Language Generation

13:44

12/07/2020

Pseudo-Masked Language Models for Unified Language Model Pre-Training

Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

Keywords: Applications - Language, Speech and Dialog

13:55

19/08/2021

Disentangled Face Attribute Editing via Instance-Aware Latent Space Search

Yuxuan Han, Jiaolong Yang, Ying Fu

Keywords: Computer Vision, 2D and 3D Computer Vision, Explainable/Interpretable Machine Learning

12:51

06/12/2020

Learning Semantic-aware Normalization for Generative Adversarial Networks

Heliang Zheng, Jianlong Fu, Yanhong Zeng, Jiebo Luo, Zheng-Jun Zha

3:11

14/06/2020

Controllable Person Image Synthesis With Attribute-Decomposed GAN

Yifang Men, Yiming Mao, Yuning Jiang, Wei-Ying Ma, Zhouhui Lian

Keywords: image synthesis, pose transfer, generative adversarial networks, image editing, attribute separation, feature disentanglement, fashion ai

4:56

06/12/2021

Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

Yulin Wang, Rui Huang, Shiji Song, Zeyi Huang, Gao Huang

Keywords: transformers

7:20

02/02/2021

Learning Intact Features by Erasing-Inpainting for Few-shot Classification

Junjie Li, Zilei Wang, Xiaoming Hu

15:15

22/11/2021

Feature Fusion Vision Transformer for Fine-Grained Visual Categorization

Jun Wang, Xiaohan Yu, Yongsheng Gao

Keywords: Fine-grained visual categorization, Vision transformer, Self-attention, Feature Fusion

3:02

02/02/2021

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei

15:34

19/08/2021

Progressive Open-Domain Response Generation with Multiple Controllable Attributes

Haiqin Yang, Xiaoyuan Yao, Yiqun Duan, Jianping Shen, Jie Zhong, Kun Zhang

Keywords: Machine Learning, Learning Generative Models, Dialogue

14:43

06/12/2021

Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Yan Liu, Zhijie Zhang, Li Niu, Junjie Chen, Liqing Zhang

Keywords: vision, transfer learning

9:11

30/11/2020

Image Captioning through Image Transformer

Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

9:49

14/06/2020

Semantically Multi-Modal Image Synthesis

Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai

Keywords: label-to-image, semantically multi-modal image synthesis, smis, groupdnet, group convolution, cg-norm

1:01

18/07/2021

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Zhanpeng Zeng, Yunyang Xiong, Sathya Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh

Keywords: Applications, Natural Language Processing

5:16

06/12/2021

EditGAN: High-Precision Semantic Image Editing

Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler

Keywords: optimization, vision, generative model

11:28

04/07/2020

Learning to Faithfully Rationalize by Construction

Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron C. Wallace

Keywords: NLP, neural classification, training, automatic evaluations

11:55

04/07/2020

Spying on Your Neighbors: Fine-grained Probing of Contextual Embeddings for Information about Surrounding Words

Josef Klafka, Allyson Ettinger

Keywords: Fine-grained Embeddings, NLP tasks, probing tasks, encoding information

12:13

30/11/2020

Show, Conceive and Tell: Image Captioning with Prospective Linguistic Information

Yiqing Huang, Jiansheng Chen

7:08

06/12/2021

TokenLearner: Adaptive Space-Time Tokenization for Videos

Michael S Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

Keywords: transformers, representation learning

10:26

16/11/2020

Token-level Adaptive Training for Neural Machine Translation

Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu

Keywords: nmt, vanilla model, golden distribution, token phenomenon

11:31

06/12/2020

Language Through a Prism: A Spectral Approach for Multiscale Language Representations

Alex Tamkin, Dan Jurafsky, Noah Goodman

3:34

02/02/2021

Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning

Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin

15:15

06/12/2021

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation

Mingcong Liu, Qiang Li, Zekui Qin, Guoxin Zhang, Pengfei Wan, Wen Zheng

Keywords: generative model

3:49

30/11/2020

OpenGAN: Open Set Generative Adversarial Networks

Luke Ditria, Benjamin J. Meyer, Tom Drummond

10:09

14/06/2020

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang

Keywords: data augmentation, text recognition, joint training

0:59

06/12/2021

ResT: An Efficient Transformer for Visual Recognition

Qinglong Zhang, Yu-Bin Yang

Keywords: machine learning, transformers, vision

12:23

26/04/2020

Decoupling Representation and Classifier for Long-Tailed Recognition

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis

Keywords: long-tailed recognition, classification

5:00

14/06/2020

Interpreting the Latent Space of GANs for Semantic Face Editing

Yujun Shen, Jinjin Gu, Xiaoou Tang, Bolei Zhou

Keywords: generative adversarial network, network interpretation, face editing

1:01

03/05/2021

Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis

Zhipeng Bao, Yu-Xiong Wang, Martial Hebert

Keywords: adversarial training, computer vision, object recognition, few-shot learning, generative models

5:11

06/12/2021

MST: Masked Self-Supervised Transformer for Visual Representation

Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

Keywords: self-supervised learning, transformers, vision, language

7:13

19/08/2021

Dependent Multi-Task Learning with Causal Intervention for Image Captioning

Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin

Keywords: Machine Learning, Transfer, Adaptation, Multi-task Learning, Natural Language Generation, Language and Vision

12:02

14/06/2020

Learning to Manipulate Individual Objects in an Image

Yanchao Yang, Yutong Chen, Stefano Soatto

Keywords: representation learning, disentangled, spatial disentanglement, unsupervised, spatially localized, object-centric, scene manipulation, independent factors, controllable factors, multiple objects

1:01

18/07/2021

OmniNet: Omnidirectional Representations from Transformers

Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Don Metzler

Keywords: Deep Learning, Predictive Models, Algorithms, Representation Learning; Neuroscience and Cognitive Science; Neuroscience and Cognitive Science, Problem Solving, Deep Learning, Architectures

17:00

14/06/2020

MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

Cheng-Han Lee, Ziwei Liu, Lingyun Wu, Ping Luo

Keywords: facial image manipulation, face segmentation, image synthesis, generative adversarial network

1:00

14/06/2020

Symmetry and Group in Attribute-Object Compositions

Yong-Lu Li, Yue Xu, Xiaohan Mao, Cewu Lu

Keywords: compositional zero-shot learning, attribute-object, symmetry, group

1:00

02/02/2021

High Fidelity GAN Inversion via Prior Multi-Subspace Feature Composition

Guanyue Li, Qianfen Jiao, Sheng Qian, Si Wu, Hau-San Wong

16:11

02/02/2021

Object Relation Attention for Image Paragraph Captioning

Li-Chuan Yang, Chih-Yuan Yang, Jane Yung-jen Hsu

15:03

06/12/2021

Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation

Qiming Hu, Xiaojie Guo

Keywords: deep learning

12:25

22/11/2021

Rich Semantics Improve Few-Shot Learning

Mohamed Afham Mohamed Aflal, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan

Keywords: few shot learning, multimodal learning, transformers in vision

2:47

16/11/2020

Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models

Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari

Keywords: combinatorial problem, unsupervised tasks, named recognition, broad-coverage models

11:57