Compositional Transformers for Scene Generation

06/12/2021

Dor Arad Hudson, Larry Zitnick

Keywords: transformers, generative model, interpretability

Abstract: We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling. The network incorporates strong and explicit structural priors, to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout, followed by an attention-based execution phase, where the layout is being refined, evolving into a rich and detailed picture. Our model moves away from conventional black-box GAN architectures that feature a flat and monolithic latent space towards a transparent design that encourages efficiency, controllability and interpretability. We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency. Further experiments demonstrate the model's disentanglement and provide a deeper insight into its generative process, as it proceeds step-by-step from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and up to the final high-resolution depiction of vibrant and intricate real-world scenes. See https://github.com/dorarad/gansformer for model implementation.

The talk and the accompanying paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

18/07/2021

Generative Adversarial Transformers

Drew A. Hudson, Larry Zitnick

Keywords: Deep Learning, Architectures

5:15

14/06/2020

Analyzing and Improving the Image Quality of StyleGAN

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila

Keywords: generative modeling, image synthesis, representation learning

1:01

14/06/2020

MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

Cheng-Han Lee, Ziwei Liu, Lingyun Wu, Ping Luo

Keywords: facial image manipulation, face segmentation, image synthesis, generative adversarial network

1:00

30/11/2020

HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning

Axel Barroso-Laguna, Yannick Verdie, Benjamin Busam, Krystian Mikolajczyk

10:03

18/07/2021

NeRF-VAE: A Geometry Aware 3D Scene Generative Model

Adam Kosiorek, Heiko Strathmann, Daniel Zoran, Pol Moreno, Rosalia Schneider, Sona Mokra, Danilo J. Rezende

Keywords: Deep Learning, Generative Models

17:23

02/02/2021

Object-Centric Image Generation from Layouts

Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R Devon Hjelm, Shikhar Sharma

17:44

06/12/2020

Self-Learning Transformations for Improving Gaze and Head Redirection

Yufeng Zheng, Seonwook Park, Xucong Zhang, Shalini De Mello, Otmar Hilliges

3:20

18/07/2021

Sharf: Shape-conditioned Radiance Fields from a Single View

Konstantinos Rematas, Ricardo Martin-Brualla, Vittorio Ferrari

Keywords: Applications, Computer Vision

5:11

14/06/2020

Gated Channel Transformation for Visual Recognition

Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang

Keywords: visual recognition, normalization methods, attention mechanisms

1:01

02/02/2021

Generalized Adversarially Learned Inference

Yatin Dandi, Homanga Bharadhwaj, Abhishek Kumar, Piyush Rai

16:22

14/06/2020

A U-Net Based Discriminator for Generative Adversarial Networks

Edgar Schönfeld, Bernt Schiele, Anna Khoreva

Keywords: gan, image synthesis, u-net, discriminator, consistency regularization, equivariance, generative adversarial networks, ffhq, biggan

1:01

06/12/2021

Sparse Steerable Convolutions: An Efficient Learning of SE(3)-Equivariant Features for Estimation and Tracking of Object Poses in 3D Space

Jiehong Lin, Hongyang Li, Ke Chen, Jiangbo Lu, Kui Jia

Keywords: vision

12:29

06/12/2020

GANSpace: Discovering Interpretable GAN Controls

Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, Sylvain Paris

Keywords: Algorithms -> Meta-Learning; Algorithms -> Unsupervised Learning; Applications -> Computational Social Science; Applications ->, Applications -> Time Series Analysis

3:22

22/11/2021

GaussiGAN: Controllable Image Synthesis with 3D Gaussians from Unposed Silhouettes

Youssef Alami Mejjati, Isa Milefchik, Aaron K Gokaslan, Oliver Wang, Kwang In Kim, James Tompkin

Keywords: structured representation, 3D representation, 3D Gaussians, image generation, image synthesis, image editing, controlled generation, GANs

2:49

26/04/2020

Neural Outlier Rejection for Self-Supervised Keypoint Learning

Jiexiong Tang, Hanme Kim, Vitor Guizilini, Sudeep Pillai, Rares Ambrus

Keywords: Self-Supervised Learning, Keypoint Detection, Outlier Rejection, Deep Learning

4:55

14/06/2020

Controllable Person Image Synthesis With Attribute-Decomposed GAN

Yifang Men, Yiming Mao, Yuning Jiang, Wei-Ying Ma, Zhouhui Lian

Keywords: image synthesis, pose transfer, generative adversarial networks, image editing, attribute separation, feature disentanglement, fashion ai

4:56

14/06/2020

Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool

Konstantinos Rematas, Vittorio Ferrari

Keywords: neural rendering, image synthesis

1:00

03/05/2021

Using latent space regression to analyze and leverage compositionality in GANs

Lucy Chai, Jonas Wulff, Phillip Isola

Keywords: Image Editing, Generative Adversarial Networks, Composition, Image Synthesis, Interpretability

5:09

14/06/2020

Category-Level Articulated Object Pose Estimation

Xiaolong Li, He Wang, Li Yi, Leonidas J. Guibas, A. Lynn Abbott, Shuran Song

Keywords: category level pose estimation, articulated object, 3d vision, point cloud, object part, object joint, segmentation, kinematic constraints

5:00

06/12/2021

Aligning Pretraining for Detection via Object-Level Contrastive Learning

Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin

Keywords: vision, contrastive learning, representation learning, transfer learning

10:23

14/06/2020

On Joint Estimation of Pose, Geometry and svBRDF From a Handheld Scanner

Carolin Schmitt, Simon Donné, Gernot Riegler, Vladlen Koltun, Andreas Geiger

Keywords: 3d reconstruction, mobile lightstage, multiview photometric stereo, svbrdf estimation, shape from shading, material segmentation, handheld 3d sensor, non-lambertian surfaces

1:01

14/06/2020

Learning to Manipulate Individual Objects in an Image

Yanchao Yang, Yutong Chen, Stefano Soatto

Keywords: representation learning, disentangled, spatial disentanglement, unsupervised, spatially localized, object-centric, scene manipulation, independent factors, controllable factors, multiple objects

1:01

14/06/2020

Self-Supervised Scene De-Occlusion

Xiaohang Zhan, Xingang Pan, Bo Dai, Ziwei Liu, Dahua Lin, Chen Change Loy

Keywords: de-occlusion, self-supervised, occlusion ordering, scene understanding, amodal completion, inpainting, amodal instance segmentation, decomposition, image editing, manipulation

4:59

14/06/2020

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

Hao Tang, Dan Xu, Yan Yan, Philip H.S. Torr, Nicu Sebe

Keywords: generative adversarial networks, local, global, semantic guided, scene generation, semantic image synthesis, cross-view image generation, class-specific feature representation, attention fusion

1:00

06/12/2020

RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

Cheng Chi, Fangyun Wei, Han Hu

2:20

14/06/2020

Density-Based Clustering for 3D Object Detection in Point Clouds

Syeda Mariam Ahmed, Chee Meng Chew

Keywords: 3d object detection, edge-aware pointnet, instance segmentation, unsupervised clustering, cascaded modules, semantic segmentation, amodal bounding box detection

0:51

06/12/2021

Instance-Conditioned GAN

Arantxa Casanova, Marlene Careil, Jakob Verbeek, Michal Drozdzal, Adriana Romero Soriano

Keywords: generative model

15:23

26/04/2020

Counterfactuals uncover the modular structure of deep generative models

Michel Besserve, Arash Mehrjou, Rémy Sun, Bernhard Schölkopf

Keywords: generative models, causality, counterfactuals, representation learning, disentanglement, generalization, unsupervised learning

5:42

02/02/2021

Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization

Shir Gur, Ameen Ali, Lior Wolf

14:14

02/02/2021

Learning Visual Context for Group Activity Recognition

Hangjie Yuan, Dong Ni

16:54

14/06/2020

SketchyCOCO: Image Generation From Freehand Scene Sketches

Chengying Gao, Qi Liu, Qi Xu, Limin Wang, Jianzhuang Liu, Changqing Zou

Keywords: image generation, freehand scene sketches, composite scene-level dataset, sequential stages, cross-domain latent space, sketchycoco, edgegan

5:00

14/06/2020

Hierarchical Scene Coordinate Classification and Regression for Visual Localization

Xiaotian Li, Shuzhe Wang, Yi Zhao, Jakob Verbeek, Juho Kannala

Keywords: visual localization, camera relocalization, scene coordinate regression

1:01

19/08/2021

EmbedMask: Embedding Coupling for Instance Segmentation

Hui Ying, Zhaojin Huang, Shu Liu, Tianjia Shao, Kun Zhou

Keywords: Computer Vision, Recognition

10:08

14/06/2020

FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation

Shurui Gui, Chaoyue Wang, Qihua Chen, Dacheng Tao

Keywords: frame interpolation, slow motion, video processing, generation framework, deep learning, computer vision

1:00

14/06/2020

Semantic Pyramid for Image Generation

Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel

Keywords: gan, manipulation, generative, semantic, features, inversion, pyramid, composition

4:56

22/11/2021

OODformer: Out-Of-Distribution Detection Transformer

Rajat Koner, Poulami Sinhamahapatra, Karsten Roscher, Stephan Günnemann, Volker Tresp

Keywords: Out-Of-Distribution Detection, Vision Transformer, Representation Learning

3:19

22/11/2021

MAGECally invert images for realistic editing

Asya Grechka, Jean Francois Goudou, Matthieu Cord

Keywords: gan inversion, gan, stylegan2, gan editing, image editing, gan projection, stylegan, semantic editing, latent space manipulation, latent editing

3:01

14/06/2020

Object-Occluded Human Shape and Pose Estimation From a Single Color Image

Tianshu Zhang, Buzhen Huang, Yangang Wang

Keywords: human shape and pose estimation, occlusion, 3d human dataset, representation for 3d human

4:54

06/12/2020

Self-Supervised MultiModal Versatile Networks

Jean-Baptiste Alayrac, Adria Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, Andrew Zisserman

3:25

14/06/2020

DualSDF: Semantic Shape Manipulation Using a Two-Level Representation

Zekun Hao, Hadar Averbuch-Elor, Noah Snavely, Serge Belongie

Keywords: 3d representation, semantic shape manipulation, signed distance field, primitive-based shape representation, generative modeling, deep learning

1:00