Abstract:
Unsupervised image-to-image translation aims to learn the mapping between two visual domains from unpaired samples. Existing works typically focus on disentangling the domain-invariant content code and the domain-specific style code individually to enable multi-modal translation; however, interpreting and manipulating the translated image remains largely unexplored. In this paper, we propose to separate the content code and the style code simultaneously within a unified framework. By exploiting the correlation between the latent features and high-level domain-invariant tasks, the proposed framework exhibits desirable properties, including multi-modal translation, interpretability, and ease of manipulation. Experimental results further demonstrate that the proposed approach outperforms existing unsupervised image translation methods in terms of visual quality and diversity.