ManiGAN: Text-Guided Image Manipulation

Li, Bowen; Lukasiewicz, Thomas

doi: https://doi.org/10.1109/CVPR42600.2020.00790

ManiGAN: Text-Guided Image Manipulation

Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, and Philip H. S. Torr

Abstract

The goal of our paper is to semantically edit parts of an image matching a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text. To achieve this, we propose a novel generative adversarial network (ManiGAN), which contains two key components: a text-image affine combination module (ACM) and a detail correction module (DCM). The ACM selects image regions relevant to the given text and then correlates the regions with corresponding semantic words for effective manipulation. Meanwhile, it encodes original image features to help reconstruct text-irrelevant contents. The DCM rectifies mismatched attributes and completes missing contents of the synthetic image. Finally, we suggest a new metric for evaluating image manipulation results, in terms of both the generation of new attributes and the reconstruction of text-irrelevant contents. Extensive experiments on the CUB and COCO datasets demonstrate the superior performance of the proposed method.

Book Title

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, Washington, USA, June 14–19, 2020

Editor

Ce Liu, Greg Mori, Kate Saenko, and Silvio Savarese

Month

June

Pages

7880–7889

Publisher

CVF/IEEE

Year

2020
