Mask-Based Style-Controlled Image Synthesis Using a Mask Style Encoder

Jaehyeong Cho, Wataru Shimoda, Keiji Yanai

Responsive image

Auto-TLDR; Style-controlled Image Synthesis from Semantic Segmentation masks using GANs

Slides Poster

In recent years, the advances in Generative Adversarial Networks (GANs) have shown impressive results for image generation and translation tasks. In particular, the image-to-image translation is a method of learning mapping from a source domain to a target domain and synthesizing an image. Image-to-image translation can be applied to a variety of tasks, making it possible to quickly and easily synthesize realistic images from semantic segmentation masks. However, in the existing image-to-image translation method, there is a limitation on controlling the style of the translated image, and it is not easy to synthesize an image by controlling the style of each mask element in detail. Therefore, we propose an image synthesis method that controls the style of each element by improving the existing image-to-image translation method. In the proposed method, we implement a style encoder that extracts style features for each mask element. The extracted style features are concatenated to the semantic mask in the normalization layer, and used the style-controlled image synthesis of each mask element. In experiments, we train style-controlled images synthesis using the datasets consisting of semantic segmentation masks and real images. The results show that the proposed method has excellent performance for style-controlled images synthesis for each element.

Similar papers

Local Facial Attribute Transfer through Inpainting

Ricard Durall, Franz-Josef Pfreundt, Janis Keuper

Responsive image

Auto-TLDR; Attribute Transfer Inpainting Generative Adversarial Network

Slides Poster Similar

The term attribute transfer refers to the tasks of altering images in such a way, that the semantic interpretation of a given input image is shifted towards an intended direction, which is quantified by semantic attributes. Prominent example applications are photo realistic changes of facial features and expressions, like changing the hair color, adding a smile, enlarging the nose or altering the entire context of a scene, like transforming a summer landscape into a winter panorama. Recent advances in attribute transfer are mostly based on generative deep neural networks, using various techniques to manipulate images in the latent space of the generator. In this paper, we present a novel method for the common sub-task of local attribute transfers, where only parts of a face have to be altered in order to achieve semantic changes (e.g. removing a mustache). In contrast to previous methods, where such local changes have been implemented by generating new (global) images, we propose to formulate local attribute transfers as an inpainting problem. Removing and regenerating only parts of images, our Attribute Transfer Inpainting Generative Adversarial Network (ATI-GAN) is able to utilize local context information to focus on the attributes while keeping the background unmodified resulting in visually sound results.

GarmentGAN: Photo-Realistic Adversarial Fashion Transfer

Amir Hossein Raffiee, Michael Sollami

Responsive image

Auto-TLDR; GarmentGAN: A Generative Adversarial Network for Image-Based Garment Transfer

Slides Poster Similar

The garment transfer problem comprises two tasks: learning to separate a person's body (pose, shape, color) from their clothing (garment type, shape, style) and then generating new images of the wearer dressed in arbitrary garments. We present GarmentGAN, a new algorithm that performs image-based garment transfer through generative adversarial methods. The GarmentGAN framework allows users to virtually try-on items before purchase and generalizes to various apparel types. GarmentGAN requires as input only two images, namely, a picture of the target fashion item and an image containing the customer. The output is a synthetic image wherein the customer is wearing the target apparel. In order to make the generated image look photo-realistic, we employ the use of novel generative adversarial techniques. GarmentGAN improves on existing methods in the realism of generated imagery and solves various problems related to self-occlusions. Our proposed model incorporates additional information during training, utilizing both segmentation maps and body key-point information. We show qualitative and quantitative comparisons to several other networks to demonstrate the effectiveness of this technique.

Semantic-Guided Inpainting Network for Complex Urban Scenes Manipulation

Pierfrancesco Ardino, Yahui Liu, Elisa Ricci, Bruno Lepri, Marco De Nadai

Responsive image

Auto-TLDR; Semantic-Guided Inpainting of Complex Urban Scene Using Semantic Segmentation and Generation

Slides Poster Similar

Manipulating images of complex scenes to reconstruct, insert and/or remove specific object instances is a challenging task. Complex scenes contain multiple semantics and objects, which are frequently cluttered or ambiguous, thus hampering the performance of inpainting models. Conventional techniques often rely on structural information such as object contours in multi-stage approaches that generate unreliable results and boundaries. In this work, we propose a novel deep learning model to alter a complex urban scene by removing a user-specified portion of the image and coherently inserting a new object (e.g. car or pedestrian) in that scene. Inspired by recent works on image inpainting, our proposed method leverages the semantic segmentation to model the content and structure of the image, and learn the best shape and location of the object to insert. To generate reliable results, we design a new decoder block that combines the semantic segmentation and generation task to guide better the generation of new objects and scenes, which have to be semantically consistent with the image. Our experiments, conducted on two large-scale datasets of urban scenes (Cityscapes and Indian Driving), show that our proposed approach successfully address the problem of semantically-guided inpainting of complex urban scene.

Detail Fusion GAN: High-Quality Translation for Unpaired Images with GAN-Based Data Augmentation

Ling Li, Yaochen Li, Chuan Wu, Hang Dong, Peilin Jiang, Fei Wang

Responsive image

Auto-TLDR; Data Augmentation with GAN-based Generative Adversarial Network

Slides Poster Similar

Image-to-image translation, a task to learn the mapping relation between two different domains, is a rapid-growing research field in deep learning. Although existing Generative Adversarial Network(GAN)-based methods have achieved decent results in this field, there are still some limitations in generating high-quality images for practical applications (e.g., data augmentation and image inpainting). In this work, we aim to propose a GAN-based network for data augmentation which can generate translated images with more details and less artifacts. The proposed Detail Fusion Generative Adversarial Network(DFGAN) consists of a detail branch, a transfer branch, a filter module, and a reconstruction module. The detail branch is trained by a super-resolution loss and its intermediate features can be used to introduce more details to the transfer branch by the filter module. Extensive evaluations demonstrate that our model generates more satisfactory images against the state-of-the-art approaches for data augmentation.

Multi-Domain Image-To-Image Translation with Adaptive Inference Graph

The Phuc Nguyen, Stéphane Lathuiliere, Elisa Ricci

Responsive image

Auto-TLDR; Adaptive Graph Structure for Multi-Domain Image-to-Image Translation

Slides Poster Similar

In this work, we address the problem of multi-domain image-to-image translation with particular attention paid to computational cost. In particular, current state of the art models require a large and deep model in order to handle the visual diversity of multiple domains. In a context of limited computational resources, increasing the network size may not be possible. Therefore, we propose to increase the network capacity by using an adaptive graph structure. At inference time, the network estimates its own graph by selecting specific sub-networks. Sub-network selection is implemented using Gumble-Softmax in order to allow end-to-end training. This approach leads to an adjustable increase in number of parameters while preserving an almost constant computational cost. Our evaluation on two publicly available datasets of facial and painting images shows that our adaptive strategy generates better images with fewer artifacts than literature methods.

Identity-Preserved Face Beauty Transformation with Conditional Generative Adversarial Networks

Zhitong Huang, Ching Y Suen

Responsive image

Auto-TLDR; Identity-preserved face beauty transformation using conditional GANs

Slides Poster Similar

Identity-preserved face beauty transformation aims to change the beauty scale of a face image while preserving the identity of the original face. In our framework of conditional Generative Adversarial Networks (cGANs), the synthesized face produced by the generator would have the same beauty scale indicated by the input condition. Unlike the discrete class labels used in most cGANs, the condition of target beauty scale in our framework is given by a continuous real-valued beauty score in the range [1 to 5], which makes the work challenging. To tackle the problem, we have implemented a triple structure, in which the conditional discriminator is divided into a normal discriminator and a separate face beauty predictor. We have also developed another new structure called Conditioned Instance Normalization to replace the original concatenation used in cGANs, which makes the combination of the input image and condition more effective. Furthermore, Self-Consistency Loss is introduced as a new parameter to improve the stability of training and quality of the generated image. In the end, the objectives of beauty transformation and identity preservation are evaluated by the pretrained face beauty predictor and state-of-the-art face recognition network. The result is encouraging and it also shows that certain facial features could be synthesized by the generator according to the target beauty scale, while preserving the original identity.

Unsupervised Face Manipulation Via Hallucination

Keerthy Kusumam, Enrique Sanchez, Georgios Tzimiropoulos

Responsive image

Auto-TLDR; Unpaired Face Image Manipulation using Autoencoders

Slides Poster Similar

This paper addresses the problem of manipulatinga face image in terms of changing its pose. To achieve this, wepropose a new method that can be trained under the very general“unpaired” setting. To this end, we firstly propose to modelthe general appearance, layout and background of the inputimage using a low-resolution version of it which is progressivelypassed through a hallucination network to generate featuresat higher resolutions. We show that such a formulation issignificantly simpler than previous approaches for appearancemodelling based on autoencoders. Secondly, we propose a fullylearnable and spatially-aware appearance transfer module whichcan cope with misalignment between the input source image andthe target pose and can effectively combine the features fromthe hallucination network with the features produced by ourgenerator. Thirdly, we introduce an identity preserving methodthat is trained in an unsupervised way, by using an auxiliaryfeature extractor and a contrastive loss between the real andgenerated images. We compare our method against the state-of-the-art reporting significant improvements both quantitatively, interms of FID and IS, and qualitatively.

Disentangle, Assemble, and Synthesize: Unsupervised Learning to Disentangle Appearance and Location

Hiroaki Aizawa, Hirokatsu Kataoka, Yutaka Satoh, Kunihito Kato

Responsive image

Auto-TLDR; Generative Adversarial Networks with Structural Constraint for controllability of latent space

Slides Poster Similar

The next step for the generative adversarial networks~(GAN) is to learn representations that allow us to control only a certain factor in the image explicitly. Since such a representation of the factor is independent of other factors, the controllability obtained from these representations leads to interpretability by identifying the variation of the synthesized image and the transferability for downstream tasks by inference. However, since it is difficult to identify and strictly define latent factors, the annotation is laborious. Moreover, learning such representations by a GAN is challenging due to the complex generation process. Therefore, we resolve this limitation using a novel generative model that can disentangle latent space into the appearance, the x-axis, and the y-axis of the object, and reassemble these components in an unsupervised manner. Specifically, based on the concept of packing the appearance and location in each position of the feature map, we introduce a novel structural constraint technique that prevents these representations from interacting with each other. The proposed structural constraint promotes the disentanglement of these factors. In experiments, we found that the proposed method is simple but effective for controllability and allows us to control the appearance and location via latent space without supervision, as compared with the conditional GAN.

Attributes Aware Face Generation with Generative Adversarial Networks

Zheng Yuan, Jie Zhang, Shiguang Shan, Xilin Chen

Responsive image

Auto-TLDR; AFGAN: A Generative Adversarial Network for Attributes Aware Face Image Generation

Slides Poster Similar

Recent studies have shown remarkable success in face image generations. However, most of the existing methods only generate face images from random noise, and cannot generate face images according to the specific attributes. In this paper, we focus on the problem of face synthesis from attributes, which aims at generating faces with specific characteristics corresponding to the given attributes. To this end, we propose a novel attributes aware face image generator method with generative adversarial networks called AFGAN. Specifically, we firstly propose a two-path embedding layer and self-attention mechanism to convert binary attribute vector to rich attribute features. Then three stacked generators generate 64 * 64, 128 * 128 and 256 * 256 resolution face images respectively by taking the attribute features as input. In addition, an image-attribute matching loss is proposed to enhance the correlation between the generated images and input attributes. Extensive experiments on CelebA demonstrate the superiority of our AFGAN in terms of both qualitative and quantitative evaluations.

Image Inpainting with Contrastive Relation Network

Xiaoqiang Zhou, Junjie Li, Zilei Wang, Ran He, Tieniu Tan

Responsive image

Auto-TLDR; Two-Stage Inpainting with Graph-based Relation Network

Slides Similar

Image inpainting faces the challenging issue of the requirements on structure reasonableness and texture coherence. In this paper, we propose a two-stage inpainting framework to address this issue. The basic idea is to address the two requirements in two separate stages. Completed segmentation of the corrupted image is firstly predicted through segmentation reconstruction network, while fine-grained image details are restored in the second stage through an image generator. The two stages are connected in series as the image details are generated under the guidance of completed segmentation map that predicted in the first stage. Specifically, in the second stage, we propose a novel graph-based relation network to model the relationship existed in corrupted image. In relation network, both intra-relationship for pixels in the same semantic region and inter-relationship between different semantic parts are considered, improving the consistency and compatibility of image textures. Besides, contrastive loss is designed to facilitate the relation network training. Such a framework not only simplifies the inpainting problem directly, but also exploits the relationship in corrupted image explicitly. Extensive experiments on various public datasets quantitatively and qualitatively demonstrate the superiority of our approach compared with the state-of-the-art.

Free-Form Image Inpainting Via Contrastive Attention Network

Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He

Responsive image

Auto-TLDR; Self-supervised Siamese inference for image inpainting

Slides Similar

Most deep learning based image inpainting approaches adopt autoencoder or its variants to fill missing regions in images. Encoders are usually utilized to learn powerful representational spaces, which are important for dealing with sophisticated learning tasks. Specifically, in the image inpainting task, masks with any shapes can appear anywhere in images (i.e., free-form masks) forming complex patterns. It is difficult for encoders to capture such powerful representations under this complex situation. To tackle this problem, we propose a self-supervised Siamese inference network to improve the robustness and generalization. Moreover, the restored image usually can not be harmoniously integrated into the exiting content, especially in the boundary area. To address this problem, we propose a novel Dual Attention Fusion module (DAF), which can combine both the restored and known regions in a smoother way and be inserted into decoder layers in a plug-and-play way. DAF is developed to not only adaptively rescale channel-wise features by taking interdependencies between channels into account but also force deep convolutional neural networks (CNNs) focusing more on unknown regions. In this way, the unknown region will be naturally filled from the outside to the inside. Qualitative and quantitative experiments on multiple datasets, including facial and natural datasets (i.e., Celeb-HQ, Pairs Street View, Places2 and ImageNet), demonstrate that our proposed method outperforms against state-of-the-arts in generating high-quality inpainting results.

High Resolution Face Age Editing

Xu Yao, Gilles Puy, Alasdair Newson, Yann Gousseau, Pierre Hellier

Responsive image

Auto-TLDR; An Encoder-Decoder Architecture for Face Age editing on High Resolution Images

Slides Poster Similar

Face age editing has become a crucial task in film post-production, and is also becoming popular for general purpose photography. Recently, adversarial training has produced some of the most visually impressive results for image manipulation, including the face aging/de-aging task. In spite of considerable progress, current methods often present visual artifacts and can only deal with low-resolution images. In order to achieve aging/de-aging with the high quality and robustness necessary for wider use, these problems need to be addressed. This is the goal of the present work. We present an encoder-decoder architecture for face age editing. The core idea of our network is to encode a face image to age-invariant features, and learn a modulation vector corresponding to a target age. We then combine these two elements to produce a realistic image of the person with the desired target age. Our architecture is greatly simplified with respect to other approaches, and allows for fine-grained age editing on high resolution images in a single unified model. Source codes are available at https://github.com/InterDigitalInc/HRFAE.

Attention2AngioGAN: Synthesizing Fluorescein Angiography from Retinal Fundus Images Using Generative Adversarial Networks

Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli, Stewart Lee Zuckerbrod

Responsive image

Auto-TLDR; Fluorescein Angiography from Fundus Images using Attention-based Generative Networks

Slides Poster Similar

Fluorescein Angiography (FA) is a technique that employs the designated camera for Fundus photography incorporating excitation and barrier filters. FA also requires fluorescein dye that is injected intravenously, which might cause adverse effects ranging from nausea, vomiting to even fatal anaphylaxis. Currently, no other fast and non-invasive technique exists that can generate FA without coupling with Fundus photography. To eradicate the need for an invasive FA extraction procedure, we introduce an Attention-based Generative network that can synthesize Fluorescein Angiography from Fundus images. The proposed gan incorporates multiple attention based skip connections in generators and comprises novel residual blocks for both generators and discriminators. It utilizes reconstruction, feature-matching, and perceptual loss along with adversarial training to produces realistic Angiograms that is hard for experts to distinguish from real ones. Our experiments confirm that the proposed architecture surpasses recent state-of-the-art generative networks for fundus-to-angio translation task.

S2I-Bird: Sound-To-Image Generation of Bird Species Using Generative Adversarial Networks

Joo Yong Shim, Joongheon Kim, Jong-Kook Kim

Responsive image

Auto-TLDR; Generating bird images from sound using conditional generative adversarial networks

Slides Poster Similar

Generating images from sound is a challenging task. This paper proposes a novel deep learning model that generates bird images from their corresponding sound information. Our proposed model includes a sound encoder in order to extract suitable feature representations from audio recordings, and then it generates bird images that corresponds to its calls using conditional generative adversarial networks (GANs) with auxiliary classifiers. We demonstrate that our model produces better image generation results which outperforms other state-of-the-art methods in a similar context.

Continuous Learning of Face Attribute Synthesis

Ning Xin, Shaohui Xu, Fangzhe Nan, Xiaoli Dong, Weijun Li, Yuanzhou Yao

Responsive image

Auto-TLDR; Continuous Learning for Face Attribute Synthesis

Slides Poster Similar

The generative adversarial network (GAN) exhibits great superiority in the face attribute synthesis task. However, existing methods have very limited effects on the expansion of new attributes. To overcome the limitations of a single network in new attribute synthesis, a continuous learning method for face attribute synthesis is proposed in this work. First, the feature vector of the input image is extracted and attribute direction regression is performed in the feature space to obtain the axes of different attributes. The feature vector is then linearly guided along the axis so that images with target attributes can be synthesized by the decoder. Finally, to make the network capable of continuous learning, the orthogonal direction modification module is used to extend the newly-added attributes. Experimental results show that the proposed method can endow a single network with the ability to learn attributes continuously, and, as compared to those produced by the current state-of-the-art methods, the synthetic attributes have higher accuracy.

Augmented Cyclic Consistency Regularization for Unpaired Image-To-Image Translation

Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue

Responsive image

Auto-TLDR; Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation

Slides Poster Similar

Unpaired image-to-image (I2I) translation has received considerable attention in pattern recognition and computer vision because of recent advancements in generative adversarial networks (GANs). However, due to the lack of explicit supervision, unpaired I2I models often fail to generate realistic images, especially in challenging datasets with different backgrounds and poses. Hence, stabilization is indispensable for real-world applications and GANs. Herein, we propose Augmented Cyclic Consistency Regularization (ACCR), a novel regularization method for unpaired I2I translation. Our main idea is to enforce consistency regularization originating from semi-supervised learning on the discriminators leveraging real, fake, reconstructed, and augmented samples. We regularize the discriminators to output similar predictions when fed pairs of original and perturbed images. We qualitatively clarify the generation property between unpaired I2I models and standard GANs, and explain why consistency regularization on fake and reconstructed samples works well. Quantitatively, our method outperforms the consistency regularized GAN (CR-GAN) in real-world translations and demonstrates efficacy against several data augmentation variants and cycle-consistent constraints.

Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation

Bin Duan, Wei Wang, Hao Tang, Hugo Latapie, Yan Yan

Responsive image

Auto-TLDR; Cascade Attention-Guided Residue GAN for Cross-modal Audio-Visual Learning

Slides Poster Similar

Since we were babies, we intuitively develop the ability to correlate the input from different cognitive sensors such as vision, audio, and text. However, in machine learning, this cross-modal learning is a nontrivial task because different modalities have no homogeneous properties. Previous works discover that there should be bridges among different modalities. From neurology and psychology perspective, humans have the capacity to link one modality with another one, e.g., associating a picture of a bird with the only hearing of its singing and vice versa. Is it possible for machine learning algorithms to recover the scene given the audio signal? In this paper, we propose a novel Cascade Attention-Guided Residue GAN (CAR-GAN), aiming at reconstructing the scenes given the corresponding audio signals. Particularly, we present a residue module to mitigate the gap between different modalities progressively. Moreover, a cascade attention guided network with a novel classification loss function is designed to tackle the cross-modal learning task. Our model keeps consistency in the high-level semantic label domain and is able to balance two different modalities. The experimental results demonstrate that our model achieves the state-of-the-art cross-modal audio-visual generation on the challenging Sub-URMP dataset.

Robust Pedestrian Detection in Thermal Imagery Using Synthesized Images

My Kieu, Lorenzo Berlincioni, Leonardo Galteri, Marco Bertini, Andrew Bagdanov, Alberto Del Bimbo

Responsive image

Auto-TLDR; Improving Pedestrian Detection in the thermal domain using Generative Adversarial Network

Slides Poster Similar

In this paper we propose a method for improving pedestrian detection in the thermal domain using two stages: first, a generative data augmentation approach is used, then a domain adaptation method using generated data adapts an RGB pedestrian detector. Our model, based on the Least-Squares Generative Adversarial Network, is trained to synthesize realistic thermal versions of input RGB images which are then used to augment the limited amount of labeled thermal pedestrian images available for training. We apply our generative data augmentation strategy in order to adapt a pretrained YOLOv3 pedestrian detector to detection in the thermal-only domain. Experimental results demonstrate the effectiveness of our approach: using less than 50% of available real thermal training data, and relying on synthesized data generated by our model in the domain adaptation phase, our detector achieves state-of-the-art results on the KAIST Multispectral Pedestrian Detection Benchmark; even if more real thermal data is available adding GAN generated images to the training data results in improved performance, thus showing that these images act as an effective form of data augmentation. To the best of our knowledge, our detector achieves the best single-modality detection results on KAIST with respect to the state-of-the-art.

Adversarial Knowledge Distillation for a Compact Generator

Hideki Tsunashima, Shigeo Morishima, Junji Yamato, Qiu Chen, Hirokatsu Kataoka

Responsive image

Auto-TLDR; Adversarial Knowledge Distillation for Generative Adversarial Nets

Slides Poster Similar

In this paper, we propose memory-efficient Generative Adversarial Nets (GANs) in line with knowledge distillation. Most existing GANs have a shortcoming in terms of the number of model parameters and low processing speed. Here, to tackle the problem, we propose Adversarial Knowledge Distillation for Generative models (AKDG) for highly efficient GANs, in terms of unconditional generation. Using AKDG, model size and processing speed are substantively reduced. Through an adversarial training exercise with a distillation discriminator, a student generator successfully mimics a teacher generator in fewer model layers and fewer parameters and at a higher processing speed. Moreover, our AKDG is network architecture-agnostic. Comparison of AKDG-applied models to vanilla models suggests that it achieves closer scores to a teacher generator and more efficient performance than a baseline method with respect to Inception Score (IS) and Frechet Inception Distance (FID). In CIFAR-10 experiments, improving IS/FID 1.17pt/55.19pt and in LSUN bedroom experiments, improving FID 71.1pt in comparison to the conventional distillation method for GANs.

Pixel-based Facial Expression Synthesis

Arbish Akram, Nazar Khan

Responsive image

Auto-TLDR; pixel-based facial expression synthesis using GANs

Slides Poster Similar

Recently, Facial expression synthesis has shown remarkable advances with the advent of Generative Adversarial Networks (GANs). However, these GAN-based approaches mostly generate photo-realistic results as long as the target data distribution is close to the training data distribution. The quality of GANs results significantly degrades when testing images are from a slightly different distribution. In this work, we propose a pixel-based facial expression synthesis method. Recent work has shown that facial expression synthesis changes only local regions of faces. In the proposed method, each output pixel observes only one input pixel. The proposed method achieves generalization capability by leveraging only few hundred images. Experimental results demonstrate that the proposed method performs comparably with the recent GANs on in-dataset images and significantly outperforms on in the wild images. In addition, the proposed method is faster and it also achieves significantly better performance with two orders of magnitudes lesser computational and storage cost as compared to state-of-the-art GAN-based methods.

Learning Disentangled Representations for Identity Preserving Surveillance Face Camouflage

Jingzhi Li, Lutong Han, Hua Zhang, Xiaoguang Han, Jingguo Ge, Xiaochu Cao

Responsive image

Auto-TLDR; Individual Face Privacy under Surveillance Scenario with Multi-task Loss Function

Poster Similar

In this paper, we focus on protecting the person face privacy under the surveillance scenarios, whose goal is to change the visual appearances of faces while keep them to be recognizable by current face recognition systems. This is a challenging problem as that we should retain the most important structures of captured facial images, while alter the salient facial regions to protect personal privacy. To address this problem, we introduce a novel individual face protection model, which can camouflage the face appearance from the perspective of human visual perception and preserve the identity features of faces used for face authentication. To that end, we develop an encoder-decoder network architecture that can separately disentangle the person feature representation into an appearance code and an identity code. Specifically, we first randomly divide the face image into two groups, the source set and the target set, where the source set is used to extract the identity code and the target set provides the appearance code. Then, we recombine the identity and appearance codes to synthesize a new face, which has the same identity with the source subject. Finally, the synthesized faces are used to replace the original face to protect the privacy of individual. Furthermore, our model is trained end-to-end with a multi-task loss function, which can better preserve the identity and stabilize the training loss. Experiments conducted on Cross-Age Celebrity dataset demonstrate the effectiveness of our model and validate our superiority in terms of visual quality and scalability.

UCCTGAN: Unsupervised Clothing Color Transformation Generative Adversarial Network

Shuming Sun, Xiaoqiang Li, Jide Li

Responsive image

Auto-TLDR; An Unsupervised Clothing Color Transformation Generative Adversarial Network

Slides Poster Similar

Clothing color transformation refers to changing the clothes color in an original image to the clothes color in a target image. In this paper, we propose an Unsupervised Clothing Color Transformation Generative Adversarial Network (UCCTGAN) for the task. UCCTGAN adopts the color histogram of a target clothes as color guidance and an improved U-net architecture called AntennaNet is put forward to fuse the extracted color information with the original image. Meanwhile, to accomplish unsupervised learning, the loss function is carefully designed according to color moment, which evaluates the chromatic aberration between the target clothing and the generated clothing. Experimental results show that our network has the ability to generate convincing color transformation results.

Semi-Supervised Outdoor Image Generation Conditioned on Weather Signals

Sota Kawakami, Kei Okada, Naoko Nitta, Kazuaki Nakamura, Noboru Babaguchi

Responsive image

Auto-TLDR; Semi-supervised Generative Adversarial Network for Prediction of Weather Signals from Outdoor Images

Slides Poster Similar

In recent years, various types of sensors observe the real world. Especially, weather sensors are densely installed all over the world to observe current weather situations at various places. However, weather signals such as the temperature or humidity obtained by weather sensors are intuitively difficult for humans to understand. On the other hand, images captured by typical RGB cameras can tell weather situations at the captured places in a more comprehensible way for humans; however, cameras are only installed at limited places and are not necessarily open to public due to privacy issues. In order to solve this problem, the goal of our work is to generate images which can tell weather situations at arbitrary time and locations. This can be realized by using a conditional generative adversarial network architecture that takes an image and a condition to transform the image accordingly to the condition. Training such network requires a large number of image and condition pairs as the training data. Although weather signals can be easily collected from weather sensors, collecting their spatially and temporally synchronized outdoor images is not easy. Thus, we propose a semi-supervised method for training the image transformer. A relatively small number of pairs of an outdoor image and weather signals is collected, each from different web services, by considering their semantic consistency. The collected pairs are used to train a predictor for predicting weather signals from a given outdoor image. Then, the image transformer is trained by using a large number of pairs of an outdoor image and pseudo weather signals predicted by the predictor as the training data.

Stylized-Colorization for Line Arts

Tzu-Ting Fang, Minh Duc Vo, Akihiro Sugimoto, Shang-Hong Lai

Responsive image

Auto-TLDR; Stylized-colorization using GAN-based End-to-End Model for Anime

Slides Poster Similar

We address a novel problem of stylized-colorization which colorizes a given line art using a given coloring style in text. This problem can be stated as multi-domain image translation and is more challenging than the current colorization problem because it requires not only capturing the illustration distribution but also satisfying the required coloring styles specific to anime such as lightness, shading, or saturation. We propose a GAN-based end-to-end model for stylized-colorization where the model has one generator and two discriminators. Our generator is based on the U-Net architecture and receives a pair of a line art and a coloring style in text as its input to produce a stylized-colorization image of the line art. Two discriminators, on the other hand, share weights at early layers to judge the stylized-colorization image in two different aspects: one for color and one for style. One generator and two discriminators are jointly trained in an adversarial and end-to-end manner. Extensive experiments demonstrate the effectiveness of our proposed model.

A GAN-Based Blind Inpainting Method for Masonry Wall Images

Yahya Ibrahim, Balázs Nagy, Csaba Benedek

Responsive image

Auto-TLDR; An End-to-End Blind Inpainting Algorithm for Masonry Wall Images

Slides Poster Similar

In this paper we introduce a novel end-to-end blind inpainting algorithm for masonry wall images, performing the automatic detection and virtual completion of occluded or damaged wall regions. For this purpose, we propose a three-stage deep neural network that comprises a U-Net-based sub-network for wall segmentation into brick, mortar and occluded regions, which is followed by a two-stage adversarial inpainting model. The first adversarial network predicts the schematic mortar-brick pattern of the occluded areas based on the observed wall structure, providing in itself valuable structural information for archeological and architectural applications. Finally, the second adversarial network predicts the RGB pixel values yielding a realistic visual experience for the observer. While the three stages implement a sequential pipeline, they interact through dependencies of their loss functions admitting the consideration of hidden feature dependencies between the different network components. For training and testing the network a new dataset has been created, and an extensive qualitative and quantitative evaluation versus the state-of-the-art is given.

SATGAN: Augmenting Age Biased Dataset for Cross-Age Face Recognition

Wenshuang Liu, Wenting Chen, Yuanlue Zhu, Linlin Shen

Responsive image

Auto-TLDR; SATGAN: Stable Age Translation GAN for Cross-Age Face Recognition

Slides Poster Similar

In this paper, we propose a Stable Age Translation GAN (SATGAN) to generate fake face images at different ages to augment age biased face datasets for Cross-Age Face Recognition (CAFR) . The proposed SATGAN consists of both generator and discriminator. As a part of the generator, a novel Mask Attention Module (MAM) is introduced to make the generator focus on the face area. In addition, the generator employs a Uniform Distribution Discriminator (UDD) to supervise the learning of latent feature map and enforce the uniform distribution. Besides, the discriminator employs a Feature Separation Module (FSM) to disentangle identity information from the age information. The quantitative and qualitative evaluations on Morph dataset prove that SATGAN achieves much better performance than existing methods. The face recognition model trained using dataset (VGGFace2 and MS-Celeb-1M) augmented using our SATGAN achieves better accuracy on cross age dataset like Cross-Age LFW and AgeDB-30.

Cycle-Consistent Adversarial Networks and Fast Adaptive Bi-Dimensional Empirical Mode Decomposition for Style Transfer

Elissavet Batziou, Petros Alvanitopoulos, Konstantinos Ioannidis, Ioannis Patras, Stefanos Vrochidis, Ioannis Kompatsiaris

Responsive image

Auto-TLDR; FABEMD: Fast and Adaptive Bidimensional Empirical Mode Decomposition for Style Transfer on Images

Slides Poster Similar

Recently, research endeavors have shown the potentiality of Cycle-Consistent Adversarial Networks (CycleGAN) in style transfer. In Cycle-Consistent Adversarial Networks, the consistency loss is introduced to measure the difference between the original images and the reconstructed in both directions, forward and backward. In this work, the combination of Cycle-Consistent Adversarial Networks with Fast and Adaptive Bidimensional Empirical Mode Decomposition (FABEMD) is proposed to perform style transfer on images. In the proposed approach the cycle-consistency loss is modified to include the differences between the extracted Intrinsic Mode Functions (BIMFs) images. Instead of an estimation of pixel-to-pixel difference between the produced and input images, the FABEMD is applied and the extracted BIMFs are involved in the computation of the total cycle loss. This method enriches the computation of the total loss in a content-to-content and style-to-style comparison by connecting the spatial information to the frequency components. The experimental results reveal that the proposed method is efficient and produces qualitative results comparable to state-of-the-art methods.

SECI-GAN: Semantic and Edge Completion for Dynamic Objects Removal

Francesco Pinto, Andrea Romanoni, Matteo Matteucci, Phil Torr

Responsive image

Auto-TLDR; SECI-GAN: Semantic and Edge Conditioned Inpainting Generative Adversarial Network

Slides Poster Similar

Image inpainting aims at synthesizing the missing content of damaged and corrupted images to produce visually realistic restorations; typical applications are in image restoration, automatic scene editing, super-resolution, and dynamic object removal. In this paper, we propose Semantic and Edge Conditioned Inpainting Generative Adversarial Network (SECI-GAN), an architecture that jointly exploits the high-level cues extracted by semantic segmentation and the fine-grained details captured by edge extraction to condition the image inpainting process. SECI-GAN is designed with a particular focus on recovering big regions belonging to the same object (e.g. cars or pedestrians) in the context of dynamic object removal from complex street views. To demonstrate the effectiveness of SECI-GAN, we evaluate our results on the Cityscapes dataset, showing that SECI-GAN is better than competing state-of-the-art models at recovering the structure and the content of the missing parts while producing consistent predictions.

Controllable Face Aging

Haien Zeng, Hanjiang Lai

Responsive image

Auto-TLDR; A controllable face aging method via attribute disentanglement generative adversarial network

Slides Poster Similar

Motivated by the following two observations: 1) people are aging differently under different conditions for changeable facial attributes, e.g., skin color may become darker when working outside, and 2) it needs to keep some unchanged facial attributes during the aging process, e.g., race and gender, we propose a controllable face aging method via attribute disentanglement generative adversarial network. To offer fine control over the synthesized face images, first, an individual embedding of the face is directly learned from an image that contains the desired facial attribute. Second, since the image may contain other unwanted attributes, an attribute disentanglement network is used to separate the individual embedding and learn the common embedding that contains information about the face attribute (e.g., race). With the common embedding, we can manipulate the generated face image with the desired attribute in an explicit manner. Experimental results on two common benchmarks demonstrate that our proposed generator achieves comparable performance on the aging effect with state-of-the-art baselines while gaining more flexibility for attribute control. Code is available at supplementary material.

The Role of Cycle Consistency for Generating Better Human Action Videos from a Single Frame

Runze Li, Bir Bhanu

Responsive image

Auto-TLDR; Generating Videos with Human Action Semantics using Cycle Constraints

Slides Poster Similar

This paper addresses the challenging problem of generating videos with human action semantics. Unlike previous work which predict future frames in a single forward pass, this paper introduces the cycle constraints in both forward and backward passes in the generation of human actions. This is achieved by enforcing the appearance and motion consistency across a sequence of frames generated in the future. The approach consists of two stages. In the first stage, the pose of a human body is generated. In the second stage, an image generator is used to generate future frames by using (a) generated human poses in the future from the first stage, (b) the single observed human pose, and (c) the single corresponding future frame. The experiments are performed on three datasets: Weizmann dataset involving simple human actions, Penn Action dataset and UCF-101 dataset containing complicated human actions, especially in sports. The results from these experiments demonstrate the effectiveness of the proposed approach.

VITON-GT: An Image-Based Virtual Try-On Model with Geometric Transformations

Matteo Fincato, Federico Landi, Marcella Cornia, Fabio Cesari, Rita Cucchiara

Responsive image

Auto-TLDR; VITON-GT: An Image-based Virtual Try-on Architecture for Fashion Catalogs

Slides Poster Similar

The large spread of online shopping has led computer vision researchers to develop different solutions for the fashion domain to potentially increase the online user experience and improve the efficiency of preparing fashion catalogs. Among them, image-based virtual try-on has recently attracted a lot of attention resulting in several architectures that can generate a new image of a person wearing an input try-on garment in a plausible and realistic way. In this paper, we present VITON-GT, a new model for virtual try-on that generates high-quality and photo-realistic images thanks to multiple geometric transformations. In particular, our model is composed of a two-stage geometric transformation module that performs two different projections on the input garment, and a transformation-guided try-on module that synthesize the new image. We experimentally validate the proposed solution on the most common dataset for this task, containing mainly t-shirts, and we demonstrate its effectiveness compared to different baselines and previous methods. Additionally, we assess the generalization capabilities of our model on a new set of fashion items composed of upper-body clothes from different categories. To the best of our knowledge, we are the first to test virtual try-on architectures in this challenging experimental setting.

GAP: Quantifying the Generative Adversarial Set and Class Feature Applicability of Deep Neural Networks

Edward Collier, Supratik Mukhopadhyay

Responsive image

Auto-TLDR; Approximating Adversarial Learning in Deep Neural Networks Using Set and Class Adversaries

Slides Poster Similar

Recent work in deep neural networks has sought to characterize the nature in which a network learns features and how applicable learnt features are to various problem sets. Deep neural network applicability can be split into three sub-problems; set applicability, class applicability, and instance applicability. In this work we seek to quantify the applicability of features learned during adversarial training, focusing specifically on set and class applicability. We apply techniques for measuring applicability to both generators and discriminators trained on various data sets to quantify applicability and better observe how both a generator and a discriminator, and generative models as a whole, learn features during adversarial training.

Unsupervised Contrastive Photo-To-Caricature Translation Based on Auto-Distortion

Yuhe Ding, Xin Ma, Mandi Luo, Aihua Zheng, Ran He

Responsive image

Auto-TLDR; Unsupervised contrastive photo-to-caricature translation with style loss

Slides Poster Similar

Photo-to-caricature aims to synthesize the caricature as a rendered image exaggerating the features through sketching, pencil strokes, or other artistic drawings. Style rendering and geometry deformation are the most important aspects in photo-to-caricature translation task. To take both into consideration, we propose an unsupervised contrastive photo-to-caricature translation architecture. Considering the intuitive artifacts in the existing methods, we propose a contrastive style loss for style rendering to enforce the similarity between the style of rendered photo and the caricature, and simultaneously enhance its discrepancy to the photos. To obtain an exaggerating deformation in an unpaired/unsupervised fashion, we propose a Distortion Prediction Module (DPM) to predict a set of displacements vectors for each input image while fixing some controlling points, followed by the thin plate spline interpolation for warping. The model is trained on unpaired photo and caricature while can offer bidirectional synthesizing via inputting either a photo or a caricature. Extensive experiments demonstrate that the proposed model is effective to generate hand-drawn like caricatures compared with existing competitors.

Data Augmentation Via Mixed Class Interpolation Using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery

Hiroshi Sasaki, Chris G. Willcocks, Toby Breckon

Responsive image

Auto-TLDR; C2GMA: A Generative Domain Transfer Model for Non-visible Domain Classification

Slides Poster Similar

Machine learning driven object detection and classification within non-visible imagery has an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer due to the limited quantity and variety of non-visible spectral domain imagery, in contrast to the high data availability of visible-band imagery that readily enables contemporary deep learning driven detection and classification approaches. To address this problem, this paper proposes and evaluates a novel data augmentation approach that leverages the more readily available visible-band imagery via a generative domain transfer model. The model can synthesise large volumes of non-visible domain imagery by image-to-image (I2I) translation from the visible image domain. Furthermore, we show that the generation of interpolated mixed class (non-visible domain) image examples via our novel Conditional CycleGAN Mixup Augmentation (C2GMA) methodology can lead to a significant improvement in the quality of non-visible domain classification tasks that otherwise suffer due to limited data availability. Focusing on classification within the Synthetic Aperture Radar (SAR) domain, our approach is evaluated on a variation of the Statoil/C-CORE Iceberg Classifier Challenge dataset and achieves 75.4% accuracy, demonstrating a significant improvement when compared against traditional data augmentation strategies (Rotation, Mixup, and MixCycleGAN).

Galaxy Image Translation with Semi-Supervised Noise-Reconstructed Generative Adversarial Networks

Qiufan Lin, Dominique Fouchez, Jérôme Pasquet

Responsive image

Auto-TLDR; Semi-supervised Image Translation with Generative Adversarial Networks Using Paired and Unpaired Images

Slides Poster Similar

Image-to-image translation with Deep Learning neural networks, particularly with Generative Adversarial Networks (GANs), is one of the most powerful methods for simulating astronomical images. However, current work is limited to utilizing paired images with supervised translation, and there has been rare discussion on reconstructing noise background that encodes instrumental and observational effects. These limitations might be harmful for subsequent scientific applications in astrophysics. Therefore, we aim to develop methods for using unpaired images and preserving noise characteristics in image translation. In this work, we propose a two-way image translation model using GANs that exploits both paired and unpaired images in a semi-supervised manner, and introduce a noise emulating module that is able to learn and reconstruct noise characterized by high-frequency features. By experimenting on multi-band galaxy images from the Sloan Digital Sky Survey (SDSS) and the Canada France Hawaii Telescope Legacy Survey (CFHT), we show that our method recovers global and local properties effectively and outperforms benchmark image translation models. To our best knowledge, this work is the first attempt to apply semi-supervised methods and noise reconstruction techniques in astrophysical studies.

Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-To-Video Synthesis

Fu-En Yang, Jing-Cheng Chang, Yuan-Hao Lee, Yu-Chiang Frank Wang

Responsive image

Auto-TLDR; Dual Motion Transfer GAN for Convolutional Neural Networks

Slides Poster Similar

Generating videos with content and motion variations is a challenging task in computer vision. While the recent development of GAN allows video generation from latent representations, it is not easy to produce videos with particular content of motion patterns of interest. In this paper, we propose Dual Motion Transfer GAN (Dual-MTGAN), which takes image and video data as inputs while learning disentangled content and motion representations. Our Dual-MTGAN is able to perform deterministic motion transfer and stochastic motion generation. Based on a given image, the former preserves the input content and transfers motion patterns observed from another video sequence, and the latter directly produces videos with plausible yet diverse motion patterns based on the input image. The proposed model is trained in an end-to-end manner, without the need to utilize pre-defined motion features like pose or facial landmarks. Our quantitative and qualitative results would confirm the effectiveness and robustness of our model in addressing such conditioned image-to-video tasks.

Learning Low-Shot Generative Networks for Cross-Domain Data

Hsuan-Kai Kao, Cheng-Che Lee, Wei-Chen Chiu

Responsive image

Auto-TLDR; Learning Generators for Cross-Domain Data under Low-Shot Learning

Slides Poster Similar

We tackle a novel problem of learning generators for cross-domain data under a specific scenario of low-shot learning. Basically, given a source domain with sufficient amount of training data, we aim to transfer the knowledge of its generative process to another target domain, which not only has few data samples but also contains the domain shift with respect to the source domain. This problem has great potential in practical use and is different from the well-known image translation task, as the target-domain data can be generated without requiring any source-domain ones and the large data consumption for learning target-domain generator can be alleviated. Built upon a cross-domain dataset where (1) each of the low shots in the target domain has its correspondence in the source and (2) these two domains share the similar content information but different appearance, two approaches are proposed: a Latent-Disentanglement-Orientated model (LaDo) and a Generative-Hierarchy-Oriented (GenHo) model. Our LaDo and GenHo approaches address the problem from different perspectives, where the former relies on learning the disentangled representation composed of domain-invariant content features and domain-specific appearance ones; while the later decomposes the generative process of a generator into two parts for synthesizing the content and appearance sequentially. We perform extensive experiments under various settings of cross-domain data and show the efficacy of our models for generating target-domain data with the abundant content variance as in the source domain, which lead to the favourable performance in comparison to several baselines.

Fidelity-Controllable Extreme Image Compression with Generative Adversarial Networks

Shoma Iwai, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Responsive image

Auto-TLDR; GAN-based Image Compression at Low Bitrates

Slides Similar

We propose a GAN-based image compression method working at extremely low bitrates below 0.1bpp. Most existing learned image compression methods suffer from blur at extremely low bitrates. Although GAN can help to reconstruct sharp images, there are two drawbacks. First, GAN makes train- ing unstable. Second, the reconstructions often contain unpleasing noise or artifacts. To address both of the drawbacks, our method adopts two-stage training and network interpolation. The two- stage training is effective to stabilize the training. Moreover, the network interpolation utilizes the models in both stages and reduces undesirable noise and artifacts, while maintaining important edges. Hence, we can control the trade-off between perceptual quality and fidelity without re-training models. The experimental results show that our model can reconstruct high quality images. Furthermore, our user study confirms that our reconstructions are preferable to state-of-the-art GAN-based image compression model.

Disentangled Representation Learning for Controllable Image Synthesis: An Information-Theoretic Perspective

Shichang Tang, Xu Zhou, Xuming He, Yi Ma

Responsive image

Auto-TLDR; Controllable Image Synthesis in Deep Generative Models using Variational Auto-Encoder

Slides Poster Similar

In this paper, we look into the problem of disentangled representation learning and controllable image synthesis in a deep generative model. We develop an encoder-decoder architecture for a variant of the Variational Auto-Encoder (VAE) with two latent codes $z_1$ and $z_2$. Our framework uses $z_2$ to capture specified factors of variation while $z_1$ captures the complementary factors of variation. To this end, we analyze the learning problem from the perspective of multivariate mutual information, derive optimizable lower bounds of the conditional mutual information in the image synthesis processes and incorporate them into the training objective. We validate our method empirically on the Color MNIST dataset and the CelebA dataset by showing controllable image syntheses. Our proposed paradigm is simple yet effective and is applicable to many situations, including those where there is not an explicit factorization of features available, or where the features are non-categorical.

A NoGAN Approach for Image and Video Restoration and Compression Artifact Removal

Mameli Filippo, Marco Bertini, Leonardo Galteri, Alberto Del Bimbo

Responsive image

Auto-TLDR; Deep Neural Network for Image and Video Compression Artifact Removal and Restoration

Poster Similar

Lossy image and video compression algorithms introduce several different types of visual artifacts that reduce the visual quality of the compressed media, and the higher the compression rate the higher is the strength of these artifacts. In this work, we describe an approach for visual quality improvement of compressed images and videos to be performed at presentation time, so to obtain the benefits of fast data transfer and reduced data storage, while enjoying a visual quality that could be obtained only reducing the compression rate. To obtain this result we propose to use a deep neural network trained using the NoGAN approach, adapting the popular DeOldify architecture used for colorization. We show how the proposed method can be applied both to image and video compression artifact removal and restoration.

Multi-Laplacian GAN with Edge Enhancement for Face Super Resolution

Shanlei Ko, Bi-Ru Dai

Responsive image

Auto-TLDR; Face Image Super-Resolution with Enhanced Edge Information

Slides Poster Similar

Face image super-resolution has become a research hotspot in the field of image processing. Nowadays, more and more researches add additional information, such as landmark, identity, to reconstruct high resolution images from low resolution ones, and have a good performance in quantitative terms and perceptual quality. However, these additional information is hard to obtain in many cases. In this work, we focus on reconstructing face images by extracting useful information from face images directly rather than using additional information. By observing edge information in each scale of face images, we propose a method to reconstruct high resolution face images with enhanced edge information. In additional, with the proposed training procedure, our method reconstructs photo-realistic images in upscaling factor 8x and outperforms state-of-the-art methods both in quantitative terms and perceptual quality.

Global Image Sentiment Transfer

Jie An, Tianlang Chen, Songyang Zhang, Jiebo Luo

Responsive image

Auto-TLDR; Image Sentiment Transfer Using DenseNet121 Architecture

Poster Similar

Transferring the sentiment of an image is an unexplored research topic in computer vision. This work proposes a novel framework consisting of a reference image retrieval step and a global sentiment transfer step to transfer image sentiment according to a given sentiment tag. The proposed image retrieval algorithm is based on the SSIM index. The retrieved reference images by the proposed algorithm are more content-related than the algorithm based on the perceptual loss. Therefore, it can lead to a better image sentiment transfer result. In addition, we propose a global sentiment transfer step, which employs an optimization algorithm to iteratively transfer image sentiment based on the feature maps produced by the DenseNet121 architecture. The proposed sentiment transfer algorithm can transfer image sentiment while keeping the content of the input image intact. Both qualitative and quantitative evaluations demonstrate that the proposed sentiment transfer framework outperforms existing artistic and photo-realistic style transfer algorithms in producing satisfactory sentiment transfer results with fine and exact details.

Ω-GAN: Object Manifold Embedding GAN for Image Generation by Disentangling Parameters into Pose and Shape Manifolds

Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase

Responsive image

Auto-TLDR; Object Manifold Embedding GAN with Parametric Sampling and Object Identity Loss

Slides Poster Similar

In this paper, we propose Object Manifold Embedding GAN (Ω-GAN) to generate images of variously shaped and arbitrarily posed objects from a noise variable sampled from a distribution defined over the pose and the shape manifolds in a vector space. We introduce Parametric Manifold Sampling to sample noise variables from a distribution over the pose manifold to conditionally generate object images in arbitrary poses by tuning the pose parameter. We also introduce Object Identity Loss for clearly disentangling the pose and shape parameters, which allows us to maintain the shape of the object instance when only the pose parameter is changed. Through evaluation, we confirmed that the proposed Ω-GAN could generate variously shaped object images in arbitrary poses by changing the pose and shape parameters independently. We also introduce an application of the proposed method for object pose estimation, through which we confirmed that the object poses in the generated images are accurate.

Makeup Style Transfer on Low-Quality Images with Weighted Multi-Scale Attention

Daniel Organisciak, Edmond S. L. Ho, Shum Hubert P. H.

Responsive image

Auto-TLDR; Facial Makeup Style Transfer for Low-Resolution Images Using Multi-Scale Spatial Attention

Slides Poster Similar

Facial makeup style transfer is an extremely challenging sub-field of image-to-image-translation. Due to this difficulty, state-of-the-art results are mostly reliant on the Face Parsing Algorithm, which segments a face into parts in order to easily extract makeup features. However, we find that this algorithm can only work well on high-definition images where facial features can be accurately extracted. Faces in many real-world photos, such as those including a large background or multiple people, are typically of low-resolution, which considerably hinders state-of-the-art algorithms. In this paper, we propose an end-to-end holistic approach to effectively transfer makeup styles between two low-resolution images. The idea is built upon a novel weighted multi-scale spatial attention module, which identifies salient pixel regions on low-resolution images in multiple scales, and uses channel attention to determine the most effective attention map. This design provides two benefits: low-resolution images are usually blurry to different extents, so a multi-scale architecture can select the most effective convolution kernel size to implement spatial attention; makeup is applied on both a macro-level (foundation, fake tan) and a micro-level (eyeliner, lipstick) so different scales can excel in extracting different makeup features. We develop an Augmented CycleGAN network that embeds our attention modules at selected layers to most effectively transfer makeup. We test our system with the FBD data set, which consists of many low-resolution facial images, and demonstrates that it outperforms state-of-the-art methods, particularly in transferring makeup for blurry images and partially occluded images.

Boundary Guided Image Translation for Pose Estimation from Ultra-Low Resolution Thermal Sensor

Kohei Kurihara, Tianren Wang, Teng Zhang, Brian Carrington Lovell

Responsive image

Auto-TLDR; Pose Estimation on Low-Resolution Thermal Images Using Image-to-Image Translation Architecture

Slides Poster Similar

This work addresses the pose estimation task on low-resolution images captured using thermal sensors which can operate in a no-light environment. Low-resolution thermal sensors have been widely adopted in various applications for cost control and privacy protection purposes. In this paper, targeting the challenging scenario of ultra-low resolution thermal imaging (3232 pixels), we aim to estimate human poses for the purpose of monitoring health conditions and indoor events. To overcome the challenges in ultra-low resolution thermal imaging such as blurred boundaries and data scarcity, we propose a new Image-to-Image (I2I) translation architecture which can translate the original blurred thermal image into a visible light image with sharper boundaries. Then the generated visible light image can be fed into the off-the-shelf pose estimator which was well-trained in the visible domain. Experimental results suggest that the proposed framework outperforms other state-of-the-art methods in the I2I based pose estimation task for our thermal image dataset. Furthermore, we also demonstrated the merits of the proposed method on the publicly available FLIR dataset by measuring the quality of translated images.

Local-Global Interactive Network for Face Age Transformation

Jie Song, Ping Wei, Huan Li, Yongchi Zhang, Nanning Zheng

Responsive image

Auto-TLDR; A Novel Local-Global Interaction Framework for Long-span Face Age Transformation

Slides Poster Similar

Face age transformation, which aims to generate a face image in the past or future, has receiving increasing attention due to its significant application value in some special fields, such as looking for a lost child, tracking criminals and entertainment, etc. Currently, most existing methods mainly focus on unidirectional short-span face aging. In this paper, we propose a novel local-global interaction framework for long-span face age transformation. Firstly, we divide a face image into five independent parts and design a local generative network for each of them to learn the local structure changes of a face image, while we utilize a global generative network to learn the global structure changes. Then we introduce an interactive network and an age classification network, which are respectively used to integrate the local and global features and maintain the corresponding age features in different age groups. Given any face image at a certain age, our network can produce a clear and realistic image of face aging or rejuvenation. We test and evaluate the model on complex datasets, and extensive qualitative comparison experiments has proved the effectiveness and immense potential of our proposed method.

Super-Resolution Guided Pore Detection for Fingerprint Recognition

Syeda Nyma Ferdous, Ali Dabouei, Jeremy Dawson, Nasser M. Nasarabadi

Responsive image

Auto-TLDR; Super-Resolution Generative Adversarial Network for Fingerprint Recognition Using Pore Features

Slides Poster Similar

Performance of fingerprint recognition algorithms substantially rely on fine features extracted from fingerprints. Apart from minutiae and ridge patterns, pore features have proven to be usable for fingerprint recognition. Although features from minutiae and ridge patterns are quite attainable from low-resolution images, using pore features is practical only if the fingerprint image is of high resolution which necessitates a model that enhances the image quality of the conventional 500 ppi legacy fingerprints preserving the fine details. To find a solution for recovering pore information from low-resolution fingerprints, we adopt a joint learning-based approach that combines both super-resolution and pore detection networks. Our modified single image Super-Resolution Generative Adversarial Network (SRGAN) framework helps to reliably reconstruct high-resolution fingerprint samples from low-resolution ones assisting the pore detection network to identify pores with a high accuracy. The network jointly learns a distinctive feature representation from a real low-resolution fingerprint sample and successfully synthesizes a high-resolution sample from it. To add discriminative information and uniqueness for all the subjects, we have integrated features extracted from a deep fingerprint verifier with the SRGAN quality discriminator. We also add ridge reconstruction loss, utilizing ridge patterns to make the best use of extracted features. Our proposed method solves the recognition problem by improving the quality of fingerprint images. High recognition accuracy of the synthesized samples that is close to the accuracy achieved using the original high-resolution images validate the effectiveness of our proposed model.

An Unsupervised Approach towards Varying Human Skin Tone Using Generative Adversarial Networks

Debapriya Roy, Diganta Mukherjee, Bhabatosh Chanda

Responsive image

Auto-TLDR; Unsupervised Skin Tone Change Using Augmented Reality Based Models

Slides Poster Similar

With the increasing popularity of augmented and virtual reality, retailers are now more focusing towards customer satisfaction to increase the amount of sales. Although augmented reality is not a new concept but it has gained its much needed attention over the past few years. Our present work is targeted towards this direction which may be used to enhance user experience in various virtual and augmented reality based applications. We propose a model to change skin tone of person. Given any input image of a person or a group of persons with some value indicating the desired change of skin color towards fairness or darkness, this method can change the skin tone of the persons in the image. This is an unsupervised method and also unconstrained in terms of pose, illumination, number of persons in the image etc. The goal of this work is to reduce the complexity in terms of time and effort which is generally needed for changing the skin tone using existing applications by professionals or novice. Rigorous experiments shows the efficacy of this method in terms of synthesizing perceptually convincing outputs.