LFIEM: Lightweight Filter-Based Image Enhancement Model

Oktai Tatanov, Aleksei Samarin

Auto-TLDR; Image Retouching Using Semi-supervised Learning for Mobile Devices

Photo retouching features are being integrated into a growing number of mobile applications. Current learning-based approaches enhance images using large convolutional neural network-based models, where the result is taken directly from the network output. This can introduce artifacts into the resulting images, produces models that are hard to interpret, and can be computationally expensive. In this paper, we explore a filter-based approach to overcome these problems. In designing our model, we focus on a lightweight solution suitable for mobile devices. A significant performance increase was achieved by applying consistency regularization, a technique used in semi-supervised learning. The proposed model can run on mobile devices and achieves results competitive with known models.
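To make the two ideas in this abstract concrete, here is a minimal sketch under stated assumptions: a model predicts a few global filter parameters instead of output pixels, and a consistency term penalizes disagreement between predictions for two perturbed views of the same image. The exposure and contrast filters, their parameterization, and the names `apply_filters` and `consistency_loss` are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical sketch of filter-based enhancement (not the LFIEM code).
# A network would predict the filter parameters; here they are passed in directly.
# Pixels are (r, g, b) tuples with channel values in [0, 1].

def clamp(value):
    """Keep a channel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


def apply_filters(pixels, exposure, contrast):
    """Apply an exposure gain (in stops), then a contrast stretch around mid-grey."""
    out = []
    for pixel in pixels:
        adjusted = []
        for c in pixel:
            c = c * (2.0 ** exposure)        # exposure: +1 stop doubles intensity
            c = (c - 0.5) * contrast + 0.5   # contrast: scale distance from 0.5
            adjusted.append(clamp(c))
        out.append(tuple(adjusted))
    return out


def consistency_loss(params_a, params_b):
    """Mean squared difference between the filter parameters predicted for two
    perturbed views of the same image; it needs no ground-truth target."""
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b)) / len(params_a)


# Usage: brighten by one stop with neutral contrast.
print(apply_filters([(0.2, 0.4, 0.6)], exposure=1.0, contrast=1.0))
```

Because the output is always produced by simple deterministic filters, it cannot contain the pixel-level artifacts a generator network might introduce, and the predicted parameters are directly interpretable. The consistency term can be computed on unlabeled images, which is what makes it usable for semi-supervised training.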

Similar papers

CURL: Neural Curve Layers for Global Image Enhancement

Sean Moran, Steven Mcdonagh, Greg Slabaugh

Auto-TLDR; CURL: Neural CURve Layers for Image Enhancement

We present a novel approach to adjusting global image properties such as colour, saturation, and luminance using human-interpretable image enhancement curves, inspired by the Photoshop curves tool. Our method, dubbed neural CURve Layers (CURL), is designed as a multi-colour space neural retouching block trained jointly in three different colour spaces (HSV, CIELab, RGB), guided by a novel multi-colour space loss. The curves are fully differentiable and are trained end-to-end for different computer vision problems, including photo enhancement (RGB-to-RGB) and image formation as part of the image signal processing pipeline (RAW-to-RGB). To demonstrate the effectiveness of CURL, we combine this global image transformation block with a pixel-level (local) multi-scale encoder-decoder backbone network. In an extensive experimental evaluation, we show that CURL produces better image quality than recently proposed deep learning approaches on both objective and perceptual metrics, setting new state-of-the-art performance on multiple public datasets.
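A curves-style adjustment can be sketched as a piecewise-linear mapping defined by a handful of knot values. The sketch below assumes uniformly spaced knots on [0, 1] and a direct input-to-output mapping; CURL's actual curve parameterization may differ, and `apply_curve` is a hypothetical name.

```python
def apply_curve(value, knots):
    """Map an intensity in [0, 1] through a piecewise-linear curve.

    knots[k] is the curve output at input k / (len(knots) - 1), so the knots
    are uniformly spaced over [0, 1] (an assumption of this sketch).
    Requires at least two knots.
    """
    k = len(knots) - 1
    x = min(1.0, max(0.0, value)) * k      # position measured in knot intervals
    i = min(int(x), k - 1)                 # left knot of the active segment
    t = x - i                              # fractional position inside the segment
    return knots[i] + t * (knots[i + 1] - knots[i])


def apply_curve_to_channel(channel, knots):
    """Apply the same curve to every value of one colour channel."""
    return [apply_curve(v, knots) for v in channel]


# Usage: identity knots leave values unchanged; raising the middle knot brightens midtones.
print(apply_curve_to_channel([0.0, 0.25, 0.5, 1.0], [0.0, 0.5, 1.0]))
print(apply_curve_to_channel([0.0, 0.25, 0.5, 1.0], [0.0, 0.75, 1.0]))
```

The output is linear in the knot values within each segment, so gradients with respect to the knots are simple, which is one reason such curves are convenient to learn end-to-end while staying easy for a person to read.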

Thermal Image Enhancement Using Generative Adversarial Network for Pedestrian Detection

Mohamed Amine Marnissi, Hajer Fradi, Anis Sahbani, Najoua Essoukri Ben Amara

Auto-TLDR; Improving Visual Quality of Infrared Images for Pedestrian Detection Using Generative Adversarial Network

Infrared imaging has recently played an important role in a wide range of applications, including surveillance, robotics and night vision. However, infrared cameras often suffer from limitations, chiefly low contrast and blurred details. These problems make target objects harder to observe in infrared images, which can limit the feasibility of various infrared imaging applications. In this paper, we focus mainly on pedestrian detection in thermal images. In particular, we emphasize the need to enhance the visual quality of images before performing the detection step. To address this, we propose a novel thermal enhancement architecture based on a Generative Adversarial Network, composed of two modules, contrast enhancement and denoising, followed by a post-processing step for edge restoration that improves the overall quality. The effectiveness of the proposed architecture is assessed with visual quality metrics, and it yields better results than both the original thermal images and other existing enhancement methods. These experiments were conducted on a subset of the KAIST dataset. On the same dataset, the proposed enhancement architecture is shown to improve detection performance by a significant margin with the YOLOv3 detector.

Automatical Enhancement and Denoising of Extremely Low-Light Images

Yuda Song, Yunfang Zhu, Xin Du

Auto-TLDR; INSNet: Illumination and Noise Separation Network for Low-Light Image Restoring

Deep convolutional neural network (DCNN) based methodologies have recently achieved remarkable performance on various low-level vision tasks. Restoring images captured at night is one of the trickiest low-level vision tasks because of their high noise level and low intensity. We propose a DCNN-based methodology, the Illumination and Noise Separation Network (INSNet), which performs both denoising and enhancement on these extremely low-light images. INSNet fully utilizes global-aware and local-aware features through a modified network structure and image sampling scheme. Instead of a carefully designed complex network, our methodology only adds a bypass network to an existing network. Nevertheless, it boosts the quality of recovered images dramatically while increasing the computational cost by less than 0.1%. Even without any manual settings, INSNet can stably restore extremely low-light images to the desired high quality.

Dynamic Low-Light Image Enhancement for Object Detection Via End-To-End Training

Haifeng Guo, Yirui Wu, Tong Lu

Auto-TLDR; Object Detection using Low-Light Image Enhancement for End-to-End Training

Object detection based on convolutional neural networks is an active research topic in computer vision. Illumination has a great impact on object detection: under low-light conditions, detection performance declines sharply. Using a low-light image enhancement technique as a pre-processing step can improve image quality and yield better detection results. However, due to the complexity of low-light environments, existing enhancement methods may have negative effects on some samples, which makes it difficult to improve overall detection performance in low-light conditions. In this paper, our goal is to use image enhancement to improve object detection performance rather than perceptual quality for humans. We propose a novel framework that combines low-light enhancement and object detection for end-to-end training. The framework can dynamically select different enhancement subnetworks for each sample to improve the performance of the detector. Our method consists of two stages: an enhancement stage and a detection stage. The enhancement stage dynamically enhances the low-light images under the supervision of several enhancement methods and outputs corresponding weights. During the detection stage, the weights provide information on object classification to generate high-quality region proposals, which in turn yield accurate detection. Our experiments show promising results, indicating that the proposed method can significantly improve detection performance in low-light environments.

SIDGAN: Single Image Dehazing without Paired Supervision

Pan Wei, Xin Wang, Lei Wang, Ji Xiang, Zihan Wang

Auto-TLDR; DehazeGAN: An End-to-End Generative Adversarial Network for Image Dehazing

Single image dehazing is challenging without knowledge of the scene airlight and transmission map. Most existing dehazing algorithms estimate key parameters based on hand-designed priors or statistics, which may be invalid in some scenarios. Although deep learning-based dehazing methods provide an effective solution, most of them rely on paired training datasets, which are prohibitively difficult to collect in the real world. In this paper, we propose an effective end-to-end generative adversarial network for image dehazing, named DehazeGAN. The proposed DehazeGAN adopts a U-Net architecture with a novel color-consistency loss derived from the dark channel prior and perceptual loss, and it can be trained in an unsupervised fashion without paired synthetic datasets. We create a RealHaze dataset for network training, including 4,000 outdoor hazy images and 4,000 haze-free images. Extensive experiments demonstrate that DehazeGAN achieves better performance than existing state-of-the-art methods on both synthetic and real-world datasets in terms of PSNR, SSIM, and subjective visual experience.
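The dark channel prior mentioned above observes that in most haze-free outdoor patches at least one colour channel has very low intensity, so haze shows up as a brighter "dark channel". The abstract does not say how the color-consistency loss uses it; the sketch below only computes the dark channel itself (per-pixel minimum over channels, then minimum over a square window), with a hypothetical function name and window size.

```python
def dark_channel(image, radius=1):
    """Compute the dark channel of an RGB image.

    image: list of rows, each row a list of (r, g, b) tuples in [0, 1].
    For each pixel, returns the minimum channel value over a
    (2 * radius + 1) x (2 * radius + 1) window, clipped at the borders.
    """
    h, w = len(image), len(image[0])
    min_rgb = [[min(px) for px in row] for row in image]  # min over colour channels
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [min_rgb[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(min(window))                        # min over the local patch
        out.append(row)
    return out


# Usage: one saturated (low blue) pixel darkens the dark channel of its neighbourhood.
img = [[(0.9, 0.8, 0.1), (0.9, 0.9, 0.9)],
       [(0.9, 0.9, 0.9), (0.9, 0.9, 0.9)]]
print(dark_channel(img, radius=0))
print(dark_channel(img, radius=1))
```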

Towards Artifacts-Free Image Defogging

Gabriele Graffieti, Davide Maltoni

Auto-TLDR; CurL-Defog: Learning Based Defogging with CycleGAN and HArD

In this paper, we present a novel defogging technique, named CurL-Defog, aimed at minimizing the creation of artifacts. Most learning-based defogging approaches rely on paired data (i.e., the same images with and without fog), where fog is artificially added to clear images: this often provides good results on mildly fogged images but does not generalize well to difficult real cases. On the other hand, models trained with real unpaired data (e.g., CycleGAN) can provide visually impressive results but often produce unwanted artifacts. We propose a curriculum learning strategy coupled with an enhanced CycleGAN model to reduce the number of produced artifacts while maintaining state-of-the-art performance in terms of contrast enhancement and image reconstruction. We also introduce a new metric, called HArD (Hazy Artifact Detector), to numerically quantify the amount of artifacts in defogged images, thus avoiding tedious and subjective manual inspection of the results. The proposed approach compares favorably with state-of-the-art techniques on both real and synthetic datasets.

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

Ruojing Wang, Zitang Sun, Sei-Ichiro Kamata, Weili Chen

Auto-TLDR; Adaptive Image Compression using GAN based Semantic-Perceptual Residual Compensation

Image compression is a basic task in image processing. In this paper, we present an adaptive image compression algorithm that relies on GAN-based semantic-perceptual residual compensation and offers visually pleasing reconstruction at low bitrates. Our method adopts a U-shaped encoding and decoding structure, accompanied by a well-designed dense residual connection with a strip pooling module, to improve the original auto-encoder. Moreover, we introduce adversarial learning by adding a discriminator, thus constructing a complete GAN. To improve coding efficiency, we design an adaptive semantic-perceptual residual compensation block based on the Grad-CAM algorithm. To improve the quantizer, we embed soft quantization, which partly solves the problem that hard quantization blocks gradient flow during backpropagation. In addition, we use the recent FLIF lossless compression algorithm and the BPG compression algorithm to compress the image further. Experimental results in terms of PSNR and MS-SSIM demonstrate that the proposed approach outperforms current state-of-the-art image compression methods.
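The abstract does not specify its soft-quantization scheme. One common formulation, assumed here, replaces hard rounding to the nearest quantization center with a softmax-weighted average of the centers: it is differentiable in the input, and as the sharpness parameter `sigma` grows it approaches hard assignment. The function names and the default `sigma` are hypothetical.

```python
import math


def soft_quantize(x, centers, sigma=10.0):
    """Differentiable stand-in for snapping x to the nearest center.

    Each center gets weight softmax(-sigma * (x - c)^2); the result is the
    weighted average of the centers.
    """
    logits = [-sigma * (x - c) ** 2 for c in centers]
    m = max(logits)                              # subtract max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, centers)) / total


def hard_quantize(x, centers):
    """Non-differentiable nearest-center quantization used at inference time."""
    return min(centers, key=lambda c: abs(x - c))


# Usage: with a large sigma the soft value is close to the hard one.
centers = [0.0, 1.0, 2.0]
print(hard_quantize(0.9, centers), soft_quantize(0.9, centers, sigma=1000.0))
```

A typical training setup uses the soft value in the backward pass and the hard value in the forward pass, so the encoder still receives useful gradients.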

Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

Xiaoyu Xiang, Qian Lin, Jan Allebach

Auto-TLDR; A Context-Aware Joint CAR and SR Neural Network for High-Resolution Text Recognition and Face Detection

Due to limited bandwidth and storage space, digital images are usually downscaled and compressed when transmitted over networks, resulting in loss of detail and jarring artifacts that can lower the performance of high-level visual tasks. In this paper, we aim to generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor by exploring the joint compression artifacts reduction (CAR) and super-resolution (SR) tasks. First, we propose a context-aware joint CAR and SR neural network (CAJNN) that integrates both local and non-local features to solve CAR and SR in one stage. Finally, a deep reconstruction network is adopted to predict high-quality, high-resolution images. Evaluation on CAR and SR benchmark datasets shows that our CAJNN model outperforms previous methods and also requires 26.2% less runtime. Based on this model, we address two critical challenges in high-level computer vision: optical character recognition of low-resolution text and extremely tiny face detection. We demonstrate that CAJNN can serve as an effective image preprocessing method and improves the accuracy of real-scene text recognition (from 85.30% to 85.75%) and the average precision of tiny face detection (from 0.317 to 0.611).

A NoGAN Approach for Image and Video Restoration and Compression Artifact Removal

Mameli Filippo, Marco Bertini, Leonardo Galteri, Alberto Del Bimbo

Auto-TLDR; Deep Neural Network for Image and Video Compression Artifact Removal and Restoration

Lossy image and video compression algorithms introduce several different types of visual artifacts that reduce the visual quality of the compressed media; the higher the compression rate, the stronger these artifacts. In this work, we describe an approach for improving the visual quality of compressed images and videos at presentation time, so as to obtain the benefits of fast data transfer and reduced data storage while enjoying a visual quality that could otherwise be obtained only by reducing the compression rate. To obtain this result, we propose to use a deep neural network trained using the NoGAN approach, adapting the popular DeOldify architecture used for colorization. We show how the proposed method can be applied to both image and video compression artifact removal and restoration.

Explorable Tone Mapping Operators

Su Chien-Chuan, Yu-Lun Liu, Hung Jin Lin, Ren Wang, Chia-Ping Chen, Yu-Lin Chang, Soo-Chang Pei

Auto-TLDR; Learning-based multimodal tone-mapping from HDR images

Tone-mapping plays an essential role in high dynamic range (HDR) imaging. It aims to preserve the visual information of HDR images in a medium with a limited dynamic range. Although many methods have been proposed to produce tone-mapped results from HDR images, most of them can only perform tone-mapping in a single pre-designed way. However, judgments of tone-mapping quality are subjective and vary from person to person, and the preferred tone-mapping style also differs from application to application. In this paper, we propose a learning-based multimodal tone-mapping method that not only achieves excellent visual quality but also explores style diversity. Based on the BicycleGAN [1] framework, the proposed method can provide a variety of expert-level tone-mapped results by manipulating different latent codes. Finally, we show that the proposed method performs favorably against state-of-the-art tone-mapping algorithms both quantitatively and qualitatively.

Local Facial Attribute Transfer through Inpainting

Ricard Durall, Franz-Josef Pfreundt, Janis Keuper

Auto-TLDR; Attribute Transfer Inpainting Generative Adversarial Network

The term attribute transfer refers to the task of altering images in such a way that the semantic interpretation of a given input image is shifted in an intended direction, quantified by semantic attributes. Prominent example applications are photo-realistic changes of facial features and expressions, like changing the hair color, adding a smile or enlarging the nose, or altering the entire context of a scene, like transforming a summer landscape into a winter panorama. Recent advances in attribute transfer are mostly based on generative deep neural networks, using various techniques to manipulate images in the latent space of the generator. In this paper, we present a novel method for the common sub-task of local attribute transfer, where only parts of a face have to be altered to achieve semantic changes (e.g., removing a mustache). In contrast to previous methods, where such local changes have been implemented by generating new (global) images, we propose to formulate local attribute transfer as an inpainting problem. By removing and regenerating only parts of images, our Attribute Transfer Inpainting Generative Adversarial Network (ATI-GAN) is able to use local context information to focus on the attributes while keeping the background unmodified, resulting in visually sound results.

Video Lightening with Dedicated CNN Architecture

Li-Wen Wang, Wan-Chi Siu, Zhi-Song Liu, Chu-Tak Li, P. K. Daniel Lun

Auto-TLDR; VLN: Video Lightening Network for Driving Assistant Systems in Dark Environment

Darkness brings us uncertainty, worry and low confidence. This problem applies not only to people walking on a dark evening but also to drivers on roads with very dim lighting or none at all. To address this problem, we propose a new CNN structure named the Video Lightening Network (VLN), which treats low-light enhancement as a residual learning task. It is useful as a reference for indirectly lightening the environment, or for vision-based application systems such as driving assistance systems. The VLN consists of several Lightening Back-Projection (LBP) and Temporal Aggregation (TA) blocks. Each LBP block enhances the low-light frame by domain transfer learning that iteratively maps the frame between the low- and normal-light domains. A TA block handles the motion among neighboring frames by investigating their spatial and temporal relationships. Several TAs work in a multi-scale way, compensating for motion at different levels. The proposed architecture provides consistent enhancement across different illumination levels, which significantly increases visual quality even in extremely dark environments. Extensive experimental results show that the proposed approach outperforms other methods under both objective and subjective metrics.

Cycle-Consistent Adversarial Networks and Fast Adaptive Bi-Dimensional Empirical Mode Decomposition for Style Transfer

Elissavet Batziou, Petros Alvanitopoulos, Konstantinos Ioannidis, Ioannis Patras, Stefanos Vrochidis, Ioannis Kompatsiaris

Auto-TLDR; FABEMD: Fast and Adaptive Bidimensional Empirical Mode Decomposition for Style Transfer on Images

Recently, research has shown the potential of Cycle-Consistent Adversarial Networks (CycleGAN) for style transfer. In CycleGAN, the cycle-consistency loss measures the difference between the original images and the reconstructed ones in both directions, forward and backward. In this work, we propose combining CycleGAN with Fast and Adaptive Bidimensional Empirical Mode Decomposition (FABEMD) to perform style transfer on images. In the proposed approach, the cycle-consistency loss is modified to include the differences between the extracted Bidimensional Intrinsic Mode Function (BIMF) images. Instead of estimating the pixel-to-pixel difference between the produced and input images, FABEMD is applied and the extracted BIMFs are used in the computation of the total cycle loss. This enriches the total loss with content-to-content and style-to-style comparisons by connecting spatial information to frequency components. The experimental results show that the proposed method is efficient and produces results qualitatively comparable to state-of-the-art methods.
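For reference, the standard pixel-wise cycle-consistency loss that this work modifies can be sketched as below: G maps domain X to Y, F maps Y back to X, and the loss is the mean L1 distance of each reconstruction from its input. The FABEMD variant would compare BIMFs instead of raw pixels and is not implemented here; images are flattened lists of floats, and the function names are illustrative.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(xs, ys, G, F):
    """Pixel-wise CycleGAN cycle loss, averaged over each batch.

    Forward term: compare F(G(x)) with x for every x in domain X.
    Backward term: compare G(F(y)) with y for every y in domain Y.
    """
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Usage with toy "generators": when F exactly inverts G, the loss is zero.
G = lambda v: [2.0 * t for t in v]
F = lambda v: [t / 2.0 for t in v]
print(cycle_consistency_loss([[0.1, 0.2]], [[0.4]], G, F))
```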

LiNet: A Lightweight Network for Image Super Resolution

Armin Mehri, Parichehr Behjati Ardakani, Angel D. Sappa

Auto-TLDR; LiNet: A Compact Dense Network for Lightweight Super Resolution

This paper proposes a new lightweight network, LiNet, that improves efficiency in lightweight super resolution and performs approximately like very large and costly networks while using far fewer network parameters and operations. The proposed architecture allows the network to learn more abstract properties by avoiding low-level information via multiple links. LiNet introduces a Compact Dense Module, which contains a set of inner and outer blocks, to efficiently extract meaningful information, better leverage multi-level representations before the upsampling stage, and allow efficient information and gradient flow within the network. Experiments on benchmark datasets show that LiNet achieves favorable performance against lightweight state-of-the-art methods.

Hierarchically Aggregated Residual Transformation for Single Image Super Resolution

Zejiang Hou, Sy Kung

Auto-TLDR; HARTnet: Hierarchically Aggregated Residual Transformation for Multi-Scale Super-resolution

Visual patterns usually appear at different scales/sizes in natural images. Multi-scale feature representation is of great importance for the single-image super-resolution (SISR) task to reconstruct image objects at different scales. However, this characteristic has rarely been considered by CNN-based SISR methods. In this work, we propose a novel building block, the hierarchically aggregated residual transformation (HART), to achieve multi-scale feature representation in each layer of the network. Within each HART block, we connect multiple convolutions in a hierarchical residual-like manner, which greatly expands the range of effective receptive fields and helps to detect image features at different scales. To understand the proposed HART block theoretically, we recast SISR as an optimal control problem and show that HART effectively approximates the classical 4th-order Runge-Kutta method, which has the merit of a small local truncation error for numerically solving ordinary differential equations. By cascading the proposed HART blocks, we establish our high-performing HARTnet. Compared with existing state-of-the-art SR methods (including those on the NTIRE 2019 SR Challenge leaderboard), the proposed HARTnet demonstrates consistent PSNR/SSIM improvements on various benchmark datasets under different degradation models. Moreover, HARTnet can efficiently restore more faithful high-resolution images than competing SR methods (cf. Figure 1).
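The abstract does not show how HART maps onto Runge-Kutta, so the sketch below is only the textbook method for reference. In the common reading of residual networks as ODE solvers, a plain residual block `x + f(x)` resembles one explicit Euler step, while classical RK4 combines four evaluations of `f` per step for a much smaller truncation error. The function names are illustrative.

```python
def euler_step(f, t, y, h):
    """One explicit Euler step; same form as a residual block y + h * f(y)."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, y0, t0, t1, n):
    """Apply a one-step method n times with a fixed step size."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y


# Usage: dy/dt = y with y(0) = 1, so y(1) = e = 2.71828...
growth = lambda t, y: y
print(integrate(euler_step, growth, 1.0, 0.0, 1.0, 10))
print(integrate(rk4_step, growth, 1.0, 0.0, 1.0, 10))
```

With ten steps, Euler lands near 2.594, while RK4 is within a few millionths of e, which illustrates the small truncation error the abstract refers to.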

RSAN: Residual Subtraction and Attention Network for Single Image Super-Resolution

Shuo Wei, Xin Sun, Haoran Zhao, Junyu Dong

Auto-TLDR; RSAN: Residual subtraction and attention network for super-resolution

Single-image super-resolution (SISR) aims to recover a high-resolution image from its low-resolution version. Recently, deep learning-based methods have played a significant role in the super-resolution field due to their effectiveness and efficiency. However, most SISR methods neglect the differing importance of feature map channels. Moreover, they cannot eliminate redundant noise, which blurs the output image. In this paper, we propose the residual subtraction and attention network (RSAN) for powerful feature expression and channel-importance learning. More specifically, RSAN first applies a redundancy removal module that learns noise information in the feature map and subtracts it through residual learning. It then introduces a channel attention module to amplify high-frequency information and suppress the weights of ineffective channels. Experimental results on extensive public benchmarks demonstrate that RSAN achieves significant improvement over previous SISR methods in terms of both quantitative metrics and visual quality.
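The abstract does not detail RSAN's channel attention module. A widely used design, assumed here, is the squeeze-and-excitation pattern: global average pooling gives one descriptor per channel, a small bottleneck of two fully connected layers produces a score per channel, and a sigmoid gate rescales each channel. Weights are passed in as plain nested lists, and all names are hypothetical.

```python
import math


def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel attention (assumed form).

    feature_maps: list of C channels, each a flat list of activations.
    w1: (C // r) x C weight matrix, w2: C x (C // r) weight matrix.
    Returns (rescaled feature maps, per-channel scores in (0, 1)).
    """
    # Squeeze: global average pooling, one scalar per channel.
    z = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    scores = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
              for row in w2]
    # Rescale: multiply every activation of a channel by that channel's score.
    rescaled = [[s * a for a in ch] for s, ch in zip(scores, feature_maps)]
    return rescaled, scores


# Usage: two channels; the weights favour channel 0 and suppress channel 1.
maps, scores = channel_attention([[1.0, 3.0], [2.0, 2.0]], [[0.5, 0.5]], [[10.0], [-10.0]])
print(scores)
```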

Phase Retrieval Using Conditional Generative Adversarial Networks

Tobias Uelwer, Alexander Oberstraß, Stefan Harmeling

Auto-TLDR; Conditional Generative Adversarial Networks for Phase Retrieval

In this paper, we propose the application of conditional generative adversarial networks to solve various phase retrieval problems. We show that including knowledge of the measurement process at training time leads to an optimization at test time that is more robust to initialization than existing approaches involving generative models. In addition, conditioning the generator network on the measurements enables us to achieve much more detailed results. We empirically demonstrate that these advantages provide meaningful solutions to the Fourier and compressive phase retrieval problems, and that our method outperforms well-established projection-based methods as well as existing methods based on neural networks. Like other deep learning methods, our approach is very robust to noise and can therefore be very useful for real-world applications.

VITON-GT: An Image-Based Virtual Try-On Model with Geometric Transformations

Matteo Fincato, Federico Landi, Marcella Cornia, Fabio Cesari, Rita Cucchiara

Auto-TLDR; VITON-GT: An Image-based Virtual Try-on Architecture for Fashion Catalogs

The wide spread of online shopping has led computer vision researchers to develop different solutions for the fashion domain, to potentially improve the online user experience and the efficiency of preparing fashion catalogs. Among them, image-based virtual try-on has recently attracted a lot of attention, resulting in several architectures that can generate a new image of a person wearing an input try-on garment in a plausible and realistic way. In this paper, we present VITON-GT, a new model for virtual try-on that generates high-quality and photo-realistic images thanks to multiple geometric transformations. In particular, our model is composed of a two-stage geometric transformation module that performs two different projections on the input garment, and a transformation-guided try-on module that synthesizes the new image. We experimentally validate the proposed solution on the most common dataset for this task, containing mainly t-shirts, and we demonstrate its effectiveness compared to different baselines and previous methods. Additionally, we assess the generalization capabilities of our model on a new set of fashion items composed of upper-body clothes from different categories. To the best of our knowledge, we are the first to test virtual try-on architectures in this challenging experimental setting.

Free-Form Image Inpainting Via Contrastive Attention Network

Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He

Auto-TLDR; Self-supervised Siamese inference for image inpainting

Most deep learning based image inpainting approaches adopt autoencoders or their variants to fill missing regions in images. Encoders are usually used to learn powerful representational spaces, which are important for dealing with sophisticated learning tasks. Specifically, in the image inpainting task, masks of any shape can appear anywhere in images (i.e., free-form masks), forming complex patterns. It is difficult for encoders to capture such powerful representations in this complex situation. To tackle this problem, we propose a self-supervised Siamese inference network to improve robustness and generalization. Moreover, the restored image usually cannot be harmoniously integrated into the existing content, especially in the boundary area. To address this problem, we propose a novel Dual Attention Fusion module (DAF), which can combine the restored and known regions more smoothly and be inserted into decoder layers in a plug-and-play way. DAF is developed not only to adaptively rescale channel-wise features by taking interdependencies between channels into account but also to force deep convolutional neural networks (CNNs) to focus more on unknown regions. In this way, the unknown region is naturally filled from the outside in. Qualitative and quantitative experiments on multiple datasets, including facial and natural datasets (i.e., Celeb-HQ, Paris StreetView, Places2 and ImageNet), demonstrate that our proposed method outperforms state-of-the-art methods in generating high-quality inpainting results.

Continuous Learning of Face Attribute Synthesis

Ning Xin, Shaohui Xu, Fangzhe Nan, Xiaoli Dong, Weijun Li, Yuanzhou Yao

Auto-TLDR; Continuous Learning for Face Attribute Synthesis

Generative adversarial networks (GANs) have shown great strength in the face attribute synthesis task. However, existing methods are very limited when it comes to extending to new attributes. To overcome the limitations of a single network in synthesizing new attributes, this work proposes a continuous learning method for face attribute synthesis. First, the feature vector of the input image is extracted, and attribute direction regression is performed in the feature space to obtain the axes of different attributes. The feature vector is then linearly guided along an axis so that images with target attributes can be synthesized by the decoder. Finally, to make the network capable of continuous learning, an orthogonal direction modification module is used to extend the network to newly added attributes. Experimental results show that the proposed method can endow a single network with the ability to learn attributes continuously, and that its synthetic attributes have higher accuracy than those produced by current state-of-the-art methods.
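The abstract does not spell out the orthogonal direction modification module. One plausible reading, assumed here for illustration only, is that a new attribute axis is made orthogonal to the already learned axes (Gram-Schmidt), so that moving a feature vector along it leaves its projections onto existing attributes unchanged. The functions below are hypothetical and assume the existing axes are orthonormal.

```python
def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(new_dir, existing_dirs):
    """Gram-Schmidt: remove the components of new_dir along the existing
    (assumed orthonormal) attribute axes, then normalize the remainder."""
    v = list(new_dir)
    for d in existing_dirs:
        p = dot(v, d)
        v = [vi - p * di for vi, di in zip(v, d)]
    norm = dot(v, v) ** 0.5
    if norm == 0:
        raise ValueError("new direction lies in the span of the existing axes")
    return [vi / norm for vi in v]


def edit_feature(feature, direction, strength):
    """Linearly move a feature vector along a unit attribute axis."""
    return [f + strength * d for f, d in zip(feature, direction)]


# Usage: a new axis that overlaps an existing one is made orthogonal to it.
smile_axis = [1.0, 0.0, 0.0]
new_axis = orthogonalize([1.0, 1.0, 0.0], [smile_axis])
print(new_axis, dot(new_axis, smile_axis))
```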

Super-Resolution Guided Pore Detection for Fingerprint Recognition

Syeda Nyma Ferdous, Ali Dabouei, Jeremy Dawson, Nasser M. Nasarabadi

Auto-TLDR; Super-Resolution Generative Adversarial Network for Fingerprint Recognition Using Pore Features

The performance of fingerprint recognition algorithms relies substantially on fine features extracted from fingerprints. Apart from minutiae and ridge patterns, pore features have proven to be usable for fingerprint recognition. Although features from minutiae and ridge patterns are quite attainable from low-resolution images, using pore features is practical only if the fingerprint image is of high resolution, which necessitates a model that enhances the image quality of conventional 500 ppi legacy fingerprints while preserving fine details. To recover pore information from low-resolution fingerprints, we adopt a joint learning-based approach that combines super-resolution and pore detection networks. Our modified single image Super-Resolution Generative Adversarial Network (SRGAN) framework reliably reconstructs high-resolution fingerprint samples from low-resolution ones, helping the pore detection network identify pores with high accuracy. The network jointly learns a distinctive feature representation from a real low-resolution fingerprint sample and successfully synthesizes a high-resolution sample from it. To add discriminative information and uniqueness for all subjects, we integrate features extracted from a deep fingerprint verifier with the SRGAN quality discriminator. We also add a ridge reconstruction loss, utilizing ridge patterns to make the best use of extracted features. Our proposed method solves the recognition problem by improving the quality of fingerprint images. The high recognition accuracy of the synthesized samples, close to the accuracy achieved using the original high-resolution images, validates the effectiveness of our proposed model.

The Effect of Image Enhancement Algorithms on Convolutional Neural Networks

José A. Rodríguez-Rodríguez, Miguel A. Molina-Cabello, Rafaela Benítez-Rochel, Ezequiel López-Rubio

Responsive image

Auto-TLDR; Optimization of Convolutional Neural Networks for Image Classification

Convolutional Neural Networks (CNNs) are widely used due to their high performance in many computer vision tasks. In particular, image classification is one of the fields where CNNs are employed with success. However, images can be heavily affected by degradations such as noise or poor illumination, so image enhancement algorithms have been developed to improve image quality. In this work, we analyze the impact that brightness and contrast enhancement techniques have on the performance achieved by CNNs in classification tasks. More specifically, several well-known CNN architectures such as AlexNet and GoogLeNet, and image contrast enhancement techniques such as Gamma Correction and Logarithm Transformation, are studied. Several experiments have been carried out, and the qualitative and quantitative results obtained are reported.
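Both enhancement techniques named above are point operations that remap each pixel intensity independently. The following is a minimal sketch in Python, assuming intensities already normalized to [0, 1]; the abstract does not state the parameters used in the experiments, so `gamma` and the scaling constant `c` are illustrative choices.

```python
import math


def gamma_correction(pixels, gamma):
    """Apply s = r ** gamma to intensities normalized to [0, 1].

    gamma < 1 brightens dark regions; gamma > 1 darkens them.
    """
    return [r ** gamma for r in pixels]


def log_transform(pixels):
    """Apply s = c * log(1 + r), with c = 1 / log(2) so that [0, 1] maps onto [0, 1]."""
    c = 1.0 / math.log(2.0)
    return [c * math.log1p(r) for r in pixels]


row = [0.0, 0.25, 0.5, 1.0]
print(gamma_correction(row, 0.5))
print(log_transform(row))
```

Both curves keep 0 and 1 fixed and expand the dark end of the range, which is why they are typically used to brighten underexposed images before classification.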

Galaxy Image Translation with Semi-Supervised Noise-Reconstructed Generative Adversarial Networks

Qiufan Lin, Dominique Fouchez, Jérôme Pasquet

Auto-TLDR; Semi-supervised Image Translation with Generative Adversarial Networks Using Paired and Unpaired Images

Image-to-image translation with deep neural networks, particularly with Generative Adversarial Networks (GANs), is one of the most powerful methods for simulating astronomical images. However, current work is limited to supervised translation with paired images, and there has been little discussion of reconstructing the noise background that encodes instrumental and observational effects. These limitations might be harmful for subsequent scientific applications in astrophysics. Therefore, we aim to develop methods that use unpaired images and preserve noise characteristics in image translation. In this work, we propose a two-way image translation model using GANs that exploits both paired and unpaired images in a semi-supervised manner, and we introduce a noise emulating module that is able to learn and reconstruct noise characterized by high-frequency features. By experimenting on multi-band galaxy images from the Sloan Digital Sky Survey (SDSS) and the Canada France Hawaii Telescope Legacy Survey (CFHT), we show that our method recovers global and local properties effectively and outperforms benchmark image translation models. To the best of our knowledge, this work is the first attempt to apply semi-supervised methods and noise reconstruction techniques in astrophysical studies.

Deep Universal Blind Image Denoising

Jae Woong Soh, Nam Ik Cho

Auto-TLDR; Image Denoising with Deep Convolutional Neural Networks

Image denoising is an essential part of many image processing and computer vision tasks due to the inevitable noise corruption during image acquisition. Traditionally, many researchers have investigated image priors for denoising within the Bayesian perspective, based on image properties and statistics. Recently, deep convolutional neural networks (CNNs) have shown great success in image denoising by incorporating large-scale synthetic datasets. However, both approaches have pros and cons. While deep CNNs are powerful for removing noise with known statistics, they tend to lack flexibility and practicality for blind and real-world noise. Moreover, they cannot easily employ explicit priors. On the other hand, traditional non-learning methods can involve explicit image priors, but they require considerable computation time and cannot exploit large-scale external datasets. In this paper, we present a CNN-based method that leverages the advantages of both approaches based on the Bayesian perspective. Concretely, we divide the blind image denoising problem into sub-problems and conquer each inference problem separately. Since CNNs are powerful tools for inference, our method is rooted in CNNs, and we propose a novel network design for efficient inference. With our proposed method, we can successfully remove blind and real-world noise with a universal CNN that has a moderate number of parameters.

Deep Fusion of RGB and NIR Paired Images Using Convolutional Neural Networks

琳 梅, Cheolkon Jung

Auto-TLDR; Deep Fusion of RGB and NIR paired images in low light condition using convolutional neural networks

In low-light conditions, captured color (RGB) images are heavily degraded by noise and suffer severe texture loss. In this paper, we propose deep fusion of RGB and NIR paired images in low-light conditions using convolutional neural networks (CNNs). The proposed deep fusion network consists of three independent sub-networks: denoising, enhancing, and fusion. We build a denoising sub-network to eliminate noise from noisy RGB images. After denoising, we apply an enhancing sub-network to increase the brightness of low-light RGB images. Since the NIR image contains fine details, we fuse it with the Y channel of the RGB image through a fusion sub-network. Experimental results demonstrate that the proposed method successfully fuses RGB and NIR images and generates high-quality fusion results containing textures and colors.
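Fusing NIR with the Y channel means separating luminance from chroma first, so that NIR detail changes brightness without shifting colors. A minimal per-pixel sketch in Python, assuming full-range YCbCr with the JPEG / BT.601 coefficients; the fixed-weight blend in the hypothetical `fuse_pixel` only stands in for the learned fusion sub-network and is not the authors' method.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range YCbCr (JPEG / BT.601 coefficients) for 8-bit channel values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b


def fuse_pixel(rgb, nir, weight=0.5):
    """Hypothetical stand-in for the learned fusion: blend only Y with NIR, keep Cb and Cr."""
    y, cb, cr = rgb_to_ycbcr(*rgb)
    y_fused = (1.0 - weight) * y + weight * nir
    return ycbcr_to_rgb(y_fused, cb, cr)


print(fuse_pixel((100, 100, 100), 200))
```

Because chroma is left untouched, a gray pixel stays gray after fusion and only its brightness moves toward the NIR value.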

DR2S: Deep Regression with Region Selection for Camera Quality Evaluation

Marcelin Tworski, Stéphane Lathuiliere, Salim Belkarfa, Attilio Fiandrotti, Marco Cagnazzo

Auto-TLDR; Texture Quality Estimation Using Deep Learning

In this work, we tackle the problem of estimating a camera's capability to preserve fine texture details under a given lighting condition. Importantly, our texture preservation measurement should agree with human perception. Consequently, we formulate the problem as a regression task and introduce a deep convolutional network to estimate a texture quality score. At training time, we use ground-truth quality scores provided by expert human annotators in order to obtain a subjective quality measure. In addition, we propose a region selection method to identify the image regions that are better suited to measuring perceptual quality. Finally, our experimental evaluation shows that our learning-based approach outperforms existing methods and that our region selection algorithm consistently improves the quality estimation.

On-Device Text Image Super Resolution

Dhruval Jain, Arun Prabhu, Gopi Ramena, Manoj Goyal, Debi Mohanty, Naresh Purre, Sukumar Moharana

Auto-TLDR; A Novel Deep Neural Network for Super-Resolution on Low Resolution Text Images

Recent research on super-resolution (SR) has witnessed major developments with the advancement of deep convolutional neural networks. There is a need for on-device information extraction from scene text images or document images, most of which are low-resolution (LR). SR therefore becomes an essential pre-processing step, as bicubic upsampling, which is conventionally present in smartphones, performs poorly on LR images. To give users more control over their privacy, and to reduce the carbon footprint by cutting the overhead of cloud computing and hours of GPU usage, executing SR models on the edge has become a necessity. There are various challenges in running and optimizing a model on resource-constrained platforms like smartphones. In this paper, we present a novel deep neural network that reconstructs sharper character edges and thus boosts OCR confidence. The proposed architecture not only achieves a significant improvement in PSNR over bicubic upsampling on various benchmark datasets but also runs with an average inference time of 11.7 ms per image. We outperform the state of the art on the Text330 dataset. We also achieve an OCR accuracy of 75.89% on the ICDAR 2015 TextSR dataset, where the ground truth has an accuracy of 78.10%.
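The bicubic baseline the network is compared against interpolates each output sample from four neighbouring input samples weighted by a cubic kernel. A minimal 1D sketch in Python, assuming the Keys cubic convolution kernel with `a = -0.5`, half-pixel-centred coordinates and clamped borders; phone implementations may differ in these details, and 2D bicubic applies the same kernel along rows and then columns.

```python
import math


def keys_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is a common choice for bicubic interpolation."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x ** 3 - (a + 3.0) * x ** 2 + 1.0
    if x < 2.0:
        return a * x ** 3 - 5.0 * a * x ** 2 + 8.0 * a * x - 4.0 * a
    return 0.0


def cubic_upsample_1d(samples, scale):
    """Resample a 1D signal by an integer factor using the cubic kernel."""
    n = len(samples)
    out = []
    for i in range(n * scale):
        x = (i + 0.5) / scale - 0.5        # output position in input coordinates
        x0 = math.floor(x)
        acc = 0.0
        for k in range(-1, 3):             # the kernel spans four neighbours
            j = min(max(x0 + k, 0), n - 1)  # clamp indices at the borders
            acc += samples[j] * keys_kernel(x - (x0 + k))
        out.append(acc)
    return out


print(cubic_upsample_1d([0.0, 1.0, 4.0, 9.0], 2))
```

The kernel equals 1 at zero and 0 at the other integers, so original samples are reproduced exactly; what it cannot do is invent the sharp character edges that the learned SR model targets.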

Semi-Supervised Deep Learning Techniques for Spectrum Reconstruction

Adriano Simonetto, Vincent Parret, Alexander Gatto, Piergiorgio Sartor, Pietro Zanuttigh

Auto-TLDR; hyperspectral data estimation from RGB data using semi-supervised learning

State-of-the-art approaches for estimating hyperspectral images (HSI) from RGB data are mostly based on deep learning techniques, but due to the lack of training data their performance is limited to the uncommon scenarios where a large hyperspectral database is available. In this work, we present a family of novel deep learning schemes for hyperspectral data estimation that work when the available hyperspectral information is limited. First, we introduce a learning scheme exploiting a physical model based on backward mapping to the RGB space and total variation regularization, which can be trained with a limited number of HSI images. Then, we propose a novel semi-supervised learning scheme that works even with just a few pixels labeled with hyperspectral information. Finally, we show that the approach can be extended to a transfer learning scenario. The proposed techniques reach high performance while requiring only a few HSI images, or just a few labeled pixels, for training.
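The physical model lets unlabeled RGB images supervise the network: a predicted spectrum is projected back to RGB and compared with the observed pixel, while total variation keeps each spectral band spatially smooth. A minimal sketch in Python, assuming a discretized spectrum, known per-channel camera sensitivity curves, a squared reprojection error and anisotropic TV per band; the function names and the weight `lam` are hypothetical, since the abstract gives no formula.

```python
def project_to_rgb(spectrum, sensitivities):
    """Backward mapping: integrate a per-pixel spectrum against each channel's sensitivity curve."""
    return [sum(h * s for h, s in zip(spectrum, curve)) for curve in sensitivities]


def total_variation(image):
    """Anisotropic total variation: sum of absolute horizontal and vertical differences."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(image[i + 1][j] - image[i][j])
            if j + 1 < cols:
                tv += abs(image[i][j + 1] - image[i][j])
    return tv


def physics_loss(pred_spectra, observed_rgb, sensitivities, lam=0.1):
    """Mean RGB reprojection error plus lam times the TV of every spectral band.

    pred_spectra[i][j] is a list of band values; observed_rgb[i][j] is an (R, G, B) triple.
    """
    rows, cols = len(pred_spectra), len(pred_spectra[0])
    n_bands = len(pred_spectra[0][0])
    err = 0.0
    for i in range(rows):
        for j in range(cols):
            rgb = project_to_rgb(pred_spectra[i][j], sensitivities)
            err += sum((p - o) ** 2 for p, o in zip(rgb, observed_rgb[i][j]))
    err /= rows * cols
    tv = sum(
        total_variation([[pred_spectra[i][j][b] for j in range(cols)] for i in range(rows)])
        for b in range(n_bands)
    )
    return err + lam * tv
```

No hyperspectral ground truth appears in this loss, which is what makes training with very few HSI images possible.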

MBD-GAN: Model-Based Image Deblurring with a Generative Adversarial Network

Li Song, Edmund Y. Lam

Auto-TLDR; Model-Based Deblurring GAN for Inverse Imaging

This paper presents a methodology to tackle inverse imaging problems by leveraging the synergistic power of imaging models and deep learning. The premise is that while learning-based techniques have quickly become the methods of choice in various applications, they often ignore the prior knowledge embedded in imaging models. Incorporating the latter has the potential to improve image estimation. Specifically, we first provide a mathematical basis for using a generative adversarial network (GAN) in inverse imaging by considering an optimization framework. Then, we develop a specific architecture that connects the generator and discriminator networks with the imaging model. While this technique can be applied to a variety of problems, from image reconstruction to super-resolution, we take image deblurring as the example here, where we show in detail the implementation and experimental results of what we call the model-based deblurring GAN (MBD-GAN).

UCCTGAN: Unsupervised Clothing Color Transformation Generative Adversarial Network

Shuming Sun, Xiaoqiang Li, Jide Li

Auto-TLDR; An Unsupervised Clothing Color Transformation Generative Adversarial Network

Clothing color transformation refers to changing the color of the clothes in an original image to the color of the clothes in a target image. In this paper, we propose an Unsupervised Clothing Color Transformation Generative Adversarial Network (UCCTGAN) for this task. UCCTGAN adopts the color histogram of the target clothing as color guidance, and an improved U-Net architecture called AntennaNet is proposed to fuse the extracted color information with the original image. Meanwhile, to enable unsupervised learning, the loss function is designed around color moments, which evaluate the chromatic aberration between the target clothing and the generated clothing. Experimental results show that our network is able to generate convincing color transformation results.
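Color moments summarize a color distribution with a few statistics per channel, so two clothing regions can be compared without pixel correspondence, which is what makes the loss usable without paired data. A minimal sketch in Python, assuming the standard first three moments (mean, standard deviation, cube-rooted third central moment) and an L1 distance between them; `moment_distance` is a hypothetical illustration, since the abstract does not give UCCTGAN's exact loss.

```python
import math


def color_moments(channel):
    """Mean, standard deviation and cube-rooted third central moment of one channel."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    third = sum((v - mean) ** 3 for v in channel) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)  # real cube root keeps the sign
    return mean, math.sqrt(var), skew


def moment_distance(pixels_a, pixels_b):
    """Sum of absolute moment differences over R, G, B; inputs are lists of (r, g, b) tuples."""
    dist = 0.0
    for c in range(3):
        ma = color_moments([p[c] for p in pixels_a])
        mb = color_moments([p[c] for p in pixels_b])
        dist += sum(abs(x - y) for x, y in zip(ma, mb))
    return dist


target = [(200, 30, 30), (180, 20, 40)]
generated = [(190, 40, 30), (170, 30, 50)]
print(moment_distance(target, generated))
```

The two regions may contain different numbers of pixels, since only their statistics are compared.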

Detail-Revealing Deep Low-Dose CT Reconstruction

Xinchen Ye, Yuyao Xu, Rui Xu, Shoji Kido, Noriyuki Tomiyama

Auto-TLDR; A Dual-branch Aggregation Network for Low-Dose CT Reconstruction

Low-dose CT imaging lowers radiation risk by reducing the radiation dose, but this has a negative impact on imaging quality. This paper addresses the problem of low-dose CT reconstruction. Previous methods are unsatisfactory because they recover image details inaccurately under the strong noise caused by the reduced radiation dose, which directly affects the final diagnosis. To suppress noise effectively while retaining structures well, we propose a detail-revealing dual-branch aggregation network to reconstruct the degraded CT image. Specifically, the main reconstruction branch iteratively exploits and compensates for reconstruction errors to gradually refine the CT image, while the prior branch learns structural details as prior knowledge to help recover the CT image. A sophisticated detail-revealing loss is designed to fuse the information from both branches and guide the learning toward better performance from pixel-wise and holistic perspectives, respectively. Experimental results show that our method outperforms state-of-the-art methods in both PSNR and SSIM.
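PSNR, one of the two reported metrics, is a log-scaled inverse of the mean squared error relative to the largest possible pixel value: PSNR = 10 log10(MAX² / MSE). A minimal sketch in Python for flat lists of pixel values; the default `max_value` of 255 assumes 8-bit data, and for CT the peak value depends on how the intensities are scaled.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat lists of pixel values."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([100, 120, 140], [102, 118, 141]))
```

Higher is better, and every 10 dB gained corresponds to a tenfold reduction in mean squared error.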

Image Inpainting with Contrastive Relation Network

Xiaoqiang Zhou, Junjie Li, Zilei Wang, Ran He, Tieniu Tan

Auto-TLDR; Two-Stage Inpainting with Graph-based Relation Network

Image inpainting faces the challenge of satisfying requirements on both structural plausibility and texture coherence. In this paper, we propose a two-stage inpainting framework to address this issue. The basic idea is to address the two requirements in two separate stages. In the first stage, a segmentation reconstruction network predicts a completed segmentation of the corrupted image; in the second stage, an image generator restores fine-grained image details. The two stages are connected in series, as the image details are generated under the guidance of the completed segmentation map predicted in the first stage. Specifically, in the second stage, we propose a novel graph-based relation network to model the relationships existing in the corrupted image. In the relation network, both intra-relationships among pixels in the same semantic region and inter-relationships between different semantic parts are considered, improving the consistency and compatibility of image textures. In addition, a contrastive loss is designed to facilitate training of the relation network. Such a framework not only simplifies the inpainting problem but also exploits the relationships in the corrupted image explicitly. Extensive experiments on various public datasets demonstrate, quantitatively and qualitatively, the superiority of our approach compared with the state of the art.

Edge-Guided CNN for Denoising Images from Portable Ultrasound Devices

Yingnan Ma, Fei Yang, Anup Basu

Auto-TLDR; Edge-Guided Convolutional Neural Network for Portable Ultrasound Images

Ultrasound is a non-invasive tool that is useful for medical diagnosis and treatment. To reduce long wait times and add convenience for patients, portable ultrasound scanning devices are becoming increasingly popular. These devices can be held in one palm and are compatible with modern cell phones. However, the quality of ultrasound images captured by portable scanners is relatively poor compared to standard ultrasound scanning systems in hospitals. To improve the quality of the ultrasound images obtained from portable devices, we propose a new neural network architecture called Edge-Guided Convolutional Neural Network (EGCNN), which preserves significant edge information in ultrasound images while removing noise. We also study and compare the effectiveness of classical filtering approaches in removing speckle noise in these images. Experimental results show that after applying the proposed EGCNN, various organs can be better recognized in ultrasound images. This approach is expected to lead to more accurate diagnostics in the future.

Position-Aware and Symmetry Enhanced GAN for Radial Distortion Correction

Yongjie Shi, Xin Tong, Jingsi Wen, He Zhao, Xianghua Ying, Jinshi Hongbin Zha

Auto-TLDR; Generative Adversarial Network for Radial Distorted Image Correction

This paper presents a novel method based on the generative adversarial network for radial distortion correction. Instead of generating a corrected image, our generator predicts a pixel flow map that measures the pixel offset between the distorted and corrected images. The discriminator judges the quality of the generated pixel flow map and of the warped image. As texture far from the image center is strongly distorted, we develop an Adaptive Inverted Foveal layer that transforms the deformation into image intensity to exploit this property. Rotation-symmetry-enhanced convolution kernels are applied to extract geometric features of different orientations explicitly. These learned features are recalibrated using the Squeeze-and-Excitation block to assign different weights to different directions. Moreover, we construct the first real-world radially distorted image dataset, RD600, annotated with ground truth, to evaluate our proposed method. We conduct extensive experiments to validate the effectiveness of each part of our framework. Further experiments show that our approach outperforms previous methods on both synthetic and real-world datasets, quantitatively and qualitatively.

DSPNet: Deep Learning-Enabled Blind Reduction of Speckle Noise

Yuxu Lu, Meifang Yang, Liu Wen

Auto-TLDR; Deep Blind DeSPeckling Network for Imaging Applications

Blind reduction of speckle noise is a long-standing unsolved problem in several imaging applications, such as medical ultrasound imaging, synthetic aperture radar (SAR) imaging, and underwater sonar imaging. The unwanted noise can hinder the reliable detection and recognition of objects of interest. From a statistical point of view, speckle noise can be assumed to be multiplicative, which differs significantly from common additive Gaussian noise. The purpose of this study is to blindly reduce speckle noise under non-ideal imaging conditions. The multiplicative relationship between the latent sharp image and the random noise is first converted into an additive one through a logarithmic transformation. To improve imaging performance, we introduce the feature pyramid network (FPN) and atrous spatial pyramid pooling (ASPP), yielding a more powerful deep blind DeSPeckling Network (DSPNet). In particular, DSPNet is mainly composed of two subnetworks: Log-NENet (the noise estimation network in the logarithmic domain) and Log-DNNet (the denoising network in the logarithmic domain). Log-NENet and Log-DNNet respectively estimate the noise level map and reduce random noise in the logarithmic domain. A multi-scale mixed loss function is further proposed to improve the robustness and generalization of DSPNet. The proposed deep blind despeckling network is capable of reducing random noise while preserving salient image details. Both synthetic and realistic experiments demonstrate the superior performance of DSPNet in terms of quantitative evaluations and visual image quality.
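The logarithmic trick works because log(x · n) = log x + log n, so multiplicative speckle becomes an additive term that a denoiser can estimate and subtract, and exponentiation returns to the intensity domain. A minimal sketch in Python, assuming unit-mean gamma-distributed speckle (a common model for multi-look intensity data) and a small `eps` guard against log(0); both are illustrative assumptions, not details from the abstract.

```python
import math
import random


def to_log_domain(pixels, eps=1e-6):
    """Map y = x * n to log y = log x + log n; eps guards against log(0)."""
    return [math.log(max(p, eps)) for p in pixels]


def from_log_domain(log_pixels):
    """Return from the logarithmic domain to intensities."""
    return [math.exp(v) for v in log_pixels]


random.seed(0)
clean = [0.2, 0.5, 0.9]
# Unit-mean multiplicative speckle: Gamma(shape=4, scale=1/4) has mean 1.
noise = [random.gammavariate(4.0, 1.0 / 4.0) for _ in clean]
noisy = [x * n for x, n in zip(clean, noise)]

# In the log domain the residual between noisy and clean is exactly the additive log-noise.
log_noise = [a - b for a, b in zip(to_log_domain(noisy), to_log_domain(clean))]
print(log_noise)
```

Once the noise is additive, a network such as the one described can predict it as a residual, and `from_log_domain` maps the cleaned log image back to intensities.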

D3Net: Joint Demosaicking, Deblurring and Deringing

Tomas Kerepecky, Filip Sroubek

Auto-TLDR; Joint demosaicking deblurring and deringing network with light-weight architecture inspired by the alternating direction method of multipliers

Images acquired with standard digital cameras have Bayer patterns and suffer from lens blur. A demosaicking step is implemented in every digital camera, yet blur often remains unaddressed due to the computational cost and instability of deblurring algorithms. Linear methods, which are computationally less demanding, produce ringing artifacts in deblurred images. Complex non-linear deblurring methods avoid artifacts; however, their complexity implies offline application after camera demosaicking, which leads to sub-optimal performance. In this work, we propose a joint demosaicking, deblurring and deringing network with a light-weight architecture inspired by the alternating direction method of multipliers. The proposed network has a transparent and clear interpretation compared to black-box data-driven approaches. We experimentally validate its superiority over state-of-the-art demosaicking methods with offline deblurring.
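The abstract does not describe the ADMM-inspired network in enough detail to sketch, but the Bayer sampling it inverts is simple: each sensor pixel records only one color channel. A minimal sketch in Python, assuming an RGGB layout (sensors differ; GRBG, GBRG and BGGR also exist) and an image given as rows of `(r, g, b)` tuples.

```python
def bayer_mosaic_rggb(rgb):
    """Sample a full-colour image with an RGGB Bayer pattern.

    Even rows alternate R, G; odd rows alternate G, B. One value survives per pixel,
    and demosaicking must interpolate the two missing channels back.
    """
    mosaic = []
    for i, row in enumerate(rgb):
        out_row = []
        for j, (r, g, b) in enumerate(row):
            if i % 2 == 0:
                out_row.append(r if j % 2 == 0 else g)
            else:
                out_row.append(g if j % 2 == 0 else b)
        mosaic.append(out_row)
    return mosaic


image = [[(1, 2, 3), (4, 5, 6)],
         [(7, 8, 9), (10, 11, 12)]]
print(bayer_mosaic_rggb(image))
```

Half of the samples are green and a quarter each are red and blue, so in a blurred capture the model has to recover both the missing channels and the lost sharpness, which is the joint problem the paper targets.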

Future Urban Scenes Generation through Vehicles Synthesis

Alessandro Simoni, Luca Bergamini, Andrea Palazzi, Simone Calderara, Rita Cucchiara

Auto-TLDR; Predicting the Future of an Urban Scene with a Novel View Synthesis Paradigm

In this work, we propose a deep learning pipeline to predict the future visual appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, we follow a two-stage approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm, i.e., generating a synthetic representation of an object undergoing a geometric roto-translation in 3D space. Our model can easily be conditioned on constraints (e.g., input trajectories) provided by state-of-the-art tracking methods or by the user. This allows us to generate a set of diverse, realistic futures from the same input in a multi-modal fashion. We show, visually and quantitatively, the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow, a challenging real-world dataset.

Deep Iterative Residual Convolutional Network for Single Image Super-Resolution

Rao Muhammad Umer, Gian Luca Foresti, Christian Micheloni

Auto-TLDR; ISRResCNet: Deep Iterative Super-Resolution Residual Convolutional Network for Single Image Super-resolution

Deep convolutional neural networks (CNNs) have recently achieved great success in the single image super-resolution (SISR) task due to their powerful feature representation capabilities. Most recent deep learning based SISR methods focus on designing deeper or wider models to learn the non-linear mapping between low-resolution (LR) inputs and high-resolution (HR) outputs. These existing SR methods do not take into account the image observation (physical) model and thus require a large number of trainable parameters and a huge volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method, with few trainable parameters, improves results for different scaling factors in comparison with state-of-the-art methods.

Fast Region-Adaptive Defogging and Enhancement for Outdoor Images Containing Sky

Zhan Li, Xiaopeng Zheng, Bir Bhanu, Shun Long, Qingfeng Zhang, Zhenghao Huang

Auto-TLDR; Image defogging and enhancement of hazy outdoor scenes using region-adaptive segmentation and region-ratio-based adaptive Gamma correction

Inclement weather, haze, and fog severely degrade the performance of outdoor imaging systems. Because of the large depth range in such scenes, most image dehazing or enhancement methods suffer from color distortions and halo artifacts when applied to real-world hazy outdoor scenes, especially those containing sky. To effectively recover details in both distant and nearby regions while preserving the color fidelity of the sky, we propose a novel image defogging and enhancement approach based on a replaceable plug-in segmentation module and region-adaptive processing. First, regions of grayish sky, pure white objects, and other parts are separated. Several segmentation methods are studied, including the efficient threshold-based one used in this work. Second, a luminance-inverted multi-scale Retinex with color restoration (MSRCR) and region-ratio-based adaptive Gamma correction are applied to non-grayish and non-white areas. Finally, the enhanced regions are stitched seamlessly using a mean-filtered region mask. The proposed method is efficient in defogging natural outdoor scenes and requires no training data or prior knowledge. Extensive experiments show that the proposed approach not only outperforms several state-of-the-art defogging methods in terms of both visibility and color fidelity, but also provides enhanced outputs with fewer artifacts and halos, particularly in sky regions.
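Stitching with a mean-filtered mask replaces a hard region boundary with a gradual transition, which is what avoids visible seams between differently enhanced regions. A minimal sketch in Python, assuming a binary region mask, grayscale values, and a window `radius` chosen for illustration; the function names are hypothetical and the paper's filter size is not given.

```python
def box_filter(mask, radius):
    """Mean filter over a (2*radius+1)^2 window, shrinking the window at the borders."""
    rows, cols = len(mask), len(mask[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < rows and 0 <= x < cols:
                        total += mask[y][x]
                        count += 1
            out[i][j] = total / count
    return out


def stitch(region_a, region_b, mask, radius=1):
    """Blend two enhanced images with the softened mask: weight 1 keeps region_a, 0 keeps region_b."""
    soft = box_filter(mask, radius)
    return [[w * a + (1.0 - w) * b for w, a, b in zip(ws, ra, rb)]
            for ws, ra, rb in zip(soft, region_a, region_b)]


sky = [[200.0] * 4 for _ in range(3)]
ground = [[50.0] * 4 for _ in range(3)]
mask = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
print(stitch(sky, ground, mask))
```

A larger radius widens the transition band; values far from the boundary still come entirely from one enhanced version.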

A GAN-Based Blind Inpainting Method for Masonry Wall Images

Yahya Ibrahim, Balázs Nagy, Csaba Benedek

Auto-TLDR; An End-to-End Blind Inpainting Algorithm for Masonry Wall Images

In this paper, we introduce a novel end-to-end blind inpainting algorithm for masonry wall images that performs automatic detection and virtual completion of occluded or damaged wall regions. For this purpose, we propose a three-stage deep neural network comprising a U-Net-based sub-network that segments the wall into brick, mortar and occluded regions, followed by a two-stage adversarial inpainting model. The first adversarial network predicts the schematic mortar-brick pattern of the occluded areas based on the observed wall structure, which in itself provides valuable structural information for archeological and architectural applications. Finally, the second adversarial network predicts the RGB pixel values, yielding a realistic visual experience for the observer. While the three stages implement a sequential pipeline, they interact through dependencies between their loss functions, which allows hidden feature dependencies between the different network components to be taken into account. For training and testing the network, a new dataset has been created, and an extensive qualitative and quantitative evaluation against the state of the art is given.

Single Image Super-Resolution with Dynamic Residual Connection

Karam Park, Jae Woong Soh, Nam Ik Cho

Auto-TLDR; Dynamic Residual Attention Network for Lightweight Single Image Super-Resolution Networks

Deep convolutional neural networks have shown significant improvement in the single image super-resolution (SISR) field. Recently, there have been attempts to solve the SISR problem using lightweight networks, considering the limited computational resources of real-world applications. Especially for lightweight networks, balancing parameter demand and performance is very difficult, and most lightweight SISR networks are manually designed through a huge number of brute-force experiments. Besides, a key factor in network performance is the skip connections of the building blocks that are repeated throughout the architecture. Notably, in previous works, these connections are pre-defined and manually determined by human researchers. Hence, they are less flexible with respect to input image statistics, and a better solution may exist for a given number of parameters. Therefore, we focus on the automated design of the connections between basic building blocks (residual networks) and, as a result, propose a dynamic residual attention network (DRAN). The proposed method allows the network to dynamically select residual paths depending on the input image, based on the idea of the attention mechanism. For this, we design a dynamic residual module that determines the residual paths between the basic building blocks for the given input image. By finding optimal residual paths between the blocks, the network can selectively bypass informative features needed to reconstruct the target high-resolution (HR) image. Experimental results show that our proposed DRAN outperforms most existing state-of-the-art lightweight SISR models.

Learning Image Inpainting from Incomplete Images using Self-Supervision

Sriram Yenamandra, Rohit Kumar Jena, Ansh Khurana, Suyash Awate

Auto-TLDR; Unsupervised Deep Neural Network for Semantic Image Inpainting

Current approaches for semantic image inpainting rely on deep neural networks (DNNs) that learn under full supervision, i.e., using a training set comprising pairs of (i) corrupted images with holes and (ii) the corresponding uncorrupted images. However, for several real-world applications, obtaining large sets of uncorrupted images is challenging or infeasible. Current methods also rely on adversarial training involving min-max optimization, which is prone to instability during learning. We propose a novel image-inpainting DNN framework that can learn in both completely unsupervised and semi-supervised modes. Moreover, our DNN learning formulation bypasses adversarial training and thereby lends itself to more stable training. Results on the publicly available CelebA dataset show that our method, even when learning without supervision, outperforms the state of the art that learns with full supervision.

Pixel-based Facial Expression Synthesis

Arbish Akram, Nazar Khan

Auto-TLDR; pixel-based facial expression synthesis using GANs

Recently, facial expression synthesis has shown remarkable advances with the advent of Generative Adversarial Networks (GANs). However, these GAN-based approaches mostly generate photo-realistic results only as long as the target data distribution is close to the training data distribution. The quality of GAN results degrades significantly when test images come from a slightly different distribution. In this work, we propose a pixel-based facial expression synthesis method. Recent work has shown that facial expression synthesis changes only local regions of faces. In the proposed method, each output pixel observes only one input pixel. The proposed method achieves generalization capability by leveraging only a few hundred images. Experimental results demonstrate that the proposed method performs comparably to recent GANs on in-dataset images and significantly outperforms them on in-the-wild images. In addition, the proposed method is faster and achieves significantly better performance at two orders of magnitude lower computational and storage cost compared to state-of-the-art GAN-based methods.
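The key constraint is that each output pixel depends only on the input pixel at the same location, which keeps the model tiny compared with a GAN. The abstract does not specify the mapping, so the sketch below is a hypothetical simplification: an independent affine map per pixel, fitted in Python by ordinary least squares on aligned neutral/expressive face pairs stored as flat intensity lists.

```python
def fit_pixelwise(neutral_faces, expressive_faces):
    """Fit out[k] = a[k] * in[k] + b[k] independently for each pixel k by least squares."""
    n_pixels = len(neutral_faces[0])
    n = len(neutral_faces)
    params = []
    for k in range(n_pixels):
        xs = [face[k] for face in neutral_faces]
        ys = [face[k] for face in expressive_faces]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = sxy / sxx if sxx > 0 else 0.0  # fall back to a constant if this pixel never varies
        params.append((a, my - a * mx))
    return params


def synthesize(neutral_face, params):
    """Each output pixel observes only the input pixel at the same location."""
    return [a * x + b for x, (a, b) in zip(neutral_face, params)]


params = fit_pixelwise([[0, 10], [2, 20]], [[1, 5], [5, 10]])
print(synthesize([1, 30], params))
```

The model stores just two numbers per pixel, which illustrates why a pixel-local approach can be orders of magnitude cheaper to store and run than a convolutional generator.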

Face Super-Resolution Network with Incremental Enhancement of Facial Parsing Information

Shuang Liu, Chengyi Xiong, Zhirong Gao

Auto-TLDR; Learning-based Face Super-Resolution with Incremental Boosting Facial Parsing Information

Recently, face super-resolution (SR) methods based on facial priors have obtained significant performance gains in dealing with extremely degraded facial images, and facial priors have also proven useful in facilitating the inference of face images. Accordingly, how to fully fuse facial priors into deep features to improve face SR performance has attracted major attention. In this paper, we propose a learning-based face SR approach with incremental boosting of facial parsing information (IFPSR) for high-magnification SR of low-resolution faces. The proposed IFPSR method consists of three main parts: i) a three-stage upsampling network with embedded parsing-map features, in which image recovery and prior estimation are performed simultaneously and progressively to improve the image resolution; ii) a progressive training method and a joint facial attention and heatmap loss to obtain better facial attributes; iii) a channel attention strategy in residual dense blocks to adaptively learn facial features. Extensive experimental results show that, compared with state-of-the-art methods in terms of quantitative and qualitative metrics, our approach achieves an outstanding balance between SR image quality and low network complexity.

Stylized-Colorization for Line Arts

Tzu-Ting Fang, Minh Duc Vo, Akihiro Sugimoto, Shang-Hong Lai

Auto-TLDR; Stylized-colorization using GAN-based End-to-End Model for Anime

We address the novel problem of stylized-colorization, which colorizes a given line art using a coloring style given in text. This problem can be stated as multi-domain image translation and is more challenging than the current colorization problem because it requires not only capturing the illustration distribution but also satisfying the required coloring styles specific to anime, such as lightness, shading, or saturation. We propose a GAN-based end-to-end model for stylized-colorization with one generator and two discriminators. Our generator is based on the U-Net architecture and receives a pair of a line art and a coloring style in text as its input to produce a stylized-colorization image of the line art. The two discriminators share weights at early layers and judge the stylized-colorization image in two different aspects: one for color and one for style. The generator and the two discriminators are trained jointly in an adversarial, end-to-end manner. Extensive experiments demonstrate the effectiveness of our proposed model.
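To make the weight-sharing idea concrete, here is a toy sketch in which one shared early layer feeds two heads, a color head and a style head, and the generator's adversarial loss sums the non-saturating loss of both. Everything here is an illustrative assumption: the layer sizes, the single linear-plus-ReLU trunk, and the loss form. The text-style conditioning of the style discriminator is omitted. `TwoHeadDiscriminator` and `generator_adv_loss` are hypothetical names.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TwoHeadDiscriminator:
    """Color and style discriminators sharing their early layer (toy sketch)."""

    def __init__(self, w_shared: Matrix, w_color: List[float], w_style: List[float]):
        self.w_shared = w_shared  # early-layer weights shared by both discriminators
        self.w_color = w_color    # head judging color realism
        self.w_style = w_style    # head judging the coloring style (conditioning omitted)

    def __call__(self, x: List[float]) -> Tuple[float, float]:
        # Shared early layer: linear map + ReLU, computed once for both heads.
        h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w_shared]
        color = _sigmoid(sum(w * v for w, v in zip(self.w_color, h)))
        style = _sigmoid(sum(w * v for w, v in zip(self.w_style, h)))
        return color, style


def generator_adv_loss(d: TwoHeadDiscriminator, fake: List[float]) -> float:
    # Non-saturating GAN loss summed over both aspects: the generator
    # wants both heads to output a probability close to 1 ("real").
    color, style = d(fake)
    return -math.log(color) - math.log(style)
```

Sharing the early layers lets both judgments reuse low-level features such as edges and color gradients, while the separate heads specialize.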

Removing Raindrops from a Single Image Using Synthetic Data

Yoshihito Kokubo, Shusaku Asada, Hirotaka Maruyama, Masaru Koide, Kohei Yamamoto, Yoshihisa Suetsugu

Auto-TLDR; Raindrop Removal Using Synthetic Raindrop Data

We simulated the exact features of raindrops on a camera lens and conducted an experiment to evaluate the performance of a network trained to remove raindrops using synthetic raindrop data. Although research has been conducted to precisely evaluate methods for removing raindrops, with some networks trained on images with real raindrops and others trained on images with synthetic raindrops, no studies have directly compared the performance of two networks trained on each respective kind of image. In a previous study in which images with synthetic raindrops were used for training, the network did not work effectively on images with real raindrops because the shapes of the raindrops were simulated using simple arithmetic expressions. In this study, we focused on generating raindrop shapes that are closer to reality, with the aim of using these synthetic raindrops to develop a technique for removing real-world raindrops. After categorizing raindrops by type, we further separated each raindrop type into its constituent elements, generated each element separately, and finally combined the generated elements. The proposed technique was used to add images with synthetic raindrops to the training data, and when we evaluated the model, we confirmed that its precision exceeded that obtained when only images with actual raindrops were used for training. The evaluation results show that images with synthetic raindrops can be used as training data for real-world images.
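The abstract says the generated raindrop elements are finally combined, but not how. A minimal sketch of one plausible combination step, assuming each element is rendered as a layer of pixel values plus a per-pixel alpha and composited onto the clean image with the standard "over" operator, could look like this. The element types, their rendering, and the name `composite_over` are hypothetical.

```python
from typing import List, Tuple

Layer = Tuple[List[float], List[float]]  # (pixel values, per-pixel alpha in [0, 1])


def composite_over(background: List[float], layers: List[Layer]) -> List[float]:
    """Stack element layers onto a clean image with the 'over' operator.

    Assumption: each generated raindrop element (e.g. a refracted interior
    or a bright rim) is available as its own layer; later layers are drawn
    on top of earlier ones.
    """
    out = list(background)
    for values, alphas in layers:
        out = [a * v + (1.0 - a) * o for v, a, o in zip(values, alphas, out)]
    return out
```

Pairing each composited result with its clean background yields the (degraded, clean) training pairs a raindrop-removal network needs.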

Boundary Guided Image Translation for Pose Estimation from Ultra-Low Resolution Thermal Sensor

Kohei Kurihara, Tianren Wang, Teng Zhang, Brian Carrington Lovell

Auto-TLDR; Pose Estimation on Low-Resolution Thermal Images Using Image-to-Image Translation Architecture

This work addresses the pose estimation task on low-resolution images captured by thermal sensors, which can operate in a no-light environment. Low-resolution thermal sensors have been widely adopted in various applications for cost control and privacy protection purposes. In this paper, targeting the challenging scenario of ultra-low resolution thermal imaging (32×32 pixels), we aim to estimate human poses for the purpose of monitoring health conditions and indoor events. To overcome the challenges of ultra-low resolution thermal imaging, such as blurred boundaries and data scarcity, we propose a new Image-to-Image (I2I) translation architecture that translates the original blurred thermal image into a visible light image with sharper boundaries. The generated visible light image can then be fed into an off-the-shelf pose estimator already well trained in the visible domain. Experimental results suggest that the proposed framework outperforms other state-of-the-art methods in the I2I-based pose estimation task on our thermal image dataset. Furthermore, we also demonstrate the merits of the proposed method on the publicly available FLIR dataset by measuring the quality of translated images.

Deep Realistic Novel View Generation for City-Scale Aerial Images

Koundinya Nouduri, Ke Gao, Joshua Fraser, Shizeng Yao, Hadi Aliakbarpour, Filiz Bunyak, Kannappan Palaniappan

Auto-TLDR; End-to-End 3D Voxel Renderer for Multi-View Stereo Data Generation and Evaluation

In this paper we introduce a novel end-to-end framework for the generation of large, aerial, city-scale, realistic synthetic image sequences with associated accurate and precise camera metadata. The two main purposes for this data are (i) to enable objective, quantitative evaluation of computer vision algorithms and methods such as feature detection, description, and matching, or full computer vision pipelines such as 3D reconstruction; and (ii) to supply large amounts of high-quality training data for deep learning guided computer vision methods. The proposed framework consists of three main modules: a 3D voxel renderer for data generation, a deep neural network for artifact removal, and a quantitative evaluation module, with Multi-View Stereo (MVS) as an example. The 3D voxel renderer enables generation of seen or unseen views of a scene from arbitrary camera poses with accurate camera metadata parameters. The artifact removal module uses a novel edge-augmented deep learning network with an explicit edgemap processing stream to remove image artifacts while preserving and recovering scene structures for more realistic results. Our experiments on two urban, city-scale, aerial datasets for Albuquerque (ABQ), NM and Los Angeles (LA), CA show promising results in terms of structural similarity to real data and accuracy of reconstructed 3D point clouds.
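The abstract mentions an explicit edgemap processing stream but does not specify how the edge map is obtained. As an illustrative assumption, the sketch below computes a Sobel gradient-magnitude edge map, a classic choice that could serve as the input to such a stream (for example, stacked as an extra channel next to the rendered image). `sobel_edge_map` is a hypothetical helper, not part of the described framework.

```python
import math
from typing import List

Grid = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_map(img: Grid) -> Grid:
    """Gradient-magnitude edge map of a grayscale image (assumed edge source)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]  # border pixels are left at 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[di][dj] * img[i + di - 1][j + dj - 1]
                     for di in range(3) for dj in range(3))
            gy = sum(SOBEL_Y[di][dj] * img[i + di - 1][j + dj - 1]
                     for di in range(3) for dj in range(3))
            out[i][j] = math.hypot(gx, gy)
    return out
```

A vertical intensity step produces a strong response only at the step, which is the kind of structural cue an edge stream can use to keep building outlines sharp while artifacts are removed.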