MBD-GAN: Model-Based Image Deblurring with a Generative Adversarial Network

Li Song, Edmund Y. Lam

Responsive image

Auto-TLDR; Model-Based Deblurring GAN for Inverse Imaging

Slides Poster

This paper presents a methodology to tackle inverse imaging problems by leveraging the synergistic power of imaging model and deep learning. The premise is that while learning-based techniques have quickly become the methods of choice in various applications, they often ignore the prior knowledge embedded in imaging models. Incorporating the latter has the potential to improve the image estimation. Specifically, we first provide a mathematical basis of using generative adversarial network (GAN) in inverse imaging through considering an optimization framework. Then, we develop the specific architecture that connects the generator and discriminator networks with the imaging model. While this technique can be applied to a variety of problems, from image reconstruction to super-resolution, we take image deblurring as the example here, where we show in detail the implementation and experimental results of what we call the model-based deblurring GAN (MBD-GAN).

Similar papers

GAN-Based Image Deblurring Using DCT Discriminator

Hiroki Tomosada, Takahiro Kudo, Takanori Fujisawa, Masaaki Ikehara

Responsive image

Auto-TLDR; DeblurDCTGAN: A Discrete Cosine Transform for Image Deblurring

Slides Poster Similar

In this paper, we propose high quality image debluring by using discrete cosine transform (DCT) with less computational complexity. Recently, Convolutional Neural Network (CNN) and Generative Adversarial Network (GAN) based algorithms have been proposed for image deblurring. Moreover, multi-scale architecture of CNN restores blurred image cleary and suppresses more ringing artifacts or block noise, but it takes much time to process. To solve these problems, we propose a method that preserves texture and suppresses ringing artifacts in the restored image without multi-scale architecture using DCT based loss named ``DeblurDCTGAN.''. It compares frequency domain of the images made from deblurred image and grand truth image by using DCT. Hereby, DeblurDCTGAN can reduce block noise or ringing artifacts while maintaining deblurring performance. Our experimental results show that DeblurDCTGAN gets the highest performances on both PSNR and SSIM comparing with other conventional methods in both GoPro test Dataset and DVD test Dataset. Also, the running time per pair of DeblurDCTGAN is faster than others.

Deep Iterative Residual Convolutional Network for Single Image Super-Resolution

Rao Muhammad Umer, Gian Luca Foresti, Christian Micheloni

Responsive image

Auto-TLDR; ISRResCNet: Deep Iterative Super-Resolution Residual Convolutional Network for Single Image Super-resolution

Slides Similar

Deep convolutional neural networks (CNNs) have recently achieved great success for single image super-resolution (SISR) task due to their powerful feature representation capabilities. Most recent deep learning based SISR methods focus on designing deeper / wider models to learn the non-linear mapping between low-resolution (LR) inputs and the high-resolution (HR) outputs. These existing SR methods do not take into account the image observation (physical) model and thus require a large number of network's trainable parameters with a huge volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits the powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method with a few trainable parameters improves results for different scaling factors in comparison with the state-of-art methods.

Detail-Revealing Deep Low-Dose CT Reconstruction

Xinchen Ye, Yuyao Xu, Rui Xu, Shoji Kido, Noriyuki Tomiyama

Responsive image

Auto-TLDR; A Dual-branch Aggregation Network for Low-Dose CT Reconstruction

Slides Poster Similar

Low-dose CT imaging emerges with low radiation risk due to the reduction of radiation dose, but brings negative impact on the imaging quality. This paper addresses the problem of low-dose CT reconstruction. Previous methods are unsatisfactory due to the inaccurate recovery of image details under the strong noise generated by the reduction of radiation dose, which directly affects the final diagnosis. To suppress the noise effectively while retain the structures well, we propose a detail-revealing dual-branch aggregation network to effectively reconstruct the degraded CT image. Specifically, the main reconstruction branch iteratively exploits and compensates the reconstruction errors to gradually refine the CT image, while the prior branch is to learn the structure details as prior knowledge to help recover the CT image. A sophisticated detail-revealing loss is designed to fuse the information from both branches and guide the learning to obtain better performance from pixel-wise and holistic perspectives respectively. Experimental results show that our method outperforms the state-of-art methods in both PSNR and SSIM metrics.

D3Net: Joint Demosaicking, Deblurring and Deringing

Tomas Kerepecky, Filip Sroubek

Responsive image

Auto-TLDR; Joint demosaicking deblurring and deringing network with light-weight architecture inspired by the alternating direction method of multipliers

Slides Similar

Images acquired with standard digital cameras have Bayer patterns and suffer from lens blur. A demosaicking step is implemented in every digital camera, yet blur often remains unattended due to computational cost and instability of deblurring algorithms. Linear methods, which are computationally less demanding, produce ringing artifacts in deblurred images. Complex non-linear deblurring methods avoid artifacts, however their complexity imply offline application after camera demosaicking, which leads to sub-optimal performance. In this work, we propose a joint demosaicking deblurring and deringing network with a light-weight architecture inspired by the alternating direction method of multipliers. The proposed network has a transparent and clear interpretation compared to other black-box data driven approaches. We experimentally validate its superiority over state-of-the-art demosaicking methods with offline deblurring.

Thermal Image Enhancement Using Generative Adversarial Network for Pedestrian Detection

Mohamed Amine Marnissi, Hajer Fradi, Anis Sahbani, Najoua Essoukri Ben Amara

Responsive image

Auto-TLDR; Improving Visual Quality of Infrared Images for Pedestrian Detection Using Generative Adversarial Network

Slides Poster Similar

Infrared imaging has recently played an important role in a wide range of applications including surveillance, robotics and night vision. However, infrared cameras often suffer from some limitations, essentially about low-contrast and blurred details. These problems contribute to the loss of observation of target objects in infrared images, which could limit the feasibility of different infrared imaging applications. In this paper, we mainly focus on the problem of pedestrian detection on thermal images. Particularly, we emphasis the need for enhancing the visual quality of images beforehand performing the detection step. % to ensure effective results. To address that, we propose a novel thermal enhancement architecture based on Generative Adversarial Network, and composed of two modules contrast enhancement and denoising modules with a post-processing step for edge restoration in order to improve the overall quality. The effectiveness of the proposed architecture is assessed by means of visual quality metrics and better results are obtained compared to the original thermal images and to the obtained results by other existing enhancement methods. These results have been conduced on a subset of KAIST dataset. Using the same dataset, the impact of the proposed enhancement architecture has been demonstrated on the detection results by obtaining better performance with a significant margin using YOLOv3 detector.

Single Image Deblurring Using Bi-Attention Network

Yaowei Li, Ye Luo, Jianwei Lu

Responsive image

Auto-TLDR; Bi-Attention Neural Network for Single Image Deblurring

Poster Similar

Recently, deep convolutional neural networks have been extensively applied into image deblurring and have achieved remarkable performance. However, most CNN-based image deblurring methods focus on simply increasing network depth, neglecting the contextual information of the blurred image and the reconstructed image. Meanwhile, most encoder-decoder based methods rarely exploit encoder's multi-layer features. To address these issues, we propose a bi-attention neural network for single image deblurring, which mainly consists of a bi-attention network and a feature fusion network. Specifically, two criss-cross attention modules are plugged before and after the encoder-decoder to capture long-range spatial contextual information in the blurred image and the reconstructed image simultaneously, and the feature fusion network combines multi-layer features from encoder to enable the decoder reconstruct the image with multi-scale features. The whole network is end-to-end trainable. Quantitative and qualitative experiment results validate that the proposed network outperforms state-of-the-art methods in terms of PSNR and SSIM on benchmark datasets.

SIDGAN: Single Image Dehazing without Paired Supervision

Pan Wei, Xin Wang, Lei Wang, Ji Xiang, Zihan Wang

Responsive image

Auto-TLDR; DehazeGAN: An End-to-End Generative Adversarial Network for Image Dehazing

Slides Poster Similar

Single image dehazing is challenging without scene airlight and transmission map. Most of existing dehazing algorithms tend to estimate key parameters based on manual designed priors or statistics, which may be invalid in some scenarios. Although deep learning-based dehazing methods provide an effective solution, most of them rely on paired training datasets, which are prohibitively difficult to be collected in real world. In this paper, we propose an effective end-to-end generative adversarial network for image dehazing, named DehazeGAN. The proposed DehazeGAN adopts a U-net architecture with a novel color-consistency loss derived from dark channel prior and perceptual loss, which can be trained in an unsupervised fashion without paired synthetic datasets. We create a RealHaze dataset for network training, including 4,000 outdoor hazy images and 4,000 haze-free images. Extensive experiments demonstrate that our proposed DehazeGAN achieves better performance than existing state-of-the-art methods on both synthetic datasets and real-world datasets in terms of PSNR, SSIM, and subjective visual experience.

Super-Resolution Guided Pore Detection for Fingerprint Recognition

Syeda Nyma Ferdous, Ali Dabouei, Jeremy Dawson, Nasser M. Nasarabadi

Responsive image

Auto-TLDR; Super-Resolution Generative Adversarial Network for Fingerprint Recognition Using Pore Features

Slides Poster Similar

Performance of fingerprint recognition algorithms substantially rely on fine features extracted from fingerprints. Apart from minutiae and ridge patterns, pore features have proven to be usable for fingerprint recognition. Although features from minutiae and ridge patterns are quite attainable from low-resolution images, using pore features is practical only if the fingerprint image is of high resolution which necessitates a model that enhances the image quality of the conventional 500 ppi legacy fingerprints preserving the fine details. To find a solution for recovering pore information from low-resolution fingerprints, we adopt a joint learning-based approach that combines both super-resolution and pore detection networks. Our modified single image Super-Resolution Generative Adversarial Network (SRGAN) framework helps to reliably reconstruct high-resolution fingerprint samples from low-resolution ones assisting the pore detection network to identify pores with a high accuracy. The network jointly learns a distinctive feature representation from a real low-resolution fingerprint sample and successfully synthesizes a high-resolution sample from it. To add discriminative information and uniqueness for all the subjects, we have integrated features extracted from a deep fingerprint verifier with the SRGAN quality discriminator. We also add ridge reconstruction loss, utilizing ridge patterns to make the best use of extracted features. Our proposed method solves the recognition problem by improving the quality of fingerprint images. High recognition accuracy of the synthesized samples that is close to the accuracy achieved using the original high-resolution images validate the effectiveness of our proposed model.

A NoGAN Approach for Image and Video Restoration and Compression Artifact Removal

Mameli Filippo, Marco Bertini, Leonardo Galteri, Alberto Del Bimbo

Responsive image

Auto-TLDR; Deep Neural Network for Image and Video Compression Artifact Removal and Restoration

Poster Similar

Lossy image and video compression algorithms introduce several different types of visual artifacts that reduce the visual quality of the compressed media, and the higher the compression rate the higher is the strength of these artifacts. In this work, we describe an approach for visual quality improvement of compressed images and videos to be performed at presentation time, so to obtain the benefits of fast data transfer and reduced data storage, while enjoying a visual quality that could be obtained only reducing the compression rate. To obtain this result we propose to use a deep neural network trained using the NoGAN approach, adapting the popular DeOldify architecture used for colorization. We show how the proposed method can be applied both to image and video compression artifact removal and restoration.

Edge-Guided CNN for Denoising Images from Portable Ultrasound Devices

Yingnan Ma, Fei Yang, Anup Basu

Responsive image

Auto-TLDR; Edge-Guided Convolutional Neural Network for Portable Ultrasound Images

Slides Poster Similar

Ultrasound is a non-invasive tool that is useful for medical diagnosis and treatment. To reduce long wait times and add convenience to patients, portable ultrasound scanning devices are becoming increasingly popular. These devices can be held in one palm, and are compatible with modern cell phones. However, the quality of ultrasound images captured from the portable scanners is relatively poor compared to standard ultrasound scanning systems in hospitals. To improve the quality of the ultrasound images obtained from portable ultrasound devices, we propose a new neural network architecture called Edge-Guided Convolutional Neural Network (EGCNN), which can preserve significant edge information in ultrasound images when removing noise. We also study and compare the effectiveness of classical filtering approaches in removing speckle noise in these images. Experimental results show that after applying the proposed EGCNN, various organs can be better recognized from ultrasound images. This approach is expected to lead to better accuracy in diagnostics in the future.

Deep Universal Blind Image Denoising

Jae Woong Soh, Nam Ik Cho

Responsive image

Auto-TLDR; Image Denoising with Deep Convolutional Neural Networks

Slides Similar

Image denoising is an essential part of many image processing and computer vision tasks due to inevitable noise corruption during image acquisition. Traditionally, many researchers have investigated image priors for the denoising, within the Bayesian perspective based on image properties and statistics. Recently, deep convolutional neural networks (CNNs) have shown great success in image denoising by incorporating large-scale synthetic datasets. However, they both have pros and cons. While the deep CNNs are powerful for removing the noise with known statistics, they tend to lack flexibility and practicality for the blind and real-world noise. Moreover, they cannot easily employ explicit priors. On the other hand, traditional non-learning methods can involve explicit image priors, but they require considerable computation time and cannot exploit large-scale external datasets. In this paper, we present a CNN-based method that leverages the advantages of both methods based on the Bayesian perspective. Concretely, we divide the blind image denoising problem into sub-problems and conquer each inference problem separately. As the CNN is a powerful tool for inference, our method is rooted in CNNs and propose a novel design of network for efficient inference. With our proposed method, we can successfully remove blind and real-world noise, with a moderate number of parameters of universal CNN.

Multi-Laplacian GAN with Edge Enhancement for Face Super Resolution

Shanlei Ko, Bi-Ru Dai

Responsive image

Auto-TLDR; Face Image Super-Resolution with Enhanced Edge Information

Slides Poster Similar

Face image super-resolution has become a research hotspot in the field of image processing. Nowadays, more and more researches add additional information, such as landmark, identity, to reconstruct high resolution images from low resolution ones, and have a good performance in quantitative terms and perceptual quality. However, these additional information is hard to obtain in many cases. In this work, we focus on reconstructing face images by extracting useful information from face images directly rather than using additional information. By observing edge information in each scale of face images, we propose a method to reconstruct high resolution face images with enhanced edge information. In additional, with the proposed training procedure, our method reconstructs photo-realistic images in upscaling factor 8x and outperforms state-of-the-art methods both in quantitative terms and perceptual quality.

Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

Xiaoyu Xiang, Qian Lin, Jan Allebach

Responsive image

Auto-TLDR; A Context-Aware Joint CAR and SR Neural Network for High-Resolution Text Recognition and Face Detection

Slides Poster Similar

Due to the limits of bandwidth and storage space, digital images are usually down-scaled and compressed when transmitted over networks, resulting in loss of details and jarring artifacts that can lower the performance of high-level visual tasks. In this paper, we aim to generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor by exploring joint compression artifacts reduction (CAR) and super-resolution (SR) tasks. First, we propose a context-aware joint CAR and SR neural network (CAJNN) that integrates both local and non-local features to solve CAR and SR in one-stage. Finally, a deep reconstruction network is adopted to predict high quality and high-resolution images. Evaluation on CAR and SR benchmark datasets shows that our CAJNN model outperforms previous methods and also takes 26.2% less runtime. Based on this model, we explore addressing two critical challenges in high-level computer vision: optical character recognition of low-resolution texts, and extremely tiny face detection. We demonstrate that CAJNN can serve as an effective image preprocessing method and improve the accuracy for real-scene text recognition (from 85.30% to 85.75%) and the average precision for tiny face detection (from 0.317 to 0.611).

Galaxy Image Translation with Semi-Supervised Noise-Reconstructed Generative Adversarial Networks

Qiufan Lin, Dominique Fouchez, Jérôme Pasquet

Responsive image

Auto-TLDR; Semi-supervised Image Translation with Generative Adversarial Networks Using Paired and Unpaired Images

Slides Poster Similar

Image-to-image translation with Deep Learning neural networks, particularly with Generative Adversarial Networks (GANs), is one of the most powerful methods for simulating astronomical images. However, current work is limited to utilizing paired images with supervised translation, and there has been rare discussion on reconstructing noise background that encodes instrumental and observational effects. These limitations might be harmful for subsequent scientific applications in astrophysics. Therefore, we aim to develop methods for using unpaired images and preserving noise characteristics in image translation. In this work, we propose a two-way image translation model using GANs that exploits both paired and unpaired images in a semi-supervised manner, and introduce a noise emulating module that is able to learn and reconstruct noise characterized by high-frequency features. By experimenting on multi-band galaxy images from the Sloan Digital Sky Survey (SDSS) and the Canada France Hawaii Telescope Legacy Survey (CFHT), we show that our method recovers global and local properties effectively and outperforms benchmark image translation models. To our best knowledge, this work is the first attempt to apply semi-supervised methods and noise reconstruction techniques in astrophysical studies.

Phase Retrieval Using Conditional Generative Adversarial Networks

Tobias Uelwer, Alexander Oberstraß, Stefan Harmeling

Responsive image

Auto-TLDR; Conditional Generative Adversarial Networks for Phase Retrieval

Slides Poster Similar

In this paper, we propose the application of conditional generative adversarial networks to solve various phase retrieval problems. We show that including knowledge of the measurement process at training time leads to an optimization at test time that is more robust to initialization than existing approaches involving generative models. In addition, conditioning the generator network on the measurements enables us to achieve much more detailed results. We empirically demonstrate that these advantages provide meaningful solutions to the Fourier and the compressive phase retrieval problem and that our method outperforms well-established projection-based methods as well as existing methods that are based on neural networks. Like other deep learning methods, our approach is very robust to noise and can therefore be very useful for real-world applications.

Local Facial Attribute Transfer through Inpainting

Ricard Durall, Franz-Josef Pfreundt, Janis Keuper

Responsive image

Auto-TLDR; Attribute Transfer Inpainting Generative Adversarial Network

Slides Poster Similar

The term attribute transfer refers to the tasks of altering images in such a way, that the semantic interpretation of a given input image is shifted towards an intended direction, which is quantified by semantic attributes. Prominent example applications are photo realistic changes of facial features and expressions, like changing the hair color, adding a smile, enlarging the nose or altering the entire context of a scene, like transforming a summer landscape into a winter panorama. Recent advances in attribute transfer are mostly based on generative deep neural networks, using various techniques to manipulate images in the latent space of the generator. In this paper, we present a novel method for the common sub-task of local attribute transfers, where only parts of a face have to be altered in order to achieve semantic changes (e.g. removing a mustache). In contrast to previous methods, where such local changes have been implemented by generating new (global) images, we propose to formulate local attribute transfers as an inpainting problem. Removing and regenerating only parts of images, our Attribute Transfer Inpainting Generative Adversarial Network (ATI-GAN) is able to utilize local context information to focus on the attributes while keeping the background unmodified resulting in visually sound results.

Video Reconstruction by Spatio-Temporal Fusion of Blurred-Coded Image Pair

Anupama S, Prasan Shedligeri, Abhishek Pal, Kaushik Mitr

Responsive image

Auto-TLDR; Recovering Video from Motion-Blurred and Coded Exposure Images Using Deep Learning

Slides Poster Similar

Learning-based methods have enabled the recovery of a video sequence from a single motion-blurred image or a single coded exposure image. Recovering video from a single motion-blurred image is a very ill-posed problem and the recovered video usually has many artifacts. In addition to this, the direction of motion is lost and it results in motion ambiguity. However, it has the advantage of fully preserving the information in the static parts of the scene. The traditional coded exposure framework is better-posed but it only samples a fraction of the space-time volume, which is at best $50\%$ of the space-time volume. Here, we propose to use the complementary information present in the fully-exposed (blurred) image along with the coded exposure image to recover a high fidelity video without any motion ambiguity. Our framework consists of a shared encoder followed by an attention module to selectively combine the spatial information from the fully-exposed image with the temporal information from the coded image, which is then super-resolved to recover a non-ambiguous high-quality video. The input to our algorithm is a fully-exposed and coded image pair. Such an acquisition system already exists in the form of a Coded-two-bucket (C2B) camera. We demonstrate that our proposed deep learning approach using blurred-coded image pair produces much better results than those from just a blurred image or just a coded image.

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

Ruojing Wang, Zitang Sun, Sei-Ichiro Kamata, Weili Chen

Responsive image

Auto-TLDR; Adaptive Image Compression using GAN based Semantic-Perceptual Residual Compensation

Slides Poster Similar

Image compression is a basic task in image processing. In this paper, We present an adaptive image compression algorithm that relies on GAN based semantic-perceptual residual compensation, which is available to offer visually pleasing reconstruction at a low bitrate. Our method adopt an U-shaped encoding and decoding structure accompanied by a well-designed dense residual connection with strip pooling module to improve the original auto-encoder. Besides, we introduce the idea of adversarial learning by introducing a discriminator thus constructed a complete GAN. To improve the coding efficiency, we creatively designed an adaptive semantic-perception residual compensation block based on Grad-CAM algorithm. In the improvement of the quantizer, we embed the method of soft-quantization so as to solve the problem to some extent that back propagation process is irreversible. Simultaneously, we use the latest FLIF lossless compression algorithm and BPG vector compression algorithm to perform deeper compression on the image. More importantly experimental results including PSNR, MS-SSIM demonstrate that the proposed approach outperforms the current state-of-the-art image compression methods.

Tarsier: Evolving Noise Injection inSuper-Resolution GANs

Baptiste Roziere, Nathanaël Carraz Rakotonirina, Vlad Hosu, Rasoanaivo Andry, Hanhe Lin, Camille Couprie, Olivier Teytaud

Responsive image

Auto-TLDR; Evolutionary Super-Resolution using Diagonal CMA

Slides Poster Similar

Super-resolution aims at increasing the resolution and level of detail within an image. The current state of the art in general single-image super-resolution is held by nESRGAN+,which injects a Gaussian noise after each residual layer at training time. In this paper, we harness evolutionary methods to improve nESRGAN+ by optimizing the noise injection at inference time. More precisely, we use Diagonal CMA to optimize the injected noise according to a novel criterion combining quality assessment and realism. Our results are validated by the PIRM perceptual score and a human study. Our method outperforms nESRGAN+ on several standard super-resolution datasets. More generally, our approach can be used to optimize any method based on noise injection.

Detail Fusion GAN: High-Quality Translation for Unpaired Images with GAN-Based Data Augmentation

Ling Li, Yaochen Li, Chuan Wu, Hang Dong, Peilin Jiang, Fei Wang

Responsive image

Auto-TLDR; Data Augmentation with GAN-based Generative Adversarial Network

Slides Poster Similar

Image-to-image translation, a task to learn the mapping relation between two different domains, is a rapid-growing research field in deep learning. Although existing Generative Adversarial Network(GAN)-based methods have achieved decent results in this field, there are still some limitations in generating high-quality images for practical applications (e.g., data augmentation and image inpainting). In this work, we aim to propose a GAN-based network for data augmentation which can generate translated images with more details and less artifacts. The proposed Detail Fusion Generative Adversarial Network(DFGAN) consists of a detail branch, a transfer branch, a filter module, and a reconstruction module. The detail branch is trained by a super-resolution loss and its intermediate features can be used to introduce more details to the transfer branch by the filter module. Extensive evaluations demonstrate that our model generates more satisfactory images against the state-of-the-art approaches for data augmentation.

Towards Artifacts-Free Image Defogging

Gabriele Graffieti, Davide Maltoni

Responsive image

Auto-TLDR; CurL-Defog: Learning Based Defogging with CycleGAN and HArD

Slides Similar

In this paper we present a novel defogging technique, named CurL-Defog, aimed at minimizing the creation of artifacts. The majority of learning based defogging approaches relies on paired data (i.e., the same images with and without fog), where fog is artificially added to clear images: this often provides good results on mildly fogged images but does not generalize well to real difficult cases. On the other hand, the models trained with real unpaired data (e.g. CycleGAN) can provide visually impressive results but often produce unwanted artifacts. In this paper we propose a curriculum learning strategy coupled with an enhanced CycleGAN model in order to reduce the number of produced artifacts, while maintaining state-of-the- art performance in terms of contrast enhancement and image reconstruction. We also introduce a new metric, called HArD (Hazy Artifact Detector) to numerically quantify the amount of artifacts in the defogged images, thus avoiding the tedious and subjective manual inspection of the results. The proposed approach compares favorably with state-of-the-art techniques on both real and synthetic datasets.

Position-Aware and Symmetry Enhanced GAN for Radial Distortion Correction

Yongjie Shi, Xin Tong, Jingsi Wen, He Zhao, Xianghua Ying, Jinshi Hongbin Zha

Responsive image

Auto-TLDR; Generative Adversarial Network for Radial Distorted Image Correction

Slides Poster Similar

This paper presents a novel method based on the generative adversarial network for radial distortion correction. Instead of generating a corrected image, our generator predicts a pixel flow map to measure the pixel offset between the distorted and corrected image. The quality of the generated pixel flow map and the warped image are judged by the discriminator. As texture far away from the image center has strong distortion, we develop an Adaptive Inverted Foveal layer which can transform the deformation to the intensity of the image to exploit this property. Rotation symmetry enhanced convolution kernels are applied to extract geometric features of different orientations explicitly. These learned features are recalibrated using the Squeeze-and-Excitation block to assign different weights for different directions. Moreover, we construct a first real-world radial distorted image dataset RD600 annotated with ground truth to evaluate our proposed method. We conduct extensive experiments to validate the effectiveness of each part of our framework. The further experiment shows our approach outperforms previous methods in both synthetic and real-world datasets quantitatively and qualitatively.

Automatical Enhancement and Denoising of Extremely Low-Light Images

Yuda Song, Yunfang Zhu, Xin Du

Responsive image

Auto-TLDR; INSNet: Illumination and Noise Separation Network for Low-Light Image Restoring

Slides Poster Similar

Deep convolutional neural networks (DCNN) based methodologies have achieved remarkable performance on various low-level vision tasks recently. Restoring images captured at night is one of the trickiest low-level vision tasks due to its high-level noise and low-level intensity. We propose a DCNN-based methodology, Illumination and Noise Separation Network (INSNet), which performs both denoising and enhancement on these extremely low-light images. INSNet fully utilizes global-ware features and local-ware features using the modified network structure and image sampling scheme. Compared to well-designed complex neural networks, our proposed methodology only needs to add a bypass network to the existing network. However, it can boost the quality of recovered images dramatically but only increase the computational cost by less than 0.1%. Even without any manual settings, INSNet can stably restore the extremely low-light images to desired high-quality images.

Hierarchically Aggregated Residual Transformation for Single Image Super Resolution

Zejiang Hou, Sy Kung

Responsive image

Auto-TLDR; HARTnet: Hierarchically Aggregated Residual Transformation for Multi-Scale Super-resolution

Slides Poster Similar

Visual patterns usually appear at different scales/sizes in natural images. Multi-scale feature representation is of great importance for the single-image super-resolution(SISR) task to reconstruct image objects at different scales.However, such characteristic has been rarely considered by CNN-based SISR methods. In this work, we propose a novel build-ing block, i.e. hierarchically aggregated residual transformation(HART), to achieve multi-scale feature representation in each layer of the network. Within each HART block, we connect multiple convolutions in a hierarchical residual-like manner, which greatly expands the range of effective receptive fields and helps to detect image features at different scales. To theoretically understand the proposed HART block, we recast SISR as an optimal control problem and show that HART effectively approximates the classical4th-order Runge-Kutta method, which has the merit of small local truncation error for solving numerical ordinary differential equation. By cascading the proposed HART blocks, we establish our high-performing HARTnet. Comparedwith existing SR state-of-the-arts (including those in NTIRE2019 SR Challenge leaderboard), the proposed HARTnet demonstrates consistent PSNR/SSIM performance improvements on various benchmark datasets under different degradation models.Moreover, HARTnet can efficiently restore more faithful high-resolution images than comparative SR methods (cf. Figure 1).

Boundary Guided Image Translation for Pose Estimation from Ultra-Low Resolution Thermal Sensor

Kohei Kurihara, Tianren Wang, Teng Zhang, Brian Carrington Lovell

Responsive image

Auto-TLDR; Pose Estimation on Low-Resolution Thermal Images Using Image-to-Image Translation Architecture

Slides Poster Similar

This work addresses the pose estimation task on low-resolution images captured using thermal sensors which can operate in a no-light environment. Low-resolution thermal sensors have been widely adopted in various applications for cost control and privacy protection purposes. In this paper, targeting the challenging scenario of ultra-low resolution thermal imaging (3232 pixels), we aim to estimate human poses for the purpose of monitoring health conditions and indoor events. To overcome the challenges in ultra-low resolution thermal imaging such as blurred boundaries and data scarcity, we propose a new Image-to-Image (I2I) translation architecture which can translate the original blurred thermal image into a visible light image with sharper boundaries. Then the generated visible light image can be fed into the off-the-shelf pose estimator which was well-trained in the visible domain. Experimental results suggest that the proposed framework outperforms other state-of-the-art methods in the I2I based pose estimation task for our thermal image dataset. Furthermore, we also demonstrated the merits of the proposed method on the publicly available FLIR dataset by measuring the quality of translated images.

Fidelity-Controllable Extreme Image Compression with Generative Adversarial Networks

Shoma Iwai, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Responsive image

Auto-TLDR; GAN-based Image Compression at Low Bitrates

Slides Similar

We propose a GAN-based image compression method working at extremely low bitrates below 0.1bpp. Most existing learned image compression methods suffer from blur at extremely low bitrates. Although GAN can help to reconstruct sharp images, there are two drawbacks. First, GAN makes train- ing unstable. Second, the reconstructions often contain unpleasing noise or artifacts. To address both of the drawbacks, our method adopts two-stage training and network interpolation. The two- stage training is effective to stabilize the training. Moreover, the network interpolation utilizes the models in both stages and reduces undesirable noise and artifacts, while maintaining important edges. Hence, we can control the trade-off between perceptual quality and fidelity without re-training models. The experimental results show that our model can reconstruct high quality images. Furthermore, our user study confirms that our reconstructions are preferable to state-of-the-art GAN-based image compression model.

Free-Form Image Inpainting Via Contrastive Attention Network

Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He

Responsive image

Auto-TLDR; Self-supervised Siamese inference for image inpainting

Slides Similar

Most deep learning based image inpainting approaches adopt autoencoder or its variants to fill missing regions in images. Encoders are usually utilized to learn powerful representational spaces, which are important for dealing with sophisticated learning tasks. Specifically, in the image inpainting task, masks with any shapes can appear anywhere in images (i.e., free-form masks) forming complex patterns. It is difficult for encoders to capture such powerful representations under this complex situation. To tackle this problem, we propose a self-supervised Siamese inference network to improve the robustness and generalization. Moreover, the restored image usually can not be harmoniously integrated into the exiting content, especially in the boundary area. To address this problem, we propose a novel Dual Attention Fusion module (DAF), which can combine both the restored and known regions in a smoother way and be inserted into decoder layers in a plug-and-play way. DAF is developed to not only adaptively rescale channel-wise features by taking interdependencies between channels into account but also force deep convolutional neural networks (CNNs) focusing more on unknown regions. In this way, the unknown region will be naturally filled from the outside to the inside. Qualitative and quantitative experiments on multiple datasets, including facial and natural datasets (i.e., Celeb-HQ, Pairs Street View, Places2 and ImageNet), demonstrate that our proposed method outperforms against state-of-the-arts in generating high-quality inpainting results.

Cycle-Consistent Adversarial Networks and Fast Adaptive Bi-Dimensional Empirical Mode Decomposition for Style Transfer

Elissavet Batziou, Petros Alvanitopoulos, Konstantinos Ioannidis, Ioannis Patras, Stefanos Vrochidis, Ioannis Kompatsiaris

Responsive image

Auto-TLDR; FABEMD: Fast and Adaptive Bidimensional Empirical Mode Decomposition for Style Transfer on Images

Slides Poster Similar

Recently, research endeavors have shown the potentiality of Cycle-Consistent Adversarial Networks (CycleGAN) in style transfer. In Cycle-Consistent Adversarial Networks, the consistency loss is introduced to measure the difference between the original images and the reconstructed in both directions, forward and backward. In this work, the combination of Cycle-Consistent Adversarial Networks with Fast and Adaptive Bidimensional Empirical Mode Decomposition (FABEMD) is proposed to perform style transfer on images. In the proposed approach the cycle-consistency loss is modified to include the differences between the extracted Intrinsic Mode Functions (BIMFs) images. Instead of an estimation of pixel-to-pixel difference between the produced and input images, the FABEMD is applied and the extracted BIMFs are involved in the computation of the total cycle loss. This method enriches the computation of the total loss in a content-to-content and style-to-style comparison by connecting the spatial information to the frequency components. The experimental results reveal that the proposed method is efficient and produces qualitative results comparable to state-of-the-art methods.

Residual Fractal Network for Single Image Super Resolution by Widening and Deepening

Jiahang Gu, Zhaowei Qu, Xiaoru Wang, Jiawang Dan, Junwei Sun

Responsive image

Auto-TLDR; Residual fractal convolutional network for single image super-resolution

Slides Poster Similar

The architecture of the convolutional neural network (CNN) plays an important role in single image super-resolution (SISR). However, most models proposed in recent years usually transplant methods or architectures that perform well in other vision fields. Thence they do not combine the characteristics of super-resolution (SR) and ignore the key information brought by the recurring texture feature in the image. To utilize patch-recurrence in SR and the high correlation of texture, we propose a residual fractal convolutional block (RFCB) and expand its depth and width to obtain residual fractal network (RFN), which contains deep residual fractal network (DRFN) and wide residual fractal network (WRFN). RFCB is recursive with multiple branches of magnified receptive field. Through the phased feature fusion module, the network focuses on extracting high-frequency texture feature that repeatedly appear in the image. We also introduce residual in residual (RIR) structure to RFCB that enables abundant low-frequency feature feed into deeper layers and reduce the difficulties of network training. RFN is the first supervised learning method to combine the patch-recurrence characteristic in SISR into network design. Extensive experiments demonstrate that RFN outperforms state-of-the-art SISR methods in terms of both quantitative metrics and visual quality, while the amount of parameters has been greatly optimized.

Video Lightening with Dedicated CNN Architecture

Li-Wen Wang, Wan-Chi Siu, Zhi-Song Liu, Chu-Tak Li, P. K. Daniel Lun

Responsive image

Auto-TLDR; VLN: Video Lightening Network for Driving Assistant Systems in Dark Environment

Slides Poster Similar

Darkness brings us uncertainty, worry and low confidence. This is a problem not only applicable to us walking in a dark evening but also for drivers driving a car on the road with very dim or even without lighting condition. To address this problem, we propose a new CNN structure named as Video Lightening Network (VLN) that regards the low-light enhancement as a residual learning task, which is useful as reference to indirectly lightening the environment, or for vision-based application systems, such as driving assistant systems. The VLN consists of several Lightening Back-Projection (LBP) and Temporal Aggregation (TA) blocks. Each LBP block enhances the low-light frame by domain transfer learning that iteratively maps the frame between the low- and normal-light domains. A TA block handles the motion among neighboring frames by investigating the spatial and temporal relationships. Several TAs work in a multi-scale way, which compensates the motions at different levels. The proposed architecture has a consistent enhancement for different levels of illuminations, which significantly increases the visual quality even in the extremely dark environment. Extensive experimental results show that the proposed approach outperforms other methods under both objective and subjective metrics.

Small Object Detection Leveraging on Simultaneous Super-Resolution

Hong Ji, Zhi Gao, Xiaodong Liu, Tiancan Mei

Responsive image

Auto-TLDR; Super-Resolution via Generative Adversarial Network for Small Object Detection

Poster Similar

Despite the impressive advancement achieved in object detection, the detection performance of small object is still far from satisfactory due to the lack of sufficient detailed appearance to distinguish it from similar objects. Inspired by the positive effects of super-resolution for object detection, we propose a general framework that can be incorporated with most available detector networks to significantly improve the performance of small object detection, in which the low-resolution image is super-resolved via generative adversarial network (GAN) in an unsupervised manner. In our method, the super-resolution network and the detection network are trained jointly and alternately with each other fixed. In particular, the detection loss is back-propagated into the super-resolution network during training to facilitate detection. Compared with available simultaneous super-resolution and detection methods which heavily rely on low-/high-resolution image pairs, our work breaks through such restriction via applying the CycleGAN strategy, achieving increased generality and applicability, while remaining an elegant structure. Extensive experiments on datasets from both computer vision and remote sensing communities demonstrate that our method works effectively on a wide range of complex scenarios, resulting in best performance that significantly outperforms many state-of-the-art approaches.

RSAN: Residual Subtraction and Attention Network for Single Image Super-Resolution

Shuo Wei, Xin Sun, Haoran Zhao, Junyu Dong

Responsive image

Auto-TLDR; RSAN: Residual subtraction and attention network for super-resolution

Slides Similar

The single-image super-resolution (SISR) aims to recover a potential high-resolution image from its low-resolution version. Recently, deep learning-based methods have played a significant role in super-resolution field due to its effectiveness and efficiency. However, most of the SISR methods neglect the importance among the feature map channels. Moreover, they can not eliminate the redundant noises, making the output image be blurred. In this paper, we propose the residual subtraction and attention network (RSAN) for powerful feature expression and channels importance learning. More specifically, RSAN firstly implements one redundance removal module to learn noise information in the feature map and subtract noise through residual learning. Then it introduces the channel attention module to amplify high-frequency information and suppress the weight of effectless channels. Experimental results on extensive public benchmarks demonstrate our RSAN achieves significant improvement over the previous SISR methods in terms of both quantitative metrics and visual quality.

Deep Realistic Novel View Generation for City-Scale Aerial Images

Koundinya Nouduri, Ke Gao, Joshua Fraser, Shizeng Yao, Hadi Aliakbarpour, Filiz Bunyak, Kannappan Palaniappan

Responsive image

Auto-TLDR; End-to-End 3D Voxel Renderer for Multi-View Stereo Data Generation and Evaluation

Slides Poster Similar

In this paper we introduce a novel end-to-end frameworkfor generation of large, aerial, city-scale, realistic syntheticimage sequences with associated accurate and precise camerametadata. The two main purposes for this data are (i) to en-able objective, quantitative evaluation of computer vision al-gorithms and methods such as feature detection, description,and matching or full computer vision pipelines such as 3D re-construction; and (ii) to supply large amounts of high qualitytraining data for deep learning guided computer vision meth-ods. The proposed framework consists of three main mod-ules, a 3D voxel renderer for data generation, a deep neu-ral network for artifact removal, and a quantitative evaluationmodule for Multi-View Stereo (MVS) as an example. The3D voxel renderer enables generation of seen or unseen viewsof a scene from arbitary camera poses with accurate camerametadata parameters. The artifact removal module proposes anovel edge-augmented deep learning network with an explicitedgemap processing stream to remove image artifacts whilepreserving and recovering scene structures for more realis-tic results. Our experiments on two urban, city-scale, aerialdatasets for Albuquerque (ABQ), NM and Los Angeles (LA),CA show promising results in terms structural similarity toreal data and accuracy of reconstructed 3D point clouds

Wavelet Attention Embedding Networks for Video Super-Resolution

Young-Ju Choi, Young-Woon Lee, Byung-Gyu Kim

Responsive image

Auto-TLDR; Wavelet Attention Embedding Network for Video Super-Resolution

Slides Poster Similar

Recently, Video super-resolution (VSR) has become more crucial as the resolution of display has been grown. The majority of deep learning-based VSR methods combine the convolutional neural networks (CNN) with motion compensation or alignment module to estimate high-resolution (HR) frame from low-resolution (LR) frames. However, most of previous methods deal with the spatial features equally and may result in the misaligned temporal features by pixel-based motion compensation and alignment module. It can lead to the damaging effect on the accuracy of the estimated HR feature. In this paper, we propose a wavelet attention embedding network (WAEN), including wavelet embedding network (WENet) and attention embedding network (AENet), to fully exploit the spatio-temporal informative features. The WENet is operated as a spatial feature extractor of individual low and high-frequency information based on 2-D Haar discrete wavelet transform. The meaningful temporal feature is extracted in the AENet through utilizing the weighted attention map between frames. Experimental results demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods.

On-Device Text Image Super Resolution

Dhruval Jain, Arun Prabhu, Gopi Ramena, Manoj Goyal, Debi Mohanty, Naresh Purre, Sukumar Moharana

Responsive image

Auto-TLDR; A Novel Deep Neural Network for Super-Resolution on Low Resolution Text Images

Slides Poster Similar

Recent research on super-resolution (SR) has wit- nessed major developments with the advancements of deep convolutional neural networks. There is a need for information extraction from scenic text images or even document images on device, most of which are low-resolution (LR) images. Therefore, SR becomes an essential pre-processing step as Bicubic Upsampling, which is conventionally present in smartphones, performs poorly on LR images. To give the user more control over his privacy, and to reduce the carbon footprint by reducing the overhead of cloud computing and hours of GPU usage, executing SR models on the edge is a necessity in the recent times. There are various challenges in running and optimizing a model on resource-constrained platforms like smartphones. In this paper, we present a novel deep neural network that reconstructs sharper character edges and thus boosts OCR confidence. The proposed architecture not only achieves significant improvement in PSNR over bicubic upsampling on various benchmark datasets but also runs with an average inference time of 11.7 ms per image. We have outperformed state-of-the-art on the Text330 dataset. We also achieve an OCR accuracy of 75.89% on the ICDAR 2015 TextSR dataset, where ground truth has an accuracy of 78.10%.

An Unsupervised Approach towards Varying Human Skin Tone Using Generative Adversarial Networks

Debapriya Roy, Diganta Mukherjee, Bhabatosh Chanda

Responsive image

Auto-TLDR; Unsupervised Skin Tone Change Using Augmented Reality Based Models

Slides Poster Similar

With the increasing popularity of augmented and virtual reality, retailers are now more focusing towards customer satisfaction to increase the amount of sales. Although augmented reality is not a new concept but it has gained its much needed attention over the past few years. Our present work is targeted towards this direction which may be used to enhance user experience in various virtual and augmented reality based applications. We propose a model to change skin tone of person. Given any input image of a person or a group of persons with some value indicating the desired change of skin color towards fairness or darkness, this method can change the skin tone of the persons in the image. This is an unsupervised method and also unconstrained in terms of pose, illumination, number of persons in the image etc. The goal of this work is to reduce the complexity in terms of time and effort which is generally needed for changing the skin tone using existing applications by professionals or novice. Rigorous experiments shows the efficacy of this method in terms of synthesizing perceptually convincing outputs.

LiNet: A Lightweight Network for Image Super Resolution

Armin Mehri, Parichehr Behjati Ardakani, Angel D. Sappa

Responsive image

Auto-TLDR; LiNet: A Compact Dense Network for Lightweight Super Resolution

Slides Poster Similar

This paper proposes a new lightweight network, LiNet, that enhancing technical efficiency in lightweight super resolution and operating approximately like very large and costly networks in terms of number of network parameters and operations. The proposed architecture allows the network to learn more abstract properties by avoiding low-level information via multiple links. LiNet introduces a Compact Dense Module, which contains set of inner and outer blocks, to efficiently extract meaningful information, to better leverage multi-level representations before upsampling stage, and to allow an efficient information and gradient flow within the network. Experiments on benchmark datasets show that the proposed LiNet achieves favorable performance against lightweight state-of-the-art methods.

Attention2AngioGAN: Synthesizing Fluorescein Angiography from Retinal Fundus Images Using Generative Adversarial Networks

Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli, Stewart Lee Zuckerbrod

Responsive image

Auto-TLDR; Fluorescein Angiography from Fundus Images using Attention-based Generative Networks

Slides Poster Similar

Fluorescein Angiography (FA) is a technique that employs the designated camera for Fundus photography incorporating excitation and barrier filters. FA also requires fluorescein dye that is injected intravenously, which might cause adverse effects ranging from nausea, vomiting to even fatal anaphylaxis. Currently, no other fast and non-invasive technique exists that can generate FA without coupling with Fundus photography. To eradicate the need for an invasive FA extraction procedure, we introduce an Attention-based Generative network that can synthesize Fluorescein Angiography from Fundus images. The proposed gan incorporates multiple attention based skip connections in generators and comprises novel residual blocks for both generators and discriminators. It utilizes reconstruction, feature-matching, and perceptual loss along with adversarial training to produces realistic Angiograms that is hard for experts to distinguish from real ones. Our experiments confirm that the proposed architecture surpasses recent state-of-the-art generative networks for fundus-to-angio translation task.

MTGAN: Mask and Texture-Driven Generative Adversarial Network for Lung Nodule Segmentation

Wei Chen, Qiuli Wang, Kun Wang, Dan Yang, Xiaohong Zhang, Chen Liu, Yucong Li

Responsive image

Auto-TLDR; Mask and Texture-driven Generative Adversarial Network for Lung Nodule Segmentation

Slides Poster Similar

Accurate segmentation for lung nodules in lung computed tomography (CT) scans plays a key role in the early diagnosis of lung cancer. Many existing methods, especially UNet, have made significant progress in lung nodule segmentation. However, due to the complex shapes of lung nodules and the similarity of visual characteristics between nodules and lung tissues, an accurate segmentation with few false positives of lung nodules is still a challenging problem. Considering the fact that both boundary and texture information of lung nodules are important for obtaining an accurate segmentation result, we propose a novel Mask and Texture-driven Generative Adversarial Network (MTGAN) with a joint multi-scale L1 loss for lung nodule segmentation, which takes full advantages of U-Net and adversarial training. The proposed MTGAN leverages adversarial learning strategy guided by the boundary and texture information of lung nodules to generate more accurate segmentation results with lesser false positives. We validate our model with the LIDC–IDRI dataset, and experimental results show that our method achieves excellent segmentation results for a variety of lung nodules, especially for juxtapleural nodules and low-dense nodules. Without any bells and whistles, the proposed MTGAN achieves significant segmentation performance with the Dice similarity coefficient (DSC) of 85.24% on the LIDC–IDRI dataset.

Quantifying Model Uncertainty in Inverse Problems Via Bayesian Deep Gradient Descent

Riccardo Barbano, Chen Zhang, Simon Arridge, Bangti Jin

Responsive image

Auto-TLDR; Bayesian Neural Networks for Inverse Reconstruction via Bayesian Knowledge-Aided Computation

Slides Poster Similar

Recent advances in reconstruction methods for inverse problems leverage powerful data-driven models, e.g., deep neural networks. These techniques have demonstrated state-of-the-art performances for several imaging tasks, but they often do not provide uncertainty on the obtained reconstructions. In this work, we develop a novel scalable data-driven knowledge-aided computational framework to quantify the model uncertainty via Bayesian neural networks. The approach builds on and extends deep gradient descent, a recently developed greedy iterative training scheme, and recasts it within a probabilistic framework. Scalability is achieved by being hybrid in the architecture: only the last layer of each block is Bayesian, while the others remain deterministic, and by being greedy in training. The framework is showcased on one representative medical imaging modality, viz. computed tomography with either sparse view or limited view data, and exhibits competitive performance with respect to state-of-the-art benchmarks, e.g., total variation, deep gradient descent and learned primal-dual.

Single Image Super-Resolution with Dynamic Residual Connection

Karam Park, Jae Woong Soh, Nam Ik Cho

Responsive image

Auto-TLDR; Dynamic Residual Attention Network for Lightweight Single Image Super-Residual Networks

Slides Poster Similar

Deep convolutional neural networks have shown significant improvement in the single image super-resolution (SISR) field. Recently, there have been attempts to solve the SISR problem using lightweight networks, considering limited computational resources for real-world applications. Especially for lightweight networks, balancing between parameter demand and performance is very difficult to adjust, and most lightweight SISR networks are manually designed based on a huge number of brute-force experiments. Besides, a critical key to the network performance relies on the skip connection of building blocks that are repeatedly in the architecture. Notably, in previous works, these connections are pre-defined and manually determined by human researchers. Hence, they are less flexible to the input image statistics, and there can be a better solution for the given number of parameters. Therefore, we focus on the automated design of networks regarding the connection of basic building blocks (residual networks), and as a result, propose a dynamic residual attention network (DRAN). The proposed method allows the network to dynamically select residual paths depending on the input image, based on the idea of attention mechanism. For this, we design a dynamic residual module that determines the residual paths between the basic building blocks for the given input image. By finding optimal residual paths between the blocks, the network can selectively bypass informative features needed to reconstruct the target high-resolution (HR) image. Experimental results show that our proposed DRAN outperforms most of the existing state-of-the-arts lightweight models in SISR.

Deep Residual Attention Network for Hyperspectral Image Reconstruction

Kohei Yorimoto, Xian-Hua Han

Responsive image

Auto-TLDR; Deep Convolutional Neural Network for Hyperspectral Image Reconstruction from a Snapshot

Slides Poster Similar

Coded aperture snapshot spectral imaging (CASSI) captures a full frame spectral image as a single compressive image and is mandatory to reconstruct the underlying hyperspectral image (HSI) from the snapshot as the post-processing, which is challenge inverse problem due to its ill-posed nature. Existing methods for HSI reconstruction from a snapshot usually employs optimization for solving the formulated image degradation model regularized with the empirically designed priors, and still cannot achieve enough reconstruction accuracy for real HS image analysis systems. Motivated by the recent advances of deep learning for different inverse problems, deep learning based HSI reconstruction method has attracted a lot of attention, and can boost the reconstruction performance. This study proposes a novel deep convolutional neural network (DCNN) based framework for effectively learning the spatial structure and spectral attribute in the underlying HSI with the reciprocal spatial and spectral modules. Further, to adaptively leverage the useful learned feature for better HSI image reconstruction, we integrate residual attention modules into our DCNN via exploring both spatial and spectral attention maps. Experimental results on two benchmark HSI datasets show that our method outperforms state-of-the-art methods in both quantitative values and visual effect.

A GAN-Based Blind Inpainting Method for Masonry Wall Images

Yahya Ibrahim, Balázs Nagy, Csaba Benedek

Responsive image

Auto-TLDR; An End-to-End Blind Inpainting Algorithm for Masonry Wall Images

Slides Poster Similar

In this paper we introduce a novel end-to-end blind inpainting algorithm for masonry wall images, performing the automatic detection and virtual completion of occluded or damaged wall regions. For this purpose, we propose a three-stage deep neural network that comprises a U-Net-based sub-network for wall segmentation into brick, mortar and occluded regions, which is followed by a two-stage adversarial inpainting model. The first adversarial network predicts the schematic mortar-brick pattern of the occluded areas based on the observed wall structure, providing in itself valuable structural information for archeological and architectural applications. Finally, the second adversarial network predicts the RGB pixel values yielding a realistic visual experience for the observer. While the three stages implement a sequential pipeline, they interact through dependencies of their loss functions admitting the consideration of hidden feature dependencies between the different network components. For training and testing the network a new dataset has been created, and an extensive qualitative and quantitative evaluation versus the state-of-the-art is given.

OCT Image Segmentation Using NeuralArchitecture Search and SRGAN

Saba Heidari, Omid Dehzangi, Nasser M. Nasarabadi, Ali Rezai

Responsive image

Auto-TLDR; Automatic Segmentation of Retinal Layers in Optical Coherence Tomography using Neural Architecture Search

Poster Similar

Alzheimer’s disease (AD) diagnosis is one of the major research areas in computational medicine. Optical coherence tomography (OCT) is a non-invasive, inexpensive, and timely efficient method that scans the human’s retina with depth. It has been hypothesized that the thickness of the retinal layers extracted from OCTs could be an efficient and effective biomarker for early diagnosis of AD. In this work, we aim to design a self-training model architecture for the task of segmenting the retinal layers in OCT scans. Neural architecture search (NAS) is a subfield of AutoML domain, which has a significant impact on improving the accuracy of machine vision tasks. We integrate the NAS algorithm with a Unet auto-encoder architecture as its backbone. Then, we employ our proposed model to segment the retinal nerve fiber layer in our preprocessed OCT images with the aim of AD diagnosis. In this work, we trained a super-resolution generative adversarial network on the raw OCT scans to improve the quality of the images before the modeling stage. In our architecture search strategy, different primitive operations suggested to find down- \& up-sampling Unet cell blocks and the binary gate method has been applied to make the search strategy more practical. Our architecture search method is empirically evaluated by training on the Unet and NAS-Unet from scratch. Specifically, the proposed NAS-Unet training significantly outperforms the baseline human-designed architecture by achieving 95.1\% in the mean Intersection over Union metric and 79.1\% in the Dice similarity coefficient.

Efficient Shadow Detection and Removal Using Synthetic Data with Domain Adaptation

Rui Guo, Babajide Ayinde, Hao Sun

Responsive image

Auto-TLDR; Shadow Detection and Removal with Domain Adaptation and Synthetic Image Database

Poster Similar

In recent years, learning based shadow detection and removal approaches have shown prospects and, in most cases, yielded state-of-the-art results. The performance of these approaches, however, relies heavily on the construction of training database of shadow images, shadow-free versions, and shadow maps as ground truth. This conventional data gathering method is time-consuming, expensive, or even practically intractable to realize especially for outdoor scenes with complicated shadow patterns, thus limiting the size of the data available for training. In this paper, we leverage on large high quality synthetic image database and domain adaptation to eliminate the bottlenecks resulting from insufficient training samples and domain bias. Specifically, our approach utilizes adversarial training to predict near-pixel-perfect shadow map from synthetic shadow image for downstream shadow removal steps. At inference time, we capitalize on domain adaptation via image style transfer to map the style of real- world scene to that of synthetic scene for the purpose of detecting and subsequently removing shadow. Comprehensive experiments indicate that our approach outperforms state-of-the-art methods on select benchmark datasets.

Improving Low-Resolution Image Classification by Super-Resolution with Enhancing High-Frequency Content

Liguo Zhou, Guang Chen, Mingyue Feng, Alois Knoll

Responsive image

Auto-TLDR; Super-resolution for Low-Resolution Image Classification

Slides Poster Similar

With the prosperous development of Convolutional Neural Networks, currently they can perform excellently on visual understanding tasks when the input images are high quality and common quality images. However, large degradation in performance always occur when the input images are low quality images. In this paper, we propose a new super-resolution method in order to improve the classification performance for low-resolution images. In an image, the regions in which pixel values vary dramatically contain more abundant high frequency contents compared to other parts. Based on this fact, we design a weight map and integrate it with a super-resolution CNN training framework. During the process of training, this weight map can find out positions of the high frequency pixels in ground truth high-resolution images. After that, the pixel-level loss function takes effect only at these found positions to minimize the difference between reconstructed high-resolution images and ground truth high-resolution images. Compared with other state-of-the-art super-resolution methods, the experiment results show that our method can recover more high-frequency contents in high-resolution image reconstructing, and better improve the classification accuracy after low-resolution image preprocessing.

The Role of Cycle Consistency for Generating Better Human Action Videos from a Single Frame

Runze Li, Bir Bhanu

Responsive image

Auto-TLDR; Generating Videos with Human Action Semantics using Cycle Constraints

Slides Poster Similar

This paper addresses the challenging problem of generating videos with human action semantics. Unlike previous work which predict future frames in a single forward pass, this paper introduces the cycle constraints in both forward and backward passes in the generation of human actions. This is achieved by enforcing the appearance and motion consistency across a sequence of frames generated in the future. The approach consists of two stages. In the first stage, the pose of a human body is generated. In the second stage, an image generator is used to generate future frames by using (a) generated human poses in the future from the first stage, (b) the single observed human pose, and (c) the single corresponding future frame. The experiments are performed on three datasets: Weizmann dataset involving simple human actions, Penn Action dataset and UCF-101 dataset containing complicated human actions, especially in sports. The results from these experiments demonstrate the effectiveness of the proposed approach.

UCCTGAN: Unsupervised Clothing Color Transformation Generative Adversarial Network

Shuming Sun, Xiaoqiang Li, Jide Li

Responsive image

Auto-TLDR; An Unsupervised Clothing Color Transformation Generative Adversarial Network

Slides Poster Similar

Clothing color transformation refers to changing the clothes color in an original image to the clothes color in a target image. In this paper, we propose an Unsupervised Clothing Color Transformation Generative Adversarial Network (UCCTGAN) for the task. UCCTGAN adopts the color histogram of a target clothes as color guidance and an improved U-net architecture called AntennaNet is put forward to fuse the extracted color information with the original image. Meanwhile, to accomplish unsupervised learning, the loss function is carefully designed according to color moment, which evaluates the chromatic aberration between the target clothing and the generated clothing. Experimental results show that our network has the ability to generate convincing color transformation results.