DSPNet: Deep Learning-Enabled Blind Reduction of Speckle Noise

Yuxu Lu, Meifang Yang, Liu Wen

Responsive image

Auto-TLDR; Deep Blind DeSPeckling Network for Imaging Applications

Poster

Blind reduction of speckle noise has become a long-standing unsolved problem in several imaging applications, such as medical ultrasound imaging, synthetic aperture radar (SAR) imaging, and underwater sonar imaging, etc. The unwanted noise could lead to negative effects on the reliable detection and recognition of objects of interest. From a statistical point of view, speckle noise could be assumed to be multiplicative, significantly different from the common additive Gaussian noise. The purpose of this study is to blindly reduce the speckle noise under non-ideal imaging conditions. The multiplicative relationship between latent sharp image and random noise will be first converted into an additive version through a logarithmic transformation. To promote imaging performance, we introduced the feature pyramid network (FPN) and atrous spatial pyramid pooling (ASPP), contributing to a more powerful deep blind DeSPeckling Network (named as DSPNet). In particular, DSPNet is mainly composed of two subnetworks, i.e., Log-NENet (i.e., noise estimation network in logarithmic domain) and Log-DNNet (i.e., denoising network in logarithmic domain). Log-NENet and Log-DNNet are, respectively, proposed to estimate noise level map and reduce random noise in logarithmic domain. The multi-scale mixed loss function is further proposed to improve the robust generalization of DSPNet. The proposed deep blind despeckling network is capable of reducing random noise and preserving salient image details. Both synthetic and realistic experiments have demonstrated the superior performance of our DSPNet in terms of quantitative evaluations and visual image qualities.

Similar papers

Edge-Guided CNN for Denoising Images from Portable Ultrasound Devices

Yingnan Ma, Fei Yang, Anup Basu

Responsive image

Auto-TLDR; Edge-Guided Convolutional Neural Network for Portable Ultrasound Images

Slides Poster Similar

Ultrasound is a non-invasive tool that is useful for medical diagnosis and treatment. To reduce long wait times and add convenience to patients, portable ultrasound scanning devices are becoming increasingly popular. These devices can be held in one palm, and are compatible with modern cell phones. However, the quality of ultrasound images captured from the portable scanners is relatively poor compared to standard ultrasound scanning systems in hospitals. To improve the quality of the ultrasound images obtained from portable ultrasound devices, we propose a new neural network architecture called Edge-Guided Convolutional Neural Network (EGCNN), which can preserve significant edge information in ultrasound images when removing noise. We also study and compare the effectiveness of classical filtering approaches in removing speckle noise in these images. Experimental results show that after applying the proposed EGCNN, various organs can be better recognized from ultrasound images. This approach is expected to lead to better accuracy in diagnostics in the future.

Ultrasound Image Restoration Using Weighted Nuclear Norm Minimization

Hanmei Yang, Ye Luo, Jianwei Lu, Jian Lu

Responsive image

Auto-TLDR; A Nonconvex Low-Rank Matrix Approximation Model for Ultrasound Images Restoration

Poster Similar

Ultrasound images are often contaminated by speckle noise during the acquisition process, which influences the performance of subsequent application. The paper introduces a nonconvex low-rank matrix approximation model for ultrasound images restoration, which integrates the weighted unclear norm minimization (WNNM) and data fidelity term. WNNM can adaptively assign weights on differnt singular values to preserve more details in restored images. The fidelity term about ultrasound images do not be utilized in existing low-rank ultrasound denoising methods. This optimization question can effectively solved by alternating direction method of multipliers (ADMM). The experimental results on simulated images and real medical ultrasound images demonstrate the excellent performance of the proposed method compared with other four state-of-the-art methods.

Automatical Enhancement and Denoising of Extremely Low-Light Images

Yuda Song, Yunfang Zhu, Xin Du

Responsive image

Auto-TLDR; INSNet: Illumination and Noise Separation Network for Low-Light Image Restoring

Slides Poster Similar

Deep convolutional neural networks (DCNN) based methodologies have achieved remarkable performance on various low-level vision tasks recently. Restoring images captured at night is one of the trickiest low-level vision tasks due to its high-level noise and low-level intensity. We propose a DCNN-based methodology, Illumination and Noise Separation Network (INSNet), which performs both denoising and enhancement on these extremely low-light images. INSNet fully utilizes global-ware features and local-ware features using the modified network structure and image sampling scheme. Compared to well-designed complex neural networks, our proposed methodology only needs to add a bypass network to the existing network. However, it can boost the quality of recovered images dramatically but only increase the computational cost by less than 0.1%. Even without any manual settings, INSNet can stably restore the extremely low-light images to desired high-quality images.

Deep Universal Blind Image Denoising

Jae Woong Soh, Nam Ik Cho

Responsive image

Auto-TLDR; Image Denoising with Deep Convolutional Neural Networks

Slides Similar

Image denoising is an essential part of many image processing and computer vision tasks due to inevitable noise corruption during image acquisition. Traditionally, many researchers have investigated image priors for the denoising, within the Bayesian perspective based on image properties and statistics. Recently, deep convolutional neural networks (CNNs) have shown great success in image denoising by incorporating large-scale synthetic datasets. However, they both have pros and cons. While the deep CNNs are powerful for removing the noise with known statistics, they tend to lack flexibility and practicality for the blind and real-world noise. Moreover, they cannot easily employ explicit priors. On the other hand, traditional non-learning methods can involve explicit image priors, but they require considerable computation time and cannot exploit large-scale external datasets. In this paper, we present a CNN-based method that leverages the advantages of both methods based on the Bayesian perspective. Concretely, we divide the blind image denoising problem into sub-problems and conquer each inference problem separately. As the CNN is a powerful tool for inference, our method is rooted in CNNs and propose a novel design of network for efficient inference. With our proposed method, we can successfully remove blind and real-world noise, with a moderate number of parameters of universal CNN.

Video Lightening with Dedicated CNN Architecture

Li-Wen Wang, Wan-Chi Siu, Zhi-Song Liu, Chu-Tak Li, P. K. Daniel Lun

Responsive image

Auto-TLDR; VLN: Video Lightening Network for Driving Assistant Systems in Dark Environment

Slides Poster Similar

Darkness brings us uncertainty, worry and low confidence. This is a problem not only applicable to us walking in a dark evening but also for drivers driving a car on the road with very dim or even without lighting condition. To address this problem, we propose a new CNN structure named as Video Lightening Network (VLN) that regards the low-light enhancement as a residual learning task, which is useful as reference to indirectly lightening the environment, or for vision-based application systems, such as driving assistant systems. The VLN consists of several Lightening Back-Projection (LBP) and Temporal Aggregation (TA) blocks. Each LBP block enhances the low-light frame by domain transfer learning that iteratively maps the frame between the low- and normal-light domains. A TA block handles the motion among neighboring frames by investigating the spatial and temporal relationships. Several TAs work in a multi-scale way, which compensates the motions at different levels. The proposed architecture has a consistent enhancement for different levels of illuminations, which significantly increases the visual quality even in the extremely dark environment. Extensive experimental results show that the proposed approach outperforms other methods under both objective and subjective metrics.

Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

Xiaoyu Xiang, Qian Lin, Jan Allebach

Responsive image

Auto-TLDR; A Context-Aware Joint CAR and SR Neural Network for High-Resolution Text Recognition and Face Detection

Slides Poster Similar

Due to the limits of bandwidth and storage space, digital images are usually down-scaled and compressed when transmitted over networks, resulting in loss of details and jarring artifacts that can lower the performance of high-level visual tasks. In this paper, we aim to generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor by exploring joint compression artifacts reduction (CAR) and super-resolution (SR) tasks. First, we propose a context-aware joint CAR and SR neural network (CAJNN) that integrates both local and non-local features to solve CAR and SR in one-stage. Finally, a deep reconstruction network is adopted to predict high quality and high-resolution images. Evaluation on CAR and SR benchmark datasets shows that our CAJNN model outperforms previous methods and also takes 26.2% less runtime. Based on this model, we explore addressing two critical challenges in high-level computer vision: optical character recognition of low-resolution texts, and extremely tiny face detection. We demonstrate that CAJNN can serve as an effective image preprocessing method and improve the accuracy for real-scene text recognition (from 85.30% to 85.75%) and the average precision for tiny face detection (from 0.317 to 0.611).

DCT/IDCT Filter Design for Ultrasound Image Filtering

Barmak Honarvar Shakibaei Asli, Jan Flusser, Yifan Zhao, John Ahmet Erkoyuncu, Rajkumar Roy

Responsive image

Auto-TLDR; Finite impulse response digital filter using DCT-II and inverse DCT

Slides Poster Similar

In this paper, a new recursive structure based on the convolution model of discrete cosine transform (DCT) for designing of a finite impulse response (FIR) digital filter is proposed. In our derivation, we start with the convolution model of DCT-II to use its Z-transform for the proposed filter structure perspective. Moreover, using the same algorithm, a filter base implementation of the inverse DCT (IDCT) for image reconstruction is developed. The computational time experiments of the proposed DCT/IDCT filter(s) demonstrate that the proposed filters achieve faster elapsed CPU time compared to the others. The image filtering and reconstruction performance of the proposed approach on ultrasound images are presented to validate the theoretical framework.

Multi-scale Processing of Noisy Images using Edge Preservation Losses

Nati Ofir

Responsive image

Auto-TLDR; Multi-scale U-net for Noisy Image Detection and Denoising

Slides Poster Similar

Noisy image processing is a fundamental task of computer vision. The first example is the detection of faint edges in noisy images, a challenging problem studied in the last decades. A recent study introduced a fast method to detect faint edges in the highest accuracy among all the existing approaches. Their complexity is nearly linear in the image's pixels and their runtime is seconds for a noisy image. Their approach utilizes a multi-scale binary partitioning of the image. By utilizing the multi-scale U-net architecture, we show in this paper that their method can be dramatically improved in both aspects of run time and accuracy. By training the network on a dataset of binary images, we developed an approach for faint edge detection that works in linear complexity. Our runtime of a noisy image is milliseconds on a GPU. Even though our method is orders of magnitude faster, we still achieve higher accuracy of detection under many challenging scenarios. In addition, we show that our approach to performing multi-scale preprocessing of noisy images using U-net improves the ability to perform other vision tasks under the presence of noise. We prove it on the problems of noisy objects classification and classical image denoising. We show that multi-scale denoising can be carried out by a novel edge preservation loss. As our experiments show, we achieve high-quality results in the three aspects of faint edge detection, noisy image classification and natural image denoising.

Deep Fusion of RGB and NIR Paired Images Using Convolutional Neural Networks

琳 梅, Cheolkon Jung

Responsive image

Auto-TLDR; Deep Fusion of RGB and NIR paired images in low light condition using convolutional neural networks

Slides Poster Similar

In low light condition, the captured color (RGB) images are highly degraded by noise with severe texture loss. In this paper, we propose deep fusion of RGB and NIR paired images in low light condition using convolutional neural networks (CNNs). The proposed deep fusion network consists of three independent sub-networks: denoising, enhancing, and fusion. We build a denoising sub-network to eliminate noise from noisy RGB images. After denoising, we perform an enhancing sub-network to increase the brightness of low light RGB images. Since NIR image contains fine details, we fuse it with the Y channel of RGB image through a fusion sub-network. Experimental results demonstrate that the proposed method successfully fuses RGB and NIR images, and generates high quality fusion results containing textures and colors.

Detail-Revealing Deep Low-Dose CT Reconstruction

Xinchen Ye, Yuyao Xu, Rui Xu, Shoji Kido, Noriyuki Tomiyama

Responsive image

Auto-TLDR; A Dual-branch Aggregation Network for Low-Dose CT Reconstruction

Slides Poster Similar

Low-dose CT imaging emerges with low radiation risk due to the reduction of radiation dose, but brings negative impact on the imaging quality. This paper addresses the problem of low-dose CT reconstruction. Previous methods are unsatisfactory due to the inaccurate recovery of image details under the strong noise generated by the reduction of radiation dose, which directly affects the final diagnosis. To suppress the noise effectively while retain the structures well, we propose a detail-revealing dual-branch aggregation network to effectively reconstruct the degraded CT image. Specifically, the main reconstruction branch iteratively exploits and compensates the reconstruction errors to gradually refine the CT image, while the prior branch is to learn the structure details as prior knowledge to help recover the CT image. A sophisticated detail-revealing loss is designed to fuse the information from both branches and guide the learning to obtain better performance from pixel-wise and holistic perspectives respectively. Experimental results show that our method outperforms the state-of-art methods in both PSNR and SSIM metrics.

RSAN: Residual Subtraction and Attention Network for Single Image Super-Resolution

Shuo Wei, Xin Sun, Haoran Zhao, Junyu Dong

Responsive image

Auto-TLDR; RSAN: Residual subtraction and attention network for super-resolution

Slides Similar

The single-image super-resolution (SISR) aims to recover a potential high-resolution image from its low-resolution version. Recently, deep learning-based methods have played a significant role in super-resolution field due to its effectiveness and efficiency. However, most of the SISR methods neglect the importance among the feature map channels. Moreover, they can not eliminate the redundant noises, making the output image be blurred. In this paper, we propose the residual subtraction and attention network (RSAN) for powerful feature expression and channels importance learning. More specifically, RSAN firstly implements one redundance removal module to learn noise information in the feature map and subtract noise through residual learning. Then it introduces the channel attention module to amplify high-frequency information and suppress the weight of effectless channels. Experimental results on extensive public benchmarks demonstrate our RSAN achieves significant improvement over the previous SISR methods in terms of both quantitative metrics and visual quality.

Hierarchically Aggregated Residual Transformation for Single Image Super Resolution

Zejiang Hou, Sy Kung

Responsive image

Auto-TLDR; HARTnet: Hierarchically Aggregated Residual Transformation for Multi-Scale Super-resolution

Slides Poster Similar

Visual patterns usually appear at different scales/sizes in natural images. Multi-scale feature representation is of great importance for the single-image super-resolution(SISR) task to reconstruct image objects at different scales.However, such characteristic has been rarely considered by CNN-based SISR methods. In this work, we propose a novel build-ing block, i.e. hierarchically aggregated residual transformation(HART), to achieve multi-scale feature representation in each layer of the network. Within each HART block, we connect multiple convolutions in a hierarchical residual-like manner, which greatly expands the range of effective receptive fields and helps to detect image features at different scales. To theoretically understand the proposed HART block, we recast SISR as an optimal control problem and show that HART effectively approximates the classical4th-order Runge-Kutta method, which has the merit of small local truncation error for solving numerical ordinary differential equation. By cascading the proposed HART blocks, we establish our high-performing HARTnet. Comparedwith existing SR state-of-the-arts (including those in NTIRE2019 SR Challenge leaderboard), the proposed HARTnet demonstrates consistent PSNR/SSIM performance improvements on various benchmark datasets under different degradation models.Moreover, HARTnet can efficiently restore more faithful high-resolution images than comparative SR methods (cf. Figure 1).

Face Super-Resolution Network with Incremental Enhancement of Facial Parsing Information

Shuang Liu, Chengyi Xiong, Zhirong Gao

Responsive image

Auto-TLDR; Learning-based Face Super-Resolution with Incremental Boosting Facial Parsing Information

Slides Poster Similar

Recently, facial priors based face super-resolution (SR) methods have obtained significant performance gains in dealing with extremely degraded facial images, and facial priors have also been proved useful in facilitating the inference of face images. Based on this, how to fully fuse facial priors into deep features to improve face SR performance has attracted a major attention. In this paper, we propose a learning-based face SR approach with incremental boosting facial parsing information (IFPSR) for high-magnification of low-resolution faces. The proposed IFPSR method consists of three main parts: i) a three-stage parsing map embedded features upsampling network, in which image recovery and prior estimation processes are performed simultaneously and progressively to improve the image resolution; ii) a progressive training method and a joint facial attention and heatmap loss to obtain better facial attributes; iii) the channel attention strategy in residual dense blocks to adaptively learn facial features. Extensive experimental results show that compared with the state-of-the-art methods in terms of quantitative and qualitative metrics, our approach can achieve an outstanding balance between SR image quality and low network complexity.

A NoGAN Approach for Image and Video Restoration and Compression Artifact Removal

Mameli Filippo, Marco Bertini, Leonardo Galteri, Alberto Del Bimbo

Responsive image

Auto-TLDR; Deep Neural Network for Image and Video Compression Artifact Removal and Restoration

Poster Similar

Lossy image and video compression algorithms introduce several different types of visual artifacts that reduce the visual quality of the compressed media, and the higher the compression rate the higher is the strength of these artifacts. In this work, we describe an approach for visual quality improvement of compressed images and videos to be performed at presentation time, so to obtain the benefits of fast data transfer and reduced data storage, while enjoying a visual quality that could be obtained only reducing the compression rate. To obtain this result we propose to use a deep neural network trained using the NoGAN approach, adapting the popular DeOldify architecture used for colorization. We show how the proposed method can be applied both to image and video compression artifact removal and restoration.

SIDGAN: Single Image Dehazing without Paired Supervision

Pan Wei, Xin Wang, Lei Wang, Ji Xiang, Zihan Wang

Responsive image

Auto-TLDR; DehazeGAN: An End-to-End Generative Adversarial Network for Image Dehazing

Slides Poster Similar

Single image dehazing is challenging without scene airlight and transmission map. Most of existing dehazing algorithms tend to estimate key parameters based on manual designed priors or statistics, which may be invalid in some scenarios. Although deep learning-based dehazing methods provide an effective solution, most of them rely on paired training datasets, which are prohibitively difficult to be collected in real world. In this paper, we propose an effective end-to-end generative adversarial network for image dehazing, named DehazeGAN. The proposed DehazeGAN adopts a U-net architecture with a novel color-consistency loss derived from dark channel prior and perceptual loss, which can be trained in an unsupervised fashion without paired synthetic datasets. We create a RealHaze dataset for network training, including 4,000 outdoor hazy images and 4,000 haze-free images. Extensive experiments demonstrate that our proposed DehazeGAN achieves better performance than existing state-of-the-art methods on both synthetic datasets and real-world datasets in terms of PSNR, SSIM, and subjective visual experience.

D3Net: Joint Demosaicking, Deblurring and Deringing

Tomas Kerepecky, Filip Sroubek

Responsive image

Auto-TLDR; Joint demosaicking deblurring and deringing network with light-weight architecture inspired by the alternating direction method of multipliers

Slides Similar

Images acquired with standard digital cameras have Bayer patterns and suffer from lens blur. A demosaicking step is implemented in every digital camera, yet blur often remains unattended due to computational cost and instability of deblurring algorithms. Linear methods, which are computationally less demanding, produce ringing artifacts in deblurred images. Complex non-linear deblurring methods avoid artifacts, however their complexity imply offline application after camera demosaicking, which leads to sub-optimal performance. In this work, we propose a joint demosaicking deblurring and deringing network with a light-weight architecture inspired by the alternating direction method of multipliers. The proposed network has a transparent and clear interpretation compared to other black-box data driven approaches. We experimentally validate its superiority over state-of-the-art demosaicking methods with offline deblurring.

MBD-GAN: Model-Based Image Deblurring with a Generative Adversarial Network

Li Song, Edmund Y. Lam

Responsive image

Auto-TLDR; Model-Based Deblurring GAN for Inverse Imaging

Slides Poster Similar

This paper presents a methodology to tackle inverse imaging problems by leveraging the synergistic power of imaging model and deep learning. The premise is that while learning-based techniques have quickly become the methods of choice in various applications, they often ignore the prior knowledge embedded in imaging models. Incorporating the latter has the potential to improve the image estimation. Specifically, we first provide a mathematical basis of using generative adversarial network (GAN) in inverse imaging through considering an optimization framework. Then, we develop the specific architecture that connects the generator and discriminator networks with the imaging model. While this technique can be applied to a variety of problems, from image reconstruction to super-resolution, we take image deblurring as the example here, where we show in detail the implementation and experimental results of what we call the model-based deblurring GAN (MBD-GAN).

Thermal Image Enhancement Using Generative Adversarial Network for Pedestrian Detection

Mohamed Amine Marnissi, Hajer Fradi, Anis Sahbani, Najoua Essoukri Ben Amara

Responsive image

Auto-TLDR; Improving Visual Quality of Infrared Images for Pedestrian Detection Using Generative Adversarial Network

Slides Poster Similar

Infrared imaging has recently played an important role in a wide range of applications including surveillance, robotics and night vision. However, infrared cameras often suffer from some limitations, essentially about low-contrast and blurred details. These problems contribute to the loss of observation of target objects in infrared images, which could limit the feasibility of different infrared imaging applications. In this paper, we mainly focus on the problem of pedestrian detection on thermal images. Particularly, we emphasis the need for enhancing the visual quality of images beforehand performing the detection step. % to ensure effective results. To address that, we propose a novel thermal enhancement architecture based on Generative Adversarial Network, and composed of two modules contrast enhancement and denoising modules with a post-processing step for edge restoration in order to improve the overall quality. The effectiveness of the proposed architecture is assessed by means of visual quality metrics and better results are obtained compared to the original thermal images and to the obtained results by other existing enhancement methods. These results have been conduced on a subset of KAIST dataset. Using the same dataset, the impact of the proposed enhancement architecture has been demonstrated on the detection results by obtaining better performance with a significant margin using YOLOv3 detector.

Deep Iterative Residual Convolutional Network for Single Image Super-Resolution

Rao Muhammad Umer, Gian Luca Foresti, Christian Micheloni

Responsive image

Auto-TLDR; ISRResCNet: Deep Iterative Super-Resolution Residual Convolutional Network for Single Image Super-resolution

Slides Similar

Deep convolutional neural networks (CNNs) have recently achieved great success for single image super-resolution (SISR) task due to their powerful feature representation capabilities. Most recent deep learning based SISR methods focus on designing deeper / wider models to learn the non-linear mapping between low-resolution (LR) inputs and the high-resolution (HR) outputs. These existing SR methods do not take into account the image observation (physical) model and thus require a large number of network's trainable parameters with a huge volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits the powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method with a few trainable parameters improves results for different scaling factors in comparison with the state-of-art methods.

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

Ruojing Wang, Zitang Sun, Sei-Ichiro Kamata, Weili Chen

Responsive image

Auto-TLDR; Adaptive Image Compression using GAN based Semantic-Perceptual Residual Compensation

Slides Poster Similar

Image compression is a basic task in image processing. In this paper, We present an adaptive image compression algorithm that relies on GAN based semantic-perceptual residual compensation, which is available to offer visually pleasing reconstruction at a low bitrate. Our method adopt an U-shaped encoding and decoding structure accompanied by a well-designed dense residual connection with strip pooling module to improve the original auto-encoder. Besides, we introduce the idea of adversarial learning by introducing a discriminator thus constructed a complete GAN. To improve the coding efficiency, we creatively designed an adaptive semantic-perception residual compensation block based on Grad-CAM algorithm. In the improvement of the quantizer, we embed the method of soft-quantization so as to solve the problem to some extent that back propagation process is irreversible. Simultaneously, we use the latest FLIF lossless compression algorithm and BPG vector compression algorithm to perform deeper compression on the image. More importantly experimental results including PSNR, MS-SSIM demonstrate that the proposed approach outperforms the current state-of-the-art image compression methods.

GAN-Based Image Deblurring Using DCT Discriminator

Hiroki Tomosada, Takahiro Kudo, Takanori Fujisawa, Masaaki Ikehara

Responsive image

Auto-TLDR; DeblurDCTGAN: A Discrete Cosine Transform for Image Deblurring

Slides Poster Similar

In this paper, we propose high quality image debluring by using discrete cosine transform (DCT) with less computational complexity. Recently, Convolutional Neural Network (CNN) and Generative Adversarial Network (GAN) based algorithms have been proposed for image deblurring. Moreover, multi-scale architecture of CNN restores blurred image cleary and suppresses more ringing artifacts or block noise, but it takes much time to process. To solve these problems, we propose a method that preserves texture and suppresses ringing artifacts in the restored image without multi-scale architecture using DCT based loss named ``DeblurDCTGAN.''. It compares frequency domain of the images made from deblurred image and grand truth image by using DCT. Hereby, DeblurDCTGAN can reduce block noise or ringing artifacts while maintaining deblurring performance. Our experimental results show that DeblurDCTGAN gets the highest performances on both PSNR and SSIM comparing with other conventional methods in both GoPro test Dataset and DVD test Dataset. Also, the running time per pair of DeblurDCTGAN is faster than others.

SECI-GAN: Semantic and Edge Completion for Dynamic Objects Removal

Francesco Pinto, Andrea Romanoni, Matteo Matteucci, Phil Torr

Responsive image

Auto-TLDR; SECI-GAN: Semantic and Edge Conditioned Inpainting Generative Adversarial Network

Slides Poster Similar

Image inpainting aims at synthesizing the missing content of damaged and corrupted images to produce visually realistic restorations; typical applications are in image restoration, automatic scene editing, super-resolution, and dynamic object removal. In this paper, we propose Semantic and Edge Conditioned Inpainting Generative Adversarial Network (SECI-GAN), an architecture that jointly exploits the high-level cues extracted by semantic segmentation and the fine-grained details captured by edge extraction to condition the image inpainting process. SECI-GAN is designed with a particular focus on recovering big regions belonging to the same object (e.g. cars or pedestrians) in the context of dynamic object removal from complex street views. To demonstrate the effectiveness of SECI-GAN, we evaluate our results on the Cityscapes dataset, showing that SECI-GAN is better than competing state-of-the-art models at recovering the structure and the content of the missing parts while producing consistent predictions.

LFIEM: Lightweight Filter-Based Image Enhancement Model

Oktai Tatanov, Aleksei Samarin

Responsive image

Auto-TLDR; Image Retouching Using Semi-supervised Learning for Mobile Devices

Slides Poster Similar

Photo retouching features are being integrated into a growing number of mobile applications. Current learning-based approaches enhance images using large convolutional neural network-based models, where the result is received directly from the neural network outputs. This method can lead to artifacts in the resulting images, models that are complicated to interpret, and can be computationally expensive. In this paper, we explore the application of a filter-based approach in order to overcome the problems outlined above. We focus on creating a lightweight solution suitable for use on mobile devices when designing our model. A significant performance increase was achieved through implementing consistency regularization used in semi-supervised learning. The proposed model can be used on mobile devices and achieves competitive results compared to known models.

Selective Kernel and Motion-Emphasized Loss Based Attention-Guided Network for HDR Imaging of Dynamic Scenes

Yipeng Deng, Qin Liu, Takeshi Ikenaga

Responsive image

Auto-TLDR; SK-AHDRNet: A Deep Network with attention module and motion-emphasized loss function to produce ghost-free HDR images

Slides Poster Similar

Ghost-like artifacts caused by ill-exposed and motion areas is one of the most challenging problems in high dynamic range (HDR) image reconstruction.When the motion range is small, previous methods based on optical flow or patch-match can suppress ghost-like artifacts by first aligning input images before merging them.However, they are not robust enough and still produce artifacts for challenging scenes where large foreground motions exist.To this end, we propose a deep network with attention module and motion-emphasized loss function to produce ghost-free HDR images. In attention module, we use channel and spatial attention to guide network to emphasize important components such as motion and saturated areas automatically. With the purpose of being robust to images with different resolutions and objects with distinct scale, we adopt the selective kernel network as the basic framework for channel attention. In addition to the attention module, the motion-emphasized loss function based on the motion and ill-exposed areas mask is designed to help network reconstruct motion areas. Experiments on the public dataset indicate that the proposed SK-AHDRNet produces ghost-free results where detail in ill-exposed areas is well recovered. The proposed method scores 43.17 with PSNR metric and 61.02 with HDR-VDP-2 metric on test which outperforms all conventional works. According to quantitative and qualitative evaluations, the proposed method can achieve state-of-the-art performance.

Deep Realistic Novel View Generation for City-Scale Aerial Images

Koundinya Nouduri, Ke Gao, Joshua Fraser, Shizeng Yao, Hadi Aliakbarpour, Filiz Bunyak, Kannappan Palaniappan

Responsive image

Auto-TLDR; End-to-End 3D Voxel Renderer for Multi-View Stereo Data Generation and Evaluation

Slides Poster Similar

In this paper we introduce a novel end-to-end frameworkfor generation of large, aerial, city-scale, realistic syntheticimage sequences with associated accurate and precise camerametadata. The two main purposes for this data are (i) to en-able objective, quantitative evaluation of computer vision al-gorithms and methods such as feature detection, description,and matching or full computer vision pipelines such as 3D re-construction; and (ii) to supply large amounts of high qualitytraining data for deep learning guided computer vision meth-ods. The proposed framework consists of three main mod-ules, a 3D voxel renderer for data generation, a deep neu-ral network for artifact removal, and a quantitative evaluationmodule for Multi-View Stereo (MVS) as an example. The3D voxel renderer enables generation of seen or unseen viewsof a scene from arbitary camera poses with accurate camerametadata parameters. The artifact removal module proposes anovel edge-augmented deep learning network with an explicitedgemap processing stream to remove image artifacts whilepreserving and recovering scene structures for more realis-tic results. Our experiments on two urban, city-scale, aerialdatasets for Albuquerque (ABQ), NM and Los Angeles (LA),CA show promising results in terms structural similarity toreal data and accuracy of reconstructed 3D point clouds

DR2S: Deep Regression with Region Selection for Camera Quality Evaluation

Marcelin Tworski, Stéphane Lathuiliere, Salim Belkarfa, Attilio Fiandrotti, Marco Cagnazzo

Responsive image

Auto-TLDR; Texture Quality Estimation Using Deep Learning

Slides Poster Similar

In this work, we tackle the problem of estimating a camera capability to preserve fine texture details at a given lighting condition. Importantly, our texture preservation measurement should coincide with human perception. Consequently, we formulate our problem as a regression one and we introduce a deep convolutional network to estimate texture quality score. At training time, we use ground-truth quality scores provided by expert human annotators in order to obtain a subjective quality measure. In addition, we propose a region selection method to identify the image regions that are better suited at measuring perceptual quality. Finally, our experimental evaluation shows that our learning-based approach outperforms existing methods and that our region selection algorithm consistently improves the quality estimation.

Deep Residual Attention Network for Hyperspectral Image Reconstruction

Kohei Yorimoto, Xian-Hua Han

Responsive image

Auto-TLDR; Deep Convolutional Neural Network for Hyperspectral Image Reconstruction from a Snapshot

Slides Poster Similar

Coded aperture snapshot spectral imaging (CASSI) captures a full frame spectral image as a single compressive image and is mandatory to reconstruct the underlying hyperspectral image (HSI) from the snapshot as the post-processing, which is challenge inverse problem due to its ill-posed nature. Existing methods for HSI reconstruction from a snapshot usually employs optimization for solving the formulated image degradation model regularized with the empirically designed priors, and still cannot achieve enough reconstruction accuracy for real HS image analysis systems. Motivated by the recent advances of deep learning for different inverse problems, deep learning based HSI reconstruction method has attracted a lot of attention, and can boost the reconstruction performance. This study proposes a novel deep convolutional neural network (DCNN) based framework for effectively learning the spatial structure and spectral attribute in the underlying HSI with the reciprocal spatial and spectral modules. Further, to adaptively leverage the useful learned feature for better HSI image reconstruction, we integrate residual attention modules into our DCNN via exploring both spatial and spectral attention maps. Experimental results on two benchmark HSI datasets show that our method outperforms state-of-the-art methods in both quantitative values and visual effect.

Residual Fractal Network for Single Image Super Resolution by Widening and Deepening

Jiahang Gu, Zhaowei Qu, Xiaoru Wang, Jiawang Dan, Junwei Sun

Responsive image

Auto-TLDR; Residual fractal convolutional network for single image super-resolution

Slides Poster Similar

The architecture of the convolutional neural network (CNN) plays an important role in single image super-resolution (SISR). However, most models proposed in recent years usually transplant methods or architectures that perform well in other vision fields. Thence they do not combine the characteristics of super-resolution (SR) and ignore the key information brought by the recurring texture feature in the image. To utilize patch-recurrence in SR and the high correlation of texture, we propose a residual fractal convolutional block (RFCB) and expand its depth and width to obtain residual fractal network (RFN), which contains deep residual fractal network (DRFN) and wide residual fractal network (WRFN). RFCB is recursive with multiple branches of magnified receptive field. Through the phased feature fusion module, the network focuses on extracting high-frequency texture feature that repeatedly appear in the image. We also introduce residual in residual (RIR) structure to RFCB that enables abundant low-frequency feature feed into deeper layers and reduce the difficulties of network training. RFN is the first supervised learning method to combine the patch-recurrence characteristic in SISR into network design. Extensive experiments demonstrate that RFN outperforms state-of-the-art SISR methods in terms of both quantitative metrics and visual quality, while the amount of parameters has been greatly optimized.

Multi-focus Image Fusion for Confocal Microscopy Using U-Net Regression Map

Md Maruf Hossain Shuvo, Yasmin M. Kassim, Filiz Bunyak, Olga V. Glinskii, Leike Xie, Vladislav V Glinsky, Virginia H. Huxley, Kannappan Palaniappan

Responsive image

Auto-TLDR; Independent Single Channel U-Net Fusion for Multi-focus Microscopy Images

Slides Poster Similar

Multi-focus image fusion plays an important role to better visualize the detailed information and anatomical structures of microscopy images. We propose a new approach to fuse all single-focus microscopy images in each Z-stack. As the structures are different in different channels, input images are separated into red and green channels. Red for blood vessels, and green for lymphatics like structures . Taking the maximum likelihood of U-Net regression likelihood map along Z, we obtain the focus selection map for each channel. We named this approach as Independent Single Channel U-Net (ISCU) fusion. We combined each channel fusion result to get the final dual channel composite RGB image. The dataset used is extremely challenging with complex microscopy images of mice dura mater attached to bone. We compared our results with one of the popular and widely used derivative based fusion method [7] using multiscale Hessian. We found that multiscale Hessian-based approach produces banding effects with nonhomogeneous background lacking detailed anatomical structures. So, we took the advantages of Convolutional Neural Network (CNN), and used the U-Net regression likelihood map to fuse the images. Perception based no-reference image quality assessment parameters like PIQUE, NIQE, and BRISQUE confirms the effectiveness of the proposed method.

Attention Pyramid Module for Scene Recognition

Zhinan Qiao, Xiaohui Yuan, Chengyuan Zhuang, Abolfazl Meyarian

Responsive image

Auto-TLDR; Attention Pyramid Module for Multi-Scale Scene Recognition

Slides Poster Similar

The unrestricted open vocabulary and diverse substances of scenery images bring significant challenges to scene recognition. However, most deep learning architectures and attention methods are developed on general-purpose datasets and omit the characteristics of scene data. In this paper, we exploit the attention pyramid module (APM) to tackle the predicament of scene recognition. Our method streamlines the multi-scale scene recognition pipeline, learns comprehensive scene features at various scales and locations, addresses the interdependency among scales, and further assists feature re-calibration as well as aggregation process. APM is extremely light-weighted and can be easily plugged into existing network architectures in a parameter-efficient manner. By simply integrating APM into ResNet-50, we obtain a 3.54\% boost in terms of top-1 accuracy on the benchmark scene dataset. Comprehensive experiments show that APM achieves better performance comparing with state-of-the-art attention methods using significant less computation budget. Code and pre-trained models will be made publicly available.

Towards Artifacts-Free Image Defogging

Gabriele Graffieti, Davide Maltoni

Responsive image

Auto-TLDR; CurL-Defog: Learning Based Defogging with CycleGAN and HArD

Slides Similar

In this paper we present a novel defogging technique, named CurL-Defog, aimed at minimizing the creation of artifacts. The majority of learning based defogging approaches relies on paired data (i.e., the same images with and without fog), where fog is artificially added to clear images: this often provides good results on mildly fogged images but does not generalize well to real difficult cases. On the other hand, the models trained with real unpaired data (e.g. CycleGAN) can provide visually impressive results but often produce unwanted artifacts. In this paper we propose a curriculum learning strategy coupled with an enhanced CycleGAN model in order to reduce the number of produced artifacts, while maintaining state-of-the- art performance in terms of contrast enhancement and image reconstruction. We also introduce a new metric, called HArD (Hazy Artifact Detector) to numerically quantify the amount of artifacts in the defogged images, thus avoiding the tedious and subjective manual inspection of the results. The proposed approach compares favorably with state-of-the-art techniques on both real and synthetic datasets.

CSpA-DN: Channel and Spatial Attention Dense Network for Fusing PET and MRI Images

Bicao Li, Zhoufeng Liu, Shan Gao, Jenq-Neng Hwang, Jun Sun, Zongmin Wang

Responsive image

Auto-TLDR; CSpA-DN: Unsupervised Fusion of PET and MR Images with Channel and Spatial Attention

Slides Poster Similar

In this paper, we propose a novel unsupervised fusion framework based on a dense network with channel and spatial attention (CSpA-DN) for PET and MR images. In our approach, an encoder composed of the densely connected neural network is constructed to extract features from source images, and a decoder network is leveraged to yield the fused image from these features. Simultaneously, a self-attention mechanism is introduced in the encoder and decoder to further integrate local features along with their global dependencies adaptively. The extracted feature of each spatial position is synthesized by a weighted summation of those features at the same row and column with this position via a spatial attention module. Meanwhile, the interdependent relationship of all feature maps is integrated by a channel attention module. The summation of the outputs of these two attention modules is fed into the decoder and the fused image is generated. Experimental results illustrate the superiorities of our proposed CSpA-DN model compared with state-of-the-art methods in PET and MR images fusion according to both visual perception and objective assessment.

Tarsier: Evolving Noise Injection inSuper-Resolution GANs

Baptiste Roziere, Nathanaël Carraz Rakotonirina, Vlad Hosu, Rasoanaivo Andry, Hanhe Lin, Camille Couprie, Olivier Teytaud

Responsive image

Auto-TLDR; Evolutionary Super-Resolution using Diagonal CMA

Slides Poster Similar

Super-resolution aims at increasing the resolution and level of detail within an image. The current state of the art in general single-image super-resolution is held by nESRGAN+,which injects a Gaussian noise after each residual layer at training time. In this paper, we harness evolutionary methods to improve nESRGAN+ by optimizing the noise injection at inference time. More precisely, we use Diagonal CMA to optimize the injected noise according to a novel criterion combining quality assessment and realism. Our results are validated by the PIRM perceptual score and a human study. Our method outperforms nESRGAN+ on several standard super-resolution datasets. More generally, our approach can be used to optimize any method based on noise injection.

Free-Form Image Inpainting Via Contrastive Attention Network

Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He

Responsive image

Auto-TLDR; Self-supervised Siamese inference for image inpainting

Slides Similar

Most deep learning based image inpainting approaches adopt autoencoder or its variants to fill missing regions in images. Encoders are usually utilized to learn powerful representational spaces, which are important for dealing with sophisticated learning tasks. Specifically, in the image inpainting task, masks with any shapes can appear anywhere in images (i.e., free-form masks) forming complex patterns. It is difficult for encoders to capture such powerful representations under this complex situation. To tackle this problem, we propose a self-supervised Siamese inference network to improve the robustness and generalization. Moreover, the restored image usually can not be harmoniously integrated into the exiting content, especially in the boundary area. To address this problem, we propose a novel Dual Attention Fusion module (DAF), which can combine both the restored and known regions in a smoother way and be inserted into decoder layers in a plug-and-play way. DAF is developed to not only adaptively rescale channel-wise features by taking interdependencies between channels into account but also force deep convolutional neural networks (CNNs) focusing more on unknown regions. In this way, the unknown region will be naturally filled from the outside to the inside. Qualitative and quantitative experiments on multiple datasets, including facial and natural datasets (i.e., Celeb-HQ, Pairs Street View, Places2 and ImageNet), demonstrate that our proposed method outperforms against state-of-the-arts in generating high-quality inpainting results.

Triplet-Path Dilated Network for Detection and Segmentation of General Pathological Images

Jiaqi Luo, Zhicheng Zhao, Fei Su, Limei Guo

Responsive image

Auto-TLDR; Triplet-path Network for One-Stage Object Detection and Segmentation in Pathological Images

Slides Similar

Deep learning has been widely applied in the field of medical image processing. However, compared with flourishing visual tasks in natural images, the progress achieved in pathological images is not remarkable, and detection and segmentation, which are among basic tasks of computer vision, are regarded as two independent tasks. In this paper, we make full use of existing datasets and construct a triplet-path network using dilated convolutions to cooperatively accomplish one-stage object detection and nuclei segmentation for general pathological images. First, in order to meet the requirement of detection and segmentation, a novel structure called triplet feature generation (TFG) is designed to extract high-resolution and multiscale features, where features from different layers can be properly integrated. Second, considering that pathological datasets are usually small, a location-aware and partially truncated loss function is proposed to improve the classification accuracy of datasets with few images and widely varying targets. We compare the performance of both object detection and instance segmentation with state-of-the-art methods. Experimental results demonstrate the effectiveness and efficiency of the proposed network on two datasets collected from multiple organs.

5D Light Field Synthesis from a Monocular Video

Kyuho Bae, Andre Ivan, Hajime Nagahara, In Kyu Park

Responsive image

Auto-TLDR; Synthesis of Light Field Video from Monocular Video using Deep Learning

Slides Similar

Commercially available light field cameras have difficulty in capturing 5D (4D + time) light field videos. They can only capture still light filed images or are excessively expensive for normal users to capture the light field video. To tackle this problem, we propose a deep learning-based method for synthesizing a light field video from a monocular video. We propose a new synthetic light field video dataset that renders photorealistic scenes using Unreal Engine because no light field video dataset is available. The proposed deep learning framework synthesizes the light field video with a full set (9x9) of sub-aperture images from a normal monocular video. The proposed network consists of three sub-networks, namely, feature extraction, 5D light field video synthesis, and temporal consistency refinement. Experimental results show that our model can successfully synthesize the light field video for synthetic and real scenes and outperforms the previous frame-by-frame method quantitatively and qualitatively.

A Multi-Focus Image Fusion Method Based on Fractal Dimension and Guided Filtering

Nikoo Dehghani, Ehsanollah Kabir

Responsive image

Auto-TLDR; Fractal Dimension-based Multi-focus Image Fusion with Guide Filtering

Slides Poster Similar

Fractal Dimension (FD) is widely used for image segmentation because of its successful approach toward quantifying texture information. In this paper, we present a FD-based multi-focus image fusion method that utilizes FD to identify focused regions, as the primary step for the multi-focus image fusion process. The algorithm aims to extract the local FD features of each multi-focus pair estimated using the differential box-counting method. A guided filter is employed to further specify the spatial information and increase the robustness of the FD features to noise. The outcome would be analyzed to achieve a focus map that identifies sharp regions in each partially focused image. Afterwards, the detected regions are combined into a single all-focused image. The experiments, along with the objective assessments, demonstrate the competitive performance of the proposed method compared to several state-of-the-art multi-focus image fusion methods.

Joint Compressive Autoencoders for Full-Image-To-Image Hiding

Xiyao Liu, Ziping Ma, Xingbei Guo, Jialu Hou, Lei Wang, Gerald Schaefer, Hui Fang

Responsive image

Auto-TLDR; J-CAE: Joint Compressive Autoencoder for Image Hiding

Slides Poster Similar

Image hiding has received significant attention due to the need of enhanced multimedia services, such as multimedia security and meta-information embedding for multimedia augmentation. Recently, deep learning-based methods have been introduced that are capable of significantly increasing the hidden capacity and supporting full size image hiding. However, these methods suffer from the necessity to balance the errors of the modified cover image and the recovered hidden image. In this paper, we propose a novel joint compressive autoencoder (J-CAE) framework to design an image hiding algorithm that achieves full-size image hidden capacity with small reconstruction errors of the hidden image. More importantly, it addresses the trade-off problem of previous deep learning-based methods by mapping the image representations in the latent spaces of the joint CAE models. Thus, both visual quality of the container image and recovery quality of the hidden image can be simultaneously improved. Extensive experimental results demonstrate that our proposed framework outperforms several state-of-the-art deep learning-based image hiding methods in terms of imperceptibility and recovery quality of the hidden images while maintaining full-size image hidden capacity.

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

Jing Liu, Xiaona Zhang, Zhaoxin Li, Tianlu Mao

Responsive image

Auto-TLDR; Multi-scale Residual Pyramid Attention Network for Monocular Depth Estimation

Slides Poster Similar

Monocular depth estimation is a challenging problem in computer vision and is crucial for understanding 3D scene geometry. Recently, deep convolutional neural networks (DCNNs) based methods have improved the estimation accuracy significantly. However, existing methods fail to consider complex textures and geometries in scenes, thereby resulting in loss of local details, distorted object boundaries, and blurry reconstruction. In this paper, we proposed an end-to-end Multi-scale Residual Pyramid Attention Network (MRPAN) to mitigate these problems.First,we propose a Multi-scale Attention Context Aggregation (MACA) module, which consists of Spatial Attention Module (SAM) and Global Attention Module (GAM). By considering the position and scale correlation of pixels from spatial and global perspectives, the proposed module can adaptively learn the similarity between pixels so as to obtain more global context information of the image and recover the complex structure in the scene. Then we proposed an improved Residual Refinement Module (RRM) to further refine the scene structure, giving rise to deeper semantic information and retain more local details. Experimental results show that our method achieves more promisin performance in object boundaries and local details compared with other state-of-the-art methods.

Dynamic Low-Light Image Enhancement for Object Detection Via End-To-End Training

Haifeng Guo, Yirui Wu, Tong Lu

Responsive image

Auto-TLDR; Object Detection using Low-Light Image Enhancement for End-to-End Training

Slides Poster Similar

Object detection based on convolutional neural networks is a hot research topic in computer vision. The illumination component in the image has a great impact on object detection, and it will cause a sharp decline in detection performance under low-light conditions. Using low-light image enhancement technique as a pre-processing mechanism can improve image quality and obtain better detection results.However, due to the complexity of low-light environments, the existing enhancement methods may have negative effects on some samples. Therefore, it is difficult to improve the overall detection performance in low-light conditions. In this paper, our goal is to use image enhancement to improve object detection performance rather than perceptual quality for humans. We propose a novel framework that combines low-light enhancement and object detection for end-to-end training. The framework can dynamically select different enhancement subnetworks for each sample to improve the performance of the detector. Our proposed method consists of two stage: the enhancement stage and the detection stage. The enhancement stage dynamically enhances the low-light images under the supervision of several enhancement methods and output corresponding weights. During the detection stage, the weights offers information on object classification to generate high-quality region proposals and in turn result in accurate detection. Our experiments present promising results, which show that the proposed method can significantly improve the detection performance in low-light environment.

Quantifying Model Uncertainty in Inverse Problems Via Bayesian Deep Gradient Descent

Riccardo Barbano, Chen Zhang, Simon Arridge, Bangti Jin

Responsive image

Auto-TLDR; Bayesian Neural Networks for Inverse Reconstruction via Bayesian Knowledge-Aided Computation

Slides Poster Similar

Recent advances in reconstruction methods for inverse problems leverage powerful data-driven models, e.g., deep neural networks. These techniques have demonstrated state-of-the-art performances for several imaging tasks, but they often do not provide uncertainty on the obtained reconstructions. In this work, we develop a novel scalable data-driven knowledge-aided computational framework to quantify the model uncertainty via Bayesian neural networks. The approach builds on and extends deep gradient descent, a recently developed greedy iterative training scheme, and recasts it within a probabilistic framework. Scalability is achieved by being hybrid in the architecture: only the last layer of each block is Bayesian, while the others remain deterministic, and by being greedy in training. The framework is showcased on one representative medical imaging modality, viz. computed tomography with either sparse view or limited view data, and exhibits competitive performance with respect to state-of-the-art benchmarks, e.g., total variation, deep gradient descent and learned primal-dual.

A Dual-Branch Network for Infrared and Visible Image Fusion

Yu Fu, Xiaojun Wu

Responsive image

Auto-TLDR; Image Fusion Using Autoencoder for Deep Learning

Slides Poster Similar

In recent years, deep learning has been used extensively in the field of image fusion. In this article, we propose a new image fusion method by designing a new structure and a new loss function for a deep learning model. Our backbone network is an autoencoder, in which the encoder has a dual branch structure. We input infrared images and visible light images to the encoder to extract detailed information and semantic information respectively. The fusion layer fuses two sets of features to get fused features. The decoder reconstructs the fusion features to obtain the fused image. We design a new loss function to reconstruct the image effectively. Experiments show that our proposed method achieves state-of-the-art performance.

DID: A Nested Dense in Dense Structure with Variable Local Dense Blocks for Super-Resolution Image Reconstruction

Longxi Li, Hesen Feng, Bing Zheng, Lihong Ma, Jing Tian

Responsive image

Auto-TLDR; DID: Deep Super-Residual Dense Network for Image Super-resolution Reconstruction

Slides Poster Similar

The success of single image super-resolution reconstruction (SR) relies on a refined mapping from low-resolution (LR) examples to high-resolution (HR) signals. However, the relation is sometimes chaos, especially in a deep SR network. We try to improve the mapping interpretability in two folds: i) The variable local dense blocks (VLDB) are suggested to match receptive fields in different depths of a residual dense network (RDN), with each VLDB a dyadic increment of layer numbers than its predecessor. ii) Based on VLDBs, a dense in dense (DID) network is created. It substitutes nodes in a regular RDN with super nodes, i.e. VLDBs; and formulates a joint learning by flexible hierarchical feature scaling, reusing and long-short term aggregating. VLDBs deal with feature underfitting occurred when a big receptive field meets a fixed-depth dense block, and the DID network provides a relative complete feature dictionary to preserve details for feature shift, dilating and grouping in high dimension image reconstruction. To demonstrate the validness of DID structure, detail experiments are performed on the benchmark datasets Set5, Set14, B100 and Urban100, where the accuracy PSNR and the visual perceptive SSIM are superior to most state-of-the-art methods. Besides, due to the depth adaption of VLDBs and its nesting in generalized RDN,DID network is converged easily and gradient explosion or disappearance are alleviated even when network deepens.

Single Image Deblurring Using Bi-Attention Network

Yaowei Li, Ye Luo, Jianwei Lu

Responsive image

Auto-TLDR; Bi-Attention Neural Network for Single Image Deblurring

Poster Similar

Recently, deep convolutional neural networks have been extensively applied into image deblurring and have achieved remarkable performance. However, most CNN-based image deblurring methods focus on simply increasing network depth, neglecting the contextual information of the blurred image and the reconstructed image. Meanwhile, most encoder-decoder based methods rarely exploit encoder's multi-layer features. To address these issues, we propose a bi-attention neural network for single image deblurring, which mainly consists of a bi-attention network and a feature fusion network. Specifically, two criss-cross attention modules are plugged before and after the encoder-decoder to capture long-range spatial contextual information in the blurred image and the reconstructed image simultaneously, and the feature fusion network combines multi-layer features from encoder to enable the decoder reconstruct the image with multi-scale features. The whole network is end-to-end trainable. Quantitative and qualitative experiment results validate that the proposed network outperforms state-of-the-art methods in terms of PSNR and SSIM on benchmark datasets.

DE-Net: Dilated Encoder Network for Automated Tongue Segmentation

Hui Tang, Bin Wang, Jun Zhou, Yongsheng Gao

Responsive image

Auto-TLDR; Automated Tongue Image Segmentation using De-Net

Slides Poster Similar

Automated tongue recognition is a growing research field due to global demand for personal health care. Using mobile devices to take tongue pictures is convenient and of low cost for tongue recognition. It is particularly suitable for self-health evaluation of the public. However, images taken by mobile devices are easily affected by various imaging environment, which makes fine segmentation a more challenging task compared with those taken by specialized acquisition devices. Deep learning approaches are promising for tongue image segmentation because they have powerful feature learning and representation capability. However, the successive pooling operations in these methods lead to loss of information on image details, making them fail when segmenting low-quality images captured by mobile devices. To address this issue, we propose a dilated encoder network (DE-Net) to capture more high-level features and get high-resolution output for automated tongue image segmentation. In addition, we construct two tongue image datasets which contain images taken by specialized devices and mobile devices, respectively, to verify the effectiveness of the proposed method. Experimental results on both datasets demonstrate that the proposed method outperforms the state-of-the-art methods in tongue image segmentation.

Novel View Synthesis from a 6-DoF Pose by Two-Stage Networks

Xiang Guo, Bo Li, Yuchao Dai, Tongxin Zhang, Hui Deng

Responsive image

Auto-TLDR; Novel View Synthesis from a 6-DoF Pose Using Generative Adversarial Network

Slides Poster Similar

Novel view synthesis is a challenging problem in 3D vision and robotics. Different from the existing works, which need the reference images or 3D model, we propose a novel paradigm to this problem. That is, we synthesize the novel view from a 6-DoF pose directly. Although this setting is the most straightforward way, there are few works addressing it. While, our experiments demonstrate that, with a concise CNN, we could get a meaningful parametric model which could reconstruct the correct scenery images only from the 6-DoF pose. To this end, we propose a two-stage learning strategy, which consists of two consecutive CNNs: GenNet and RefineNet. The GenNet generates a coarse image from a camera pose. The RefineNet is a generative adversarial network that could refine the coarse image. In this way, we decouple the geometric relationship mapping and texture detail rendering. Extensive experiments conducted on the public datasets prove the effectiveness of our method. We believe this paradigm is of high research and application value and could be an important direction in novel view synthesis. We will share our code after the acceptance of this work.

Do Not Treat Boundaries and Regions Differently: An Example on Heart Left Atrial Segmentation

Zhou Zhao, Elodie Puybareau, Nicolas Boutry, Thierry Geraud

Responsive image

Auto-TLDR; Attention Full Convolutional Network for Atrial Segmentation using ResNet-101 Architecture

Slides Similar

Atrial fibrillation is the most common heart rhythm disease. Due to a lack of understanding in matter of underlying atrial structures, current treatments are still not satisfying. Recently, with the popularity of deep learning, many segmentation methods based on fully convolutional networks have been proposed to analyze atrial structures, especially from late gadolinium-enhanced magnetic resonance imaging. However, two problems still occur: 1) segmentation results include the atrial-like background; 2) boundaries are very hard to segment. Most segmentation approaches design a specific network that mainly focuses on the regions, to the detriment of the boundaries. Therefore, this paper proposes an attention full convolutional network framework based on the ResNet-101 architecture, which focuses on boundaries as much as on regions. The additional attention module is added to have the network pay more attention on regions and then to reduce the impact of the misleading similarity of neighboring tissues. We also use a hybrid loss composed of a region loss and a boundary loss to treat boundaries and regions at the same time. We demonstrate the efficiency of the proposed approach on the MICCAI 2018 Atrial Segmentation Challenge public dataset.

FOANet: A Focus of Attention Network with Application to Myocardium Segmentation

Zhou Zhao, Elodie Puybareau, Nicolas Boutry, Thierry Geraud

Responsive image

Auto-TLDR; FOANet: A Hybrid Loss Function for Myocardium Segmentation of Cardiac Magnetic Resonance Images

Slides Poster Similar

In myocardium segmentation of cardiac magnetic resonance images, ambiguities often appear near the boundaries of the target domains due to tissue similarities. To address this issue, we propose a new architecture, called FOANet, which can be decomposed in three main steps: a localization step, a Gaussian-based contrast enhancement step, and a segmentation step. This architecture is supplied with a hybrid loss function that guides the FOANet to study the transformation relationship between the input image and the corresponding label in a threelevel hierarchy (pixel-, patch- and map-level), which is helpful to improve segmentation and recovery of the boundaries. We demonstrate the efficiency of our approach on two public datasets in terms of regional and boundary segmentations.