RSAN: Residual Subtraction and Attention Network for Single Image Super-Resolution

Shuo Wei, Xin Sun, Haoran Zhao, Junyu Dong

Auto-TLDR; RSAN: Residual subtraction and attention network for super-resolution

Single-image super-resolution (SISR) aims to recover a potential high-resolution image from its low-resolution version. Recently, deep learning-based methods have played a significant role in the super-resolution field due to their effectiveness and efficiency. However, most SISR methods neglect the differing importance of feature map channels. Moreover, they cannot eliminate redundant noise, which blurs the output image. In this paper, we propose the residual subtraction and attention network (RSAN) for powerful feature expression and channel importance learning. More specifically, RSAN first applies a redundancy removal module to learn noise information in the feature map and subtract the noise through residual learning. It then introduces a channel attention module to amplify high-frequency information and suppress the weights of uninformative channels. Experimental results on extensive public benchmarks demonstrate that RSAN achieves significant improvement over previous SISR methods in terms of both quantitative metrics and visual quality.
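The two steps named in this abstract, subtracting an estimated noise residual and then reweighting channels, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `noise_estimate` stands in for the output of the learned redundancy removal module, and a bare sigmoid on the channel mean replaces the learned attention layers.

```python
import math

def subtract_residual(features, noise_estimate):
    """Residual subtraction: remove an estimated noise map from each channel.

    features, noise_estimate: lists of channels, each a 2-D list of floats.
    Assumption: in a real network the noise estimate comes from learned layers.
    """
    return [
        [[f - n for f, n in zip(f_row, n_row)]
         for f_row, n_row in zip(f_ch, n_ch)]
        for f_ch, n_ch in zip(features, noise_estimate)
    ]

def channel_means(features):
    """Squeeze step: global average pooling, one scalar per channel."""
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in features
    ]

def channel_attention(features):
    """Rescale each channel by a gate in (0, 1) computed from its mean.

    Assumption: a sigmoid on the pooled value replaces the learned layers.
    """
    gates = [1.0 / (1.0 + math.exp(-m)) for m in channel_means(features)]
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, features)]
    return scaled, gates
```

Channels with a larger pooled response receive a gate closer to 1, while weak channels are scaled down; a trained network would learn this mapping rather than fix it.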

Similar papers

Residual Fractal Network for Single Image Super Resolution by Widening and Deepening

Jiahang Gu, Zhaowei Qu, Xiaoru Wang, Jiawang Dan, Junwei Sun

Auto-TLDR; Residual fractal convolutional network for single image super-resolution

The architecture of the convolutional neural network (CNN) plays an important role in single image super-resolution (SISR). However, most models proposed in recent years transplant methods or architectures that perform well in other vision fields. Hence they do not exploit the characteristics of super-resolution (SR) and ignore the key information carried by recurring texture features in the image. To utilize patch recurrence in SR and the high correlation of texture, we propose a residual fractal convolutional block (RFCB) and expand its depth and width to obtain the residual fractal network (RFN), which includes the deep residual fractal network (DRFN) and the wide residual fractal network (WRFN). RFCB is recursive, with multiple branches of magnified receptive field. Through the phased feature fusion module, the network focuses on extracting high-frequency texture features that repeatedly appear in the image. We also introduce a residual in residual (RIR) structure to RFCB, which feeds abundant low-frequency features into deeper layers and reduces the difficulty of network training. RFN is the first supervised learning method to incorporate the patch-recurrence characteristic of SISR into network design. Extensive experiments demonstrate that RFN outperforms state-of-the-art SISR methods in terms of both quantitative metrics and visual quality, while the number of parameters is greatly optimized.

Single Image Super-Resolution with Dynamic Residual Connection

Karam Park, Jae Woong Soh, Nam Ik Cho

Auto-TLDR; Dynamic Residual Attention Network for Lightweight Single Image Super-Resolution

Deep convolutional neural networks have shown significant improvement in the single image super-resolution (SISR) field. Recently, there have been attempts to solve the SISR problem using lightweight networks, considering the limited computational resources of real-world applications. For lightweight networks in particular, the balance between parameter count and performance is very difficult to tune, and most lightweight SISR networks are manually designed based on a huge number of brute-force experiments. Besides, network performance depends critically on the skip connections of the building blocks that are repeated throughout the architecture. Notably, in previous works, these connections are pre-defined and manually determined by human researchers. Hence, they adapt poorly to the input image statistics, and a better solution may exist for the given number of parameters. Therefore, we focus on the automated design of networks with regard to the connection of basic building blocks (residual networks) and, as a result, propose a dynamic residual attention network (DRAN). The proposed method allows the network to dynamically select residual paths depending on the input image, based on the idea of the attention mechanism. For this, we design a dynamic residual module that determines the residual paths between the basic building blocks for the given input image. By finding optimal residual paths between the blocks, the network can selectively bypass informative features needed to reconstruct the target high-resolution (HR) image. Experimental results show that our proposed DRAN outperforms most existing state-of-the-art lightweight models in SISR.

Cross-Layer Information Refining Network for Single Image Super-Resolution

Hongyi Zhang, Wen Lu, Xiaopeng Sun

Auto-TLDR; Interlaced Spatial Attention Block for Single Image Super-Resolution

Recently, deep learning-based image super-resolution (SR) has made remarkable progress. However, previous SR methods rarely focus on the correlation between adjacent layers, which leads to underutilization of the information extracted by each convolutional layer. To address this problem, we design a simple and efficient cross-layer information refining network (CIRN) for single image super-resolution. Concretely, we propose the interlaced spatial attention block (ISAB) to measure the correlation between the feature maps of adjacent layers and adaptively rescale spatial-wise features to refine the information. Owing to the two-stage information propagation strategy, CIRN can distill the primary information of adjacent layers without introducing too many parameters. Extensive experiments on benchmark datasets illustrate that our method achieves better accuracy than state-of-the-art methods even at 16× scale; specifically, it strikes a better balance between performance and parameters.

Progressive Splitting and Upscaling Structure for Super-Resolution

Qiang Li, Tao Dai, Shutao Xia

Auto-TLDR; PSUS: Progressive and Upscaling Layer for Single Image Super-Resolution

Recently, very deep convolutional neural networks (CNNs) have shown great success in single image super-resolution (SISR). Most of these methods focus on the design of the network architecture and adopt a sub-pixel convolution layer at the end of the network, but few have paid attention to exploring the potential representation ability of the upscaling layer. The sub-pixel convolution layer aggregates several low-resolution (LR) feature maps and builds super-resolution (SR) images in a single step. However, those LR feature maps share similar patterns, as they are extracted from a single trunk network. We believe that the mapping relationships between the input image and each LR feature map are not consistent. Inspired by this, we propose a novel progressive splitting and upscaling structure, termed PSUS, which generates decoupled feature maps for the upscaling layer to obtain a better SR image. Experiments show that our method not only speeds up convergence but also achieves considerable improvement in image quality with fewer parameters and lower computational complexity.
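The rearrangement at the heart of a sub-pixel convolution layer (often called pixel shuffle or depth-to-space) can be sketched as follows. This illustrates only the standard single-step aggregation that the abstract describes, not the proposed PSUS structure; the function name and the single-output-channel simplification are assumptions.

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r low-resolution maps (each H x W) into one (H*r) x (W*r) map.

    Follows the usual sub-pixel ordering: channel index i*r + j fills the
    output pixel at row y*r + i, column x*r + j.
    """
    if len(channels) != r * r:
        raise ValueError("expected r*r input channels")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for idx, ch in enumerate(channels):
        i, j = divmod(idx, r)  # sub-pixel offset owned by this channel
        for y in range(h):
            for x in range(w):
                out[y * r + i][x * r + j] = ch[y][x]
    return out
```

Each LR feature map contributes exactly one pixel to every r × r output cell, which is why all maps are consumed in a single upscaling step.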

Hierarchically Aggregated Residual Transformation for Single Image Super Resolution

Zejiang Hou, Sy Kung

Auto-TLDR; HARTnet: Hierarchically Aggregated Residual Transformation for Multi-Scale Super-resolution

Visual patterns usually appear at different scales/sizes in natural images. Multi-scale feature representation is of great importance for the single-image super-resolution (SISR) task to reconstruct image objects at different scales. However, this characteristic has rarely been considered by CNN-based SISR methods. In this work, we propose a novel building block, i.e. hierarchically aggregated residual transformation (HART), to achieve multi-scale feature representation in each layer of the network. Within each HART block, we connect multiple convolutions in a hierarchical residual-like manner, which greatly expands the range of effective receptive fields and helps to detect image features at different scales. To theoretically understand the proposed HART block, we recast SISR as an optimal control problem and show that HART effectively approximates the classical 4th-order Runge-Kutta method, which has the merit of small local truncation error for numerically solving ordinary differential equations. By cascading the proposed HART blocks, we establish our high-performing HARTnet. Compared with existing state-of-the-art SR methods (including those on the NTIRE2019 SR Challenge leaderboard), the proposed HARTnet demonstrates consistent PSNR/SSIM performance improvements on various benchmark datasets under different degradation models. Moreover, HARTnet can efficiently restore more faithful high-resolution images than comparative SR methods (cf. Figure 1).
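For reference, one step of the classical 4th-order Runge-Kutta method mentioned in this abstract looks like the sketch below. It shows the numerical scheme only; how HART blocks map onto these stages is not described here, and the function names are illustrative.

```python
def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one step of size h with classical RK4."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    # Weighted average of the four slope estimates.
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(f, t0, y0, h, steps):
    """Apply rk4_step repeatedly, starting from (t0, y0)."""
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y
```

The local truncation error of each step is of order h^5, so even a coarse step size tracks smooth solutions closely.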

LiNet: A Lightweight Network for Image Super Resolution

Armin Mehri, Parichehr Behjati Ardakani, Angel D. Sappa

Auto-TLDR; LiNet: A Compact Dense Network for Lightweight Super Resolution

This paper proposes a new lightweight network, LiNet, that enhances technical efficiency in lightweight super resolution and operates approximately like very large and costly networks in terms of number of network parameters and operations. The proposed architecture allows the network to learn more abstract properties by avoiding low-level information via multiple links. LiNet introduces a Compact Dense Module, which contains a set of inner and outer blocks, to efficiently extract meaningful information, to better leverage multi-level representations before the upsampling stage, and to allow efficient information and gradient flow within the network. Experiments on benchmark datasets show that the proposed LiNet achieves favorable performance against lightweight state-of-the-art methods.

Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

Xiaoyu Xiang, Qian Lin, Jan Allebach

Auto-TLDR; A Context-Aware Joint CAR and SR Neural Network for High-Resolution Text Recognition and Face Detection

Due to the limits of bandwidth and storage space, digital images are usually down-scaled and compressed when transmitted over networks, resulting in loss of details and jarring artifacts that can lower the performance of high-level visual tasks. In this paper, we aim to generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor by exploring joint compression artifacts reduction (CAR) and super-resolution (SR) tasks. First, we propose a context-aware joint CAR and SR neural network (CAJNN) that integrates both local and non-local features to solve CAR and SR in one stage. Finally, a deep reconstruction network is adopted to predict high-quality, high-resolution images. Evaluation on CAR and SR benchmark datasets shows that our CAJNN model outperforms previous methods and also takes 26.2% less runtime. Based on this model, we explore addressing two critical challenges in high-level computer vision: optical character recognition of low-resolution texts, and extremely tiny face detection. We demonstrate that CAJNN can serve as an effective image preprocessing method and improve the accuracy for real-scene text recognition (from 85.30% to 85.75%) and the average precision for tiny face detection (from 0.317 to 0.611).

DID: A Nested Dense in Dense Structure with Variable Local Dense Blocks for Super-Resolution Image Reconstruction

Longxi Li, Hesen Feng, Bing Zheng, Lihong Ma, Jing Tian

Auto-TLDR; DID: Deep Super-Residual Dense Network for Image Super-resolution Reconstruction

The success of single image super-resolution reconstruction (SR) relies on a refined mapping from low-resolution (LR) examples to high-resolution (HR) signals. However, the relation is sometimes chaotic, especially in a deep SR network. We try to improve the interpretability of the mapping in two ways: i) Variable local dense blocks (VLDBs) are proposed to match receptive fields at different depths of a residual dense network (RDN), with each VLDB having a dyadic increment in the number of layers over its predecessor. ii) Based on VLDBs, a dense in dense (DID) network is created. It substitutes the nodes in a regular RDN with super nodes, i.e. VLDBs, and formulates joint learning through flexible hierarchical feature scaling, reuse, and long- and short-term aggregation. VLDBs deal with the feature underfitting that occurs when a big receptive field meets a fixed-depth dense block, and the DID network provides a relatively complete feature dictionary to preserve details for feature shifting, dilating and grouping in high-dimensional image reconstruction. To demonstrate the validity of the DID structure, detailed experiments are performed on the benchmark datasets Set5, Set14, B100 and Urban100, where the PSNR (accuracy) and SSIM (visual perception) are superior to those of most state-of-the-art methods. Besides, owing to the depth adaptation of VLDBs and their nesting in a generalized RDN, the DID network converges easily, and gradient explosion or vanishing is alleviated even when the network deepens.
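PSNR, the accuracy metric reported in this abstract and in several others on this page, can be computed as in the sketch below. It assumes 2-D grayscale images of equal size with a peak value of 255; published SR evaluations often add steps such as luminance conversion and border cropping, which are not modeled here.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the reconstruction is closer to the reference; because the scale is logarithmic, halving the mean squared error adds about 3 dB.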

Efficient Super Resolution by Recursive Aggregation

Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, Tieniu Tan

Auto-TLDR; Recursive Aggregation Network for Efficient Deep Super Resolution

Deep neural networks have achieved remarkable results on image super resolution (SR), but the efficiency problem of deep SR networks is rarely studied. We experimentally find that many sequentially stacked convolutional blocks in current SR networks are far from being fully optimized, which largely damages their overall efficiency. This indicates that comparable or even better results could be achieved with fewer but sufficiently optimized blocks. In this paper, we try to construct a more efficient SR model via the proposed recursive aggregation network (RAN). It recursively aggregates convolutional blocks in different orders and avoids too many sequentially stacked blocks. In this way, multiple shortcuts are introduced in RAN, which help gradients flow more easily to all inner layers, even for very deep SR networks. As a result, all blocks in RAN can be better optimized, so RAN can achieve better performance with a smaller model size than existing methods.

Face Super-Resolution Network with Incremental Enhancement of Facial Parsing Information

Shuang Liu, Chengyi Xiong, Zhirong Gao

Auto-TLDR; Learning-based Face Super-Resolution with Incremental Boosting Facial Parsing Information

Recently, facial-prior-based face super-resolution (SR) methods have obtained significant performance gains in dealing with extremely degraded facial images, and facial priors have also proved useful in facilitating the inference of face images. Based on this, how to fully fuse facial priors into deep features to improve face SR performance has attracted major attention. In this paper, we propose a learning-based face SR approach with incremental boosting of facial parsing information (IFPSR) for high magnification of low-resolution faces. The proposed IFPSR method consists of three main parts: i) a three-stage upsampling network with embedded parsing map features, in which image recovery and prior estimation are performed simultaneously and progressively to improve the image resolution; ii) a progressive training method and a joint facial attention and heatmap loss to obtain better facial attributes; iii) a channel attention strategy in residual dense blocks to adaptively learn facial features. Extensive experimental results show that, compared with state-of-the-art methods in terms of quantitative and qualitative metrics, our approach achieves an outstanding balance between SR image quality and low network complexity.

On-Device Text Image Super Resolution

Dhruval Jain, Arun Prabhu, Gopi Ramena, Manoj Goyal, Debi Mohanty, Naresh Purre, Sukumar Moharana

Auto-TLDR; A Novel Deep Neural Network for Super-Resolution on Low Resolution Text Images

Recent research on super-resolution (SR) has witnessed major developments with the advancement of deep convolutional neural networks. There is a need for on-device information extraction from scene text images or even document images, most of which are low-resolution (LR) images. Therefore, SR becomes an essential pre-processing step, as bicubic upsampling, which is conventionally present in smartphones, performs poorly on LR images. To give users more control over their privacy, and to reduce the carbon footprint by reducing the overhead of cloud computing and hours of GPU usage, executing SR models on the edge has become a necessity in recent times. There are various challenges in running and optimizing a model on resource-constrained platforms like smartphones. In this paper, we present a novel deep neural network that reconstructs sharper character edges and thus boosts OCR confidence. The proposed architecture not only achieves significant improvement in PSNR over bicubic upsampling on various benchmark datasets but also runs with an average inference time of 11.7 ms per image. We outperform the state of the art on the Text330 dataset. We also achieve an OCR accuracy of 75.89% on the ICDAR 2015 TextSR dataset, where the ground truth has an accuracy of 78.10%.

Wavelet Attention Embedding Networks for Video Super-Resolution

Young-Ju Choi, Young-Woon Lee, Byung-Gyu Kim

Auto-TLDR; Wavelet Attention Embedding Network for Video Super-Resolution

Recently, video super-resolution (VSR) has become more crucial as display resolutions have grown. The majority of deep learning-based VSR methods combine convolutional neural networks (CNNs) with a motion compensation or alignment module to estimate a high-resolution (HR) frame from low-resolution (LR) frames. However, most previous methods treat the spatial features equally and may produce misaligned temporal features through the pixel-based motion compensation and alignment module. This can damage the accuracy of the estimated HR feature. In this paper, we propose a wavelet attention embedding network (WAEN), comprising a wavelet embedding network (WENet) and an attention embedding network (AENet), to fully exploit the informative spatio-temporal features. The WENet operates as a spatial feature extractor of individual low- and high-frequency information based on the 2-D Haar discrete wavelet transform. The meaningful temporal features are extracted in the AENet by utilizing the weighted attention map between frames. Experimental results demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods.
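The 2-D Haar discrete wavelet transform that WENet is based on splits an image into one low-frequency and three high-frequency subbands. A single-level, orthonormal version can be sketched as below. How WENet consumes the subbands is not specified here, and the function name is an assumption, as is the subband labeling (LL, LH, HL, HH), for which conventions vary between libraries.

```python
def haar_dwt2(image):
    """One level of the 2-D Haar transform on an image with even height and width.

    Returns four half-resolution subbands (ll, lh, hl, hh) as 2-D lists.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for y in range(0, h, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for x in range(0, w, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            ll_row.append((a + b + c + d) / 2)  # local average (low frequency)
            lh_row.append((a - b + c - d) / 2)  # difference across columns
            hl_row.append((a + b - c - d) / 2)  # difference across rows
            hh_row.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh
```

With this normalization the transform preserves energy: the sum of squared coefficients over the four subbands equals the sum of squared pixel values.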

Deep Iterative Residual Convolutional Network for Single Image Super-Resolution

Rao Muhammad Umer, Gian Luca Foresti, Christian Micheloni

Auto-TLDR; ISRResCNet: Deep Iterative Super-Resolution Residual Convolutional Network for Single Image Super-resolution

Deep convolutional neural networks (CNNs) have recently achieved great success in the single image super-resolution (SISR) task due to their powerful feature representation capabilities. Most recent deep learning-based SISR methods focus on designing deeper / wider models to learn the non-linear mapping between low-resolution (LR) inputs and high-resolution (HR) outputs. These existing SR methods do not take into account the image observation (physical) model and thus require a large number of trainable network parameters and a huge volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method, with few trainable parameters, improves results for different scaling factors in comparison with state-of-the-art methods.

Single Image Deblurring Using Bi-Attention Network

Yaowei Li, Ye Luo, Jianwei Lu

Auto-TLDR; Bi-Attention Neural Network for Single Image Deblurring

Recently, deep convolutional neural networks have been extensively applied to image deblurring and have achieved remarkable performance. However, most CNN-based image deblurring methods focus on simply increasing network depth, neglecting the contextual information of the blurred image and the reconstructed image. Meanwhile, most encoder-decoder-based methods rarely exploit the encoder's multi-layer features. To address these issues, we propose a bi-attention neural network for single image deblurring, which mainly consists of a bi-attention network and a feature fusion network. Specifically, two criss-cross attention modules are plugged in before and after the encoder-decoder to capture long-range spatial contextual information in the blurred image and the reconstructed image simultaneously, and the feature fusion network combines multi-layer features from the encoder to enable the decoder to reconstruct the image with multi-scale features. The whole network is end-to-end trainable. Quantitative and qualitative experimental results validate that the proposed network outperforms state-of-the-art methods in terms of PSNR and SSIM on benchmark datasets.

Neural Architecture Search for Image Super-Resolution Using Densely Connected Search Space: DeCoNAS

Joon Young Ahn, Nam Ik Cho

Auto-TLDR; DeCoNASNet: Automated Neural Architecture Search for Super-Resolution

The recent progress of deep convolutional neural networks has enabled great success in single image super-resolution (SISR) and many other vision tasks. Their performance is also being increased by deepening the networks and developing more sophisticated network structures. However, finding an optimal structure for a given problem is a difficult task, even for human experts. For this reason, neural architecture search (NAS) methods have been introduced, which automate the procedure of constructing the structures. In this paper, we extend NAS to the super-resolution domain and find a lightweight densely connected network named DeCoNASNet. We use a hierarchical search strategy to find the best connection with local and global features. In this process, we define a complexity-based penalty for solving image super-resolution, which can be considered a multi-objective problem. Experiments show that our DeCoNASNet outperforms state-of-the-art lightweight super-resolution networks designed by handcrafted methods and existing NAS-based designs.

Improving Low-Resolution Image Classification by Super-Resolution with Enhancing High-Frequency Content

Liguo Zhou, Guang Chen, Mingyue Feng, Alois Knoll

Auto-TLDR; Super-resolution for Low-Resolution Image Classification

With the prosperous development of convolutional neural networks, they can now perform excellently on visual understanding tasks when the input images are of high or ordinary quality. However, a large degradation in performance always occurs when the input images are of low quality. In this paper, we propose a new super-resolution method to improve the classification performance for low-resolution images. In an image, the regions in which pixel values vary dramatically contain more abundant high-frequency content than other parts. Based on this fact, we design a weight map and integrate it with a super-resolution CNN training framework. During training, this weight map locates the positions of the high-frequency pixels in the ground-truth high-resolution images. After that, the pixel-level loss function takes effect only at these positions to minimize the difference between the reconstructed high-resolution images and the ground-truth high-resolution images. The experimental results show that, compared with other state-of-the-art super-resolution methods, our method recovers more high-frequency content when reconstructing high-resolution images and better improves the classification accuracy after low-resolution image preprocessing.
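The idea of restricting a pixel-level loss to high-frequency positions can be sketched as follows. The abstract does not say how the weight map is computed, so this sketch assumes a binary mask obtained by thresholding forward differences of the ground-truth image and an L1 loss; both choices, and all names, are illustrative rather than the authors' design.

```python
def high_frequency_mask(image, threshold):
    """Mark pixels whose forward difference to a neighbor exceeds a threshold.

    Assumption: a simple gradient test stands in for the paper's weight map.
    """
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(image[y][min(x + 1, w - 1)] - image[y][x])
            dy = abs(image[min(y + 1, h - 1)][x] - image[y][x])
            if max(dx, dy) > threshold:
                mask[y][x] = 1
    return mask

def masked_l1_loss(prediction, target, mask):
    """Mean absolute error computed only where the mask is set."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(prediction, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0
```

Errors in flat regions do not contribute to this loss at all, which concentrates the training signal on edges and textures.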

Multi-Laplacian GAN with Edge Enhancement for Face Super Resolution

Shanlei Ko, Bi-Ru Dai

Auto-TLDR; Face Image Super-Resolution with Enhanced Edge Information

Face image super-resolution has become a research hotspot in the field of image processing. Nowadays, more and more studies add additional information, such as landmarks and identity, to reconstruct high-resolution images from low-resolution ones, and perform well in quantitative terms and perceptual quality. However, this additional information is hard to obtain in many cases. In this work, we focus on reconstructing face images by extracting useful information from the face images directly rather than using additional information. By observing edge information at each scale of face images, we propose a method to reconstruct high-resolution face images with enhanced edge information. In addition, with the proposed training procedure, our method reconstructs photo-realistic images at an upscaling factor of 8× and outperforms state-of-the-art methods in both quantitative terms and perceptual quality.

Deep Residual Attention Network for Hyperspectral Image Reconstruction

Kohei Yorimoto, Xian-Hua Han

Auto-TLDR; Deep Convolutional Neural Network for Hyperspectral Image Reconstruction from a Snapshot

Coded aperture snapshot spectral imaging (CASSI) captures a full-frame spectral image as a single compressive image, and the underlying hyperspectral image (HSI) must be reconstructed from the snapshot in post-processing, which is a challenging inverse problem due to its ill-posed nature. Existing methods for HSI reconstruction from a snapshot usually employ optimization to solve the formulated image degradation model regularized with empirically designed priors, and still cannot achieve sufficient reconstruction accuracy for real HS image analysis systems. Motivated by the recent advances of deep learning for different inverse problems, deep learning-based HSI reconstruction methods have attracted a lot of attention and can boost the reconstruction performance. This study proposes a novel deep convolutional neural network (DCNN) based framework for effectively learning the spatial structure and spectral attributes of the underlying HSI with reciprocal spatial and spectral modules. Further, to adaptively leverage the useful learned features for better HSI reconstruction, we integrate residual attention modules into our DCNN by exploring both spatial and spectral attention maps. Experimental results on two benchmark HSI datasets show that our method outperforms state-of-the-art methods in both quantitative values and visual effect.

TinyVIRAT: Low-Resolution Video Action Recognition

Ugur Demir, Yogesh Rawat, Mubarak Shah

Auto-TLDR; TinyVIRAT: A Progressive Generative Approach for Action Recognition in Videos

The existing research in action recognition is mostly focused on high-quality videos where the action is distinctly visible. In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions. Most activities occur at a distance with a small resolution and recognizing such activities is a challenging problem. In this work, we focus on recognizing tiny actions in videos. We introduce a benchmark dataset, TinyVIRAT, which contains natural low-resolution activities. The actions in TinyVIRAT videos have multiple labels and they are extracted from surveillance videos which makes them realistic and more challenging. We propose a novel method for recognizing tiny actions in videos which utilizes a progressive generative approach to improve the quality of low-resolution actions. The proposed method also consists of a weakly trained attention mechanism which helps in focusing on the activity regions in the video. We perform extensive experiments to benchmark the proposed TinyVIRAT dataset and observe that the proposed method significantly improves the action recognition performance over baselines. We also evaluate the proposed approach on synthetically resized action recognition datasets and achieve state-of-the-art results when compared with existing methods. The dataset and code will be publicly available.

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

Yue Liu, Zhichao Lian

Auto-TLDR; Pyramid Pooling Module with SE1Cblock and D2SUpsample Network (PSDNet)

In this paper, we present our Pyramid Pooling Module (PPM) with SE1Cblock and D2SUpsample Network (PSDNet), a novel architecture for accurate semantic segmentation. Starting from the known work called Pyramid Scene Parsing Network (PSPNet), PSDNet takes advantage of the pyramid pooling structure, with a channel attention module and a feature transform module in the PPM. The PPM enhanced with these two components can strengthen the context information flowing in the network instead of damaging it. The channel attention module is an improved “Squeeze and Excitation with 1D Convolution” (SE1C) block, which can explicitly model the interrelationship between channels with fewer parameters. We propose a feature transform module named “Depth to Space Upsampling” (D2SUpsample) in the PPM, which keeps the integrity of features by transforming features while interpolating them, at the same time reducing parameters. In addition, we introduce a joint strategy in SE1Cblock that combines two variants of global pooling without increasing parameters. Compared with PSPNet, our work achieves higher accuracy on public datasets, with 73.97% mIoU and 82.89% mAcc on the Cityscapes dataset based on a ResNet50 backbone.
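One plausible reading of a squeeze-and-excitation block built on a 1-D convolution is sketched below: the pooled per-channel descriptors are treated as a 1-D signal, convolved with a small kernel, and passed through a sigmoid to obtain channel gates. The exact SE1C design, including its joint pooling strategy, is not given in the abstract, so the function name, the zero padding, and the single-kernel structure are all assumptions.

```python
import math

def se1c_gates(channel_descriptors, kernel):
    """Gate per channel from a 1-D convolution over pooled channel descriptors.

    channel_descriptors: one pooled scalar per channel (e.g. a global average).
    kernel: odd-length list of weights (these would be learned in a real network).
    Zero padding keeps one output per channel.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(channel_descriptors) + [0.0] * pad
    gates = []
    for i in range(len(channel_descriptors)):
        z = sum(k * padded[i + j] for j, k in enumerate(kernel))
        gates.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid
    return gates
```

Under this reading the parameter count is the kernel length, independent of the number of channels, whereas a fully connected excitation layer grows with the square of the channel count.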

Automatical Enhancement and Denoising of Extremely Low-Light Images

Yuda Song, Yunfang Zhu, Xin Du

Auto-TLDR; INSNet: Illumination and Noise Separation Network for Low-Light Image Restoring

Deep convolutional neural network (DCNN) based methodologies have recently achieved remarkable performance on various low-level vision tasks. Restoring images captured at night is one of the trickiest low-level vision tasks due to their high noise level and low intensity. We propose a DCNN-based methodology, the Illumination and Noise Separation Network (INSNet), which performs both denoising and enhancement on these extremely low-light images. INSNet fully utilizes global-ware features and local-ware features using the modified network structure and image sampling scheme. Compared to well-designed complex neural networks, our proposed methodology only needs to add a bypass network to the existing network. Nevertheless, it boosts the quality of recovered images dramatically while increasing the computational cost by less than 0.1%. Even without any manual settings, INSNet can stably restore extremely low-light images to the desired high-quality images.

Detail-Revealing Deep Low-Dose CT Reconstruction

Xinchen Ye, Yuyao Xu, Rui Xu, Shoji Kido, Noriyuki Tomiyama

Auto-TLDR; A Dual-branch Aggregation Network for Low-Dose CT Reconstruction

Low-dose CT imaging has emerged as a low-radiation-risk option thanks to the reduced radiation dose, but it has a negative impact on imaging quality. This paper addresses the problem of low-dose CT reconstruction. Previous methods are unsatisfactory due to the inaccurate recovery of image details under the strong noise generated by the reduction of radiation dose, which directly affects the final diagnosis. To suppress the noise effectively while retaining the structures well, we propose a detail-revealing dual-branch aggregation network to effectively reconstruct the degraded CT image. Specifically, the main reconstruction branch iteratively exploits and compensates for the reconstruction errors to gradually refine the CT image, while the prior branch learns the structure details as prior knowledge to help recover the CT image. A sophisticated detail-revealing loss is designed to fuse the information from both branches and guide the learning to obtain better performance from pixel-wise and holistic perspectives, respectively. Experimental results show that our method outperforms the state-of-the-art methods in both PSNR and SSIM metrics.
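PSNR, which this abstract and several others on this page report, has a standard definition that can be sketched as follows. The function name `psnr` and the `data_range` parameter (the maximum possible pixel value, 1.0 for images scaled to [0, 1]) are illustrative choices, not taken from any of the papers.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Hypothetical example: a uniform error of 0.1 on a [0, 1] image.
clean = np.zeros((8, 8))
noisy = clean + 0.1
score = psnr(clean, noisy)  # MSE = 0.01, so the score is 20 dB
```

Higher values mean the estimate is closer to the reference in the mean-squared-error sense. SSIM, the other metric mentioned, measures structural similarity and is not covered by this sketch.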

Deep Universal Blind Image Denoising

Jae Woong Soh, Nam Ik Cho

Auto-TLDR; Image Denoising with Deep Convolutional Neural Networks

Image denoising is an essential part of many image processing and computer vision tasks due to inevitable noise corruption during image acquisition. Traditionally, many researchers have investigated image priors for denoising from the Bayesian perspective, based on image properties and statistics. Recently, deep convolutional neural networks (CNNs) have shown great success in image denoising by incorporating large-scale synthetic datasets. However, both approaches have pros and cons. While deep CNNs are powerful for removing noise with known statistics, they tend to lack flexibility and practicality for blind and real-world noise. Moreover, they cannot easily employ explicit priors. On the other hand, traditional non-learning methods can involve explicit image priors, but they require considerable computation time and cannot exploit large-scale external datasets. In this paper, we present a CNN-based method that leverages the advantages of both approaches based on the Bayesian perspective. Concretely, we divide the blind image denoising problem into sub-problems and conquer each inference problem separately. As the CNN is a powerful tool for inference, our method is rooted in CNNs and proposes a novel network design for efficient inference. With our proposed method, we can successfully remove blind and real-world noise with a universal CNN that has a moderate number of parameters.

Small Object Detection Leveraging on Simultaneous Super-Resolution

Hong Ji, Zhi Gao, Xiaodong Liu, Tiancan Mei

Auto-TLDR; Super-Resolution via Generative Adversarial Network for Small Object Detection

Despite the impressive advancement achieved in object detection, the detection performance on small objects is still far from satisfactory due to the lack of sufficiently detailed appearance to distinguish them from similar objects. Inspired by the positive effects of super-resolution on object detection, we propose a general framework that can be incorporated into most available detector networks to significantly improve the performance of small object detection, in which the low-resolution image is super-resolved via a generative adversarial network (GAN) in an unsupervised manner. In our method, the super-resolution network and the detection network are trained jointly and alternately, each with the other fixed. In particular, the detection loss is back-propagated into the super-resolution network during training to facilitate detection. Compared with available simultaneous super-resolution and detection methods, which heavily rely on low-/high-resolution image pairs, our work breaks through this restriction by applying the CycleGAN strategy, achieving increased generality and applicability while retaining an elegant structure. Extensive experiments on datasets from both the computer vision and remote sensing communities demonstrate that our method works effectively on a wide range of complex scenarios, significantly outperforming many state-of-the-art approaches.

CT-UNet: An Improved Neural Network Based on U-Net for Building Segmentation in Remote Sensing Images

Huanran Ye, Sheng Liu, Kun Jin, Haohao Cheng

Auto-TLDR; Context-Transfer-UNet: A UNet-based Network for Building Segmentation in Remote Sensing Images

With the proliferation of remote sensing images, how to segment buildings more accurately in remote sensing images is a critical challenge. First, the high resolution leads to blurred boundaries in the extracted building maps. Second, the similarity between buildings and background results in intra-class inconsistency. To address these two problems, we propose a UNet-based network named Context-Transfer-UNet (CT-UNet). Specifically, we design a Dense Boundary Block (DBB). The Dense Block uses a reuse mechanism to refine features and increase recognition capabilities. The Boundary Block introduces low-level spatial information to solve the fuzzy boundary problem. Then, to handle intra-class inconsistency, we construct a Spatial Channel Attention Block (SCAB). It combines context space information and selects more distinguishable features from space and channel. Finally, we propose a novel loss function that strengthens the objective by adding an evaluation indicator. Based on our proposed CT-UNet, we achieve 85.33% mean IoU on the Inria dataset and 91.00% mean IoU on the WHU dataset, which outperforms our baseline (U-Net ResNet-34) by 3.76% and Web-Net by 2.24%.

Context-Aware Residual Module for Image Classification

Jing Bai, Ran Chen

Auto-TLDR; Context-Aware Residual Module for Image Classification

Attention modules have achieved great success in numerous vision tasks. However, existing visual attention modules generally consider features at a single scale and cannot make full use of their multi-scale contextual information. Meanwhile, multi-scale spatial feature representation has demonstrated outstanding performance in a wide range of applications. However, the multi-scale features are always represented in a layer-wise manner, i.e. it is impossible to know their contextual information at a granular level. Focusing on the above issue, a context-aware residual module for image classification is proposed in this paper. It consists of a novel multi-scale channel attention module (MSCAM), which learns refined channel weights by considering the visual features of its own scale and its surrounding fields, and a multi-scale spatial aware module (MSSAM), which further captures more spatial information. Either or both of the two modules can be plugged into any CNN-based backbone image classification architecture with a short residual connection to obtain context-aware enhanced features. Experiments on public image recognition datasets including CIFAR10, CIFAR100, Tiny-ImageNet and ImageNet consistently demonstrate that our proposed modules significantly outperform widely used state-of-the-art methods, e.g., ResNet and the lightweight networks MobileNet and SqueezeNet.

CSpA-DN: Channel and Spatial Attention Dense Network for Fusing PET and MRI Images

Bicao Li, Zhoufeng Liu, Shan Gao, Jenq-Neng Hwang, Jun Sun, Zongmin Wang

Auto-TLDR; CSpA-DN: Unsupervised Fusion of PET and MR Images with Channel and Spatial Attention

In this paper, we propose a novel unsupervised fusion framework based on a dense network with channel and spatial attention (CSpA-DN) for PET and MR images. In our approach, an encoder composed of a densely connected neural network is constructed to extract features from source images, and a decoder network is leveraged to yield the fused image from these features. Simultaneously, a self-attention mechanism is introduced in the encoder and decoder to further integrate local features along with their global dependencies adaptively. The extracted feature of each spatial position is synthesized by a spatial attention module as a weighted summation of the features in the same row and column as that position. Meanwhile, the interdependent relationship of all feature maps is integrated by a channel attention module. The summation of the outputs of these two attention modules is fed into the decoder, and the fused image is generated. Experimental results illustrate the superiority of our proposed CSpA-DN model over state-of-the-art methods in PET and MR image fusion according to both visual perception and objective assessment.

Arbitrary Style Transfer with Parallel Self-Attention

Tiange Zhang, Ying Gao, Feng Gao, Lin Qi, Junyu Dong

Auto-TLDR; Self-Attention-Based Arbitrary Style Transfer Using Adaptive Instance Normalization

Neural style transfer aims to create artistic images by synthesizing patterns from a given style image. Recently, the Adaptive Instance Normalization (AdaIN) layer was proposed to achieve real-time arbitrary style transfer. However, we observed that if crucial features based on AdaIN can be further emphasized during transfer, both content and style information will be better reflected in stylized images. Furthermore, it is always essential to preserve more details and reduce unexpected artifacts in order to generate appealing results. In this paper, we introduce an improved arbitrary style transfer method based on the self-attention mechanism. A self-attention module is designed to learn what and where to emphasize in the input image. In addition, an extra Laplacian loss is applied to preserve the structural details of the content while eliminating artifacts. Experimental results demonstrate that the proposed method outperforms AdaIN and can generate more appealing results.
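A Laplacian loss is commonly formulated as the mean squared difference between the Laplacian filter responses of the content image and the stylized image. The sketch below follows that common form on single-channel images. The 3x3 kernel, the function names, and the absence of any pooling step are assumptions, since the abstract does not specify the authors' exact formulation.

```python
import numpy as np

# 4-neighbour discrete Laplacian kernel (an assumed, commonly used choice).
LAPLACIAN = np.array([[0.0, -1.0, 0.0],
                      [-1.0, 4.0, -1.0],
                      [0.0, -1.0, 0.0]])

def laplacian_response(img):
    """Apply the 3x3 Laplacian to a 2-D image without padding."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

def laplacian_loss(content, stylized):
    """Mean squared difference between the two edge (Laplacian) maps."""
    diff = laplacian_response(content) - laplacian_response(stylized)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
content = rng.random((16, 16))
brighter = content + 5.0      # same structure, shifted intensity
spotted = content.copy()
spotted[8, 8] += 1.0          # a local artifact
```

Here `laplacian_loss(content, brighter)` is essentially zero because the Laplacian ignores constant offsets, while `laplacian_loss(content, spotted)` is positive. That matches the stated purpose of penalizing structural changes and artifacts rather than global changes in tone.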

Selective Kernel and Motion-Emphasized Loss Based Attention-Guided Network for HDR Imaging of Dynamic Scenes

Yipeng Deng, Qin Liu, Takeshi Ikenaga

Auto-TLDR; SK-AHDRNet: A Deep Network with attention module and motion-emphasized loss function to produce ghost-free HDR images

Ghost-like artifacts caused by ill-exposed and motion areas are one of the most challenging problems in high dynamic range (HDR) image reconstruction. When the motion range is small, previous methods based on optical flow or patch-match can suppress ghost-like artifacts by first aligning input images before merging them. However, they are not robust enough and still produce artifacts for challenging scenes where large foreground motions exist. To this end, we propose a deep network with an attention module and a motion-emphasized loss function to produce ghost-free HDR images. In the attention module, we use channel and spatial attention to guide the network to automatically emphasize important components such as motion and saturated areas. To be robust to images with different resolutions and objects with distinct scales, we adopt the selective kernel network as the basic framework for channel attention. In addition to the attention module, a motion-emphasized loss function based on a mask of the motion and ill-exposed areas is designed to help the network reconstruct motion areas. Experiments on the public dataset indicate that the proposed SK-AHDRNet produces ghost-free results in which detail in ill-exposed areas is well recovered. The proposed method scores 43.17 on the PSNR metric and 61.02 on the HDR-VDP-2 metric on the test set, outperforming all conventional works. According to quantitative and qualitative evaluations, the proposed method achieves state-of-the-art performance.

Free-Form Image Inpainting Via Contrastive Attention Network

Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He

Auto-TLDR; Self-supervised Siamese inference for image inpainting

Most deep learning based image inpainting approaches adopt an autoencoder or its variants to fill missing regions in images. Encoders are usually utilized to learn powerful representational spaces, which are important for dealing with sophisticated learning tasks. Specifically, in the image inpainting task, masks of any shape can appear anywhere in images (i.e., free-form masks), forming complex patterns. It is difficult for encoders to capture such powerful representations in this complex situation. To tackle this problem, we propose a self-supervised Siamese inference network to improve robustness and generalization. Moreover, the restored image usually cannot be harmoniously integrated into the existing content, especially in the boundary area. To address this problem, we propose a novel Dual Attention Fusion module (DAF), which can combine the restored and known regions in a smoother way and can be inserted into decoder layers in a plug-and-play way. DAF is developed not only to adaptively rescale channel-wise features by taking interdependencies between channels into account but also to force deep convolutional neural networks (CNNs) to focus more on unknown regions. In this way, the unknown region is naturally filled from the outside to the inside. Qualitative and quantitative experiments on multiple datasets, including facial and natural datasets (i.e., Celeb-HQ, Paris Street View, Places2 and ImageNet), demonstrate that our proposed method outperforms state-of-the-art methods in generating high-quality inpainting results.

GAN-Based Image Deblurring Using DCT Discriminator

Hiroki Tomosada, Takahiro Kudo, Takanori Fujisawa, Masaaki Ikehara

Auto-TLDR; DeblurDCTGAN: A Discrete Cosine Transform for Image Deblurring

In this paper, we propose high-quality image deblurring using the discrete cosine transform (DCT) with less computational complexity. Recently, Convolutional Neural Network (CNN) and Generative Adversarial Network (GAN) based algorithms have been proposed for image deblurring. Moreover, a multi-scale CNN architecture restores blurred images clearly and suppresses more ringing artifacts or block noise, but it takes much time to process. To solve these problems, we propose a method named “DeblurDCTGAN” that preserves texture and suppresses ringing artifacts in the restored image without a multi-scale architecture by using a DCT-based loss. It compares the deblurred image and the ground truth image in the frequency domain by using the DCT. As a result, DeblurDCTGAN can reduce block noise or ringing artifacts while maintaining deblurring performance. Our experimental results show that DeblurDCTGAN achieves the highest PSNR and SSIM compared with other conventional methods on both the GoPro test dataset and the DVD test dataset. Also, the running time per pair of DeblurDCTGAN is shorter than that of the other methods.
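The idea of comparing a restored image with its ground truth in the DCT domain can be sketched with a small NumPy example. The orthonormal DCT-II below is standard. The choice of an L1 distance between coefficients and the function names are assumptions, as the abstract does not give the exact form of the loss.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n-by-n matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m

def dct2(img):
    """Separable 2-D DCT of a single-channel image."""
    h, w = img.shape
    return dct_matrix(h) @ img @ dct_matrix(w).T

def dct_l1_loss(restored, target):
    """Mean absolute difference between DCT coefficients (assumed form)."""
    return float(np.mean(np.abs(dct2(restored) - dct2(target))))

rng = np.random.default_rng(0)
target = rng.random((8, 8))
restored = target + 0.05 * rng.standard_normal((8, 8))
loss = dct_l1_loss(restored, target)
```

Because each DCT coefficient corresponds to one spatial frequency, a loss of this kind penalizes errors per frequency band. Block noise and ringing tend to concentrate in particular bands, which is a plausible reason to compare images in this domain.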

Dynamic Guided Network for Monocular Depth Estimation

Xiaoxia Xing, Yinghao Cai, Yiping Yang, Dayong Wen

Auto-TLDR; DGNet: Dynamic Guidance Upsampling for Self-attention-Decoding for Monocular Depth Estimation

Self-attention and encoder-decoder structures have been widely used in deep neural networks for monocular depth estimation. The former is able to capture long-range information by computing the representation of each position as a weighted sum of the features at all positions, while the latter can capture detailed structural information by gradually recovering the spatial information. In this work, we combine the advantages of both methods. Specifically, our proposed model, DGNet, extends EMANet by adding an effective decoder module to refine the depth results. In the decoder stage, we further design dynamic guidance upsampling, which uses local neighboring information from low-level features to guide the upsampling of coarser depth. In this way, dynamic guidance upsampling generates content-dependent and spatially variant kernels for depth upsampling, which makes full use of the spatial detail information from low-level features. Experimental results demonstrate that our method obtains higher accuracy and generates the desired depth map.

Thermal Image Enhancement Using Generative Adversarial Network for Pedestrian Detection

Mohamed Amine Marnissi, Hajer Fradi, Anis Sahbani, Najoua Essoukri Ben Amara

Auto-TLDR; Improving Visual Quality of Infrared Images for Pedestrian Detection Using Generative Adversarial Network

Infrared imaging has recently played an important role in a wide range of applications including surveillance, robotics and night vision. However, infrared cameras often suffer from some limitations, essentially low contrast and blurred details. These problems contribute to the loss of observation of target objects in infrared images, which could limit the feasibility of different infrared imaging applications. In this paper, we mainly focus on the problem of pedestrian detection on thermal images. In particular, we emphasize the need to enhance the visual quality of images before performing the detection step. To address that, we propose a novel thermal enhancement architecture based on a Generative Adversarial Network, composed of two modules, a contrast enhancement module and a denoising module, with a post-processing step for edge restoration in order to improve the overall quality. The effectiveness of the proposed architecture is assessed by means of visual quality metrics, and better results are obtained compared to the original thermal images and to the results obtained by other existing enhancement methods. These experiments were conducted on a subset of the KAIST dataset. Using the same dataset, the impact of the proposed enhancement architecture is demonstrated on the detection results, with a significant performance margin using the YOLOv3 detector.

Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search

Chu Xiangxiang, Bo Zhang, Micheal Ma Hailong, Ruijun Xu, Jixiang Li, Qingyuan Li

Auto-TLDR; Multi-Objective Neural Architecture Search for Super-Resolution

Deep convolutional neural networks demonstrate impressive results in the super-resolution domain. A series of studies concentrate on improving the peak signal-to-noise ratio (PSNR) by using much deeper layers, which are not friendly to constrained resources. Pursuing a trade-off between the restoration capacity and the simplicity of models is still non-trivial. Recent contributions struggle to manually maximize this balance, while our work achieves the same goal automatically with neural architecture search. Specifically, we handle super-resolution with a multi-objective approach. We also propose an elastic search tactic at both the micro and macro levels, based on a hybrid controller that profits from evolutionary computation and reinforcement learning. Quantitative experiments lead us to conclude that our generated models dominate most of the state-of-the-art methods with respect to the individual FLOPS.

Attention Stereo Matching Network

Doudou Zhang, Jing Cai, Yanbing Xue, Zan Gao, Hua Zhang

Auto-TLDR; ASM-Net: Attention Stereo Matching with Disparity Refinement

Despite great progress, previous stereo matching algorithms still lack the ability to match textureless regions and slender structure areas. To tackle this problem, we propose ASM-Net, an attention stereo matching network. An attention module and a disparity refinement module are constructed in ASM-Net. The attention module improves the correlation information between two images through channel and spatial attention. The feature-guided disparity refinement module learns more geometry information at different feature levels to progressively refine the resolution of the coarse prediction. The proposed approach was evaluated on several benchmark datasets. Experiments show that the proposed method achieves competitive results on the KITTI and Scene-Flow datasets while running in real time at 14 ms.

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Wei Wu, Jiale Yu

Auto-TLDR; An improved bilinear pooling method for image-based action recognition

Action recognition in still images is a challenging task because of the complexity of human motions and the variance of background within the same action category. Moreover, some actions typically occur in fine-grained categories, with little visual difference between these categories. Extracting discriminative features or modeling various semantic parts is therefore essential for image-based action recognition. Many methods rely on expensive manual annotations to learn discriminative part information for action recognition, which may severely discourage potential applications in real life. In recent years, bilinear pooling has shown its effectiveness for image classification because it learns distinctive features automatically. Inspired by this model, in this paper an improved bilinear pooling method is proposed to avoid the shortcomings of traditional bilinear pooling methods. Previous bilinear pooling approaches retain a lot of noisy background or harmful feature information, which limits their application to action recognition. In our method, the attention mechanism is introduced into a hierarchical bilinear pooling framework with mask aggregation for action recognition. The proposed model can generate distinctive and ROI-aware feature information by combining multiple attention mask maps from the channel-wise and spatial-wise attention features. More specifically, our method makes the network pay more attention to the discriminative regions of the vital objects in an image. We verify our model on two challenging datasets: 1) the Stanford 40 action dataset and 2) our own action dataset, which includes 60 categories. Experimental results demonstrate the effectiveness of our approach, which is superior to traditional and state-of-the-art methods.

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

Ruojing Wang, Zitang Sun, Sei-Ichiro Kamata, Weili Chen

Auto-TLDR; Adaptive Image Compression using GAN based Semantic-Perceptual Residual Compensation

Image compression is a basic task in image processing. In this paper, we present an adaptive image compression algorithm that relies on GAN-based semantic-perceptual residual compensation and is able to offer visually pleasing reconstruction at a low bitrate. Our method adopts a U-shaped encoding and decoding structure accompanied by a well-designed dense residual connection with a strip pooling module to improve the original auto-encoder. Besides, we introduce adversarial learning by adding a discriminator, thus constructing a complete GAN. To improve the coding efficiency, we design an adaptive semantic-perception residual compensation block based on the Grad-CAM algorithm. To improve the quantizer, we embed soft quantization so as to alleviate, to some extent, the problem that the back propagation process is irreversible. In addition, we use the latest FLIF lossless compression algorithm and BPG vector compression algorithm to perform deeper compression on the image. More importantly, experimental results including PSNR and MS-SSIM demonstrate that the proposed approach outperforms the current state-of-the-art image compression methods.
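Soft quantization is usually motivated by the fact that rounding to the nearest quantization level has zero gradient almost everywhere. One common relaxation replaces the hard assignment with a softmax-weighted average of the levels. The sketch below shows that relaxation in NumPy. The function names, the `centers` array, and the sharpness parameter `sigma` are illustrative assumptions and not the authors' quantizer.

```python
import numpy as np

def hard_quantize(z, centers):
    """Snap every value to its nearest quantization centre."""
    idx = np.argmin(np.abs(z[..., None] - centers), axis=-1)
    return centers[idx]

def soft_quantize(z, centers, sigma=1.0):
    """Softmax-weighted average of centres, a smooth stand-in for rounding.

    Larger sigma gives sharper weights and approaches hard quantization.
    """
    dist = (z[..., None] - centers) ** 2
    logits = -sigma * dist
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights * centers).sum(axis=-1)

centers = np.array([0.0, 1.0, 2.0])
z = np.array([0.1, 0.9, 1.6])
smooth = soft_quantize(z, centers, sigma=1.0)      # values between centres
sharp = soft_quantize(z, centers, sigma=1000.0)    # close to hard_quantize
```

In a training setup of this kind, the smooth version would typically be used where gradients are needed and the hard version when actually encoding. Whether the paper follows this exact scheme is not stated in the abstract.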

SIDGAN: Single Image Dehazing without Paired Supervision

Pan Wei, Xin Wang, Lei Wang, Ji Xiang, Zihan Wang

Auto-TLDR; DehazeGAN: An End-to-End Generative Adversarial Network for Image Dehazing

Single image dehazing is challenging without the scene airlight and transmission map. Most existing dehazing algorithms estimate key parameters based on manually designed priors or statistics, which may be invalid in some scenarios. Although deep learning-based dehazing methods provide an effective solution, most of them rely on paired training datasets, which are prohibitively difficult to collect in the real world. In this paper, we propose an effective end-to-end generative adversarial network for image dehazing, named DehazeGAN. The proposed DehazeGAN adopts a U-net architecture with a novel color-consistency loss derived from the dark channel prior and a perceptual loss, and it can be trained in an unsupervised fashion without paired synthetic datasets. We create a RealHaze dataset for network training, including 4,000 outdoor hazy images and 4,000 haze-free images. Extensive experiments demonstrate that our proposed DehazeGAN achieves better performance than existing state-of-the-art methods on both synthetic and real-world datasets in terms of PSNR, SSIM, and subjective visual experience.

Tarsier: Evolving Noise Injection in Super-Resolution GANs

Baptiste Roziere, Nathanaël Carraz Rakotonirina, Vlad Hosu, Rasoanaivo Andry, Hanhe Lin, Camille Couprie, Olivier Teytaud

Auto-TLDR; Evolutionary Super-Resolution using Diagonal CMA

Super-resolution aims at increasing the resolution and level of detail within an image. The current state of the art in general single-image super-resolution is held by nESRGAN+, which injects Gaussian noise after each residual layer at training time. In this paper, we harness evolutionary methods to improve nESRGAN+ by optimizing the noise injection at inference time. More precisely, we use Diagonal CMA to optimize the injected noise according to a novel criterion combining quality assessment and realism. Our results are validated by the PIRM perceptual score and a human study. Our method outperforms nESRGAN+ on several standard super-resolution datasets. More generally, our approach can be used to optimize any method based on noise injection.

Super-Resolution Guided Pore Detection for Fingerprint Recognition

Syeda Nyma Ferdous, Ali Dabouei, Jeremy Dawson, Nasser M. Nasarabadi

Auto-TLDR; Super-Resolution Generative Adversarial Network for Fingerprint Recognition Using Pore Features

The performance of fingerprint recognition algorithms relies substantially on fine features extracted from fingerprints. Apart from minutiae and ridge patterns, pore features have proven to be usable for fingerprint recognition. Although features from minutiae and ridge patterns are quite attainable from low-resolution images, using pore features is practical only if the fingerprint image is of high resolution, which necessitates a model that enhances the image quality of conventional 500 ppi legacy fingerprints while preserving the fine details. To recover pore information from low-resolution fingerprints, we adopt a joint learning-based approach that combines super-resolution and pore detection networks. Our modified single-image Super-Resolution Generative Adversarial Network (SRGAN) framework helps to reliably reconstruct high-resolution fingerprint samples from low-resolution ones, assisting the pore detection network in identifying pores with high accuracy. The network jointly learns a distinctive feature representation from a real low-resolution fingerprint sample and successfully synthesizes a high-resolution sample from it. To add discriminative information and uniqueness for all subjects, we integrate features extracted from a deep fingerprint verifier with the SRGAN quality discriminator. We also add a ridge reconstruction loss, utilizing ridge patterns to make the best use of the extracted features. Our proposed method solves the recognition problem by improving the quality of fingerprint images. The high recognition accuracy of the synthesized samples, which is close to the accuracy achieved using the original high-resolution images, validates the effectiveness of our proposed model.

CAggNet: Crossing Aggregation Network for Medical Image Segmentation

Xu Cao, Yanghao Lin

Auto-TLDR; Crossing Aggregation Network for Medical Image Segmentation

In this paper, we present the Crossing Aggregation Network (CAggNet), a novel densely connected semantic segmentation method for medical image analysis. The crossing aggregation network adopts the idea of deep layer aggregation and makes significant innovations in layer connection and semantic information fusion. In this architecture, the traditional skip-connection structure of the general U-Net is replaced by aggregations of multi-level down-sampling and up-sampling layers. This enables the network to interactively fuse the information flowing at different levels of layers in semantic segmentation. It also introduces a weighted aggregation module to aggregate multi-scale output information. We have evaluated and compared our CAggNet with several advanced U-Net based methods on two public medical image datasets, the 2018 Data Science Bowl nuclei detection dataset and the 2015 MICCAI gland segmentation competition dataset. Experimental results indicate that CAggNet improves medical object recognition and achieves more accurate and efficient segmentation than existing improved U-Net and UNet++ structures.

BCAU-Net: A Novel Architecture with Binary Channel Attention Module for MRI Brain Segmentation

Yongpei Zhu, Zicong Zhou, Guojun Liao, Kehong Yuan

Auto-TLDR; BCAU-Net: Binary Channel Attention U-Net for MRI brain segmentation

Recently, deep learning-based networks have achieved advanced performance in medical image segmentation. However, the development of deep learning has been slow in magnetic resonance image (MRI) segmentation of normal brain tissues. In this paper, inspired by the channel attention module, we propose a new architecture, Binary Channel Attention U-Net (BCAU-Net), by introducing a novel Binary Channel Attention Module (BCAM) into the skip connections of U-Net, which can take full advantage of the channel information extracted from the encoding path and the corresponding decoding path. To better aggregate the multi-scale spatial information of the feature map, spatial pyramid pooling (SPP) modules with different pooling operations are used in BCAM instead of the original average-pooling and max-pooling operations. We verify this model on two datasets, IBSR and MRBrainS18, and obtain better performance on MRI brain segmentation than other methods. We believe the proposed method can advance the performance of brain segmentation and clinical diagnosis.

MBD-GAN: Model-Based Image Deblurring with a Generative Adversarial Network

Li Song, Edmund Y. Lam

Auto-TLDR; Model-Based Deblurring GAN for Inverse Imaging

This paper presents a methodology to tackle inverse imaging problems by leveraging the synergistic power of imaging model and deep learning. The premise is that while learning-based techniques have quickly become the methods of choice in various applications, they often ignore the prior knowledge embedded in imaging models. Incorporating the latter has the potential to improve the image estimation. Specifically, we first provide a mathematical basis of using generative adversarial network (GAN) in inverse imaging through considering an optimization framework. Then, we develop the specific architecture that connects the generator and discriminator networks with the imaging model. While this technique can be applied to a variety of problems, from image reconstruction to super-resolution, we take image deblurring as the example here, where we show in detail the implementation and experimental results of what we call the model-based deblurring GAN (MBD-GAN).

CURL: Neural Curve Layers for Global Image Enhancement

Sean Moran, Steven Mcdonagh, Greg Slabaugh

Auto-TLDR; CURL: Neural CURve Layers for Image Enhancement

We present a novel approach to adjust global image properties such as colour, saturation, and luminance using human-interpretable image enhancement curves, inspired by the Photoshop curves tool. Our method, dubbed neural CURve Layers (CURL), is designed as a multi-colour space neural retouching block trained jointly in three different colour spaces (HSV, CIELab, RGB) guided by a novel multi-colour space loss. The curves are fully differentiable and are trained end-to-end for different computer vision problems including photo enhancement (RGB-to-RGB) and as part of the image signal processing pipeline for image formation (RAW-to-RGB). To demonstrate the effectiveness of CURL we combine this global image transformation block with a pixel-level (local) image multi-scale encoder-decoder backbone network. In an extensive experimental evaluation we show that CURL produces state-of-the-art image quality versus recently proposed deep learning approaches in both objective and perceptual metrics, setting new state-of-the-art performance on multiple public datasets.
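
To make the notion of an image enhancement curve concrete, here is a minimal sketch of a piecewise-linear curve applied to a normalized pixel value, in the spirit of a curves tool. The function name `apply_curve`, the evenly spaced knots, and the clamping to [0, 1] are assumptions for illustration and are not taken from the CURL paper. In a learned setting the knot values would be the trainable parameters.

```python
def apply_curve(value, knots):
    """Map a value in [0, 1] through a piecewise-linear curve.

    knots: curve outputs at evenly spaced inputs 0, 1/n, ..., 1 (hypothetical
    parameterization). Inputs outside [0, 1] are clamped first.
    """
    n = len(knots) - 1                 # number of linear segments
    v = min(max(value, 0.0), 1.0)      # clamp to the valid input range
    pos = v * n                        # position measured in segments
    i = min(int(pos), n - 1)           # segment index, kept in range at v == 1
    t = pos - i                        # fractional position inside the segment
    return knots[i] + t * (knots[i + 1] - knots[i])


identity = [0.0, 0.5, 1.0]   # leaves values unchanged
brighten = [0.0, 0.75, 1.0]  # lifts mid-tones, keeps black and white fixed
```

Because the mapping is linear in each knot value, it is differentiable with respect to the knots almost everywhere, which is what makes end-to-end training of such curves possible.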

A Dual-Branch Network for Infrared and Visible Image Fusion

Yu Fu, Xiaojun Wu

Auto-TLDR; Image Fusion Using Autoencoder for Deep Learning

In recent years, deep learning has been used extensively in the field of image fusion. In this article, we propose a new image fusion method by designing a new structure and a new loss function for a deep learning model. Our backbone network is an autoencoder in which the encoder has a dual-branch structure. We feed infrared and visible-light images to the encoder to extract detailed information and semantic information, respectively. The fusion layer combines the two sets of features into fused features, and the decoder reconstructs the fused image from them. We design a new loss function to reconstruct the image effectively. Experiments show that our proposed method achieves state-of-the-art performance.

Attention As Activation

Yimian Dai, Stefan Oehmcke, Fabian Gieseke, Yiquan Wu, Kobus Barnard

Auto-TLDR; Attentional Activation Units for Convolutional Networks

Activation functions and attention mechanisms are typically treated as having different purposes and have evolved differently. However, both concepts can be formulated as a non-linear gating function. Inspired by this similarity, we propose a novel type of activation unit, called the attentional activation (ATAC) unit, as a unification of activation functions and attention mechanisms. In particular, we propose a local channel attention module for simultaneous non-linear activation and element-wise feature refinement, which locally aggregates point-wise cross-channel feature contexts. By replacing the well-known rectified linear units with such ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better with a modest number of additional parameters. We conducted detailed ablation studies on the ATAC units using several host networks of varying depth to empirically verify the effectiveness and efficiency of the units. Furthermore, we compared the performance of the ATAC units against existing activation functions as well as other attention mechanisms on the CIFAR-10, CIFAR-100, and ImageNet datasets. Our experimental results show that networks constructed with the proposed ATAC units generally yield performance gains over their competitors given a comparable number of parameters.
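
The gating view described in this abstract can be sketched in a few lines. The code below writes ReLU as an input multiplied by a hard 0/1 gate, then replaces that gate with a soft, channel-aware one computed from a small bottleneck applied at a single pixel. The function names, the two-layer bottleneck, and the weight shapes are assumptions for illustration; the actual ATAC unit may differ (for example in normalization or layer sizes).

```python
import math


def relu_as_gate(x):
    """ReLU expressed as a gate: the input times a hard 0/1 mask."""
    return x * (1.0 if x > 0 else 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def atac_like_unit(x, w1, w2):
    """Soft, channel-aware gating at one spatial position.

    x:  channel values at a single pixel (length C).
    w1: r x C point-wise weights (hypothetical bottleneck, followed by ReLU).
    w2: C x r point-wise weights producing one gate logit per channel.
    """
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Each channel is scaled by its own gate in (0, 1).
    return [g * v for g, v in zip(gates, x)]


# With all-zero weights every gate equals sigmoid(0) = 0.5.
x = [2.0, -4.0, 6.0]
halved = atac_like_unit(x, w1=[[0.0, 0.0, 0.0]], w2=[[0.0], [0.0], [0.0]])
```

The two functions share the same form, output equals input times gate, which is the similarity the abstract builds on. The difference is that the hard gate depends only on the sign of one value, whereas the soft gate depends on all channels at that position.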

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

Jing Liu, Xiaona Zhang, Zhaoxin Li, Tianlu Mao

Auto-TLDR; Multi-scale Residual Pyramid Attention Network for Monocular Depth Estimation

Monocular depth estimation is a challenging problem in computer vision and is crucial for understanding 3D scene geometry. Recently, methods based on deep convolutional neural networks (DCNNs) have improved estimation accuracy significantly. However, existing methods fail to consider complex textures and geometries in scenes, resulting in loss of local details, distorted object boundaries, and blurry reconstruction. In this paper, we propose an end-to-end Multi-scale Residual Pyramid Attention Network (MRPAN) to mitigate these problems. First, we propose a Multi-scale Attention Context Aggregation (MACA) module, which consists of a Spatial Attention Module (SAM) and a Global Attention Module (GAM). By considering the position and scale correlation of pixels from spatial and global perspectives, the proposed module can adaptively learn the similarity between pixels so as to obtain more global context information of the image and recover the complex structure in the scene. Then, we propose an improved Residual Refinement Module (RRM) to further refine the scene structure, yielding deeper semantic information and retaining more local details. Experimental results show that our method achieves more promising performance on object boundaries and local details than other state-of-the-art methods.

Global-Local Attention Network for Semantic Segmentation in Aerial Images

Minglong Li, Lianlei Shan, Weiqiang Wang

Auto-TLDR; GLANet: Global-Local Attention Network for Semantic Segmentation

Errors in semantic segmentation can be classified into two types: large-area misclassification and locally inaccurate boundaries. Previous attention-based methods capture rich global contextual information, which helps to reduce the first type of error, but local imprecision still exists. In this paper, we propose the Global-Local Attention Network (GLANet), which considers global context and local details simultaneously. Specifically, GLANet is composed of two branches, a global attention branch and a local attention branch, and three different modules are embedded in the two branches to model semantic interdependencies in the spatial, channel, and boundary dimensions, respectively. We sum the outputs of the two branches to further improve feature representation, leading to more precise segmentation results. The proposed method achieves very competitive segmentation accuracy on two public aerial image datasets, bringing significant improvements over the baseline.